Analysis Engines & Computational Logic

This manual is the technical reference for the computational logic and algorithms behind PlugZero Analytics. It describes how raw data is processed into statistics, machine-learning results, AI-generated summaries, and reports.

🏗️ 1. Architectural Pipeline

PlugZero operates a high-performance Python analytics pipeline. When an analysis is requested, the system follows a strict execution sequence.

Step 1: Data Ingestion & Unification

The backend function get_data_for_project (in plugzero_api/analysis/processors.py) orchestrates the loading of data.


# Location: plugzero_api/analysis/processors.py
def get_data_for_project(project_id, file_id=None):
    """Load a project's raw data into a single DataFrame.

    If file_id is provided, only that specific file is loaded.
    Otherwise, all of the project's raw data files are merged.
    """
    ...  # body omitted
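The merge behavior described above can be sketched as follows. This is not the actual body of get_data_for_project: the helper name merge_project_frames, the in-memory frames_by_file_id mapping, and the source_file_id column are assumptions made purely for illustration.

```python
import pandas as pd


def merge_project_frames(frames_by_file_id, file_id=None):
    """Return one DataFrame for a project (hypothetical helper).

    frames_by_file_id maps a file id to its already-loaded DataFrame.
    """
    if file_id is not None:
        # A specific file was requested: return only that file's rows.
        return frames_by_file_id[file_id].copy()

    # Otherwise stack every file; columns missing from a file become NaN.
    tagged = [
        df.assign(source_file_id=fid) for fid, df in frames_by_file_id.items()
    ]
    return pd.concat(tagged, ignore_index=True, sort=False)


frames = {
    1: pd.DataFrame({"region": ["North", "South"], "sales": [10, 20]}),
    2: pd.DataFrame({"region": ["East"], "sales": [30], "nps": [9]}),
}
merged = merge_project_frames(frames)             # all files, 3 rows
single = merge_project_frames(frames, file_id=2)  # one file, 1 row
```

Tagging each row with its source file is a design choice of this sketch; it keeps merged rows traceable once files with different columns are stacked.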

Step 2: Sanitization & JSON Compliance

Before any computation runs, column headers are normalized with fuzzy matching so that typos do not break lookups. Numeric values are then passed through a safe_float filter. NaN and Inf are not valid JSON, so filtering them out prevents frontend rendering crashes.
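A minimal sketch of both sanitization steps, using only the standard library. The name safe_float comes from this manual, but its real signature and fallback value are not documented here, so the default parameter is an assumption. match_header and its 0.8 cutoff are likewise hypothetical.

```python
import difflib
import math


def safe_float(value, default=None):
    """Convert value to a JSON-safe float; NaN, Inf, and bad input map to default."""
    try:
        number = float(value)
    except (TypeError, ValueError, OverflowError):
        return default
    return number if math.isfinite(number) else default


def match_header(raw_header, expected_headers, cutoff=0.8):
    """Map a possibly misspelled header to the closest expected one, or None."""
    cleaned = raw_header.strip().lower()
    matches = difflib.get_close_matches(cleaned, expected_headers, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, match_header(" Revnue ", ["revenue", "region"]) resolves to "revenue", and safe_float(float("nan")) returns None, which serializes as JSON null.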

📊 2. Statistical & Dimensional Engines

These deterministic engines rely on the Pandas library for high-speed calculation.

A. Basic Stats (`calculate_basic_stats`)

Analyzes a variable to determine its type and computes relevant metrics.

Numeric: Mean, Median, Mode, Count, and Standard Deviation.
Categorical: Frequency Count and Mode.
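The type detection and per-type metrics listed above could be computed along these lines. This is a sketch, not the real calculate_basic_stats: the function name basic_stats_sketch and the keys of the returned dictionary are assumptions.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def basic_stats_sketch(series):
    """Summarize one column; the returned keys are illustrative only."""
    values = series.dropna()
    modes = values.mode()
    mode = modes.iloc[0] if not modes.empty else None

    if is_numeric_dtype(values):
        return {
            "type": "numeric",
            "count": int(values.count()),
            "mean": float(values.mean()),
            "median": float(values.median()),
            "mode": float(mode) if mode is not None else None,
            "std": float(values.std()),  # sample standard deviation (ddof=1)
        }

    return {
        "type": "categorical",
        "mode": mode,
        "frequencies": values.value_counts().to_dict(),
    }
```

Note that pandas uses the sample standard deviation by default. Whether PlugZero reports the sample or the population figure is not stated in this manual.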

B. Segment Aggregation (`calculate_aggregation`)

Executes a groupby operation on a categorical dimension. Supports sum, mean, max, and min. Results are automatically sorted descending for optimal visualization.
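As a rough illustration of the groupby step, the sketch below restricts the operation to the four supported functions and sorts the result in descending order. The function name aggregate_sketch and its parameter names are assumptions, not the signature of calculate_aggregation.

```python
import pandas as pd

SUPPORTED_AGGREGATIONS = {"sum", "mean", "max", "min"}


def aggregate_sketch(df, dimension, metric, how="sum"):
    """Aggregate metric per dimension value, largest first."""
    if how not in SUPPORTED_AGGREGATIONS:
        raise ValueError(f"Unsupported aggregation: {how}")
    return df.groupby(dimension)[metric].agg(how).sort_values(ascending=False)


sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sales": [10, 40, 5, 20],
})
by_region = aggregate_sketch(sales, "region", "sales", how="sum")
```

Here by_region is ordered South, East, North, so a bar chart built from it starts with the largest segment.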

C. Cross-Tabulation (`calculate_cross_tab`)

Computes the joint frequency table of two categorical variables, that is, how often each pair of values occurs together, using pd.crosstab. The output is shaped for heatmap rendering.
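A short example of pd.crosstab on two categorical columns. The heatmap_payload structure (rows, columns, values) is an assumed shape for a charting frontend, not the documented output of calculate_cross_tab.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "plan": ["Free", "Pro", "Free", "Free", "Pro"],
})

# Rows are regions, columns are plans, cells are co-occurrence counts.
table = pd.crosstab(df["region"], df["plan"])

# Hypothetical JSON-friendly layout for a heatmap component.
heatmap_payload = {
    "rows": table.index.tolist(),
    "columns": table.columns.tolist(),
    "values": table.values.tolist(),
}
```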

D. Hierarchical Drill-Downs (`calculate_drill_down`)

Recursively applies filters to a dataset so that users can “click through” from high-level categories to raw data points. Output is limited to 50 results for UI speed.
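The recursive filtering can be sketched as below, assuming each drill-down level is an equality filter expressed as a (column, value) pair. That filter format and the name drill_down_sketch are assumptions. Only the 50-result cap comes from this manual.

```python
import pandas as pd

MAX_RESULTS = 50  # cap stated in this manual


def drill_down_sketch(df, filters):
    """Apply (column, value) filters one level at a time, then cap the rows."""
    if not filters:
        # Deepest level reached: return the raw rows, capped for the UI.
        return df.head(MAX_RESULTS)
    (column, value), remaining = filters[0], filters[1:]
    return drill_down_sketch(df[df[column] == value], remaining)
```

A call such as drill_down_sketch(df, [("region", "North"), ("plan", "Pro")]) first narrows to the North region, then to Pro plans within it, and returns at most 50 matching rows.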

🔬 3. Machine Learning Engines (Advanced)

PlugZero uses scikit-learn models, plus TextBlob for sentiment scoring, for multi-dimensional analysis.

| Engine | Algorithm | Primary Responsibility |
| --- | --- | --- |
| Outlier Detection | `IsolationForest` | Detects multi-dimensional anomalies with a 5% default contamination. |
| Sentiment Analysis | `TextBlob` (NLP) | Assigns Polarity scores (-1.0 to 1.0) and classifies as Positive/Negative/Neutral. |
| Topic Clustering | `KMeans` | Uses `TfidfVectorizer` to find the top 3 semantic clusters in unstructured text. |
| Key Driver Analysis | `RandomForest` | Determines feature importance relative to a “Target Variable” (e.g., NPS or Sales). |
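The three scikit-learn engines in the table can be illustrated on synthetic data as follows. Everything except the algorithm names, the 5% contamination, and the 3-cluster count is an assumption. The data, feature names, and random seeds are invented. RandomForestRegressor is chosen here because the example target is numeric, and the manual does not say whether the regressor or the classifier variant is used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer

rng = np.random.default_rng(0)

# Outlier detection: fit_predict returns -1 for anomalies and 1 for normal rows.
normal_rows = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
rows = np.vstack([normal_rows, [[12.0, 12.0]]])  # last row is an obvious anomaly
outlier_flags = IsolationForest(contamination=0.05, random_state=0).fit_predict(rows)

# Topic clustering: TF-IDF vectors grouped into 3 clusters.
comments = [
    "delivery was late and the package arrived damaged",
    "late delivery again, package damaged",
    "pricing is too expensive for the basic plan",
    "the plan pricing feels expensive",
    "support team answered quickly and was helpful",
    "helpful support, quick answer",
]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(comments)
topic_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(tfidf)

# Key driver analysis: feature importances relative to a numeric target.
feature_names = ["price", "support", "speed"]  # invented names
features = rng.normal(size=(300, 3))
target = 5.0 * features[:, 0] + rng.normal(scale=0.1, size=300)  # driven by "price"
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(features, target)
importances = dict(zip(feature_names, forest.feature_importances_))
```

In this synthetic setup the target depends only on the first feature, so "price" receives the largest importance. The importances of a fitted forest always sum to 1, which makes them easy to present as percentage shares.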

🧠 4. Generative AI & Synthesis

Generative AI Integration (Gemini)

Model: gemini-2.0-flash.
Grounding: Computational results from the engines above are serialized into the LLM context window. This anchors AI “insights” to deterministic results and reduces the risk of hallucinated figures.
System Role: Configured as “PlugZero Intelligence Agent,” an expert researcher.
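The grounding step, serializing engine results into the model's context, might look like the sketch below. It deliberately stops short of calling Gemini, because the client code is not shown in this manual. The function build_grounded_prompt, the prompt wording, and the shape of engine_results are assumptions.

```python
import json

SYSTEM_ROLE = "PlugZero Intelligence Agent"  # role name from this manual


def build_grounded_prompt(question, engine_results):
    """Embed engine output as JSON so the model answers from computed values."""
    # allow_nan=False rejects NaN/Inf, which strict JSON does not permit.
    context = json.dumps(engine_results, indent=2, sort_keys=True, allow_nan=False)
    return (
        f"You are {SYSTEM_ROLE}, an expert researcher.\n"
        "Answer using only the computed results below.\n\n"
        f"RESULTS:\n{context}\n\n"
        f"QUESTION: {question}"
    )


prompt = build_grounded_prompt(
    "Which segment needs attention?",
    {"basic_stats": {"nps": {"mean": 7.4, "count": 250}}},
)
```

Serializing with allow_nan=False makes the sanitization from Step 2 a hard requirement: unsanitized NaN values raise a ValueError instead of silently reaching the model.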

Executive Synthesis

SWOT Analysis: Synthesizes a 4-quadrant matrix by analyzing numeric metrics and sentiment clusters.
Project Scorecard: A cross-module aggregator that computes the PlugZero Health Score:
Formula: (File Count * 5) + (Responses * 0.2) + ((Sentiment + 1) * 25) + (Rank Improvement * 0.5)
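The formula translates directly into code. The sketch below assumes that Sentiment is a polarity value in the range -1.0 to 1.0, as produced by the Sentiment Analysis engine, so the sentiment term contributes between 0 and 50 points. The function and parameter names are illustrative.

```python
def health_score(file_count, responses, sentiment, rank_improvement):
    """PlugZero Health Score, term by term as written in the formula."""
    return (
        (file_count * 5)
        + (responses * 0.2)
        + ((sentiment + 1) * 25)
        + (rank_improvement * 0.5)
    )


score = health_score(file_count=4, responses=250, sentiment=0.2, rank_improvement=10)
```

For these inputs the four terms are 20, 50, 30, and 5, giving a score of 105. The formula has no upper bound, so the score grows with file and response counts.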

📄 5. Automated Reporting

PowerPoint Engine

Located in reporting.py, this engine converts analysis results into editable .pptx slides using python-pptx.

TITLE: Automatic project branding.
CONTENT: Injects charts and tables derived from the Pandas pipeline.
STATS: Auto-injects data from the Scorecard aggregator.

Technical Constraint: All background calculations are managed via Celery. If a calculation takes more than 10 seconds, the engine returns a task_id so the frontend can poll for completion.