Analysis Engines & Computational Logic
This manual provides the definitive technical standard for the computational logic and algorithms behind PlugZero Analytics. It details how we process raw data into actionable intelligence.
🏗️ 1. Architectural Pipeline
PlugZero operates a high-performance Python analytics pipeline. When an analysis is requested, the system follows a strict execution sequence.
Step 1: Data Ingestion & Unification
The backend function get_data_for_project (in plugzero_api/analysis/processors.py) orchestrates the loading of data.
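The merge behaviour can be sketched with pandas as follows. This is a minimal sketch that assumes each project file has already been parsed into a DataFrame. The `PROJECT_FILES` registry and `load_project_frames` are hypothetical stand-ins, not part of `processors.py`. Row-wise stacking with `pd.concat` is also an assumed merge strategy; the real function may align files differently.

```python
from typing import Dict, Optional

import pandas as pd

# Hypothetical in-memory registry standing in for project file storage:
# {project_id: {file_id: DataFrame}}
PROJECT_FILES: Dict[str, Dict[str, pd.DataFrame]] = {}


def load_project_frames(project_id: str, file_id: Optional[str] = None) -> pd.DataFrame:
    """Return one file's data, or all of the project's files stacked into one DataFrame."""
    files = PROJECT_FILES.get(project_id, {})
    if file_id is not None:
        # Only the requested file is loaded.
        return files[file_id].copy()
    if not files:
        return pd.DataFrame()
    # Stack every file row-wise; a column missing from one file is filled with NaN.
    return pd.concat(list(files.values()), ignore_index=True, sort=False)
```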
```python
# Location: plugzero_api/analysis/processors.py
def get_data_for_project(project_id, file_id=None):
    # If file_id is provided, only that specific file is loaded.
    # Otherwise, all of the project's raw data files are merged into a single DataFrame.
    ...
```
Step 2: Sanitization & JSON Compliance
Before any math runs, the data is scrubbed. Fuzzy matching corrects typos in column headers. Numeric values pass through a safe_float filter, so NaN and Inf values, which standard JSON cannot represent, do not crash frontend rendering.
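Both sanitization steps can be illustrated with the standard library. This is a sketch under stated assumptions. The manual does not name its fuzzy-matching library, so `difflib.get_close_matches` is a stand-in. `match_header` and its 0.8 cutoff are hypothetical. Mapping non-finite values to `None` (serialized as JSON `null`) is also an assumption about what `safe_float` returns.

```python
import math
from difflib import get_close_matches
from typing import Iterable, Optional


def match_header(raw: str, expected: Iterable[str], cutoff: float = 0.8) -> Optional[str]:
    """Map a possibly misspelled header to the closest expected column name, or None."""
    lookup = {name.lower(): name for name in expected}
    matches = get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


def safe_float(value) -> Optional[float]:
    """Return a finite float, or None for unparseable, NaN, or infinite values."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    # NaN and +/-Inf are not valid JSON, so they are replaced with None (JSON null).
    return number if math.isfinite(number) else None
```

For example, a header typed as "Reveune" resolves to "Revenue", while `safe_float("inf")` yields `None`.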
📊 2. Statistical & Dimensional Engines
These deterministic engines rely on the Pandas library for high-speed calculation.
A. Basic Stats (calculate_basic_stats)
Analyzes a variable to determine its type and computes relevant metrics.
- Numeric: Mean, Median, Mode, Count, and Standard Deviation.
- Categorical: Frequency Count and Mode.
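The type-dependent branching can be sketched in pandas as follows. `basic_stats` is a hypothetical function, not the actual `calculate_basic_stats`. The output keys are assumed, and missing values are dropped before any metric is computed.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def basic_stats(series: pd.Series) -> dict:
    """Detect the variable type and compute the metrics listed for that type."""
    clean = series.dropna()
    modes = clean.mode()
    top = modes.iloc[0] if not modes.empty else None
    if is_numeric_dtype(clean):
        return {
            "type": "numeric",
            "count": int(clean.count()),
            "mean": float(clean.mean()),
            "median": float(clean.median()),
            "mode": top,
            "std": float(clean.std()),  # sample standard deviation (ddof=1)
        }
    return {
        "type": "categorical",
        "count": int(clean.count()),
        "frequencies": clean.value_counts().to_dict(),
        "mode": top,
    }
```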
B. Segment Aggregation (calculate_aggregation)
Executes a groupby operation on a categorical dimension. Supports sum, mean, max, and min. Results are sorted in descending order so the largest segments appear first in charts.
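A minimal pandas sketch of this groupby-then-sort pattern is shown below. `aggregate` and its parameter names are hypothetical, not the signature of `calculate_aggregation`.

```python
import pandas as pd

ALLOWED_AGGS = {"sum", "mean", "max", "min"}


def aggregate(df: pd.DataFrame, dimension: str, measure: str, agg: str = "sum") -> pd.DataFrame:
    """Group `measure` by the categorical `dimension` and sort the result descending."""
    if agg not in ALLOWED_AGGS:
        raise ValueError(f"unsupported aggregation: {agg}")
    grouped = df.groupby(dimension)[measure].agg(agg)
    return grouped.sort_values(ascending=False).reset_index()
```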
C. Cross-Tabulation (calculate_cross_tab)
Computes the joint frequency table of two categorical variables with pd.crosstab, counting how often each pair of values occurs together. The resulting matrix maps directly onto a heatmap.
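The sketch below shows how a `pd.crosstab` result can be reshaped for a heatmap. The `{"x", "y", "z"}` payload shape and the `cross_tab` name are assumptions for illustration; the manual does not specify the frontend format.

```python
import pandas as pd


def cross_tab(df: pd.DataFrame, row: str, col: str) -> dict:
    """Count co-occurrences of two categorical columns and return a heatmap-style payload."""
    table = pd.crosstab(df[row], df[col])
    return {
        "x": table.columns.tolist(),  # column categories (heatmap x axis)
        "y": table.index.tolist(),    # row categories (heatmap y axis)
        "z": table.values.tolist(),   # counts, one inner list per row category
    }
```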
D. Hierarchical Drill-Downs (calculate_drill_down)
Recursively applies filters to a dataset so users can “click through” from high-level categories down to raw data points. Results are capped at 50 rows to keep the UI responsive.
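The recursive filtering can be sketched as follows. This assumes each drill step is an equality filter `(column, value)`. `drill_down` is a hypothetical name, and only the 50-row cap comes from the text above.

```python
from typing import Sequence, Tuple

import pandas as pd

MAX_ROWS = 50  # UI cap stated in the manual


def drill_down(df: pd.DataFrame, path: Sequence[Tuple[str, object]], limit: int = MAX_ROWS) -> pd.DataFrame:
    """Apply one (column, value) filter per level, then return at most `limit` rows."""
    if not path:
        return df.head(limit)
    column, value = path[0]
    # Narrow to the clicked category and recurse into the next level.
    return drill_down(df[df[column] == value], path[1:], limit)
```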
🔬 3. Machine Learning Engines (Advanced)
PlugZero implements scikit-learn models for multi-dimensional intelligence.
| Engine | Algorithm | Primary Responsibility |
|---|---|---|
| Outlier Detection | IsolationForest | Detects multi-dimensional anomalies; contamination (the expected share of outliers) defaults to 5%. |
| Sentiment Analysis | TextBlob (NLP) | Assigns Polarity scores (-1.0 to 1.0) and classifies as Positive/Negative/Neutral. |
| Topic Clustering | KMeans | Vectorizes unstructured text with TfidfVectorizer and groups it into 3 topic clusters. |
| Key Driver Analysis | RandomForest | Determines feature importance relative to a “Target Variable” (e.g., NPS or Sales). |
🧠 4. Generative AI & Synthesis
Generative AI Integration (Gemini)
- Model: gemini-2.0-flash.
- Grounding: Computational results from the engines above are serialized into the LLM context window. This ensures AI “insights” are grounded in deterministic data, not hallucinations.
- System Role: Configured as “PlugZero Intelligence Agent,” an expert researcher.
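The grounding step can be illustrated by serializing engine output into the prompt text. This is a minimal sketch: `build_grounded_prompt` and its prompt wording are hypothetical, and no Gemini client call is shown. Passing `allow_nan=False` makes serialization fail loudly if a NaN slipped past sanitization, instead of emitting invalid JSON.

```python
import json


def build_grounded_prompt(question: str, engine_results: dict) -> str:
    """Embed deterministic engine results as JSON so the model answers from them."""
    context = json.dumps(engine_results, indent=2, sort_keys=True, allow_nan=False)
    return (
        "Answer using only the computed results below. "
        "If they do not support an answer, say so.\n\n"
        f"COMPUTED RESULTS (JSON):\n{context}\n\n"
        f"QUESTION: {question}"
    )
```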
Executive Synthesis
- SWOT Analysis: Synthesizes a 4-quadrant matrix by analyzing numeric metrics and sentiment clusters.
- Project Scorecard: A cross-module aggregator that computes the PlugZero Health Score:
Formula: (File Count * 5) + (Responses * 0.2) + ((Sentiment + 1) * 25) + (Rank Improvement * 0.5)
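The Health Score formula can be written out as a function with a worked example. This is a sketch: `health_score` is a hypothetical name, and the range check assumes Sentiment is a TextBlob-style polarity in [-1.0, 1.0]. Under that assumption, the `(sentiment + 1) * 25` term contributes between 0 and 50 points.

```python
def health_score(file_count: int, responses: int, sentiment: float, rank_improvement: float) -> float:
    """PlugZero Health Score, following the formula above term by term."""
    # Assumption: sentiment is a polarity in [-1.0, 1.0], so this term spans 0..50.
    if not -1.0 <= sentiment <= 1.0:
        raise ValueError("sentiment must be a polarity in [-1.0, 1.0]")
    return (file_count * 5) + (responses * 0.2) + ((sentiment + 1) * 25) + (rank_improvement * 0.5)


# 4 files, 250 responses, sentiment 0.2, rank improvement 10:
# 20 + 50 + 30 + 5 = 105
print(health_score(4, 250, 0.2, 10))  # 105.0
```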
📄 5. Automated Reporting
PowerPoint Engine
Located in reporting.py, this engine uses python-pptx to translate analysis results into editable .pptx slides.
- TITLE: Automatic project branding.
- CONTENT: Injects charts and tables derived from the Pandas pipeline.
- STATS: Auto-injects data from the Scorecard aggregator.
Technical Constraint: All background calculations are managed via Celery. If a calculation takes more than 10 seconds, the engine returns a task_id so the frontend can poll for completion.
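The wait-then-hand-off pattern can be sketched without a message broker. This uses `concurrent.futures` as a stand-in for Celery so it runs on its own. In Celery, the corresponding calls would be `task.delay(...)`, `AsyncResult.get(timeout=10)`, which raises `celery.exceptions.TimeoutError`, and `AsyncResult.ready()`, with `AsyncResult.id` as the task_id. `run_analysis`, `poll`, and the in-memory `_pending` registry are hypothetical.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Dict

SYNC_TIMEOUT_SECONDS = 10  # threshold stated in the manual

_executor = ThreadPoolExecutor(max_workers=4)
_pending: Dict[str, Future] = {}  # task_id -> running job (stand-in for the Celery result backend)


def run_analysis(fn, *args, timeout: float = SYNC_TIMEOUT_SECONDS) -> dict:
    """Return the result if it finishes within `timeout`, else a task_id to poll."""
    future = _executor.submit(fn, *args)
    try:
        return {"status": "done", "result": future.result(timeout=timeout)}
    except FutureTimeout:
        task_id = str(uuid.uuid4())
        _pending[task_id] = future
        return {"status": "pending", "task_id": task_id}


def poll(task_id: str) -> dict:
    """Frontend polling endpoint: report completion and release finished jobs."""
    future = _pending[task_id]
    if not future.done():
        return {"status": "pending", "task_id": task_id}
    del _pending[task_id]
    return {"status": "done", "result": future.result()}
```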