
Analysis Engines & Computational Logic

This manual provides the definitive technical standard for the computational logic and algorithms behind PlugZero Analytics. It details how we process raw data into actionable intelligence.


🏗️ 1. Architectural Pipeline

PlugZero operates a high-performance Python analytics pipeline. When an analysis is requested, the system follows a strict execution sequence.

Step 1: Data Ingestion & Unification

The backend function get_data_for_project (in plugzero_api/analysis/processors.py) orchestrates the loading of data.

    # Location: plugzero_api/analysis/processors.py
    def get_data_for_project(project_id, file_id=None):
        # If file_id is provided, only that specific file is loaded.
        # Otherwise, all of the project's raw data files are merged
        # into a single DataFrame.
        ...
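The branching described above can be illustrated with a minimal sketch. It assumes the project's files are already available as in-memory DataFrames and that "merging" means stacking rows; the names PROJECT_FILES and get_data_sketch are hypothetical and are not part of the PlugZero codebase.

```python
import pandas as pd

# Hypothetical in-memory stand-in for a project's stored raw data files.
PROJECT_FILES = {
    "survey_q1": pd.DataFrame({"region": ["North", "South"], "score": [8, 6]}),
    "survey_q2": pd.DataFrame({"region": ["East"], "score": [9]}),
}


def get_data_sketch(files, file_id=None):
    """Return one file's rows, or every file's rows stacked into one DataFrame."""
    if file_id is not None:
        return files[file_id].copy()
    # Assumption: merging means row-wise concatenation.
    # ignore_index gives the combined frame a clean 0..n-1 index.
    return pd.concat(list(files.values()), ignore_index=True)


merged = get_data_sketch(PROJECT_FILES)
single = get_data_sketch(PROJECT_FILES, file_id="survey_q2")
```

Row-wise concatenation suits files that share a schema; files with different columns would produce NaN-filled cells, which is one reason a sanitization step follows ingestion.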

Step 2: Sanitization & JSON Compliance

Before any calculation runs, column headers are normalized with fuzzy matching to absorb typos. Numeric values then pass through a safe_float filter, because NaN and Inf are not valid JSON and would otherwise crash frontend rendering.
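A minimal sketch of both sanitization steps, using only the standard library. The signature of safe_float, the match_header helper, and the 0.8 similarity cutoff are assumptions for illustration; the actual PlugZero implementations may differ.

```python
import difflib
import json
import math


def safe_float(value, default=None):
    """Return a finite float, or `default` for NaN, Inf, or unparsable input."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default
    return number if math.isfinite(number) else default


def match_header(raw_header, expected_headers, cutoff=0.8):
    """Map a possibly misspelled header onto the closest expected header."""
    candidates = difflib.get_close_matches(
        raw_header.strip().lower(), expected_headers, n=1, cutoff=cutoff
    )
    # Fall back to the original header when nothing is similar enough.
    return candidates[0] if candidates else raw_header


row = {
    "score": safe_float("7.5"),
    "ratio": safe_float(float("nan")),
    "growth": safe_float(float("inf")),
}
# allow_nan=False makes json.dumps raise if a NaN or Inf slipped through.
payload = json.dumps(row, allow_nan=False)
header = match_header("Reveneu", ["revenue", "region", "score"])
```

Replacing non-finite values with None serializes them as JSON null, which a frontend can render as a missing value.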


📊 2. Statistical & Dimensional Engines

These deterministic engines rely on the Pandas library for high-speed calculation.

A. Basic Stats (calculate_basic_stats)

Analyzes a variable to determine its type and computes relevant metrics.

  • Numeric: Mean, Median, Mode, Count, and Standard Deviation.
  • Categorical: Frequency Count and Mode.
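The type-dependent behavior can be sketched as follows. The function name basic_stats_sketch and the shape of the returned dictionary are assumptions, not the real return format of calculate_basic_stats.

```python
import pandas as pd


def basic_stats_sketch(series):
    """Summarize a column: numeric metrics for numbers, frequencies otherwise."""
    clean = series.dropna()
    modes = clean.mode()
    mode = modes.iloc[0] if not modes.empty else None
    if pd.api.types.is_numeric_dtype(clean):
        return {
            "type": "numeric",
            "count": int(clean.count()),
            "mean": float(clean.mean()),
            "median": float(clean.median()),
            "mode": float(mode) if mode is not None else None,
            "std": float(clean.std()),  # pandas default: sample std (ddof=1)
        }
    return {
        "type": "categorical",
        "frequencies": clean.value_counts().to_dict(),
        "mode": mode,
    }


numeric = basic_stats_sketch(pd.Series([2, 4, 4, 6]))
categorical = basic_stats_sketch(pd.Series(["a", "b", "a"]))
```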

B. Segment Aggregation (calculate_aggregation)

Executes a groupby operation on a categorical dimension. Supports sum, mean, max, and min. Results are automatically sorted in descending order so the largest segments appear first in visualizations.
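A minimal sketch of the grouping and sorting, assuming a single numeric metric column. The function name, parameter names, and sample data are hypothetical.

```python
import pandas as pd

ALLOWED_AGGREGATIONS = {"sum", "mean", "max", "min"}


def aggregation_sketch(df, dimension, metric, agg="sum"):
    """Group `metric` by `dimension`, aggregate, and sort largest first."""
    if agg not in ALLOWED_AGGREGATIONS:
        raise ValueError(f"Unsupported aggregation: {agg}")
    grouped = df.groupby(dimension)[metric].agg(agg)
    return grouped.sort_values(ascending=False)


sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East"],
        "revenue": [100, 250, 50, 75],
    }
)
totals = aggregation_sketch(sales, "region", "revenue", "sum")
```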

C. Cross-Tabulation (calculate_cross_tab)

Computes the joint frequency table (co-occurrence counts) of two categorical variables using pd.crosstab. The output is shaped for heatmap rendering.
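For illustration, the sketch below builds a frequency table with pd.crosstab and flattens it into one record per cell. The x/y/value cell format is an assumption about what a heatmap component might consume, not the documented PlugZero payload.

```python
import pandas as pd

responses = pd.DataFrame(
    {
        "plan": ["free", "pro", "pro", "free", "pro"],
        "churned": ["yes", "no", "no", "no", "yes"],
    }
)

# Rows are plan values, columns are churned values, cells are counts.
table = pd.crosstab(responses["plan"], responses["churned"])

# Hypothetical flattening: one dict per heatmap cell.
heatmap_cells = [
    {"x": col, "y": row, "value": int(table.loc[row, col])}
    for row in table.index
    for col in table.columns
]
```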

D. Hierarchical Drill-Downs (calculate_drill_down)

Recursively applies filters to a dataset so users can “click through” from high-level categories to raw data points. Output is limited to 50 results to keep the UI responsive.
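One way to picture the recursion is a function that consumes one filter per call and caps the final output. The (column, value) filter format and the function name are assumptions; only the 50-result cap comes from the description above.

```python
import pandas as pd


def drill_down_sketch(df, filters, limit=50):
    """Apply (column, value) filters one level at a time, then cap the rows."""
    if not filters:
        return df.head(limit)
    (column, value), remaining = filters[0], filters[1:]
    return drill_down_sketch(df[df[column] == value], remaining, limit)


orders = pd.DataFrame(
    {
        "region": ["North"] * 60 + ["South"] * 40,
        "product": ["A", "B"] * 50,
        "amount": list(range(100)),
    }
)

level_1 = drill_down_sketch(orders, [("region", "North")])
level_2 = drill_down_sketch(orders, [("region", "North"), ("product", "A")])
```

Here the first level matches 60 rows and is truncated to 50, while the second level narrows the result to 30 rows and is returned in full.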


🔬 3. Machine Learning Engines (Advanced)

PlugZero uses Scikit-Learn models, plus TextBlob for sentiment analysis, for multi-dimensional intelligence.

Engine              | Algorithm       | Primary Responsibility
--------------------|-----------------|-----------------------
Outlier Detection   | IsolationForest | Detects multi-dimensional anomalies with a default contamination of 5%.
Sentiment Analysis  | TextBlob (NLP)  | Assigns polarity scores (-1.0 to 1.0) and classifies text as Positive/Negative/Neutral.
Topic Clustering    | KMeans          | Uses TfidfVectorizer to find the top 3 semantic clusters in unstructured text.
Key Driver Analysis | RandomForest    | Determines feature importance relative to a “Target Variable” (e.g., NPS or Sales).
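As an illustration of the Outlier Detection engine only, the sketch below fits scikit-learn's IsolationForest with contamination=0.05 on synthetic data. The data, the random seed, and the variable names are assumptions; how PlugZero prepares features before fitting is not documented here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: 99 ordinary points plus one obvious anomaly in the last row.
rng = np.random.RandomState(0)
normal_points = rng.normal(loc=0.0, scale=1.0, size=(99, 2))
data = np.vstack([normal_points, [[25.0, 25.0]]])

# contamination=0.05 tells the model to expect roughly 5% outliers.
model = IsolationForest(contamination=0.05, random_state=0)
labels = model.fit_predict(data)  # -1 marks an outlier, 1 an inlier

outlier_rows = np.where(labels == -1)[0]
```

Because contamination fixes the expected share of outliers, about 5% of rows are flagged even in clean data, so flagged rows are best treated as candidates for review.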

🧠 4. Generative AI & Synthesis

Generative AI Integration (Gemini)

  • Model: gemini-2.0-flash.
  • Grounding: Computational results from the engines above are serialized into the LLM context window. This ensures AI “insights” are grounded in deterministic data, not hallucinations.
  • System Role: Configured as “PlugZero Intelligence Agent,” an expert researcher.

Executive Synthesis

  • SWOT Analysis: Synthesizes a 4-quadrant matrix by analyzing numeric metrics and sentiment clusters.
  • Project Scorecard: A cross-module aggregator that computes the PlugZero Health Score:
    Formula: (File Count * 5) + (Responses * 0.2) + ((Sentiment + 1) * 25) + (Rank Improvement * 0.5)
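The formula can be transcribed directly into a function, as in the sketch below. The parameter names and sample inputs are assumptions. If Sentiment is the polarity score in the range -1.0 to 1.0, the sentiment term contributes between 0 and 50 points.

```python
def health_score(file_count, responses, sentiment, rank_improvement):
    """Direct transcription of the PlugZero Health Score formula."""
    return (
        (file_count * 5)
        + (responses * 0.2)
        + ((sentiment + 1) * 25)
        + (rank_improvement * 0.5)
    )


# Worked example with made-up inputs: 15 + 20 + 37.5 + 5 = 77.5
score = health_score(
    file_count=3, responses=100, sentiment=0.5, rank_improvement=10
)
```

The formula as written is unbounded: file count and response volume can push the score past 100 unless a cap is applied elsewhere.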

📄 5. Automated Reporting

PowerPoint Engine

Located in reporting.py, this engine translates Analysis Results into interactive .pptx slides using python-pptx.

  • TITLE: Automatic project branding.
  • CONTENT: Injects charts and tables derived from the Pandas pipeline.
  • STATS: Auto-injects data from the Scorecard aggregator.

Technical Constraint: All background calculations are managed via Celery. If a calculation takes more than 10 seconds, the engine returns a task_id so the frontend can poll for completion.
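The return-or-defer behavior can be sketched without a Celery broker. The example below swaps in the standard library's concurrent.futures as a stand-in so it runs anywhere; run_or_defer, poll, and the response dictionaries are hypothetical. In a real Celery setup the rough equivalents would be task.delay(...) to enqueue work and AsyncResult(task_id).ready() to poll.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=2)
_pending = {}  # task_id -> Future, a stand-in for a Celery result backend


def run_or_defer(func, threshold_seconds=10.0):
    """Return the result if it arrives in time, otherwise a task_id to poll."""
    future = _executor.submit(func)
    try:
        result = future.result(timeout=threshold_seconds)
        return {"status": "done", "result": result}
    except FutureTimeout:
        task_id = str(uuid.uuid4())
        _pending[task_id] = future
        return {"status": "pending", "task_id": task_id}


def poll(task_id):
    """Report whether a deferred task has finished, returning its result if so."""
    future = _pending[task_id]
    if not future.done():
        return {"status": "pending", "task_id": task_id}
    return {"status": "done", "result": _pending.pop(task_id).result()}


def slow_analysis():
    time.sleep(0.3)  # simulate a long-running calculation
    return 42


# A tiny threshold keeps the demo fast; the manual states 10 seconds.
deferred = run_or_defer(slow_analysis, threshold_seconds=0.05)
```

The frontend would call the polling endpoint with the returned task_id until the status changes to done.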

