Database Architecture & Data Strategy

This manual defines the data structure, relationships, and management protocols for the PlugZero Intelligence platform.

🏗️ 1. Architectural Philosophy: The Hub-and-Spoke Model

PlugZero uses a Hub-and-Spoke database architecture. The central hub is the Project model. Every uploaded file, survey response, scraped webpage, or AI analysis result MUST be associated with a Project.

The Core Entity Relationship (ER) Logic:

User (Accounts): Extended from AbstractUser. Owns or is a member of a Team.
Team (Accounts): Grouping mechanism for Projects. Projects can be team-wide.
Project (Data Ingestion): The container for all intelligence.
RawDataFile / Survey / ScrapeTarget: These are “Ingress Spokes” that feed raw data into the Project.
AnalysisResult / Report / ResearchInsight: These are “Egress Spokes” that store processed intelligence.
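
The rule that every spoke must belong to a Project can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than the platform's Django models, and the table and column names (`project`, `raw_data_file`, `analysis_result`, `project_id`) are simplified assumptions, not the real schema. The point is only that a non-nullable foreign key to the hub is one way to make an orphaned spoke impossible.

```python
import sqlite3

# Hypothetical, simplified tables; the real schema lives in the Django models.
SCHEMA = """
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE raw_data_file (           -- an ingress spoke
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    path       TEXT NOT NULL
);
CREATE TABLE analysis_result (         -- an egress spoke
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    payload    TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def insert_is_rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

With this schema, a spoke row with a NULL `project_id` or with a `project_id` that matches no Project is rejected by the database itself, so the hub-and-spoke rule does not depend on application code alone.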

📂 2. Key Data Domains

A. The Accounts Engine (`accounts/models.py`)

PlugZero implements a strictly controlled Role-Based Access Control (RBAC) system.

User: Custom model using email as the unique identifier.
TeamMembership: A “Through” model managing roles: OWNER, ADMIN, MEMBER, VIEWER.
ActivityLog: An append-only audit trail logging every action for compliance.
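
As a rough illustration of how the four roles could be compared, the sketch below models them as an ordered enum. The ordering (OWNER above ADMIN above MEMBER above VIEWER) and the helper name `has_at_least` are assumptions made for this example; the actual permission rules are defined in `accounts/models.py` and may differ.

```python
from enum import IntEnum


class Role(IntEnum):
    # Assumed ordering: a higher value means more privilege.
    VIEWER = 1
    MEMBER = 2
    ADMIN = 3
    OWNER = 4


def has_at_least(role: Role, required: Role) -> bool:
    """Return True if `role` is at least as privileged as `required`."""
    return role >= required
```

A view that should be limited to administrators could then call `has_at_least(membership_role, Role.ADMIN)` before proceeding.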

B. The Ingestion Engine (`data_ingestion/models.py`)

RawDataFile: Stores file metadata and a JSON columns_metadata cache.
ScraperJob: Logs individual scraping runs. Linked to ScrapedPage for raw text storage.
Survey: Handles complex logic, quotas (SurveyQuota), and responses. Uses UUIDs for public URLs.
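
To make the idea of a columns_metadata cache more concrete, here is a minimal sketch that derives JSON-serializable column information from CSV text. The function name `build_columns_metadata`, the output keys, and the crude type inference are all hypothetical; the real cache format is not specified in this manual.

```python
import csv
import io
import json


def build_columns_metadata(csv_text: str, sample_rows: int = 100) -> dict:
    """Summarize each column from the first `sample_rows` data rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    stats = {name: {"non_empty": 0, "numeric": True} for name in names}
    rows_sampled = 0
    for row in reader:
        if rows_sampled >= sample_rows:
            break
        rows_sampled += 1
        for name in names:
            value = row.get(name)
            if value is None or value.strip() == "":
                continue  # missing or blank cell
            stats[name]["non_empty"] += 1
            try:
                float(value)
            except ValueError:
                stats[name]["numeric"] = False

    columns = []
    for name in names:
        if stats[name]["non_empty"] == 0:
            inferred = "empty"
        elif stats[name]["numeric"]:
            inferred = "number"
        else:
            inferred = "text"
        columns.append({
            "name": name,
            "inferred_type": inferred,
            "non_empty": stats[name]["non_empty"],
        })
    return {"rows_sampled": rows_sampled, "columns": columns}


# The result contains only plain types, so it can be stored as JSON.
example_json = json.dumps(build_columns_metadata("age,city\n31,Lagos\n,Accra\n"))
```

Caching a summary like this alongside the file record avoids re-reading the raw file every time the column list is needed.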

C. The Intelligence Engine (`analysis/models.py`)

AnalysisResult: Uses JSONField to store Pandas/Scikit-Learn outputs for fast rendering.
ResearchInsight: Stores atomic findings. Includes an embedding vector field for semantic search.
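
The manual does not say how the embedding field is queried. One common approach to semantic search is to rank stored vectors by cosine similarity to a query vector, sketched below in plain Python. The function names and the `(text, vector)` tuple layout are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_insights(
    query_vec: Sequence[float],
    insights: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[str]:
    """Return the texts of the `top_k` insights most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in insights]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

A linear scan like this is only practical for small collections; larger deployments usually rely on a vector index in the database or a dedicated service.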

💾 3. Storage & Integrity Strategy

Data Type Standards

UUIDs: Used for all public-facing identifiers (Surveys, Reports).
JSONField: Used for variable schemas to maintain flexibility without frequent migrations.
DateTime: All records carry a creation timestamp set with auto_now_add, which is written once when the row is first saved and supports auditing.
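
The following sketch shows why UUIDs suit public-facing identifiers: they are hard to guess and reveal nothing about row counts, unlike sequential integer keys. It uses Python's standard uuid module; the helper names `new_public_id` and `is_valid_public_id` are hypothetical and not part of the codebase.

```python
import uuid


def new_public_id() -> str:
    """Generate a random (version 4) UUID in canonical string form."""
    return str(uuid.uuid4())


def is_valid_public_id(value: str) -> bool:
    """Accept only canonical, hyphenated UUID strings."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False
```

Validating the identifier before querying lets a public endpoint reject malformed URLs early instead of passing arbitrary strings to the database layer.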

File vs. Database Storage

Database: Stores metadata, settings, and small text results.
Filesystem (media/ folder): Stores the actual raw CSVs, Excel files, and PDF uploads.

CRITICAL: The database stores only the path to the file. Deleting a database record does not automatically delete the file on disk, so the application logic must delete the file explicitly.
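
A minimal sketch of explicit cleanup is shown below. It stands in for the database with a plain dictionary mapping record IDs to relative file paths, and the names `delete_record_and_file` and `run_demo` are hypothetical. The guard against paths outside the media root is an added safety assumption, not something this manual requires.

```python
import tempfile
from pathlib import Path


def delete_record_and_file(records: dict, record_id: int, media_root) -> bool:
    """Delete a record and its file. Returns True if a file was removed."""
    media_root = Path(media_root).resolve()
    target = (media_root / records[record_id]).resolve()
    # Refuse to delete anything that resolves outside the media folder.
    if media_root not in target.parents:
        raise ValueError(f"{target} is outside {media_root}")
    del records[record_id]
    try:
        target.unlink()
    except FileNotFoundError:
        return False  # record existed but the file was already gone
    return True


def run_demo():
    """Create a temporary media folder, delete one upload, report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        media_root = Path(tmp)
        upload = media_root / "upload.csv"
        upload.write_text("a,b\n1,2\n")
        records = {1: "upload.csv"}
        removed = delete_record_and_file(records, 1, media_root)
        return removed, upload.exists(), 1 in records
```

In a Django codebase this kind of cleanup is commonly attached to a post_delete signal handler or an overridden delete() method; which of these PlugZero uses is not stated here.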

🔧 4. Maintenance & Migrations

Standard Operating Procedure (SOP)

When making ANY database change, run these steps in order:

Verify the environment: `python manage.py check`
Generate migrations: `python manage.py makemigrations`
Apply migrations: `python manage.py migrate`
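
The three steps above can be wrapped in a small script that stops at the first failure, so that migrate never runs after a failed check. This is a sketch, not an existing project tool: the function names are hypothetical, and it assumes manage.py is in the current working directory.

```python
import subprocess
import sys
from typing import Callable, Sequence

# The SOP steps, in the order they must run.
SOP_STEPS = (("check",), ("makemigrations",), ("migrate",))


def run_manage(args: Sequence[str]) -> int:
    """Run `python manage.py <args>` and return its exit code."""
    return subprocess.run([sys.executable, "manage.py", *args]).returncode


def run_migration_sop(runner: Callable[[Sequence[str]], int] = run_manage) -> list:
    """Run each SOP step in order, raising on the first non-zero exit code."""
    completed = []
    for step in SOP_STEPS:
        if runner(step) != 0:
            raise RuntimeError(
                f"'manage.py {' '.join(step)}' failed; later steps were skipped"
            )
        completed.append(step[0])
    return completed
```

The `runner` parameter exists so the sequence can be exercised in tests with a stand-in function instead of invoking Django.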

Data Retention

Background Celery tasks automatically purge ActivityLog entries and temporary caches that are older than the data_retention_days setting.
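
The cutoff logic behind such a purge can be sketched without Celery or Django. The example below assumes each entry has a timezone-aware `created_at` timestamp and that "old" means created more than data_retention_days before the current time; the function name `purge_expired` and the dictionary layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(entries, data_retention_days, now=None):
    """Return (kept_entries, purged_count) for the given retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=data_retention_days)
    # Keep entries created at or after the cutoff; drop everything older.
    kept = [entry for entry in entries if entry["created_at"] >= cutoff]
    return kept, len(entries) - len(kept)
```

In the platform itself the equivalent filter would run as a database delete inside a scheduled task rather than over an in-memory list.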