
Database Architecture & Data Strategy

This manual defines the data structure, relationships, and management protocols for the PlugZero Intelligence platform.


🏗️ 1. Architectural Philosophy: The Hub-and-Spoke Model

PlugZero uses a Hub-and-Spoke database architecture. The central hub is the Project model. Every uploaded file, survey response, scraped webpage, or AI analysis result MUST be associated with a Project.

The Core Entity Relationship (ER) Logic:

  • User (Accounts): Extends Django's AbstractUser. Owns or is a member of a Team.
  • Team (Accounts): The grouping mechanism for Projects. A Project can belong to a whole Team rather than to a single User.
  • Project (Data Ingestion): The container for all intelligence.
  • RawDataFile / Survey / ScrapeTarget: These are “Ingress Spokes” that feed raw data into the Project.
  • AnalysisResult / Report / ResearchInsight: These are “Egress Spokes” that store processed intelligence.
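The hub-and-spoke rule can be illustrated with a minimal, framework-free Python sketch. It is not the platform's actual Django models: the classes, the `team` attribute, and the `spokes_by_direction` helper are hypothetical stand-ins. In Django, the same rule would typically be enforced by a non-nullable `ForeignKey` to `Project` on each spoke model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for the Django models; not the real PlugZero schema.
INGRESS = {"RawDataFile", "Survey", "ScrapeTarget"}
EGRESS = {"AnalysisResult", "Report", "ResearchInsight"}


@dataclass
class Project:
    """The hub. Every spoke record points at exactly one Project."""
    name: str
    team: Optional[str] = None  # hypothetical: set when the Project is team-wide
    # Excluded from repr/eq to avoid Project -> Spoke -> Project cycles.
    spokes: List["Spoke"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Spoke:
    """A record that feeds data into (ingress) or out of (egress) a Project."""
    kind: str
    project: Project = field(repr=False)

    def __post_init__(self) -> None:
        # Enforce the rule: a spoke cannot exist without a Project.
        if not isinstance(self.project, Project):
            raise ValueError(f"{self.kind} must be associated with a Project")
        if self.kind not in INGRESS | EGRESS:
            raise ValueError(f"unknown spoke type: {self.kind}")
        self.project.spokes.append(self)

    @property
    def direction(self) -> str:
        return "ingress" if self.kind in INGRESS else "egress"


def spokes_by_direction(project: Project, direction: str) -> List[str]:
    """List the spoke types attached to a Project, filtered by direction."""
    return [s.kind for s in project.spokes if s.direction == direction]
```

Creating `Spoke("ScrapeTarget", None)` fails immediately, which mirrors the requirement that nothing is stored outside a Project.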

📂 2. Key Data Domains

A. The Accounts Engine (accounts/models.py)

PlugZero implements a strictly controlled Role-Based Access Control (RBAC) system.

  • User: Custom model using email as the unique identifier.
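  • TeamMembership: The “through” model linking User and Team; it stores each member's role: OWNER, ADMIN, MEMBER, VIEWER.
  • ActivityLog: An append-only audit trail that logs every user action for compliance. Entries are never edited in place; old entries are purged by the retention task (see Data Retention).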
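A role check paired with an append-only log can be sketched as follows. This is an illustration under assumptions, not the platform's permission code: the privilege ordering (VIEWER < MEMBER < ADMIN < OWNER), the action names in `MIN_ROLE`, and the `can`, `AuditLog`, and `authorize` names are all hypothetical.

```python
from enum import IntEnum
from typing import List, Tuple


class Role(IntEnum):
    """TeamMembership roles; this privilege ordering is an assumption."""
    VIEWER = 1
    MEMBER = 2
    ADMIN = 3
    OWNER = 4


# Hypothetical actions mapped to the minimum role allowed to perform them.
MIN_ROLE = {
    "view_project": Role.VIEWER,
    "upload_file": Role.MEMBER,
    "manage_members": Role.ADMIN,
    "delete_team": Role.OWNER,
}


def can(role: Role, action: str) -> bool:
    """True if `role` meets the action's minimum role. Unknown actions are denied."""
    required = MIN_ROLE.get(action)
    return required is not None and role >= required


class AuditLog:
    """Append-only log: entries can be added and read, never edited in place."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str, bool]] = []

    def append(self, user: str, action: str, allowed: bool) -> None:
        self._entries.append((user, action, allowed))

    @property
    def entries(self) -> Tuple[Tuple[str, str, bool], ...]:
        # Return an immutable copy so callers cannot alter history.
        return tuple(self._entries)


def authorize(log: AuditLog, user: str, role: Role, action: str) -> bool:
    """Check permission and record the decision, whether allowed or denied."""
    allowed = can(role, action)
    log.append(user, action, allowed)
    return allowed
```

Denying unknown actions by default keeps the check strict: a new action has no effect until it is given an explicit minimum role.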

B. The Ingestion Engine (data_ingestion/models.py)

  • RawDataFile: Stores file metadata and a JSON columns_metadata cache.
  • ScraperJob: Logs individual scraping runs. Linked to ScrapedPage for raw text storage.
  • Survey: Handles complex logic, quotas (SurveyQuota), and responses. Uses UUIDs for public URLs.
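The columns_metadata cache on RawDataFile could be built along these lines. The document does not specify the cache's structure, so the keys used here (`row_count`, `columns`, `type`, `empty_count`) and the three-way type inference are assumptions for illustration only.

```python
import csv
import io
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Narrowest of 'integer', 'float', 'string' that fits every non-empty value."""
    if not values:
        return "empty"
    kind = "integer"
    for v in values:
        if kind == "integer":
            try:
                int(v)
                continue
            except ValueError:
                kind = "float"  # fall through and try float
        if kind == "float":
            try:
                float(v)
                continue
            except ValueError:
                return "string"
    return kind


def build_columns_metadata(csv_text: str) -> Dict[str, object]:
    """Summarize a CSV into a JSON-serializable dict (row count, per-column type and empties)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = {}
    for name in reader.fieldnames or []:
        cells = [row.get(name) for row in rows]
        filled = [c for c in cells if c not in (None, "")]
        columns[name] = {"type": infer_type(filled), "empty_count": len(cells) - len(filled)}
    return {"row_count": len(rows), "columns": columns}
```

Caching a summary like this means column pickers and previews can be rendered without re-reading the raw file from disk.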

C. The Intelligence Engine (analysis/models.py)

  • AnalysisResult: Stores pandas/scikit-learn outputs in a JSONField so they can be rendered quickly.
  • ResearchInsight: Stores atomic findings. Includes an embedding vector field for semantic search.
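Semantic search over ResearchInsight embeddings usually means ranking stored vectors by similarity to a query vector. The sketch below uses brute-force cosine similarity in plain Python; the embedding model, vector dimension, and storage backend are not specified in this manual, and the `semantic_search` helper is hypothetical. A production setup would more likely rely on a database vector extension or index.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vec: Sequence[float],
    insights: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[str]:
    """Return the texts of the top_k insights most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in insights]
    # Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```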

💾 3. Storage & Integrity Strategy

Data Type Standards

  1. UUIDs: Used for all public-facing identifiers (Surveys, Reports).
  2. JSONField: Used for variable schemas to maintain flexibility without frequent migrations.
  3. DateTime: All records use auto_now_add to record their creation time for auditing.
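The three standards can be seen together in one record. In Django they would most likely map to `models.UUIDField(default=uuid.uuid4, editable=False)`, `models.JSONField(default=dict)`, and `models.DateTimeField(auto_now_add=True)`; the framework-free sketch below mimics that behavior. The `PublicRecord` class, its field names, and the example.com base URL are hypothetical.

```python
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PublicRecord:
    """Hypothetical record illustrating the three data type standards."""
    internal_id: int                                          # private sequential key, never exposed
    settings: dict = field(default_factory=dict)              # JSONField analogue: free-form schema
    public_id: uuid.UUID = field(default_factory=uuid.uuid4)  # random UUID for public URLs
    created_at: datetime = field(                             # auto_now_add analogue: set once
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def public_url(self, base: str = "https://example.com/s/") -> str:
        # Placeholder base URL; only the UUID ever appears publicly.
        return f"{base}{self.public_id}"

    def settings_json(self) -> str:
        return json.dumps(self.settings, sort_keys=True)
```

Random UUIDs keep public links unguessable, unlike sequential IDs, and freezing the record means `created_at` cannot be overwritten after creation.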

File vs. Database Storage

  • Database: Stores metadata, settings, and small text results.
  • Filesystem (media/ folder): Stores the actual raw CSVs, Excel files, and PDF uploads.

CRITICAL: The database stores only the path to the file. Deleting a record in the DB does not automatically delete the file on disk, so handle file deletion explicitly in the application logic.
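Explicit cleanup means removing the file whenever its record is removed. In Django, a common pattern is to call `instance.file.delete(save=False)` from a `post_delete` signal handler or inside `transaction.on_commit`. The stdlib sketch below shows the same idea with an in-memory store; `FileRecordStore` and its methods are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict


class FileRecordStore:
    """In-memory stand-in for RawDataFile rows, each holding a path under media_root."""

    def __init__(self, media_root: Path) -> None:
        self.media_root = media_root
        self.rows: Dict[int, str] = {}  # record id -> path relative to media_root

    def add(self, record_id: int, relative_path: str, content: bytes) -> Path:
        path = self.media_root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(content)
        self.rows[record_id] = relative_path
        return path

    def delete(self, record_id: int) -> None:
        # Deleting the row alone would leave an orphaned file, so remove both.
        relative_path = self.rows.pop(record_id)
        # missing_ok avoids failing if the file was already removed (Python 3.8+).
        (self.media_root / relative_path).unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = FileRecordStore(Path(tmp))
        path = store.add(1, "uploads/survey.csv", b"a,b\n1,2\n")
        store.delete(1)
        print(path.exists())  # False
```

Deleting the file only after the row is gone (or after the transaction commits, in Django) avoids removing a file that a rolled-back record still points to.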


🔧 4. Maintenance & Migrations

Standard Operating Procedure (SOP)

Before applying ANY database changes:

  1. Verify the environment: python manage.py check.
  2. Generate migrations: python manage.py makemigrations.
  3. Apply migrations: python manage.py migrate.
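The three steps can be scripted so that a failing `check` stops the run before anything is generated or applied. The sketch below is a hypothetical helper, not part of the platform; it accepts an injectable runner so the ordering logic can be exercised without a Django project.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# The SOP steps, in the order they must run.
SOP_STEPS: List[List[str]] = [
    ["manage.py", "check"],
    ["manage.py", "makemigrations"],
    ["manage.py", "migrate"],
]


def run_with_python(args: Sequence[str]) -> int:
    """Run `python <args>` in the current directory and return the exit code."""
    return subprocess.run([sys.executable, *args]).returncode


def run_sop(runner: Callable[[Sequence[str]], int] = run_with_python) -> List[str]:
    """Run each step in order and stop at the first non-zero exit code."""
    completed: List[str] = []
    for step in SOP_STEPS:
        if runner(step) != 0:
            raise RuntimeError(f"SOP step failed: python {' '.join(step)}")
        completed.append(" ".join(step))
    return completed
```

Run from the project root, `run_sop()` executes the same commands as the manual steps and raises on the first failure instead of continuing.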

Data Retention

Celery background tasks automatically purge ActivityLog entries and temporary caches once they are older than the data_retention_days setting.
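The retention rule reduces to computing a cutoff and dropping anything created before it. Inside a periodic Celery task, the ORM version would likely look like `ActivityLog.objects.filter(created_at__lt=cutoff).delete()`, where the `created_at` field name is an assumption. The plain-Python sketch below shows only the cutoff logic; `retention_cutoff` and `purge` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def retention_cutoff(now: datetime, data_retention_days: int) -> datetime:
    """Oldest creation time that is still kept."""
    if data_retention_days < 0:
        raise ValueError("data_retention_days must be non-negative")
    return now - timedelta(days=data_retention_days)


def purge(
    entries: List[Dict], now: datetime, data_retention_days: int
) -> Tuple[List[Dict], List[Dict]]:
    """Split entries into (kept, purged) by their hypothetical 'created_at' key."""
    cutoff = retention_cutoff(now, data_retention_days)
    kept = [e for e in entries if e["created_at"] >= cutoff]
    purged = [e for e in entries if e["created_at"] < cutoff]
    return kept, purged
```

An entry created exactly `data_retention_days` ago sits on the cutoff and is kept; only strictly older entries are purged.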

