
Ingestion & Inflow Resource Map

This map outlines the services responsible for the secure collection and “Semantic Cleanup” of external research data.


📂 The Asset Gateway

The File Ingestor (RawDataFileViewSet)

Handles the physical and logical ingestion of research artifacts.

  • create: Triggers the “Deconstruction” of a file. It doesn’t just store the blob; it initiates the background worker that scans for headers and data types.
  • preview: A stateless window into the data. It returns a “Micro-Sample” (the first 50 rows), allowing the frontend to render the initial table view without loading the entire multi-megabyte dataset.
  • gsheets_import: A bridge to the Google ecosystem. It takes a public Google Sheets URL, treats it as a “Live Stream,” and pulls the data into our internal normalization pipeline.
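
The preview behaviour can be illustrated with a minimal sketch. It assumes the artifact is CSV text and uses only the standard library; the name preview_rows and the returned dictionary shape are hypothetical, not taken from RawDataFileViewSet.

```python
import csv
import io
from itertools import islice

PREVIEW_ROWS = 50  # the "Micro-Sample" size described above


def preview_rows(stream, limit=PREVIEW_ROWS):
    """Return the header and at most `limit` data rows from a CSV text stream."""
    reader = csv.reader(stream)
    header = next(reader, [])  # empty list if the file has no rows at all
    # islice stops pulling rows after `limit`, so the rest of the file is never parsed.
    rows = list(islice(reader, limit))
    return {"columns": header, "rows": rows}


# Usage: a 1,000-row in-memory CSV stands in for an uploaded file.
sample = io.StringIO(
    "id,email\n" + "".join(f"{i},user{i}@example.com\n" for i in range(1000))
)
result = preview_rows(sample)
print(result["columns"], len(result["rows"]))
```

Because the reader is consumed lazily, the cost of a preview depends on the sample size rather than on the size of the dataset.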

The Survey Architect (SurveyViewSet)

A complex orchestrator for interactive data collection.

  • generate_questions: Uses the “Generative Scaffolding” logic to turn a project intent into a valid Survey schema. It returns a Transient Draft, which the user must confirm before it is committed to the database.
  • duplicate: Performs a “Recursive Clone.” It copies not just the survey but also all nested logic gates, distribution settings, and question branches, and stores the copy under a new UUID.
  • distribute: The outbound “Engagement Hub.” It handles the generation of unique, trackable links for participants to ensure data provenance.
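
As a rough illustration of the “Recursive Clone” idea, the sketch below deep-copies a survey held as plain dictionaries and re-points branch targets at the copied questions. The dictionary layout (id, questions, branches, goto, distribution) is an assumption made for this example; the real SurveyViewSet models may be structured differently.

```python
import copy
import uuid


def clone_survey(survey):
    """Deep-copy a survey dict and give it and its questions fresh UUIDs."""
    clone = copy.deepcopy(survey)  # nested settings and branches are copied too
    id_map = {}

    clone["id"] = str(uuid.uuid4())
    for question in clone["questions"]:
        new_id = str(uuid.uuid4())
        id_map[question["id"]] = new_id
        question["id"] = new_id

    # Re-point branch targets at the cloned questions, not the originals.
    for question in clone["questions"]:
        for branch in question.get("branches", []):
            branch["goto"] = id_map[branch["goto"]]
    return clone


# Usage: hypothetical two-question survey with one logic gate.
original = {
    "id": "s-1",
    "distribution": {"channel": "email"},
    "questions": [
        {"id": "q-1", "text": "Do you use the product?",
         "branches": [{"if": "yes", "goto": "q-2"}]},
        {"id": "q-2", "text": "How often?"},
    ],
}
duplicate = clone_survey(original)
print(duplicate["id"] != original["id"])
```

The second pass matters: without remapping the goto references, the clone's logic gates would still point at questions in the original survey.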

🛠️ The Extraction Engine (file_processor.py)

The “Low-Level” machinery that powers the ingestors.

  • PDF/DOCX Extractor: A stateless utility that strips formatting to extract core textual data for our Intelligence Engine’s NLP pipeline.
  • Semantic Analyzer: The logic that guesses “What is this column?”. It uses heuristic pattern matching to identify Email addresses, Dates, and Sentiment-heavy text fields.
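
The heuristic pattern matching described for the Semantic Analyzer could look roughly like the sketch below. The regular expression, the date formats, the 80% threshold, and the word-count rule for free text are all assumptions chosen for illustration; they are not taken from file_processor.py.

```python
import re
from datetime import datetime

# Deliberately loose email pattern; an assumption, not a full RFC 5322 check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def _is_date(value):
    """True if the value parses under any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def guess_column_type(values, threshold=0.8):
    """Label a column by the share of non-empty cells matching each pattern."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "empty"

    def share(predicate):
        return sum(1 for v in cleaned if predicate(v)) / len(cleaned)

    if share(lambda v: EMAIL_RE.match(v) is not None) >= threshold:
        return "email"
    if share(_is_date) >= threshold:
        return "date"
    # Long answers are treated as candidate free text for later sentiment analysis.
    average_words = sum(len(v.split()) for v in cleaned) / len(cleaned)
    if average_words >= 5:
        return "free_text"
    return "unknown"


print(guess_column_type(["ana@example.com", "li@example.org"]))
```

Using a threshold rather than requiring every cell to match keeps the guess stable when a column contains a few malformed or missing values.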

📋 Distribution Models

  • SurveyResponse: The atomic unit of collection. Every response is “Immutable” once submitted: it can be archived but never modified, ensuring the integrity of the research findings.
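
One way to picture the “archive but never modify” rule is a frozen dataclass, sketched below. The class name SurveyResponseRecord and its fields are hypothetical and stand in for the real SurveyResponse model, which is not shown here; in this sketch, archiving produces a new record with a timestamp instead of mutating the submitted one.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class SurveyResponseRecord:
    """Hypothetical stand-in for SurveyResponse; fields are assumptions."""
    response_id: str
    answers: Tuple[Tuple[str, str], ...]  # (question_id, answer) pairs
    archived_at: Optional[datetime] = None

    def archive(self):
        """Return an archived copy; the submitted answers are carried over unchanged."""
        return replace(self, archived_at=datetime.now(timezone.utc))


record = SurveyResponseRecord("r-1", (("q-1", "yes"), ("q-2", "weekly")))
archived = record.archive()

try:
    record.answers = ()  # any attempt to edit a submitted response fails
except FrozenInstanceError:
    print("responses cannot be modified after submission")
```

Storing the answers as a tuple rather than a list or dict keeps the contents immutable as well as the attribute itself.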
