Ingestion & Inflow Resource Map
This map outlines the services responsible for the secure collection and “Semantic Cleanup” of external research data.
📂 The Asset Gateway
The File Ingestor (RawDataFileViewSet)
Handles the physical and logical ingestion of research artifacts.
- create: Triggers the “Deconstruction” of a file. It doesn’t just store the blob; it initiates the background worker that scans for headers and data types.
- preview: A stateless window into the data. It returns a “Micro-Sample” (the first 50 rows), allowing the frontend to render the initial table view without loading the entire multi-megabyte dataset.
- gsheets_import: A bridge to the Google ecosystem. It treats a public URL as a “Live Stream,” pulling data into our internal normalization pipeline.
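The “Micro-Sample” behind preview can be sketched as a streaming read that stops after 50 rows. This is a minimal illustration, assuming CSV input; micro_sample and PREVIEW_ROWS are hypothetical names, not the actual ingestor code.

```python
import csv
import io

PREVIEW_ROWS = 50  # assumed sample size, matching the "first 50 rows" behavior

def micro_sample(raw_bytes: bytes, limit: int = PREVIEW_ROWS) -> list[list[str]]:
    """Return the first `limit` rows of a CSV blob without
    materializing the full, potentially multi-megabyte file."""
    reader = csv.reader(io.StringIO(raw_bytes.decode("utf-8")))
    sample = []
    for i, row in enumerate(reader):
        if i >= limit:
            break  # stop early: the rest of the file is never parsed
        sample.append(row)
    return sample
```

Because the reader is consumed lazily, the cost of a preview is bounded by the sample size, not the file size.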
The Survey Architect (SurveyViewSet)
A complex orchestrator for interactive data collection.
- generate_questions: Uses the “Generative Scaffolding” logic to turn a project intent into a valid Survey schema. It returns a Transient Draft which the user must confirm before it is committed to the database.
- duplicate: Performs a “Recursive Clone.” It copies not just the survey, but all nested logic gates, distribution settings, and question branches to a new UUID.
- distribute: The outbound “Engagement Hub.” It handles the generation of unique, trackable links for participants to ensure data provenance.
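The “Recursive Clone” idea can be illustrated as a deep copy that assigns every node a fresh UUID. The Survey and Question dataclasses below are simplified stand-ins for the real Django models, and duplicate_survey is a hypothetical helper, not the actual duplicate action.

```python
import uuid
from dataclasses import dataclass, field, replace

@dataclass
class Question:
    id: str
    text: str

@dataclass
class Survey:
    id: str
    title: str
    questions: list[Question] = field(default_factory=list)

def duplicate_survey(original: Survey) -> Survey:
    """Clone the survey tree: same content, but every node
    (survey and each nested question) gets a new UUID."""
    return Survey(
        id=str(uuid.uuid4()),
        title=f"{original.title} (copy)",
        questions=[replace(q, id=str(uuid.uuid4())) for q in original.questions],
    )
```

The key property is that no identifier is shared between original and copy, so edits to the clone can never leak back into the source survey.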
🛠️ The Extraction Engine (file_processor.py)
The “Low-Level” machinery that powers the ingestors.
- PDF/DOCX Extractor: A stateless utility that strips formatting to extract core textual data for our Intelligence Engine’s NLP pipeline.
- Semantic Analyzer: The logic that guesses “Who is this column?”. It uses heuristic pattern matching to identify Email addresses, Dates, and Sentiment-heavy text fields.
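The heuristic pattern matching in the Semantic Analyzer can be sketched as a majority-vote regex classifier over sampled column values. The patterns, the 0.8 threshold, and guess_column_type are all assumptions for illustration, not the real file_processor.py logic.

```python
import re

# Deliberately simple patterns for the sketch; production heuristics
# would be more permissive (e.g. multiple date formats).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def guess_column_type(values: list[str], threshold: float = 0.8) -> str:
    """Label a column 'email' or 'date' if at least `threshold`
    of its non-empty sampled values match, else fall back to 'text'."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "text"
    for label, pattern in (("email", EMAIL_RE), ("date", DATE_RE)):
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return label
    return "text"
```

Using a threshold rather than requiring every value to match keeps the guess robust against a few dirty cells in an otherwise uniform column.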
📋 Distribution Models
- SurveyResponse: The atomic unit of collection. Every response is “Immutable” once submitted—it can be archived but never modified, ensuring the integrity of the research findings.
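The “archived but never modified” guarantee can be enforced at the application layer. This is one possible sketch, assuming a guard on post-submission writes; the class shape, ImmutableError, and method names are illustrative, not the project’s actual model.

```python
class ImmutableError(Exception):
    """Raised when a submitted response is mutated."""

class SurveyResponse:
    def __init__(self, answers: dict):
        self._answers = dict(answers)
        self._submitted = False
        self.archived = False

    def submit(self) -> None:
        # Submission is the point of no return for the answer payload.
        self._submitted = True

    def update_answers(self, answers: dict) -> None:
        if self._submitted:
            raise ImmutableError("responses cannot be modified after submission")
        self._answers.update(answers)

    def archive(self) -> None:
        # Archiving is permitted; it changes visibility, not the data.
        self.archived = True
```

Separating archive from update_answers makes the policy explicit: lifecycle state may change, research data may not.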