CyberOrigen Data Flow Documentation
Overview
This document provides a comprehensive analysis of data flow through the CyberOrigen application, covering all major user interactions, security boundaries, and system integrations. The application is a multi-tenant SaaS platform for autonomous security scanning, compliance assessment, and GRC (Governance, Risk & Compliance) management.
System Architecture Overview
Frontend (React/Vite)    ←→  Backend (FastAPI/Python)     ←→  External Services
├─ Admin Portal (5173)       ├─ API Gateway                   ├─ AI Providers
├─ Marketing Site (5175)     ├─ Authentication                │  ├─ AWS Bedrock
└─ Mobile Apps (Future)      ├─ Business Logic Services       │  ├─ OpenAI
                             ├─ Database (PostgreSQL)         │  ├─ Anthropic (Claude)
                             └─ Background Workers            │  └─ Google Gemini
                                                              ├─ Email (Resend)
                                                              ├─ Billing (Stripe)
                                                              ├─ Ticketing (Peppermint)
                                                              ├─ Threat Intel (OTX)
                                                              ├─ Malware Scanning (ClamAV)
                                                              └─ Monitoring/Alerts
Core Data Entities
Primary Entities
- User: Platform administrators and customers
- Organization: Multi-tenant isolation boundary
- Scan: Security assessment jobs
- Vulnerability: Security findings
- Asset: Scanned targets (domains, IPs, etc.)
- Evidence: GRC compliance artifacts
- Control: Compliance framework controls
- Risk: Risk register entries
Security Entities
- QuarantinedFile: Malware-detected files
- AuditLog: Security audit trail
- APIKey: Customer BYOK (Bring Your Own Keys)
- PlatformAdminKey: Platform AI provider keys
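The entities above can be sketched as plain dataclasses to show the shape of the tenant model (illustrative only; the production models are ORM classes with many more fields). Note how every tenant-scoped entity carries the organization_id foreign key that enforces isolation:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Organization:
    id: int
    name: str

@dataclass
class User:
    id: int
    username: str
    organization_id: int          # tenant isolation boundary
    user_type: str = "CUSTOMER"   # or "PLATFORM_ADMIN"
    role: str = "MEMBER"          # OWNER / ADMIN / MEMBER / VIEWER

@dataclass
class Scan:
    id: int
    organization_id: int          # every tenant-scoped entity carries this FK
    target: str
    status: str = "PENDING"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```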
1. Authentication and Authorization Flow
User Authentication Process
sequenceDiagram
participant U as User
participant F as Frontend
participant A as Auth API
participant DB as Database
participant JWT as JWT Service
U->>F: Login Request (username/password)
F->>A: POST /api/v1/auth/login
A->>DB: Validate Credentials
DB-->>A: User Data + Organization
A->>JWT: Generate JWT Token
JWT-->>A: Signed Token
A-->>F: JWT + User Context
F->>F: Store Token (localStorage)
F-->>U: Redirect to Dashboard
Note over A: Multi-tenancy enforced at DB level
Note over JWT: Token includes org_id for tenant isolation
Authorization & Multi-Tenancy
Security Boundaries:
- Platform Admin vs Customer: UserType enum distinguishes Bonum staff from customers
- Organization Isolation: All queries filtered by organization_id
- Role-Based Access: Owner/Admin/Member/Viewer permissions within organizations
- Quota Enforcement: Subscription tier limits enforced per organization
Data Isolation Mechanisms:
- Database row-level security via organization_id foreign keys
- UserContext object carries tenant information through all requests
- API endpoints automatically filter by current user's organization
- Platform admins bypass isolation for support purposes
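The issue/verify cycle for the tenant-scoped JWT can be sketched with only the standard library (a real deployment would use a JWT library such as PyJWT; the secret and claim names here are illustrative):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # illustrative; real signing keys come from secure config

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(username: str, org_id: int, user_type: str, role: str, ttl: int = 3600) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    now = int(time.time())
    payload = _b64url(json.dumps({
        "sub": username, "iat": now, "exp": now + ttl,
        "organization_id": org_id, "user_type": user_type, "role": role,
    }).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str) -> dict:
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):       # constant-time comparison
        raise ValueError("bad signature")
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims                                    # carries organization_id downstream
```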
JWT Token Structure
{
"sub": "username",
"exp": 1640995200,
"iat": 1640908800,
"organization_id": 123,
"user_type": "CUSTOMER",
"role": "ADMIN"
}
2. Security Scan Initiation and Processing Flow
Scan Lifecycle State Machine
stateDiagram-v2
[*] --> IDLE
IDLE --> DISCOVERY: Start Scan
DISCOVERY --> ENUMERATION: Targets Found
ENUMERATION --> VULN_SCAN: Services Enumerated
VULN_SCAN --> CORRELATION: Vulnerabilities Found
CORRELATION --> THREAT_INTEL: Duplicates Removed
THREAT_INTEL --> EXPLOIT_CHECK: Intel Gathered
EXPLOIT_CHECK --> PRIORITIZATION: Exploits Checked
PRIORITIZATION --> REMEDIATION: Risks Prioritized
REMEDIATION --> VERIFICATION: Fixes Generated
VERIFICATION --> REPORTING: Fixes Verified
REPORTING --> COMPLETED: Report Generated
DISCOVERY --> FAILED: Error
ENUMERATION --> FAILED: Error
VULN_SCAN --> FAILED: Error
CORRELATION --> FAILED: Error
THREAT_INTEL --> FAILED: Error
EXPLOIT_CHECK --> FAILED: Error
PRIORITIZATION --> FAILED: Error
REMEDIATION --> FAILED: Error
VERIFICATION --> FAILED: Error
REPORTING --> FAILED: Error
Scan Processing Pipeline
sequenceDiagram
participant U as User
participant F as Frontend
participant S as Scan API
participant SW as Scan Worker
participant AI as AI Service
participant DB as Database
participant EXT as External Tools
U->>F: Create New Scan
F->>S: POST /api/v1/scans
S->>DB: Create Scan Record (PENDING)
S->>SW: Queue Background Job
S-->>F: Scan Created Response
F-->>U: Show Scan Progress
SW->>SW: Start Processing (RUNNING)
SW->>DB: Update Status: DISCOVERY
SW->>EXT: Run Nuclei/Nmap Scans
EXT-->>SW: Raw Scan Results
SW->>DB: Update Status: VULN_SCAN
SW->>AI: Analyze Vulnerabilities
AI-->>SW: Enriched Data + Remediation
SW->>DB: Store Vulnerabilities
SW->>DB: Update Status: COMPLETED
SW->>S: Trigger Notifications
Note over SW: Each phase updates progress %
Note over AI: PII redaction + prompt injection protection
Background Worker Architecture
The scan worker runs as a separate background process:
- Threat Scanner Service: Coordinates external security tools
- State Machine: Manages scan phase transitions
- AI Orchestrator: Enriches findings with threat intelligence
- Notification Dispatcher: Sends real-time updates
External Security Tools:
- Nuclei: Vulnerability scanner for web applications
- Nmap: Network discovery and port scanning
- ClamAV: Malware detection for uploaded files
- OTX (Open Threat Exchange): Threat intelligence enrichment
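The worker's walk through the state machine above can be sketched as a simple phase loop (the executor and progress callbacks are injected placeholders, not the platform's actual interfaces):

```python
from enum import Enum

class ScanPhase(Enum):
    DISCOVERY = "DISCOVERY"
    ENUMERATION = "ENUMERATION"
    VULN_SCAN = "VULN_SCAN"
    CORRELATION = "CORRELATION"
    THREAT_INTEL = "THREAT_INTEL"
    EXPLOIT_CHECK = "EXPLOIT_CHECK"
    PRIORITIZATION = "PRIORITIZATION"
    REMEDIATION = "REMEDIATION"
    VERIFICATION = "VERIFICATION"
    REPORTING = "REPORTING"

PHASES = list(ScanPhase)

def run_scan(scan_id: str, execute_phase, report_progress) -> str:
    """Walk the phases in order; any exception sends the scan to FAILED."""
    for i, phase in enumerate(PHASES):
        try:
            execute_phase(scan_id, phase)
        except Exception:
            return "FAILED"
        # Each completed phase bumps the progress percentage
        report_progress(scan_id, phase.value, int((i + 1) / len(PHASES) * 100))
    return "COMPLETED"
```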
3. GRC Compliance Assessment and Evidence Collection Workflow
Compliance Framework Mapping
graph TD
V[Vulnerability] --> CM[Compliance Mapper]
CM --> SOC2[SOC 2 Controls]
CM --> PCI[PCI-DSS Requirements]
CM --> ISO[ISO 27001 Controls]
CM --> HIPAA[HIPAA Safeguards]
CM --> GDPR[GDPR Articles]
SOC2 --> CC71[CC7.1 - System Boundaries]
PCI --> REQ6[Req 6.5.1 - Injection Flaws]
ISO --> A811[A.8.11 - Data Protection]
CC71 --> EV1[Evidence Collection]
REQ6 --> EV1
A811 --> EV1
EV1 --> AUTO[Auto-Evidence Service]
AUTO --> SCAN[Scan Reports]
AUTO --> CONFIG[Config Snapshots]
AUTO --> POLICY[Policy Documents]
Evidence Collection Workflow
sequenceDiagram
participant A as Auditor
participant GRC as GRC Dashboard
participant AE as Auto-Evidence
participant FS as File Scanner
participant Q as Quarantine
participant DB as Database
A->>GRC: Request Evidence for Control
GRC->>AE: Trigger Auto-Collection
AE->>DB: Query Related Vulnerabilities
AE->>AE: Generate Evidence Report
AE->>FS: Submit for Malware Scan
alt File is Clean
FS-->>AE: Scan Result: CLEAN
AE->>DB: Store Evidence
AE-->>GRC: Evidence Ready
else File is Malicious
FS->>Q: Quarantine File
FS-->>AE: Scan Result: MALICIOUS
AE->>DB: Log Security Alert
AE-->>GRC: Evidence Collection Failed
end
Note over FS: ClamAV + OTX threat intel
Note over Q: Admin review required
Audit Engagement Process
Phases:
- Planning: Define scope, controls, and evidence requirements
- Fieldwork: Collect evidence, perform testing, document findings
- Review: Analyze evidence, validate controls, identify exceptions
- Reporting: Generate audit report with findings and recommendations
- Completed: Final report delivery and follow-up planning
Evidence Types:
- Policy: Written procedures and policies
- Procedure: Step-by-step implementation guides
- Screenshot: Visual evidence of controls
- Configuration: System configuration exports
- Log: Audit trails and access logs
- Scan Report: Automated vulnerability assessments
- Attestation: Management assertions and certifications
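The clean/malicious branch in the evidence-collection sequence above can be sketched as follows. The collaborators are injected so the flow is testable; in the platform they would correspond to the auto-evidence service, ClamAV, the quarantine store, the evidence table, and the security-alert log (names here are illustrative):

```python
def collect_evidence(control_id: str, generate_report, scan_file,
                     quarantine, store, alert) -> str:
    """Auto-collect evidence for a control, gating storage on a malware scan."""
    report = generate_report(control_id)
    verdict = scan_file(report)              # "CLEAN" or "MALICIOUS"
    if verdict == "CLEAN":
        store(control_id, report)            # evidence becomes available to auditors
        return "EVIDENCE_READY"
    quarantine(report)                       # held for admin review
    alert(f"Malicious evidence artifact blocked for control {control_id}")
    return "COLLECTION_FAILED"
```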
4. AI-Powered Analysis Pipeline and Chat Integration
AI Service Architecture
graph TD
UI[User Interface] --> CHAT[Chat API]
UI --> SCAN[Scan Analysis]
CHAT --> AIS[AI Service]
SCAN --> AIS
AIS --> PII[PII Redaction Layer]
PII --> GUARD[Prompt Injection Guards]
GUARD --> ROUTE[Provider Router]
ROUTE --> BEDROCK[AWS Bedrock]
ROUTE --> OPENAI[OpenAI GPT]
ROUTE --> CLAUDE[Anthropic Claude]
ROUTE --> GEMINI[Google Gemini]
BEDROCK --> ZDR[Zero Data Retention]
OPENAI --> ZDR
CLAUDE --> ZDR
GEMINI --> ZDR
ZDR --> SANITIZE[Output Sanitization]
SANITIZE --> RAG[Knowledge Base]
RAG --> RESPONSE[Final Response]
Chat Interaction Flow
sequenceDiagram
participant U as User
participant C as Chat Interface
participant AI as AI Service
participant PII as PII Redactor
participant P as AI Provider
participant RAG as Knowledge Base
participant DB as Database
U->>C: Ask Security Question
C->>AI: Process Chat Message
AI->>PII: Redact Sensitive Data
PII-->>AI: Cleaned Input
AI->>AI: Check Prompt Injection
AI->>RAG: Query Knowledge Base
RAG-->>AI: Relevant Context
AI->>P: Send to AI Provider (Bedrock/OpenAI/Claude/Gemini)
P-->>AI: AI Response
AI->>PII: Redact Response
PII-->>AI: Clean Response
AI->>DB: Log Interaction (Audit Trail)
AI-->>C: Final Response
C-->>U: Display Answer
Note over PII: Removes emails, SSNs, API keys, etc.
Note over AI: Zero data retention compliance
AI Provider Selection & Configuration
Provider Priority (Configurable per Organization):
- AWS Bedrock (Highest Security): HIPAA compliance, data sovereignty
- Anthropic Claude: Strong reasoning, safety features
- OpenAI GPT: General-purpose, cost-effective
- Google Gemini: Multimodal capabilities
Zero Data Retention (ZDR) Compliance:
- All providers configured to prevent training on customer data
- Ephemeral processing with minimal retention periods
- Audit trails for all AI interactions
- Customer BYOK (Bring Your Own Keys) support
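The provider-priority fallback can be sketched like this (a minimal sketch with injected callables; the real router reads per-organization configuration and BYOK credentials from the database):

```python
# Illustrative default priority order; real config is per-organization.
DEFAULT_PRIORITY = ["bedrock", "claude", "openai", "gemini"]

def route_request(prompt: str, providers: dict, priority=DEFAULT_PRIORITY):
    """Try providers in priority order, falling back on failure."""
    errors = {}
    for name in priority:
        call = providers.get(name)
        if call is None:
            continue                      # provider not configured for this org
        try:
            return name, call(prompt)
        except Exception as exc:
            errors[name] = str(exc)       # record the failure, try the next one
    raise RuntimeError(f"all providers failed: {errors}")
```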
Knowledge Base (RAG) System
Document Types:
- Static: Policies, procedures, architecture docs (manually created)
- Dynamic: Scan summaries, asset profiles, trend insights (auto-generated)
RAG Pipeline:
- Document ingestion and chunking
- Embedding generation (AI-powered)
- Vector storage and indexing
- Semantic similarity search
- Context injection into AI prompts
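The retrieval step of this pipeline can be sketched with a toy bag-of-words embedding and cosine similarity (production would use AI-generated dense vectors and a vector index, but the ranking logic is the same):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; stands in for a dense vector model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents most similar to the query, for prompt context."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```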
5. Real-Time Updates via WebSocket and Notification Flow
Real-Time Communication Architecture
graph TD
SCAN[Scan Worker] --> EVENTS[Event Publisher]
AI[AI Analysis] --> EVENTS
GRC[GRC Updates] --> EVENTS
EVENTS --> WS[WebSocket Manager]
EVENTS --> EMAIL[Email Service]
EVENTS --> SLACK[Slack Integration]
WS --> CLIENTS[Connected Clients]
EMAIL --> RESEND[Resend API]
SLACK --> WEBHOOK[Slack Webhooks]
CLIENTS --> ADMIN[Admin Portal]
CLIENTS --> MOBILE[Mobile App]
RESEND --> SMTP[Email Delivery]
WEBHOOK --> CHANNELS[Slack Channels]
Notification Event Types
{
"scan_started": {
"scan_id": "scan_123",
"target": "example.com",
"organization_id": 123,
"timestamp": "2024-01-01T10:00:00Z"
},
"scan_progress": {
"scan_id": "scan_123",
"phase": "VULN_SCAN",
"progress": 45,
"organization_id": 123
},
"vulnerability_found": {
"vuln_id": "vuln_456",
"severity": "HIGH",
"title": "SQL Injection Vulnerability",
"scan_id": "scan_123",
"organization_id": 123
},
"evidence_collected": {
"evidence_id": "ev_789",
"control_id": "CC7.1",
"framework": "SOC2",
"organization_id": 123
}
}
Email Notification Flow
sequenceDiagram
participant S as System Event
participant N as Notification Dispatcher
participant E as Email Service
participant R as Resend API
participant U as User
S->>N: Critical Vulnerability Found
N->>N: Check User Preferences
N->>E: Queue Email Notification
E->>R: Send via Resend
R-->>U: Email Delivered
R-->>E: Delivery Confirmation
E->>N: Log Delivery Status
Note over N: Respects user notification preferences
Note over R: Template-based emails with branding
6. Multi-Tenant Data Isolation
Database-Level Isolation
Row-Level Security (RLS):
-- Example: Organizations can only see their own scans
CREATE POLICY org_isolation_scans ON scans
FOR ALL TO app_role
USING (organization_id = current_setting('app.current_org_id')::int);
-- Example: Platform admins bypass all restrictions
CREATE POLICY admin_bypass_scans ON scans
FOR ALL TO platform_admin_role
USING (true);
Foreign Key Enforcement:
- All tenant data tables have an organization_id foreign key
- Database constraints prevent cross-tenant data access
- Queries automatically filtered by organization context
Application-Level Isolation
UserContext Enforcement:
# Every API endpoint enforces tenant isolation
@router.get("/vulnerabilities")
async def get_vulnerabilities(
    current_user: UserContext = Depends(get_current_user),
    db: Session = Depends(get_db)
):
    # Query automatically filtered by organization_id
    vulnerabilities = db.query(Vulnerability).filter(
        Vulnerability.organization_id == current_user.organization_id
    ).all()
    return vulnerabilities
Subscription Tier Limits:
- Scan quotas enforced per organization
- User limits by subscription tier
- Feature access controlled by tier configuration
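A minimal sketch of that quota enforcement, assuming illustrative tier names and limits (the real values live in the subscription configuration):

```python
# Illustrative tier limits; actual values come from subscription config.
TIER_LIMITS = {
    "free":       {"scans_per_month": 5,      "users": 3},
    "pro":        {"scans_per_month": 100,    "users": 25},
    "enterprise": {"scans_per_month": 10_000, "users": 500},
}

def check_quota(tier: str, resource: str, current_usage: int) -> bool:
    """True if the organization may consume one more unit of the resource."""
    limits = TIER_LIMITS.get(tier, TIER_LIMITS["free"])  # unknown tiers fall back to free
    limit = limits.get(resource)
    return limit is not None and current_usage < limit
```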
7. Security Boundaries and Data Protection
Encryption at Rest
- Database: Sensitive fields encrypted using EncryptedString/EncryptedJSON
- File Storage: Evidence attachments encrypted in S3
- Credentials: API keys encrypted in database
Encryption in Transit
- API Communication: HTTPS with TLS 1.2+
- Database Connections: SSL/TLS encrypted
- AI Provider APIs: Provider-native encryption
Input Validation and Sanitization
- API Validation: Pydantic models for all inputs
- SQL Injection Prevention: Parameterized queries only
- XSS Protection: HTML escaping and CSP headers
- File Upload Scanning: ClamAV malware detection
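A sketch of this kind of input validation for a scan target, using only the standard library (the platform uses Pydantic models; the regex and function name here are illustrative):

```python
import html
import ipaddress
import re

# One DNS label: 1-63 chars, no leading/trailing hyphen; two or more labels total.
DOMAIN_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.[A-Za-z0-9-]{1,63})+$")

def validate_scan_target(raw: str) -> str:
    """Accept a domain or IP address; reject anything else before it reaches a query."""
    target = raw.strip()
    try:
        ipaddress.ip_address(target)     # accepts IPv4 and IPv6 literals
        return target
    except ValueError:
        pass
    if DOMAIN_RE.match(target):
        return target
    # Escape before echoing back, so the error message is XSS-safe
    raise ValueError(f"invalid scan target: {html.escape(target)!r}")
```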
Audit Logging
- User Actions: All API calls logged with user context
- Data Changes: Change tracking for sensitive entities
- Security Events: Failed logins, permission escalations
- AI Interactions: All AI queries and responses logged
8. Caching and Performance Optimization
Caching Layers
graph TD
USER[User Request] --> CDN[CDN Cache]
CDN --> LB[Load Balancer]
LB --> APP[Application Cache]
APP --> DB[Database]
APP --> REDIS[Redis Cache]
APP --> MEMORY[In-Memory Cache]
REDIS --> SESSIONS[User Sessions]
REDIS --> SCAN[Scan Results]
REDIS --> AI[AI Responses]
MEMORY --> CONFIG[Configuration]
MEMORY --> STATIC[Static Data]
Performance Optimizations
Database:
- Connection pooling (20 base, 30 overflow)
- Query optimization with indexes
- Read replicas for reporting queries
- Automated vacuum and reindex
API Layer:
- Response compression (gzip)
- Pagination for large datasets
- Async request processing
- Background job queuing
Frontend:
- Code splitting and lazy loading
- Asset compression and minification
- Service worker caching
- Progressive Web App features
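The caching behavior described above can be sketched as a minimal in-process TTL cache (standing in for the Redis layer; the clock parameter is only there to make expiry testable):

```python
import time

class TTLCache:
    """Minimal in-process TTL cache with lazy eviction on read."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def set(self, key, value, now=None):
        expires = (now if now is not None else time.monotonic()) + self.ttl
        self._store[key] = (value, expires)

    def get(self, key, now=None):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if (now if now is not None else time.monotonic()) >= expires:
            del self._store[key]      # expired: evict and report a miss
            return None
        return value
```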
9. External Service Integrations
AI Providers
- AWS Bedrock: Primary for sensitive data (HIPAA/SOC2 compliant)
- OpenAI: Cost-effective general analysis
- Anthropic Claude: Advanced reasoning and safety
- Google Gemini: Multimodal analysis capabilities
Communication Services
- Resend: Transactional email delivery
- Slack: Team notifications and alerts
- Peppermint: Helpdesk and ticketing integration
Security Services
- ClamAV: File malware scanning
- AlienVault OTX: Threat intelligence feeds
- Nuclei: Vulnerability scanning engine
Business Services
- Stripe: Subscription billing and payments
- AWS S3: File storage and backups
- Railway/Digital Ocean: Infrastructure hosting
10. Error Handling and Monitoring
Error Handling Strategy
- Graceful Degradation: System continues functioning with reduced capability
- Circuit Breakers: Prevent cascade failures in external integrations
- Retry Logic: Exponential backoff for transient failures
- User-Friendly Messages: Technical errors translated into plain, actionable language
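The retry strategy above can be sketched as follows (a generic helper with injected sleep for testability; attempt counts and delays are illustrative defaults):

```python
import random
import time

def retry(fn, attempts: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Retry a flaky call with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                 # out of attempts: surface the failure
            # Delay doubles each attempt, with random jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)
```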
Monitoring and Observability
- Prometheus Metrics: API performance and business metrics
- Structured Logging: JSON logs with correlation IDs
- Health Checks: Endpoint monitoring for all services
- Alert Management: Automated alerts for critical issues
Disaster Recovery
- Database Backups: Automated daily backups with point-in-time recovery
- File Storage: Cross-region replication for evidence attachments
- Infrastructure: Blue-green deployment capability
- Data Export: Organization data export for compliance
Security Considerations Summary
- Multi-Tenancy: Strict organization isolation at database and application layers
- Authentication: JWT-based with role-based access control
- Data Protection: Encryption at rest and in transit
- Input Validation: Comprehensive sanitization and validation
- AI Security: PII redaction, prompt injection protection, zero data retention
- Audit Trail: Complete logging of all user actions and system events
- Compliance: SOC2, ISO27001, HIPAA, GDPR alignment
- Incident Response: Automated threat detection and quarantine
This data flow architecture ensures secure, scalable, and compliant operation of the CyberOrigen platform while maintaining performance and user experience.