Files
2026-01-09 18:27:46 +02:00

394 lines
14 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# OpsMemory Architecture
> **Technical deep-dive into the Decision Ledger**
## Overview
OpsMemory provides a structured approach to organizational learning from security decisions. It captures the complete lifecycle of a decision - from the situation context through the action taken to the eventual outcome.
## Design Principles
### 1. Determinism First
All operations produce deterministic, reproducible results:
- Similarity vectors are computed from stable inputs
- Confidence scores use fixed formulas
- No randomness in suggestion ranking
### 2. Multi-Tenant Isolation
Every operation is scoped to a tenant:
- Records cannot be accessed across tenants
- Similarity search is tenant-isolated
- Statistics are per-tenant
### 3. Fire-and-Forget Integration
Decision recording is async and non-blocking:
- UI decisions complete immediately
- OpsMemory recording happens in background
- Failures don't affect the primary flow
### 4. Offline Capable
All features work without network access:
- Local PostgreSQL storage
- No external API dependencies
- Self-contained similarity computation
## Component Architecture
```
┌────────────────────────────────────────────────────────────────────┐
│ WebService Layer │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ OpsMemoryEndpoints │ │
│ │ POST /decisions GET /decisions GET /suggestions GET /stats│ │
│ └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────────┬───────────────────────────────────┘
┌────────────────────────────────┼───────────────────────────────────┐
│ Service Layer │
│ ┌─────────────────┐ ┌─────────────────┐ ┌────────────────────┐ │
│ │ PlaybookSuggest │ │ OutcomeTracking │ │ SimilarityVector │ │
│ │ Service │ │ Service │ │ Generator │ │
│ └────────┬────────┘ └────────┬────────┘ └─────────┬──────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ IOpsMemoryStore │ │
│ └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────────┬───────────────────────────────────┘
┌────────────────────────────────┼───────────────────────────────────┐
│ Storage Layer │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PostgresOpsMemoryStore │ │
│ │ - Decision CRUD │ │
│ │ - Outcome updates │ │
│ │ - Similarity search (array-based cosine) │ │
│ │ - Query with pagination │ │
│ │ - Statistics aggregation │ │
│ └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
```
## Data Model
### OpsMemoryRecord
The core aggregate containing all decision information:
```csharp
public sealed record OpsMemoryRecord
{
public required string MemoryId { get; init; }
public required string TenantId { get; init; }
public required DateTimeOffset RecordedAt { get; init; }
public required SituationContext Situation { get; init; }
public required DecisionRecord Decision { get; init; }
public OutcomeRecord? Outcome { get; init; }
public ImmutableArray<float> SimilarityVector { get; init; }
}
```
### SituationContext
Captures the security context at decision time:
```csharp
public sealed record SituationContext
{
public string? CveId { get; init; }
public string? Component { get; init; } // PURL
public string? Severity { get; init; } // low/medium/high/critical
public ReachabilityStatus Reachability { get; init; }
public double? EpssScore { get; init; } // 0-1
public double? CvssScore { get; init; } // 0-10
public bool IsKev { get; init; }
public ImmutableArray<string> ContextTags { get; init; }
}
```
### DecisionRecord
The action taken and why:
```csharp
public sealed record DecisionRecord
{
public required DecisionAction Action { get; init; }
public required string Rationale { get; init; }
public required string DecidedBy { get; init; }
public required DateTimeOffset DecidedAt { get; init; }
public string? PolicyReference { get; init; }
public MitigationDetails? Mitigation { get; init; }
}
```
### OutcomeRecord
The result of the decision:
```csharp
public sealed record OutcomeRecord
{
public required OutcomeStatus Status { get; init; }
public TimeSpan? ResolutionTime { get; init; }
public string? ActualImpact { get; init; }
public string? LessonsLearned { get; init; }
public required string RecordedBy { get; init; }
public required DateTimeOffset RecordedAt { get; init; }
}
```
## Similarity Algorithm
### Vector Generation
The `SimilarityVectorGenerator` creates 50-dimensional feature vectors:
```
Vector Layout:
[0-9] : CVE category one-hot (memory, injection, auth, crypto, dos,
info-disclosure, privilege-escalation, xss, path-traversal, other)
[10-14] : Severity one-hot (none, low, medium, high, critical)
[15-18] : Reachability one-hot (unknown, reachable, not-reachable, potential)
[19-23] : EPSS band one-hot (0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0)
[24-28] : CVSS band one-hot (0-2, 2-4, 4-6, 6-8, 8-10)
[29] : KEV flag (0 or 1)
[30-39] : Component type one-hot (npm, maven, pypi, nuget, go, cargo,
deb, rpm, apk, other)
[40-49] : Context tag presence (production, development, staging,
external-facing, internal, payment, auth, data, api, frontend)
```
### Cosine Similarity
Similarity between vectors A and B:
```
similarity = (A · B) / (||A|| × ||B||)
```
Where `A · B` is the dot product and `||A||` is the L2 norm.
### CVE Classification
CVEs are classified by analyzing keywords in the CVE ID and description:
| Category | Keywords |
|----------|----------|
| memory | buffer, overflow, heap, stack, use-after-free |
| injection | sql, command, code injection, ldap |
| auth | authentication, authorization, bypass |
| crypto | cryptographic, encryption, key |
| dos | denial of service, resource exhaustion |
| info-disclosure | information disclosure, leak |
| privilege-escalation | privilege escalation, elevation |
| xss | cross-site scripting, xss |
| path-traversal | path traversal, directory traversal |
## Playbook Suggestion Algorithm
### Confidence Calculation
```csharp
confidence = baseSimilarity
× successRateBonus
× recencyBonus
× evidenceCountBonus
```
Where:
- `baseSimilarity`: Highest similarity score from matching records
- `successRateBonus`: `1 + (successRate - 0.5) * 0.5` (rewards high success rate)
- `recencyBonus`: More recent decisions weighted higher
- `evidenceCountBonus`: More evidence = higher confidence
### Suggestion Ranking
1. Group past decisions by action taken
2. For each action, calculate:
- Average similarity of records with that action
- Success rate for that action
- Number of similar decisions
3. Compute confidence score
4. Rank by confidence descending
5. Return top N suggestions
### Rationale Generation
Rationales are generated programmatically:
```
"{confidence}% confidence based on {count} similar past decisions.
{action} succeeded in {successRate}% of {factors}."
```
## Storage Design
### PostgreSQL Schema
```sql
CREATE TABLE opsmemory.decisions (
memory_id TEXT PRIMARY KEY,
tenant_id TEXT NOT NULL,
recorded_at TIMESTAMPTZ NOT NULL,
-- Denormalized situation fields for indexing
cve_id TEXT,
component TEXT,
severity TEXT,
-- Full data as JSONB
situation JSONB NOT NULL,
decision JSONB NOT NULL,
outcome JSONB,
-- Similarity vector as array (not pgvector)
similarity_vector REAL[] NOT NULL
);
-- Indexes
CREATE INDEX idx_decisions_tenant ON opsmemory.decisions(tenant_id);
CREATE INDEX idx_decisions_recorded ON opsmemory.decisions(recorded_at DESC);
CREATE INDEX idx_decisions_cve ON opsmemory.decisions(cve_id) WHERE cve_id IS NOT NULL;
CREATE INDEX idx_decisions_component ON opsmemory.decisions(component) WHERE component IS NOT NULL;
```
### Why Not pgvector?
The current implementation uses PostgreSQL arrays instead of pgvector:
1. **Simpler deployment**: No extension installation required
2. **Smaller dataset**: OpsMemory is per-org, not global
3. **Adequate performance**: Array operations are fast enough for <100K records
4. **Future option**: Can migrate to pgvector if needed
### Cosine Similarity in SQL
```sql
-- Cosine similarity between query vector and stored vectors
SELECT memory_id,
(
SELECT SUM(a * b)
FROM UNNEST(similarity_vector, @query_vector) AS t(a, b)
) / (
SQRT((SELECT SUM(a * a) FROM UNNEST(similarity_vector) AS t(a))) *
SQRT((SELECT SUM(b * b) FROM UNNEST(@query_vector) AS t(b)))
) AS similarity
FROM opsmemory.decisions
WHERE tenant_id = @tenant_id
ORDER BY similarity DESC
LIMIT @top_k;
```
## API Design
### Endpoint Overview
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/opsmemory/decisions` | Record a new decision |
| GET | `/api/v1/opsmemory/decisions/{id}` | Get decision details |
| POST | `/api/v1/opsmemory/decisions/{id}/outcome` | Record outcome |
| GET | `/api/v1/opsmemory/suggestions` | Get playbook suggestions |
| GET | `/api/v1/opsmemory/decisions` | Query past decisions |
| GET | `/api/v1/opsmemory/stats` | Get statistics |
### Request/Response DTOs
The API uses string-based DTOs that convert to/from internal enums:
```csharp
// API accepts strings
public record RecordDecisionRequest
{
public required string Action { get; init; } // "Remediate", "Accept", etc.
public string? Reachability { get; init; } // "reachable", "not-reachable"
}
// Internal uses enums
public enum DecisionAction { Accept, Remediate, Quarantine, ... }
public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }
```
## Testing Strategy
### Unit Tests (26 tests)
**SimilarityVectorGeneratorTests:**
- Vector dimension validation
- Feature encoding (severity, reachability, EPSS, CVSS, KEV)
- Component type classification
- Context tag encoding
- Vector normalization
- Cosine similarity computation
- Matching factor detection
**PlaybookSuggestionServiceTests:**
- Empty history handling
- Single record suggestions
- Multiple record ranking
- Confidence calculation
- Rationale generation
- Evidence linking
### Integration Tests (5 tests)
**PostgresOpsMemoryStoreTests:**
- Decision persistence and retrieval
- Outcome updates
- Tenant isolation
- Query filtering
- Statistics calculation
## Performance Considerations
### Indexing Strategy
- Primary key on `memory_id` for direct lookups
- Index on `tenant_id` for isolation
- Index on `recorded_at` for recent-first queries
- Partial indexes on `cve_id` and `component` for filtered queries
### Query Optimization
- Limit similarity search to last N days by default
- Return only top-K similar records
- Use cursor-based pagination for large result sets
### Caching
Currently no caching (records are infrequently accessed). Future options:
- Cache similarity vectors in memory
- Cache recent suggestions per tenant
- Use read replicas for heavy read loads
## Future Enhancements
### pgvector Migration
If dataset grows significantly:
1. Install pgvector extension
2. Add vector column with IVFFlat index
3. Replace array-based similarity with vector operations
4. ~100x speedup for large datasets
### ML-Based Suggestions
Replace rule-based confidence with ML model:
1. Train on historical decision-outcome pairs
2. Include more features (time of day, team, etc.)
3. Use gradient boosting or neural network
4. Continuous learning from new outcomes
### Outcome Prediction
Predict outcome before decision is made:
1. Use past outcomes as training data
2. Predict success probability per action
3. Show predicted outcomes in UI
4. Track prediction accuracy over time