# OpsMemory Architecture

> **Technical deep-dive into the Decision Ledger**

## Overview

OpsMemory provides a structured approach to organizational learning from security decisions. It captures the complete lifecycle of a decision, from the situation context through the action taken to the eventual outcome.

## Design Principles

### 1. Determinism First

All operations produce deterministic, reproducible results:

- Similarity vectors are computed from stable inputs
- Confidence scores use fixed formulas
- No randomness in suggestion ranking

### 2. Multi-Tenant Isolation

Every operation is scoped to a tenant:

- Records cannot be accessed across tenants
- Similarity search is tenant-isolated
- Statistics are per-tenant

### 3. Fire-and-Forget Integration

Decision recording is asynchronous and non-blocking:

- UI decisions complete immediately
- OpsMemory recording happens in the background
- Recording failures don't affect the primary flow

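The following is a minimal sketch of the fire-and-forget pattern. It assumes the recording call is passed in as a delegate. `RecordInBackground`, `recordAsync` and `logFailure` are hypothetical names, not the OpsMemory API. The caller returns as soon as the work is queued, and any exception is handed to a logger instead of propagating into the UI flow.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: queues the OpsMemory write on the thread pool and returns immediately.
// Exceptions go to logFailure, so they never reach the caller's primary flow.
void RecordInBackground(Func<Task> recordAsync, Action<Exception> logFailure)
{
    _ = Task.Run(async () =>
    {
        try
        {
            await recordAsync();
        }
        catch (Exception ex)
        {
            logFailure(ex);
        }
    });
}

// Usage: the decision completes even though the background write fails.
var failure = new TaskCompletionSource<Exception>();
RecordInBackground(
    () => throw new InvalidOperationException("store unavailable"),
    err => failure.TrySetResult(err));
Console.WriteLine("Decision saved; OpsMemory recording queued.");
```

In an ASP.NET Core service, a common alternative to `Task.Run` is to queue the work to a `Channel<T>` drained by a hosted `BackgroundService`. That way, the background work does not depend on request-scoped services that are disposed when the request ends.
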
### 4. Offline Capable

All features work without network access:

- Local PostgreSQL storage
- No external API dependencies
- Self-contained similarity computation

## Component Architecture

```
┌────────────────────────────────────────────────────────────────────┐
│ WebService Layer                                                   │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │                       OpsMemoryEndpoints                       │ │
│ │  POST /decisions  GET /decisions  GET /suggestions  GET /stats │ │
│ └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
┌─────────────────────────────────┼──────────────────────────────────┐
│ Service Layer                                                      │
│ ┌───────────────────┐ ┌───────────────────┐ ┌────────────────────┐ │
│ │  PlaybookSuggest  │ │  OutcomeTracking  │ │  SimilarityVector  │ │
│ │      Service      │ │      Service      │ │     Generator      │ │
│ └─────────┬─────────┘ └─────────┬─────────┘ └──────────┬─────────┘ │
│           │                     │                      │           │
│           ▼                     ▼                      ▼           │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │                        IOpsMemoryStore                         │ │
│ └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
┌─────────────────────────────────┼──────────────────────────────────┐
│ Storage Layer                                                      │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │  PostgresOpsMemoryStore                                        │ │
│ │    - Decision CRUD                                             │ │
│ │    - Outcome updates                                           │ │
│ │    - Similarity search (array-based cosine)                    │ │
│ │    - Query with pagination                                     │ │
│ │    - Statistics aggregation                                    │ │
│ └────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
```

## Data Model

### OpsMemoryRecord

The core aggregate containing all decision information:

```csharp
using System;
using System.Collections.Immutable;

public sealed record OpsMemoryRecord
{
    public required string MemoryId { get; init; }
    public required string TenantId { get; init; }
    public required DateTimeOffset RecordedAt { get; init; }
    public required SituationContext Situation { get; init; }
    public required DecisionRecord Decision { get; init; }
    public OutcomeRecord? Outcome { get; init; }            // null until an outcome is recorded
    public ImmutableArray<float> SimilarityVector { get; init; }
}
```

### SituationContext

Captures the security context at decision time:

```csharp
using System.Collections.Immutable;

public sealed record SituationContext
{
    public string? CveId { get; init; }
    public string? Component { get; init; }                 // Package URL (PURL)
    public string? Severity { get; init; }                  // low/medium/high/critical
    public ReachabilityStatus Reachability { get; init; }
    public double? EpssScore { get; init; }                 // 0-1
    public double? CvssScore { get; init; }                 // 0-10
    public bool IsKev { get; init; }                        // listed in CISA's Known Exploited Vulnerabilities catalog
    public ImmutableArray<string> ContextTags { get; init; }
}
```

### DecisionRecord

The action taken and why:

```csharp
public sealed record DecisionRecord
{
    public required DecisionAction Action { get; init; }
    public required string Rationale { get; init; }
    public required string DecidedBy { get; init; }
    public required DateTimeOffset DecidedAt { get; init; }
    public string? PolicyReference { get; init; }
    public MitigationDetails? Mitigation { get; init; }
}
```

### OutcomeRecord

The result of the decision:

```csharp
public sealed record OutcomeRecord
{
    public required OutcomeStatus Status { get; init; }
    public TimeSpan? ResolutionTime { get; init; }
    public string? ActualImpact { get; init; }
    public string? LessonsLearned { get; init; }
    public required string RecordedBy { get; init; }
    public required DateTimeOffset RecordedAt { get; init; }
}
```

## Similarity Algorithm

### Vector Generation

The `SimilarityVectorGenerator` creates 50-dimensional feature vectors:

```
Vector Layout:
[0-9]   : CVE category one-hot (memory, injection, auth, crypto, dos,
          info-disclosure, privilege-escalation, xss, path-traversal, other)
[10-14] : Severity one-hot (none, low, medium, high, critical)
[15-18] : Reachability one-hot (unknown, reachable, not-reachable, potential)
[19-23] : EPSS band one-hot (0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0)
[24-28] : CVSS band one-hot (0-2, 2-4, 4-6, 6-8, 8-10)
[29]    : KEV flag (0 or 1)
[30-39] : Component type one-hot (npm, maven, pypi, nuget, go, cargo,
          deb, rpm, apk, other)
[40-49] : Context tag presence (production, development, staging,
          external-facing, internal, payment, auth, data, api, frontend)
```

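The layout above can be sketched as follows. This is an illustration under stated assumptions, not the actual `SimilarityVectorGenerator`:

- It takes raw values instead of a `SituationContext`.
- A value that falls exactly on a band edge goes into the higher band.
- The package type is read from the PURL segment after `pkg:`, and the PURL type `golang` is mapped to the `go` slot.
- The CVE category block [0-9] is left unset.

`Encode`, `Band` and `ComponentSlot` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Slot names in layout order; the index within each array is the offset inside its block.
string[] severities = { "none", "low", "medium", "high", "critical" };
string[] reachabilities = { "unknown", "reachable", "not-reachable", "potential" };
string[] componentTypes = { "npm", "maven", "pypi", "nuget", "go", "cargo", "deb", "rpm", "apk" }; // 9 = other
string[] contextTags = { "production", "development", "staging", "external-facing", "internal",
                         "payment", "auth", "data", "api", "frontend" };

// Maps a value in [0, max] to one of `bands` equal-width bands (edge values go to the higher band).
int Band(double value, double max, int bands) => Math.Clamp((int)(value / max * bands), 0, bands - 1);

// Assumption: the package type is the PURL segment between "pkg:" and the first '/'.
int ComponentSlot(string? purl)
{
    if (purl is null || !purl.StartsWith("pkg:", StringComparison.OrdinalIgnoreCase)) return 9;
    string rest = purl.Substring(4);
    int slash = rest.IndexOf('/');
    string type = (slash < 0 ? rest : rest.Substring(0, slash)).ToLowerInvariant();
    if (type == "golang") type = "go"; // the PURL spec names the Go package type "golang"
    int index = Array.IndexOf(componentTypes, type);
    return index < 0 ? 9 : index;
}

float[] Encode(string? severity, string reachability, double? epss, double? cvss,
               bool isKev, string? purl, IEnumerable<string> tags)
{
    var v = new float[50];                       // [0-9] CVE category is not set in this sketch
    int sev = Array.IndexOf(severities, severity?.ToLowerInvariant() ?? "none");
    v[10 + Math.Max(sev, 0)] = 1f;               // unrecognized severity falls back to "none"
    int reach = Array.IndexOf(reachabilities, reachability);
    v[15 + Math.Max(reach, 0)] = 1f;             // unrecognized value falls back to "unknown"
    if (epss is double e) v[19 + Band(e, 1.0, 5)] = 1f;
    if (cvss is double c) v[24 + Band(c, 10.0, 5)] = 1f;
    v[29] = isKev ? 1f : 0f;
    v[30 + ComponentSlot(purl)] = 1f;
    foreach (var tag in tags)
    {
        int t = Array.IndexOf(contextTags, tag.ToLowerInvariant());
        if (t >= 0) v[40 + t] = 1f;              // unknown tags are ignored
    }
    return v;
}
```

Cosine similarity is unchanged when either vector is scaled. The unit tests list vector normalization, so a generator may also L2-normalize this output without changing any similarity score.
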
### Cosine Similarity

Similarity between vectors A and B:

```
similarity = (A · B) / (||A|| × ||B||)
```

Where `A · B` is the dot product and `||A||` is the L2 norm.

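A direct C# translation of the formula is shown below. `CosineSimilarity` is a hypothetical name. Returning 0 when either vector has zero norm is an assumption, because the formula is undefined in that case.

```csharp
using System;

// Cosine similarity: (A · B) / (||A|| × ||B||).
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }

    // Assumption: an all-zero vector is treated as dissimilar to everything.
    return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
```

For unnormalized 0/1 vectors like the layout above, the dot product counts shared features, and each norm is the square root of the number of set features. For example, two situations that each set 6 features and share 4 of them score 4 / (√6 × √6) ≈ 0.67.
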
### CVE Classification

CVEs are classified by analyzing keywords in the CVE ID and description:

| Category | Keywords |
|----------|----------|
| memory | buffer, overflow, heap, stack, use-after-free |
| injection | sql, command, code injection, ldap |
| auth | authentication, authorization, bypass |
| crypto | cryptographic, encryption, key |
| dos | denial of service, resource exhaustion |
| info-disclosure | information disclosure, leak |
| privilege-escalation | privilege escalation, elevation |
| xss | cross-site scripting, xss |
| path-traversal | path traversal, directory traversal |

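Here is a minimal sketch of keyword classification over the table above. It assumes case-insensitive substring matching and checks categories in table order, so the first match wins. Input that matches nothing falls back to `other` (slot 9 in the vector layout). The real classifier's precedence rules are not documented here, and `ClassifyCve` is a hypothetical name.

```csharp
using System;
using System.Linq;

// Keyword table from the documentation; array index = slot in the CVE category block [0-9].
(string Category, string[] Keywords)[] cveCategories =
{
    ("memory", new[] { "buffer", "overflow", "heap", "stack", "use-after-free" }),
    ("injection", new[] { "sql", "command", "code injection", "ldap" }),
    ("auth", new[] { "authentication", "authorization", "bypass" }),
    ("crypto", new[] { "cryptographic", "encryption", "key" }),
    ("dos", new[] { "denial of service", "resource exhaustion" }),
    ("info-disclosure", new[] { "information disclosure", "leak" }),
    ("privilege-escalation", new[] { "privilege escalation", "elevation" }),
    ("xss", new[] { "cross-site scripting", "xss" }),
    ("path-traversal", new[] { "path traversal", "directory traversal" }),
};

// Returns the category slot (0-8), or 9 ("other") when no keyword matches.
int ClassifyCve(string? cveId, string? description)
{
    string text = $"{cveId} {description}".ToLowerInvariant();
    for (int slot = 0; slot < cveCategories.Length; slot++)
    {
        if (cveCategories[slot].Keywords.Any(keyword => text.Contains(keyword)))
            return slot;
    }
    return 9;
}
```

Plain substring matching over-matches. For example, `key` also hits words such as `keyboard`, and `stack` also hits `stack trace`. Matching on word boundaries (for example, with `Regex` and `\b`) is one way to tighten this.
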
## Playbook Suggestion Algorithm

### Confidence Calculation

```csharp
var confidence = baseSimilarity
    * successRateBonus
    * recencyBonus
    * evidenceCountBonus;
```

Where:

- `baseSimilarity`: highest similarity score among the matching records
- `successRateBonus`: `1 + (successRate - 0.5) * 0.5`, which ranges from 0.75 (0% success) to 1.25 (100% success) and so rewards high success rates
- `recencyBonus`: weights more recent decisions higher
- `evidenceCountBonus`: increases confidence as the number of supporting decisions grows

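The following is a sketch of the confidence formula. Only `successRateBonus` comes from the text above. These parts are illustrative assumptions, not documented values:

- the recency half-life of 180 days;
- the evidence bonus of +5% per extra decision, capped at +25%;
- clamping the product to [0, 1].

All function names are hypothetical.

```csharp
using System;

// Documented: rewards success rates above 50% and penalizes those below (range 0.75 to 1.25).
double SuccessRateBonus(double successRate) => 1 + (successRate - 0.5) * 0.5;

// Assumption: exponential decay that halves the weight every halfLifeDays.
double RecencyBonus(TimeSpan age, double halfLifeDays = 180) =>
    Math.Pow(0.5, Math.Max(0, age.TotalDays) / halfLifeDays);

// Assumption: +5% per supporting decision beyond the first, capped at +25%.
double EvidenceCountBonus(int count) => 1 + Math.Min(Math.Max(count - 1, 0) * 0.05, 0.25);

// Assumption: the product is clamped to [0, 1] so it can be shown as a percentage.
double Confidence(double baseSimilarity, double successRate, TimeSpan age, int count) =>
    Math.Clamp(
        baseSimilarity * SuccessRateBonus(successRate) * RecencyBonus(age) * EvidenceCountBonus(count),
        0.0, 1.0);
```

Under these assumptions, a fresh match with 0.8 similarity, one supporting decision, and a 100% success rate already reaches the 1.0 cap (0.8 × 1.25 = 1.0). This suggests the real multipliers may need to be gentler.
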
### Suggestion Ranking

1. Group past decisions by action taken
2. For each action, calculate:
   - Average similarity of records with that action
   - Success rate for that action
   - Number of similar decisions
3. Compute confidence score
4. Rank by confidence descending
5. Return top N suggestions

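The ranking steps above can be sketched with LINQ. All of the following are hypothetical:

- Each similar past decision arrives as a tuple with a nullable success flag (null when no outcome is recorded).
- Actions with no recorded outcomes get a neutral 50% success rate.
- The score uses only the average similarity from step 2 and the documented `successRateBonus`. Recency and evidence bonuses are omitted.

The confidence formula describes `baseSimilarity` as the highest similarity; swap `Average` for `Max` if that is the intended input. Ties are broken by evidence count and then by action name, so identical inputs always produce the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: one tuple per similar past decision.
// Succeeded is null when no outcome has been recorded yet.
List<(string Action, double Confidence, int Count)> RankSuggestions(
    IEnumerable<(string Action, double Similarity, bool? Succeeded)> similar, int topN) =>
    similar
        .GroupBy(r => r.Action)                                   // 1. group by action
        .Select(g =>
        {
            // 2. per-action statistics
            var withOutcome = g.Where(r => r.Succeeded.HasValue).ToList();
            double successRate = withOutcome.Count == 0
                ? 0.5                                             // assumption: neutral when no outcomes
                : withOutcome.Count(r => r.Succeeded == true) / (double)withOutcome.Count;
            double avgSimilarity = g.Average(r => r.Similarity);

            // 3. confidence (recency and evidence bonuses omitted in this sketch)
            double confidence = avgSimilarity * (1 + (successRate - 0.5) * 0.5);
            return (Action: g.Key, Confidence: confidence, Count: g.Count());
        })
        .OrderByDescending(s => s.Confidence)                     // 4. rank by confidence
        .ThenByDescending(s => s.Count)                           //    then by evidence count
        .ThenBy(s => s.Action, StringComparer.Ordinal)            //    then by name, for determinism
        .Take(topN)                                               // 5. top N
        .ToList();
```
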
### Rationale Generation

Rationales are generated programmatically:

```
"{confidence}% confidence based on {count} similar past decisions.
{action} succeeded in {successRate}% of {factors}."
```

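Here is a minimal sketch of filling the template. The template wording is copied from above. Rounding to whole percentages (half away from zero) and joining factors with commas are assumptions, and `ToPercent` and `BuildRationale` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Converts a fraction in [0, 1] to a whole-number percentage.
int ToPercent(double fraction) => (int)Math.Round(fraction * 100, MidpointRounding.AwayFromZero);

// Fills the rationale template; {factors} is rendered as a comma-separated list.
string BuildRationale(double confidence, int count, string action,
                      double successRate, IReadOnlyList<string> factors) =>
    $"{ToPercent(confidence)}% confidence based on {count} similar past decisions. " +
    $"{action} succeeded in {ToPercent(successRate)}% of {string.Join(", ", factors)}.";
```

For example, `BuildRationale(0.85, 12, "Remediate", 0.9, new[] { "critical", "reachable", "production cases" })` produces "85% confidence based on 12 similar past decisions. Remediate succeeded in 90% of critical, reachable, production cases."
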
## Storage Design

### PostgreSQL Schema

```sql
CREATE TABLE opsmemory.decisions (
    memory_id         TEXT PRIMARY KEY,
    tenant_id         TEXT NOT NULL,
    recorded_at       TIMESTAMPTZ NOT NULL,

    -- Denormalized situation fields for indexing
    cve_id            TEXT,
    component         TEXT,
    severity          TEXT,

    -- Full data as JSONB
    situation         JSONB NOT NULL,
    decision          JSONB NOT NULL,
    outcome           JSONB,

    -- Similarity vector as array (not pgvector)
    similarity_vector REAL[] NOT NULL
);

-- Indexes
CREATE INDEX idx_decisions_tenant ON opsmemory.decisions(tenant_id);
CREATE INDEX idx_decisions_recorded ON opsmemory.decisions(recorded_at DESC);
CREATE INDEX idx_decisions_cve ON opsmemory.decisions(cve_id) WHERE cve_id IS NOT NULL;
CREATE INDEX idx_decisions_component ON opsmemory.decisions(component) WHERE component IS NOT NULL;
```

### Why Not pgvector?

The current implementation uses PostgreSQL arrays instead of pgvector:

1. **Simpler deployment**: no extension installation required
2. **Smaller dataset**: OpsMemory is per-org, not global
3. **Adequate performance**: array operations are fast enough for fewer than 100K records, even though each similarity query computes a score for every row in the tenant
4. **Future option**: can migrate to pgvector if needed

### Cosine Similarity in SQL

```sql
-- Cosine similarity between the query vector and each stored vector in the tenant.
-- NULLIF turns a zero norm into NULL instead of raising a division-by-zero error.
SELECT memory_id,
       (
           SELECT SUM(a * b)
           FROM UNNEST(similarity_vector, @query_vector) AS t(a, b)
       ) / NULLIF(
           SQRT((SELECT SUM(a * a) FROM UNNEST(similarity_vector) AS t(a))) *
           SQRT((SELECT SUM(b * b) FROM UNNEST(@query_vector) AS t(b))),
           0
       ) AS similarity
FROM opsmemory.decisions
WHERE tenant_id = @tenant_id
ORDER BY similarity DESC NULLS LAST  -- DESC would otherwise sort NULLs first
LIMIT @top_k;
```

## API Design

### Endpoint Overview

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/opsmemory/decisions` | Record a new decision |
| GET | `/api/v1/opsmemory/decisions/{id}` | Get decision details |
| POST | `/api/v1/opsmemory/decisions/{id}/outcome` | Record outcome |
| GET | `/api/v1/opsmemory/suggestions` | Get playbook suggestions |
| GET | `/api/v1/opsmemory/decisions` | Query past decisions |
| GET | `/api/v1/opsmemory/stats` | Get statistics |

### Request/Response DTOs

The API uses string-based DTOs that convert to/from internal enums:

```csharp
// API accepts strings
public record RecordDecisionRequest
{
    public required string Action { get; init; }        // "Remediate", "Accept", etc.
    public string? Reachability { get; init; }          // "reachable", "not-reachable"
}

// Internal uses enums
public enum DecisionAction { Accept, Remediate, Quarantine, /* ... */ }
public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }
```

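The string-to-enum conversion can be sketched with two generic helpers. `ParseApiEnum` and `ToApiString` are hypothetical names. The sketch assumes two rules:

- Incoming values are matched case-insensitively, with hyphens and underscores ignored, so both `not-reachable` and `Remediate` parse.
- Outgoing values are written in lower kebab case.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Parses an API string such as "not-reachable" or "Remediate" into an enum value.
// Returns null for unknown values instead of throwing.
TEnum? ParseApiEnum<TEnum>(string? value) where TEnum : struct, Enum
{
    string normalized = (value ?? "").Replace("-", "").Replace("_", "");
    // Reject numeric or comma-separated input, which Enum.TryParse would otherwise accept.
    if (normalized.Length == 0 || !normalized.All(char.IsLetter)) return null;
    return Enum.TryParse(normalized, ignoreCase: true, out TEnum result) ? result : (TEnum?)null;
}

// Formats an enum value as lower kebab case, e.g. NotReachable -> "not-reachable".
string ToApiString<TEnum>(TEnum value) where TEnum : struct, Enum =>
    Regex.Replace(value.ToString(), "(?<!^)([A-Z])", "-$1").ToLowerInvariant();
```

With the enums above, `ParseApiEnum<ReachabilityStatus>("not-reachable")` would yield `ReachabilityStatus.NotReachable`, and `ToApiString(ReachabilityStatus.NotReachable)` would yield `"not-reachable"`.
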
## Testing Strategy

### Unit Tests (26 tests)

**SimilarityVectorGeneratorTests:**

- Vector dimension validation
- Feature encoding (severity, reachability, EPSS, CVSS, KEV)
- Component type classification
- Context tag encoding
- Vector normalization
- Cosine similarity computation
- Matching factor detection

**PlaybookSuggestionServiceTests:**

- Empty history handling
- Single record suggestions
- Multiple record ranking
- Confidence calculation
- Rationale generation
- Evidence linking

### Integration Tests (5 tests)

**PostgresOpsMemoryStoreTests:**

- Decision persistence and retrieval
- Outcome updates
- Tenant isolation
- Query filtering
- Statistics calculation

## Performance Considerations

### Indexing Strategy

- Primary key on `memory_id` for direct lookups
- Index on `tenant_id` for isolation
- Index on `recorded_at` for recent-first queries
- Partial indexes on `cve_id` and `component` for filtered queries

### Query Optimization

- Limit similarity search to the last N days by default
- Return only the top-K similar records
- Use cursor-based pagination for large result sets

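Cursor-based (keyset) pagination can be sketched in memory as below. The sort key `(RecordedAt, MemoryId)` follows the `recorded_at DESC` ordering, with `memory_id` as an assumed tie-breaker. The cursor is the sort key of the last row on the previous page, so each page resumes strictly after it. `NextPage` is a hypothetical name. In PostgreSQL, the same predicate can be written as the row comparison `(recorded_at, memory_id) < (@cursor_at, @cursor_id)`, assuming the column collation orders memory IDs the same way as the ordinal comparison used here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the next page, newest first, strictly after the given cursor (null = first page).
List<(DateTimeOffset RecordedAt, string MemoryId)> NextPage(
    IEnumerable<(DateTimeOffset RecordedAt, string MemoryId)> rows,
    (DateTimeOffset RecordedAt, string MemoryId)? cursor,
    int pageSize) =>
    rows
        .Where(r => cursor is not { } c
            || r.RecordedAt < c.RecordedAt
            || (r.RecordedAt == c.RecordedAt && string.CompareOrdinal(r.MemoryId, c.MemoryId) < 0))
        .OrderByDescending(r => r.RecordedAt)
        .ThenByDescending(r => r.MemoryId, StringComparer.Ordinal)
        .Take(pageSize)
        .ToList();
```

Unlike `OFFSET`, the cursor is not thrown off by decisions recorded while a client is paging. Each page is defined by a key rather than a position, so newer rows cannot shift later pages.
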
### Caching

There is currently no caching, because records are accessed infrequently. Future options:

- Cache similarity vectors in memory
- Cache recent suggestions per tenant
- Use read replicas for heavy read loads

## Future Enhancements

### pgvector Migration

If the dataset grows significantly:

1. Install the pgvector extension
2. Add a `vector` column with an IVFFlat index (approximate nearest-neighbor search)
3. Replace array-based similarity with pgvector operators
4. Expect an estimated ~100x speedup for large datasets

### ML-Based Suggestions

Replace rule-based confidence with an ML model:

1. Train on historical decision-outcome pairs
2. Include more features (time of day, team, etc.)
3. Use gradient boosting or a neural network
4. Continuously learn from new outcomes

### Outcome Prediction

Predict the outcome before a decision is made:

1. Use past outcomes as training data
2. Predict success probability per action
3. Show predicted outcomes in the UI
4. Track prediction accuracy over time