2025-12-13 18:08:55 +02:00
parent 6e45066e37
commit f1a39c4ce3
234 changed files with 24038 additions and 6910 deletions
--- a/docs/modules/scanner/operations/entrypoint-semantic.md
+++ b/docs/modules/scanner/operations/entrypoint-semantic.md
@@ -0,0 +1,280 @@
+# Semantic Entrypoint Analysis
+
+> Part of Sprint 0411 - Semantic Entrypoint Engine
+
+## Overview
+
+The Semantic Entrypoint Engine analyzes container entrypoints and infers:
+- **Application Intent** - What the application is designed to do (web server, CLI tool, worker, etc.)
+- **Capabilities** - What system resources and external services the application uses
+- **Attack Surface** - Potential security vulnerabilities based on detected patterns
+- **Data Boundaries** - I/O edges where data enters or leaves the application
+
+This semantic layer enables more accurate vulnerability prioritization, reachability analysis, and policy decisions.
+
+## Schema Definition
+
+### SemanticEntrypoint Record
+
+The core output of semantic analysis:
+
+```csharp
+public sealed record SemanticEntrypoint
+{
+    public required string Id { get; init; }
+    public required EntrypointSpecification Specification { get; init; }
+    public required ApplicationIntent Intent { get; init; }
+    public required CapabilityClass Capabilities { get; init; }
+    public required ImmutableArray<ThreatVector> AttackSurface { get; init; }
+    public required ImmutableArray<DataFlowBoundary> DataBoundaries { get; init; }
+    public required SemanticConfidence Confidence { get; init; }
+    public string? Language { get; init; }
+    public string? Framework { get; init; }
+    public string? FrameworkVersion { get; init; }
+    public string? RuntimeVersion { get; init; }
+    public ImmutableDictionary<string, string>? Metadata { get; init; }
+}
+```
+
+### Application Intent
+
+Enumeration of recognized application types:
+
+| Intent | Description | Example Frameworks |
+|--------|-------------|-------------------|
+| `WebServer` | HTTP/HTTPS listener | Django, Express, ASP.NET Core |
+| `CliTool` | Command-line utility | Click, Cobra, System.CommandLine |
+| `Worker` | Background job processor | Celery, Sidekiq, Hangfire |
+| `BatchJob` | One-shot data processing | MapReduce, ETL scripts |
+| `Serverless` | FaaS handler | Lambda, Azure Functions |
+| `Daemon` | Long-running background service | systemd units |
+| `StreamProcessor` | Real-time data pipeline | Kafka Streams, Flink |
+| `RpcServer` | gRPC/Thrift server | grpc-go, grpc-dotnet |
+| `GraphQlServer` | GraphQL API | Apollo, Hot Chocolate |
+| `DatabaseServer` | Database engine | PostgreSQL, Redis |
+| `MessageBroker` | Message queue server | RabbitMQ, NATS |
+| `CacheServer` | Cache/session store | Redis, Memcached |
+| `ProxyGateway` | Reverse proxy, API gateway | Envoy, NGINX |
+
+### Capability Classes
+
+Flags enum representing detected capabilities:
+
+| Capability | Description | Detection Signals |
+|------------|-------------|-------------------|
+| `NetworkListen` | Opens listening socket | `http.ListenAndServe`, `app.listen()` |
+| `NetworkConnect` | Makes outbound connections | `requests`, `http.Client` |
+| `FileRead` | Reads from filesystem | `open()`, `File.ReadAllText()` |
+| `FileWrite` | Writes to filesystem | File write operations |
+| `ProcessSpawn` | Spawns child processes | `subprocess`, `exec.Command` |
+| `DatabaseSql` | SQL database access | `psycopg2`, `SqlConnection` |
+| `DatabaseNoSql` | NoSQL database access | `pymongo`, `redis` |
+| `MessageQueue` | Message broker client | `pika`, `kafka-python` |
+| `CacheAccess` | Cache client operations | `redis`, `memcached` |
+| `ExternalHttpApi` | External HTTP API calls | REST clients |
+| `Authentication` | Auth operations | `passport`, `JWT` libraries |
+| `SecretAccess` | Accesses secrets/credentials | Vault clients, env secrets |
+
+### Threat Vectors
+
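+Security threats inferred from combinations of detected capabilities. `UserInput` is used as a capability flag here and in the RichGraph output, although it is not listed in the Capability Classes table: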
+
+| Threat Type | CWE ID | OWASP Category | Contributing Capabilities |
+|------------|--------|----------------|--------------------------|
+| `SqlInjection` | 89 | A03:2021 | `DatabaseSql` + `UserInput` |
+| `Xss` | 79 | A03:2021 | `NetworkListen` + `UserInput` |
+| `Ssrf` | 918 | A10:2021 | `ExternalHttpApi` + `UserInput` |
+| `Rce` | 94 | A03:2021 | `ProcessSpawn` + `UserInput` |
+| `PathTraversal` | 22 | A01:2021 | `FileRead` + `UserInput` |
+| `InsecureDeserialization` | 502 | A08:2021 | Deserialization patterns |
+| `AuthenticationBypass` | 287 | A07:2021 | Auth patterns detected |
+| `CommandInjection` | 78 | A03:2021 | `ProcessSpawn` patterns |
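+
+The capability combinations in the table can be read as rules over a flags value. The following is a minimal sketch of that idea, not the engine's `ThreatVectorInferrer`: the enum's numeric values, the `ThreatRules` class, and the `Infer` method are hypothetical names introduced for illustration, and only the five `UserInput` rows are modelled.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical subset of the flags enum; names follow the tables above, values are assumptions.
+[Flags]
+public enum CapabilityClass
+{
+    None = 0,
+    NetworkListen = 1 << 0,
+    FileRead = 1 << 1,
+    ProcessSpawn = 1 << 2,
+    DatabaseSql = 1 << 3,
+    ExternalHttpApi = 1 << 4,
+    UserInput = 1 << 5,
+}
+
+public static class ThreatRules
+{
+    // Each rule lists the capabilities that must all be present and the threat they imply.
+    private static readonly (CapabilityClass Required, string Threat, int Cwe)[] Rules =
+    {
+        (CapabilityClass.DatabaseSql | CapabilityClass.UserInput, "SqlInjection", 89),
+        (CapabilityClass.NetworkListen | CapabilityClass.UserInput, "Xss", 79),
+        (CapabilityClass.ExternalHttpApi | CapabilityClass.UserInput, "Ssrf", 918),
+        (CapabilityClass.ProcessSpawn | CapabilityClass.UserInput, "Rce", 94),
+        (CapabilityClass.FileRead | CapabilityClass.UserInput, "PathTraversal", 22),
+    };
+
+    public static IEnumerable<string> Infer(CapabilityClass detected)
+    {
+        foreach (var rule in Rules)
+        {
+            // (detected & required) == required holds only when every required flag is set.
+            if ((detected & rule.Required) == rule.Required)
+                yield return $"{rule.Threat} (CWE-{rule.Cwe})";
+        }
+    }
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        var detected = CapabilityClass.NetworkListen | CapabilityClass.DatabaseSql | CapabilityClass.UserInput;
+        foreach (var threat in ThreatRules.Infer(detected))
+            Console.WriteLine(threat);
+        // prints "SqlInjection (CWE-89)" then "Xss (CWE-79)"
+    }
+}
+```
+
+Keeping the rules in a data table rather than in nested conditionals means a new threat row only needs one new tuple.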
+
+### Data Flow Boundaries
+
+I/O edges for data flow analysis:
+
+| Boundary Type | Direction | Security Relevance |
+|---------------|-----------|-------------------|
+| `HttpRequest` | Inbound | User input entry point |
+| `HttpResponse` | Outbound | Data exposure point |
+| `DatabaseQuery` | Outbound | SQL injection surface |
+| `FileInput` | Inbound | Path traversal surface |
+| `EnvironmentVar` | Inbound | Config injection surface |
+| `MessageReceive` | Inbound | Deserialization surface |
+| `ProcessSpawn` | Outbound | Command injection surface |
+
+### Confidence Scoring
+
+All inferences include confidence scores:
+
+```csharp
+public sealed record SemanticConfidence
+{
+    public double Score { get; init; }           // 0.0-1.0
+    public ConfidenceTier Tier { get; init; }    // Unknown, Low, Medium, High, Definitive
+    public ImmutableArray<string> ReasoningChain { get; init; }
+}
+```
+
+| Tier | Score Range | Description |
+|------|-------------|-------------|
+| `Definitive` | 0.95-1.0 | Framework explicitly declared |
+| `High` | 0.8-0.95 | Strong pattern match |
+| `Medium` | 0.5-0.8 | Multiple weak signals |
+| `Low` | 0.2-0.5 | Heuristic inference |
+| `Unknown` | 0.0-0.2 | No reliable signals |
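+
+The ranges in the table share their endpoints (0.95 appears in both `Definitive` and `High`), so a mapping has to pick which side is inclusive. The sketch below assumes each tier's lower bound is inclusive; that choice, the `ConfidenceTiers` class, and the `FromScore` method are assumptions for illustration rather than the engine's actual implementation.
+
+```csharp
+using System;
+
+public enum ConfidenceTier { Unknown, Low, Medium, High, Definitive }
+
+public static class ConfidenceTiers
+{
+    // Assumption: lower bounds are inclusive, so 0.95 maps to Definitive and 0.8 to High.
+    public static ConfidenceTier FromScore(double score)
+    {
+        if (double.IsNaN(score) || score < 0.0 || score > 1.0)
+            throw new ArgumentOutOfRangeException(nameof(score), "Score must be in [0.0, 1.0].");
+
+        // Arms are checked top to bottom, so the first matching threshold wins.
+        return score switch
+        {
+            >= 0.95 => ConfidenceTier.Definitive,
+            >= 0.8 => ConfidenceTier.High,
+            >= 0.5 => ConfidenceTier.Medium,
+            >= 0.2 => ConfidenceTier.Low,
+            _ => ConfidenceTier.Unknown,
+        };
+    }
+}
+```
+
+Under this assumption, `ConfidenceTiers.FromScore(0.85)` yields `High`, which matches the `semantic_confidence` / `semantic_confidence_tier` pair in the RichGraph example.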
+
+## Language Adapters
+
+Semantic analysis uses language-specific adapters:
+
+### Python Adapter
+- **Django**: Detects `manage.py`, `INSTALLED_APPS`, migrations
+- **Flask/FastAPI**: Detects `Flask(__name__)`, `FastAPI()` patterns
+- **Celery**: Detects `Celery()` app, `@task` decorators
+- **Click/Typer**: Detects CLI decorators
+- **Lambda**: Detects `lambda_handler` pattern
+
+### Java Adapter
+- **Spring Boot**: Detects `@SpringBootApplication`, starter dependencies
+- **Quarkus**: Detects `io.quarkus` packages
+- **Kafka Streams**: Detects `kafka-streams` dependency
+- **Main-Class**: Falls back to manifest analysis
+
+### Node Adapter
+- **Express**: Detects `express()` + `listen()`
+- **NestJS**: Detects `@nestjs/core` dependency
+- **Fastify**: Detects `fastify()` patterns
+- **CLI bin**: Detects `bin` field in package.json
+
+### .NET Adapter
+- **ASP.NET Core**: Detects `Microsoft.AspNetCore` references
+- **Worker Service**: Detects `BackgroundService` inheritance
+- **Console**: Detects `OutputType=Exe` without web deps
+
+### Go Adapter
+- **net/http**: Detects `http.ListenAndServe` patterns
+- **Cobra**: Detects `github.com/spf13/cobra` import
+- **gRPC**: Detects `google.golang.org/grpc` import
+
+## Integration Points
+
+### Entry Trace Pipeline
+
+Semantic analysis integrates after entry trace resolution:
+
+```
+Container Image
+     ↓
+EntryTraceAnalyzer.ResolveAsync()
+     ↓
+EntryTraceGraph (nodes, edges, terminals)
+     ↓
+SemanticEntrypointOrchestrator.AnalyzeAsync()
+     ↓
+SemanticEntrypoint (intent, capabilities, threats)
+```
+
+### SBOM Output
+
+Semantic data appears in CycloneDX properties:
+
+```json
+{
+  "properties": [
+    { "name": "stellaops:semantic.intent", "value": "WebServer" },
+    { "name": "stellaops:semantic.capabilities", "value": "NetworkListen,DatabaseSql" },
+    { "name": "stellaops:semantic.threats", "value": "[{\"type\":\"SqlInjection\",\"confidence\":0.7}]" },
+    { "name": "stellaops:semantic.risk.score", "value": "0.7" },
+    { "name": "stellaops:semantic.framework", "value": "django" }
+  ]
+}
+```
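+
+Note that the `stellaops:semantic.threats` value is a JSON array serialized into a string, so a consumer has to parse it a second time. A minimal sketch using `System.Text.Json`, assuming the property layout shown above; the `SemanticProperties` class and `ReadThreats` method are hypothetical helpers, not part of the scanner:
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Text.Json;
+
+public static class SemanticProperties
+{
+    // Returns (type, confidence) pairs from the "stellaops:semantic.threats" property, if present.
+    public static List<(string Type, double Confidence)> ReadThreats(string componentJson)
+    {
+        var threats = new List<(string Type, double Confidence)>();
+        using var doc = JsonDocument.Parse(componentJson);
+
+        foreach (var prop in doc.RootElement.GetProperty("properties").EnumerateArray())
+        {
+            if (prop.GetProperty("name").GetString() != "stellaops:semantic.threats")
+                continue;
+
+            // The value is a JSON array encoded as a string, so it needs a second parse.
+            using var inner = JsonDocument.Parse(prop.GetProperty("value").GetString() ?? "[]");
+            foreach (var threat in inner.RootElement.EnumerateArray())
+            {
+                threats.Add((
+                    threat.GetProperty("type").GetString() ?? "",
+                    threat.GetProperty("confidence").GetDouble()));
+            }
+        }
+
+        return threats;
+    }
+}
+```
+
+Passing the JSON fragment above to `ReadThreats` would return a single entry for `SqlInjection` with confidence 0.7.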
+
+### RichGraph Output
+
+Semantic attributes on entrypoint nodes:
+
+```json
+{
+  "kind": "entrypoint",
+  "attributes": {
+    "semantic_intent": "WebServer",
+    "semantic_capabilities": "NetworkListen,DatabaseSql,UserInput",
+    "semantic_threats": "SqlInjection,Xss",
+    "semantic_risk_score": "0.7",
+    "semantic_confidence": "0.85",
+    "semantic_confidence_tier": "High"
+  }
+}
+```
+
+## Usage Examples
+
+### CLI Usage
+
+```bash
+# Scan with semantic analysis
+stella scan myimage:latest --semantic
+
+# Emit JSON and inspect the semantic fields
+stella scan myimage:latest --semantic --format json | jq '.semantic'
+```
+
+### Programmatic Usage
+
+```csharp
+// Create orchestrator
+var orchestrator = new SemanticEntrypointOrchestrator();
+
+// Create context from entry trace result
+var context = orchestrator.CreateContext(entryTraceResult, fileSystem, containerMetadata);
+
+// Run analysis
+var result = await orchestrator.AnalyzeAsync(context);
+
+if (result.Success && result.Entrypoint is not null)
+{
+    Console.WriteLine($"Intent: {result.Entrypoint.Intent}");
+    Console.WriteLine($"Capabilities: {result.Entrypoint.Capabilities}");
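+    // Max() throws on an empty sequence, so fall back to the default value when no threats were inferred.
+    var riskScore = result.Entrypoint.AttackSurface.Select(t => t.Confidence).DefaultIfEmpty().Max();
+    Console.WriteLine($"Risk Score: {riskScore}");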
+}
+```
+
+## Extending the Engine
+
+### Adding a New Language Adapter
+
+1. Implement `ISemanticEntrypointAnalyzer`:
+
+```csharp
+public sealed class RubySemanticAdapter : ISemanticEntrypointAnalyzer
+{
+    public IReadOnlyList<string> SupportedLanguages => new[] { "ruby" };
+    public int Priority => 100;
+
+    public ValueTask<SemanticEntrypoint> AnalyzeAsync(
+        SemanticAnalysisContext context,
+        CancellationToken cancellationToken)
+    {
+        // Detect Rails, Sinatra, Sidekiq, etc.
+        throw new NotImplementedException();
+    }
+}
+```
+
+2. Register in `SemanticEntrypointOrchestrator.CreateDefaultAdapters()`.
+
+### Adding a New Capability
+
+1. Add to `CapabilityClass` flags enum
+2. Update `CapabilityDetector` with detection patterns
+3. Update `ThreatVectorInferrer` if capability contributes to threats
+4. Update `DataBoundaryMapper` if capability implies I/O boundaries
+
+## Related Documentation
+
+- [Entry Trace Problem Statement](./entrypoint-problem.md)
+- [Static Analysis Approach](./entrypoint-static-analysis.md)
+- [Language-Specific Guides](./entrypoint-lang-python.md)
+- [Reachability Evidence](../../reachability/function-level-evidence.md)