# Semantic Entrypoint Analysis

> Part of Sprint 0411 - Semantic Entrypoint Engine

## Overview

The Semantic Entrypoint Engine provides deep understanding of container entrypoints by inferring:

- **Application Intent** - What the application is designed to do (web server, CLI tool, worker, etc.)
- **Capabilities** - What system resources and external services the application uses
- **Attack Surface** - Potential security vulnerabilities based on detected patterns
- **Data Boundaries** - I/O edges where data enters or leaves the application

This semantic layer enables more accurate vulnerability prioritization, reachability analysis, and policy decisioning.

## Schema Definition

### SemanticEntrypoint Record

The core output of semantic analysis:

```csharp
public sealed record SemanticEntrypoint
{
    public required string Id { get; init; }
    public required EntrypointSpecification Specification { get; init; }
    public required ApplicationIntent Intent { get; init; }
    public required CapabilityClass Capabilities { get; init; }
    public required ImmutableArray AttackSurface { get; init; }
    public required ImmutableArray DataBoundaries { get; init; }
    public required SemanticConfidence Confidence { get; init; }
    public string? Language { get; init; }
    public string? Framework { get; init; }
    public string? FrameworkVersion { get; init; }
    public string? RuntimeVersion { get; init; }
    public ImmutableDictionary? Metadata { get; init; }
}
```

### Application Intent

Enumeration of recognized application types:

| Intent | Description | Example Frameworks |
|--------|-------------|-------------------|
| `WebServer` | HTTP/HTTPS listener | Django, Express, ASP.NET Core |
| `CliTool` | Command-line utility | Click, Cobra, System.CommandLine |
| `Worker` | Background job processor | Celery, Sidekiq, Hangfire |
| `BatchJob` | One-shot data processing | MapReduce, ETL scripts |
| `Serverless` | FaaS handler | Lambda, Azure Functions |
| `Daemon` | Long-running background service | systemd units |
| `StreamProcessor` | Real-time data pipeline | Kafka Streams, Flink |
| `RpcServer` | gRPC/Thrift server | grpc-go, grpc-dotnet |
| `GraphQlServer` | GraphQL API | Apollo, Hot Chocolate |
| `DatabaseServer` | Database engine | PostgreSQL, Redis |
| `MessageBroker` | Message queue server | RabbitMQ, NATS |
| `CacheServer` | Cache/session store | Redis, Memcached |
| `ProxyGateway` | Reverse proxy, API gateway | Envoy, NGINX |

### Capability Classes

Flags enum representing detected capabilities:

| Capability | Description | Detection Signals |
|------------|-------------|-------------------|
| `NetworkListen` | Opens listening socket | `http.ListenAndServe`, `app.listen()` |
| `NetworkConnect` | Makes outbound connections | `requests`, `http.Client` |
| `FileRead` | Reads from filesystem | `open()`, `File.ReadAllText()` |
| `FileWrite` | Writes to filesystem | File write operations |
| `ProcessSpawn` | Spawns child processes | `subprocess`, `exec.Command` |
| `DatabaseSql` | SQL database access | `psycopg2`, `SqlConnection` |
| `DatabaseNoSql` | NoSQL database access | `pymongo`, `redis` |
| `MessageQueue` | Message broker client | `pika`, `kafka-python` |
| `CacheAccess` | Cache client operations | `redis`, `memcached` |
| `ExternalHttpApi` | External HTTP API calls | REST clients |
| `Authentication` | Auth operations | `passport`, `JWT` libraries |
| `SecretAccess` | Accesses secrets/credentials | Vault clients, env secrets |

### Threat Vectors

Inferred security threats:

| Threat Type | CWE ID | OWASP Category | Contributing Capabilities |
|------------|--------|----------------|--------------------------|
| `SqlInjection` | 89 | A03:2021 | `DatabaseSql` + `UserInput` |
| `Xss` | 79 | A03:2021 | `NetworkListen` + `UserInput` |
| `Ssrf` | 918 | A10:2021 | `ExternalHttpApi` + `UserInput` |
| `Rce` | 94 | A03:2021 | `ProcessSpawn` + `UserInput` |
| `PathTraversal` | 22 | A01:2021 | `FileRead` + `UserInput` |
| `InsecureDeserialization` | 502 | A08:2021 | Deserialization patterns |
| `AuthenticationBypass` | 287 | A07:2021 | Auth patterns detected |
| `CommandInjection` | 78 | A03:2021 | `ProcessSpawn` patterns |

### Data Flow Boundaries

I/O edges for data flow analysis:

| Boundary Type | Direction | Security Relevance |
|---------------|-----------|-------------------|
| `HttpRequest` | Inbound | User input entry point |
| `HttpResponse` | Outbound | Data exposure point |
| `DatabaseQuery` | Outbound | SQL injection surface |
| `FileInput` | Inbound | Path traversal surface |
| `EnvironmentVar` | Inbound | Config injection surface |
| `MessageReceive` | Inbound | Deserialization surface |
| `ProcessSpawn` | Outbound | Command injection surface |

### Confidence Scoring

All inferences include confidence scores:

```csharp
public sealed record SemanticConfidence
{
    public double Score { get; init; }          // 0.0-1.0
    public ConfidenceTier Tier { get; init; }   // Unknown, Low, Medium, High, Definitive
    public ImmutableArray ReasoningChain { get; init; }
}
```

| Tier | Score Range | Description |
|------|-------------|-------------|
| `Definitive` | 0.95-1.0 | Framework explicitly declared |
| `High` | 0.8-0.95 | Strong pattern match |
| `Medium` | 0.5-0.8 | Multiple weak signals |
| `Low` | 0.2-0.5 | Heuristic inference |
| `Unknown` | 0.0-0.2 | No reliable signals |

## Language Adapters

Semantic analysis uses language-specific adapters:

### Python Adapter

- **Django**: Detects `manage.py`, `INSTALLED_APPS`, migrations
- **Flask/FastAPI**: Detects `Flask(__name__)`, `FastAPI()` patterns
- **Celery**: Detects `Celery()` app, `@task` decorators
- **Click/Typer**: Detects CLI decorators
- **Lambda**: Detects `lambda_handler` pattern

### Java Adapter

- **Spring Boot**: Detects `@SpringBootApplication`, starter dependencies
- **Quarkus**: Detects `io.quarkus` packages
- **Kafka Streams**: Detects `kafka-streams` dependency
- **Main-Class**: Falls back to manifest analysis

### Node Adapter

- **Express**: Detects `express()` + `listen()`
- **NestJS**: Detects `@nestjs/core` dependency
- **Fastify**: Detects `fastify()` patterns
- **CLI bin**: Detects `bin` field in package.json

### .NET Adapter

- **ASP.NET Core**: Detects `Microsoft.AspNetCore` references
- **Worker Service**: Detects `BackgroundService` inheritance
- **Console**: Detects `OutputType=Exe` without web deps

### Go Adapter

- **net/http**: Detects `http.ListenAndServe` patterns
- **Cobra**: Detects `github.com/spf13/cobra` import
- **gRPC**: Detects `google.golang.org/grpc` import

## Integration Points

### Entry Trace Pipeline

Semantic analysis integrates after entry trace resolution:

```
Container Image
    ↓
EntryTraceAnalyzer.ResolveAsync()
    ↓
EntryTraceGraph (nodes, edges, terminals)
    ↓
SemanticEntrypointOrchestrator.AnalyzeAsync()
    ↓
SemanticEntrypoint (intent, capabilities, threats)
```

### SBOM Output

Semantic data appears in CycloneDX properties:

```json
{
  "properties": [
    { "name": "stellaops:semantic.intent", "value": "WebServer" },
    { "name": "stellaops:semantic.capabilities", "value": "NetworkListen,DatabaseSql" },
    { "name": "stellaops:semantic.threats", "value": "[{\"type\":\"SqlInjection\",\"confidence\":0.7}]" },
    { "name": "stellaops:semantic.risk.score", "value": "0.7" },
    { "name": "stellaops:semantic.framework", "value": "django" }
  ]
}
```

### RichGraph Output

Semantic attributes on entrypoint nodes:

```json
{
  "kind": "entrypoint",
  "attributes": {
    "semantic_intent": "WebServer",
    "semantic_capabilities": "NetworkListen,DatabaseSql,UserInput",
    "semantic_threats": "SqlInjection,Xss",
    "semantic_risk_score": "0.7",
    "semantic_confidence": "0.85",
    "semantic_confidence_tier": "High"
  }
}
```

## Usage Examples

### CLI Usage

```bash
# Scan with semantic analysis
stella scan myimage:latest --semantic

# Output includes semantic fields
stella scan myimage:latest --format json | jq '.semantic'
```

### Programmatic Usage

```csharp
// Create orchestrator
var orchestrator = new SemanticEntrypointOrchestrator();

// Create context from entry trace result
var context = orchestrator.CreateContext(entryTraceResult, fileSystem, containerMetadata);

// Run analysis
var result = await orchestrator.AnalyzeAsync(context);

if (result.Success && result.Entrypoint is not null)
{
    Console.WriteLine($"Intent: {result.Entrypoint.Intent}");
    Console.WriteLine($"Capabilities: {result.Entrypoint.Capabilities}");
    Console.WriteLine($"Risk Score: {result.Entrypoint.AttackSurface.Max(t => t.Confidence)}");
}
```

## Extending the Engine

### Adding a New Language Adapter

1. Implement `ISemanticEntrypointAnalyzer`:

```csharp
public sealed class RubySemanticAdapter : ISemanticEntrypointAnalyzer
{
    public IReadOnlyList<string> SupportedLanguages => new[] { "ruby" };
    public int Priority => 100;

    public ValueTask AnalyzeAsync(
        SemanticAnalysisContext context,
        CancellationToken cancellationToken)
    {
        // Detect Rails, Sinatra, Sidekiq, etc.
    }
}
```

2. Register in `SemanticEntrypointOrchestrator.CreateDefaultAdapters()`.

### Adding a New Capability

1. Add to `CapabilityClass` flags enum
2. Update `CapabilityDetector` with detection patterns
3. Update `ThreatVectorInferrer` if capability contributes to threats
4. Update `DataBoundaryMapper` if capability implies I/O boundaries

## Related Documentation

- [Entry Trace Problem Statement](./entrypoint-problem.md)
- [Static Analysis Approach](./entrypoint-static-analysis.md)
- [Language-Specific Guides](./entrypoint-lang-python.md)
- [Reachability Evidence](../../reachability/function-level-evidence.md)
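
As an illustration of the tier table under Confidence Scoring, the mapping from a score to a `ConfidenceTier` could be sketched as follows. This is a minimal sketch, not the engine's implementation: the `ConfidenceTiers` helper and its `FromScore` method are hypothetical names, and because the documented ranges share their endpoints (0.95 appears in both `Definitive` and `High`), the sketch assumes each range is inclusive at its lower bound.

```csharp
using System;

// Tier names and order follow the comment on SemanticConfidence.Tier.
public enum ConfidenceTier { Unknown, Low, Medium, High, Definitive }

// Hypothetical helper for illustration; not part of the documented API.
public static class ConfidenceTiers
{
    // Assumption: ranges are inclusive at the lower bound, so a score of
    // exactly 0.95 maps to Definitive and exactly 0.8 maps to High.
    public static ConfidenceTier FromScore(double score)
    {
        if (double.IsNaN(score) || score < 0.0 || score > 1.0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(score), "Score must be within 0.0-1.0.");
        }

        // Checks run from the highest threshold down, so the first match wins.
        return score switch
        {
            >= 0.95 => ConfidenceTier.Definitive,
            >= 0.8 => ConfidenceTier.High,
            >= 0.5 => ConfidenceTier.Medium,
            >= 0.2 => ConfidenceTier.Low,
            _ => ConfidenceTier.Unknown,
        };
    }
}
```

With this assumption, the RichGraph example's `semantic_confidence` of 0.85 maps to `High`, which agrees with its `semantic_confidence_tier` value.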
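
The capability combinations in the Threat Vectors table can be read as rules of the form "all of these flags are set". The sketch below shows one way to evaluate such rules over a flags enum. It is illustrative only: `ThreatRuleSketch` is a hypothetical class rather than the engine's `ThreatVectorInferrer`, the enum is a reduced copy of `CapabilityClass` with made-up numeric values, and the rows described only as "patterns" (`InsecureDeserialization`, `AuthenticationBypass`, `CommandInjection`) are left out because the table does not state them as flag combinations.

```csharp
using System;
using System.Collections.Generic;

// Reduced, hypothetical copy of the CapabilityClass flags enum.
// The numeric values are illustrative, not the engine's.
[Flags]
public enum CapabilityClass
{
    None = 0,
    NetworkListen = 1 << 0,
    FileRead = 1 << 1,
    ProcessSpawn = 1 << 2,
    DatabaseSql = 1 << 3,
    ExternalHttpApi = 1 << 4,
    UserInput = 1 << 5,
}

// Hypothetical rule evaluator for illustration only.
public static class ThreatRuleSketch
{
    // Each rule pairs a threat name with the capabilities that must all be present,
    // taken from the "Contributing Capabilities" column.
    private static readonly (string Threat, CapabilityClass Required)[] Rules =
    {
        ("SqlInjection", CapabilityClass.DatabaseSql | CapabilityClass.UserInput),
        ("Xss", CapabilityClass.NetworkListen | CapabilityClass.UserInput),
        ("Ssrf", CapabilityClass.ExternalHttpApi | CapabilityClass.UserInput),
        ("Rce", CapabilityClass.ProcessSpawn | CapabilityClass.UserInput),
        ("PathTraversal", CapabilityClass.FileRead | CapabilityClass.UserInput),
    };

    // Returns the matching threat names in rule order.
    public static IReadOnlyList<string> Infer(CapabilityClass detected)
    {
        var threats = new List<string>();
        foreach (var (threat, required) in Rules)
        {
            // A rule fires only when every required flag is set.
            if ((detected & required) == required)
            {
                threats.Add(threat);
            }
        }
        return threats;
    }
}
```

For the capability set in the RichGraph example (`NetworkListen`, `DatabaseSql`, `UserInput`), this sketch yields `SqlInjection` and `Xss`, the same threats listed in that example's `semantic_threats` attribute.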
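
For consumers of the SBOM output, the `stellaops:semantic.*` properties can be collected with `System.Text.Json`. The following is a sketch under stated assumptions: `SemanticPropertyReader` is a hypothetical helper, and the input is assumed to be a JSON object shaped like the SBOM Output example, with a top-level `properties` array of `name`/`value` string pairs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for illustration; not part of the documented API.
public static class SemanticPropertyReader
{
    private const string Prefix = "stellaops:semantic.";

    // Returns semantic properties keyed by the name after the prefix,
    // e.g. "intent" -> "WebServer". Unrelated properties are skipped.
    public static IReadOnlyDictionary<string, string> Read(string json)
    {
        var result = new Dictionary<string, string>(StringComparer.Ordinal);
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("properties", out var props) ||
            props.ValueKind != JsonValueKind.Array)
        {
            return result;
        }

        foreach (var prop in props.EnumerateArray())
        {
            if (prop.ValueKind != JsonValueKind.Object ||
                !prop.TryGetProperty("name", out var name) ||
                !prop.TryGetProperty("value", out var value) ||
                name.ValueKind != JsonValueKind.String ||
                value.ValueKind != JsonValueKind.String)
            {
                continue; // Ignore malformed entries instead of failing the read.
            }

            var key = name.GetString()!;
            if (key.StartsWith(Prefix, StringComparison.Ordinal))
            {
                result[key.Substring(Prefix.Length)] = value.GetString()!;
            }
        }

        return result;
    }
}
```

Values stay as strings here because the example stores every value as a string, including the numeric risk score and the JSON-encoded threat list; a caller would parse those further as needed.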