# Semantic Entrypoint Schema

> Part of Sprint 0411 - Semantic Entrypoint Engine (Task 23)

This document defines the schema for semantic entrypoint analysis, which enriches container scan results with application-level intent, capabilities, and threat modeling.

---

## Overview

The Semantic Entrypoint Engine analyzes container entrypoints to infer:

1. **Application Intent** - What kind of application is running (web server, worker, CLI, etc.)
2. **Capabilities** - What system resources the application accesses (network, filesystem, database, etc.)
3. **Attack Surface** - Potential security threat vectors based on capabilities
4. **Data Boundaries** - Data flow boundaries with sensitivity classification

This semantic layer enables more precise vulnerability prioritization by understanding which code paths are actually reachable from the entrypoint.

---

## Schema Definitions

### SemanticEntrypoint

The root type representing semantic analysis of an entrypoint.

```typescript
interface SemanticEntrypoint {
  id: string;                          // Unique identifier for this analysis
  specification: EntrypointSpecification;
  intent: ApplicationIntent;
  capabilities: CapabilityClass;       // Bitmask of detected capabilities
  attackSurface: ThreatVector[];
  dataBoundaries: DataFlowBoundary[];
  confidence: SemanticConfidence;
  language?: string;                   // Primary language (python, java, node, dotnet, go)
  framework?: string;                  // Detected framework (django, spring-boot, express, etc.)
  frameworkVersion?: string;
  runtimeVersion?: string;
  analyzedAt: string;                  // ISO-8601 timestamp
}
```

### ApplicationIntent

Enumeration of application types.

| Value | Description | Common Indicators |
|-------|-------------|-------------------|
| `Unknown` | Intent could not be determined | Fallback |
| `WebServer` | HTTP/HTTPS server | Flask, Django, Express, ASP.NET Core, Gin |
| `Worker` | Background job processor | Celery, Sidekiq, BackgroundService |
| `CliTool` | Command-line interface | Click, argparse, Cobra, Picocli |
| `Serverless` | FaaS function | Lambda handler, Cloud Functions |
| `StreamProcessor` | Event stream handler | Kafka Streams, Flink |
| `RpcServer` | RPC/gRPC server | gRPC, Thrift |
| `Daemon` | Long-running service | Custom main loops |
| `TestRunner` | Test execution | pytest, JUnit, xunit |
| `BatchJob` | Scheduled/periodic task | Cron-style entry |
| `Proxy` | Network proxy/gateway | Envoy, nginx config |

### CapabilityClass (Bitmask)

Flags indicating detected capabilities. Multiple flags can be combined.

| Flag | Value | Description |
|------|-------|-------------|
| `None` | 0x0 | No capabilities detected |
| `NetworkListen` | 0x1 | Binds to network ports |
| `NetworkOutbound` | 0x2 | Makes outbound network requests |
| `FileRead` | 0x4 | Reads from filesystem |
| `FileWrite` | 0x8 | Writes to filesystem |
| `ProcessSpawn` | 0x10 | Spawns child processes |
| `DatabaseSql` | 0x20 | SQL database access |
| `DatabaseNoSql` | 0x40 | NoSQL database access |
| `MessageQueue` | 0x80 | Message queue producer/consumer |
| `CacheAccess` | 0x100 | Cache system access (Redis, Memcached) |
| `CryptoSign` | 0x200 | Cryptographic signing operations |
| `CryptoEncrypt` | 0x400 | Encryption/decryption operations |
| `UserInput` | 0x800 | Processes user input |
| `SecretAccess` | 0x1000 | Reads secrets/credentials |
| `CloudSdk` | 0x2000 | Cloud provider SDK usage |
| `ContainerApi` | 0x4000 | Container/orchestration API access |
| `SystemCall` | 0x8000 | Direct syscall/FFI usage |

### ThreatVector

Represents a potential attack vector.

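Because a threat vector records its `contributingCapabilities` as a `CapabilityClass` bitmask, it may help to see how the flags tabulated above combine and decompose. The following is a minimal sketch, not the engine's implementation: the flag names and values are copied from the table, while `hasAll` and `capabilityNames` are hypothetical helpers introduced only for illustration.

```typescript
// Flag values copied from the CapabilityClass table. A plain const object is
// used here instead of an enum purely to keep the sketch portable.
const CapabilityClass = {
  None: 0x0,
  NetworkListen: 0x1,
  NetworkOutbound: 0x2,
  FileRead: 0x4,
  FileWrite: 0x8,
  ProcessSpawn: 0x10,
  DatabaseSql: 0x20,
  DatabaseNoSql: 0x40,
  MessageQueue: 0x80,
  CacheAccess: 0x100,
  CryptoSign: 0x200,
  CryptoEncrypt: 0x400,
  UserInput: 0x800,
  SecretAccess: 0x1000,
  CloudSdk: 0x2000,
  ContainerApi: 0x4000,
  SystemCall: 0x8000,
} as const;

type CapabilityName = keyof typeof CapabilityClass;

// Hypothetical helper: true when every flag in `required` is set in `mask`.
function hasAll(mask: number, required: number): boolean {
  return (mask & required) === required;
}

// Hypothetical helper: names of the set flags, ordered by flag value.
function capabilityNames(mask: number): CapabilityName[] {
  return (Object.entries(CapabilityClass) as [CapabilityName, number][])
    .filter(([, value]) => value !== 0 && (mask & value) === value)
    .sort(([, a], [, b]) => a - b)
    .map(([name]) => name);
}

// Flags are combined with bitwise OR.
const detected =
  CapabilityClass.NetworkListen |
  CapabilityClass.DatabaseSql |
  CapabilityClass.UserInput;

console.log(capabilityNames(detected).join(","));
// prints "NetworkListen,DatabaseSql,UserInput"
console.log(hasAll(detected, CapabilityClass.DatabaseSql | CapabilityClass.UserInput));
// prints true
```

A combined check such as `DatabaseSql | UserInput` is the kind of test a threat rule could apply to a mask, and the comma-separated name list matches the form used when capabilities are exported as a string.
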
```typescript
interface ThreatVector {
  type: ThreatVectorType;
  confidence: number;                  // 0.0 to 1.0
  contributingCapabilities: CapabilityClass;
  evidence: string[];
  cweId?: number;                      // CWE identifier
  owaspCategory?: string;              // OWASP category
}
```

### ThreatVectorType

| Type | CWE | OWASP | Triggered By |
|------|-----|-------|--------------|
| `SqlInjection` | 89 | A03:Injection | DatabaseSql + UserInput |
| `CommandInjection` | 78 | A03:Injection | ProcessSpawn + UserInput |
| `PathTraversal` | 22 | A01:Broken Access Control | FileRead/FileWrite + UserInput |
| `Ssrf` | 918 | A10:SSRF | NetworkOutbound + UserInput |
| `Xss` | 79 | A03:Injection | NetworkListen + UserInput |
| `InsecureDeserialization` | 502 | A08:Software and Data Integrity | UserInput + dynamic types |
| `SensitiveDataExposure` | 200 | A02:Cryptographic Failures | SecretAccess + NetworkListen |
| `BrokenAuthentication` | 287 | A07:Identification and Auth | NetworkListen + SecretAccess |
| `InsufficientLogging` | 778 | A09:Logging Failures | NetworkListen without logging |
| `CryptoWeakness` | 327 | A02:Cryptographic Failures | CryptoSign/CryptoEncrypt |

### DataFlowBoundary

Represents a data flow boundary crossing.

```typescript
interface DataFlowBoundary {
  type: DataFlowBoundaryType;
  direction: DataFlowDirection;        // Inbound | Outbound | Bidirectional
  sensitivity: DataSensitivity;        // Public | Internal | Confidential | Restricted
  confidence: number;
  port?: number;                       // For network boundaries
  protocol?: string;                   // http, grpc, amqp, etc.
  evidence: string[];
}
```

### DataFlowBoundaryType

| Type | Security Sensitive | Description |
|------|-------------------|-------------|
| `HttpRequest` | Yes | HTTP/HTTPS endpoint |
| `GrpcCall` | Yes | gRPC service |
| `WebSocket` | Yes | WebSocket connection |
| `DatabaseQuery` | Yes | Database queries |
| `MessageBroker` | No | Message queue pub/sub |
| `FileSystem` | No | File I/O boundary |
| `Cache` | No | Cache read/write |
| `ExternalApi` | Yes | Third-party API calls |
| `CloudService` | Yes | Cloud provider services |

### SemanticConfidence

Confidence scoring for semantic analysis.

```typescript
interface SemanticConfidence {
  score: number;                       // 0.0 to 1.0
  tier: ConfidenceTier;
  reasons: string[];
}

enum ConfidenceTier {
  Unknown = 0,
  Low = 1,
  Medium = 2,
  High = 3,
  Definitive = 4
}
```

| Tier | Score Range | Description |
|------|-------------|-------------|
| `Unknown` | 0.0 | No analysis possible |
| `Low` | 0.0-0.4 | Heuristic guess only |
| `Medium` | 0.4-0.7 | Partial evidence |
| `High` | 0.7-0.9 | Strong indicators |
| `Definitive` | 0.9-1.0 | Explicit declaration found |

---

## SBOM Property Extensions

When semantic data is included in CycloneDX or SPDX SBOMs, the following property namespace is used:

```
stellaops:semantic.*
```

### Property Names

| Property | Type | Description |
|----------|------|-------------|
| `stellaops:semantic.intent` | string | ApplicationIntent value |
| `stellaops:semantic.capabilities` | string | Comma-separated capability names |
| `stellaops:semantic.capability.count` | int | Number of detected capabilities |
| `stellaops:semantic.threats` | JSON | Array of threat vector summaries |
| `stellaops:semantic.threat.count` | int | Number of identified threats |
| `stellaops:semantic.risk.score` | float | Overall risk score (0.0-1.0) |
| `stellaops:semantic.confidence` | float | Confidence score (0.0-1.0) |
| `stellaops:semantic.confidence.tier` | string | Confidence tier name |
| `stellaops:semantic.language` | string | Primary language |
| `stellaops:semantic.framework` | string | Detected framework |
| `stellaops:semantic.framework.version` | string | Framework version |
| `stellaops:semantic.boundary.count` | int | Number of data boundaries |
| `stellaops:semantic.boundary.sensitive.count` | int | Security-sensitive boundaries |
| `stellaops:semantic.owasp.categories` | string | Comma-separated OWASP categories |
| `stellaops:semantic.cwe.ids` | string | Comma-separated CWE IDs |

---

## RichGraph Integration

Semantic data is attached to `richgraph-v1` nodes via the Attributes dictionary:

| Attribute Key | Description |
|---------------|-------------|
| `semantic_intent` | ApplicationIntent value |
| `semantic_capabilities` | Comma-separated capability flags |
| `semantic_threats` | Comma-separated threat types |
| `semantic_risk_score` | Risk score (formatted to 3 decimal places) |
| `semantic_confidence` | Confidence score |
| `semantic_confidence_tier` | Confidence tier name |
| `semantic_framework` | Framework name |
| `semantic_framework_version` | Framework version |
| `is_entrypoint` | "true" if node is an entrypoint |
| `semantic_boundaries` | JSON array of boundary types |
| `owasp_category` | OWASP category if applicable |
| `cwe_id` | CWE identifier if applicable |

---

## Language Adapter Support

The following language-specific adapters are available:

| Language | Adapter | Supported Frameworks |
|----------|---------|---------------------|
| Python | `PythonSemanticAdapter` | Django, Flask, FastAPI, Celery, Click |
| Java | `JavaSemanticAdapter` | Spring Boot, Quarkus, Micronaut, Kafka Streams |
| Node.js | `NodeSemanticAdapter` | Express, NestJS, Fastify, Koa |
| .NET | `DotNetSemanticAdapter` | ASP.NET Core, Worker Service, Console |
| Go | `GoSemanticAdapter` | net/http, Gin, Echo, Cobra, gRPC |

---

## Configuration

Semantic analysis is configured via the `Scanner:EntryTrace:Semantic` configuration section:

```yaml
Scanner:
  EntryTrace:
    Semantic:
      Enabled: true
      ThreatConfidenceThreshold: 0.3
      MaxThreatVectors: 50
      IncludeLowConfidenceCapabilities: false
      EnabledLanguages: []  # Empty = all languages
```

| Option | Default | Description |
|--------|---------|-------------|
| `Enabled` | true | Enable semantic analysis |
| `ThreatConfidenceThreshold` | 0.3 | Minimum confidence for threat vectors |
| `MaxThreatVectors` | 50 | Maximum threats per entrypoint |
| `IncludeLowConfidenceCapabilities` | false | Include low-confidence capabilities |
| `EnabledLanguages` | [] | Languages to analyze (empty = all) |

---

## Determinism Guarantees

All semantic analysis outputs are deterministic:

1. **Capability ordering** - Flags are ordered by value (bitmask position)
2. **Threat vector ordering** - Ordered by ThreatVectorType enum value
3. **Data boundary ordering** - Ordered by (Type, Direction) tuple
4. **Evidence ordering** - Alphabetically sorted within each element
5. **JSON serialization** - Uses camelCase naming, consistent formatting

This enables reliable diffing of semantic analysis results across scan runs.

---

## CLI Usage

Semantic analysis can be enabled via the CLI `--semantic` flag:

```bash
stella scan --semantic docker.io/library/python:3.12
```

Output includes semantic summary when enabled:

```
Semantic Analysis:
  Intent: WebServer
  Framework: flask (v3.0.0)
  Capabilities: NetworkListen, DatabaseSql, FileRead
  Threat Vectors: 2 (SqlInjection, Ssrf)
  Risk Score: 0.72
  Confidence: High (0.85)
```

---

## References

- [OWASP Top 10 2021](https://owasp.org/Top10/)
- [CWE/SANS Top 25](https://cwe.mitre.org/top25/)
- [CycloneDX Property Extensions](https://cyclonedx.org/docs/1.5/json/#properties)
- [SPDX 3.0 External Identifiers](https://spdx.github.io/spdx-spec/v3.0/annexes/external-identifier-types/)
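
As a closing illustration of the determinism guarantees listed earlier, the sketch below orders data boundaries by a (Type, Direction) tuple and sorts evidence within each element. It rests on assumptions: the document does not state the numeric enum values, so list position in `BOUNDARY_TYPES` and `DIRECTIONS` stands in for them, "alphabetically" is read as a locale-independent ordinal comparison, and `canonicalize` and `compareOrdinal` are hypothetical names.

```typescript
// Assumed ordinal order: position in the DataFlowBoundaryType table and in the
// DataFlowDirection comment. The real enum values are not given in this document.
const BOUNDARY_TYPES = [
  "HttpRequest", "GrpcCall", "WebSocket", "DatabaseQuery", "MessageBroker",
  "FileSystem", "Cache", "ExternalApi", "CloudService",
] as const;
const DIRECTIONS = ["Inbound", "Outbound", "Bidirectional"] as const;

interface Boundary {
  type: (typeof BOUNDARY_TYPES)[number];
  direction: (typeof DIRECTIONS)[number];
  evidence: string[];
}

// Code-unit comparison, so the result does not depend on the host locale.
function compareOrdinal(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Returns a new array: evidence sorted within each boundary, and boundaries
// ordered by type first, then by direction. The input is not mutated.
function canonicalize(boundaries: Boundary[]): Boundary[] {
  return boundaries
    .map((b) => ({ ...b, evidence: [...b.evidence].sort(compareOrdinal) }))
    .sort(
      (a, b) =>
        BOUNDARY_TYPES.indexOf(a.type) - BOUNDARY_TYPES.indexOf(b.type) ||
        DIRECTIONS.indexOf(a.direction) - DIRECTIONS.indexOf(b.direction),
    );
}

// Evidence strings here are made up for the example.
const sorted = canonicalize([
  { type: "DatabaseQuery", direction: "Outbound", evidence: ["psycopg2 import", "DATABASE_URL env"] },
  { type: "HttpRequest", direction: "Outbound", evidence: ["requests import"] },
  { type: "HttpRequest", direction: "Inbound", evidence: ["flask route"] },
]);

console.log(sorted.map((b) => `${b.type}/${b.direction}`).join(" "));
// prints "HttpRequest/Inbound HttpRequest/Outbound DatabaseQuery/Outbound"
```

Running the same canonicalization on every scan is what would make two result sets comparable line by line, regardless of the order in which boundaries were discovered.
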