Semantic Entrypoint Schema
Part of Sprint 0411 - Semantic Entrypoint Engine (Task 23)
This document defines the schema for semantic entrypoint analysis, which enriches container scan results with application-level intent, capabilities, and threat modeling.
Overview
The Semantic Entrypoint Engine analyzes container entrypoints to infer:
- Application Intent - What kind of application is running (web server, worker, CLI, etc.)
- Capabilities - What system resources the application accesses (network, filesystem, database, etc.)
- Attack Surface - Potential security threat vectors based on capabilities
- Data Boundaries - Data flow boundaries with sensitivity classification
This semantic layer supports more precise vulnerability prioritization by identifying which code paths are actually reachable from the entrypoint.
Schema Definitions
SemanticEntrypoint
The root type representing semantic analysis of an entrypoint.
interface SemanticEntrypoint {
  id: string; // Unique identifier for this analysis
  specification: EntrypointSpecification;
  intent: ApplicationIntent;
  capabilities: CapabilityClass; // Bitmask of detected capabilities
  attackSurface: ThreatVector[];
  dataBoundaries: DataFlowBoundary[];
  confidence: SemanticConfidence;
  language?: string; // Primary language (python, java, node, dotnet, go)
  framework?: string; // Detected framework (django, spring-boot, express, etc.)
  frameworkVersion?: string;
  runtimeVersion?: string;
  analyzedAt: string; // ISO-8601 timestamp
}
ApplicationIntent
Enumeration of application types.
| Value | Description | Common Indicators |
|---|---|---|
| Unknown | Intent could not be determined | Fallback |
| WebServer | HTTP/HTTPS server | Flask, Django, Express, ASP.NET Core, Gin |
| Worker | Background job processor | Celery, Sidekiq, BackgroundService |
| CliTool | Command-line interface | Click, argparse, Cobra, Picocli |
| Serverless | FaaS function | Lambda handler, Cloud Functions |
| StreamProcessor | Event stream handler | Kafka Streams, Flink |
| RpcServer | RPC/gRPC server | gRPC, Thrift |
| Daemon | Long-running service | Custom main loops |
| TestRunner | Test execution | pytest, JUnit, xunit |
| BatchJob | Scheduled/periodic task | Cron-style entry |
| Proxy | Network proxy/gateway | Envoy, nginx config |
CapabilityClass (Bitmask)
Flags indicating detected capabilities. Multiple flags can be combined.
| Flag | Value | Description |
|---|---|---|
| None | 0x0 | No capabilities detected |
| NetworkListen | 0x1 | Binds to network ports |
| NetworkOutbound | 0x2 | Makes outbound network requests |
| FileRead | 0x4 | Reads from filesystem |
| FileWrite | 0x8 | Writes to filesystem |
| ProcessSpawn | 0x10 | Spawns child processes |
| DatabaseSql | 0x20 | SQL database access |
| DatabaseNoSql | 0x40 | NoSQL database access |
| MessageQueue | 0x80 | Message queue producer/consumer |
| CacheAccess | 0x100 | Cache system access (Redis, Memcached) |
| CryptoSign | 0x200 | Cryptographic signing operations |
| CryptoEncrypt | 0x400 | Encryption/decryption operations |
| UserInput | 0x800 | Processes user input |
| SecretAccess | 0x1000 | Reads secrets/credentials |
| CloudSdk | 0x2000 | Cloud provider SDK usage |
| ContainerApi | 0x4000 | Container/orchestration API access |
| SystemCall | 0x8000 | Direct syscall/FFI usage |
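
Because each flag is a distinct power of two, a capability set fits in a single integer and membership checks reduce to a bitwise AND. The TypeScript sketch below mirrors the values in the table; the helper names `hasAllCapabilities` and `capabilityNames` are illustrative assumptions, not part of the engine's API.

```typescript
// Flag values copied from the CapabilityClass table above.
enum CapabilityClass {
  None = 0x0,
  NetworkListen = 0x1,
  NetworkOutbound = 0x2,
  FileRead = 0x4,
  FileWrite = 0x8,
  ProcessSpawn = 0x10,
  DatabaseSql = 0x20,
  DatabaseNoSql = 0x40,
  MessageQueue = 0x80,
  CacheAccess = 0x100,
  CryptoSign = 0x200,
  CryptoEncrypt = 0x400,
  UserInput = 0x800,
  SecretAccess = 0x1000,
  CloudSdk = 0x2000,
  ContainerApi = 0x4000,
  SystemCall = 0x8000,
}

// Hypothetical helper: true when every flag in `required` is set in `mask`.
function hasAllCapabilities(mask: number, required: number): boolean {
  return (mask & required) === required;
}

// Hypothetical helper: names of the set flags, lowest bit first.
function capabilityNames(mask: number): string[] {
  const names: string[] = [];
  for (let bit = 1; bit <= CapabilityClass.SystemCall; bit <<= 1) {
    if ((mask & bit) !== 0) {
      names.push(CapabilityClass[bit]);
    }
  }
  return names;
}

// Combine flags with bitwise OR.
const webAppMask =
  CapabilityClass.NetworkListen | CapabilityClass.DatabaseSql | CapabilityClass.UserInput;
```

Walking the bits from lowest to highest yields names in value order, which matches the capability ordering described under Determinism Guarantees.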
ThreatVector
Represents a potential attack vector.
interface ThreatVector {
  type: ThreatVectorType;
  confidence: number; // 0.0 to 1.0
  contributingCapabilities: CapabilityClass;
  evidence: string[];
  cweId?: number; // CWE identifier
  owaspCategory?: string; // OWASP category
}
ThreatVectorType
| Type | CWE | OWASP | Triggered By |
|---|---|---|---|
| SqlInjection | 89 | A03:Injection | DatabaseSql + UserInput |
| CommandInjection | 78 | A03:Injection | ProcessSpawn + UserInput |
| PathTraversal | 22 | A01:Broken Access Control | FileRead/FileWrite + UserInput |
| Ssrf | 918 | A10:SSRF | NetworkOutbound + UserInput |
| Xss | 79 | A03:Injection | NetworkListen + UserInput |
| InsecureDeserialization | 502 | A08:Software and Data Integrity | UserInput + dynamic types |
| SensitiveDataExposure | 200 | A02:Cryptographic Failures | SecretAccess + NetworkListen |
| BrokenAuthentication | 287 | A07:Identification and Auth | NetworkListen + SecretAccess |
| InsufficientLogging | 778 | A09:Logging Failures | NetworkListen without logging |
| CryptoWeakness | 327 | A02:Cryptographic Failures | CryptoSign/CryptoEncrypt |
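
The "Triggered By" column can be read as a set of rules over the capability bitmask. The sketch below encodes four of the rows as data and matches them against a mask. It is an illustration only: the `ThreatRule` shape, the `matchThreatTypes` helper, and the reading of "FileRead/FileWrite" as "either flag" are assumptions, and the real engine also assigns a confidence to each vector, which this sketch omits.

```typescript
// Subset of CapabilityClass flag values from the table earlier in this document.
const Cap = {
  NetworkOutbound: 0x2,
  FileRead: 0x4,
  FileWrite: 0x8,
  ProcessSpawn: 0x10,
  DatabaseSql: 0x20,
  UserInput: 0x800,
};

// Hypothetical rule shape: all `allOf` flags must be set, plus at least one
// `anyOf` flag when `anyOf` is non-zero.
interface ThreatRule {
  type: string;
  cweId: number;
  owaspCategory: string;
  allOf: number;
  anyOf: number;
}

const threatRules: ThreatRule[] = [
  { type: "SqlInjection", cweId: 89, owaspCategory: "A03:Injection",
    allOf: Cap.DatabaseSql | Cap.UserInput, anyOf: 0 },
  { type: "CommandInjection", cweId: 78, owaspCategory: "A03:Injection",
    allOf: Cap.ProcessSpawn | Cap.UserInput, anyOf: 0 },
  { type: "PathTraversal", cweId: 22, owaspCategory: "A01:Broken Access Control",
    allOf: Cap.UserInput, anyOf: Cap.FileRead | Cap.FileWrite },
  { type: "Ssrf", cweId: 918, owaspCategory: "A10:SSRF",
    allOf: Cap.NetworkOutbound | Cap.UserInput, anyOf: 0 },
];

// Returns the types of all rules satisfied by the capability mask, in rule order.
function matchThreatTypes(mask: number): string[] {
  return threatRules
    .filter((r) => (mask & r.allOf) === r.allOf && (r.anyOf === 0 || (mask & r.anyOf) !== 0))
    .map((r) => r.type);
}
```

Keeping the rules as data rather than branching code makes it straightforward to review them against the table row by row.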
DataFlowBoundary
Represents a data flow boundary crossing.
interface DataFlowBoundary {
  type: DataFlowBoundaryType;
  direction: DataFlowDirection; // Inbound | Outbound | Bidirectional
  sensitivity: DataSensitivity; // Public | Internal | Confidential | Restricted
  confidence: number;
  port?: number; // For network boundaries
  protocol?: string; // http, grpc, amqp, etc.
  evidence: string[];
}
DataFlowBoundaryType
| Type | Security Sensitive | Description |
|---|---|---|
| HttpRequest | Yes | HTTP/HTTPS endpoint |
| GrpcCall | Yes | gRPC service |
| WebSocket | Yes | WebSocket connection |
| DatabaseQuery | Yes | Database queries |
| MessageBroker | No | Message queue pub/sub |
| FileSystem | No | File I/O boundary |
| Cache | No | Cache read/write |
| ExternalApi | Yes | Third-party API calls |
| CloudService | Yes | Cloud provider services |
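
The "Security Sensitive" column can be treated as a lookup keyed by boundary type. The sketch below copies the column into a map and counts the sensitive boundaries in a list of type names. The names `securitySensitiveBoundary` and `countSensitiveBoundaries` are hypothetical, and treating unrecognized type names as not sensitive is an assumption.

```typescript
// "Security Sensitive" column from the DataFlowBoundaryType table above.
const securitySensitiveBoundary: { [boundaryType: string]: boolean } = {
  HttpRequest: true,
  GrpcCall: true,
  WebSocket: true,
  DatabaseQuery: true,
  MessageBroker: false,
  FileSystem: false,
  Cache: false,
  ExternalApi: true,
  CloudService: true,
};

// Hypothetical helper: counts boundaries whose type is marked security sensitive.
// Unknown type names are treated as not sensitive (an assumption of this sketch).
function countSensitiveBoundaries(boundaryTypes: string[]): number {
  return boundaryTypes.filter((t) => securitySensitiveBoundary[t] === true).length;
}
```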
SemanticConfidence
Confidence scoring for semantic analysis.
interface SemanticConfidence {
  score: number; // 0.0 to 1.0
  tier: ConfidenceTier;
  reasons: string[];
}
enum ConfidenceTier {
  Unknown = 0,
  Low = 1,
  Medium = 2,
  High = 3,
  Definitive = 4
}
| Tier | Score Range | Description |
|---|---|---|
| Unknown | 0.0 | No analysis possible |
| Low | 0.0-0.4 | Heuristic guess only |
| Medium | 0.4-0.7 | Partial evidence |
| High | 0.7-0.9 | Strong indicators |
| Definitive | 0.9-1.0 | Explicit declaration found |
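
The score ranges in the table share their endpoints (0.4, 0.7, and 0.9 each appear in two rows), so any mapping from score to tier has to pick a side. The sketch below assumes each range includes its lower bound and excludes its upper bound, and that a score of exactly 0.0 means Unknown. Both choices, and the function name `tierForScore`, are assumptions rather than documented behavior.

```typescript
enum ConfidenceTier {
  Unknown = 0,
  Low = 1,
  Medium = 2,
  High = 3,
  Definitive = 4,
}

// Maps a score to a tier, assuming half-open ranges [lower, upper).
function tierForScore(score: number): ConfidenceTier {
  if (!(score > 0)) {
    return ConfidenceTier.Unknown; // covers 0.0, negative values, and NaN
  }
  if (score < 0.4) {
    return ConfidenceTier.Low;
  }
  if (score < 0.7) {
    return ConfidenceTier.Medium;
  }
  if (score < 0.9) {
    return ConfidenceTier.High;
  }
  return ConfidenceTier.Definitive;
}
```

Under these assumptions, the score 0.85 from the CLI example later in this document maps to High.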
SBOM Property Extensions
When semantic data is included in CycloneDX or SPDX SBOMs, the following property namespace is used:
stellaops:semantic.*
Property Names
| Property | Type | Description |
|---|---|---|
| stellaops:semantic.intent | string | ApplicationIntent value |
| stellaops:semantic.capabilities | string | Comma-separated capability names |
| stellaops:semantic.capability.count | int | Number of detected capabilities |
| stellaops:semantic.threats | JSON | Array of threat vector summaries |
| stellaops:semantic.threat.count | int | Number of identified threats |
| stellaops:semantic.risk.score | float | Overall risk score (0.0-1.0) |
| stellaops:semantic.confidence | float | Confidence score (0.0-1.0) |
| stellaops:semantic.confidence.tier | string | Confidence tier name |
| stellaops:semantic.language | string | Primary language |
| stellaops:semantic.framework | string | Detected framework |
| stellaops:semantic.framework.version | string | Framework version |
| stellaops:semantic.boundary.count | int | Number of data boundaries |
| stellaops:semantic.boundary.sensitive.count | int | Security-sensitive boundaries |
| stellaops:semantic.owasp.categories | string | Comma-separated OWASP categories |
| stellaops:semantic.cwe.ids | string | Comma-separated CWE IDs |
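
CycloneDX properties are name/value pairs with string values, so numeric and list-valued fields have to be serialized to strings. The sketch below builds a few of the properties from a summary object. The `SemanticSummary` shape, the `toSemanticProperties` helper, and the use of a bare `,` separator with no following space are assumptions made for illustration.

```typescript
// CycloneDX-style property: a name/value pair of strings.
interface SbomProperty {
  name: string;
  value: string;
}

// Hypothetical input shape; the real analysis result carries more fields.
interface SemanticSummary {
  intent: string;
  capabilities: string[];
  confidenceScore: number;
  confidenceTier: string;
}

// Emits a subset of the stellaops:semantic.* properties listed above.
function toSemanticProperties(summary: SemanticSummary): SbomProperty[] {
  const prefix = "stellaops:semantic.";
  return [
    { name: prefix + "intent", value: summary.intent },
    { name: prefix + "capabilities", value: summary.capabilities.join(",") },
    { name: prefix + "capability.count", value: String(summary.capabilities.length) },
    { name: prefix + "confidence", value: String(summary.confidenceScore) },
    { name: prefix + "confidence.tier", value: summary.confidenceTier },
  ];
}
```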
RichGraph Integration
Semantic data is attached to richgraph-v1 nodes via the Attributes dictionary:
| Attribute Key | Description |
|---|---|
| semantic_intent | ApplicationIntent value |
| semantic_capabilities | Comma-separated capability flags |
| semantic_threats | Comma-separated threat types |
| semantic_risk_score | Risk score (formatted to 3 decimal places) |
| semantic_confidence | Confidence score |
| semantic_confidence_tier | Confidence tier name |
| semantic_framework | Framework name |
| semantic_framework_version | Framework version |
| is_entrypoint | "true" if node is an entrypoint |
| semantic_boundaries | JSON array of boundary types |
| owasp_category | OWASP category if applicable |
| cwe_id | CWE identifier if applicable |
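
All attribute values are strings, so the risk score is rendered with a fixed number of decimals and the entrypoint marker is the literal string "true". The sketch below fills in a few of the keys. The `NodeSemantics` shape and `toRichGraphAttributes` helper are hypothetical, and omitting `is_entrypoint` for non-entrypoint nodes (rather than writing "false") is an assumption.

```typescript
// Hypothetical input shape for a single graph node.
interface NodeSemantics {
  intent: string;
  capabilities: string[];
  riskScore: number;
  confidenceTier: string;
  isEntrypoint: boolean;
}

// Builds a string-to-string attribute dictionary using keys from the table above.
function toRichGraphAttributes(node: NodeSemantics): { [key: string]: string } {
  const attrs: { [key: string]: string } = {
    semantic_intent: node.intent,
    semantic_capabilities: node.capabilities.join(","),
    semantic_risk_score: node.riskScore.toFixed(3), // 3 decimal places
    semantic_confidence_tier: node.confidenceTier,
  };
  if (node.isEntrypoint) {
    attrs["is_entrypoint"] = "true";
  }
  return attrs;
}
```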
Language Adapter Support
The following language-specific adapters are available:
| Language | Adapter | Supported Frameworks |
|---|---|---|
| Python | PythonSemanticAdapter | Django, Flask, FastAPI, Celery, Click |
| Java | JavaSemanticAdapter | Spring Boot, Quarkus, Micronaut, Kafka Streams |
| Node.js | NodeSemanticAdapter | Express, NestJS, Fastify, Koa |
| .NET | DotNetSemanticAdapter | ASP.NET Core, Worker Service, Console |
| Go | GoSemanticAdapter | net/http, Gin, Echo, Cobra, gRPC |
Configuration
Semantic analysis is configured via the Scanner:EntryTrace:Semantic configuration section:
Scanner:
  EntryTrace:
    Semantic:
      Enabled: true
      ThreatConfidenceThreshold: 0.3
      MaxThreatVectors: 50
      IncludeLowConfidenceCapabilities: false
      EnabledLanguages: []  # Empty = all languages
| Option | Default | Description |
|---|---|---|
| Enabled | true | Enable semantic analysis |
| ThreatConfidenceThreshold | 0.3 | Minimum confidence for threat vectors |
| MaxThreatVectors | 50 | Maximum threats per entrypoint |
| IncludeLowConfidenceCapabilities | false | Include low-confidence capabilities |
| EnabledLanguages | [] | Languages to analyze (empty = all) |
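
The threshold and limit options interact: vectors below the confidence threshold are dropped, and the remainder is capped. The sketch below models the options with the defaults from the table and applies them to a list of scored threats. The type and function names are hypothetical, and two behaviors are assumed rather than documented: the threshold comparison is inclusive, and the cap keeps the first N vectors in input order.

```typescript
// Options mirroring the Scanner:EntryTrace:Semantic section (camelCase here by choice).
interface SemanticOptions {
  enabled: boolean;
  threatConfidenceThreshold: number;
  maxThreatVectors: number;
  includeLowConfidenceCapabilities: boolean;
  enabledLanguages: string[];
}

// Defaults taken from the options table above.
const defaultSemanticOptions: SemanticOptions = {
  enabled: true,
  threatConfidenceThreshold: 0.3,
  maxThreatVectors: 50,
  includeLowConfidenceCapabilities: false,
  enabledLanguages: [],
};

interface ScoredThreat {
  type: string;
  confidence: number;
}

// Drops threats below the threshold (inclusive comparison assumed), then caps the list.
function applyThreatLimits(threats: ScoredThreat[], opts: SemanticOptions): ScoredThreat[] {
  return threats
    .filter((t) => t.confidence >= opts.threatConfidenceThreshold)
    .slice(0, opts.maxThreatVectors);
}

// An empty EnabledLanguages list means every language is analyzed.
function isLanguageEnabled(language: string, opts: SemanticOptions): boolean {
  return opts.enabledLanguages.length === 0 || opts.enabledLanguages.indexOf(language) !== -1;
}
```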
Determinism Guarantees
All semantic analysis outputs are deterministic:
- Capability ordering - Flags are ordered by value (bitmask position)
- Threat vector ordering - Ordered by ThreatVectorType enum value
- Data boundary ordering - Ordered by (Type, Direction) tuple
- Evidence ordering - Alphabetically sorted within each element
- JSON serialization - Uses camelCase naming, consistent formatting
This enables reliable diffing of semantic analysis results across scan runs.
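The ordering rules above amount to canonicalizing collections before serialization. As a minimal sketch, the function below sorts data boundaries by the (Type, Direction) tuple and sorts each boundary's evidence. The `BoundaryKey` shape and helper names are hypothetical, enum members are represented by their numeric values, and "alphabetically" is interpreted here as plain code-unit string comparison, which is an assumption.

```typescript
// Hypothetical minimal shape: numeric enum values plus evidence strings.
interface BoundaryKey {
  type: number;
  direction: number;
  evidence: string[];
}

// Culture-independent string comparison by UTF-16 code units.
function compareOrdinal(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Returns a new, canonically ordered list without mutating the input.
function canonicalizeBoundaries(boundaries: BoundaryKey[]): BoundaryKey[] {
  return boundaries
    .map((b) => ({
      type: b.type,
      direction: b.direction,
      evidence: b.evidence.slice().sort(compareOrdinal),
    }))
    .sort((a, b) => a.type - b.type || a.direction - b.direction);
}
```

Using an explicit comparator instead of a locale-aware one keeps the order identical across machines with different locale settings, which is what makes the output diffable.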
CLI Usage
Semantic analysis can be enabled via the CLI --semantic flag:
stella scan --semantic docker.io/library/python:3.12
When enabled, the output includes a semantic summary:
Semantic Analysis:
  Intent: WebServer
  Framework: flask (v3.0.0)
  Capabilities: NetworkListen, FileRead, DatabaseSql
  Threat Vectors: 2 (SqlInjection, Ssrf)
  Risk Score: 0.72
  Confidence: High (0.85)