# Semantic Entrypoint Schema

> Part of Sprint 0411 - Semantic Entrypoint Engine (Task 23)

This document defines the schema for semantic entrypoint analysis, which enriches container scan results with application-level intent, capabilities, and threat modeling.

---

## Overview

The Semantic Entrypoint Engine analyzes container entrypoints to infer:

1. **Application Intent** - What kind of application is running (web server, worker, CLI, etc.)
2. **Capabilities** - What system resources the application accesses (network, filesystem, database, etc.)
3. **Attack Surface** - Potential security threat vectors based on capabilities
4. **Data Boundaries** - Data flow boundaries with sensitivity classification

This semantic layer enables more precise vulnerability prioritization by understanding which code paths are actually reachable from the entrypoint.

---

## Schema Definitions

### SemanticEntrypoint

The root type representing semantic analysis of an entrypoint.

```typescript
interface SemanticEntrypoint {
  id: string;                        // Unique identifier for this analysis
  specification: EntrypointSpecification;
  intent: ApplicationIntent;
  capabilities: CapabilityClass;     // Bitmask of detected capabilities
  attackSurface: ThreatVector[];
  dataBoundaries: DataFlowBoundary[];
  confidence: SemanticConfidence;
  language?: string;                 // Primary language (python, java, node, dotnet, go)
  framework?: string;                // Detected framework (django, spring-boot, express, etc.)
  frameworkVersion?: string;
  runtimeVersion?: string;
  analyzedAt: string;                // ISO-8601 timestamp
}
```

### ApplicationIntent

Enumeration of application types.

| Value | Description | Common Indicators |
|-------|-------------|-------------------|
| `Unknown` | Intent could not be determined | Fallback |
| `WebServer` | HTTP/HTTPS server | Flask, Django, Express, ASP.NET Core, Gin |
| `Worker` | Background job processor | Celery, Sidekiq, BackgroundService |
| `CliTool` | Command-line interface | Click, argparse, Cobra, Picocli |
| `Serverless` | FaaS function | Lambda handler, Cloud Functions |
| `StreamProcessor` | Event stream handler | Kafka Streams, Flink |
| `RpcServer` | RPC/gRPC server | gRPC, Thrift |
| `Daemon` | Long-running service | Custom main loops |
| `TestRunner` | Test execution | pytest, JUnit, xunit |
| `BatchJob` | Scheduled/periodic task | Cron-style entry |
| `Proxy` | Network proxy/gateway | Envoy, nginx config |

### CapabilityClass (Bitmask)

Flags indicating detected capabilities. Multiple flags can be combined.

| Flag | Value | Description |
|------|-------|-------------|
| `None` | 0x0 | No capabilities detected |
| `NetworkListen` | 0x1 | Binds to network ports |
| `NetworkOutbound` | 0x2 | Makes outbound network requests |
| `FileRead` | 0x4 | Reads from filesystem |
| `FileWrite` | 0x8 | Writes to filesystem |
| `ProcessSpawn` | 0x10 | Spawns child processes |
| `DatabaseSql` | 0x20 | SQL database access |
| `DatabaseNoSql` | 0x40 | NoSQL database access |
| `MessageQueue` | 0x80 | Message queue producer/consumer |
| `CacheAccess` | 0x100 | Cache system access (Redis, Memcached) |
| `CryptoSign` | 0x200 | Cryptographic signing operations |
| `CryptoEncrypt` | 0x400 | Encryption/decryption operations |
| `UserInput` | 0x800 | Processes user input |
| `SecretAccess` | 0x1000 | Reads secrets/credentials |
| `CloudSdk` | 0x2000 | Cloud provider SDK usage |
| `ContainerApi` | 0x4000 | Container/orchestration API access |
| `SystemCall` | 0x8000 | Direct syscall/FFI usage |

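
To illustrate how the flags in the table combine, here is a minimal sketch of building and decoding a capability bitmask. The flag values are copied from the table; the helper names `hasCapability` and `capabilityNames` are hypothetical and are not part of the documented schema.

```typescript
// Flag values copied from the CapabilityClass table.
const CapabilityClass = {
  None: 0x0,
  NetworkListen: 0x1,
  NetworkOutbound: 0x2,
  FileRead: 0x4,
  FileWrite: 0x8,
  ProcessSpawn: 0x10,
  DatabaseSql: 0x20,
  DatabaseNoSql: 0x40,
  MessageQueue: 0x80,
  CacheAccess: 0x100,
  CryptoSign: 0x200,
  CryptoEncrypt: 0x400,
  UserInput: 0x800,
  SecretAccess: 0x1000,
  CloudSdk: 0x2000,
  ContainerApi: 0x4000,
  SystemCall: 0x8000,
} as const;

// Hypothetical helper: true when every bit of `flag` is set in `mask`.
// `None` (0) is never reported as a detected capability.
function hasCapability(mask: number, flag: number): boolean {
  return flag !== 0 && (mask & flag) === flag;
}

// Hypothetical helper: decode a mask into flag names, ordered by bit value.
function capabilityNames(mask: number): string[] {
  return Object.entries(CapabilityClass)
    .filter(([, value]) => hasCapability(mask, value))
    .sort(([, a], [, b]) => a - b)
    .map(([flagName]) => flagName);
}

// Combine flags with bitwise OR: 0x1 | 0x20 | 0x800 = 0x821.
const exampleMask =
  CapabilityClass.NetworkListen | CapabilityClass.DatabaseSql | CapabilityClass.UserInput;

console.log(capabilityNames(exampleMask)); // ["NetworkListen", "DatabaseSql", "UserInput"]
```

Sorting by numeric value keeps the decoded names in bitmask order regardless of how the mask was assembled.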

### ThreatVector

Represents a potential attack vector.

```typescript
interface ThreatVector {
  type: ThreatVectorType;
  confidence: number;                 // 0.0 to 1.0
  contributingCapabilities: CapabilityClass;
  evidence: string[];
  cweId?: number;                     // CWE identifier
  owaspCategory?: string;             // OWASP category
}
```

### ThreatVectorType

| Type | CWE | OWASP | Triggered By |
|------|-----|-------|--------------|
| `SqlInjection` | 89 | A03:Injection | DatabaseSql + UserInput |
| `CommandInjection` | 78 | A03:Injection | ProcessSpawn + UserInput |
| `PathTraversal` | 22 | A01:Broken Access Control | FileRead/FileWrite + UserInput |
| `Ssrf` | 918 | A10:SSRF | NetworkOutbound + UserInput |
| `Xss` | 79 | A03:Injection | NetworkListen + UserInput |
| `InsecureDeserialization` | 502 | A08:Software and Data Integrity | UserInput + dynamic types |
| `SensitiveDataExposure` | 200 | A02:Cryptographic Failures | SecretAccess + NetworkListen |
| `BrokenAuthentication` | 287 | A07:Identification and Auth | NetworkListen + SecretAccess |
| `InsufficientLogging` | 778 | A09:Logging Failures | NetworkListen without logging |
| `CryptoWeakness` | 327 | A02:Cryptographic Failures | CryptoSign/CryptoEncrypt |

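
The "Triggered By" column can be read as a set of rules over the capability bitmask. The sketch below shows one way such rules might be evaluated for the rows that depend on capability flags alone. The `ThreatRule` shape and `candidateThreats` function are assumptions made for illustration, not the engine's actual implementation, and rows that rely on other signals ("dynamic types", "without logging") are left out.

```typescript
// Capability bit values taken from the CapabilityClass table.
const NetworkListen = 0x1;
const NetworkOutbound = 0x2;
const FileRead = 0x4;
const FileWrite = 0x8;
const ProcessSpawn = 0x10;
const DatabaseSql = 0x20;
const UserInput = 0x800;

// Hypothetical rule shape: `allOf` bits must all be present; when `anyOf`
// is non-zero, at least one of its bits must be present as well.
interface ThreatRule {
  type: string;
  cweId: number;
  owaspCategory: string;
  allOf: number;
  anyOf: number;
}

// Rows from the ThreatVectorType table that depend only on capability flags.
const threatRules: ThreatRule[] = [
  { type: "SqlInjection", cweId: 89, owaspCategory: "A03:Injection", allOf: DatabaseSql | UserInput, anyOf: 0 },
  { type: "CommandInjection", cweId: 78, owaspCategory: "A03:Injection", allOf: ProcessSpawn | UserInput, anyOf: 0 },
  { type: "PathTraversal", cweId: 22, owaspCategory: "A01:Broken Access Control", allOf: UserInput, anyOf: FileRead | FileWrite },
  { type: "Ssrf", cweId: 918, owaspCategory: "A10:SSRF", allOf: NetworkOutbound | UserInput, anyOf: 0 },
  { type: "Xss", cweId: 79, owaspCategory: "A03:Injection", allOf: NetworkListen | UserInput, anyOf: 0 },
];

// Returns the rules whose trigger condition is satisfied by the bitmask.
function candidateThreats(capabilities: number): ThreatRule[] {
  return threatRules.filter(
    (rule) =>
      (capabilities & rule.allOf) === rule.allOf &&
      (rule.anyOf === 0 || (capabilities & rule.anyOf) !== 0),
  );
}

const matched = candidateThreats(NetworkListen | DatabaseSql | UserInput);
console.log(matched.map((rule) => rule.type)); // ["SqlInjection", "Xss"]
```

A rule match here only nominates a candidate threat; the `confidence` and `evidence` fields of a `ThreatVector` would still have to come from the analysis itself.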

### DataFlowBoundary

Represents a data flow boundary crossing.

```typescript
interface DataFlowBoundary {
  type: DataFlowBoundaryType;
  direction: DataFlowDirection;      // Inbound | Outbound | Bidirectional
  sensitivity: DataSensitivity;      // Public | Internal | Confidential | Restricted
  confidence: number;
  port?: number;                     // For network boundaries
  protocol?: string;                 // http, grpc, amqp, etc.
  evidence: string[];
}
```

### DataFlowBoundaryType

| Type | Security Sensitive | Description |
|------|-------------------|-------------|
| `HttpRequest` | Yes | HTTP/HTTPS endpoint |
| `GrpcCall` | Yes | gRPC service |
| `WebSocket` | Yes | WebSocket connection |
| `DatabaseQuery` | Yes | Database queries |
| `MessageBroker` | No | Message queue pub/sub |
| `FileSystem` | No | File I/O boundary |
| `Cache` | No | Cache read/write |
| `ExternalApi` | Yes | Third-party API calls |
| `CloudService` | Yes | Cloud provider services |

### SemanticConfidence

Confidence scoring for semantic analysis.

```typescript
interface SemanticConfidence {
  score: number;          // 0.0 to 1.0
  tier: ConfidenceTier;
  reasons: string[];
}

enum ConfidenceTier {
  Unknown = 0,
  Low = 1,
  Medium = 2,
  High = 3,
  Definitive = 4
}
```

| Tier | Score Range | Description |
|------|-------------|-------------|
| `Unknown` | 0.0 | No analysis possible |
| `Low` | 0.0-0.4 | Heuristic guess only |
| `Medium` | 0.4-0.7 | Partial evidence |
| `High` | 0.7-0.9 | Strong indicators |
| `Definitive` | 0.9-1.0 | Explicit declaration found |

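
The score ranges in the table share their endpoints (0.4 appears in both `Low` and `Medium`, for example), so any mapping from score to tier has to pick a convention. The sketch below assumes that each range includes its lower bound and excludes its upper bound, and that a score of exactly 0.0 means `Unknown`. Both choices, and the function name `tierForScore`, are assumptions for illustration rather than documented behavior.

```typescript
type ConfidenceTierName = "Unknown" | "Low" | "Medium" | "High" | "Definitive";

// Hypothetical mapping from a 0.0-1.0 score to a tier name.
// Assumption: ranges are lower-inclusive and upper-exclusive, and a score
// of 0.0 (or any non-positive or NaN value) maps to Unknown.
function tierForScore(score: number): ConfidenceTierName {
  if (!(score > 0)) return "Unknown";
  if (score < 0.4) return "Low";
  if (score < 0.7) return "Medium";
  if (score < 0.9) return "High";
  return "Definitive";
}

console.log(tierForScore(0.85)); // "High"
console.log(tierForScore(0.4));  // "Medium" under the lower-inclusive assumption
```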

---

## SBOM Property Extensions

When semantic data is included in CycloneDX or SPDX SBOMs, the following property namespace is used:

```
stellaops:semantic.*
```

### Property Names

| Property | Type | Description |
|----------|------|-------------|
| `stellaops:semantic.intent` | string | ApplicationIntent value |
| `stellaops:semantic.capabilities` | string | Comma-separated capability names |
| `stellaops:semantic.capability.count` | int | Number of detected capabilities |
| `stellaops:semantic.threats` | JSON | Array of threat vector summaries |
| `stellaops:semantic.threat.count` | int | Number of identified threats |
| `stellaops:semantic.risk.score` | float | Overall risk score (0.0-1.0) |
| `stellaops:semantic.confidence` | float | Confidence score (0.0-1.0) |
| `stellaops:semantic.confidence.tier` | string | Confidence tier name |
| `stellaops:semantic.language` | string | Primary language |
| `stellaops:semantic.framework` | string | Detected framework |
| `stellaops:semantic.framework.version` | string | Framework version |
| `stellaops:semantic.boundary.count` | int | Number of data boundaries |
| `stellaops:semantic.boundary.sensitive.count` | int | Security-sensitive boundaries |
| `stellaops:semantic.owasp.categories` | string | Comma-separated OWASP categories |
| `stellaops:semantic.cwe.ids` | string | Comma-separated CWE IDs |

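
CycloneDX carries properties as name/value pairs whose values are strings, so the typed values above need to be serialized when they are written out. The following sketch shows how a few of these properties might be produced. The `SemanticSummary` input shape and `toSbomProperties` function are hypothetical, and the exact value formatting (separator without spaces, plain `String()` number conversion) is an assumption, not the documented encoding.

```typescript
// A CycloneDX-style property: both name and value are strings.
interface SbomProperty {
  name: string;
  value: string;
}

// Hypothetical, trimmed-down input used only for this example.
interface SemanticSummary {
  intent: string;
  capabilityNames: string[];
  confidenceScore: number;
  confidenceTier: string;
  cweIds: number[];
  framework?: string;
}

// Maps a summary onto a subset of the stellaops:semantic.* properties.
function toSbomProperties(summary: SemanticSummary): SbomProperty[] {
  const properties: SbomProperty[] = [
    { name: "stellaops:semantic.intent", value: summary.intent },
    { name: "stellaops:semantic.capabilities", value: summary.capabilityNames.join(",") },
    { name: "stellaops:semantic.capability.count", value: String(summary.capabilityNames.length) },
    { name: "stellaops:semantic.confidence", value: String(summary.confidenceScore) },
    { name: "stellaops:semantic.confidence.tier", value: summary.confidenceTier },
    { name: "stellaops:semantic.cwe.ids", value: summary.cweIds.join(",") },
  ];
  // Optional fields are emitted only when present.
  if (summary.framework !== undefined) {
    properties.push({ name: "stellaops:semantic.framework", value: summary.framework });
  }
  return properties;
}

const exampleProperties = toSbomProperties({
  intent: "WebServer",
  capabilityNames: ["NetworkListen", "DatabaseSql", "UserInput"],
  confidenceScore: 0.85,
  confidenceTier: "High",
  cweIds: [79, 89],
  framework: "flask",
});
console.log(exampleProperties);
```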

---

## RichGraph Integration

Semantic data is attached to `richgraph-v1` nodes via the Attributes dictionary:

| Attribute Key | Description |
|---------------|-------------|
| `semantic_intent` | ApplicationIntent value |
| `semantic_capabilities` | Comma-separated capability flags |
| `semantic_threats` | Comma-separated threat types |
| `semantic_risk_score` | Risk score (formatted to 3 decimal places) |
| `semantic_confidence` | Confidence score |
| `semantic_confidence_tier` | Confidence tier name |
| `semantic_framework` | Framework name |
| `semantic_framework_version` | Framework version |
| `is_entrypoint` | "true" if node is an entrypoint |
| `semantic_boundaries` | JSON array of boundary types |
| `owasp_category` | OWASP category if applicable |
| `cwe_id` | CWE identifier if applicable |

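
As a rough illustration of the string encoding described in the table, the sketch below builds a few of these attributes for an entrypoint node. The function name `semanticAttributes` and its parameters are hypothetical; only the attribute keys, the comma-separated threat list, the three-decimal risk score, and the `"true"` marker come from the table.

```typescript
// Hypothetical helper that fills a subset of the attribute keys listed above.
// All attribute values are strings.
function semanticAttributes(
  intent: string,
  threatTypes: string[],
  riskScore: number,
): Record<string, string> {
  return {
    semantic_intent: intent,
    semantic_threats: threatTypes.join(","),
    semantic_risk_score: riskScore.toFixed(3), // fixed to 3 decimal places
    is_entrypoint: "true",
  };
}

const exampleAttributes = semanticAttributes("WebServer", ["SqlInjection", "Ssrf"], 0.72);
console.log(exampleAttributes.semantic_risk_score); // "0.720"
```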

---

## Language Adapter Support

The following language-specific adapters are available:

| Language | Adapter | Supported Frameworks |
|----------|---------|---------------------|
| Python | `PythonSemanticAdapter` | Django, Flask, FastAPI, Celery, Click |
| Java | `JavaSemanticAdapter` | Spring Boot, Quarkus, Micronaut, Kafka Streams |
| Node.js | `NodeSemanticAdapter` | Express, NestJS, Fastify, Koa |
| .NET | `DotNetSemanticAdapter` | ASP.NET Core, Worker Service, Console |
| Go | `GoSemanticAdapter` | net/http, Gin, Echo, Cobra, gRPC |

---

## Configuration

Semantic analysis is configured via the `Scanner:EntryTrace:Semantic` configuration section:

```yaml
Scanner:
  EntryTrace:
    Semantic:
      Enabled: true
      ThreatConfidenceThreshold: 0.3
      MaxThreatVectors: 50
      IncludeLowConfidenceCapabilities: false
      EnabledLanguages: []  # Empty = all languages
```

| Option | Default | Description |
|--------|---------|-------------|
| `Enabled` | true | Enable semantic analysis |
| `ThreatConfidenceThreshold` | 0.3 | Minimum confidence for threat vectors |
| `MaxThreatVectors` | 50 | Maximum threats per entrypoint |
| `IncludeLowConfidenceCapabilities` | false | Include low-confidence capabilities |
| `EnabledLanguages` | [] | Languages to analyze (empty = all) |

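
To make the effect of `ThreatConfidenceThreshold` and `MaxThreatVectors` concrete, here is a minimal sketch of how the two options might be applied to a list of threat vectors. The names `SemanticOptions` and `applyThreatLimits` are hypothetical. The sketch also assumes that the threshold is inclusive and that truncation keeps the first entries in their existing order; the source does not specify either detail.

```typescript
interface ScoredThreat {
  type: string;
  confidence: number; // 0.0 to 1.0
}

// Hypothetical in-memory view of two of the documented options.
interface SemanticOptions {
  threatConfidenceThreshold: number;
  maxThreatVectors: number;
}

// Defaults copied from the options table.
const defaultOptions: SemanticOptions = {
  threatConfidenceThreshold: 0.3,
  maxThreatVectors: 50,
};

// Drops threats below the threshold, then caps the list length.
// Assumption: the threshold is inclusive and input order is preserved.
function applyThreatLimits(
  threats: ScoredThreat[],
  options: SemanticOptions = defaultOptions,
): ScoredThreat[] {
  return threats
    .filter((threat) => threat.confidence >= options.threatConfidenceThreshold)
    .slice(0, options.maxThreatVectors);
}

const kept = applyThreatLimits([
  { type: "SqlInjection", confidence: 0.8 },
  { type: "Xss", confidence: 0.2 },
  { type: "Ssrf", confidence: 0.3 },
]);
console.log(kept.map((threat) => threat.type)); // ["SqlInjection", "Ssrf"]
```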

---

## Determinism Guarantees

All semantic analysis outputs are deterministic:

1. **Capability ordering** - Flags are ordered by value (bitmask position)
2. **Threat vector ordering** - Ordered by ThreatVectorType enum value
3. **Data boundary ordering** - Ordered by (Type, Direction) tuple
4. **Evidence ordering** - Alphabetically sorted within each element
5. **JSON serialization** - Uses camelCase naming, consistent formatting

This enables reliable diffing of semantic analysis results across scan runs.

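
The boundary and evidence ordering rules above can be sketched as a single canonicalization step. The example below is illustrative only. It assumes that the enum order of `DataFlowBoundaryType` and `DataFlowDirection` follows the order in which the values are listed in this document, and it sorts evidence by plain code-unit comparison so the result does not depend on locale. The `canonicalize` function and the trimmed `Boundary` type are hypothetical.

```typescript
// Assumed enum order: the order in which values are listed in this document.
const boundaryTypeOrder = [
  "HttpRequest", "GrpcCall", "WebSocket", "DatabaseQuery", "MessageBroker",
  "FileSystem", "Cache", "ExternalApi", "CloudService",
];
const directionOrder = ["Inbound", "Outbound", "Bidirectional"];

// Trimmed-down boundary shape used only for this example.
interface Boundary {
  type: string;
  direction: string;
  evidence: string[];
}

// Sorts evidence within each boundary, then orders boundaries by the
// (type, direction) tuple. The input array is not mutated.
function canonicalize(boundaries: Boundary[]): Boundary[] {
  return boundaries
    .map((boundary) => ({ ...boundary, evidence: [...boundary.evidence].sort() }))
    .sort(
      (a, b) =>
        boundaryTypeOrder.indexOf(a.type) - boundaryTypeOrder.indexOf(b.type) ||
        directionOrder.indexOf(a.direction) - directionOrder.indexOf(b.direction),
    );
}

// Two runs that discover the same boundaries in a different order
// serialize to the same JSON after canonicalization.
const runA: Boundary[] = [
  { type: "Cache", direction: "Outbound", evidence: ["redis import", "REDIS_URL env"] },
  { type: "HttpRequest", direction: "Inbound", evidence: ["EXPOSE 8080"] },
];
const runB: Boundary[] = [runA[1], runA[0]];
console.log(JSON.stringify(canonicalize(runA)) === JSON.stringify(canonicalize(runB))); // true
```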

---

## CLI Usage

Semantic analysis can be enabled via the CLI `--semantic` flag:

```bash
stella scan --semantic docker.io/library/python:3.12
```

Output includes semantic summary when enabled:

```
Semantic Analysis:
  Intent: WebServer
  Framework: flask (v3.0.0)
  Capabilities: NetworkListen, DatabaseSql, FileRead
  Threat Vectors: 2 (SqlInjection, Ssrf)
  Risk Score: 0.72
  Confidence: High (0.85)
```

---

## References

- [OWASP Top 10 2021](https://owasp.org/Top10/)
- [CWE/SANS Top 25](https://cwe.mitre.org/top25/)
- [CycloneDX Property Extensions](https://cyclonedx.org/docs/1.5/json/#properties)
- [SPDX 3.0 External Identifiers](https://spdx.github.io/spdx-spec/v3.0/annexes/external-identifier-types/)