2025-12-13 18:08:55 +02:00


Semantic Entrypoint Schema

Part of Sprint 0411 - Semantic Entrypoint Engine (Task 23)

This document defines the schema for semantic entrypoint analysis, which enriches container scan results with application-level intent, capabilities, and threat modeling.


Overview

The Semantic Entrypoint Engine analyzes container entrypoints to infer:

  1. Application Intent - What kind of application is running (web server, worker, CLI, etc.)
  2. Capabilities - What system resources the application accesses (network, filesystem, database, etc.)
  3. Attack Surface - Potential security threat vectors based on capabilities
  4. Data Boundaries - Data flow boundaries with sensitivity classification

This semantic layer enables more precise vulnerability prioritization by understanding which code paths are actually reachable from the entrypoint.


Schema Definitions

SemanticEntrypoint

The root type representing semantic analysis of an entrypoint.

interface SemanticEntrypoint {
  id: string;                        // Unique identifier for this analysis
  specification: EntrypointSpecification;
  intent: ApplicationIntent;
  capabilities: CapabilityClass;     // Bitmask of detected capabilities
  attackSurface: ThreatVector[];
  dataBoundaries: DataFlowBoundary[];
  confidence: SemanticConfidence;
  language?: string;                 // Primary language (python, java, node, dotnet, go)
  framework?: string;                // Detected framework (django, spring-boot, express, etc.)
  frameworkVersion?: string;
  runtimeVersion?: string;
  analyzedAt: string;                // ISO-8601 timestamp
}

ApplicationIntent

Enumeration of application types.

| Value | Description | Common Indicators |
|-------|-------------|-------------------|
| Unknown | Intent could not be determined | Fallback |
| WebServer | HTTP/HTTPS server | Flask, Django, Express, ASP.NET Core, Gin |
| Worker | Background job processor | Celery, Sidekiq, BackgroundService |
| CliTool | Command-line interface | Click, argparse, Cobra, Picocli |
| Serverless | FaaS function | Lambda handler, Cloud Functions |
| StreamProcessor | Event stream handler | Kafka Streams, Flink |
| RpcServer | RPC/gRPC server | gRPC, Thrift |
| Daemon | Long-running service | Custom main loops |
| TestRunner | Test execution | pytest, JUnit, xUnit |
| BatchJob | Scheduled/periodic task | Cron-style entry |
| Proxy | Network proxy/gateway | Envoy, nginx config |

CapabilityClass (Bitmask)

Flags indicating detected capabilities. Multiple flags can be combined.

| Flag | Value | Description |
|------|-------|-------------|
| None | 0x0 | No capabilities detected |
| NetworkListen | 0x1 | Binds to network ports |
| NetworkOutbound | 0x2 | Makes outbound network requests |
| FileRead | 0x4 | Reads from filesystem |
| FileWrite | 0x8 | Writes to filesystem |
| ProcessSpawn | 0x10 | Spawns child processes |
| DatabaseSql | 0x20 | SQL database access |
| DatabaseNoSql | 0x40 | NoSQL database access |
| MessageQueue | 0x80 | Message queue producer/consumer |
| CacheAccess | 0x100 | Cache system access (Redis, Memcached) |
| CryptoSign | 0x200 | Cryptographic signing operations |
| CryptoEncrypt | 0x400 | Encryption/decryption operations |
| UserInput | 0x800 | Processes user input |
| SecretAccess | 0x1000 | Reads secrets/credentials |
| CloudSdk | 0x2000 | Cloud provider SDK usage |
| ContainerApi | 0x4000 | Container/orchestration API access |
| SystemCall | 0x8000 | Direct syscall/FFI usage |
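
As a rough illustration of how the bitmask can be combined and decoded, the sketch below mirrors the flag values from the table. The helper names `hasCapability` and `capabilityNames` are hypothetical and are not part of the scanner's API.

```typescript
// Flag values copied from the CapabilityClass table in this document.
const CapabilityClass = {
  None: 0x0,
  NetworkListen: 0x1,
  NetworkOutbound: 0x2,
  FileRead: 0x4,
  FileWrite: 0x8,
  ProcessSpawn: 0x10,
  DatabaseSql: 0x20,
  DatabaseNoSql: 0x40,
  MessageQueue: 0x80,
  CacheAccess: 0x100,
  CryptoSign: 0x200,
  CryptoEncrypt: 0x400,
  UserInput: 0x800,
  SecretAccess: 0x1000,
  CloudSdk: 0x2000,
  ContainerApi: 0x4000,
  SystemCall: 0x8000,
} as const;

// Hypothetical helper: true when every bit of `flag` is set in `mask`.
function hasCapability(mask: number, flag: number): boolean {
  return flag !== 0 && (mask & flag) === flag;
}

// Hypothetical helper: expand a mask into flag names, lowest bit first.
function capabilityNames(mask: number): string[] {
  return Object.entries(CapabilityClass)
    .filter(([, value]) => value !== 0 && (mask & value) === value)
    .sort(([, a], [, b]) => a - b)
    .map(([name]) => name);
}

// Flags are combined with bitwise OR.
const exampleMask =
  CapabilityClass.DatabaseSql | CapabilityClass.NetworkListen | CapabilityClass.UserInput;
console.log(capabilityNames(exampleMask).join(","));
```

Decoding from the lowest bit upward keeps the name list in bitmask order, regardless of the order in which flags were combined.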

ThreatVector

Represents a potential attack vector.

interface ThreatVector {
  type: ThreatVectorType;
  confidence: number;                // 0.0 to 1.0
  contributingCapabilities: CapabilityClass;
  evidence: string[];
  cweId?: number;                    // CWE identifier
  owaspCategory?: string;            // OWASP category
}

ThreatVectorType

| Type | CWE | OWASP | Triggered By |
|------|-----|-------|--------------|
| SqlInjection | 89 | A03:Injection | DatabaseSql + UserInput |
| CommandInjection | 78 | A03:Injection | ProcessSpawn + UserInput |
| PathTraversal | 22 | A01:Broken Access Control | FileRead/FileWrite + UserInput |
| Ssrf | 918 | A10:SSRF | NetworkOutbound + UserInput |
| Xss | 79 | A03:Injection | NetworkListen + UserInput |
| InsecureDeserialization | 502 | A08:Software and Data Integrity | UserInput + dynamic types |
| SensitiveDataExposure | 200 | A02:Cryptographic Failures | SecretAccess + NetworkListen |
| BrokenAuthentication | 287 | A07:Identification and Auth | NetworkListen + SecretAccess |
| InsufficientLogging | 778 | A09:Logging Failures | NetworkListen without logging |
| CryptoWeakness | 327 | A02:Cryptographic Failures | CryptoSign/CryptoEncrypt |
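
The "Triggered By" column can be read as a set of rules over the capability bitmask. The sketch below covers only the five rows that depend purely on capability flags, and it assumes that "FileRead/FileWrite" means either flag is sufficient. The `ThreatRule` shape and `matchThreats` function are hypothetical, not the engine's actual implementation.

```typescript
// Subset of CapabilityClass flag values from this document.
const Cap = {
  NetworkListen: 0x1,
  NetworkOutbound: 0x2,
  FileRead: 0x4,
  FileWrite: 0x8,
  ProcessSpawn: 0x10,
  DatabaseSql: 0x20,
  UserInput: 0x800,
} as const;

// Hypothetical rule shape for illustration only.
interface ThreatRule {
  type: string;
  cweId: number;
  owaspCategory: string;
  allOf: number; // every bit must be present
  anyOf: number; // at least one bit must be present (0 = no extra condition)
}

const RULES: ThreatRule[] = [
  { type: "SqlInjection", cweId: 89, owaspCategory: "A03:Injection",
    allOf: Cap.DatabaseSql | Cap.UserInput, anyOf: 0 },
  { type: "CommandInjection", cweId: 78, owaspCategory: "A03:Injection",
    allOf: Cap.ProcessSpawn | Cap.UserInput, anyOf: 0 },
  { type: "PathTraversal", cweId: 22, owaspCategory: "A01:Broken Access Control",
    allOf: Cap.UserInput, anyOf: Cap.FileRead | Cap.FileWrite },
  { type: "Ssrf", cweId: 918, owaspCategory: "A10:SSRF",
    allOf: Cap.NetworkOutbound | Cap.UserInput, anyOf: 0 },
  { type: "Xss", cweId: 79, owaspCategory: "A03:Injection",
    allOf: Cap.NetworkListen | Cap.UserInput, anyOf: 0 },
];

// Return the rules whose capability requirements are met by the mask.
// RULES is declared in table order, so the result keeps that order.
function matchThreats(capabilities: number): ThreatRule[] {
  return RULES.filter(
    (rule) =>
      (capabilities & rule.allOf) === rule.allOf &&
      (rule.anyOf === 0 || (capabilities & rule.anyOf) !== 0),
  );
}

const matched = matchThreats(Cap.DatabaseSql | Cap.NetworkOutbound | Cap.UserInput);
console.log(matched.map((rule) => rule.type).join(","));
```

Rows such as InsecureDeserialization and InsufficientLogging depend on signals beyond the capability flags ("dynamic types", "without logging"), so they are left out of this sketch.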

DataFlowBoundary

Represents a data flow boundary crossing.

interface DataFlowBoundary {
  type: DataFlowBoundaryType;
  direction: DataFlowDirection;      // Inbound | Outbound | Bidirectional
  sensitivity: DataSensitivity;      // Public | Internal | Confidential | Restricted
  confidence: number;
  port?: number;                     // For network boundaries
  protocol?: string;                 // http, grpc, amqp, etc.
  evidence: string[];
}

DataFlowBoundaryType

| Type | Security Sensitive | Description |
|------|--------------------|-------------|
| HttpRequest | Yes | HTTP/HTTPS endpoint |
| GrpcCall | Yes | gRPC service |
| WebSocket | Yes | WebSocket connection |
| DatabaseQuery | Yes | Database queries |
| MessageBroker | No | Message queue pub/sub |
| FileSystem | No | File I/O boundary |
| Cache | No | Cache read/write |
| ExternalApi | Yes | Third-party API calls |
| CloudService | Yes | Cloud provider services |
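
The "Security Sensitive" column lends itself to a simple lookup. The sketch below shows one way a count of sensitive boundaries could be derived from it. The set and function names are hypothetical, and whether the scanner computes its counts this way is an assumption.

```typescript
// Boundary types marked "Yes" in the Security Sensitive column.
const SECURITY_SENSITIVE_TYPES = new Set<string>([
  "HttpRequest",
  "GrpcCall",
  "WebSocket",
  "DatabaseQuery",
  "ExternalApi",
  "CloudService",
]);

// Hypothetical helper: count boundaries whose type is security sensitive.
function countSensitiveBoundaries(boundaries: { type: string }[]): number {
  return boundaries.filter((b) => SECURITY_SENSITIVE_TYPES.has(b.type)).length;
}

const sampleBoundaries = [
  { type: "HttpRequest" },
  { type: "Cache" },
  { type: "DatabaseQuery" },
];
console.log(countSensitiveBoundaries(sampleBoundaries));
```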

SemanticConfidence

Confidence scoring for semantic analysis.

interface SemanticConfidence {
  score: number;                     // 0.0 to 1.0
  tier: ConfidenceTier;
  reasons: string[];
}

enum ConfidenceTier {
  Unknown = 0,
  Low = 1,
  Medium = 2,
  High = 3,
  Definitive = 4
}

| Tier | Score Range | Description |
|------|-------------|-------------|
| Unknown | 0.0 | No analysis possible |
| Low | 0.0-0.4 | Heuristic guess only |
| Medium | 0.4-0.7 | Partial evidence |
| High | 0.7-0.9 | Strong indicators |
| Definitive | 0.9-1.0 | Explicit declaration found |
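
The score ranges above share their endpoints (for example, 0.4 appears in both Low and Medium), so any mapping has to pick a convention. The sketch below assumes that a score of exactly 0.0 means Unknown and that each range includes its lower bound and excludes its upper bound. The function name `tierForScore` is hypothetical.

```typescript
type ConfidenceTierName = "Unknown" | "Low" | "Medium" | "High" | "Definitive";

// Hypothetical mapping from score to tier.
// Assumption: lower bounds are inclusive, upper bounds exclusive,
// and 0.0 (or any non-finite or negative value) maps to Unknown.
function tierForScore(score: number): ConfidenceTierName {
  if (!Number.isFinite(score) || score <= 0) return "Unknown";
  if (score < 0.4) return "Low";
  if (score < 0.7) return "Medium";
  if (score < 0.9) return "High";
  return "Definitive";
}

console.log(tierForScore(0.85));
```

Under this convention a score of 0.85 falls in the High tier, which agrees with the sample CLI output later in this document.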

SBOM Property Extensions

When semantic data is included in CycloneDX or SPDX SBOMs, the following property namespace is used:

stellaops:semantic.*

Property Names

| Property | Type | Description |
|----------|------|-------------|
| stellaops:semantic.intent | string | ApplicationIntent value |
| stellaops:semantic.capabilities | string | Comma-separated capability names |
| stellaops:semantic.capability.count | int | Number of detected capabilities |
| stellaops:semantic.threats | JSON | Array of threat vector summaries |
| stellaops:semantic.threat.count | int | Number of identified threats |
| stellaops:semantic.risk.score | float | Overall risk score (0.0-1.0) |
| stellaops:semantic.confidence | float | Confidence score (0.0-1.0) |
| stellaops:semantic.confidence.tier | string | Confidence tier name |
| stellaops:semantic.language | string | Primary language |
| stellaops:semantic.framework | string | Detected framework |
| stellaops:semantic.framework.version | string | Framework version |
| stellaops:semantic.boundary.count | int | Number of data boundaries |
| stellaops:semantic.boundary.sensitive.count | int | Security-sensitive boundaries |
| stellaops:semantic.owasp.categories | string | Comma-separated OWASP categories |
| stellaops:semantic.cwe.ids | string | Comma-separated CWE IDs |
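
CycloneDX properties are name/value pairs whose values are strings, so numeric and JSON-typed entries need to be serialized. The sketch below shows one possible way to emit a few of the properties listed above. The `SemanticSummary` input shape, the threat summary fields, and the function name are assumptions made for illustration.

```typescript
// Name/value pair as used by CycloneDX properties (values are strings).
interface SbomProperty {
  name: string;
  value: string;
}

// Hypothetical input shape; not the scanner's actual model.
interface SemanticSummary {
  intent: string;
  capabilities: string[];
  threats: { type: string; confidence: number }[];
  confidenceScore: number;
  confidenceTier: string;
  language?: string;
  framework?: string;
}

function toSemanticProperties(summary: SemanticSummary): SbomProperty[] {
  const prefix = "stellaops:semantic.";
  const props: SbomProperty[] = [
    { name: prefix + "intent", value: summary.intent },
    { name: prefix + "capabilities", value: summary.capabilities.join(",") },
    { name: prefix + "capability.count", value: String(summary.capabilities.length) },
    { name: prefix + "threats", value: JSON.stringify(summary.threats) },
    { name: prefix + "threat.count", value: String(summary.threats.length) },
    { name: prefix + "confidence", value: String(summary.confidenceScore) },
    { name: prefix + "confidence.tier", value: summary.confidenceTier },
  ];
  // Optional fields are emitted only when present.
  if (summary.language !== undefined) {
    props.push({ name: prefix + "language", value: summary.language });
  }
  if (summary.framework !== undefined) {
    props.push({ name: prefix + "framework", value: summary.framework });
  }
  return props;
}

const sampleProps = toSemanticProperties({
  intent: "WebServer",
  capabilities: ["NetworkListen", "DatabaseSql", "UserInput"],
  threats: [{ type: "SqlInjection", confidence: 0.8 }],
  confidenceScore: 0.85,
  confidenceTier: "High",
  language: "python",
});
console.log(sampleProps.map((p) => `${p.name}=${p.value}`).join("\n"));
```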

RichGraph Integration

Semantic data is attached to richgraph-v1 nodes via the Attributes dictionary:

| Attribute Key | Description |
|---------------|-------------|
| semantic_intent | ApplicationIntent value |
| semantic_capabilities | Comma-separated capability flags |
| semantic_threats | Comma-separated threat types |
| semantic_risk_score | Risk score (formatted to 3 decimal places) |
| semantic_confidence | Confidence score |
| semantic_confidence_tier | Confidence tier name |
| semantic_framework | Framework name |
| semantic_framework_version | Framework version |
| is_entrypoint | "true" if node is an entrypoint |
| semantic_boundaries | JSON array of boundary types |
| owasp_category | OWASP category if applicable |
| cwe_id | CWE identifier if applicable |
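
Because the Attributes dictionary holds strings, the formatting rules in the table matter. The sketch below illustrates the three-decimal risk score and the `"true"` entrypoint marker. The `NodeSemantics` input shape and function name are hypothetical, and omitting `is_entrypoint` for non-entrypoint nodes is an assumption.

```typescript
// Hypothetical input shape for illustration only.
interface NodeSemantics {
  intent: string;
  threats: string[];
  boundaryTypes: string[];
  riskScore: number;
  isEntrypoint: boolean;
}

function toSemanticAttributes(node: NodeSemantics): Record<string, string> {
  const attrs: Record<string, string> = {
    semantic_intent: node.intent,
    semantic_threats: node.threats.join(","),
    // Risk score is formatted to exactly 3 decimal places.
    semantic_risk_score: node.riskScore.toFixed(3),
    semantic_boundaries: JSON.stringify(node.boundaryTypes),
  };
  // Assumption: the key is present only for entrypoint nodes.
  if (node.isEntrypoint) {
    attrs.is_entrypoint = "true";
  }
  return attrs;
}

const sampleAttrs = toSemanticAttributes({
  intent: "WebServer",
  threats: ["SqlInjection", "Ssrf"],
  boundaryTypes: ["HttpRequest", "DatabaseQuery"],
  riskScore: 0.72,
  isEntrypoint: true,
});
console.log(sampleAttrs);
```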

Language Adapter Support

The following language-specific adapters are available:

| Language | Adapter | Supported Frameworks |
|----------|---------|----------------------|
| Python | PythonSemanticAdapter | Django, Flask, FastAPI, Celery, Click |
| Java | JavaSemanticAdapter | Spring Boot, Quarkus, Micronaut, Kafka Streams |
| Node.js | NodeSemanticAdapter | Express, NestJS, Fastify, Koa |
| .NET | DotNetSemanticAdapter | ASP.NET Core, Worker Service, Console |
| Go | GoSemanticAdapter | net/http, Gin, Echo, Cobra, gRPC |

Configuration

Semantic analysis is configured via the Scanner:EntryTrace:Semantic configuration section:

Scanner:
  EntryTrace:
    Semantic:
      Enabled: true
      ThreatConfidenceThreshold: 0.3
      MaxThreatVectors: 50
      IncludeLowConfidenceCapabilities: false
      EnabledLanguages: []  # Empty = all languages

| Option | Default | Description |
|--------|---------|-------------|
| Enabled | true | Enable semantic analysis |
| ThreatConfidenceThreshold | 0.3 | Minimum confidence for threat vectors |
| MaxThreatVectors | 50 | Maximum threats per entrypoint |
| IncludeLowConfidenceCapabilities | false | Include low-confidence capabilities |
| EnabledLanguages | [] | Languages to analyze (empty = all) |
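
To make the effect of these options concrete, the sketch below applies the threshold, the cap, and the language filter to a list of candidate threats. It assumes the threshold is inclusive and that the highest-confidence threats are kept when the cap is reached; neither detail is stated above. All type and function names are hypothetical.

```typescript
// Mirrors the documented options and defaults (camelCase names are illustrative).
interface SemanticOptions {
  enabled: boolean;
  threatConfidenceThreshold: number;
  maxThreatVectors: number;
  includeLowConfidenceCapabilities: boolean;
  enabledLanguages: string[];
}

const DEFAULT_OPTIONS: SemanticOptions = {
  enabled: true,
  threatConfidenceThreshold: 0.3,
  maxThreatVectors: 50,
  includeLowConfidenceCapabilities: false,
  enabledLanguages: [], // empty = all languages
};

interface Threat {
  type: string;
  confidence: number;
}

// An empty list enables every language.
function isLanguageEnabled(language: string, options: SemanticOptions): boolean {
  return options.enabledLanguages.length === 0 || options.enabledLanguages.includes(language);
}

// Assumption: threshold is inclusive; the cap keeps the most confident threats.
function applyThreatLimits(threats: Threat[], options: SemanticOptions): Threat[] {
  return threats
    .filter((t) => t.confidence >= options.threatConfidenceThreshold)
    .sort((a, b) => b.confidence - a.confidence) // sorts the filtered copy, not the input
    .slice(0, options.maxThreatVectors);
}

const candidates: Threat[] = [
  { type: "Xss", confidence: 0.2 },
  { type: "Ssrf", confidence: 0.3 },
  { type: "SqlInjection", confidence: 0.9 },
];
console.log(applyThreatLimits(candidates, DEFAULT_OPTIONS).map((t) => t.type).join(","));
```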

Determinism Guarantees

All semantic analysis outputs are deterministic:

  1. Capability ordering - Flags are ordered by value (bitmask position)
  2. Threat vector ordering - Ordered by ThreatVectorType enum value
  3. Data boundary ordering - Ordered by (Type, Direction) tuple
  4. Evidence ordering - Alphabetically sorted within each element
  5. JSON serialization - Uses camelCase naming, consistent formatting

This enables reliable diffing of semantic analysis results across scan runs.
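
A minimal sketch of rules 3 and 4 follows. It assumes that Type and Direction are compared by their numeric enum values and that "alphabetically" means an ordinal (code-unit) comparison rather than a locale-aware one. The `Boundary` shape and function names are hypothetical.

```typescript
// Hypothetical shape: type and direction hold numeric enum values.
interface Boundary {
  type: number;
  direction: number;
  evidence: string[];
}

// Ordinal comparison avoids locale-dependent results across machines.
function compareOrdinal(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Returns a new, canonically ordered list without mutating the input.
function canonicalizeBoundaries(boundaries: Boundary[]): Boundary[] {
  return boundaries
    .map((b) => ({ ...b, evidence: [...b.evidence].sort(compareOrdinal) }))
    .sort((a, b) => a.type - b.type || a.direction - b.direction);
}

const unordered: Boundary[] = [
  { type: 3, direction: 1, evidence: ["b", "a"] },
  { type: 0, direction: 2, evidence: [] },
  { type: 0, direction: 0, evidence: ["z"] },
];
console.log(JSON.stringify(canonicalizeBoundaries(unordered)));
```

Two scans that detect the same boundaries in a different order then serialize to the same JSON, which is what makes diffing reliable.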


CLI Usage

Semantic analysis can be enabled via the CLI --semantic flag:

stella scan --semantic docker.io/library/python:3.12

Output includes semantic summary when enabled:

Semantic Analysis:
  Intent: WebServer
  Framework: flask (v3.0.0)
  Capabilities: NetworkListen, NetworkOutbound, FileRead, DatabaseSql, UserInput
  Threat Vectors: 2 (SqlInjection, Ssrf)
  Risk Score: 0.72
  Confidence: High (0.85)

References