StellaOps Bot f1a39c4ce3

2025-12-13 18:08:55 +02:00

Semantic Entrypoint Schema

Part of Sprint 0411 - Semantic Entrypoint Engine (Task 23)

This document defines the schema for semantic entrypoint analysis, which enriches container scan results with application-level intent, capabilities, and threat modeling.

Overview

The Semantic Entrypoint Engine analyzes container entrypoints to infer:

Application Intent - What kind of application is running (web server, worker, CLI, etc.)
Capabilities - What system resources the application accesses (network, filesystem, database, etc.)
Attack Surface - Potential security threat vectors based on capabilities
Data Boundaries - Data flow boundaries with sensitivity classification

This semantic layer enables more precise vulnerability prioritization by understanding which code paths are actually reachable from the entrypoint.

Schema Definitions

SemanticEntrypoint

The root type representing semantic analysis of an entrypoint.

interface SemanticEntrypoint {
  id: string;                        // Unique identifier for this analysis
  specification: EntrypointSpecification;
  intent: ApplicationIntent;
  capabilities: CapabilityClass;     // Bitmask of detected capabilities
  attackSurface: ThreatVector[];
  dataBoundaries: DataFlowBoundary[];
  confidence: SemanticConfidence;
  language?: string;                 // Primary language (python, java, node, dotnet, go)
  framework?: string;                // Detected framework (django, spring-boot, express, etc.)
  frameworkVersion?: string;
  runtimeVersion?: string;
  analyzedAt: string;                // ISO-8601 timestamp
}

ApplicationIntent

Enumeration of application types.

| Value | Description | Common Indicators |
|-------|-------------|-------------------|
| `Unknown` | Intent could not be determined | Fallback |
| `WebServer` | HTTP/HTTPS server | Flask, Django, Express, ASP.NET Core, Gin |
| `Worker` | Background job processor | Celery, Sidekiq, BackgroundService |
| `CliTool` | Command-line interface | Click, argparse, Cobra, Picocli |
| `Serverless` | FaaS function | Lambda handler, Cloud Functions |
| `StreamProcessor` | Event stream handler | Kafka Streams, Flink |
| `RpcServer` | RPC/gRPC server | gRPC, Thrift |
| `Daemon` | Long-running service | Custom main loops |
| `TestRunner` | Test execution | pytest, JUnit, xunit |
| `BatchJob` | Scheduled/periodic task | Cron-style entry |
| `Proxy` | Network proxy/gateway | Envoy, nginx config |

CapabilityClass (Bitmask)

Flags indicating detected capabilities. Multiple flags can be combined.

| Flag | Value | Description |
|------|-------|-------------|
| `None` | 0x0 | No capabilities detected |
| `NetworkListen` | 0x1 | Binds to network ports |
| `NetworkOutbound` | 0x2 | Makes outbound network requests |
| `FileRead` | 0x4 | Reads from filesystem |
| `FileWrite` | 0x8 | Writes to filesystem |
| `ProcessSpawn` | 0x10 | Spawns child processes |
| `DatabaseSql` | 0x20 | SQL database access |
| `DatabaseNoSql` | 0x40 | NoSQL database access |
| `MessageQueue` | 0x80 | Message queue producer/consumer |
| `CacheAccess` | 0x100 | Cache system access (Redis, Memcached) |
| `CryptoSign` | 0x200 | Cryptographic signing operations |
| `CryptoEncrypt` | 0x400 | Encryption/decryption operations |
| `UserInput` | 0x800 | Processes user input |
| `SecretAccess` | 0x1000 | Reads secrets/credentials |
| `CloudSdk` | 0x2000 | Cloud provider SDK usage |
| `ContainerApi` | 0x4000 | Container/orchestration API access |
| `SystemCall` | 0x8000 | Direct syscall/FFI usage |
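
Because the flags are powers of two, a set of capabilities fits in a single integer and can be tested with bitwise operators. The following is a minimal sketch of that idea, not the scanner's implementation: the `Capability` object mirrors the table above, while `hasCapabilities` and `capabilityNames` are hypothetical helper names introduced only for illustration.

```typescript
// Flag values copied from the CapabilityClass table.
const Capability = {
  None: 0x0,
  NetworkListen: 0x1,
  NetworkOutbound: 0x2,
  FileRead: 0x4,
  FileWrite: 0x8,
  ProcessSpawn: 0x10,
  DatabaseSql: 0x20,
  DatabaseNoSql: 0x40,
  MessageQueue: 0x80,
  CacheAccess: 0x100,
  CryptoSign: 0x200,
  CryptoEncrypt: 0x400,
  UserInput: 0x800,
  SecretAccess: 0x1000,
  CloudSdk: 0x2000,
  ContainerApi: 0x4000,
  SystemCall: 0x8000,
} as const;

type CapabilityName = keyof typeof Capability;

// Hypothetical helper: true when every bit in `required` is set in `mask`.
function hasCapabilities(mask: number, required: number): boolean {
  return (mask & required) === required;
}

// Hypothetical helper: expand a mask into flag names, lowest bit first.
// `None` never appears because its value has no bits set.
function capabilityNames(mask: number): CapabilityName[] {
  return (Object.keys(Capability) as CapabilityName[])
    .filter((name) => (mask & Capability[name]) !== 0)
    .sort((a, b) => Capability[a] - Capability[b]);
}

// Combine flags with bitwise OR.
const webAppMask =
  Capability.NetworkListen | Capability.DatabaseSql | Capability.UserInput;
```

Here `webAppMask` evaluates to `0x821`, and `capabilityNames(webAppMask)` yields the three flag names in bit order.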

ThreatVector

Represents a potential attack vector.

interface ThreatVector {
  type: ThreatVectorType;
  confidence: number;                // 0.0 to 1.0
  contributingCapabilities: CapabilityClass;
  evidence: string[];
  cweId?: number;                    // CWE identifier
  owaspCategory?: string;            // OWASP category
}

ThreatVectorType

| Type | CWE | OWASP | Triggered By |
|------|-----|-------|--------------|
| `SqlInjection` | 89 | A03:Injection | DatabaseSql + UserInput |
| `CommandInjection` | 78 | A03:Injection | ProcessSpawn + UserInput |
| `PathTraversal` | 22 | A01:Broken Access Control | FileRead/FileWrite + UserInput |
| `Ssrf` | 918 | A10:SSRF | NetworkOutbound + UserInput |
| `Xss` | 79 | A03:Injection | NetworkListen + UserInput |
| `InsecureDeserialization` | 502 | A08:Software and Data Integrity | UserInput + dynamic types |
| `SensitiveDataExposure` | 200 | A02:Cryptographic Failures | SecretAccess + NetworkListen |
| `BrokenAuthentication` | 287 | A07:Identification and Auth | NetworkListen + SecretAccess |
| `InsufficientLogging` | 778 | A09:Logging Failures | NetworkListen without logging |
| `CryptoWeakness` | 327 | A02:Cryptographic Failures | CryptoSign/CryptoEncrypt |
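
The "Triggered By" column can be read as a set of bitmask rules. The sketch below encodes the rows that depend only on capability flags; `InsecureDeserialization` and `InsufficientLogging` are left out because their triggers ("dynamic types", "without logging") are not capability flags. The `ThreatRule` shape, the `allOf`/`anyOf` fields, and `candidateThreats` are assumptions made for illustration, and the real engine may weigh evidence and confidence differently.

```typescript
// Capability bits used below, copied from the CapabilityClass table.
const NetworkListen = 0x1;
const NetworkOutbound = 0x2;
const FileRead = 0x4;
const FileWrite = 0x8;
const ProcessSpawn = 0x10;
const DatabaseSql = 0x20;
const CryptoSign = 0x200;
const CryptoEncrypt = 0x400;
const UserInput = 0x800;
const SecretAccess = 0x1000;

// Hypothetical rule shape; not part of the documented schema.
interface ThreatRule {
  type: string;
  cweId: number;
  owaspCategory: string;
  allOf: number; // every one of these bits must be set
  anyOf: number; // at least one of these bits must be set (0 = no such condition)
}

// Rows of the ThreatVectorType table, in table order.
const THREAT_RULES: ThreatRule[] = [
  { type: "SqlInjection", cweId: 89, owaspCategory: "A03:Injection", allOf: DatabaseSql | UserInput, anyOf: 0 },
  { type: "CommandInjection", cweId: 78, owaspCategory: "A03:Injection", allOf: ProcessSpawn | UserInput, anyOf: 0 },
  { type: "PathTraversal", cweId: 22, owaspCategory: "A01:Broken Access Control", allOf: UserInput, anyOf: FileRead | FileWrite },
  { type: "Ssrf", cweId: 918, owaspCategory: "A10:SSRF", allOf: NetworkOutbound | UserInput, anyOf: 0 },
  { type: "Xss", cweId: 79, owaspCategory: "A03:Injection", allOf: NetworkListen | UserInput, anyOf: 0 },
  { type: "SensitiveDataExposure", cweId: 200, owaspCategory: "A02:Cryptographic Failures", allOf: SecretAccess | NetworkListen, anyOf: 0 },
  { type: "BrokenAuthentication", cweId: 287, owaspCategory: "A07:Identification and Auth", allOf: NetworkListen | SecretAccess, anyOf: 0 },
  { type: "CryptoWeakness", cweId: 327, owaspCategory: "A02:Cryptographic Failures", allOf: 0, anyOf: CryptoSign | CryptoEncrypt },
];

// Return every rule whose capability preconditions are satisfied by `mask`.
function candidateThreats(mask: number): ThreatRule[] {
  return THREAT_RULES.filter(
    (rule) =>
      (mask & rule.allOf) === rule.allOf &&
      (rule.anyOf === 0 || (mask & rule.anyOf) !== 0)
  );
}
```

For a mask of `NetworkListen | DatabaseSql | UserInput`, this returns the `SqlInjection` and `Xss` rules, in table order.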

DataFlowBoundary

Represents a data flow boundary crossing.

interface DataFlowBoundary {
  type: DataFlowBoundaryType;
  direction: DataFlowDirection;      // Inbound | Outbound | Bidirectional
  sensitivity: DataSensitivity;      // Public | Internal | Confidential | Restricted
  confidence: number;
  port?: number;                     // For network boundaries
  protocol?: string;                 // http, grpc, amqp, etc.
  evidence: string[];
}

DataFlowBoundaryType

| Type | Security Sensitive | Description |
|------|--------------------|-------------|
| `HttpRequest` | Yes | HTTP/HTTPS endpoint |
| `GrpcCall` | Yes | gRPC service |
| `WebSocket` | Yes | WebSocket connection |
| `DatabaseQuery` | Yes | Database queries |
| `MessageBroker` | No | Message queue pub/sub |
| `FileSystem` | No | File I/O boundary |
| `Cache` | No | Cache read/write |
| `ExternalApi` | Yes | Third-party API calls |
| `CloudService` | Yes | Cloud provider services |
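
The "Security Sensitive" column amounts to a fixed lookup per boundary type. As a rough sketch, it could be encoded as a record and used to count sensitive boundaries; `SECURITY_SENSITIVE` and `countSensitiveBoundaries` are hypothetical names, and it is an assumption that a sensitive-boundary count would be derived this way.

```typescript
type DataFlowBoundaryType =
  | "HttpRequest"
  | "GrpcCall"
  | "WebSocket"
  | "DatabaseQuery"
  | "MessageBroker"
  | "FileSystem"
  | "Cache"
  | "ExternalApi"
  | "CloudService";

// Values copied from the "Security Sensitive" column of the table.
const SECURITY_SENSITIVE: Record<DataFlowBoundaryType, boolean> = {
  HttpRequest: true,
  GrpcCall: true,
  WebSocket: true,
  DatabaseQuery: true,
  MessageBroker: false,
  FileSystem: false,
  Cache: false,
  ExternalApi: true,
  CloudService: true,
};

// Hypothetical helper: number of boundaries whose type is security sensitive.
function countSensitiveBoundaries(types: DataFlowBoundaryType[]): number {
  return types.filter((t) => SECURITY_SENSITIVE[t]).length;
}
```

Using a `Record` keyed by the union type means the compiler flags any boundary type that is added later without a sensitivity value.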

SemanticConfidence

Confidence scoring for semantic analysis.

interface SemanticConfidence {
  score: number;                     // 0.0 to 1.0
  tier: ConfidenceTier;
  reasons: string[];
}

enum ConfidenceTier {
  Unknown = 0,
  Low = 1,
  Medium = 2,
  High = 3,
  Definitive = 4
}

| Tier | Score Range | Description |
|------|-------------|-------------|
| `Unknown` | 0.0 | No analysis possible |
| `Low` | 0.0-0.4 | Heuristic guess only |
| `Medium` | 0.4-0.7 | Partial evidence |
| `High` | 0.7-0.9 | Strong indicators |
| `Definitive` | 0.9-1.0 | Explicit declaration found |
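
The ranges in the table share their endpoints (0.4, 0.7, 0.9), and the table does not say which tier a boundary score belongs to. The sketch below assumes each range includes its lower bound and excludes its upper bound, with exactly 0.0 mapping to `Unknown`; that convention and the `tierForScore` name are assumptions, not something the schema states.

```typescript
enum ConfidenceTier {
  Unknown = 0,
  Low = 1,
  Medium = 2,
  High = 3,
  Definitive = 4,
}

// Map a score in [0.0, 1.0] to a tier.
// Assumption: lower bounds are inclusive, upper bounds exclusive.
function tierForScore(score: number): ConfidenceTier {
  // The negated comparison also rejects NaN.
  if (!(score >= 0 && score <= 1)) {
    throw new RangeError("score must be between 0.0 and 1.0");
  }
  if (score === 0) return ConfidenceTier.Unknown;
  if (score < 0.4) return ConfidenceTier.Low;
  if (score < 0.7) return ConfidenceTier.Medium;
  if (score < 0.9) return ConfidenceTier.High;
  return ConfidenceTier.Definitive;
}
```

Under this convention a score of 0.85 maps to `High` and a score of exactly 0.9 maps to `Definitive`.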

SBOM Property Extensions

When semantic data is included in CycloneDX or SPDX SBOMs, the following property namespace is used:

stellaops:semantic.*

Property Names

| Property | Type | Description |
|----------|------|-------------|
| `stellaops:semantic.intent` | string | ApplicationIntent value |
| `stellaops:semantic.capabilities` | string | Comma-separated capability names |
| `stellaops:semantic.capability.count` | int | Number of detected capabilities |
| `stellaops:semantic.threats` | JSON | Array of threat vector summaries |
| `stellaops:semantic.threat.count` | int | Number of identified threats |
| `stellaops:semantic.risk.score` | float | Overall risk score (0.0-1.0) |
| `stellaops:semantic.confidence` | float | Confidence score (0.0-1.0) |
| `stellaops:semantic.confidence.tier` | string | Confidence tier name |
| `stellaops:semantic.language` | string | Primary language |
| `stellaops:semantic.framework` | string | Detected framework |
| `stellaops:semantic.framework.version` | string | Framework version |
| `stellaops:semantic.boundary.count` | int | Number of data boundaries |
| `stellaops:semantic.boundary.sensitive.count` | int | Security-sensitive boundaries |
| `stellaops:semantic.owasp.categories` | string | Comma-separated OWASP categories |
| `stellaops:semantic.cwe.ids` | string | Comma-separated CWE IDs |
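
CycloneDX carries custom properties as name/value pairs whose values are strings, so typed values need to be serialized. The sketch below shows one way a subset of the properties above might be produced. The `SemanticSummary` input type and `toSbomProperties` are hypothetical, and the exact serialization (no space after commas, default number formatting) is an assumption.

```typescript
// CycloneDX-style property: both name and value are strings.
interface SbomProperty {
  name: string;
  value: string;
}

// Hypothetical input shape, introduced only for this example.
interface SemanticSummary {
  intent: string;
  capabilities: string[];
  confidenceScore: number;
  confidenceTier: string;
  cweIds: number[];
}

// Emit a subset of the stellaops:semantic.* properties.
function toSbomProperties(summary: SemanticSummary): SbomProperty[] {
  const ns = "stellaops:semantic.";
  return [
    { name: ns + "intent", value: summary.intent },
    { name: ns + "capabilities", value: summary.capabilities.join(",") },
    { name: ns + "capability.count", value: String(summary.capabilities.length) },
    { name: ns + "confidence", value: String(summary.confidenceScore) },
    { name: ns + "confidence.tier", value: summary.confidenceTier },
    { name: ns + "cwe.ids", value: summary.cweIds.join(",") },
  ];
}
```

Emitting the properties in a fixed order keeps the resulting SBOM stable across runs, which fits the determinism goals described later in this document.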

RichGraph Integration

Semantic data is attached to richgraph-v1 nodes via the Attributes dictionary:

| Attribute Key | Description |
|---------------|-------------|
| `semantic_intent` | ApplicationIntent value |
| `semantic_capabilities` | Comma-separated capability flags |
| `semantic_threats` | Comma-separated threat types |
| `semantic_risk_score` | Risk score (formatted to 3 decimal places) |
| `semantic_confidence` | Confidence score |
| `semantic_confidence_tier` | Confidence tier name |
| `semantic_framework` | Framework name |
| `semantic_framework_version` | Framework version |
| `is_entrypoint` | "true" if node is an entrypoint |
| `semantic_boundaries` | JSON array of boundary types |
| `owasp_category` | OWASP category if applicable |
| `cwe_id` | CWE identifier if applicable |
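
Since the Attributes dictionary holds strings, each value has to be flattened before it is attached to a node. A minimal sketch of that flattening follows; the `NodeSemantics` input type and `toRichGraphAttributes` are hypothetical names, and omitting `semantic_framework` when no framework was detected is an assumption.

```typescript
// Hypothetical input shape, introduced only for this example.
interface NodeSemantics {
  intent: string;
  threats: string[];
  riskScore: number;
  boundaryTypes: string[];
  isEntrypoint: boolean;
  framework?: string;
}

// Flatten semantic data into string attributes keyed as in the table above.
function toRichGraphAttributes(s: NodeSemantics): Record<string, string> {
  const attrs: Record<string, string> = {
    semantic_intent: s.intent,
    semantic_threats: s.threats.join(","),
    // Fixed 3 decimal places, as the table specifies.
    semantic_risk_score: s.riskScore.toFixed(3),
    semantic_boundaries: JSON.stringify(s.boundaryTypes),
  };
  if (s.isEntrypoint) {
    attrs["is_entrypoint"] = "true";
  }
  if (s.framework !== undefined) {
    attrs["semantic_framework"] = s.framework;
  }
  return attrs;
}
```

With a risk score of 0.72, `semantic_risk_score` becomes the string `"0.720"`; the fixed width keeps textual diffs of graph output stable.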

Language Adapter Support

The following language-specific adapters are available:

| Language | Adapter | Supported Frameworks |
|----------|---------|----------------------|
| Python | `PythonSemanticAdapter` | Django, Flask, FastAPI, Celery, Click |
| Java | `JavaSemanticAdapter` | Spring Boot, Quarkus, Micronaut, Kafka Streams |
| Node.js | `NodeSemanticAdapter` | Express, NestJS, Fastify, Koa |
| .NET | `DotNetSemanticAdapter` | ASP.NET Core, Worker Service, Console |
| Go | `GoSemanticAdapter` | net/http, Gin, Echo, Cobra, gRPC |

Configuration

Semantic analysis is configured via the Scanner:EntryTrace:Semantic configuration section:

Scanner:
  EntryTrace:
    Semantic:
      Enabled: true
      ThreatConfidenceThreshold: 0.3
      MaxThreatVectors: 50
      IncludeLowConfidenceCapabilities: false
      EnabledLanguages: []  # Empty = all languages

| Option | Default | Description |
|--------|---------|-------------|
| `Enabled` | true | Enable semantic analysis |
| `ThreatConfidenceThreshold` | 0.3 | Minimum confidence for threat vectors |
| `MaxThreatVectors` | 50 | Maximum threats per entrypoint |
| `IncludeLowConfidenceCapabilities` | false | Include low-confidence capabilities |
| `EnabledLanguages` | [] | Languages to analyze (empty = all) |
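
To make the effect of these options concrete, the sketch below applies `ThreatConfidenceThreshold`, `MaxThreatVectors`, and `EnabledLanguages` to in-memory data. Several details are assumptions rather than documented behavior: the threshold is treated as inclusive, the cap keeps the highest-confidence threats, language names are compared exactly, and the camelCase `SemanticOptions` interface is a hypothetical mirror of the configuration section.

```typescript
// Hypothetical mirror of the Scanner:EntryTrace:Semantic section.
interface SemanticOptions {
  threatConfidenceThreshold: number;
  maxThreatVectors: number;
  enabledLanguages: string[];
}

interface Threat {
  type: string;
  confidence: number;
}

// An empty list means every language is analyzed.
function isLanguageEnabled(language: string, options: SemanticOptions): boolean {
  return (
    options.enabledLanguages.length === 0 ||
    options.enabledLanguages.indexOf(language) !== -1
  );
}

// Drop threats below the threshold, cap the list at maxThreatVectors
// (keeping the highest-confidence entries), then restore the input order.
function selectThreats(threats: Threat[], options: SemanticOptions): Threat[] {
  const kept = threats
    .map((threat, index) => ({ threat, index }))
    .filter((e) => e.threat.confidence >= options.threatConfidenceThreshold);
  // Highest confidence first; ties broken by original position.
  kept.sort(
    (a, b) => b.threat.confidence - a.threat.confidence || a.index - b.index
  );
  return kept
    .slice(0, options.maxThreatVectors)
    .sort((a, b) => a.index - b.index)
    .map((e) => e.threat);
}
```

Restoring the input order after capping means the result keeps whatever ordering the caller already established for the threat list.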

Determinism Guarantees

All semantic analysis outputs are deterministic:

Capability ordering - Flags are ordered by value (bitmask position)
Threat vector ordering - Ordered by ThreatVectorType enum value
Data boundary ordering - Ordered by (Type, Direction) tuple
Evidence ordering - Alphabetically sorted within each element
JSON serialization - Uses camelCase naming, consistent formatting

This enables reliable diffing of semantic analysis results across scan runs.
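
As an illustration of the boundary and evidence ordering rules, the sketch below normalizes a list of boundaries so that two runs producing the same items in a different order serialize identically. The ordinal positions are assumed to follow the order in which the values are listed in this document, evidence is compared by code unit rather than by locale, and `normalizeBoundaries` is a hypothetical name.

```typescript
// Assumed enum order: as listed in the DataFlowBoundaryType table
// and the DataFlowDirection comment.
const BOUNDARY_TYPE_ORDER = [
  "HttpRequest", "GrpcCall", "WebSocket", "DatabaseQuery", "MessageBroker",
  "FileSystem", "Cache", "ExternalApi", "CloudService",
];
const DIRECTION_ORDER = ["Inbound", "Outbound", "Bidirectional"];

interface Boundary {
  type: string;
  direction: string;
  evidence: string[];
}

// Locale-independent string comparison (by UTF-16 code unit).
function compareOrdinal(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Return a copy sorted by (type, direction), with evidence sorted per item.
// The input array and its evidence arrays are left untouched.
function normalizeBoundaries(boundaries: Boundary[]): Boundary[] {
  return boundaries
    .map((b) => ({
      type: b.type,
      direction: b.direction,
      evidence: b.evidence.slice().sort(compareOrdinal),
    }))
    .sort(
      (x, y) =>
        BOUNDARY_TYPE_ORDER.indexOf(x.type) - BOUNDARY_TYPE_ORDER.indexOf(y.type) ||
        DIRECTION_ORDER.indexOf(x.direction) - DIRECTION_ORDER.indexOf(y.direction)
    );
}
```

Serializing the normalized list with `JSON.stringify` then gives byte-identical output for any permutation of the same boundaries, which is what makes diffing across scan runs reliable.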

CLI Usage

Semantic analysis can be enabled via the CLI --semantic flag:

stella scan --semantic docker.io/library/python:3.12

Output includes semantic summary when enabled:

Semantic Analysis:
  Intent: WebServer
  Framework: flask (v3.0.0)
  Capabilities: NetworkListen, FileRead, DatabaseSql
  Threat Vectors: 2 (SqlInjection, Ssrf)
  Risk Score: 0.72
  Confidence: High (0.85)
