StellaOps Bot f1a39c4ce3
2025-12-13 18:08:55 +02:00

Semantic Entrypoint Analysis

Part of Sprint 0411 - Semantic Entrypoint Engine

Overview

The Semantic Entrypoint Engine analyzes container entrypoints and infers:

  • Application Intent - What the application is designed to do (web server, CLI tool, worker, etc.)
  • Capabilities - What system resources and external services the application uses
  • Attack Surface - Potential security vulnerabilities based on detected patterns
  • Data Boundaries - I/O edges where data enters or leaves the application

This semantic layer enables more accurate vulnerability prioritization, reachability analysis, and policy decisions.

Schema Definition

SemanticEntrypoint Record

The core output of semantic analysis:

public sealed record SemanticEntrypoint
{
    public required string Id { get; init; }
    public required EntrypointSpecification Specification { get; init; }
    public required ApplicationIntent Intent { get; init; }
    public required CapabilityClass Capabilities { get; init; }
    public required ImmutableArray<ThreatVector> AttackSurface { get; init; }
    public required ImmutableArray<DataFlowBoundary> DataBoundaries { get; init; }
    public required SemanticConfidence Confidence { get; init; }
    public string? Language { get; init; }
    public string? Framework { get; init; }
    public string? FrameworkVersion { get; init; }
    public string? RuntimeVersion { get; init; }
    public ImmutableDictionary<string, string>? Metadata { get; init; }
}

Application Intent

Enumeration of recognized application types:

| Intent | Description | Example Frameworks |
| --- | --- | --- |
| WebServer | HTTP/HTTPS listener | Django, Express, ASP.NET Core |
| CliTool | Command-line utility | Click, Cobra, System.CommandLine |
| Worker | Background job processor | Celery, Sidekiq, Hangfire |
| BatchJob | One-shot data processing | MapReduce, ETL scripts |
| Serverless | FaaS handler | Lambda, Azure Functions |
| Daemon | Long-running background service | systemd units |
| StreamProcessor | Real-time data pipeline | Kafka Streams, Flink |
| RpcServer | gRPC/Thrift server | grpc-go, grpc-dotnet |
| GraphQlServer | GraphQL API | Apollo, Hot Chocolate |
| DatabaseServer | Database engine | PostgreSQL, Redis |
| MessageBroker | Message queue server | RabbitMQ, NATS |
| CacheServer | Cache/session store | Redis, Memcached |
| ProxyGateway | Reverse proxy, API gateway | Envoy, NGINX |

Capability Classes

Flags enum representing detected capabilities:

| Capability | Description | Detection Signals |
| --- | --- | --- |
| NetworkListen | Opens listening socket | http.ListenAndServe, app.listen() |
| NetworkConnect | Makes outbound connections | requests, http.Client |
| FileRead | Reads from filesystem | open(), File.ReadAllText() |
| FileWrite | Writes to filesystem | File write operations |
| ProcessSpawn | Spawns child processes | subprocess, exec.Command |
| DatabaseSql | SQL database access | psycopg2, SqlConnection |
| DatabaseNoSql | NoSQL database access | pymongo, redis |
| MessageQueue | Message broker client | pika, kafka-python |
| CacheAccess | Cache client operations | redis, memcached |
| ExternalHttpApi | External HTTP API calls | REST clients |
| Authentication | Auth operations | passport, JWT libraries |
| SecretAccess | Accesses secrets/credentials | Vault clients, env secrets |
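
A rough sketch of how a flags enum like this can be combined and queried. The bit values are assumptions (the real CapabilityClass layout is not shown in this document), only a subset of the capabilities is included, and CapabilitySketch with its two helpers is hypothetical.

```csharp
using System;

// Assumed bit layout for illustration; the real CapabilityClass values may differ.
[Flags]
public enum CapabilityClass
{
    None = 0,
    NetworkListen = 1 << 0,
    NetworkConnect = 1 << 1,
    FileRead = 1 << 2,
    FileWrite = 1 << 3,
    ProcessSpawn = 1 << 4,
    DatabaseSql = 1 << 5,
}

// Hypothetical helper class, not part of the engine.
public static class CapabilitySketch
{
    // True when every flag in 'required' is also set in 'detected'.
    public static bool HasAll(CapabilityClass detected, CapabilityClass required) =>
        (detected & required) == required;

    // Enum.ToString() joins flag names with ", "; drop the space to get
    // a compact comma-separated list.
    public static string ToPropertyValue(CapabilityClass detected) =>
        detected.ToString().Replace(", ", ",");
}
```

With these assumed values, NetworkListen | DatabaseSql formats as NetworkListen,DatabaseSql, the same shape as the capability strings in the SBOM and RichGraph examples in this document.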

Threat Vectors

Inferred security threats:

| Threat Type | CWE ID | OWASP Category | Contributing Capabilities |
| --- | --- | --- | --- |
| SqlInjection | 89 | A03:2021 | DatabaseSql + UserInput |
| Xss | 79 | A03:2021 | NetworkListen + UserInput |
| Ssrf | 918 | A10:2021 | ExternalHttpApi + UserInput |
| Rce | 94 | A03:2021 | ProcessSpawn + UserInput |
| PathTraversal | 22 | A01:2021 | FileRead + UserInput |
| InsecureDeserialization | 502 | A08:2021 | Deserialization patterns |
| AuthenticationBypass | 287 | A07:2021 | Auth patterns detected |
| CommandInjection | 78 | A03:2021 | ProcessSpawn patterns |
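
The capability combinations in this table can be read as rules: a threat is inferred when all of its contributing capabilities are present. The following is a minimal sketch of that idea, not the engine's actual inference logic. ThreatRule and ThreatInferenceSketch are hypothetical names, the enum bit values are assumed, and only the five rows with an explicit "capability + UserInput" combination are encoded. UserInput is treated as a flag here because the table and the RichGraph example use it that way, although it is not listed under Capability Classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed subset and bit layout; the real CapabilityClass may differ.
[Flags]
public enum CapabilityClass
{
    None = 0,
    NetworkListen = 1 << 0,
    FileRead = 1 << 1,
    ProcessSpawn = 1 << 2,
    DatabaseSql = 1 << 3,
    ExternalHttpApi = 1 << 4,
    UserInput = 1 << 5,
}

// Hypothetical rule shape: one row of the threat table.
public sealed record ThreatRule(
    string ThreatType, int CweId, string OwaspCategory, CapabilityClass Required);

public static class ThreatInferenceSketch
{
    private static readonly ThreatRule[] Rules =
    {
        new("SqlInjection", 89, "A03:2021", CapabilityClass.DatabaseSql | CapabilityClass.UserInput),
        new("Xss", 79, "A03:2021", CapabilityClass.NetworkListen | CapabilityClass.UserInput),
        new("Ssrf", 918, "A10:2021", CapabilityClass.ExternalHttpApi | CapabilityClass.UserInput),
        new("Rce", 94, "A03:2021", CapabilityClass.ProcessSpawn | CapabilityClass.UserInput),
        new("PathTraversal", 22, "A01:2021", CapabilityClass.FileRead | CapabilityClass.UserInput),
    };

    // Returns every rule whose required capabilities are all present, in table order.
    public static IReadOnlyList<ThreatRule> Infer(CapabilityClass detected) =>
        Rules.Where(rule => (detected & rule.Required) == rule.Required).ToList();
}
```

Under these assumptions, NetworkListen | DatabaseSql | UserInput yields SqlInjection and Xss, while DatabaseSql alone yields nothing because UserInput is missing.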

Data Flow Boundaries

I/O edges for data flow analysis:

| Boundary Type | Direction | Security Relevance |
| --- | --- | --- |
| HttpRequest | Inbound | User input entry point |
| HttpResponse | Outbound | Data exposure point |
| DatabaseQuery | Outbound | SQL injection surface |
| FileInput | Inbound | Path traversal surface |
| EnvironmentVar | Inbound | Config injection surface |
| MessageReceive | Inbound | Deserialization surface |
| ProcessSpawn | Outbound | Command injection surface |

Confidence Scoring

All inferences include confidence scores:

public sealed record SemanticConfidence
{
    public double Score { get; init; }           // 0.0-1.0
    public ConfidenceTier Tier { get; init; }    // Unknown, Low, Medium, High, Definitive
    public ImmutableArray<string> ReasoningChain { get; init; }
}

| Tier | Score Range | Description |
| --- | --- | --- |
| Definitive | 0.95-1.0 | Framework explicitly declared |
| High | 0.8-0.95 | Strong pattern match |
| Medium | 0.5-0.8 | Multiple weak signals |
| Low | 0.2-0.5 | Heuristic inference |
| Unknown | 0.0-0.2 | No reliable signals |
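
One way to map a score onto these tiers is sketched below. The ranges share their endpoints (0.95 is listed under both Definitive and High), so the sketch assumes each lower bound is inclusive; the engine may resolve boundaries differently. ConfidenceTierSketch and FromScore are hypothetical names, and the enum is redeclared here only to keep the example self-contained.

```csharp
using System;

public enum ConfidenceTier { Unknown, Low, Medium, High, Definitive }

// Hypothetical helper, not part of the engine.
public static class ConfidenceTierSketch
{
    // Assumes inclusive lower bounds: 0.95 maps to Definitive, 0.8 to High, and so on.
    public static ConfidenceTier FromScore(double score)
    {
        if (double.IsNaN(score) || score < 0.0 || score > 1.0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(score), score, "Score must be between 0.0 and 1.0.");
        }

        return score switch
        {
            >= 0.95 => ConfidenceTier.Definitive,
            >= 0.8 => ConfidenceTier.High,
            >= 0.5 => ConfidenceTier.Medium,
            >= 0.2 => ConfidenceTier.Low,
            _ => ConfidenceTier.Unknown,
        };
    }
}
```

Under this assumption a score of 0.85 maps to High, which agrees with the semantic_confidence and semantic_confidence_tier pair in the RichGraph example in this document.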

Language Adapters

Semantic analysis uses language-specific adapters:

Python Adapter

  • Django: Detects manage.py, INSTALLED_APPS, migrations
  • Flask/FastAPI: Detects Flask(__name__), FastAPI() patterns
  • Celery: Detects Celery() app, @task decorators
  • Click/Typer: Detects CLI decorators
  • Lambda: Detects lambda_handler pattern

Java Adapter

  • Spring Boot: Detects @SpringBootApplication, starter dependencies
  • Quarkus: Detects io.quarkus packages
  • Kafka Streams: Detects kafka-streams dependency
  • Main-Class: Falls back to manifest analysis

Node Adapter

  • Express: Detects express() + listen()
  • NestJS: Detects @nestjs/core dependency
  • Fastify: Detects fastify() patterns
  • CLI bin: Detects bin field in package.json
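
To illustrate the manifest-level signals only, the sketch below checks a package.json document for the @nestjs/core dependency and the bin field. It is an assumption about how such a check could look, not the Node adapter's implementation: NodeManifestSketch and DetectSignals are hypothetical names, only the dependencies section is inspected, and source-level patterns such as express() + listen() or fastify() are out of scope.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the engine.
public static class NodeManifestSketch
{
    // Returns the manifest-level signals found in a package.json document.
    public static IReadOnlyList<string> DetectSignals(string packageJson)
    {
        var signals = new List<string>();
        using var document = JsonDocument.Parse(packageJson);
        var root = document.RootElement;
        if (root.ValueKind != JsonValueKind.Object)
        {
            return signals;
        }

        // NestJS: the framework core package is listed as a dependency.
        if (root.TryGetProperty("dependencies", out var dependencies)
            && dependencies.ValueKind == JsonValueKind.Object
            && dependencies.TryGetProperty("@nestjs/core", out _))
        {
            signals.Add("NestJS");
        }

        // CLI bin: the manifest declares one or more executables.
        if (root.TryGetProperty("bin", out _))
        {
            signals.Add("CLI bin");
        }

        return signals;
    }
}
```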

.NET Adapter

  • ASP.NET Core: Detects Microsoft.AspNetCore references
  • Worker Service: Detects BackgroundService inheritance
  • Console: Detects OutputType=Exe without web deps

Go Adapter

  • net/http: Detects http.ListenAndServe patterns
  • Cobra: Detects github.com/spf13/cobra import
  • gRPC: Detects google.golang.org/grpc import

Integration Points

Entry Trace Pipeline

Semantic analysis integrates after entry trace resolution:

Container Image
     ↓
EntryTraceAnalyzer.ResolveAsync()
     ↓
EntryTraceGraph (nodes, edges, terminals)
     ↓
SemanticEntrypointOrchestrator.AnalyzeAsync()
     ↓
SemanticEntrypoint (intent, capabilities, threats)

SBOM Output

Semantic data appears in CycloneDX properties:

{
  "properties": [
    { "name": "stellaops:semantic.intent", "value": "WebServer" },
    { "name": "stellaops:semantic.capabilities", "value": "NetworkListen,DatabaseSql" },
    { "name": "stellaops:semantic.threats", "value": "[{\"type\":\"SqlInjection\",\"confidence\":0.7}]" },
    { "name": "stellaops:semantic.risk.score", "value": "0.7" },
    { "name": "stellaops:semantic.framework", "value": "django" }
  ]
}
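
A consumer can read these properties back with System.Text.Json. The sketch below assumes the layout shown above: a top-level properties array of name/value string pairs, with the threats value holding a JSON array encoded as a string. SemanticPropertyReader and its methods are hypothetical names, not part of the scanner.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical reader for the "stellaops:semantic.*" properties.
public static class SemanticPropertyReader
{
    private const string Prefix = "stellaops:semantic.";

    // Collects semantic properties keyed by the name with the prefix removed.
    public static Dictionary<string, string> Read(string json)
    {
        var result = new Dictionary<string, string>(StringComparer.Ordinal);
        using var document = JsonDocument.Parse(json);
        var root = document.RootElement;
        if (root.ValueKind != JsonValueKind.Object
            || !root.TryGetProperty("properties", out var properties)
            || properties.ValueKind != JsonValueKind.Array)
        {
            return result;
        }

        foreach (var property in properties.EnumerateArray())
        {
            // Skip entries that are not { "name": string, "value": string }.
            if (property.ValueKind != JsonValueKind.Object
                || !property.TryGetProperty("name", out var name)
                || !property.TryGetProperty("value", out var value)
                || name.ValueKind != JsonValueKind.String
                || value.ValueKind != JsonValueKind.String)
            {
                continue;
            }

            var key = name.GetString()!;
            if (key.StartsWith(Prefix, StringComparison.Ordinal))
            {
                result[key.Substring(Prefix.Length)] = value.GetString()!;
            }
        }

        return result;
    }

    // The threats value is itself JSON, so it needs a second parse.
    public static List<(string Type, double Confidence)> ParseThreats(string threatsJson)
    {
        var threats = new List<(string Type, double Confidence)>();
        using var document = JsonDocument.Parse(threatsJson);
        foreach (var threat in document.RootElement.EnumerateArray())
        {
            threats.Add((
                threat.GetProperty("type").GetString() ?? string.Empty,
                threat.GetProperty("confidence").GetDouble()));
        }

        return threats;
    }
}
```

Read returns keys such as intent, capabilities, and risk.score; passing the threats entry to ParseThreats yields the type and confidence of each inferred threat.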

RichGraph Output

Semantic attributes on entrypoint nodes:

{
  "kind": "entrypoint",
  "attributes": {
    "semantic_intent": "WebServer",
    "semantic_capabilities": "NetworkListen,DatabaseSql,UserInput",
    "semantic_threats": "SqlInjection,Xss",
    "semantic_risk_score": "0.7",
    "semantic_confidence": "0.85",
    "semantic_confidence_tier": "High"
  }
}

Usage Examples

CLI Usage

# Scan with semantic analysis
stella scan myimage:latest --semantic

# Output includes semantic fields
stella scan myimage:latest --format json | jq '.semantic'

Programmatic Usage

using System;
using System.Linq;

// Create orchestrator
var orchestrator = new SemanticEntrypointOrchestrator();

// Create context from entry trace result
var context = orchestrator.CreateContext(entryTraceResult, fileSystem, containerMetadata);

// Run analysis
var result = await orchestrator.AnalyzeAsync(context);

if (result.Success && result.Entrypoint is not null)
{
    Console.WriteLine($"Intent: {result.Entrypoint.Intent}");
    Console.WriteLine($"Capabilities: {result.Entrypoint.Capabilities}");

    // Max() throws on an empty sequence of non-nullable values,
    // so skip the risk score when no threats were inferred.
    if (!result.Entrypoint.AttackSurface.IsDefaultOrEmpty)
    {
        Console.WriteLine($"Risk Score: {result.Entrypoint.AttackSurface.Max(t => t.Confidence)}");
    }
}

Extending the Engine

Adding a New Language Adapter

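  1. Implement ISemanticEntrypointAnalyzer:
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class RubySemanticAdapter : ISemanticEntrypointAnalyzer
{
    public IReadOnlyList<string> SupportedLanguages => new[] { "ruby" };
    public int Priority => 100;

    public ValueTask<SemanticEntrypoint> AnalyzeAsync(
        SemanticAnalysisContext context,
        CancellationToken cancellationToken)
    {
        // Detect Rails, Sinatra, Sidekiq, etc., then build and return a SemanticEntrypoint.
        // Placeholder so the stub compiles until detection is implemented.
        throw new NotImplementedException();
    }
}
  2. Register in SemanticEntrypointOrchestrator.CreateDefaultAdapters().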

Adding a New Capability

  1. Add to CapabilityClass flags enum
  2. Update CapabilityDetector with detection patterns
  3. Update ThreatVectorInferrer if capability contributes to threats
  4. Update DataBoundaryMapper if capability implies I/O boundaries