# Callgraph Schema Reference

This document describes the `stella.callgraph.v1` schema used for representing call graphs in StellaOps.

## Schema Version

**Current Version:** `stella.callgraph.v1`

All call graphs should include the `schema` field set to `stella.callgraph.v1`. Legacy call graphs without this field are automatically migrated on ingestion.

## Document Structure

A `CallgraphDocument` contains the following top-level fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `schema` | string | Yes | Schema identifier: `stella.callgraph.v1` |
| `scanKey` | string | No | Scan context identifier |
| `language` | CallgraphLanguage | No | Primary language of the call graph |
| `artifacts` | CallgraphArtifact[] | No | Artifacts included in the graph |
| `nodes` | CallgraphNode[] | Yes | Graph nodes representing symbols |
| `edges` | CallgraphEdge[] | Yes | Call edges between nodes |
| `entrypoints` | CallgraphEntrypoint[] | No | Discovered entrypoints |
| `metadata` | CallgraphMetadata | No | Graph-level metadata |
| `id` | string | Yes | Unique graph identifier |
| `component` | string | No | Component name |
| `version` | string | No | Component version |
| `ingestedAt` | DateTimeOffset | No | Ingestion timestamp (ISO 8601) |
| `graphHash` | string | No | Content hash for deduplication |

### Legacy Fields

These fields are preserved for backward compatibility:

| Field | Type | Description |
|-------|------|-------------|
| `languageString` | string | Legacy language string |
| `roots` | CallgraphRoot[] | Legacy root/entrypoint representation |
| `schemaVersion` | string | Legacy schema version field |

## Enumerations

### CallgraphLanguage

Supported languages for call graph analysis:

| Value | Description |
|-------|-------------|
| `Unknown` | Language not determined |
| `DotNet` | .NET (C#, F#, VB.NET) |
| `Java` | Java and JVM languages |
| `Node` | Node.js / JavaScript / TypeScript |
| `Python` | Python |
| `Go` | Go |
| `Rust` | Rust |
| `Ruby` | Ruby |
| `Php` | PHP |
| `Binary` | Native binary (ELF, PE) |
| `Swift` | Swift |
| `Kotlin` | Kotlin |

### SymbolVisibility

Access visibility levels for symbols:

| Value | Description |
|-------|-------------|
| `Unknown` | Visibility not determined |
| `Public` | Publicly accessible |
| `Internal` | Internal to assembly/module |
| `Protected` | Protected (subclass accessible) |
| `Private` | Private to containing type |

### EdgeKind

Edge classification based on analysis confidence:

| Value | Description | Confidence |
|-------|-------------|------------|
| `Static` | Statically determined call | High |
| `Heuristic` | Heuristically inferred | Medium |
| `Runtime` | Runtime-observed edge | Highest |

### EdgeReason

Reason codes explaining why an edge exists (critical for explainability):

| Value | Description | Typical Kind |
|-------|-------------|--------------|
| `DirectCall` | Direct method/function call | Static |
| `VirtualCall` | Virtual/interface dispatch | Static |
| `ReflectionString` | Reflection-based invocation | Heuristic |
| `DiBinding` | Dependency injection binding | Heuristic |
| `DynamicImport` | Dynamic import/require | Heuristic |
| `NewObj` | Constructor/object instantiation | Static |
| `DelegateCreate` | Delegate/function pointer creation | Static |
| `AsyncContinuation` | Async/await continuation | Static |
| `EventHandler` | Event handler subscription | Heuristic |
| `GenericInstantiation` | Generic type instantiation | Static |
| `NativeInterop` | Native interop (P/Invoke, JNI, FFI) | Static |
| `RuntimeMinted` | Runtime-minted edge from execution | Runtime |
| `Unknown` | Reason could not be determined | - |

### EntrypointKind

Types of entrypoints:

| Value | Description |
|-------|-------------|
| `Unknown` | Type not determined |
| `Http` | HTTP endpoint |
| `Grpc` | gRPC endpoint |
| `Cli` | CLI command handler |
| `Job` | Background job |
| `Event` | Event handler |
| `MessageQueue` | Message queue consumer |
| `Timer` | Timer/scheduled task |
| `Test` | Test method |
| `Main` | Main entry point |
| `ModuleInit` | Module initializer |
| `StaticConstructor` | Static constructor |

### EntrypointFramework

Frameworks that expose entrypoints:

| Value | Description | Language |
|-------|-------------|----------|
| `Unknown` | Framework not determined | - |
| `AspNetCore` | ASP.NET Core | DotNet |
| `MinimalApi` | ASP.NET Core Minimal APIs | DotNet |
| `Spring` | Spring Framework | Java |
| `SpringBoot` | Spring Boot | Java |
| `Express` | Express.js | Node |
| `Fastify` | Fastify | Node |
| `NestJs` | NestJS | Node |
| `FastApi` | FastAPI | Python |
| `Flask` | Flask | Python |
| `Django` | Django | Python |
| `Rails` | Ruby on Rails | Ruby |
| `Gin` | Gin | Go |
| `Echo` | Echo | Go |
| `Actix` | Actix Web | Rust |
| `Rocket` | Rocket | Rust |
| `AzureFunctions` | Azure Functions | Multi |
| `AwsLambda` | AWS Lambda | Multi |
| `CloudFunctions` | Google Cloud Functions | Multi |

### EntrypointPhase

Execution phase for entrypoints:

| Value | Description |
|-------|-------------|
| `ModuleInit` | Module/assembly initialization |
| `AppStart` | Application startup (Main) |
| `Runtime` | Runtime request handling |
| `Shutdown` | Shutdown/cleanup handlers |

## Node Structure

A `CallgraphNode` represents a symbol (method, function, type) in the call graph:

```json
{
  "id": "n001",
  "nodeId": "n001",
  "name": "GetWeatherForecast",
  "kind": "method",
  "namespace": "SampleApi.Controllers",
  "file": "WeatherForecastController.cs",
  "line": 15,
  "symbolKey": "SampleApi.Controllers.WeatherForecastController::GetWeatherForecast()",
  "artifactKey": "SampleApi.dll",
  "visibility": "Public",
  "isEntrypointCandidate": true,
  "attributes": {
    "returnType": "IEnumerable",
    "httpMethod": "GET",
    "route": "/weatherforecast"
  },
  "flags": 3
}
```

### Node Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Unique identifier within the graph |
| `nodeId` | string | No | Alias for id (v1 schema convention) |
| `name` | string | Yes | Human-readable symbol name |
| `kind` | string | Yes | Symbol kind (method, function, class) |
| `namespace` | string | No | Namespace or module path |
| `file` | string | No | Source file path |
| `line` | int | No | Source line number |
| `symbolKey` | string | No | Canonical symbol key (v1) |
| `artifactKey` | string | No | Reference to containing artifact |
| `visibility` | SymbolVisibility | No | Access visibility |
| `isEntrypointCandidate` | bool | No | Whether node is an entrypoint candidate |
| `purl` | string | No | Package URL for external packages |
| `symbolDigest` | string | No | Content-addressed symbol digest |
| `attributes` | object | No | Additional attributes |
| `flags` | int | No | Bitmask for efficient filtering |

### Symbol Key Format

The `symbolKey` follows a canonical format:

```
{Namespace}.{Type}[`Arity][+Nested]::{Method}[`Arity]({ParamTypes})
```

Examples:

- `System.String::Concat(string, string)`
- `MyApp.Controllers.UserController::GetUser(int)`
- ``System.Collections.Generic.List`1::Add(T)``

## Edge Structure

A `CallgraphEdge` represents a call relationship between two symbols:

```json
{
  "sourceId": "n001",
  "targetId": "n002",
  "from": "n001",
  "to": "n002",
  "type": "call",
  "kind": "Static",
  "reason": "DirectCall",
  "weight": 1.0,
  "offset": 42,
  "isResolved": true,
  "provenance": "static-analysis"
}
```

### Edge Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `sourceId` | string | Yes | Source node ID (caller) |
| `targetId` | string | Yes | Target node ID (callee) |
| `from` | string | No | Alias for sourceId (v1) |
| `to` | string | No | Alias for targetId (v1) |
| `type` | string | No | Legacy edge type |
| `kind` | EdgeKind | No | Edge classification |
| `reason` | EdgeReason | No | Reason for edge existence |
| `weight` | double | No | Confidence weight (0.0-1.0) |
| `offset` | int | No | IL/bytecode offset |
| `isResolved` | bool | No | Whether target was fully resolved |
| `provenance` | string | No | Provenance information |
| `candidates` | string[] | No | Virtual dispatch candidates |

## Entrypoint Structure

A `CallgraphEntrypoint` represents a discovered entrypoint:

```json
{
  "nodeId": "n001",
  "kind": "Http",
  "route": "/api/users/{id}",
  "httpMethod": "GET",
  "framework": "AspNetCore",
  "source": "attribute",
  "phase": "Runtime",
  "order": 0
}
```

### Entrypoint Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `nodeId` | string | Yes | Reference to the node |
| `kind` | EntrypointKind | Yes | Type of entrypoint |
| `route` | string | No | HTTP route pattern |
| `httpMethod` | string | No | HTTP method (GET, POST, etc.) |
| `framework` | EntrypointFramework | No | Framework exposing the entrypoint |
| `source` | string | No | Discovery source |
| `phase` | EntrypointPhase | No | Execution phase |
| `order` | int | No | Deterministic ordering |

## Determinism Requirements

For reproducible analysis, call graphs must be deterministic:

1. **Stable Ordering**
   - Nodes must be sorted by `id` (ordinal string comparison)
   - Edges must be sorted by `sourceId`, then `targetId`
   - Entrypoints must be sorted by `order`

2. **Enum Serialization**
   - All enums serialize as camelCase strings
   - Example: `EdgeReason.DirectCall` → `"directCall"`

3. **Timestamps**
   - All timestamps must be UTC ISO 8601 format
   - Example: `2025-01-15T10:00:00Z`

4. **Content Hashing**
   - The `graphHash` field should contain a stable content hash
   - Hash algorithm: SHA-256
   - Format: `sha256:{hex-digest}`

## Schema Migration

Legacy call graphs without the `schema` field are automatically migrated:

1. **Schema Field**: Set to `stella.callgraph.v1`
2. **Language Parsing**: String language converted to `CallgraphLanguage` enum
3. **Visibility Inference**: Inferred from symbol key patterns:
   - Contains `.Internal.` → `Internal`
   - Contains `._` or `<` → `Private`
   - Default → `Public`
4. **Edge Reason Inference**: Based on legacy `type` field:
   - `call`, `direct` → `DirectCall`
   - `virtual`, `callvirt` → `VirtualCall`
   - `newobj` → `NewObj`
   - etc.
5. **Entrypoint Inference**: Built from legacy `roots` and candidate nodes
6. **Symbol Key Generation**: Built from namespace and name if missing

## Validation Rules

Call graphs are validated against these rules:

1. All node `id` values must be unique
2. All edge `sourceId` and `targetId` must reference existing nodes
3. All entrypoint `nodeId` must reference existing nodes
4. Edge `weight` must be between 0.0 and 1.0
5. Artifacts referenced by nodes must exist in the `artifacts` list

## Golden Fixtures

Reference fixtures for testing are located at: `tests/reachability/fixtures/callgraph-schema-v1/`

| Fixture | Description |
|---------|-------------|
| `dotnet-aspnetcore-minimal.json` | ASP.NET Core application |
| `java-spring-boot.json` | Spring Boot application |
| `node-express-api.json` | Express.js API |
| `go-gin-api.json` | Go Gin API |
| `legacy-no-schema.json` | Legacy format for migration testing |
| `all-edge-reasons.json` | All 13 edge reason codes |
| `all-visibility-levels.json` | All 5 visibility levels |

## Related Documentation

- [Reachability Analysis Technical Reference](../reachability/README.md)
- [Schema Migration Implementation](../../src/Signals/StellaOps.Signals/Parsing/CallgraphSchemaMigrator.cs)
- [SPRINT_1100: CallGraph Schema Enhancement](../implplan/SPRINT_1100_0001_0001_callgraph_schema_enhancement.md)
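
## Example: Checking a Graph Against the Rules

The validation rules and stable-ordering requirements in this reference can be illustrated with a small Python sketch that operates on a parsed `CallgraphDocument` as a plain dictionary. It is not the StellaOps implementation. The function names `validate_callgraph`, `sort_for_determinism`, and `graph_hash` are hypothetical. Rule 5 (artifact references) is left out because this reference does not describe the fields of `CallgraphArtifact`. The canonical byte form fed to SHA-256 (compact JSON with sorted keys, `graphHash` excluded) is also an assumption, since only the algorithm and the `sha256:{hex-digest}` format are specified here.

```python
import hashlib
import json


def validate_callgraph(doc: dict) -> list[str]:
    """Return rule violations for validation rules 1-4; empty means valid."""
    errors: list[str] = []
    node_ids = [node["id"] for node in doc.get("nodes", [])]
    known = set(node_ids)

    # Rule 1: node ids must be unique.
    if len(known) != len(node_ids):
        errors.append("node ids are not unique")

    for edge in doc.get("edges", []):
        # Rule 2: both edge endpoints must reference existing nodes.
        for key in ("sourceId", "targetId"):
            if edge.get(key) not in known:
                errors.append(f"edge {key} {edge.get(key)!r} is not a known node")
        # Rule 4: weight is optional, but must lie in [0.0, 1.0] when present.
        weight = edge.get("weight")
        if weight is not None and not 0.0 <= weight <= 1.0:
            errors.append(f"edge weight {weight} is outside 0.0-1.0")

    # Rule 3: entrypoints must reference existing nodes.
    for entry in doc.get("entrypoints", []):
        if entry.get("nodeId") not in known:
            errors.append(f"entrypoint nodeId {entry.get('nodeId')!r} is not a known node")

    return errors


def sort_for_determinism(doc: dict) -> dict:
    """Return a shallow copy with nodes, edges, and entrypoints in stable order.

    Python compares strings by code point, which agrees with ordinal
    comparison for ASCII identifiers such as "n001".
    """
    ordered = dict(doc)
    ordered["nodes"] = sorted(doc.get("nodes", []), key=lambda n: n["id"])
    ordered["edges"] = sorted(
        doc.get("edges", []), key=lambda e: (e["sourceId"], e["targetId"])
    )
    ordered["entrypoints"] = sorted(
        doc.get("entrypoints", []), key=lambda ep: ep.get("order", 0)
    )
    return ordered


def graph_hash(doc: dict) -> str:
    """Compute a `sha256:{hex-digest}` string over an ASSUMED canonical form."""
    ordered = sort_for_determinism(doc)
    ordered.pop("graphHash", None)  # assumption: the hash does not cover itself
    payload = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


example = {
    "schema": "stella.callgraph.v1",
    "id": "graph-1",
    "nodes": [
        {"id": "n002", "name": "Helper", "kind": "method"},
        {"id": "n001", "name": "GetWeatherForecast", "kind": "method"},
    ],
    "edges": [{"sourceId": "n001", "targetId": "n002", "weight": 1.0}],
    "entrypoints": [{"nodeId": "n001", "kind": "Http", "order": 0}],
}

print(validate_callgraph(example))
print([n["id"] for n in sort_for_determinism(example)["nodes"]])
print(graph_hash(example))
```

Because `graph_hash` sorts before serializing, two documents that differ only in the input order of their nodes or edges produce the same digest, which is the property the deduplication use of `graphHash` relies on.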