semi implemented and features implemented save checkpoint

2026-02-08 18:00:49 +02:00
parent 04360dff63
commit 1bf6bbf395
20895 changed files with 716795 additions and 64 deletions

@@ -0,0 +1,51 @@
# Scanner Multi-Language License Detection Framework
## Module
Scanner
## Status
IMPLEMENTED
## Description
License detection framework comprising an SPDX expression categorization service, license text extraction from source files, copyright notice extraction, per-language detectors (Python, Java, Go, Rust, JavaScript/Node.js, .NET), and an aggregation service that merges results across analyzers. No direct match in the known features list.
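
To make the categorization step concrete, the following is a minimal sketch of mapping a single SPDX identifier to a category. It is illustrative only: `LicenseCategory`, `SpdxCategorizer`, and the lookup table are hypothetical names and data, not the actual `LicenseCategorizationService` API, and compound expressions (`AND`, `OR`, `WITH`) are deliberately not handled.

```csharp
using System;
using System.Collections.Generic;

// Demo: categorize a few identifiers and print the result.
foreach (var id in new[] { "MIT", "GPL-3.0-only", "LicenseRef-Acme" })
{
    Console.WriteLine($"{id}: {SpdxCategorizer.Categorize(id)}");
}

// Hypothetical category set; the real service may distinguish more cases.
public enum LicenseCategory { Unknown, Permissive, Copyleft }

// Hypothetical helper, not the StellaOps implementation.
public static class SpdxCategorizer
{
    // Small illustrative table keyed by SPDX identifier (case-insensitive).
    private static readonly Dictionary<string, LicenseCategory> Known =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["MIT"] = LicenseCategory.Permissive,
            ["Apache-2.0"] = LicenseCategory.Permissive,
            ["BSD-3-Clause"] = LicenseCategory.Permissive,
            ["GPL-3.0-only"] = LicenseCategory.Copyleft,
            ["GPL-3.0-or-later"] = LicenseCategory.Copyleft,
            ["AGPL-3.0-only"] = LicenseCategory.Copyleft,
        };

    // Returns Unknown for empty input and for identifiers outside the table.
    public static LicenseCategory Categorize(string spdxId)
    {
        if (string.IsNullOrWhiteSpace(spdxId))
        {
            return LicenseCategory.Unknown;
        }

        return Known.TryGetValue(spdxId.Trim(), out var category)
            ? category
            : LicenseCategory.Unknown;
    }
}
```

Falling back to `Unknown` rather than guessing keeps unrecognized or custom identifiers (for example `LicenseRef-*`) visible for manual review.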
## Implementation Details
- **Core Licensing Framework**:
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/LicenseCategorizationService.cs` - `LicenseCategorizationService` categorizing SPDX license expressions (permissive, copyleft, commercial, etc.)
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/ILicenseCategorizationService.cs` - Interface for license categorization
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/LicenseTextExtractor.cs` - `LicenseTextExtractor` extracting license text from source files (LICENSE, COPYING, etc.)
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/ILicenseTextExtractor.cs` - Interface for text extraction
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/CopyrightExtractor.cs` - `CopyrightExtractor` extracting copyright notices from source files
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/ICopyrightExtractor.cs` - Interface for copyright extraction
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/LicenseDetectionAggregator.cs` - `LicenseDetectionAggregator` merging license detection results across multiple per-language analyzers
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/ILicenseDetectionAggregator.cs` - Interface for aggregation
- **Result Models**:
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/LicenseDetectionResult.cs` - Per-package license detection result
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/LicenseDetectionSummary.cs` - Summary across all packages
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/LicenseTextExtractionResult.cs` - License text extraction result
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/Core/Licensing/CopyrightNotice.cs` - Copyright notice model
- **Per-Language Detectors**:
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python/Internal/Licensing/PythonLicenseDetector.cs` - Python license detection (setup.py, pyproject.toml, PKG-INFO)
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python/Internal/Licensing/SpdxLicenseNormalizer.cs` - SPDX normalization for Python classifiers
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java/Internal/License/JavaLicenseDetector.cs` - Java license detection (pom.xml, META-INF)
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Go/Internal/GoLicenseDetector.cs` - Go license detection (go.mod, vendor)
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Go/Internal/EnhancedGoLicenseDetector.cs` - Enhanced Go license detection with source analysis
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Rust/Internal/EnhancedRustLicenseDetector.cs` - Rust license detection (Cargo.toml)
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node/Internal/Licensing/NodeLicenseDetector.cs` - Node.js license detection (package.json)
  - `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.DotNet/Internal/Licensing/DotNetLicenseDetector.cs` - .NET license detection (.csproj, .nuspec)
- **Evidence**:
  - `src/Scanner/__Libraries/StellaOps.Scanner.Emit/Evidence/LicenseEvidenceBuilder.cs` - `LicenseEvidenceBuilder` building license evidence for attestation
- **Tests**:
  - `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Tests/Licensing/LicenseCategorizationServiceTests.cs` - Categorization tests
  - `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Tests/Licensing/LicenseTextExtractorTests.cs` - Text extraction tests
  - `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Tests/Licensing/CopyrightExtractorTests.cs` - Copyright extraction tests
  - `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Tests/Licensing/LicenseDetectionAggregatorTests.cs` - Aggregation tests
  - `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Tests/Licensing/LicenseDetectionIntegrationTests.cs` - Integration tests
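
The aggregation step listed above (merging results from several per-language analyzers) might look roughly like the sketch below. It assumes a simplified, hypothetical shape: `Detection`, `MergedDetection`, and `DetectionMerger` are invented for illustration and do not mirror `LicenseDetectionResult` or `LicenseDetectionAggregator`; the package identifiers are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: two analyzers report the same license for one package.
var merged = DetectionMerger.Merge(new[]
{
    new Detection("pkg:npm/example-lib@1.0.0", "MIT", "node"),
    new Detection("pkg:npm/example-lib@1.0.0", "mit", "license-file"),
    new Detection("pkg:pypi/example-tool@2.0.0", "Apache-2.0", "python"),
});

foreach (var m in merged)
{
    Console.WriteLine($"{m.PackageId} {m.SpdxId} [{string.Join(", ", m.Sources)}]");
}

// Hypothetical input: one finding from one analyzer.
public sealed record Detection(string PackageId, string SpdxId, string Source);

// Hypothetical output: one entry per (package, license) with all contributing sources.
public sealed record MergedDetection(string PackageId, string SpdxId, IReadOnlyList<string> Sources);

public static class DetectionMerger
{
    public static IReadOnlyList<MergedDetection> Merge(IEnumerable<Detection> detections)
    {
        return detections
            // Same package + same license id (ignoring case) counts as a duplicate.
            .GroupBy(d => (Package: d.PackageId, License: d.SpdxId.ToUpperInvariant()))
            .Select(g => new MergedDetection(
                g.Key.Package,
                g.First().SpdxId, // keep the spelling reported first
                g.Select(d => d.Source)
                    .Distinct(StringComparer.Ordinal)
                    .OrderBy(s => s, StringComparer.Ordinal)
                    .ToList()))
            // Stable ordering so repeated runs produce identical output.
            .OrderBy(m => m.PackageId, StringComparer.Ordinal)
            .ThenBy(m => m.SpdxId, StringComparer.Ordinal)
            .ToList();
    }
}
```

Keeping the list of contributing sources, instead of discarding duplicates outright, preserves which analyzers agreed on a finding.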
## E2E Test Plan
- [ ] Scan a multi-language container image (Python + Java + Node.js) and verify license detection aggregates results from all per-language detectors
- [ ] Verify the `LicenseCategorizationService` correctly classifies SPDX expressions (MIT as permissive, GPL-3.0 as copyleft, etc.)
- [ ] Verify `LicenseTextExtractor` extracts full license text from LICENSE/COPYING files and embedded license headers
- [ ] Verify `CopyrightExtractor` captures copyright notices with correct year ranges and holder names
- [ ] Verify the `LicenseDetectionAggregator` merges results from multiple analyzers without duplicates
- [ ] Verify each per-language detector handles its ecosystem-specific license metadata correctly (Python classifiers, Maven POM licenses, package.json license field)
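
For the copyright check in the plan above (year ranges and holder names), a test fixture could compare extractor output against a simple reference parser such as the sketch below. `ParsedNotice`, `CopyrightParser`, and the regular expression are assumptions made for illustration; they are not the `CopyrightExtractor` implementation or the `CopyrightNotice` model, and real notices come in more forms than this pattern covers.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Demo: parse a notice with a year range.
var notice = CopyrightParser.TryParse("Copyright (c) 2019-2024 Example Corp.");
Console.WriteLine(notice is null
    ? "no match"
    : $"{notice.StartYear}-{notice.EndYear} {notice.Holder}");

// Hypothetical result shape; a single year yields StartYear == EndYear.
public sealed record ParsedNotice(int StartYear, int EndYear, string Holder);

public static class CopyrightParser
{
    // Matches "Copyright", an optional "(c)" or the copyright sign, a four-digit year,
    // an optional "-YYYY" range end (hyphen or en dash), then the holder name.
    private static readonly Regex Pattern = new(
        @"Copyright\s+(?:\(c\)\s*|\u00A9\s*)?(?<start>[0-9]{4})(?:\s*[-\u2013]\s*(?<end>[0-9]{4}))?,?\s+(?<holder>.+)$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns null when the line does not look like a copyright notice.
    public static ParsedNotice? TryParse(string line)
    {
        var match = Pattern.Match(line);
        if (!match.Success)
        {
            return null;
        }

        var start = int.Parse(match.Groups["start"].Value, CultureInfo.InvariantCulture);
        var end = match.Groups["end"].Success
            ? int.Parse(match.Groups["end"].Value, CultureInfo.InvariantCulture)
            : start;

        return new ParsedNotice(start, end, match.Groups["holder"].Value.Trim());
    }
}
```

Treating a single year as a range with equal start and end keeps downstream assertions uniform.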