RAECM (Router Asset Evidence-Centric Multi-agent) is an autonomous, evidence-centric multi-agent framework for identifying IPv6 router attributes at Internet scale.
RAECM addresses the persistent challenge of converting heterogeneous and noisy service artifacts into fine-grained, auditable semantic labels at Internet scale. It leverages Large Language Models (LLMs) to transform lightweight multi-port probing measurements into structured, verifiable, and traceable asset labels.
- High Accuracy: Achieves strong performance on a ground-truth benchmark dataset
- Significant Improvement: Substantial accuracy gains over unconstrained direct LLM inference
- Cost Reduction: Significantly lower cost than direct LLM inference
- Efficient Student Model: Distilled model maintains strong accuracy for high-throughput scenarios
- Specialized Analysts: Decompose semantic labeling into extensible specialized agents
- Explicit Evidence: Each prediction grounded in verifiable evidence with reliability weights
- Post-hoc Verification: CheckAnalyst performs consistency validation and conservative correction
- Retrieval-Augmented Generation: External knowledge base supports evidence-grounded reasoning
- Content Hashing: Deduplication and cache reuse for repeated observations
- Entropy-Guided Sorting: Prioritize high-signal observations for efficient processing
- Teacher-Student Architecture: Distilled student model handles routine cases
- Lightweight Probing: Efficient multi-port scanning with minimal overhead
- Fingerprint Construction: Automated generation from evidence-linked structured outputs
- Longitudinal Monitoring: Support for drift detection and temporal analysis
- Auditable Outputs: Traceable and maintainable identification results
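The content-hashing and entropy-guided sorting features listed above can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the whitespace and case normalization, and the use of character-level Shannon entropy as the "signal" score are all assumptions made for illustration.

```python
import hashlib
import math
from collections import Counter


def content_hash(banner: str) -> str:
    """Hash normalized text so repeated observations share one cache key.

    Normalization (collapse whitespace, lowercase) is an assumption.
    """
    normalized = " ".join(banner.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for empty text."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def dedupe_and_rank(banners):
    """Drop repeated observations, then sort the rest by descending entropy."""
    unique = {}
    for banner in banners:
        # Keep the first observation seen for each content hash.
        unique.setdefault(content_hash(banner), banner)
    return sorted(unique.values(), key=shannon_entropy, reverse=True)
```

In this sketch, only one observation per hash would reach the LLM, and low-entropy responses (for example, a banner made of a single repeated character) are processed last.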
6Analyst-master/
├── README.md               # This file - Project overview
│
├── recog/                  # Teacher-side LLM identification pipeline
│   ├── 6Analyst/           # Core implementation
│   ├── run_6analyst.py     # Main entry point
│   └── README.md           # Complete documentation
│
└── model/                  # Student model distillation
    ├── training/           # Training core
    ├── configs/            # Model configurations
    ├── input/              # Training data
    ├── train.py            # Training entry point
    ├── evaluate.py         # Evaluation entry point
    └── README.md           # Complete documentation
High-accuracy identification using LLMs:
cd recog
pip install openai requests
# Configure API in 6Analyst/config.py
python run_6analyst.py

See recog/README.md for complete documentation.
High-throughput identification using distilled models:
cd model
pip install -r requirements.txt
# Download model
python train.py --mt vd --model qwen2.5-3b
python evaluate.py --mt vd

See model/README.md for complete documentation.
Raw Scanning Data
    ↓
Data Cleaning & Normalization
    ├─ Remove sensitive fields
    ├─ Filter noise
    └─ Calculate entropy
    ↓
Product Identification (Specialized Analysts)
    ├─ Vendor identification
    ├─ OS identification
    └─ Device type identification
    ↓
Consistency Checking (CheckAnalyst)
    ├─ Cross-field validation
    ├─ Evidence sufficiency check
    └─ Conservative correction
    ↓
Structured Output with Evidence
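The consistency-checking stage in the flow above (cross-field validation, evidence sufficiency, conservative correction) can be sketched as follows. This is an illustration under stated assumptions, not CheckAnalyst's actual logic: the `Evidence` and `Label` types, the `KNOWN_OS` compatibility table, the `MIN_WEIGHT` threshold, and the `check` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    source: str    # where the evidence came from, e.g. "22/tcp banner"
    snippet: str   # the observed text that supports the label
    weight: float  # reliability weight in [0, 1]


@dataclass
class Label:
    value: Optional[str]
    evidence: List[Evidence] = field(default_factory=list)


# Hypothetical vendor -> plausible OS table used for cross-field validation.
KNOWN_OS = {"MikroTik": {"RouterOS"}, "Cisco": {"IOS", "IOS XE", "NX-OS"}}
MIN_WEIGHT = 0.5  # assumed minimum total evidence weight for keeping a label


def sufficient(label: Label) -> bool:
    """Evidence sufficiency check: a label needs enough total evidence weight."""
    return label.value is not None and sum(e.weight for e in label.evidence) >= MIN_WEIGHT


def check(vendor: Label, os_label: Label) -> dict:
    """Abstain on weakly supported labels, then drop an OS that contradicts the vendor."""
    v = vendor.value if sufficient(vendor) else None
    o = os_label.value if sufficient(os_label) else None
    if v in KNOWN_OS and o is not None and o not in KNOWN_OS[v]:
        o = None  # conservative correction: abstain rather than guess
    return {"vendor": v, "os": o}
```

The sketch always abstains instead of substituting a different label, which matches the conservative behavior described above; a fuller version might instead compare the evidence weights of the two conflicting fields.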
Teacher Pipeline (LLM-based)
    ├─ High accuracy
    ├─ Evidence generation
    └─ Complex case handling
    ↓
Knowledge Distillation
    ↓
Student Model (Distilled)
    ├─ High throughput
    ├─ Cost effective
    └─ Large-scale deployment
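One way the two tiers could cooperate at inference time is confidence-based routing: the student answers routine cases and escalates uncertain ones to the teacher. The source does not specify the escalation mechanism, so the sketch below is an assumption; `route`, the `threshold` value, and the callable signatures are hypothetical.

```python
from typing import Callable, Tuple

# Assumed student output shape: (label, confidence in [0, 1]).
Prediction = Tuple[str, float]


def route(
    observation: str,
    student: Callable[[str], Prediction],
    teacher: Callable[[str], str],
    threshold: float = 0.9,
) -> Tuple[str, str]:
    """Return (label, tier), using the teacher only when the student is unsure."""
    label, confidence = student(observation)
    if confidence >= threshold:
        return label, "student"
    # Low confidence is treated as a complex case and escalated.
    return teacher(observation), "teacher"
```

Passing the models as callables keeps the sketch independent of any particular inference backend; the threshold trades cost against accuracy and would need tuning on held-out data.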
Network Scanning → RAECM Teacher → Asset Inventory
- Perform lightweight multi-port probing
- Run RAECM identification pipeline
- Generate structured asset inventory with evidence
Continuous Scanning → RAECM Student → Rapid Classification
- Deploy distilled student model
- Process large-scale observations
- Achieve high throughput with maintained accuracy
RAECM Outputs → Evidence Clustering → Automated Fingerprints
- Collect evidence-linked structured outputs
- Cluster by evidence patterns
- Generate maintainable fingerprint rules
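The fingerprint-generation steps above can be sketched in a few lines. This is a deliberately simple stand-in for real clustering: evidence snippets are generalized by replacing digit runs with `\d+`, grouped by the resulting pattern, and kept as rules only when enough observations agree on one label. The function names, the `(label, snippet)` record shape, and `min_support` are assumptions, not the project's API.

```python
import re
from collections import Counter, defaultdict


def evidence_pattern(snippet: str) -> str:
    """Generalize a snippet into a regex by replacing digit runs with \\d+."""
    # re.escape leaves digits untouched, so they can be generalized afterwards.
    return re.sub(r"\d+", lambda m: r"\d+", re.escape(snippet))


def build_fingerprints(records, min_support=2):
    """Group (label, snippet) records by pattern and return {regex: label}.

    A pattern becomes a rule only if it maps to a single label and was
    observed at least min_support times.
    """
    clusters = defaultdict(Counter)
    for label, snippet in records:
        clusters[evidence_pattern(snippet)][label] += 1
    rules = {}
    for pattern, labels in clusters.items():
        label, count = labels.most_common(1)[0]
        if len(labels) == 1 and count >= min_support:
            rules[pattern] = label
    return rules
```

A generated rule can then be applied with `re.fullmatch(pattern, banner)`. Requiring a single label per pattern keeps ambiguous evidence out of the rule set, at the cost of lower coverage.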
- Python 3.8+
- OpenAI-compatible API access
- Network scanning data
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+ (GPU training)
- 8GB+ GPU memory
Each module contains complete, standalone documentation:
- README.md (this file) - Project overview
- recog/README.md - Teacher pipeline complete guide
  - Quick start and installation
  - Configuration reference
  - Data formats and processing
  - API integration
  - Performance optimization
  - FAQ and troubleshooting
- model/README.md - Student model complete guide
  - Training procedures
  - Model configurations
  - Evaluation metrics
  - Deployment strategies
  - Performance tuning
  - FAQ and best practices
- Teacher (recog): High-accuracy identification, evidence generation, complex case handling
- Student (model): High-throughput processing, cost-effective deployment, large-scale scenarios
| Feature | Traditional Fingerprinting | RAECM |
|---|---|---|
| Adaptability | Low (manual updates) | High (LLM-based) |
| Evidence | Implicit | Explicit and traceable |
| Cross-Port Reasoning | Limited | Comprehensive |
| Maintenance | High manual effort | Automated |
| Auditability | Limited | Full provenance |
- Use teacher pipeline (recog) to generate labeled data
- Manual annotation of scanning results
- Existing labeled datasets
| Metric | Teacher | Student |
|---|---|---|
| Accuracy | Highest | Strong |
| Throughput | Low | High |
| Cost | High | Low |
| Latency | High | Low |
For accuracy: Use teacher pipeline with high-quality models (GPT-4, Claude)
For throughput: Deploy student model with batch processing
For cost: Use student model for large-scale deployment
For auditability: Use teacher pipeline to enable evidence generation and verification
Based on ground-truth benchmark evaluation:
| Metric | Description |
|---|---|
| Teacher Accuracy | High accuracy on benchmark dataset |
| Student Accuracy | Strong accuracy with efficient inference |
| Accuracy Improvement | Significant gains vs. direct inference |
| Cost Reduction | Substantial cost savings |
| Task | Teacher | Student |
|---|---|---|
| Vendor Identification | High accuracy | Strong accuracy |
| OS Identification | High accuracy | Strong accuracy |
| Device Type | High accuracy | Strong accuracy |
- Adaptability: Handles unseen models and evolving firmware
- Cross-Port Reasoning: Integrates evidence from multiple services
- Reduced Maintenance: Automated fingerprint generation
- Evidence Grounding: Every prediction backed by verifiable evidence
- Higher Accuracy: Significant improvement through multi-agent framework
- Lower Cost: Substantial cost reduction through optimization
- Better Reliability: Conservative abstention on insufficient evidence
- Auditability: Explicit evidence chains and provenance