aegis

AI-Powered SAST Framework

Multi-model security scanning with LLMs, ML, AI (Local & Cloud)

Aegis -of the vulnerable-

Demos

Aegis v1: https://youtu.be/ZmT-3UpVOz8
Legacy version: https://youtu.be/StXTwdxQyQI

About

Aegis is an open-source SAST framework for multi-model orchestration, bridging generative and discriminative AI for security analysis. It provides standardized infrastructure to integrate, benchmark, and compare diverse architectures—from Large Language Models (OpenAI, Anthropic, Google, Ollama) to specialized classifiers (CodeBERT, VulBERTa) via HuggingFace, and traditional ML models (sklearn).

Every model is a pluggable component within a unified registry, enabling complex workflows such as consensus-based decision making, multi-layered scanning, and cross-architecture validation.

Key Features:

Multi-Provider Support: Ollama, HuggingFace, OpenAI, Anthropic, Google, Claude Code, sklearn
Consensus Strategies: Union, Majority, Judge, Cascade (coming soon), Weighted (coming soon)
Model Catalog: Pre-configured models with one-click installation
Real-time Progress: SSE streaming for live scan updates
Export Formats: SARIF, CSV, JSON
Extensible: Add custom models, parsers, and providers

Architecture

┌──────────────────────────────────────────────────────────────┐
│                         User Upload                          │
│                    (ZIP, Code, Git Repo)                     │
└────────────────────────┬─────────────────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────────────────┐
│                    Scan Worker (Queue)                       │
│             Background processing with SSE updates           │
└────────────────────────┬─────────────────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────────────────┐
│                  Pipeline Executor (Chunks)                  │
│           Parallel execution with ThreadPoolExecutor         │
└────────┬────────────────────────────────────────────┬────────┘
         │                                            │
         ▼                                            ▼
┌────────────────────┐                    ┌────────────────────┐
│   Model Registry   │                    │   Prompt Builder   │
│  (SQLite + Cache)  │                    │  (Role Templates)  │
└────────┬───────────┘                    └──────────┬─────────┘
         │                                           │
         ▼                                           ▼
┌──────────────────────────────────────────────────────────────┐
│                   Provider Layer (Adapters)                  │
│      Ollama  │  HuggingFace  │  Cloud  │  Classic ML         │
└────────────────────────┬─────────────────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────────────────┐
│                    Parsers (JSON/Binary)                     │
│         Schema validation + fallback regex extraction        │
└────────────────────────┬─────────────────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────────────────┐
│                  Consensus Engine (Merge)                    │
│      Union │ Majority │ Judge │ Cascade │ Weighted           │
└────────────────────────┬─────────────────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────────────────┐
│              Database (Scans, Findings, History)             │
│                Export: SARIF, CSV, JSON                      │
└──────────────────────────────────────────────────────────────┘

Quick Start

1. Install Dependencies

python -m venv .venv
.venv\Scripts\activate  # Windows
# source .venv/bin/activate  # Linux/Mac

pip install -r requirements/requirements.txt

Installation Options:

Standard (CPU): pip install -r requirements/requirements.txt
GPU (CUDA): pip install -r requirements/requirements-gpu.txt

2. Configure API Keys (Optional)

Create .env file for cloud providers:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...

3. Start Aegis

python app.py

Open: http://localhost:5000

Consensus Strategies

Strategy	Description	Best For
Union	Combines all findings from all models	Maximum coverage
Majority	Only findings detected by 2+ models	Reducing false positives
Judge	A judge model evaluates and merges findings	High-precision analysis
Cascade (coming soon)	Two-pass: fast triage → deep scan on flagged files	Large codebases
Weighted (coming soon)	Confidence-weighted voting	Balanced precision/recall

Model Catalog

Pre-configured models available in the UI catalog:

HuggingFace Models

Model	Type	Role	Hardware
CodeBERT Insecure Detector	Classifier	Triage	CPU
CodeBERT PrimeVul-BigVul	Classifier	Triage	CPU
VulBERTa MLP Devign	Classifier (C/C++)	Triage	CPU
UnixCoder PrimeVul-BigVul	Classifier	Triage	CPU
Qwen 2.5 0.5B Instruct	Generative	Triage/Deep	CPU
CodeAstra 7B	Generative	Deep Scan	GPU
Phi-3.5 Mini Instruct	Generative	Deep Scan	GPU
DeepSeek Coder V2 Lite	Generative	Deep Scan	GPU
StarCoder2 15B Instruct	Generative	Deep Scan	GPU

Cloud Models (API Key Required)

Model	Provider	Role
GPT-4o	OpenAI	Deep Scan, Judge
GPT-4o Mini	OpenAI	Triage, Deep Scan
Claude Sonnet 4	Anthropic	Deep Scan, Judge
Gemini 1.5 Pro	Google	Deep Scan
Gemini 1.5 Flash	Google	Triage, Deep Scan

Ollama Models (Requires Ollama)

Model	Role
Qwen 2.5 Coder 7B	Deep Scan
Llama 3.2	Deep Scan
CodeLlama 7B	Deep Scan
Mistral 7B	Deep Scan
DeepSeek Coder 6.7B	Deep Scan

Agentic Models

Aegis v1.2 introduces agentic security scanning, a new provider category that invokes AI coding agents as subprocesses for deep SAST analysis. Agentic models pipe code with a JSON schema constraint and a security-focused system prompt, producing structured findings that enter the consensus engine like any other model.

Model	Variant	Role
Claude Code Security (Quick)	Sonnet	Triage, Deep Scan
Claude Code Security (Deep)	Opus	Deep Scan, Judge

Classic ML Models

Model	Type	Role	Hardware
Kaggle RF C-Functions	Random Forest (C/C++)	Triage	CPU

Using Models

Ollama (Local LLMs)

Install Ollama: https://ollama.ai
Pull a model: ollama pull llama3.2
In Aegis UI → Models → OLLAMA → Click Discover
Click Register on detected models

Recommended: llama3.2, qwen2.5-coder:7b, codellama:7b

HuggingFace (Local Transformers)

In Aegis UI → Models → HUGGING FACE
Browse the Model Catalog for pre-configured models
Click Install to download and register

Cloud LLMs (OpenAI, Anthropic, Google)

In Aegis UI → Models → CLOUD
Select provider and model from the catalog
Click Add (requires API key in .env)

Classic ML Models (sklearn)

In Aegis UI → Models → ML MODELS
Click Install on available presets
Model downloads automatically

Extending Aegis

Adding HuggingFace Models

Add new HuggingFace models to the catalog in aegis/models/catalog.py:

HF_MY_MODEL = {
    "catalog_id": "my_model",
    "category": CatalogCategory.HUGGINGFACE,
    "display_name": "My Custom Model",
    "description": "Description of what this model does",
    "provider_id": "huggingface",
    "model_type": "hf_local",
    "model_name": "organization/model-name",  # HuggingFace model ID
    "task_type": "text-classification",  # or "text-generation"
    "roles": ["triage"],  # triage, deep_scan, judge
    "parser_id": "hf_classification",  # Parser to use
    "parser_config": {
        "positive_labels": ["LABEL_1", "vulnerable"],
        "negative_labels": ["LABEL_0", "safe"],
        "threshold": 0.5,
    },
    "hf_kwargs": {
        "trust_remote_code": False,
    },
    "generation_kwargs": {
        "max_new_tokens": 512,
        "temperature": 0.1,
    },
    "size_mb": 500,
    "requires_gpu": False,
    "tags": ["classifier", "triage"],
}

# Add to MODEL_CATALOG list
MODEL_CATALOG.append(HF_MY_MODEL)

Adding Cloud Models

Add cloud model presets in aegis/models/catalog.py:

CLOUD_MY_MODEL = {
    "catalog_id": "my_cloud_model",
    "category": CatalogCategory.CLOUD,
    "display_name": "My Cloud Model",
    "description": "Cloud LLM for security analysis",
    "provider_id": "openai",  # openai, anthropic, google
    "model_type": "openai_cloud",  # openai_cloud, anthropic_cloud, google_cloud
    "model_name": "gpt-4o-mini",  # Provider's model name
    "task_type": "text-generation",
    "roles": ["deep_scan", "judge"],
    "parser_id": "json_schema",
    "generation_kwargs": {
        "max_tokens": 2048,
        "temperature": 0.1,
    },
    "requires_api_key": True,
    "tags": ["cloud", "llm", "deep_scan"],
}

Adding ML Models (sklearn)

Train and save your model as a Pipeline:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
import joblib

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000)),
    ('clf', RandomForestClassifier(n_estimators=100))
])
pipeline.fit(X_train, y_train)
joblib.dump(pipeline, 'my_model.pkl')

Host the model (e.g., GitHub Releases)
Add to catalog in aegis/models/catalog.py:

ML_MY_MODEL = {
    "catalog_id": "my_ml_model",
    "category": CatalogCategory.CLASSIC_ML,
    "display_name": "My ML Classifier",
    "description": "Custom sklearn model for vulnerability detection",
    "provider_id": "tool_ml",
    "model_type": "tool_ml",
    "model_name": "my_ml_model",
    "tool_id": "sklearn_classifier",
    "task_type": "classification",
    "roles": ["triage"],
    "parser_id": "tool_result",
    "settings": {
        "model_url": "https://github.com/user/repo/releases/download/v1/my_model.pkl",
        "threshold": 0.5,
    },
    "size_mb": 50,
    "requires_gpu": False,
    "tags": ["ml", "sklearn", "triage"],
}

Adding Custom Parsers

Create a new parser in aegis/parsers/:

# aegis/parsers/my_parser.py
from aegis.parsers.base import BaseParser
from aegis.models.schema import ParserResult, FindingCandidate

class MyParser(BaseParser):
    """Custom parser for specific model output format."""

    parser_id = "my_parser"

    def parse(self, raw_output: str, context: dict) -> ParserResult:
        """Parse model output into findings."""
        findings = []

        # Your parsing logic here
        # Extract vulnerabilities from raw_output

        for vuln in extracted_vulns:
            findings.append(FindingCandidate(
                file_path=context.get("file_path", "unknown"),
                line_start=vuln.get("line", 1),
                line_end=vuln.get("line", 1),
                snippet=vuln.get("code", ""),
                title=vuln.get("title", "Vulnerability"),
                category=vuln.get("category", "security"),
                cwe=vuln.get("cwe"),
                severity=vuln.get("severity", "medium"),
                description=vuln.get("description", ""),
                recommendation=vuln.get("fix", ""),
                confidence=vuln.get("confidence", 0.5),
            ))

        return ParserResult(
            findings=findings,
            raw_output=raw_output,
        )

Register the parser in aegis/parsers/__init__.py:

from aegis.parsers.my_parser import MyParser

PARSER_REGISTRY = {
    # ... existing parsers
    "my_parser": MyParser,
}

Project Structure

aegis/
├── api/                  # REST API routes
├── consensus/            # Multi-model merging strategies
├── database/             # SQLite repositories
├── models/               # Model registry and catalog
│   ├── catalog.py       # Pre-configured model definitions
│   ├── registry.py      # Model registration (database)
│   └── runtime.py       # Model loading + caching
├── parsers/              # Output parsers (JSON, classification, etc.)
├── providers/            # Provider adapters (Ollama, HF, Cloud)
├── services/             # Scan worker + background queue
├── tools/                # ML tool plugins (sklearn, regex)
├── static/               # Web UI assets
└── templates/            # Jinja2 HTML templates

config/
├── models.yaml           # Legacy model presets
└── pipelines/            # Pipeline definitions

data/aegis.db             # SQLite database (auto-created)

Troubleshooting

Model Loading Issues

HuggingFace models fail to load:

Ensure 8GB+ RAM (16GB recommended)
Use 4-bit quantization in model settings
Check CUDA: python -c "import torch; print(torch.cuda.is_available())"

Ollama models not detected:

Verify Ollama is running: curl http://localhost:11434/api/tags
Set custom URL: OLLAMA_BASE_URL=http://host:11434

sklearn model errors:

Ensure model is saved as a Pipeline (includes vectorizer)
Check sklearn version compatibility

Scan Issues

Scans stuck at "Running":

Check browser console for SSE errors
First scan may take 1-2 minutes (model loading)
Check logs/aegis.log

Cloud API rate limits:

Use Ollama/HF for high-throughput scanning
Increase delay between requests

API Reference

See API_REFERENCE.md for detailed endpoint documentation.

License

MIT License - see LICENSE

Contributing

Pull requests welcome! Please:

Follow existing code style (PEP 8)
Add type hints to new functions
Test with multiple providers

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
aegis		aegis
config		config
data		data
requirements		requirements
scripts		scripts
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
API_REFERENCE.md		API_REFERENCE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
app.py		app.py
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

aegis

AI-Powered SAST Framework

Aegis -of the vulnerable-

Demos

About

Architecture

Quick Start

1. Install Dependencies

2. Configure API Keys (Optional)

3. Start Aegis

Consensus Strategies

Model Catalog

HuggingFace Models

Cloud Models (API Key Required)

Ollama Models (Requires Ollama)

Agentic Models

Classic ML Models

Using Models

Ollama (Local LLMs)

HuggingFace (Local Transformers)

Cloud LLMs (OpenAI, Anthropic, Google)

Classic ML Models (sklearn)

Extending Aegis

Adding HuggingFace Models

Adding Cloud Models

Adding ML Models (sklearn)

Adding Custom Parsers

Project Structure

Troubleshooting

Model Loading Issues

Scan Issues

API Reference

License

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages