Advanced Usage

This page covers advanced ModelAudit features including cloud storage integration, CI/CD workflows, and programmatic usage.

Remote Model Scanning

ModelAudit can scan models directly from various remote sources without manual downloading.

HuggingFace URL Scanning

# Standard HuggingFace URL
promptfoo scan-model https://huggingface.co/bert-base-uncased

# Short HuggingFace URL
promptfoo scan-model https://hf.co/gpt2

# HuggingFace protocol
promptfoo scan-model hf://microsoft/resnet-50

# Private models (requires HF_TOKEN environment variable)
export HF_TOKEN=your_token_here
promptfoo scan-model hf://your-org/private-model

# Using .env file (create a .env file in your project root)
echo "HF_TOKEN=your_token_here" > .env
promptfoo scan-model hf://your-org/private-model

Cloud Storage

Amazon S3

# Using environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"

promptfoo scan-model s3://my-bucket/model.pkl

Google Cloud Storage

# Using service account
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
promptfoo scan-model gs://my-bucket/model.pt

Cloudflare R2

# R2 uses S3-compatible authentication
export AWS_ACCESS_KEY_ID="your-r2-access-key"
export AWS_SECRET_ACCESS_KEY="your-r2-secret-key"
export AWS_ENDPOINT_URL="https://your-account.r2.cloudflarestorage.com"

promptfoo scan-model r2://my-bucket/model.safetensors

Model Registries

MLflow

# Set MLflow tracking URI
export MLFLOW_TRACKING_URI=http://mlflow-server:5000

# Scan specific version
promptfoo scan-model models:/MyModel/1

# Scan latest version
promptfoo scan-model models:/MyModel/Latest

# With custom registry URI
promptfoo scan-model models:/MyModel/1 --registry-uri https://mlflow.company.com

JFrog Artifactory

# Using API token (recommended)
export JFROG_API_TOKEN=your_token_here
promptfoo scan-model https://company.jfrog.io/artifactory/models/model.pkl

# Or pass directly
promptfoo scan-model https://company.jfrog.io/artifactory/models/model.pkl --jfrog-api-token YOUR_TOKEN

# Using .env file (recommended for CI/CD)
echo "JFROG_API_TOKEN=your_token_here" > .env
promptfoo scan-model https://company.jfrog.io/artifactory/models/model.pkl

DVC Integration

ModelAudit automatically resolves DVC pointer files:

# Scans the actual model file referenced by the .dvc file
promptfoo scan-model model.pkl.dvc

Configuration Options

ModelAudit is configured through command-line flags; configuration files are not currently supported. The examples below show the most common flags:

# Set blacklist patterns
modelaudit scan models/ \
  --blacklist "deepseek" \
  --blacklist "qwen" \
  --blacklist "unsafe_model"

# Set resource limits
modelaudit scan models/ \
  --max-file-size 1073741824 \
  --max-total-size 5368709120 \
  --timeout 600

# Combine multiple options
modelaudit scan models/ \
  --blacklist "suspicious_pattern" \
  --max-file-size 1073741824 \
  --timeout 600 \
  --verbose

Note: Scanner-specific settings (such as pickle opcode limits or weight distribution thresholds) are currently hardcoded and cannot be changed from the CLI.
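
The byte values passed to --max-file-size and --max-total-size above are multiples of 1024^3. As a small illustration, they can be computed rather than typed by hand. The helper name gib_to_bytes is hypothetical and not part of ModelAudit:

```python
# Hypothetical helper (not part of ModelAudit): convert GiB to the
# byte counts expected by --max-file-size and --max-total-size.
GIB = 1024 ** 3


def gib_to_bytes(gib: int) -> int:
    """Return the number of bytes in `gib` gibibytes."""
    return gib * GIB


print(gib_to_bytes(1))  # value used for --max-file-size above
print(gib_to_bytes(5))  # value used for --max-total-size above
```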

CI/CD Integration

GitHub Actions

# .github/workflows/model-security.yml
name: Model Security Scan

on:
  push:
    paths:
      - 'models/**'
      - '**.pkl'
      - '**.h5'
      - '**.pb'
      - '**.pt'
      - '**.pth'

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          npm install -g promptfoo
          pip install modelaudit[all]

      - name: Scan models
        run: promptfoo scan-model models/ --format json --output scan-results.json

      - name: Check for critical issues
        run: |
          if grep -Eq '"severity":[[:space:]]*"critical"' scan-results.json; then
            echo "Critical security issues found in models!"
            exit 1
          fi

      - name: Upload scan results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: model-scan-results
          path: scan-results.json

GitLab CI

# .gitlab-ci.yml
model_security_scan:
  stage: test
  image: python:3.10
  script:
    - pip install modelaudit[all]
    - npm install -g promptfoo
    - promptfoo scan-model models/ --format json --output scan-results.json
    - if grep -Eq '"severity":[[:space:]]*"critical"' scan-results.json; then echo "Critical security issues found!"; exit 1; fi
  artifacts:
    paths:
      - scan-results.json
    when: always
  only:
    changes:
      - models/**
      - '**/*.pkl'
      - '**/*.h5'
      - '**/*.pb'
      - '**/*.pt'
      - '**/*.pth'

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: modelaudit
        name: ModelAudit
        entry: promptfoo scan-model
        language: system
        files: '\.(pkl|h5|pb|pt|pth|keras|hdf5|json|yaml|yml|zip|onnx|safetensors|bin|tflite|msgpack|pmml|joblib|npy|gguf|ggml)$'
        pass_filenames: true

Programmatic Usage

You can use ModelAudit programmatically in your Python code:

from modelaudit.core import scan_model_directory_or_file

# Scan a single model
results = scan_model_directory_or_file("path/to/model.pkl")

# Scan a HuggingFace model URL
results = scan_model_directory_or_file("https://huggingface.co/bert-base-uncased")

# Check for issues
if results["issues"]:
    print(f"Found {len(results['issues'])} issues:")
    for issue in results["issues"]:
        print(f"- {issue['severity'].upper()}: {issue['message']}")
else:
    print("No issues found!")

# Scan with custom configuration
config = {
    "blacklist_patterns": ["unsafe_model", "malicious_net"],
    "max_file_size": 1073741824,  # 1GB
    "timeout": 600  # 10 minutes
}

results = scan_model_directory_or_file("path/to/models/", **config)

JSON Output Format

When using --format json, ModelAudit outputs structured results:

{
  "scanner_names": ["pickle"],
  "start_time": 1750168822.481906,
  "bytes_scanned": 74,
  "issues": [
    {
      "message": "Found REDUCE opcode - potential __reduce__ method execution",
      "severity": "warning",
      "location": "evil.pickle (pos 71)",
      "details": {
        "position": 71,
        "opcode": "REDUCE"
      },
      "timestamp": 1750168822.482304
    },
    {
      "message": "Suspicious module reference found: posix.system",
      "severity": "critical",
      "location": "evil.pickle (pos 28)",
      "details": {
        "module": "posix",
        "function": "system",
        "position": 28,
        "opcode": "STACK_GLOBAL"
      },
      "timestamp": 1750168822.482378,
      "why": "The 'os' module provides direct access to operating system functions."
    }
  ],
  "has_errors": false,
  "files_scanned": 1,
  "duration": 0.0005328655242919922,
  "assets": [
    {
      "path": "evil.pickle",
      "type": "pickle"
    }
  ]
}
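
Substring matching on this output is sensitive to whitespace, so a CI gate may prefer to parse the JSON instead. The following is a minimal sketch that assumes only the issues and severity fields shown above; the function names are illustrative and not part of ModelAudit:

```python
import json
from collections import Counter


def count_by_severity(report: dict) -> Counter:
    """Count issues per severity in a parsed ModelAudit JSON report."""
    return Counter(
        issue.get("severity", "unknown") for issue in report.get("issues", [])
    )


def has_critical(report: dict) -> bool:
    """Return True if the report contains at least one critical issue."""
    return count_by_severity(report)["critical"] > 0


def load_report(path: str) -> dict:
    """Load a report previously written with --format json --output <path>."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Trimmed-down version of the report shown above, used only for illustration.
sample = json.loads(
    '{"issues": [{"severity": "warning"}, {"severity": "critical"}]}'
)
print(count_by_severity(sample))
print(has_critical(sample))
```

In a pipeline, a step could call has_critical(load_report("scan-results.json")) and exit with a non-zero status when it returns True.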

Software Bill of Materials (SBOM)

Generate CycloneDX-compliant SBOMs with license information:

promptfoo scan-model models/ --sbom model-sbom.json

The SBOM includes:

Component information (files, types, sizes, checksums)
License metadata (detected licenses, copyright holders)
Risk scoring based on scan findings
Model/dataset classification

Advanced Security Features

File Type Validation

ModelAudit compares each file's extension against its magic bytes and flags any mismatch:

# File type mismatches are flagged
⚠ File type validation failed: extension indicates tensor_binary but magic bytes indicate pickle.
   This could indicate file spoofing, corruption, or a security threat.
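
The idea behind this check can be sketched in a few lines. This is not ModelAudit's implementation: the signature table and the extension mapping below are illustrative assumptions covering only a handful of well-known formats.

```python
import os
import pickle

# Illustrative signatures only; ModelAudit's real detection table is not shown here.
MAGIC_SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "hdf5"),
    (b"PK\x03\x04", "zip"),
    (b"GGUF", "gguf"),
    (b"\x80", "pickle"),  # pickle protocol 2+ starts with the PROTO opcode 0x80
]

# Hypothetical mapping from a file extension to the format it claims to be.
EXPECTED_BY_EXTENSION = {
    ".h5": "hdf5",
    ".zip": "zip",
    ".gguf": "gguf",
    ".pkl": "pickle",
}


def detect_format(header: bytes) -> str:
    """Guess a format from the first bytes of a file."""
    for magic, name in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def extension_matches_content(filename: str, header: bytes) -> bool:
    """Return False when the extension and the magic bytes disagree."""
    ext = os.path.splitext(filename)[1].lower()
    expected = EXPECTED_BY_EXTENSION.get(ext)
    if expected is None:
        return True  # no expectation recorded for this extension
    return detect_format(header) == expected


# A pickle payload disguised with an .h5 extension is reported as a mismatch.
header = pickle.dumps({"weights": [1, 2, 3]})[:8]
print(detect_format(header))
print(extension_matches_content("model.h5", header))
```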

Resource Exhaustion Protection

ModelAudit includes built-in protection against resource-exhaustion attacks:

Zip bombs: Detects suspicious compression ratios (>100x)
Decompression bombs: Limits decompressed file sizes
Memory exhaustion: Enforces limits on array sizes and nested structures
Infinite recursion: Limits nesting depth in recursive formats
DoS prevention: Enforces timeouts and maximum file sizes
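
To make the compression-ratio check concrete, here is a minimal sketch using Python's standard zipfile module. It is an illustration of the idea, not ModelAudit's code; the function names are hypothetical and the threshold is taken from the ">100x" figure above.

```python
import io
import zipfile

MAX_RATIO = 100  # threshold taken from the ">100x" figure above


def suspicious_entries(zip_bytes: bytes, max_ratio: float = MAX_RATIO) -> list:
    """Return names of entries whose uncompressed/compressed ratio exceeds max_ratio."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.compress_size == 0:
                continue  # empty or directory entry; the ratio is undefined
            if info.file_size / info.compress_size > max_ratio:
                flagged.append(info.filename)
    return flagged


def make_zip(entries: dict) -> bytes:
    """Build an in-memory deflate-compressed zip from {name: bytes} for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()


# One highly compressible entry (a megabyte of zeros) and one ordinary entry.
archive = make_zip({"zeros.bin": b"\0" * 1_000_000, "small.txt": b"hello"})
print(suspicious_entries(archive))
```

The check reads only the sizes recorded in the archive's directory, so nothing is decompressed before the decision is made.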

Path Traversal Protection

Archive entries that try to escape the extraction directory are flagged automatically:

🔴 Archive entry ../../etc/passwd attempted path traversal outside the archive
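
A check of this kind can be sketched as follows. This is an assumption about the general technique rather than ModelAudit's actual logic, and the function name escapes_archive is hypothetical:

```python
import posixpath


def escapes_archive(entry_name: str) -> bool:
    """Return True if an archive entry name would resolve outside the extraction root."""
    # Archive formats use forward slashes; normalise backslashes defensively.
    name = entry_name.replace("\\", "/")
    if name.startswith("/"):
        return True  # absolute paths ignore the extraction root entirely
    normalized = posixpath.normpath(name)
    return normalized == ".." or normalized.startswith("../")


for entry in ["../../etc/passwd", "weights/layer1.bin", "a/../../b"]:
    print(entry, escapes_archive(entry))
```

The sketch does not handle Windows drive prefixes such as C:, which a production check would also need to reject.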

Troubleshooting

Common Issues

Missing Dependencies

Error: h5py not installed, cannot scan Keras H5 files

Solution: Install the required dependencies:

pip install h5py tensorflow

Timeout Errors

Error: Scan timeout after 300 seconds

Solution: Increase the timeout:

promptfoo scan-model model.pkl --timeout 600

File Size Limits

Warning: File too large to scan: 2147483648 bytes (max: 1073741824)

Solution: Increase the maximum file size:

promptfoo scan-model model.pkl --max-file-size 3221225472

Unknown Format

```
Warning: Unknown or unhandled format
```

Solution: Ensure the file is in a supported format or create a custom scanner.

Binary File Format Detection

```
Info: Detected safetensors format in .bin file
```

Note: ModelAudit automatically detects the actual format of .bin files and applies the appropriate scanner.

Extending ModelAudit

Creating Custom Scanners

You can create custom scanners by extending the BaseScanner class:

from modelaudit.scanners.base import BaseScanner, ScanResult, IssueSeverity

class CustomModelScanner(BaseScanner):
    """Scanner for custom model format"""
    name = "custom_format"
    description = "Scans custom model format for security issues"
    supported_extensions = [".custom", ".mymodel"]

    @classmethod
    def can_handle(cls, path: str) -> bool:
        """Check if this scanner can handle the given path"""
        return path.endswith(tuple(cls.supported_extensions))

    def scan(self, path: str) -> ScanResult:
        """Scan the model file for security issues"""
        result = self._create_result()

        try:
            # Your custom scanning logic here
            with open(path, 'rb') as f:
                content = f.read()

            if b'malicious_pattern' in content:
                result.add_issue(
                    "Suspicious pattern found",
                    severity=IssueSeverity.WARNING,
                    location=path,
                    details={"pattern": "malicious_pattern"}
                )

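        except Exception as e:
            result.add_issue(
                f"Error scanning file: {str(e)}",
                severity=IssueSeverity.CRITICAL,
                location=path,
                details={"exception": str(e)}
            )
            # The scan itself failed, so do not report it as successful
            result.finish(success=False)
            return result

        result.finish(success=True)
        return result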

from modelaudit.scanners import SCANNER_REGISTRY
from my_custom_scanner import CustomModelScanner

# Register the custom scanner
SCANNER_REGISTRY.append(CustomModelScanner)

# Now you can use it
from modelaudit.core import scan_model_directory_or_file
results = scan_model_directory_or_file("path/to/custom_model.mymodel")
