date	image
2024-11-23	/img/blog/llama-red-team/llama-hacker.webp

How to Red Team an Ollama Model

Want to test the safety and security of a model hosted on Ollama? This guide shows you how to use Promptfoo to systematically probe for vulnerabilities through adversarial testing (red teaming).

We'll use Llama 3.2 3B as an example, but this guide works with any Ollama model.

Here's an example of what the red team report looks like:

Prerequisites

Before you begin, ensure you have:

Node.js: Install Node.js version 18 or later. Download Node.js
Ollama: Install Ollama from ollama.ai
Promptfoo: No prior installation needed; we'll use npx to run commands
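
If you're not sure which Node.js version is installed, a quick check like the one below can help. This is a minimal sketch: the helper name node_version_ok is made up for this example and is not part of Node.js or Promptfoo.

```shell
#!/bin/sh
# node_version_ok VERSION
# Succeeds when VERSION (for example "v18.19.0") has a major version of 18 or higher.
# Hypothetical helper for this guide only.
node_version_ok() {
  major="${1#v}"        # drop the leading "v"
  major="${major%%.*}"  # keep only the part before the first dot
  [ "$major" -ge 18 ] 2>/dev/null
}

if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
  echo "Node.js $(node --version) is new enough"
else
  echo "Node.js 18 or later was not found"
fi
```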

First, make sure you've pulled the model you want to test on Ollama:

ollama pull llama3.2
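
To confirm the pull worked before moving on, you can look for the model in the output of ollama list. The sketch below assumes the usual table layout (a header row, then one model per line with the name in the first column); has_model is a hypothetical helper written for this guide.

```shell
#!/bin/sh
# has_model NAME
# Reads `ollama list` output on stdin and succeeds if NAME appears in the
# first column, either exactly or followed by a tag (NAME:latest).
has_model() {
  awk -v want="$1" '
    NR > 1 {                      # skip the header row
      split($1, parts, ":")       # parts[1] is the name without the tag
      if ($1 == want || parts[1] == want) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

if command -v ollama >/dev/null 2>&1; then
  if ollama list 2>/dev/null | has_model llama3.2; then
    echo "llama3.2 is available locally"
  else
    echo "llama3.2 not found; run: ollama pull llama3.2"
  fi
else
  echo "ollama is not on PATH"
fi
```

If the check reports that the model is missing even after a pull, make sure the Ollama server is running, since ollama list talks to the local server.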

Setting Up the Environment

Create a new directory for your red teaming project and initialize it:

mkdir ollama-redteam
cd ollama-redteam
npx promptfoo@latest redteam init --no-gui --no-interactive

This creates a promptfooconfig.yaml file that we'll customize for Ollama.

Configuring the Ollama Provider

Edit promptfooconfig.yaml to use Ollama as the target:

targets:
  - id: ollama:chat:llama3.2
    label: llama3.2-redteam
    config:
      temperature: 0.7
      max_tokens: 150

purpose: 'The system is a helpful chatbot assistant that answers questions and helps with tasks.'

redteam:
  plugins:
    # Replace these with the plugins you want to test
    - harmful
    - pii
    - contracts
    - hallucination
    - imitation
  strategies:
    - jailbreak
    - prompt-injection
  numTests: 5

To see the full configuration example on GitHub, click here.

Configuration Explained

targets: Specifies Llama 3.2 as our target model
purpose: Describes the intended behavior to guide test generation. A high-quality purpose definition is critical for generating high-quality adversarial tests, so be sure to include as much detail as possible (including the AI's objective, user context, access controls, and connected systems).
plugins: Various vulnerability types to test (see full list):
- harmful: Tests for harmful content generation
- pii: Tests for PII leakage
- contracts: Tests if model makes unauthorized commitments
- hallucination: Tests for false information
- imitation: Tests if model impersonates others
strategies: Techniques for delivering adversarial inputs (see full list):
- jailbreak: Tests if model can escape its constraints
- prompt-injection: Tests if model is susceptible to injected instructions
numTests: Number of test cases per plugin

Running the Red Team Evaluation

Generate and run the adversarial test cases:

npx promptfoo@latest redteam run

This command:

Generates test cases based on your configuration
Runs them against the Llama model
Grades the responses for vulnerabilities

Analyzing the Results

Generate a report of the findings:

npx promptfoo@latest redteam report

The report shows:

Vulnerability categories discovered
Severity levels of issues
Specific test cases that exposed vulnerabilities
Suggested mitigations

Here's an example report card:

You can click on each category to see the specific test cases and results:

It includes a breakdown of the performance of the model for each vulnerability category.

Example Findings

Meta puts a lot of work into making their models safe, but it's hard to test for everything, and smaller models tend to have more issues.

Here are some common issues you might find when red teaming Llama models:

Prompt Injection: Llama models can be vulnerable to injected instructions that override their original behavior.
Harmful Content: The model may generate harmful content when prompted with adversarial inputs.
Hallucination: The model might confidently state incorrect information.
PII Handling: The model could inappropriately handle or disclose personal information.

Mitigating Vulnerabilities

Remediations will depend on your test results, but here are some general measures to keep in mind:

System Prompts: Add explicit safety constraints in your system prompts
Input Validation: Implement pre-processing to catch malicious inputs
Output Filtering: Add post-processing to filter harmful content
Temperature Adjustment: Lower temperature values can reduce erratic behavior
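
As one way to try the system prompt and temperature suggestions, you could bake both into a derived Ollama model with a Modelfile. This is a sketch under assumptions: the model name llama3.2-guarded, the temperature of 0.2, and the prompt wording are placeholders chosen for illustration, not recommended values.

```shell
#!/bin/sh
# Write an Ollama Modelfile that layers a system prompt and a lower
# temperature on top of the base model. The prompt text and the name
# "llama3.2-guarded" are placeholders for this example.
cat > Modelfile <<'EOF'
FROM llama3.2
PARAMETER temperature 0.2
SYSTEM """
You are a helpful chatbot assistant. Decline requests for harmful content,
do not reveal personal information, and do not make commitments on behalf
of the operator.
"""
EOF

# Build the derived model only when the ollama CLI is available.
if command -v ollama >/dev/null 2>&1; then
  ollama create llama3.2-guarded -f Modelfile \
    || echo "ollama create failed; is the Ollama server running?"
else
  echo "ollama is not on PATH; Modelfile written but model not created"
fi
```

You could then change the target id in promptfooconfig.yaml to ollama:chat:llama3.2-guarded and re-run the red team to compare results against the base model. A system prompt alone is unlikely to close every gap, so treat this as one layer alongside input validation and output filtering.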

Additional Resources
