Are you sure you want to delete this access key?
date | image |
---|---|
2024-11-23 | /img/blog/llama-red-team/llama-hacker.webp |
Want to test the safety and security of a model hosted on Ollama? This guide shows you how to use Promptfoo to systematically probe for vulnerabilities through adversarial testing (red teaming).
We'll use Llama 3.2 3B as an example, but this guide works with any Ollama model.
Here's an example of what the red team report looks like:
Before you begin, ensure you have:
npx
to run commandsFirst, make sure you've pulled the model you want to test on Ollama:
ollama pull llama3.2
You can either initialize a new project or download the complete example:
npx promptfoo@latest init --example redteam-ollama
cd redteam-ollama
This will create a new directory with promptfooconfig.yaml
and system_message.txt
files.
Create a new directory for your red teaming project and initialize it:
mkdir redteam-ollama
cd redteam-ollama
npx promptfoo@latest redteam init --no-gui --no-interactive
This creates a promptfooconfig.yaml
file that we'll customize for Ollama.
If you're creating from scratch, edit promptfooconfig.yaml
to use Ollama as the target:
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
- file://system_message.txt
targets:
- id: ollama:chat:llama3.2
label: llama3.2-redteam
config:
temperature: 0.7
max_tokens: 150
redteam:
purpose: 'The system is a helpful chatbot assistant that answers questions and helps with tasks.'
numTests: 5
plugins:
# Replace these with the plugins you want to test
- harmful
- pii
- contracts
- hallucination
- imitation
strategies:
- jailbreak
- jailbreak:composite
To see the full configuration example on Github, click here.
harmful
: Tests for harmful content generationpii
: Tests for PII leakagecontracts
: Tests if model makes unauthorized commitmentshallucination
: Tests for false informationimitation
: Tests if model impersonates othersjailbreak
: Tests if model can escape its constraints using an iterative approach with an attacker model.jailbreak:composite
: Tests if model can escape its constraints using a composition of other successful jailbreak strategies.Note: The strategies above are designed for single-turn interactions (one prompt, one response). For multi-turn conversations or applications, see our multi-turn chatbot redteam example.
Generate and run the adversarial test cases:
npx promptfoo@latest redteam run
This command:
Generate a report of the findings:
npx promptfoo@latest redteam report
The report shows:
Here's an example report card:
You can click on each category to see the specific test cases and results:
It includes a breakdown of the performance of the model for each vulnerability category.
Meta puts a lot of work into making their models safe, but it's hard to test for everything and smaller models tend to have more issues.
Here are some common issues you might find when red teaming Llama models:
Prompt Injection: Llama models can be vulnerable to injected instructions that override their original behavior.
Harmful Content: The model may generate harmful content when prompted with adversarial inputs.
Hallucination: The model might confidently state incorrect information.
PII Handling: The model could inappropriately handle or disclose personal information.
Remediations will depend on your test results, but in general some things to keep in mind are:
Press p or to see the previous file or, n or to see the next file
Browsing data directories saved to S3 is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with AWS S3!
Are you sure you want to delete this access key?
Browsing data directories saved to Google Cloud Storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with Google Cloud Storage!
Are you sure you want to delete this access key?
Browsing data directories saved to Azure Cloud Storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with Azure Cloud Storage!
Are you sure you want to delete this access key?
Browsing data directories saved to S3 compatible storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with your S3 compatible storage!
Are you sure you want to delete this access key?