date | image |
---|---|
2024-11-23 | /img/blog/llama-red-team/llama-hacker.webp |
Want to test the safety and security of a model hosted on Ollama? This guide shows you how to use Promptfoo to systematically probe for vulnerabilities through adversarial testing (red teaming).
We'll use Llama 3.2 3B as an example, but this guide works with any Ollama model.
Here's an example of what the red team report looks like:
Before you begin, ensure you have:
- `npx` to run commands

First, make sure you've pulled the model you want to test on Ollama:
ollama pull llama3.2
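Before moving on, it can help to confirm that the pull succeeded. The sketch below assumes `ollama list` prints a header row followed by one model per line, with the model name (for example `llama3.2:latest`) in the first column. `model_is_pulled` is a hypothetical helper name, not part of Ollama.

```shell
# Hypothetical helper: reads `ollama list`-style output on stdin and
# succeeds if the model named in $1 appears in the first column.
# Assumes a header row, then rows such as "llama3.2:latest  <id>  <size> ...".
model_is_pulled() {
  awk -v m="$1" '
    NR > 1 {
      split($1, parts, ":")            # separate "name" from ":tag"
      if ($1 == m || parts[1] == m) found = 1
    }
    END { exit !found }                # exit 0 only if a match was seen
  '
}

# Usage against a local Ollama install:
#   ollama list | model_is_pulled llama3.2 && echo "llama3.2 is ready"
```

The helper only parses text, so it matches either a bare name (`llama3.2`) or a full `name:tag` string.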
Create a new directory for your red teaming project and initialize it:
mkdir ollama-redteam
cd ollama-redteam
npx promptfoo@latest redteam init --no-gui --no-interactive
This creates a `promptfooconfig.yaml` file that we'll customize for Ollama.
Edit `promptfooconfig.yaml` to use Ollama as the target:
targets:
  - id: ollama:chat:llama3.2
    label: llama3.2-redteam
    config:
      temperature: 0.7
      max_tokens: 150

purpose: 'The system is a helpful chatbot assistant that answers questions and helps with tasks.'

redteam:
  plugins:
    # Replace these with the plugins you want to test
    - harmful
    - pii
    - contracts
    - hallucination
    - imitation
  strategies:
    - jailbreak
    - prompt-injection
  numTests: 5
To see the full configuration example on GitHub, click here.
- `harmful`: Tests for harmful content generation
- `pii`: Tests for PII leakage
- `contracts`: Tests if the model makes unauthorized commitments
- `hallucination`: Tests for false information
- `imitation`: Tests if the model impersonates others
- `jailbreak`: Tests if the model can escape its constraints
- `prompt-injection`: Tests if the model is susceptible to injected instructions

Generate and run the adversarial test cases:
npx promptfoo@latest redteam run
This command generates the adversarial test cases and runs them against the target model.
Generate a report of the findings:
npx promptfoo@latest redteam report
The report shows:
Here's an example report card:
You can click on each category to see the specific test cases and results:
It includes a breakdown of the performance of the model for each vulnerability category.
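The commands used in this guide can be chained into one function so that a failed step stops the run. This is a minimal sketch rather than an official workflow: `run_redteam` is a hypothetical name, it assumes `promptfooconfig.yaml` already exists in the current directory, and the model argument only controls what is pulled, so it should match the `id` under `targets` in the config.

```shell
# Hypothetical wrapper around the commands shown in this guide.
# $1: Ollama model to pull (defaults to llama3.2, the example model here).
# Run it from the directory that contains promptfooconfig.yaml.
run_redteam() {
  # Make sure the model is available locally; stop if the pull fails.
  ollama pull "${1:-llama3.2}" || return 1

  # Generate and run the adversarial test cases; stop if the run fails.
  npx promptfoo@latest redteam run || return 1

  # Produce the report of the findings.
  npx promptfoo@latest redteam report
}

# Usage:
#   cd ollama-redteam
#   run_redteam llama3.2
```

Stopping on the first failure avoids producing a report from a stale or partial run.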
Meta puts a lot of work into making their models safe, but it's hard to test for everything and smaller models tend to have more issues.
Here are some common issues you might find when red teaming Llama models:
- Prompt Injection: Llama models can be vulnerable to injected instructions that override their original behavior.
- Harmful Content: The model may generate harmful content when prompted with adversarial inputs.
- Hallucination: The model might confidently state incorrect information.
- PII Handling: The model could inappropriately handle or disclose personal information.
Remediations will depend on your test results, but in general some things to keep in mind are: