...

dataset

2ef5c77602

docs: donotanswer example (#3780)

4 months ago

README.md

360d7b519d

docs(examples): add uniform init commands to all example READMEs (#4068)

3 months ago

promptfooconfig.yaml

2ef5c77602

docs: donotanswer example (#3780)

4 months ago

You have to be logged in to leave a comment.

donotanswer (DoNotAnswer Evaluation Example)

This example demonstrates how to use the DoNotAnswer plugin to evaluate whether a model properly refuses to answer harmful or inappropriate queries.

You can run this example with:

npx promptfoo@latest init --example donotanswer
cd donotanswer
promptfoo redteam run

This will:

Copy the example from github into your project
Fetch test cases from the DoNotAnswer dataset (hosted in our repository)
Generate red team test cases from the dataset
Run these test cases against your specified providers
Evaluate whether the models appropriately refuse to answer harmful queries

Dataset

The DoNotAnswer dataset is from LibrAI/do-not-answer which provides examples of queries that models should refuse to answer, along with risk areas and types of harm.

Structure

The donotanswer plugin extends promptfoo's red teaming capabilities by testing if models properly refuse to answer:

Explicit adult content requests
Harmful guidance (e.g., illegal activities)
Hateful, offensive or discriminatory requests
Requests for harmful advice across multiple categories

Configuration

The plugin is configured in promptfooconfig.yaml:

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
redteam:
  numTests: 10 # Note: There are 38,785 test cases in the dataset. You can adjust this number to test more or fewer cases.
  input: 'query' # The variable the generated harmful queries will be assigned to
  plugins:
    - donotanswer # Use the DoNotAnswer plugin

Dataset Structure

The DoNotAnswer dataset includes:

risk_area: The category of risk (e.g., "Discrimination, Exclusion, Toxicity, Hateful, Offensive")
types_of_harm: The type of harmful content (e.g., "Adult Content")
specific_harms: Detailed description of the harm
question: The actual query that models should refuse to answer
Additional columns with baseline model responses (not used by the plugin)

Tip!

Press p or to see the previous file or, n or to see the next file

Specify your S3 bucket

Bucket name cannot be the same as the repository name. Please change one of them.

Bucket url and prefix

Region

Endpoint Url

Disable SSL verification

README.md

donotanswer (DoNotAnswer Evaluation Example)

Dataset

Structure

Configuration

Dataset Structure

Comments

Use AWS S3 as storage!

Specify your S3 bucket

Access key (If needed)

Congratulations!

Use Google Cloud Storage!

Specify your Google Storage bucket

Service Account Key

Congratulations!

Use Azure Cloud Storage!

Specify your Azure Storage bucket

Access key (If needed)

Congratulations!

Use any S3 compatible storage!

Specify your S3 bucket

Access key (If needed)

Congratulations!

nirbarazida / promptfoo mirror of https://github.com/promptfoo/promptfoo

README.md

donotanswer (DoNotAnswer Evaluation Example)

Dataset

Structure

Configuration

Dataset Structure

Comments

Use AWS S3 as storage!

Specify your S3 bucket

Access key (If needed)

Congratulations!

Use Google Cloud Storage!

Specify your Google Storage bucket

Service Account Key

Congratulations!

Use Azure Cloud Storage!

Specify your Azure Storage bucket

Access key (If needed)

Congratulations!

Use any S3 compatible storage!

Specify your S3 bucket

Access key (If needed)

Congratulations!

nirbarazida
/
promptfoo
mirror of https://github.com/promptfoo/promptfoo