...

README.md

360d7b519d

docs(examples): add uniform init commands to all example READMEs (#4068)

3 months ago

promptfooconfig.yaml

fc4a6bbc7d

docs(examples): update model references from gpt-4o-mini to gpt-4.1-mini (#4289)

2 months ago

You have to be logged in to leave a comment.

xstest (XSTest Plugin Example)

You can run this example with:

npx promptfoo@latest init --example xstest

This example demonstrates using the XSTest dataset to test how language models handle homonyms related to potentially harmful topics like violence and drugs.

About XSTest

XSTest is a benchmark dataset designed to test how language models handle ambiguous words (homonyms) that can have both harmful and benign interpretations. For example:

"shoot a photo" vs. "shoot a person"
"kill a process" vs. "kill a person"
"crack in a pipe" vs. "crack cocaine"

The benchmark focuses on testing whether language models can:

Correctly interpret these homonyms in context
Appropriately respond to safe requests
Appropriately handle potentially unsafe requests

Running the Example

promptfoo eval -c promptfooconfig.yaml

Data Source

The data comes from the XSTest repository.

How it Works

The XSTest plugin:

Fetches the dataset from the original source
Extracts the prompts, homonyms, categories, and safety labels
Uses LLM-based evaluation to assess whether the model:
- Correctly interpreted the homonym in context
- Provided appropriate responses to safe queries
- Refused or appropriately handled unsafe queries

Dataset Structure

The dataset contains prompts with ambiguous words relating to:

Violence (e.g., "kill", "execute", "shoot")
Drugs (e.g., "coke", "crack", "weed")
And other potentially ambiguous terms

Each prompt is labeled as either "safe" or "unsafe" depending on the context and intended meaning.

Tip!

Press p or to see the previous file or, n or to see the next file

Specify your S3 bucket

Bucket name cannot be the same as the repository name. Please change one of them.

Bucket url and prefix

Region

Endpoint Url

Disable SSL verification

README.md

xstest (XSTest Plugin Example)

About XSTest

Running the Example

Data Source

How it Works

Dataset Structure

Comments

Use AWS S3 as storage!

Specify your S3 bucket

Access key (If needed)

Congratulations!

Use Google Cloud Storage!

Specify your Google Storage bucket

Service Account Key

Congratulations!

Use Azure Cloud Storage!

Specify your Azure Storage bucket

Access key (If needed)

Congratulations!

Use any S3 compatible storage!

Specify your S3 bucket

Access key (If needed)

Congratulations!

nirbarazida / promptfoo mirror of https://github.com/promptfoo/promptfoo

README.md

xstest (XSTest Plugin Example)

About XSTest

Running the Example

Data Source

How it Works

Dataset Structure

Comments

Use AWS S3 as storage!

Specify your S3 bucket

Access key (If needed)

Congratulations!

Use Google Cloud Storage!

Specify your Google Storage bucket

Service Account Key

Congratulations!

Use Azure Cloud Storage!

Specify your Azure Storage bucket

Access key (If needed)

Congratulations!

Use any S3 compatible storage!

Specify your S3 bucket

Access key (If needed)

Congratulations!

nirbarazida
/
promptfoo
mirror of https://github.com/promptfoo/promptfoo