promptfoo: a prompt engineering tool

npm GitHub Workflow Status

promptfoo helps you tune LLM prompts systematically across many relevant test cases.

With promptfoo, you can:

  • Test multiple prompts against predefined test cases
  • Evaluate quality and catch regressions by comparing LLM outputs side-by-side
  • Speed up evaluations by running tests concurrently
  • Flag bad outputs automatically by setting "expectations"
  • Use as a command line tool, or integrate into your workflow as a library
  • Use OpenAI models, open-source models like Llama and Vicuna, or integrate custom API providers for any LLM API

» View full documentation «

promptfoo produces matrix views that allow you to quickly review prompt outputs across many inputs. The goal: tune prompts systematically across all relevant test cases, instead of testing prompts by trial and error.

Here's an example of a side-by-side comparison of multiple prompts and inputs:

Prompt evaluation matrix - web viewer

It works on the command line too:

Prompt evaluation

Usage (command line & web viewer)

To get started, run the following command:

npx promptfoo init

This will create some templates in your current directory: prompts.txt, vars.csv, and promptfooconfig.js.
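
As a rough illustration (the generated templates may differ in detail), prompts.txt holds a prompt template with {{variable}} placeholders, and vars.csv lists one test case per row with a column for each variable. A hypothetical pair might look like this:

prompts.txt:

Rephrase this in French: {{body}}

vars.csv:

body
Hello world
I'm hungry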

After editing the prompts and variables to your liking, run the eval command to kick off an evaluation:

npx promptfoo eval

If you're looking to customize your usage, you have the full set of parameters at your disposal:

npx promptfoo eval -p <prompt_paths...> -o <output_path> -r <providers> [-v <vars_path>] [-j <max_concurrency>] [-c <config_path>] [--grader <grading_provider>]
  • <prompt_paths...>: Paths to prompt file(s)
  • <output_path>: Path to output CSV, JSON, YAML, or HTML file. Defaults to terminal output
  • <providers>: One or more of: openai:<model_name>, or a filesystem path to a custom API caller module
  • <vars_path> (optional): Path to CSV, JSON, or YAML file with prompt variables
  • <max_concurrency> (optional): Number of simultaneous API requests. Defaults to 4
  • <config_path> (optional): Path to configuration file
  • <grading_provider>: A provider that handles the grading process, if you are using LLM grading
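
For instance, a fully-specified invocation combining these options might look like the following (the file names and grading model here are illustrative):

npx promptfoo eval -p prompts.txt -o results.html -r openai:gpt-3.5-turbo -v vars.csv -j 8 -c promptfooconfig.js --grader openai:gpt-4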

After running an eval, you may optionally use the view command to open the web viewer:

npx promptfoo view

Examples

Prompt quality

In this example, we evaluate whether adding adjectives to the personality of an assistant bot affects the responses:

npx promptfoo eval -p prompts.txt -v vars.csv -r openai:gpt-3.5-turbo

This command will evaluate the prompts in prompts.txt, substituting the variable values from vars.csv, and output results in your terminal.
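
For illustration, the prompts might differ only in the adjectives describing the assistant's personality, with vars.csv supplying the user messages. A hypothetical setup (this sketch assumes multiple prompts in one file separated by a --- line; the linked setup below shows the real files):

prompts.txt:

You are a helpful assistant. Reply to the user: {{message}}
---
You are a cheerful, funny, endlessly patient assistant. Reply to the user: {{message}}

vars.csv:

message
What's the weather like today?
Help me plan a surprise birthday party.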

Have a look at the setup and full output here.

You can also output the results as a CSV spreadsheet, JSON, YAML, or an HTML file:

Table output
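
For example, the same evaluation can be written to a CSV file (assuming the output format is inferred from the file extension):

npx promptfoo eval -p prompts.txt -v vars.csv -r openai:gpt-3.5-turbo -o output.csv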

Model quality

In this example, we evaluate the difference between GPT-3.5 and GPT-4 outputs for a given prompt:

npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo openai:gpt-4 -o output.html

Produces this HTML table:

Side-by-side evaluation of LLM model quality, gpt3 vs gpt4, html output

Full setup and output here.

Usage (node package)

You can also use promptfoo as a library in your project by importing the evaluate function. The function takes the following parameters:

  • providers: a list of provider strings or ApiProvider objects, or just a single string or ApiProvider.

  • options: the prompts and variables you want to test:

    {
      prompts: string[];
      vars?: Record<string, string>[];
    }
    

Example

promptfoo exports an evaluate function that you can use to run prompt evaluations.

import promptfoo from 'promptfoo';

const options = {
  prompts: ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
  vars: [{ body: 'Hello world' }, { body: "I'm hungry" }],
};

(async () => {
  const summary = await promptfoo.evaluate('openai:gpt-3.5-turbo', options);
  console.log(summary);
})();

This code imports the promptfoo library, defines the evaluation options, and then calls the evaluate function with these options. The results are logged to the console:

{
  "results": [
    {
      "prompt": {
        "raw": "Rephrase this in French: Hello world",
        "display": "Rephrase this in French: {{body}}"
      },
      "vars": {
        "body": "Hello world"
      },
      "response": {
        "output": "Bonjour le monde",
        "tokenUsage": {
          "total": 19,
          "prompt": 16,
          "completion": 3
        }
      }
    },
    // ...
  ],
  "stats": {
    "successes": 4,
    "failures": 0,
    "tokenUsage": {
      "total": 120,
      "prompt": 72,
      "completion": 48
    }
  },
  "table": [
    // ...
  ]
}

See full example here

Configuration

Installation

See installation docs

API Providers

We support OpenAI's API as well as a number of open-source models. It's also possible to set up your own custom API provider. See Provider documentation for more details.
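
As a rough sketch (the exact interface is described in the Provider documentation), a custom provider is a module that promptfoo loads from a filesystem path and calls with the rendered prompt, returning an output string and, optionally, token usage:

// customProvider.js -- illustrative only; check the Provider docs for the actual interface.
class CustomApiProvider {
  id() {
    return 'my-custom-api';
  }

  async callApi(prompt) {
    // Call your own LLM API here; this stub just echoes the prompt back.
    return {
      output: `You said: ${prompt}`,
      tokenUsage: { total: 0, prompt: 0, completion: 0 },
    };
  }
}

module.exports = CustomApiProvider;

Such a module could then be passed anywhere a provider is expected, e.g. npx promptfoo eval -p prompts.txt -r ./customProvider.js.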

Development

Contributions are welcome! Please feel free to submit a pull request or open an issue.

promptfoo includes several npm scripts to make development easier and more efficient. To use these scripts, run npm run <script_name> in the project directory.

Here are some of the available scripts:

  • build: Transpile TypeScript files to JavaScript
  • build:watch: Continuously watch and transpile TypeScript files on changes
  • test: Run test suite
  • test:watch: Continuously run test suite on changes

» View full documentation «
