promptfoo: test your prompts

promptfoo is a tool for testing and evaluating LLM prompt quality.

With promptfoo, you can:

  • Systematically test prompts against predefined test cases
  • Evaluate quality and catch regressions by comparing LLM outputs side-by-side
  • Speed up evaluations with caching and concurrent tests
  • Score outputs automatically by defining "expectations"
  • Use as a CLI, or integrate into your workflow as a library
  • Use OpenAI models, open-source models like Llama and Vicuna, or integrate custom API providers for any LLM API

The goal: test-driven prompt engineering, rather than trial-and-error.

» View full documentation «

promptfoo produces matrix views that let you quickly evaluate outputs across many prompts.

Here's an example of a side-by-side comparison of multiple prompts and inputs:

Prompt evaluation matrix - web viewer

It works on the command line too:

Prompt evaluation

Workflow

Start by establishing a handful of test cases - core use cases and failure cases that you want to ensure your prompt can handle.

As you explore modifications to the prompt, use promptfoo eval to rate all outputs. This ensures the prompt is actually improving overall.

As you collect more examples and establish a user feedback loop, continue to build the pool of test cases.

LLM ops

Usage

To get started, run this command:

npx promptfoo init

This will create some placeholders in your current directory: prompts.txt and promptfooconfig.yaml.
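
For illustration, prompts.txt could hold a prompt template that references variables with the {{variable}} syntax used throughout promptfoo (the wording below is hypothetical, not the generated placeholder content):

Write a short product announcement about {{var1}}.
Tone: {{var2}}. Audience: {{var3}}.

The same variable names then appear under vars in promptfooconfig.yaml, as shown in the Configuration section below.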

After editing the prompts and variables to your liking, run the eval command to kick off an evaluation:

npx promptfoo eval

Configuration

The YAML configuration format runs each prompt through a series of example inputs (aka "test cases") and checks whether they meet requirements (aka "asserts").

See the Configuration docs for a detailed guide.

prompts: [prompts.txt]
providers: [openai:gpt-3.5-turbo]
tests:
  - description: First test case - automatic review
    vars:
      var1: first variable's value
      var2: another value
      var3: some other value
    assert:
      - type: equality
        value: expected LLM output goes here
      - type: function
        value: output.includes('some text')

  - description: Second test case - manual review
    # Test cases don't need assertions if you prefer to review the output yourself
    vars:
      var1: new value
      var2: another value
      var3: third value

  - description: Third test case - other types of automatic review
    vars:
      var1: yet another value
      var2: and another
      var3: dear llm, please output your response in json format
    assert:
      - type: contains-json
      - type: similarity
        value: ensures that output is semantically similar to this text
      - type: llm-rubric
        value: ensure that output contains a reference to X

Tests in a spreadsheet

Some people prefer to configure their LLM tests in a CSV. In that case, the config is pretty simple:

prompts: [prompts.txt]
providers: [openai:gpt-3.5-turbo]
tests: tests.csv

See example CSV.
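
As a rough sketch, such a tests.csv can simply use one column per prompt variable, with one row per test case (the column names below mirror the YAML example above and are illustrative):

var1,var2,var3
first variable's value,another value,some other value
new value,another value,third value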

Command-line

If you're looking to customize your usage, you have a wide set of parameters at your disposal.

Option Description
-p, --prompts <paths...> Paths to prompt files, directory, or glob
-r, --providers <name or path...> One of: openai:chat, openai:completion, openai:model-name, localai:chat:model-name, localai:completion:model-name. See API providers
-o, --output <path> Path to output file (csv, json, yaml, html)
--tests <path> Path to external test file
-c, --config <path> Path to configuration file. promptfooconfig.js/json/yaml is automatically loaded if present
-j, --max-concurrency <number> Maximum number of concurrent API calls
--table-cell-max-length <number> Truncate console table cells to this length
--prompt-prefix <path> This prefix is prepended to every prompt
--prompt-suffix <path> This suffix is appended to every prompt
--grader Provider that will conduct the evaluation, if you are using an LLM to grade your outputs
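
For example, a hypothetical invocation combining several of these options (file names are arbitrary):

npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo --tests tests.csv -o results.html -j 4 --table-cell-max-length 250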

After running an eval, you may optionally use the view command to open the web viewer:

npx promptfoo view

Examples

Prompt quality

In this example, we evaluate whether adding adjectives to the personality of an assistant bot affects the responses:

npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo -t tests.csv

This command will evaluate the prompts in prompts.txt, substituting the variable values from tests.csv, and output results in your terminal.

You can also output a nice spreadsheet, JSON, YAML, or an HTML file:

Table output
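
For instance, adding the -o flag from the table above writes the results to a CSV file instead (the filename is arbitrary):

npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo -t tests.csv -o output.csv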

Model quality

In the next example, we evaluate the difference between GPT-3.5 and GPT-4 outputs for a given prompt:

npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo openai:gpt-4 -o output.html

Produces this HTML table:

Side-by-side evaluation of LLM model quality, gpt3 vs gpt4, html output

Usage (node package)

You can also use promptfoo as a library in your project by importing the evaluate function. The function takes the following parameters:

  • testSuite: the JavaScript equivalent of the promptfooconfig.yaml

    interface TestSuiteConfig {
      providers: string[]; // Valid provider name (e.g. openai:gpt-3.5-turbo)
      prompts: string[]; // List of prompts
      tests: string | TestCase[]; // Path to a CSV file, or list of test cases
    
      defaultTest?: Omit<TestCase, 'description'>; // Optional: add default vars and assertions on test case
      outputPath?: string; // Optional: write results to file
    }
    
    interface TestCase {
      description?: string;
      vars?: Record<string, string>;
      assert?: Assertion[];
    
      prompt?: PromptConfig;
      grading?: GradingConfig;
    }
    
    interface Assertion {
      type: 'equality' | 'is-json' | 'contains-json' | 'function' | 'similarity' | 'llm-rubric';
      value?: string;
      threshold?: number; // For similarity assertions
      provider?: ApiProvider; // For assertions that require an LLM provider
    }
    
  • options: misc options related to how the tests are run

    interface EvaluateOptions {
      maxConcurrency?: number;
      showProgressBar?: boolean;
      generateSuggestions?: boolean;
    }
    

Example

promptfoo exports an evaluate function that you can use to run prompt evaluations.

import promptfoo from 'promptfoo';

const results = await promptfoo.evaluate({
  prompts: ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
  providers: ['openai:gpt-3.5-turbo'],
  tests: [
    {
      vars: {
        body: 'Hello world',
      },
    },
    {
      vars: {
        body: "I'm hungry",
      },
    },
  ],
});

This code imports the promptfoo library, defines a small test suite, and then calls the evaluate function with it.
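
The options object described above can be supplied alongside the test suite; a minimal sketch, assuming it is passed as the second argument to evaluate (the option values here are arbitrary):

import promptfoo from 'promptfoo';

const results = await promptfoo.evaluate(
  {
    prompts: ['Summarize this: {{body}}'],
    providers: ['openai:gpt-3.5-turbo'],
    tests: [{ vars: { body: 'Hello world' } }],
  },
  {
    maxConcurrency: 2, // limit parallel API calls
    showProgressBar: false, // disable the progress bar, e.g. in CI
  }
);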

See the full example here, which includes an example results object.

Configuration

  • Main guide: Learn about how to configure your YAML file, setup prompt files, etc.
  • Configuring test cases: Learn more about how to configure expected outputs and test assertions.

Installation

See installation docs

API Providers

We support OpenAI's API as well as a number of open-source models. It's also possible to set up your own custom API provider. See Provider documentation for more details.

Development

Contributions are welcome! Please feel free to submit a pull request or open an issue.

promptfoo includes several npm scripts to make development easier and more efficient. To use these scripts, run npm run <script_name> in the project directory.

Here are some of the available scripts:

  • build: Transpile TypeScript files to JavaScript
  • build:watch: Continuously watch and transpile TypeScript files on changes
  • test: Run test suite
  • test:watch: Continuously run test suite on changes

» View full documentation «
