Are you sure you want to delete this access key?
sidebar_label |
---|
CI/CD |
When scaling an LLM app, it's essential to be able to measure the impact of any prompt or model change. This guide shows how to use integrate promptfoo with CI/CD workflows to automatically evaluate test cases and ensure quality.
This approach works for any CI system. If you're using Github, you can skip directly to the Github Actions tutorial or view the action on the Github Marketplace.
Monitor Changes: Configure your CI/CD pipeline to trigger on changes to prompt files. This can be done by setting path filters for pull requests or merge requests.
Install promptfoo: Ensure that the `promptfoo`` CLI is installed in the CI/CD environment. You can install it using package managers like npm:
npm install -g promptfoo
See Getting Started for more info.
Set API Keys: Set the necessary API keys as environment variables in your CI/CD configuration. This may include keys for OpenAI, Azure, or other LLM providers.
Run Evaluation: Create a step in your pipeline to execute the promptfoo evaluation. Use the promptfoo eval
command, specifying the configuration file and the prompts to evaluate.
promptfoo eval -c path/to/config.yaml --prompts path/to/prompts/**/*.json --share -o output.json
If do not want to automatically create a web-accessible eval view, remove the --share
option.
Handle Results: After running the evaluation, you can parse the results and take actions such as commenting on pull requests, failing the build if there are issues, or posting the results to a dashboard.
The schema of the output.json
file is defined here, and follows this format:
interface OutputFile {
evalId?: string;
results: EvaluateSummary;
config: Partial<UnifiedConfig>;
shareableUrl: string | null;
}
See definitions of EvaluateSummary and UnifiedConfig.
Here's an example of how you can use the output:
// Parse the output file to get the evaluation results
const output: OutputFile = JSON.parse(fs.readFileSync('output.json', 'utf8'));
// Log the number of successful and failed evaluations
console.log(`Successes: ${output.results.stats.successes}`);
console.log(`Failures: ${output.results.stats.failures}`);
console.log(`View eval results: ${output.shareableUrl}`);
For a real-world example, see the Github Action source code.
Cache Results: To improve efficiency and reduce API calls, you can enable caching in your CI/CD pipeline. This will reuse results from previous LLM requests and outputs for subsequent evaluations.
Configure caching by setting the PROMPTFOO_CACHE_PATH
environment variable to a persistent directory in your CI environment. You can also control cache behavior using other environment variables such as PROMPTFOO_CACHE_TYPE
, PROMPTFOO_CACHE_MAX_FILE_COUNT
, and PROMPTFOO_CACHE_TTL
. For more details on caching configuration, refer to the caching documentation.
Here's an example of how to set up caching in a GitHub Actions workflow:
jobs:
evaluate:
runs-on: ubuntu-latest
steps:
- name: Set up caching for promptfoo
uses: actions/cache@v2
with:
path: ~/.promptfoo/cache
key: ${{ runner.os }}-promptfoo-${{ hashFiles('**/prompts/**') }}
restore-keys: |
${{ runner.os }}-promptfoo-
Ensure that the PROMPTFOO_CACHE_PATH
environment variable in your promptfoo eval
command matches the path specified in the cache action.
Here's a simplified example of how you might set up a GitHub Actions workflow to evaluate prompts on every pull request:
name: 'LLM Prompt Evaluation'
on:
pull_request:
paths:
- 'path/to/prompts/**'
jobs:
evaluate:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Set up promptfoo
run: npm install -g promptfoo
- name: Run promptfoo evaluation
run: promptfoo eval -c path/to/config.yaml --prompts path/to/prompts/**/*.json -o output.json
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
If you're using Github, there's a full solution in the Github Actions tutorial, and you can also view the action on the Github Marketplace.
Press p or to see the previous file or, n or to see the next file
Browsing data directories saved to S3 is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with AWS S3!
Are you sure you want to delete this access key?
Browsing data directories saved to Google Cloud Storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with Google Cloud Storage!
Are you sure you want to delete this access key?
Browsing data directories saved to Azure Cloud Storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with Azure Cloud Storage!
Are you sure you want to delete this access key?
Browsing data directories saved to S3 compatible storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with your S3 compatible storage!
Are you sure you want to delete this access key?