Promptfoo: LLM evals & red teaming

promptfoo is a developer-friendly local tool for testing LLM applications. Stop the trial-and-error approach - start shipping secure, reliable AI apps.

Quick Start

# Install and initialize project
npx promptfoo@latest init

# Run your first evaluation
npx promptfoo eval

See Getting Started (evals) or Red Teaming (vulnerability scanning) for more.

What can you do with Promptfoo?

  • Test your prompts and models with automated evaluations
  • Secure your LLM apps with red teaming and vulnerability scanning
  • Compare models side-by-side (OpenAI, Anthropic, Azure, Bedrock, Ollama, and more)
  • Automate checks in CI/CD
  • Share results with your team
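The CI/CD item above can be sketched as a small shell gate. This is a minimal illustration, not part of promptfoo: `ci_gate` is a hypothetical helper name, and it assumes the evaluation command reports failure through a non-zero exit status (check the CLI documentation for the exact exit-code behavior).

```shell
#!/bin/sh
# ci_gate: hypothetical helper that runs any command and propagates its
# exit status, so a CI job fails when the wrapped command fails.
# Assumption: the wrapped command exits non-zero on failure.
ci_gate() {
  if "$@"; then
    echo "eval passed"
    return 0
  else
    # In the else branch, $? still holds the wrapped command's exit status.
    status=$?
    echo "eval failed (exit $status)" >&2
    return "$status"
  fi
}
```

In a pipeline step this might be invoked as `ci_gate npx promptfoo eval`, reusing the Quick Start command, so the build stops when the evaluation does not succeed.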

Here's what it looks like in action:

[Screenshot: prompt evaluation matrix in the web viewer]

It works on the command line too:

[Screenshot: prompt evaluation matrix on the command line]

It can also generate security vulnerability reports:

[Screenshot: gen AI red team report]

Why promptfoo?

  • 🚀 Developer-first: Fast, with features like live reload and caching
  • 🔒 Private: Runs 100% locally - your prompts never leave your machine
  • 🔧 Flexible: Works with any LLM API or programming language
  • 💪 Battle-tested: Powers LLM apps serving 10M+ users in production
  • 📊 Data-driven: Make decisions based on metrics, not gut feel
  • 🤝 Open source: MIT licensed, with an active community

Learn More

Contributing

We welcome contributions! Check out our contributing guide to get started.

Join our Discord community for help and discussion.
