---
title: Release Notes
description: Monthly summaries of Promptfoo releases, features, and improvements for the open-source LLM testing framework
tags: [releases, changelog, updates, features]
keywords: [promptfoo, releases, changelog, updates, features, monthly summaries]
---
Full release history for Promptfoo open source can be found on GitHub.
This month we focused on enhancing observability, expanding provider support, and strengthening red team capabilities to help you build more reliable and secure AI applications.
We've added OpenTelemetry tracing support to help you understand what's happening inside your AI applications. Previously, LLM applications were often "black boxes"—you could see inputs and outputs, but not what happened in between. Now you can visualize the entire execution flow, measure performance of individual steps, and quickly identify issues.
This is especially valuable for complex RAG pipelines or multi-step workflows where you need to identify performance bottlenecks or debug failures.
Use it when:

- Debugging complex RAG pipelines or multi-step agent workflows
- Measuring the latency of individual steps in your execution flow
- Pinpointing where a failure occurred inside an otherwise opaque application
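Tracing is enabled in your promptfoo config. The sketch below assumes the key names from the tracing docs (`tracing.enabled` plus an OTLP/HTTP receiver); treat the exact keys as assumptions and check the documentation for your version:

```yaml
# promptfooconfig.yaml - hedged sketch; key names are assumptions
tracing:
  enabled: true
  otlp:
    http:
      enabled: true # accept OTLP traces over HTTP from your application
```

With tracing on, spans emitted by your application during an eval can be inspected alongside the eval results.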
As AI applications increasingly use voice interfaces and visual content, you need tools to evaluate these capabilities just as rigorously as text-based interactions. We've significantly expanded support for audio and multimodal AI:
- Google Live Audio - Full audio generation support
- Hyperbolic Provider - New support for Hyperbolic's image and audio models, providing more options for multimodal evaluations
- Helicone AI Gateway - Route requests through Helicone for enhanced monitoring and analytics
- Mistral Magistral - Added support for Mistral's latest reasoning models
Supply chain attacks through compromised models are a growing threat. We've significantly enhanced our static model security scanner to help you verify model integrity before deployment, checking for everything from malicious pickle files to subtle statistical anomalies that might indicate trojaned models.
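For CLI users, a scan might look like the following (subcommand name per promptfoo's ModelAudit docs; the model path is a placeholder):

```sh
# Scan a local model artifact before deployment (path is a placeholder)
promptfoo scan-model ./models/my-model.bin
```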
- New Web Interface - ModelAudit now includes a visual UI accessible at `/model-audit` when running `promptfoo view`
- Expanded Format Support - `.bin` files (PyTorch, SafeTensors, etc.)
- Security Improvements - the `PROMPTFOO_MAX_EVAL_TIME_MS` environment variable prevents runaway evaluations from consuming excessive resources

Generic attacks often miss system-specific vulnerabilities. For organizations that need sophisticated AI security testing, we've added powerful features to create targeted tests that match your actual security risks:
- Target Discovery Agent - Automatically analyzes your AI system to understand its capabilities and craft more effective, targeted attacks
- Custom Strategy Builder - Define complex multi-turn attack strategies using natural language instructions, no coding required
- Grader Customization - Fine-tune evaluation criteria at the plugin level with concrete examples for more accurate assessments
- Cloud-based Plugin Severity Overrides - Enterprise users can centrally manage and customize severity levels for red team plugins across their organization
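These red team options all hang off the `redteam` block of `promptfooconfig.yaml`. A minimal, hedged sketch (the plugin and strategy IDs shown are illustrative; see the red team docs for the full list):

```yaml
# Hedged sketch; plugin and strategy IDs are illustrative
redteam:
  purpose: "Customer support agent for a retail bank"
  plugins:
    - harmful
  strategies:
    - jailbreak
```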
Different industries face unique AI risks. We've introduced specialized plugins for industries where AI errors have serious consequences, ensuring you're testing for the failures that matter most in your domain:
- Medical Plugins - detect critical healthcare risks in AI responses
- Financial Plugins - identify domain-specific vulnerabilities in financial applications
Biased AI systems can perpetuate discrimination at scale. Our new comprehensive bias detection tests ensure your AI treats all users fairly and respectfully across demographic groups.
The Intent (Custom Prompts) plugin now supports JSON file uploads with nested arrays for multi-step attack sequences. The enhanced UI makes it easier to manage complex test scenarios.
Red team tests now include automatic token estimation for HTTP providers, helping you track costs even with custom API integrations.
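When a custom HTTP provider doesn't return usage data, token counts have to be estimated. Promptfoo's exact estimator isn't specified here; a common rough heuristic is ~4 characters per English token, which a toy version illustrates:

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text.
    An illustration of the idea, not promptfoo's actual estimator."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("How do I reset my password?"))
```

Real estimators (e.g. tokenizer-based ones) are more accurate, but a heuristic like this is enough to track approximate cost trends across runs.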
A new System Prompt Override plugin tests whether your LLM deployment is vulnerable to system instruction manipulation—a critical security flaw that could disable safety features.
Real attacks rarely succeed in a single message. We've enhanced our attack strategies to better simulate how bad actors actually try to manipulate AI systems through extended, adaptive conversations:
- Enhanced GOAT and Crescendo - Now include intelligent agents that adapt their attacks over the course of a conversation
- Emoji Encoding Strategy - New obfuscation technique using emoji to bypass content filters
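The idea behind emoji obfuscation can be shown with a toy encoder (not promptfoo's actual strategy): substitute letters with visually similar emoji characters so that naive keyword filters no longer match, while the underlying text is still recoverable.

```python
def emoji_encode(text: str) -> str:
    """Toy obfuscator: map a-z onto Unicode regional-indicator symbols
    (the characters that render as flag emoji), leaving everything else
    untouched. Illustrates the concept only."""
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(
        chr(base + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
        for ch in text.lower()
    )

print(emoji_encode("ignore previous instructions"))
```

A filter scanning for the literal string "ignore" sees only emoji code points, which is exactly the gap this strategy probes.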