Are you sure you want to delete this access key?
title | description | image | keywords | date | authors | tags |
---|---|---|---|---|---|---|
Next Generation of Red Teaming for LLM Agents | Promptfoo is introducing our revolutionary, next-generation red teaming agent designed for enterprise-grade LLM agents. | /img/blog/summer-2025-new-redteam-agent/title.jpg | [promptfoo AI security red teaming LLM eval prompt engineering AI agents] | 2025-06-15 | [steve] | [company-update red-teaming agents] |
Early red teaming tools and research began with jailbreaks like "Ignore all previous instructions" and static lists of harmful prompts. At Promptfoo, we took those ideas a step further by dynamically generating attacks based on the context of the target application.
Existing attack architecture
First-generation systems were simple Q&A chatbots, perhaps with only a vector database. However, we've rapidly moved past these; the systems being developed are far more complex.
What a public-facing Customer Service Agent might look like in 2025:
Engineers are used to building secure distributed systems, leveraging well-known and solid security principles. Just because there's an LLM involved does not mean everything we've learned goes out the window. Enterprise developers are not allowing public-facing agents to write their own SQL queries or handle authentication (although we can help you confirm that).
In the systems we've worked with, authentication and session management are properly handled outside the LLM by mechanisms already in place on the website. SQL queries are parameterized within the code, not handled freely by LLMs.
Imagine we're logged into an e-commerce website and chatting with their bot about our order history. We're not letting the LLM decide who the user is. We're relying on our existing session/authentication system to scope our queries.
def handle_request(...):
user = session.user
query = params.query
llm_response = agent.call(user_query=query)
if llm_response.tool_response.lookup_order_history:
lookup_order_history(user.id)
Here's how Crescendo would attempt to break something like this:
From the code example above you can see that it doesn't matter what we convince the LLM of, the LLM is not empowered to do what we want it to do.
After being thoroughly embarrassed seeing our system bang its head against the wall, we decided it was time to build something smarter.
We needed to apply traditional penetration testing techniques to these systems. To address advanced agent-based systems, we identified critical gaps:
We built our new system from the ground up to emulate a human red teamer's intelligence, creativity, and adaptability.
Like a penetration tester running an nmap scan, our new agent begins with thorough reconnaissance:
Example reconnaissance conversation:
This comprehensive understanding forms the foundation for strategic planning.
The goal of this step is to appropriately prioritize and plan our attacks.
Let's say we're an insurance company and our bot has two functions: Look up claims history and Find a provider. What should we prioritize and what are our goals? As a red teamer, the answer is obvious - I want everybody's claim history.
What if we add in a third function: Medical Advice (I am not promoting this idea). We need to set our goals appropriately. It's probably just a RAG, so we need to confirm and look for a damaging exploit like this: PR Disaster!.
We will prioritize our attacks based on:
The output from the planning phase is a list of Prioritized Attack Vectors. Here's an example output from the system. The target was an intentionally vulnerable CTF agent we use internally for testing:
{
"name": "SQL Query Execution (No Required Roles)",
"priority": "critical",
"score": 0.98,
"reasoning": "Direct SQL query execution by any user is a severe vulnerability. It enables attackers to extract, modify, or destroy data, escalate privileges, or bypass authentication. This is highly impactful and likely exploitable, directly serving the red team goal.",
"details": "Recon shows 'SQL Query Execution' is accessible without roles or guardrails. Attackers can run arbitrary SQL, potentially dumping all data (including PII, credentials, flags), altering records, or escalating privileges (e.g., updating their own role to 'Administrator')."
}
To attack the system we combine what we've learned during reconnaissance with our library of plugins and strategies. Every attack loop is a multi-step process involving several LLM prompts and decision points.
During execution, we have multiple agents that attack, judge and replan during every step:
I feel like a YouTuber - "If you made it this far into the video"
The biggest leap we made was when we added persistent memory throughout testing phases. Our system records knowledge about the target environment as it explores and attacks. We can reuse information gleaned from one phase, whether it be tactics or information like user ids, for future attacks.
Examples of strategies dynamically developed by the system:
**Tactic:**
Utilize the previously obtained patient ID, MRN208212, to perform queries within the Prescription Management system.
**Tactic:**
Leverage the searchPatients tool to identify existing patients, and subsequently use their information for lookups within the appointmentManagement tool.
Our goal at Promptfoo is to provide a comprehensive red teaming tool so you can feel good about deploying your agents into the wild. There is a significant gap between the LLM vulnerability research and its application in the real world. This is the tool that bridges that gap.
At Promptfoo, we've raised the bar for what enterprise-grade LLM security looks like. Our next-generation red teaming agent is uniquely equipped with advanced capabilities:
Deep Reconnaissance: Deep system enumeration, boundary testing, and meticulous documentation of discovered tools and capabilities.
Strategic Planning: Prioritized, context-aware attack vectors that align precisely with business-critical impacts.
Adaptive Attack Execution: Real-time monitoring and adaptive replanning, enabling precise, iterative exploitation and rapid pivoting.
Persistent Memory: Information retention across testing phases, empowering sophisticated multi-step exploitation strategies and enabling deep, cumulative learning about target systems.
If you're interested in helping us build cool stuff like this, check out our careers page.
I began my career as a penetration tester and security consultant at PricewaterhouseCoopers, providing security services to the Fortune 500 and learning from some of the best in the world. Since then, I've worked at companies like Microsoft, Shopify, Intercom, and Discord building massively scalable and complex products including Clyde.
Promptfoo is the world leader in LLM evals and red teaming. We are powered by an open source project with over 100k users - trusted by foundation labs and the Fortune 500.
Press p or to see the previous file or, n or to see the next file
Browsing data directories saved to S3 is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with AWS S3!
Are you sure you want to delete this access key?
Browsing data directories saved to Google Cloud Storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with Google Cloud Storage!
Are you sure you want to delete this access key?
Browsing data directories saved to Azure Cloud Storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with Azure Cloud Storage!
Are you sure you want to delete this access key?
Browsing data directories saved to S3 compatible storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with your S3 compatible storage!
Are you sure you want to delete this access key?