Are you sure you want to delete this access key?
sidebar_label | description |
---|---|
Simulated User | Simulate realistic user interactions and behaviors for comprehensive testing of conversational AI systems and chatbots |
The Simulated User Provider enables testing of multi-turn conversations between an AI agent and a simulated user. This is particularly useful for testing chatbots, virtual assistants, and other conversational AI applications in realistic scenarios.
It works with both simple text-based agents and advanced function-calling agents, making it ideal for testing modern AI systems that use structured APIs.
It is inspired by Tau-bench, a benchmark for evaluating tool-assisted agents.
To use the Simulated User Provider, set the provider id
to promptfoo:simulated-user
and provide configuration options:
tests:
- provider:
id: 'promptfoo:simulated-user'
config:
maxTurns: 10
instructions: 'You are mia_li_3668. You want to fly from New York to Seattle on May 20 (one way). You do not want to fly before 11am EST. You want to fly in economy. You prefer direct flights but one stopover is also fine. If there are multiple options, you prefer the one with the lowest price. You have 3 bags. You do not want insurance. You want to use your two certificates to pay. If only one certificate can be used, you prefer using the larger one, and pay the rest with your 7447 card. You are reactive to the agent and will not say anything that is not asked. Your birthday is in your user profile so you do not prefer to provide it.'
You may also find it easiest to set the provider on defaultTest
, which turns every test into a simulated user conversation using the instructions
variable:
defaultTest:
provider:
id: 'promptfoo:simulated-user'
config:
maxTurns: 10
tests:
- vars:
instructions: 'You are mia_li_3668...'
The Simulated User Provider facilitates a back-and-forth conversation between:
For each turn:
Option | Type | Description |
---|---|---|
instructions |
string | Template for user instructions. Supports Nunjucks templating with access to test variables. |
maxTurns |
number | Maximum number of conversation turns. Defaults to 10. |
Here's a simple example testing a customer service agent:
prompts:
- You are a helpful customer service agent. Answer questions politely and try to resolve issues.
providers:
- openai:gpt-4o-mini
defaultTest:
provider:
id: 'promptfoo:simulated-user'
config:
maxTurns: 5
tests:
- vars:
instructions: You are a frustrated customer whose package was delivered to the wrong address. You want a refund but are willing to accept store credit if offered.
For complex scenarios with function calling, you can define structured APIs with mock implementations:
providers:
- id: openai:gpt-4.1-mini
config:
tools:
- file://functions/search_flights.json
functionToolCallbacks:
search_flights: file://callbacks/airline-functions.js:searchFlights
Where functions/search_flights.json
defines the function schema and callbacks/airline-functions.js
contains the mock implementation that returns realistic data.
The output will show the full conversation history with each turn separated by "---":
User: I need help booking a flight from New York to Seattle on May 20th
Assistant: I'd be happy to help! Could you provide your user ID so I can access your profile?
---
User: It's mia_li_3668
Assistant: [makes function call to search flights]
Let me search for flights from New York to Seattle on May 20th...
---
User: I prefer direct flights but one stop is okay if it's cheaper ###STOP###
You can add assertions to automatically evaluate conversation quality:
tests:
- vars:
instructions: You are a budget-conscious traveler wanting economy flights under $350
assert:
- type: llm-rubric
value: |
Did the budget traveler get what they wanted?
Pass if: Got economy flight under $350 and used certificates for payment
Fail if: Failed to book economy or got expensive flight over $400
This enables automatic evaluation of whether your agent successfully handles different customer types and scenarios.
For a complete working example with 31 customer personas and comprehensive assertions, see the Simulated User example.
The Simulated User Provider works seamlessly with custom providers (Python, JavaScript, etc.). All test-level vars
are automatically passed to your custom provider's context, allowing you to access dynamic values like user IDs, session data, or routing information during conversations.
providers:
- id: file://my_custom_agent.py
config:
base_url: https://api.example.com
defaultTest:
provider:
id: 'promptfoo:simulated-user'
config:
maxTurns: 5
tests:
- vars:
workflow_id: 'wf-123'
session_id: 'sess-456'
instructions: |
You are booking a flight. Ask for the workflow ID to track your request.
In your custom provider, you can access these vars:
def call_api(prompt, options, context):
# Access vars from the simulated conversation
workflow_id = context['vars']['workflow_id'] # "wf-123"
session_id = context['vars']['session_id'] # "sess-456"
# Use them in your logic
response = f"I'll track this as workflow {workflow_id}..."
return {"output": response}
This enables sophisticated testing scenarios where your custom provider can:
When using promptfoo as a Node library, provide the equivalent configuration:
{
providers: [
{
id: 'promptfoo:simulated-user',
config: {
instructions: 'You are a customer with the following problem: {{problem}}',
maxTurns: 5,
},
},
];
}
The conversation will automatically stop when:
maxTurns
limit is reached###STOP###
anywhere in its responseThe ###STOP###
marker is useful for agents that can determine when a conversation has reached a natural conclusion (e.g., task completed, user satisfied).
The simulated user provider assumes that the target endpoint accepts messages in OpenAI chat format:
type Messages = {
role: 'user' | 'assistant' | 'system';
content: string;
}[];
The original prompt is sent as a system message to initialize the agent's behavior. For function-calling agents, include your function definitions in the provider configuration.
Set the environment variable LOG_LEVEL=debug
to see detailed logs of the conversation flow, including each message sent between the agent and simulated user.
LOG_LEVEL=debug promptfoo eval
Press p or to see the previous file or, n or to see the next file
Browsing data directories saved to S3 is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with AWS S3!
Are you sure you want to delete this access key?
Browsing data directories saved to Google Cloud Storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with Google Cloud Storage!
Are you sure you want to delete this access key?
Browsing data directories saved to Azure Cloud Storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with Azure Cloud Storage!
Are you sure you want to delete this access key?
Browsing data directories saved to S3 compatible storage is possible with DAGsHub. Let's configure your repository to easily display your data in the context of any commit!
promptfoo is now integrated with your S3 compatible storage!
Are you sure you want to delete this access key?