description
Demonstrates Claude's thinking capability for complex problem solving

Claude's Step-by-Step Thinking Demonstration

This example demonstrates Claude's "thinking" capability, which lets you see the model's step-by-step reasoning before it provides a final answer. The example compares thinking output from two providers: the Anthropic API and Claude on AWS Bedrock.

You can run this example with:

npx promptfoo@latest init --example claude-thinking

What This Example Demonstrates

Using Claude's thinking feature to reveal step-by-step reasoning
Comparing thinking output quality between different Claude providers
Configuring the thinking token budget
Using LLM-based evaluation rubrics to assess reasoning quality

Environment Variables

This example requires at least one of the following sets of credentials:

For Anthropic API (Recommended)

ANTHROPIC_API_KEY - Your Anthropic API key from console.anthropic.com

For AWS Bedrock

AWS_ACCESS_KEY_ID - Your AWS access key
AWS_SECRET_ACCESS_KEY - Your AWS secret key
Or configure credentials via the AWS CLI: aws configure

Prerequisites

For AWS Bedrock, you must:

1. Enable Claude model access in your AWS account
   - Go to the AWS Bedrock console
   - Navigate to "Model access" in the left sidebar
   - Find "Anthropic - Claude" and click "Edit"
   - Enable the model and save changes
2. Ensure you have permissions to use the Claude model
3. Set the correct AWS region in the config (default: us-west-2)
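
As a rough sketch of where the region setting might live, the fragment below assumes a Bedrock provider entry in promptfooconfig.yaml. The model ID is an assumption for illustration and may not match the one this example ships with; only the region default (us-west-2) comes from this README.

```yaml
# Sketch only: the model ID below is an assumption, not taken from this example.
providers:
  - id: bedrock:us.anthropic.claude-3-7-sonnet-20250219-v1:0
    config:
      # Use a region where Claude model access has been enabled for your account
      region: us-west-2
```

If Claude access was enabled in a different region, changing this value should be the only edit needed.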

Running the Example

After setting up environment variables:

# From the example directory
promptfoo eval
promptfoo view

Test Cases

This example includes test cases of increasing complexity:

8 Balls Problem - A classic logic puzzle requiring careful reasoning
Train Meeting Problem - A traditional algebra word problem

These test cases are specifically designed to showcase Claude's ability to break down complex problems and show detailed thinking steps.
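
To illustrate how a test case like the 8 Balls Problem might be paired with an LLM-based rubric, here is a minimal sketch. The variable name `problem`, the puzzle wording, and the rubric text are all hypothetical and are not copied from this example's promptfooconfig.yaml.

```yaml
# Hypothetical test case: variable name, wording, and rubric are assumptions.
tests:
  - description: 8 Balls Problem
    vars:
      problem: >-
        You have 8 balls that look identical, but one is slightly heavier.
        Using a balance scale, what is the minimum number of weighings
        needed to find the heavier ball?
    assert:
      # llm-rubric asks a grading model to judge the output against this text
      - type: llm-rubric
        value: >-
          Explains the reasoning step by step and concludes that
          2 weighings are sufficient to find the heavier ball.
```

A rubric phrased this way checks both the final answer and whether the reasoning is laid out, which is the point of evaluating thinking output.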

How Claude Thinking Works

The thinking feature is enabled by setting special parameters in the provider configuration:

thinking:
  type: 'enabled'
  budget_tokens: 4096 # Controls how many tokens are allocated for thinking
max_tokens: 8192 # Must be greater than budget_tokens

When enabled, Claude's response will include a "Thinking:" section that shows its reasoning process before the final answer:

Thinking: Let me solve this step by step...
1. First, I'll divide the 8 balls into three groups...
2. In the first weighing, I'll compare groups A and B...
3. Based on the result, I can determine...

Final answer: We need exactly 2 weighings to find the heavier ball.
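
For context, the thinking parameters shown earlier in this section sit under a provider's `config` block. The sketch below assumes an Anthropic provider entry; the model ID is an assumption and may differ from the one used in this example.

```yaml
# Sketch only: the model ID is an assumption, not taken from this example.
providers:
  - id: anthropic:messages:claude-3-7-sonnet-20250219
    config:
      max_tokens: 8192 # Must be greater than budget_tokens
      thinking:
        type: 'enabled'
        budget_tokens: 4096 # Tokens reserved for the reasoning step
```

Because the thinking budget is spent out of `max_tokens`, a budget of 4096 with `max_tokens: 8192` leaves roughly 4096 tokens for the final answer.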

Additional Resources

nirbarazida / promptfoo mirror of https://github.com/promptfoo/promptfoo
