# Max-Score Selection Example

This example demonstrates the `max-score` assertion type, which selects the best output objectively based on scores aggregated from the other assertions in a test.

## Overview

The `max-score` assertion provides a deterministic way to select the best output from multiple providers by:

- Aggregating scores from other assertions (correctness, quality, documentation, etc.)
- Applying configurable weights to different assertion types
- Selecting the output with the highest weighted score
- Providing objective, reproducible selection criteria

## Key Differences from `select-best`

- **Objective**: Uses quantifiable scores rather than LLM judgment
- **Deterministic**: Same inputs always produce the same selection
- **Transparent**: Clear scoring methodology based on weighted assertions
- **Cost-effective**: No additional LLM calls for selection

## Configuration

```yaml
- type: max-score
  value:
    method: average # 'average' (default) or 'sum'
    weights:
      python: 3 # Weight for Python code correctness tests
      llm-rubric: 1 # Weight for LLM-evaluated quality rubrics
      javascript: 2 # Weight for JavaScript tests
      contains: 0.5 # Weight for simple string matching
    threshold: 0.7 # Optional minimum score threshold
```
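
For context, here is a sketch of how the assertion might sit inside a complete `promptfooconfig.yaml`, alongside the assertions whose scores it aggregates. The providers, prompt, rubric text, and file path below are illustrative placeholders, not part of this example:

```yaml
# Sketch only: providers, prompt, and file path are hypothetical.
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022
prompts:
  - 'Write a Python function that {{task}}. Include a docstring.'
tests:
  - vars:
      task: reverses a string
    assert:
      # Assertions that produce the scores to aggregate:
      - type: python
        value: file://test_correctness.py # hypothetical test file
      - type: llm-rubric
        value: The code is clearly documented.
      # max-score then picks the provider output with the
      # highest weighted score across the assertions above.
      - type: max-score
        value:
          method: average
          weights:
            python: 3
            llm-rubric: 1
```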

## Options

- `method`: How to aggregate scores
  - `average` (default): Weighted average of assertion scores
  - `sum`: Weighted sum of assertion scores
- `weights`: Map of assertion types to their weights (default: 1.0)
- `threshold`: Minimum score required for selection (optional)
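
As a variant sketch, the same assertion could use `method: sum`. Summed scores are no longer normalized to the 0-1 range, so any `threshold` would presumably need to be scaled to match:

```yaml
# Variant: weighted sum instead of weighted average.
# Resulting scores are not normalized to 0-1.
- type: max-score
  value:
    method: sum
    weights:
      python: 3
      llm-rubric: 1
```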

## Usage

### Basic Example

```sh
# Run the main example (requires API keys for OpenAI/Anthropic)
npx promptfoo@latest eval
```
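
Once the eval completes, you can inspect the per-assertion scores and the winning output in the web viewer:

```sh
npx promptfoo@latest view
```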

## How It Works

1. **Multiple Outputs Generated**: Each provider generates a solution
2. **Assertions Evaluated**: All assertions run on each output:
   - Python tests verify correctness (pass=1, fail=0)
   - LLM rubrics evaluate quality aspects (0-1 score)
   - Other assertions contribute their scores
3. **Scores Aggregated**: Max-score calculates a weighted score for each output (see the formula below)
4. **Best Selected**: The output with the highest score is marked as passing
5. **Results Shown**: Clear indication of which output won and why
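
In `average` mode, the aggregated score for each output works out to `sum(weight[type] × score) / sum(weight[type])` across the assertions that ran on it; `sum` mode simply skips the division. The worked example below follows this formula.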

## Example Scoring

Given three outputs with these assertion results:

- Output A: python=1.0, documentation=0.5, efficiency=0.7
- Output B: python=1.0, documentation=0.9, efficiency=0.8
- Output C: python=0.0, documentation=1.0, efficiency=1.0

With weights python=3 and llm-rubric=1 (documentation and efficiency are llm-rubric assertions, so the total weight is 3 + 1 + 1 = 5):

- Output A: (3×1.0 + 1×0.5 + 1×0.7) / 5 = 0.84
- Output B: (3×1.0 + 1×0.9 + 1×0.8) / 5 = 0.94 ✓ (selected)
- Output C: (3×0.0 + 1×1.0 + 1×1.0) / 5 = 0.40
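
For comparison, `method: sum` with the same weights would yield raw totals of 4.2, 4.7, and 2.0, so Output B would still be selected; the totals are simply no longer on a 0-1 scale.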

## When to Use `max-score`

Use `max-score` when:

- You have objective criteria (tests, metrics)
- You want reproducible results
- You need to weight different aspects differently
- You want to avoid additional API costs

Use `select-best` when:

- You need subjective judgment
- The criteria are hard to quantify
- You want nuanced evaluation of quality