
promptfooconfig.yaml 1.5 KB

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: 'DeepSeek-R1 vs o1 comparison on MMLU reasoning tasks'

prompts:
  - |
    You are an expert test taker. Please solve the following multiple choice question step by step.

    Question: {{question}}

    Options:
    A) {{choices[0]}}
    B) {{choices[1]}}
    C) {{choices[2]}}
    D) {{choices[3]}}

    Think through this step by step, then provide your final answer in the format "Therefore, the answer is A/B/C/D."

providers:
  - openai:o1
  - deepseek:deepseek-reasoner

defaultTest:
  assert:
    # Inference should complete within 60 seconds
    - type: latency
      threshold: 60000
    # Check for step-by-step reasoning
    - type: llm-rubric
      value: Response must include clear step-by-step reasoning
    # Check that it ends with a clear answer choice
    - type: regex
      value: "Therefore, the answer is [ABCD]\\."

tests:
  # Load MMLU test sets for reasoning-heavy subjects
  - huggingface://datasets/cais/mmlu?split=test&subset=abstract_algebra&config=abstract_algebra&limit=10
  # Optionally load other subjects
  # - huggingface://datasets/cais/mmlu?split=test&subset=formal_logic&config=formal_logic&limit=10
  # - huggingface://datasets/cais/mmlu?split=test&subset=high_school_mathematics&config=high_school_mathematics&limit=10
  # - huggingface://datasets/cais/mmlu?split=test&subset=college_mathematics&config=college_mathematics&limit=10
  # - huggingface://datasets/cais/mmlu?split=test&subset=logical_fallacies&config=logical_fallacies&limit=10
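
The assertions in this config check latency, reasoning style, and answer format, but not whether the chosen letter is correct. A minimal sketch of an extra correctness check follows. It assumes each MMLU row loaded from Hugging Face exposes its `answer` column (an index from 0 to 3) as a test variable alongside `question` and `choices`; this file does not confirm that, so verify it against the loaded data before relying on the check.

```yaml
defaultTest:
  assert:
    # Hypothetical addition, not part of the original config:
    # compare the letter in the final sentence with the dataset's answer index.
    # Assumes `answer` (0-3) is available in context.vars.
    - type: javascript
      value: |
        const match = String(output).match(/Therefore, the answer is ([ABCD])\./);
        return match !== null && match[1] === 'ABCD'[Number(context.vars.answer)];
```

If used, this entry would sit in the existing `assert` list next to the `regex` check rather than in a second `defaultTest` key.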