promptfooconfig.yaml 1.5 KB

description: 'DeepSeek-R1 vs o1 comparison on MMLU reasoning tasks'

prompts:
  - |
    You are an expert test taker. Please solve the following multiple choice question step by step.

    Question: {{question}}

    Options:
    A) {{choices[0]}}
    B) {{choices[1]}}
    C) {{choices[2]}}
    D) {{choices[3]}}

    Think through this step by step, then provide your final answer in the format "Therefore, the answer is A/B/C/D."

providers:
  - openai:o1
  - deepseek:deepseek-reasoner

defaultTest:
  assert:
    # Inference should complete within 60 seconds
    - type: latency
      threshold: 60000
    # Check for step-by-step reasoning
    - type: llm-rubric
      value: Response must include clear step-by-step reasoning
    # Check that it ends with a clear answer choice
    - type: regex
      value: "Therefore, the answer is [ABCD]\\."

tests:
  # Load MMLU test sets for reasoning-heavy subjects
  - huggingface://datasets/cais/mmlu?split=test&subset=abstract_algebra&config=abstract_algebra&limit=10
  # Optionally load other subjects
  # - huggingface://datasets/cais/mmlu?split=test&subset=formal_logic&config=formal_logic&limit=10
  # - huggingface://datasets/cais/mmlu?split=test&subset=high_school_mathematics&config=high_school_mathematics&limit=10
  # - huggingface://datasets/cais/mmlu?split=test&subset=college_mathematics&config=college_mathematics&limit=10
  # - huggingface://datasets/cais/mmlu?split=test&subset=logical_fallacies&config=logical_fallacies&limit=10
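
The `regex` assertion in the config can be sanity-checked outside promptfoo. The following is a minimal Python sketch, assuming the assertion passes when the pattern is found anywhere in the model output (an unanchored search). The helper names `has_final_answer` and `extract_choice` and the sample responses are hypothetical and not part of promptfoo.

```python
import re
from typing import Optional

# The pattern from the config after YAML unescaping: "\\." in a
# double-quoted YAML string becomes "\." (a literal period).
# The capture group is added here only to pull out the letter.
ANSWER_PATTERN = re.compile(r"Therefore, the answer is ([ABCD])\.")


def has_final_answer(response: str) -> bool:
    """Return True if the response contains the required closing sentence.

    Assumption: the check is an unanchored search, so the sentence may
    appear anywhere in the response.
    """
    return ANSWER_PATTERN.search(response) is not None


def extract_choice(response: str) -> Optional[str]:
    """Return the chosen letter (A-D), or None if the format is missing."""
    match = ANSWER_PATTERN.search(response)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical model outputs, for illustration only.
    samples = [
        "Step 1: check closure. Step 2: check inverses. Therefore, the answer is B.",
        "After working through the options, the answer is B",
        "Therefore, the answer is **B**.",
    ]
    for text in samples:
        print(has_final_answer(text), extract_choice(text))
```

Under these assumptions, only the first sample passes. The second lacks the required sentence, and the third wraps the letter in Markdown bold, so a response that is otherwise correct can still fail the format check.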