promptfooconfig.aime2024.yaml 1.2 KB

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Reproduce Mistral Magistral AIME2024 benchmark

prompts:
  - |
    Solve this AIME mathematical problem step by step.
    Problem: {{question}}
    Think through this carefully and provide your final answer as a 3-digit integer (000-999).
    End with: "Therefore, the answer is [your answer]."

providers:
  - id: mistral:magistral-medium-latest
    label: Magistral Medium
    config:
      temperature: 0.7
      top_p: 0.95
      max_tokens: 40960
  - id: mistral:magistral-small-latest
    label: Magistral Small
    config:
      temperature: 0.7
      top_p: 0.95
      max_tokens: 40960

tests:
  - huggingface://datasets/sea-snell/aime-2024?split=test

defaultTest:
  assert:
    - type: llm-rubric
      value: |
        Evaluate this mathematical solution to an AIME competition problem.
        The correct answer is: {{answer}}
        Grade as PASS if and only if:
        1. The response shows clear step-by-step mathematical reasoning
        2. The final answer presented equals {{answer}} exactly
        3. The mathematical work supports the conclusion
        Grade as FAIL if the final answer is incorrect, regardless of the reasoning quality.
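
# Optional deterministic check: a sketch only, not part of the original config.
# Assumptions: promptfoo's `javascript` assertion type, with the model output
# available as `output` and the test variables as `context.vars`; and a model
# that follows the "Therefore, the answer is ..." format requested in the prompt.
# The check takes the last "the answer is N" occurrence and compares integers,
# so "073" and "73" are treated as equal. To try it, uncomment the lines below
# so that they sit as a second list item under `defaultTest.assert`.
#
#    - type: javascript
#      value: |
#        const hits = [...String(output).matchAll(/the answer is\s*\[?\s*(\d{1,3})\b/gi)];
#        const last = hits[hits.length - 1];
#        return last !== undefined && parseInt(last[1], 10) === parseInt(context.vars.answer, 10);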