promptfooconfig.yaml 2.5 KB

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Max-score assertion for objective output selection
# This example demonstrates how max-score can objectively select the best implementation
# based on weighted scores from multiple assertions (correctness, documentation, efficiency)
prompts:
  - 'Generate a Python function to {{task}}'
  - 'Write an efficient Python function to {{task}}'
  - 'Create a well-documented Python function to {{task}}'
providers:
  - openai:o4-mini
  - anthropic:claude-4-sonnet
  - google:gemini-2.5-flash
tests:
  - vars:
      task: 'merge two sorted lists into one sorted list; name it merge_sorted_lists'
    assert:
      # Correctness test
      - type: python
        value: |
          import re
          # `output` holds the model's reply as text: take the code from a
          # Markdown fence if there is one, then execute it to define the function.
          match = re.search(r"`{3}\w*\s*\n(.*?)`{3}", output, re.DOTALL)
          namespace = {}
          exec(match.group(1) if match else output, namespace)
          merge_sorted_lists = namespace["merge_sorted_lists"]
          # Test the merge function
          list1 = [1, 3, 5, 7, 9]
          list2 = [2, 4, 6, 8, 10]
          result = merge_sorted_lists(list1, list2)
          assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], f"Expected [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], got {result}"
          # Test with empty lists
          assert merge_sorted_lists([], [1, 2, 3]) == [1, 2, 3]
          assert merge_sorted_lists([1, 2, 3], []) == [1, 2, 3]
          assert merge_sorted_lists([], []) == []
          # Multiline Python assertions must return a result
          return True
      # Documentation test
      - type: llm-rubric
        value: 'Code includes a clear docstring explaining parameters, return value, and complexity'
      # Code structure and efficiency
      - type: llm-rubric
        value: 'The implementation uses an efficient algorithm (O(m+n) time complexity) and follows Python best practices'
      # Max-score selects the best implementation objectively
      - type: max-score
        value:
          weights:
            python: 3 # Correctness is most important
            llm-rubric: 1.5 # Applies to each llm-rubric assertion (documentation and efficiency)
  - vars:
      task: 'check if a string is a palindrome (ignoring case and spaces); name it is_palindrome'
    assert:
      # Correctness
      - type: python
        value: |
          import re
          # Extract and execute the model's code, as in the first test
          match = re.search(r"`{3}\w*\s*\n(.*?)`{3}", output, re.DOTALL)
          namespace = {}
          exec(match.group(1) if match else output, namespace)
          is_palindrome = namespace["is_palindrome"]
          assert is_palindrome("racecar")
          assert is_palindrome("A man a plan a canal Panama")
          assert not is_palindrome("race a car")
          assert not is_palindrome("hello")
          assert is_palindrome("")
          return True
      # Edge case handling
      - type: llm-rubric
        value: 'Handles edge cases like empty strings and single characters correctly'
      # Efficiency
      - type: llm-rubric
        value: 'Uses an efficient algorithm (O(n) time complexity)'
      # Simple max-score (all assertions equally weighted)
      - type: max-score
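To make the weighting concrete, here is a simplified model of how a weighted max-score could pick a winner. It assumes each output's score is the weighted average of its other assertions' scores, with weights keyed by assertion type (unlisted types count as 1), and that the highest average wins; promptfoo's actual implementation may differ in details such as tie-breaking or thresholds. The `AssertionResult` type, the helper functions, and the scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AssertionResult:
    """Hypothetical record of one assertion's outcome for one model output."""
    assertion_type: str  # e.g. "python" or "llm-rubric"
    score: float         # normalized score in [0, 1]


def weighted_score(results: list[AssertionResult], weights: dict[str, float]) -> float:
    """Weighted average of one output's scores; types missing from `weights` count as 1."""
    total_weight = sum(weights.get(r.assertion_type, 1.0) for r in results)
    if total_weight == 0:
        return 0.0
    return sum(weights.get(r.assertion_type, 1.0) * r.score for r in results) / total_weight


def select_best(outputs: dict[str, list[AssertionResult]], weights: dict[str, float]) -> str:
    """Return the name of the output with the highest weighted score."""
    return max(outputs, key=lambda name: weighted_score(outputs[name], weights))


# Hypothetical scores for two candidate outputs on the merge task.
outputs = {
    # Passes the correctness test, weak on both rubrics
    "output_a": [
        AssertionResult("python", 1.0),
        AssertionResult("llm-rubric", 0.25),
        AssertionResult("llm-rubric", 0.25),
    ],
    # Fails the correctness test, strong on both rubrics
    "output_b": [
        AssertionResult("python", 0.0),
        AssertionResult("llm-rubric", 1.0),
        AssertionResult("llm-rubric", 1.0),
    ],
}

weights = {"python": 3, "llm-rubric": 1.5}
print(select_best(outputs, weights))  # weighted: 0.625 vs 0.5 -> output_a
print(select_best(outputs, {}))       # equal weights: 0.5 vs ~0.667 -> output_b
```

With the file's weights (`python: 3`, `llm-rubric: 1.5`), the output that passes the correctness test wins even though the other one scores higher on both rubrics; with equal weights the choice flips.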
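To sanity-check the python assertions locally, the sketch below gives one straightforward solution to each task, written for illustration rather than taken from any provider's output: a two-pointer merge that meets the O(m+n) bound the efficiency rubric asks for, and a two-pointer palindrome check that runs in O(n). Only the function names come from the config; the implementations are our own assumption of what a passing answer could look like.

```python
def merge_sorted_lists(list1: list, list2: list) -> list:
    """Merge two ascending lists into one ascending list.

    Args:
        list1: First list, sorted in ascending order.
        list2: Second list, sorted in ascending order.

    Returns:
        A new list containing every element of both inputs, in ascending order.

    Complexity: O(m + n) time and O(m + n) extra space.
    """
    merged = []
    i = j = 0
    # Repeatedly take the smaller head element; `<=` keeps the merge stable.
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    # At most one list still has elements left, and they are already sorted.
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged


def is_palindrome(text: str) -> bool:
    """Return True if `text` reads the same backwards, ignoring case and spaces.

    Complexity: O(n) time and O(n) extra space for the normalized copy.
    """
    cleaned = text.replace(" ", "").lower()
    left, right = 0, len(cleaned) - 1
    # Compare characters from both ends, moving toward the middle.
    while left < right:
        if cleaned[left] != cleaned[right]:
            return False
        left += 1
        right -= 1
    return True


# The same checks the python assertions run.
assert merge_sorted_lists([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]) == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
assert merge_sorted_lists([], [1, 2, 3]) == [1, 2, 3]
assert merge_sorted_lists([1, 2, 3], []) == [1, 2, 3]
assert merge_sorted_lists([], []) == []
assert is_palindrome("racecar")
assert is_palindrome("A man a plan a canal Panama")
assert not is_palindrome("race a car")
assert not is_palindrome("hello")
assert is_palindrome("")
print("all checks passed")
```

Swapping a model's output in for these definitions and rerunning the checks reproduces what the correctness assertions do during an eval.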