promptfooconfig.yaml 2.6 KB

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: G-Eval criteria-based evaluation of LLM responses
prompts:
  - 'Hello! How are you?'
providers:
  - openai:gpt-4.1-mini
tests:
  - assert: # Calculate the score for a single criterion
      - type: g-eval
        value: >-
          Coherence - the collective quality of all sentences. We align this
          dimension with the DUC quality question of structure and coherence
          whereby "the reply should be well-structured and well-organized. The
          reply should not just be a heap of related information, but should
          build from sentence to a coherent body of information about a topic."
  - assert:
      - type: g-eval
        value: >-
          Consistency - the factual alignment between the reply and the source.
          A factually consistent reply contains only statements that are
          entailed by the source document. Annotators were also asked to
          penalize replies that contained hallucinated facts.
  - assert:
      - type: g-eval
        value: >-
          Fluency - the quality of the reply in terms of grammar, spelling,
          punctuation, word choice, and sentence structure.
  - assert:
      - type: g-eval
        value: >-
          Relevance - selection of important content for the source. The reply
          should include only important information for the source document.
          Annotators were instructed to penalize replies which contained
          redundancies and excess information.
  - assert: # Calculate the average score across all criteria
      - type: g-eval
        value:
          - Coherence - the collective quality of all sentences. We align this dimension with the DUC quality question of structure and coherence whereby "the reply should be well-structured and well-organized. The reply should not just be a heap of related information, but should build from sentence to a coherent body of information about a topic."
          - Consistency - the factual alignment between the reply and the source. A factually consistent reply contains only statements that are entailed by the source document. Annotators were also asked to penalize replies that contained hallucinated facts.
          - Fluency - the quality of the reply in terms of grammar, spelling, punctuation, word choice, and sentence structure.
          - Relevance - selection of important content for the source. The reply should include only important information for the source document. Annotators were instructed to penalize replies which contained redundancies and excess information.
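# A hedged sketch, not part of the original config: the g-eval assertions above
# do not set a pass threshold, so they rely on whatever default promptfoo applies.
# Assuming the optional `threshold` field is honored for g-eval assertions
# (the 0.8 value is purely illustrative), a stricter single-criterion test
# might look like this when uncommented and placed under `tests:`
#
#   - assert:
#       - type: g-eval
#         threshold: 0.8
#         value: >-
#           Fluency - the quality of the reply in terms of grammar, spelling,
#           punctuation, word choice, and sentence structure.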