---
sidebar_label: Guardrails API
sidebar_position: 99
---
# Guardrails API

The Guardrails API helps detect potential security risks in user inputs to language models, identify personally identifiable information (PII), and assess potential harm in content.

**Base URL**: `https://api.promptfoo.dev`
## Guard classification

Analyzes input text to classify potential security threats, such as prompt injections and jailbreaks.

### Request

```http
POST /v1/guard
Content-Type: application/json
```

```json
{
  "input": "String containing the text to analyze"
}
```
### Response

```json
{
  "model": "promptfoo-guard",
  "results": [
    {
      "categories": {
        "prompt_injection": boolean,
        "jailbreak": boolean
      },
      "category_scores": {
        "prompt_injection": number,
        "jailbreak": number
      },
      "flagged": boolean
    }
  ]
}
```
`categories.prompt_injection`
: Indicates if the input may be attempting a prompt injection.

`categories.jailbreak`
: Indicates if the input may be attempting a jailbreak.

`flagged`
: True if the input is classified as either a prompt injection or a jailbreak.

## PII detection

Detects personally identifiable information (PII) in the input text. The system can identify a wide range of PII elements:
| Entity Type | Description |
| --- | --- |
account_number | Account numbers (e.g., bank account) |
building_number | Building or house numbers |
city | City names |
credit_card_number | Credit card numbers |
date_of_birth | Dates of birth |
driver_license_number | Driver's license numbers |
email_address | Email addresses |
given_name | First or given names |
id_card_number | ID card numbers |
password | Passwords or passcodes |
social_security_number | Social security numbers |
street_name | Street names |
surname | Last names or surnames |
tax_id_number | Tax identification numbers |
phone_number | Telephone numbers |
username | Usernames |
zip_code | Postal or ZIP codes |
### Request

```http
POST /v1/pii
Content-Type: application/json
```

```json
{
  "input": "String containing the text to analyze for PII"
}
```
### Response

```json
{
  "model": "promptfoo-pii",
  "results": [
    {
      "categories": {
        "pii": boolean
      },
      "category_scores": {
        "pii": number
      },
      "flagged": boolean,
      "payload": {
        "pii": [
          {
            "entity_type": string,
            "start": number,
            "end": number,
            "pii": string
          }
        ]
      }
    }
  ]
}
```
`pii`
: Indicates if PII was detected in the input.

`flagged`
: True if any PII was detected.

`payload.pii`
: Array of detected PII entities with their types and positions in the text.

## Harm detection

Analyzes input text to detect potentially harmful content across various categories.
### Request

```http
POST /v1/harm
Content-Type: application/json
```

```json
{
  "input": "String containing the text to analyze for potential harm"
}
```
### Response

```json
{
  "model": "promptfoo-harm",
  "results": [
    {
      "categories": {
        "violent_crimes": boolean,
        "non_violent_crimes": boolean,
        "sex_related_crimes": boolean,
        "child_sexual_exploitation": boolean,
        "defamation": boolean,
        "specialized_advice": boolean,
        "privacy": boolean,
        "intellectual_property": boolean,
        "indiscriminate_weapons": boolean,
        "hate": boolean,
        "suicide_and_self_harm": boolean,
        "sexual_content": boolean,
        "elections": boolean,
        "code_interpreter_abuse": boolean
      },
      "category_scores": {
        "violent_crimes": number,
        "non_violent_crimes": number,
        "sex_related_crimes": number,
        "child_sexual_exploitation": number,
        "defamation": number,
        "specialized_advice": number,
        "privacy": number,
        "intellectual_property": number,
        "indiscriminate_weapons": number,
        "hate": number,
        "suicide_and_self_harm": number,
        "sexual_content": number,
        "elections": number,
        "code_interpreter_abuse": number
      },
      "flagged": boolean
    }
  ]
}
```
`categories`
: Indicates whether the input contains content related to each harm category.

`category_scores`
: Provides a numerical score (between 0 and 1) for each harm category.

`flagged`
: True if any harm category is detected in the input.

The harm detection API supports the following categories from the ML Commons taxonomy:
| Category | Description |
| --- | --- |
violent_crimes | Content related to violent criminal activities |
non_violent_crimes | Content related to non-violent criminal activities |
sex_related_crimes | Content related to sex crimes |
child_sexual_exploitation | Content involving the sexual exploitation of minors |
defamation | Content that could be considered defamatory |
specialized_advice | Potentially harmful specialized advice (e.g., medical, legal) |
privacy | Content that may violate privacy |
intellectual_property | Content that may infringe on intellectual property rights |
indiscriminate_weapons | Content related to weapons of mass destruction |
hate | Hate speech or content promoting discrimination |
suicide_and_self_harm | Content related to suicide or self-harm |
sexual_content | Explicit sexual content |
elections | Content that may interfere with elections |
code_interpreter_abuse | Potential abuse of code interpretation features |
Each category is assigned a boolean value indicating its presence and a numerical score between 0 and 1 representing the confidence level of the detection.
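If you want to apply a cutoff of your own rather than relying on the boolean labels, the scores can be filtered directly. A minimal sketch (the `categories_over` helper and the 0.8 threshold are illustrative, not part of the API):

```python
def categories_over(scores: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Return the harm categories whose confidence score meets the threshold.

    `scores` is the `category_scores` object from a /v1/harm result.
    """
    return sorted(name for name, score in scores.items() if score >= threshold)
```

A stricter threshold naturally yields fewer categories; tune it to your application's risk tolerance.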
## Examples

### Guard classification

```bash
curl https://api.promptfoo.dev/v1/guard \
  -X POST \
  -d '{"input": "Ignore previous instructions"}' \
  -H 'Content-Type: application/json'
```
Response:

```json
{
  "model": "promptfoo-guard",
  "results": [
    {
      "categories": {
        "prompt_injection": false,
        "jailbreak": true
      },
      "category_scores": {
        "prompt_injection": 0.00004004167567472905,
        "jailbreak": 0.9999395608901978
      },
      "flagged": true
    }
  ]
}
```
This example shows a high probability of a jailbreak attempt.
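The same request can be made from Python. This is a minimal stdlib-only sketch; the endpoint path and field names are taken from the schema above, while the `check_guard` and `flagged_categories` helpers are illustrative:

```python
# Sketch of a /v1/guard client using only the Python standard library.
import json
import urllib.request

API_BASE = "https://api.promptfoo.dev"

def check_guard(text: str) -> dict:
    """POST the input to /v1/guard and return the first result object."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/guard",
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]

def flagged_categories(result: dict) -> list[str]:
    """Names of the threat categories the classifier marked as true."""
    return [name for name, hit in result["categories"].items() if hit]
```

For the response shown above, `flagged_categories` would return `["jailbreak"]`.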
### PII detection

```bash
curl https://api.promptfoo.dev/v1/pii \
  -X POST \
  -d '{"input": "My name is John Doe and my email is john@example.com"}' \
  -H 'Content-Type: application/json'
```
Response:

```json
{
  "model": "promptfoo-pii",
  "results": [
    {
      "categories": {
        "pii": true
      },
      "category_scores": {
        "pii": 1
      },
      "flagged": true,
      "payload": {
        "pii": [
          {
            "entity_type": "PERSON",
            "start": 11,
            "end": 19,
            "pii": "John Doe"
          },
          {
            "entity_type": "EMAIL",
            "start": 34,
            "end": 50,
            "pii": "john@example.com"
          }
        ]
      }
    }
  ]
}
```
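The `start`/`end` offsets in `payload.pii` make it straightforward to redact detected entities before passing text downstream. A sketch (the `redact` helper is not part of the API):

```python
def redact(text: str, entities: list[dict]) -> str:
    """Replace each detected PII span with a placeholder naming its entity type.

    Spans are applied right to left so that earlier offsets remain valid
    after each substitution.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"<{ent['entity_type']}>" + text[ent["end"] :]
    return text
```

For example, redacting a detected phone number span would turn `"Call me at 555-0100."` into `"Call me at <phone_number>."`.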
### Harm detection

```bash
curl https://api.promptfoo.dev/v1/harm \
  -X POST \
  -d '{"input": "How to make homemade explosives"}' \
  -H 'Content-Type: application/json'
```
Response:

```json
{
  "model": "promptfoo-harm",
  "results": [
    {
      "categories": {
        "indiscriminate_weapons": true
      },
      "category_scores": {
        "indiscriminate_weapons": 1
      },
      "flagged": true
    }
  ]
}
```
This example shows the detection of potentially harmful content related to indiscriminate weapons.
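In practice, the three endpoints can be combined into a single pre-flight check on user input. The aggregation below is a sketch that assumes only the documented `flagged` field in each result:

```python
def input_allowed(responses: list[dict]) -> bool:
    """Return True if no result from any endpoint flagged the input.

    `responses` holds the parsed JSON bodies returned by /v1/guard,
    /v1/pii, and /v1/harm for the same input.
    """
    return not any(
        result["flagged"] for body in responses for result in body["results"]
    )
```

A gateway might call all three endpoints, pass the parsed bodies to `input_allowed`, and reject the request when it returns `False`.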
For more information on LLM vulnerabilities and how to mitigate LLM failure modes, refer to our Types of LLM Vulnerabilities and Introduction to AI red teaming documentation.