Idun Agent Platform supports 15 guardrail types that validate agent inputs, outputs, or both. Each guardrail has a `config_id`, a `reject_message` returned when the guard triggers, and type-specific configuration fields.

Guardrail positions

Guardrails are placed in one of two positions:
  • Input: Applied to user messages before they reach the agent
  • Output: Applied to agent responses before they are returned to the user
You can place the same guardrail type in both positions.
config.yaml

```yaml
guardrails:
  input:
    - config_id: detect_pii
      reject_message: "PII detected in your input"
      pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"]
      on_fail: exception

  output:
    - config_id: toxic_language
      reject_message: "Response contains inappropriate language"
      threshold: 0.7
```
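To place the same guardrail type in both positions, declare it under both `input` and `output`. An illustrative sketch using `ban_list` (values are placeholders):

```yaml
guardrails:
  input:
    - config_id: ban_list
      reject_message: "Message contains banned content"
      banned_words: ["banned-word"]
  output:
    - config_id: ban_list
      reject_message: "Response contains banned content"
      banned_words: ["banned-word"]
```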

Guardrail types

BAN_LIST

Blocks messages containing specific words or phrases.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `ban_list` |
| `reject_message` | string | Message returned when triggered |
| `banned_words` | list[string] | Words or phrases to block |

```yaml
- config_id: ban_list
  reject_message: "Message contains banned content"
  banned_words: ["banned-word", "another phrase"]
```

DETECT_PII

Detects personally identifiable information in text.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `detect_pii` |
| `reject_message` | string | Message returned when triggered |
| `pii_entities` | list[string] | PII entity types to detect (e.g., `EMAIL_ADDRESS`, `PHONE_NUMBER`, `CREDIT_CARD`, `SSN`) |
| `on_fail` | string | Action on detection. Default: `exception` |

```yaml
- config_id: detect_pii
  reject_message: "Personal information detected"
  pii_entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"]
  on_fail: exception
```

NSFW_TEXT

Detects not-safe-for-work content.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `nsfw_text` |
| `reject_message` | string | Message returned when triggered |
| `threshold` | float | Sensitivity level (0.0 to 1.0). Lower values are more sensitive |

```yaml
- config_id: nsfw_text
  reject_message: "Inappropriate content detected"
  threshold: 0.5
```

COMPETITION_CHECK

Flags mentions of competitor companies or products.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `competition_check` |
| `reject_message` | string | Message returned when triggered |
| `competitors` | list[string] | Names of competitor companies or products |

```yaml
- config_id: competition_check
  reject_message: "Competitor reference detected"
  competitors: ["CompetitorA", "CompetitorB"]
```

BIAS_CHECK

Detects biased language in text.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `bias_check` |
| `reject_message` | string | Message returned when triggered |
| `threshold` | float | Sensitivity level (0.0 to 1.0) |

```yaml
- config_id: bias_check
  reject_message: "Biased language detected"
  threshold: 0.7
```

CORRECT_LANGUAGE

Validates that text is in one of the expected languages.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `correct_language` |
| `reject_message` | string | Message returned when triggered |
| `expected_languages` | list[string] | Valid ISO language codes (e.g., `en`, `fr`, `es`) |

```yaml
- config_id: correct_language
  reject_message: "Please use English or French"
  expected_languages: ["en", "fr"]
```

GIBBERISH_TEXT

Filters nonsensical or garbled input.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `gibberish_text` |
| `reject_message` | string | Message returned when triggered |
| `threshold` | float | Sensitivity level (0.0 to 1.0) |

```yaml
- config_id: gibberish_text
  reject_message: "Input appears to be nonsensical"
  threshold: 0.8
```

TOXIC_LANGUAGE

Detects toxic or harmful language.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `toxic_language` |
| `reject_message` | string | Message returned when triggered |
| `threshold` | float | Sensitivity level (0.0 to 1.0) |

```yaml
- config_id: toxic_language
  reject_message: "Toxic language detected"
  threshold: 0.7
```

RESTRICT_TO_TOPIC

Keeps conversations within a defined set of allowed topics.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `restrict_to_topic` |
| `reject_message` | string | Message returned when triggered |
| `topics` | list[string] | List of allowed topics |

```yaml
- config_id: restrict_to_topic
  reject_message: "That topic is outside the scope of this agent"
  topics: ["customer support", "product information", "billing"]
```

DETECT_JAILBREAK

Detects jailbreak attempts in user input.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `detect_jailbreak` |
| `reject_message` | string | Message returned when triggered |
| `threshold` | float | Sensitivity level (0.0 to 1.0) |

```yaml
- config_id: detect_jailbreak
  reject_message: "Jailbreak attempt detected"
  threshold: 0.5
```

PROMPT_INJECTION

Detects prompt injection attacks.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `prompt_injection` |
| `reject_message` | string | Message returned when triggered |
| `threshold` | float | Sensitivity level (0.0 to 1.0) |

```yaml
- config_id: prompt_injection
  reject_message: "Prompt injection detected"
  threshold: 0.5
```

RAG_HALLUCINATION

Detects hallucinations in RAG (Retrieval-Augmented Generation) responses by comparing the response against the retrieved context.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `rag_hallucination` |
| `reject_message` | string | Message returned when triggered |
| `threshold` | float | Sensitivity level (0.0 to 1.0) |

```yaml
- config_id: rag_hallucination
  reject_message: "Response may contain unsupported claims"
  threshold: 0.7
```

CODE_SCANNER

Scans and validates code in messages, restricting to allowed programming languages.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `code_scanner` |
| `reject_message` | string | Message returned when triggered |
| `allowed_languages` | list[string] | List of allowed programming languages |

```yaml
- config_id: code_scanner
  reject_message: "Code in this language is not allowed"
  allowed_languages: ["python", "javascript", "sql"]
```

MODEL_ARMOR (Google Cloud)

Uses Google Cloud’s Model Armor service for content safety evaluation.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `model_armor` |
| `name` | string | Name of the armor configuration |
| `project_id` | string | Google Cloud project ID |
| `location` | string | Google Cloud region (e.g., `us-central1`) |
| `template_id` | string | Model Armor template ID |

```yaml
- config_id: model_armor
  name: production-armor
  project_id: my-gcp-project
  location: us-central1
  template_id: my-armor-template
```

Model Armor requires a Google Cloud project with the Model Armor API enabled. This guardrail type does not use the Guardrails AI hub.
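As a setup sketch, the API can typically be enabled with the gcloud CLI. The `modelarmor.googleapis.com` service name and project ID below are assumptions, not taken from this platform's docs:

```shell
# Enable the Model Armor API on the target project (service name assumed).
gcloud services enable modelarmor.googleapis.com --project=my-gcp-project
```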

CUSTOM_LLM

Uses a large language model as a custom guardrail with a prompt you define.
| Field | Type | Description |
|-------|------|-------------|
| `config_id` | | `custom_llm` |
| `name` | string | Name of the custom guardrail |
| `model` | string | LLM model to use for evaluation |
| `prompt` | string | System instruction prompt that defines the guardrail logic |

Supported models:

| Model ID | Name |
|----------|------|
| `Gemini 2.5 flash lite` | Gemini 2.5 Flash Lite |
| `Gemini 2.5 flash` | Gemini 2.5 Flash |
| `Gemini 2.5 pro` | Gemini 2.5 Pro |
| `Gemini 3 pro` | Gemini 3 Pro |
| `OpenAi GPT-5.1` | OpenAI GPT-5.1 |
| `OpenAi GPT-5 mini` | OpenAI GPT-5 Mini |
| `OpenAi GPT-5 nano` | OpenAI GPT-5 Nano |

```yaml
- config_id: custom_llm
  name: compliance-check
  model: "Gemini 2.5 flash"
  prompt: "Evaluate the following text for regulatory compliance violations. Return PASS if compliant, FAIL if not."
```

Custom LLM guardrails use a separate LLM call for evaluation. This adds latency and cost to each guarded request.

Summary table

| Type | Config ID | Position | Key config fields |
|------|-----------|----------|-------------------|
| Ban list | `ban_list` | Input/Output | `banned_words` |
| Detect PII | `detect_pii` | Input/Output | `pii_entities`, `on_fail` |
| NSFW text | `nsfw_text` | Input/Output | `threshold` |
| Competition check | `competition_check` | Input/Output | `competitors` |
| Bias check | `bias_check` | Input/Output | `threshold` |
| Correct language | `correct_language` | Input/Output | `expected_languages` |
| Gibberish text | `gibberish_text` | Input/Output | `threshold` |
| Toxic language | `toxic_language` | Input/Output | `threshold` |
| Restrict to topic | `restrict_to_topic` | Input/Output | `topics` |
| Detect jailbreak | `detect_jailbreak` | Input | `threshold` |
| Prompt injection | `prompt_injection` | Input | `threshold` |
| RAG hallucination | `rag_hallucination` | Output | `threshold` |
| Code scanner | `code_scanner` | Input/Output | `allowed_languages` |
| Model Armor | `model_armor` | Input/Output | `project_id`, `location`, `template_id` |
| Custom LLM | `custom_llm` | Input/Output | `model`, `prompt` |
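Putting the positions together, a config combining position-specific guards might look like the sketch below, assembled from the per-type examples above (reject messages and thresholds are illustrative):

```yaml
guardrails:
  input:
    - config_id: detect_jailbreak      # input-only guard
      reject_message: "Jailbreak attempt detected"
      threshold: 0.5
    - config_id: ban_list              # usable in either position
      reject_message: "Message contains banned content"
      banned_words: ["banned-word"]
  output:
    - config_id: rag_hallucination     # output-only guard
      reject_message: "Response may contain unsupported claims"
      threshold: 0.7
    - config_id: toxic_language
      reject_message: "Response contains inappropriate language"
      threshold: 0.7
```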
Last modified on March 22, 2026