Security Guardrail

The Security Guardrail tab configures detection of jailbreaks, prompt injection, and similar attacks against your model.

Figure: Project Settings, Security Guardrail tab, showing the Apply to input and Apply to output toggles and the Sensitivity setting.
  • Apply to input / Apply to output: Turn on the security guardrail for user messages (input), assistant messages (output), or both. Messages in an enabled direction are checked, and risky content can be blocked according to your settings.
  • Sensitivity: Ranges from very permissive, where only the highest-risk content is blocked, to very restrictive, where lower-risk content is blocked as well.
  • Security violation message: The text your agent returns when this guardrail blocks a request, if you have configured one.
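To show how the three settings interact, here is a minimal Python sketch. It is not the product's implementation: the five sensitivity level names, the numeric risk score (assumed to come from some detector and to lie between 0.0 and 1.0), the cutoff values, the fallback text, and the names `SecurityGuardrailSettings` and `apply_guardrail` are all hypothetical. The sketch only illustrates that a direction which is switched off is never checked, and that stricter sensitivity means a lower score is enough to block.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class Sensitivity(IntEnum):
    # Hypothetical five-level scale mirroring the "very permissive" to
    # "very restrictive" range; the level names are assumed.
    VERY_PERMISSIVE = 0
    PERMISSIVE = 1
    BALANCED = 2
    RESTRICTIVE = 3
    VERY_RESTRICTIVE = 4


# Hypothetical cutoffs on a 0.0-1.0 risk score: content is blocked when its
# score is at or above the cutoff, so stricter levels use lower cutoffs.
BLOCK_THRESHOLD = {
    Sensitivity.VERY_PERMISSIVE: 0.95,
    Sensitivity.PERMISSIVE: 0.85,
    Sensitivity.BALANCED: 0.7,
    Sensitivity.RESTRICTIVE: 0.5,
    Sensitivity.VERY_RESTRICTIVE: 0.3,
}

# Hypothetical fallback used when no security violation message is configured.
DEFAULT_BLOCK_TEXT = "This request was blocked by the security guardrail."


@dataclass
class SecurityGuardrailSettings:
    apply_to_input: bool = False   # check user messages
    apply_to_output: bool = False  # check assistant messages
    sensitivity: Sensitivity = Sensitivity.BALANCED
    violation_message: Optional[str] = None


def apply_guardrail(
    settings: SecurityGuardrailSettings, role: str, text: str, risk_score: float
) -> Tuple[bool, str]:
    """Return (blocked, text_to_use) for a single message."""
    if role == "user":
        enabled = settings.apply_to_input
    elif role == "assistant":
        enabled = settings.apply_to_output
    else:
        raise ValueError(f"unsupported role: {role!r}")

    # A direction that is switched off passes through unchecked.
    if not enabled:
        return False, text

    # Block when the score reaches the cutoff for the chosen sensitivity,
    # returning the configured violation message (or the fallback).
    if risk_score >= BLOCK_THRESHOLD[settings.sensitivity]:
        return True, settings.violation_message or DEFAULT_BLOCK_TEXT
    return False, text
```

For example, with `SecurityGuardrailSettings(apply_to_input=True, sensitivity=Sensitivity.VERY_RESTRICTIVE, violation_message="Sorry, I can't help with that.")`, a user message scored 0.4 returns `(True, "Sorry, I can't help with that.")` because 0.4 is above the 0.3 cutoff, while an assistant message passes through unchanged because output checking is off.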

When the security guardrail is enabled for a direction (input or output), its results for that direction appear in monitoring and in public API responses, in the security_guardrail_results field.
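A client can read these results from a response body. The Python sketch below assumes the body is a JSON object with security_guardrail_results at its top level; that position, and the helper name `get_security_results`, are assumptions. The field's inner structure is not described here, so the value is returned as-is rather than interpreted.

```python
import json
from typing import Any, Optional


def get_security_results(raw_body: str) -> Optional[Any]:
    """Return the security_guardrail_results value from a JSON response body.

    Assumes (hypothetically) that the field sits at the top level of a JSON
    object. The value is returned unchanged because its structure is not
    specified. Returns None when the field is absent, for example because
    the guardrail is not enabled for that direction.
    """
    body = json.loads(raw_body)
    if not isinstance(body, dict):
        return None
    return body.get("security_guardrail_results")
```

Treating a missing field as None, rather than raising an error, matches the behavior described above: the results are only present when the guardrail is enabled for that direction.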