# OpenAI Moderation

## Overview
| Property | Details |
|---|---|
| Description | Use OpenAI's built-in Moderation API to detect and block harmful content, including hate speech, harassment, self-harm, sexual content, and violence. |
| Provider | OpenAI Moderation API |
| Supported Actions | BLOCK (raises an HTTP 400 exception when violations are detected) |
| Supported Modes | `pre_call`, `during_call`, `post_call` |
| Streaming Support | ✅ Full support for streaming responses |
| API Requirements | OpenAI API key |
## Quick Start

### 1. Define Guardrails on your LiteLLM config.yaml

Define your guardrails under the `guardrails` section:
```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "openai-moderation-pre"
    litellm_params:
      guardrail: openai_moderation
      mode: "pre_call"
      api_key: os.environ/OPENAI_API_KEY  # Optional if already set globally
      model: "omni-moderation-latest"     # Optional, defaults to omni-moderation-latest
      api_base: "https://api.openai.com/v1"  # Optional, defaults to OpenAI API
```
### Supported values for `mode`

- `pre_call`: Run before the LLM call, on user input.
- `during_call`: Run during the LLM call, on user input. Same as `pre_call`, but runs in parallel with the LLM call; the response is not returned until the guardrail check completes.
- `post_call`: Run after the LLM call, on the LLM response.
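For example, running the same check in parallel with the LLM call instead of before it only changes the `mode` value (an illustrative variant of the earlier config; the guardrail name here is arbitrary):

```yaml
guardrails:
  - guardrail_name: "openai-moderation-parallel"
    litellm_params:
      guardrail: openai_moderation
      mode: "during_call"  # moderation runs concurrently with the LLM call
      api_key: os.environ/OPENAI_API_KEY
```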
### Supported OpenAI Moderation Models

- `omni-moderation-latest` (default): Latest multimodal moderation model
- `text-moderation-latest`: Latest text-only moderation model

Set your OpenAI API key:

```shell
export OPENAI_API_KEY="your-openai-api-key"
```
### 2. Start LiteLLM Gateway

```shell
litellm --config config.yaml --detailed_debug
```
### 3. Test request

#### Blocked Request

Expect this request to fail, since it contains harmful content:
```shell
curl -i http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "I hate all people and want to hurt them"}
    ],
    "guardrails": ["openai-moderation-pre"]
  }'
```
Expected response on failure:
```json
{
  "error": {
    "message": {
      "error": "Violated OpenAI moderation policy",
      "moderation_result": {
        "violated_categories": ["hate", "violence"],
        "category_scores": {
          "hate": 0.95,
          "violence": 0.87,
          "harassment": 0.12,
          "self-harm": 0.01,
          "sexual": 0.02
        }
      }
    },
    "type": "None",
    "param": "None",
    "code": "400"
  }
}
```
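A client can parse this body to surface which categories triggered the block. A minimal Python sketch (the payload below simply mirrors the example response above; the field layout is taken from that example):

```python
import json

# Error body returned by the gateway when the guardrail blocks a request
# (copied from the example above).
error_body = json.loads("""
{
  "error": {
    "message": {
      "error": "Violated OpenAI moderation policy",
      "moderation_result": {
        "violated_categories": ["hate", "violence"],
        "category_scores": {"hate": 0.95, "violence": 0.87, "harassment": 0.12}
      }
    },
    "type": "None",
    "param": "None",
    "code": "400"
  }
}
""")

def violated_categories(body: dict) -> list[str]:
    """Extract violated category names, highest confidence score first."""
    result = body["error"]["message"]["moderation_result"]
    scores = result["category_scores"]
    return sorted(result["violated_categories"], key=lambda c: -scores.get(c, 0.0))

print(violated_categories(error_body))  # ['hate', 'violence']
```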
#### Successful Call

```shell
curl -i http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "guardrails": ["openai-moderation-pre"]
  }'
```
Expected response:
```json
{
  "id": "chatcmpl-4a1c1a4a-3e1d-4fa4-ae25-7ebe84c9a9a2",
  "created": 1741082354,
  "model": "gpt-4",
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The capital of France is Paris.",
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "completion_tokens": 8,
    "prompt_tokens": 13,
    "total_tokens": 21
  }
}
```
## Advanced Configuration

### Multiple Guardrails for Input and Output

You can configure separate guardrails for user input and for LLM responses:
```yaml
guardrails:
  - guardrail_name: "openai-moderation-input"
    litellm_params:
      guardrail: openai_moderation
      mode: "pre_call"
      api_key: os.environ/OPENAI_API_KEY

  - guardrail_name: "openai-moderation-output"
    litellm_params:
      guardrail: openai_moderation
      mode: "post_call"
      api_key: os.environ/OPENAI_API_KEY
```
### Custom API Configuration

Configure a custom OpenAI-compatible API endpoint or a different moderation model:
```yaml
guardrails:
  - guardrail_name: "openai-moderation-custom"
    litellm_params:
      guardrail: openai_moderation
      mode: "pre_call"
      api_key: os.environ/OPENAI_API_KEY
      api_base: "https://your-custom-openai-endpoint.com/v1"
      model: "text-moderation-latest"
```
## Streaming Support

The OpenAI Moderation guardrail fully supports streaming responses. When used in `post_call` mode, it will:

- Collect all streaming chunks
- Assemble the complete response
- Apply moderation to the full content
- Block the entire stream if violations are detected
- Return the original stream if the content is safe
```yaml
guardrails:
  - guardrail_name: "openai-moderation-streaming"
    litellm_params:
      guardrail: openai_moderation
      mode: "post_call"  # Works with streaming responses
      api_key: os.environ/OPENAI_API_KEY
```
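The buffer-then-moderate behavior described above can be sketched as a generator wrapper. This is an illustrative Python sketch, not LiteLLM's actual implementation; the `moderate` callback stands in for the call to the Moderation API:

```python
from typing import Callable, Iterable, Iterator

def moderated_stream(
    chunks: Iterable[str],
    moderate: Callable[[str], bool],  # returns True if the text violates policy
) -> Iterator[str]:
    """Buffer a streaming response, moderate the assembled text, then
    either replay the original chunks or block the whole stream."""
    buffered = list(chunks)        # 1. collect all streaming chunks
    full_text = "".join(buffered)  # 2. assemble the complete response
    if moderate(full_text):        # 3. apply moderation to the full content
        # 4. block the entire stream (surfaces as HTTP 400 at the gateway)
        raise RuntimeError("Violated OpenAI moderation policy")
    yield from buffered            # 5. return the original stream if safe

# Example with a stub moderation check that never flags:
safe = moderated_stream(["The capital ", "of France ", "is Paris."], lambda t: False)
print("".join(safe))  # The capital of France is Paris.
```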
## Content Categories

The OpenAI Moderation API detects the following categories of harmful content:
| Category | Description | 
|---|---|
| hate | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste | 
| harassment | Content that harasses, bullies, or intimidates an individual | 
| self-harm | Content that promotes, encourages, or depicts acts of self-harm | 
| sexual | Content meant to arouse sexual excitement or promote sexual services | 
| violence | Content that depicts death, violence, or physical injury | 
Each category is evaluated with both a boolean flag and a confidence score (0.0 to 1.0).
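A caller inspecting raw moderation results can combine the two signals, for instance to only act on high-confidence flags. An illustrative sketch (the dict shapes follow the flag/score pairing described above; the threshold value is an arbitrary choice for the example):

```python
def high_confidence_flags(categories: dict[str, bool],
                          category_scores: dict[str, float],
                          threshold: float = 0.5) -> list[str]:
    """Return categories that are both flagged and above a confidence threshold."""
    return [name for name, flagged in categories.items()
            if flagged and category_scores.get(name, 0.0) >= threshold]

print(high_confidence_flags(
    {"hate": True, "violence": True, "harassment": False},
    {"hate": 0.95, "violence": 0.87, "harassment": 0.12},
))  # ['hate', 'violence']
```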
## Error Handling

When content violates OpenAI's moderation policy:

- HTTP Status: 400 Bad Request
- Error Type: HTTPException
- Error Details: Includes the violated categories and their confidence scores
- Behavior: The request is blocked immediately
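The BLOCK behavior amounts to a simple check over the moderation result. A minimal sketch, not LiteLLM's internal code (in the gateway the failure surfaces as an HTTP 400 rather than a plain `ValueError`):

```python
def enforce_moderation(moderation_result: dict) -> None:
    """Raise when any category is flagged, mirroring the BLOCK action above.

    `moderation_result` follows the flag/score shape described earlier:
    {"categories": {name: bool, ...}, "category_scores": {name: float, ...}}
    """
    violated = [name for name, flagged in moderation_result["categories"].items() if flagged]
    if violated:
        # The gateway raises an HTTPException with status code 400 here.
        raise ValueError(f"Violated OpenAI moderation policy: {violated}")

# Safe content passes through silently:
enforce_moderation({"categories": {"hate": False, "violence": False},
                    "category_scores": {"hate": 0.01, "violence": 0.0}})
```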
## Best Practices

### 1. Use Pre-call for User Input

```yaml
guardrails:
  - guardrail_name: "input-moderation"
    litellm_params:
      guardrail: openai_moderation
      mode: "pre_call"  # Block harmful user inputs early
```
### 2. Use Post-call for LLM Responses

```yaml
guardrails:
  - guardrail_name: "output-moderation"
    litellm_params:
      guardrail: openai_moderation
      mode: "post_call"  # Ensure LLM responses are safe
```
### 3. Combine with Other Guardrails

```yaml
guardrails:
  - guardrail_name: "openai-moderation"
    litellm_params:
      guardrail: openai_moderation
      mode: "pre_call"

  - guardrail_name: "custom-pii-detection"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"
```
## Troubleshooting

### Common Issues
- **Invalid API Key**: Ensure your OpenAI API key is set correctly: `export OPENAI_API_KEY="sk-your-actual-key"`
- **Rate Limiting**: The OpenAI Moderation API has rate limits; monitor usage in high-volume scenarios.
- **Network Issues**: Verify connectivity to OpenAI's API endpoints.
### Debug Mode

Enable detailed logging to troubleshoot issues:

```shell
litellm --config config.yaml --detailed_debug
```

Look for logs starting with `OpenAI Moderation:` to trace guardrail execution.
## API Costs
The OpenAI Moderation API is free to use for content policy compliance. This makes it a cost-effective guardrail option compared to other commercial moderation services.
## Need Help?

For additional support:
- Check the OpenAI Moderation API documentation
- Review LiteLLM Guardrails documentation
- Join our Discord community