Understanding OpenAI's Moderation API

OpenAI’s Moderation API checks whether content complies with OpenAI’s usage policies. It helps developers identify content that violates those policies and take appropriate action, such as filtering it. The moderation endpoint is free to use when monitoring the inputs and outputs of OpenAI APIs, and it classifies content into categories that include hate, harassment, self-harm, sexual content, and violence.

Categories of moderation

The Moderation API classifies content into the following categories:

  • Hate: Content that expresses or promotes hate based on race, gender, ethnicity, religion, etc.

  • Harassment: Content that incites or promotes harassing language.

  • Self-harm: Content that promotes or depicts acts of self-harm.

  • Sexual content: Content meant to arouse sexual excitement.

  • Violence: Content that depicts death, violence, or physical injury.

Each category may have subcategories that provide more specific classifications, such as threatening (hate/threatening), graphic (violence/graphic), or involving minors (sexual/minors).
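Subcategory names are written with a slash after their parent category, for example hate/threatening. The following is a minimal sketch of grouping such names by parent category. The helper group_by_parent and the sample list are hypothetical and exist only for illustration; the list is not the API's full set of categories.

```python
def group_by_parent(category_names):
    """Group names such as "hate/threatening" under their top-level category."""
    groups = {}
    for name in category_names:
        # "hate/threatening" -> parent "hate", sub "threatening";
        # a name without "/" yields an empty sub.
        parent, _, sub = name.partition("/")
        subs = groups.setdefault(parent, [])
        if sub:
            subs.append(sub)
    return groups


# Illustrative subset of category names, not an exhaustive list.
names = ["hate", "hate/threatening", "sexual", "sexual/minors",
         "violence", "violence/graphic"]
print(group_by_parent(names))
```

Grouping this way makes it straightforward to apply one policy per top-level category while still logging the more specific subcategory.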

Quickstart guide

To obtain a classification for a piece of text, a request can be made to the moderation endpoint. Here’s an example of how to do this using Python:

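import os

import openai

# This call uses the legacy (pre-1.0) interface of the openai Python package.
# The API key is read from the SECRET_KEY environment variable.
openai.api_key = os.environ["SECRET_KEY"]

response = openai.Moderation.create(
    input="Input text goes here"
)
print(response)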

Note: This code runs only after you provide your OpenAI API key, which it reads from the SECRET_KEY environment variable.

The response includes fields such as flagged, which indicates whether the content violates OpenAI’s policies, and categories, which contains a dictionary of per-category binary flags for usage-policy violations.
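To act on these fields, a program can read flagged first and then list the categories whose flag is set. The sketch below assumes a single result has already been extracted as a plain dictionary (in the API response, per-input results sit inside a results list). The helper flagged_categories and the sample data are hypothetical; the sample is hand-written for illustration and is not real API output.

```python
def flagged_categories(result):
    """Return the sorted names of categories whose flag is True."""
    return sorted(name for name, hit in result["categories"].items() if hit)


# Hypothetical sample, hand-written for illustration; not real API output.
sample_result = {
    "flagged": True,
    "categories": {"hate": False, "harassment": True, "violence": True},
}

if sample_result["flagged"]:
    # Report which categories triggered the flag before filtering the content.
    print("Blocked:", ", ".join(flagged_categories(sample_result)))
```

Checking flagged before inspecting individual categories keeps the common case (unflagged content) cheap and leaves per-category handling for the content that needs it.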

Working with different languages

While the Moderation API is continuously improving, support for non-English languages may be limited. For higher accuracy, it’s recommended to split long pieces of text into smaller chunks, each less than 2,000 characters.
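One way to follow the chunking advice is to split text on whitespace so that each chunk stays under 2,000 characters. This is a minimal sketch, not a prescribed method: the helper split_into_chunks and its max_chars parameter are hypothetical, and the approach assumes that collapsing runs of whitespace (including newlines) into single spaces is acceptable.

```python
def split_into_chunks(text, max_chars=1999):
    """Split text on whitespace into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding the word would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


long_text = "moderation " * 1000  # about 11,000 characters
for chunk in split_into_chunks(long_text):
    assert len(chunk) < 2000
```

Each chunk can then be sent to the moderation endpoint separately, and the content treated as flagged if any chunk is flagged.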

Example with Go

Here’s an example using Go, demonstrating how to format the results as a JSON string:

package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"os"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// Read the API key from the SECRET_KEY environment variable.
	c := openai.NewClient(os.Getenv("SECRET_KEY"))
	ctx := context.Background()
	req := openai.ModerationRequest{
		Input: "Educative is a website for developers",
	}
	resp, err := c.Moderations(ctx, req)
	if err != nil {
		panic(err)
	}
	// Serialize the response, then re-indent it with tabs for readability.
	b, err := json.Marshal(resp)
	if err != nil {
		panic(err)
	}
	indented := bytes.NewBuffer([]byte{})
	err = json.Indent(indented, b, "", "\t")
	if err != nil {
		panic(err)
	}
	fmt.Println(indented.String())
}

Code explanation

  • Lines 3–11: Import the necessary packages, including os for reading environment variables and OpenAI's client library.

  • Line 15: Initialize a new OpenAI client using an API key from the environment variables.

  • Line 16: Create a background context for the API call.

  • Lines 17–19: Define a moderation request with a specific input string.

  • Lines 20–23: Make a moderation request to OpenAI, capturing the response or error.

  • Lines 25–28: Serialize the response to JSON, handling any errors.

  • Line 29: Create a buffer to hold the indented JSON string.

  • Line 30: Indent the JSON using tab characters for readability.

  • Line 34: Print the indented JSON string to the standard output.

Conclusion

OpenAI’s Moderation API is a valuable asset for developers who must ensure that content complies with OpenAI’s usage policies. By understanding the categories and implementing the API in various programming languages, developers can effectively filter and moderate content according to the guidelines.

Copyright ©2024 Educative, Inc. All rights reserved