Content Filtering in Azure OpenAI Services: Ensuring Safe and Appropriate Content

The rapid generation of text and content has opened up new possibilities for innovation and creativity. However, it has also brought forth the need for robust content filtering mechanisms to maintain a safe and appropriate online environment. Content filtering plays a pivotal role in ensuring that the content generated and shared through Azure OpenAI services aligns with ethical standards, legal requirements, and community guidelines.

This article covers what content filtering is, why it is essential, how it works, and how you can configure it within the Azure OpenAI ecosystem. Whether you're building chatbots, content generation tools, or any other application that uses OpenAI's language models, understanding and implementing content filtering is crucial to protecting your users and maintaining a positive online presence.

What is Content Filtering?

Why is it important?

Content Filtering Categories

How does Content Filtering Work?

Configuring Content Filters

Conclusion

What is Content Filtering?

Content filtering in Azure OpenAI Service is a system that uses machine learning to detect and filter harmful content from text and code. The content filtering system works by running both the prompt and completion through an ensemble of classification models. These models are trained on a large dataset of text and code that contains both harmful and non-harmful content. The models are able to identify harmful content based on a variety of factors, including the use of certain words or phrases, the context in which the words are used, and the structure of the text.

The content filtering system supports the following languages: English, German, Japanese, Spanish, French, Italian, Portuguese, and Chinese. It might not be able to detect inappropriate content in languages that it has not been trained or tested to process.

Azure OpenAI Service also performs monitoring to detect content and/or behaviors that suggest the use of the service in a manner that may violate applicable product terms.

Why is it important?

The content filtering system in Azure OpenAI Service is important for a number of reasons:

Protects users from harmful content: Harmful content can include things like hate speech, violence, child sexual abuse content, and self-harm. Exposure to this type of content can have negative consequences for users, including emotional distress, psychological harm, and even physical harm.
Supports compliance with government regulations: Some governments have laws that require businesses to block access to certain types of content. Content filtering in Azure OpenAI Service can help businesses comply with these laws.
Protects the reputation of businesses and organizations: If a business or organization is associated with harmful content, its reputation can suffer, making it harder to attract customers or partners. Content filtering in Azure OpenAI Service can help prevent this.

Content Filtering Categories

The content filtering system covers four categories:

Hate: This category describes language that is used to attack a person or group of people based on their race, ethnicity, nationality, gender identity and expression, sexual orientation, religion, immigration status, ability status, personal appearance, or body size. This includes pejorative or discriminatory language, as well as language that promotes violence or hatred against these groups.
Sexual: This category describes language that is related to sex, sexuality, and the human body. This includes anatomical terms, romantic relationships, and physical sexual acts. It also includes language that is used to objectify or exploit people, or that promotes violence or abuse against them.
Violence: This category describes language that is used to threaten or incite violence against a person or group of people. It also includes language that is used to describe or glorify violence.
Self-harm: This category describes language that is used to promote or encourage self-harm. It also includes language that is used to describe or glorify self-harm.

How does Content Filtering Work?

In Azure OpenAI Service, the content filtering system runs alongside the core models. It is always running and checks both prompts and completions for harmful content. When harmful content is detected, it is blocked and the user is notified.

The content filtering system in Azure OpenAI Service uses a neural multi-class classification system to detect harmful content. This system is trained on a large dataset of text and code that contains both harmful and non-harmful content. The system learns to identify the patterns that are associated with harmful content and uses these patterns to detect harmful content in new requests.
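From the caller's side, a blocked prompt and a filtered completion can surface differently. The sketch below assumes the response shapes the service documented at the time of writing: an HTTP 400 error whose `error.code` is `content_filter` when a prompt is blocked, and a `finish_reason` of `content_filter` on a choice when a completion is filtered. The function name and the sample payloads are hypothetical, and the field names should be checked against the API version you use.

```python
from typing import Any


def describe_outcome(status_code: int, body: dict[str, Any]) -> str:
    """Classify a raw API response as a blocked prompt, a filtered completion, or ok."""
    # Assumption: a blocked prompt comes back as HTTP 400 with error.code == "content_filter".
    if status_code == 400 and body.get("error", {}).get("code") == "content_filter":
        return "prompt_blocked"
    # Assumption: a filtered completion is marked with finish_reason == "content_filter".
    for choice in body.get("choices", []):
        if choice.get("finish_reason") == "content_filter":
            return "completion_filtered"
    return "ok"


# Hypothetical payloads, trimmed to the fields the function reads.
blocked = {"error": {"code": "content_filter", "message": "..."}}
filtered = {"choices": [{"finish_reason": "content_filter"}]}
normal = {"choices": [{"finish_reason": "stop"}]}

print(describe_outcome(400, blocked))
print(describe_outcome(200, filtered))
print(describe_outcome(200, normal))
```

Keeping this check in one place lets an application show a consistent message to the user whichever side of the request was filtered.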

Consider the below image:

The diagram shows the following steps:

The user sends a request to the Azure OpenAI Service.
The Azure OpenAI Service receives the request and passes it to the content filtering system.
The content filtering system analyzes the request and determines if it contains harmful content.
If the content filtering system determines that the request contains harmful content, it blocks the request and returns an error message to the user.
If the content filtering system determines that the request does not contain harmful content, it passes the request to the core models.
The core models generate the response to the request and return it to the user.
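The steps above can be sketched as a small pipeline. This is a toy illustration, not the service's implementation: `toy_classifier` stands in for the classification models with a simple keyword check, `toy_model` stands in for the core model, and every name here is hypothetical. Because the filtering system runs on both prompts and completions, the sketch also checks the generated text before returning it.

```python
from typing import Callable

BLOCKED_MESSAGE = "Request blocked by content filter."


def toy_classifier(text: str) -> bool:
    """Hypothetical stand-in for the classification models: True if text looks harmful."""
    return "harmful" in text.lower()


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for the core model."""
    return f"Response to: {prompt}"


def handle_request(
    prompt: str,
    is_harmful: Callable[[str], bool] = toy_classifier,
    generate: Callable[[str], str] = toy_model,
) -> str:
    # Analyze the prompt before it reaches the core model.
    if is_harmful(prompt):
        return BLOCKED_MESSAGE
    # The prompt passed the check, so the core model generates a completion.
    completion = generate(prompt)
    # The completion is checked as well before it is returned to the user.
    if is_harmful(completion):
        return BLOCKED_MESSAGE
    return completion


print(handle_request("Tell me about the weather"))
print(handle_request("Say something harmful"))
```

Passing the classifier and the model as parameters keeps the flow itself easy to test with different stand-ins.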

Configuring Content Filters

Content filters in Azure OpenAI Service can be configured at the resource level. This means that you can create a custom content filtering configuration for your resource and then assign it to one or more deployments.

To set up a customized content filtering configuration for your resource, follow these steps:

STEP 1: Go to Azure OpenAI Studio and navigate to the Content Filters tab.

STEP 2: Select "Create new customized content filtering configuration".

STEP 3: In the configuration view, give your custom content filtering configuration a name.

For each of the four content categories (hate, sexual, violence, and self-harm), you can modify the content filtering severity level for both prompts and completions. The severity levels are low, medium, and high.

By default, content is filtered at the medium and high severity levels for all categories. However, you can modify this setting to filter at more or fewer severity levels, depending on your needs.
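One way to read these settings is as a threshold per category: content is blocked when its detected severity is at or above the configured level. The sketch below models that rule only. The `Severity` enum (including the `SAFE` level for content where nothing is detected), the `DEFAULT_CONFIG` dictionary, and the `is_blocked` function are hypothetical names that mirror the behavior described above, not a real SDK.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    SAFE = 0  # assumed level for content where nothing harmful is detected
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Default described above: filter medium and high severity for every category.
# A value of None models a category whose filtering is turned off.
DEFAULT_CONFIG: dict[str, Optional[Severity]] = {
    "hate": Severity.MEDIUM,
    "sexual": Severity.MEDIUM,
    "violence": Severity.MEDIUM,
    "self_harm": Severity.MEDIUM,
}


def is_blocked(
    category: str,
    detected: Severity,
    config: dict[str, Optional[Severity]] = DEFAULT_CONFIG,
) -> bool:
    """Return True if content at the detected severity is blocked for this category."""
    threshold = config[category]
    if threshold is None:
        return False
    return detected >= threshold


# A stricter, hypothetical configuration that also filters low severity hate content.
strict_config = dict(DEFAULT_CONFIG, hate=Severity.LOW)

print(is_blocked("hate", Severity.LOW))                 # False under the default
print(is_blocked("hate", Severity.LOW, strict_config))  # True under the stricter config
print(is_blocked("violence", Severity.HIGH))            # True under the default
```

Lowering a category's threshold filters more content, and raising it filters less, which matches the "more or fewer severity levels" wording above.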

Scenario 1:

Consider the image below, where the configuration turns off filtering for the self-harm and violence categories in both prompts and completions. It is also set to the low severity level for both prompts and completions.

The content filtering system will still attempt to detect harmful content in the prompts and completions, but it will not prevent the content from being returned.

For example, if you provide the prompt "Write a poem about suicide," the content filtering system will still detect that the prompt contains self-harm content. However, the system will not prevent the prompt from being processed and the completion from being returned.

If you are concerned about the possibility of harmful content being returned, you can use the content filtering system in conjunction with other measures, such as human review, to help ensure that only safe and appropriate content is returned.
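When filtering is off but detection still runs, the detected severities can be used to route responses to human review. The sketch below assumes that each choice in a response carries a `content_filter_results` object with a `filtered` flag and a `severity` string per category, which is how the service annotated responses at the time of writing. The sample payload, the function name, and the review rule are hypothetical.

```python
from typing import Any

# Assumed ordering of the severity strings used in annotations.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]


def needs_human_review(choice: dict[str, Any], review_at: str = "medium") -> list[str]:
    """Return the categories whose annotated severity is at or above review_at."""
    limit = SEVERITY_ORDER.index(review_at)
    flagged = []
    for category, result in choice.get("content_filter_results", {}).items():
        severity = result.get("severity", "safe")
        if SEVERITY_ORDER.index(severity) >= limit:
            flagged.append(category)
    return sorted(flagged)


# Hypothetical annotated choice: nothing was filtered, but self-harm was detected.
sample_choice = {
    "finish_reason": "stop",
    "content_filter_results": {
        "hate": {"filtered": False, "severity": "safe"},
        "self_harm": {"filtered": False, "severity": "medium"},
        "sexual": {"filtered": False, "severity": "safe"},
        "violence": {"filtered": False, "severity": "low"},
    },
}

print(needs_human_review(sample_choice))  # ['self_harm']
```

An application could hold any response with a non-empty result for a reviewer instead of returning it directly.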

Scenario 2:

Consider the below image where the configuration is set to filter out low, medium, and high severity content for the hate and sexual categories in both prompts and completions. It is also set to filter out medium and high severity content for the self-harm category in both prompts and completions.

Here is an example of a prompt and completion that would be filtered out by the content filtering configuration in the image:

Prompt: Write a poem about how much I hate my parents.

Completion: I hate my parents. They are always yelling at me and never let me do anything I want. I wish they would just leave me alone.

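With this configuration, the prompt and completion would be filtered out if the classification models flag them as hate content at any severity level, because the hate category is set to filter low, medium, and high severity content.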

Here is an example of a prompt and completion that would not be filtered out by the content filtering configuration in the image:

Prompt: Write a poem about love.

Completion: Love is a many splendored thing. It's the April rose that only grows in the early spring. Love is nature's way of giving us hope.

This prompt and completion would not be filtered out because it does not contain any harmful content.

Once you have created your custom content filtering configuration, you can assign it to one or more deployments. To do this, follow these steps:

STEP 1: Go to the Deployments tab.

STEP 2: Click the name of the deployment that you want to assign the content filtering configuration to.

STEP 3: In the Advanced options section, select the Content filter dropdown menu and choose the custom content filtering configuration that you want to assign.

STEP 4: Click Save and close.

You can also edit and delete a content filtering configuration if required. To do this, follow these steps:

STEP 1: Go to the Content Filters tab.

STEP 2: Click the name of the content filtering configuration that you want to edit or delete.

STEP 3: Click the Edit or Delete button.

Conclusion

In our exploration of content filtering in Azure OpenAI services, we've uncovered its vital role in fostering a responsible digital environment. Content filtering strikes the balance between innovation and responsibility, ensuring AI-powered content generation adheres to ethical standards and community guidelines.

As you work with Azure OpenAI services, remember that content filtering is more than a compliance requirement. It is a commitment to responsible content generation that protects the people who use your applications.