How Data is Processes in Azure OpenAI Service

The Tech Platform
Jun 23, 2023
5 min read

The Azure OpenAI Service is a robust platform that processes various types of data to provide intelligent and dynamic responses. Understanding how data is processed within the service is essential to harness its capabilities effectively. In this article, we will explore the types of data processed by Azure OpenAI Service and the different methods employed for their processing. By gaining insights into the data processing workflow, users can use the service's potential to its fullest extent.

What kind of data is processed by Azure OpenAI Service?

The Azure OpenAI Service processes various types of data, including:

Prompts and Completions: Users submit prompts to the service, and the service generates completions as output. This interaction occurs through operations like /completions and /chat/completions. Prompts serve as input to the model, and completions are the generated responses.
Training & Validation Data: Users have the option to provide their own training data to fine-tune an OpenAI model. This training data typically consists of prompt-completion pairs, where prompts represent the input and completions are the desired output. By training the model on specific data, it can be customized for specific tasks or domains.
Results Data from Training Process: When training a fine-tuned model, the service provides metadata on the training job. This metadata includes information such as the number of tokens processed during training and validation scores at each step. These results data help evaluate the training progress and performance of the fine-tuned model.

How the Data is Processed?

The diagram below illustrates the data processing in Azure OpenAI Service, encompassing three key processes:

1. Fine-tuning Model Creation:

Your training data is utilized to create a fine-tuned or custom model in Azure OpenAI Service.
This process involves incorporating your specific training data to enhance the model's performance and tailor it to your specific requirements.

2. Text Prompts Processing for Completions and Embeddings:

The Azure OpenAI Service processes the text prompts you to provide to generate completions and embedding results.
Prompt input serves as a starting point, and the service generates completions as responses based on the provided prompts.
Additionally, the service generates embeddings, which are numerical representations of the prompts, capturing their contextual information.

3. Analysis for Abuse, Misuse, and Debugging:

Both the Azure OpenAI Service and Microsoft personnel analyze prompts and completions for various purposes.
This analysis aims to identify and prevent abuse, misuse, or the generation of harmful content.
It facilitates debugging in case of any failures or issues encountered during the processing.

Training data for fine-tuning:

When you provide training data (prompt-completion pairs) through the Fine-tunes API, it goes through quality checks and is imported to the model training component on the Azure OpenAI platform. The training data is used to modify the weights of your specific fine-tuned model. It's important to note that your training data is only used for your own model and not for training or improving any Microsoft models.

Text prompts for completions and embeddings:

Once your model is deployed in your Azure OpenAI resource, you can submit text prompts to the model using the Completions or Embeddings operations. The model generates text completions or embeddings based on your prompts, and the results are returned through the API. During this process, your data is processed through content filters to identify and filter potentially harmful content. No prompts or completions are stored in the model, and they are not used to train or improve the models.

Preventing Abuse and harmful content:

The Azure OpenAI Service has a system in place to manage content and ensure harmful content is filtered out. When you provide input prompts and receive generated completions, they undergo evaluation by classification models to detect any misuse or harmful content. If any harmful content is identified, you may receive an error message or a filtered response.

It's important to know that your prompts and completions are not stored or used to train or improve the classification models. They are only retained for a limited period of up to 30 days to monitor for any content or behaviors that may violate the product terms. Microsoft employees may review this data, but it is done through automated systems to investigate and confirm potential abuse.

If a policy violation is confirmed, you will be notified to take immediate action to address the issue and prevent further abuse. Failure to address the issue may lead to the suspension or termination of your access to Azure OpenAI resources.

Customers have the option to request modifications to the content filtering and abuse monitoring by submitting a form. If your request is approved and meets the requirements, your prompts and completions will not be stored.

Privacy and Security Concerns (Abuse Monitoring)

Some customers may have sensitive or highly confidential data that they want to process using the Azure OpenAI Service. They may believe that the risk of harmful outputs or misuse is low, and they prefer not to allow Microsoft to process their data for abuse detection purposes. This could be due to their internal policies or legal regulations.

To address these concerns, Microsoft offers the option for eligible customers to apply for modifications to the content management features of Azure OpenAI. These customers must meet specific criteria and attest to their use cases.

If a customer's request to modify abuse monitoring is approved by Microsoft, then no prompts and completions associated with their Azure subscription will be stored. This means that the data will not be stored in the Service Results Store, and there won't be any human review process performed.

By allowing customers to modify abuse monitoring, Microsoft respects their need for enhanced data protection and ensures that sensitive information is not stored or reviewed as part of the service.

How the Data is Retained?

The data retention process for different types of data in the Azure OpenAI Service can be summarized as follows:

1. Training, validation, and training results in data:

When customers upload their training data using the Files API, it is stored in Azure Storage. The data is encrypted at rest using Microsoft Managed keys. It remains within the same region as the resource and is logically isolated within the customer's Azure subscription and API Credentials. Customers have the option to delete the uploaded files using the DELETE API operation if they wish.

2. Fine-tuned OpenAI models:

Customers can create their own fine-tuned versions of OpenAI models by uploading their training data via the Files API. The trained fine-tuned models are stored in Azure Storage within the same region. They are encrypted at rest and logically isolated within the customer's Azure subscription and API credentials. If customers want to remove their fine-tuned models, they can do so by calling the DELETE API operation.

3. Prompts and completions data:

The prompts and completions data may be temporarily stored by the Azure OpenAI Service in the same region as the resource for a maximum of 30 days. This data is encrypted and can only be accessed by authorized Microsoft employees for specific purposes. These purposes include debugging in case of failures and investigating patterns of abuse and misuse to ensure compliance with product terms. However, if a customer is approved for modified abuse monitoring, their prompts and completions data is not stored, and Microsoft employees do not have access to it.

Customer Controls in Azure OpenAI Service

Customers using the Azure OpenAI Service have certain controls over their data, including:

1. Upload and deletion control:

Customers can upload their training data using the Files API and delete uploaded files if needed through the DELETE API operation. This provides customers with control over the data they contribute to the training process.

2. Fine-tuned model control:

Customers can create their own fine-tuned versions of OpenAI models based on their training data. They can also delete their fine-tuned models by calling the DELETE API operation. This control allows customers to manage their specific models as per their requirements.

3. Opting out of data storage:

Customers who are approved for modified abuse monitoring have the option to opt out of data storage. By choosing this option, their prompts and completions data are not stored, and Microsoft employees do not have access to it.

Conclusion

The Azure OpenAI Service processes various types of data, including prompts and completions, training data, and results data. It employs methods such as generating completions and embeddings, abuse detection, and content filtering. By understanding these data processing mechanisms, users can leverage the service effectively to build intelligent applications and ensure a safe user experience.