Azure Text Embeddings: A New Asynchronous Method for Generating Embeddings

Azure Text Embeddings is a cloud-based service that provides a variety of methods for generating embeddings for text data. Embeddings are a way of representing text data as vectors of numbers. This makes it possible to use machine learning algorithms to perform tasks such as text classification, sentiment analysis, and natural language search.
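To make the idea of "text as vectors of numbers" concrete, here is a minimal sketch of how two embeddings can be compared with cosine similarity. The three-dimensional vectors are invented for illustration and are not output from any Azure service; real embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, made up for this example (assumption, not real embeddings).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Texts with related meanings should have vectors that point in similar directions.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

A natural language search can be built on the same comparison: embed the query, then rank stored texts by their similarity to it.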

Azure Text Embeddings is available in two versions:

Synchronous version
Asynchronous version

Synchronous method

The synchronous version generates embeddings for a list of texts one after another in a single thread. This means that the total time is roughly the sum of the times needed for the individual texts, and the caller waits until the whole list has been processed.

The advantage of the synchronous method is that it is simpler to code and understand. The disadvantage is that it can be inefficient for applications that need to generate embeddings for a large number of texts.

Asynchronous method

The asynchronous version generates embeddings for a list of texts concurrently. This means that the requests for different texts can be in flight at the same time, so the total time is closer to that of the slowest text than to the sum over all texts.

The advantage of the asynchronous method is that it is much more efficient for applications that need to generate embeddings for a large number of texts. The disadvantage is that it is more complex to code and understand.

The following table summarizes the advantages and disadvantages of the synchronous and asynchronous methods:

Method	Advantages	Disadvantages
Synchronous	Simpler to code and understand	Inefficient for large number of texts
Asynchronous	More efficient for large number of texts	More complex to code and understand
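The trade-off in the table can be seen with a small timing sketch. It does not call any Azure service: fake_embed is a hypothetical stand-in that only simulates request latency with asyncio.sleep, and the returned "embedding" is a dummy value.

```python
import asyncio
import time
from typing import List

async def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for one embedding request (simulated latency)."""
    await asyncio.sleep(0.1)
    return [float(len(text))]  # dummy value, not a real embedding

async def embed_sequentially(texts: List[str]) -> List[List[float]]:
    # One request at a time: total time is about the sum of all latencies.
    return [await fake_embed(t) for t in texts]

async def embed_concurrently(texts: List[str]) -> List[List[float]]:
    # All requests in flight together: total time is about the slowest one.
    return list(await asyncio.gather(*(fake_embed(t) for t in texts)))

async def timed(coro):
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

texts = ["This is a text", "This is another text", "A third text"]
seq_result, seq_seconds = asyncio.run(timed(embed_sequentially(texts)))
con_result, con_seconds = asyncio.run(timed(embed_concurrently(texts)))
print(f"sequential: {seq_seconds:.2f}s, concurrent: {con_seconds:.2f}s")
```

Both versions return the same results in the same order; only the waiting time differs.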

Old Method to Generate Embeddings for a list of texts (Synchronous Method)

The generate_embeddings() method takes a list of texts as input and returns a list of embeddings. The embeddings are represented as vectors of numbers.

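from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# TextAnalyticsClient takes an endpoint and a credential.
# Both values below are placeholders; use the ones for your own resource.
client = TextAnalyticsClient(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("my-api-key"),
)

texts = ["This is a text", "This is another text"]

# NOTE: generate_embeddings() is the method name used in this article.
# Check the SDK reference for the client you install before relying on it.
embeddings = client.generate_embeddings(texts)

for embedding in embeddings:
    print(embedding)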

This code first imports the TextAnalyticsClient class from the azure.ai.textanalytics module and creates a client object, which is used to connect to the Azure Text Analytics service. It then requests the embeddings for a list of texts: the texts variable is a list of strings, and the embeddings variable is a list of embeddings, one vector of numbers per text. Finally, the code prints the embeddings.

The old method has the following problems:

It is inefficient for a large number of texts. This is because the texts are processed one after another, so the total time grows with the length of the list.
It can block the calling thread. This means that the calling thread will not be able to do anything else until the embeddings for all of the texts have been generated.
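The blocking problem can be illustrated with a short sketch. Here blocking_embed is a hypothetical stand-in for a synchronous call (time.sleep simulates the wait), and asyncio.to_thread, available in Python 3.9 and later, moves it to a worker thread so that the caller can keep doing other work in the meantime.

```python
import asyncio
import time

def blocking_embed(texts):
    """Hypothetical blocking call; time.sleep simulates network latency."""
    time.sleep(0.2)
    return [[float(len(t))] for t in texts]  # dummy values, not real embeddings

async def main():
    ticks = 0
    # Run the blocking call in a worker thread so the event loop stays free.
    task = asyncio.create_task(asyncio.to_thread(blocking_embed, ["a", "bb"]))
    while not task.done():
        ticks += 1  # stands in for other work done while embeddings are pending
        await asyncio.sleep(0.01)
    return await task, ticks

embeddings, ticks = asyncio.run(main())
print(embeddings, "other work iterations:", ticks)
```

Calling blocking_embed directly would leave the caller idle for the full duration of the request.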

New Method to Generate Embeddings for a list of texts (Asynchronous Method)

The AzureTextEmbeddings.GenerateEmbeddingsAsync() method is a new method that is available in the asynchronous version of Azure Text Embeddings. This method allows users to generate embeddings for a list of texts in an asynchronous way.

The AzureTextEmbeddings.GenerateEmbeddingsAsync() method takes two parameters:

texts: A list of strings.
max_inputs: The maximum number of inputs.

The texts parameter is a list of strings that you want to generate embeddings for. The max_inputs parameter is the maximum number of texts that a single call accepts. It defaults to 1, and passing a longer list raises a ValueError.

The following code illustrates how to use the AzureTextEmbeddings.GenerateEmbeddingsAsync() method:

from typing import List

# _CheckQuota() and _MakeRequest() are helper functions that are not shown here.

async def GenerateEmbeddingsAsync(texts: List[str], max_inputs: int = 1):
  """Generates embeddings for a list of texts.

  Args:
    texts: A list of strings.
    max_inputs: The maximum number of inputs.

  Returns:
    A list of embeddings.

  Raises:
    ValueError: If the number of inputs exceeds the max_inputs parameter, if
      the inputs are not strings, if you have exceeded your quota, or if the
      request returns a status code other than 200.
  """

  # Validate the inputs
  if len(texts) > max_inputs:
    raise ValueError("The number of inputs must not be greater than the max_inputs parameter.")
  if not all(isinstance(text, str) for text in texts):
    raise ValueError("The inputs must be strings.")

  # Check the quota
  if not _CheckQuota():
    raise ValueError("You have exceeded your quota.")

  # Make the request without blocking the caller
  response = await _MakeRequest(texts)

  # Handle the response
  if response.status_code != 200:
    raise ValueError(f"The request failed with status code {response.status_code}.")
  return response.json()

The method first checks to make sure that the number of inputs does not exceed the max_inputs parameter. If it does, the method raises a ValueError exception.

The method then checks that every input is a string. If any input is not a string, the method raises a ValueError exception.

The method then checks the quota to make sure that the user has enough quota to generate the embeddings. If the user does not have enough quota, the method raises a ValueError exception.

If the inputs are valid and the user has enough quota, the method then makes a request to the Azure OpenAI service to generate the embeddings. If the response status code is not 200, the method raises a ValueError exception. Otherwise, it returns the response body, a JSON object that contains the embeddings.
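Because the method rejects lists longer than max_inputs, a longer list has to be split into batches before calling it. The following is a minimal sketch of that pattern. generate_embeddings_stub, chunk, and embed_all are hypothetical names introduced for this example; the stub mirrors only the input checks described above and returns dummy values instead of contacting a service.

```python
import asyncio
from typing import List

async def generate_embeddings_stub(texts: List[str], max_inputs: int = 1) -> List[List[float]]:
    """Hypothetical stand-in that mirrors only the input checks described above."""
    if len(texts) > max_inputs:
        raise ValueError("The number of inputs must not be greater than the max_inputs parameter.")
    if not all(isinstance(text, str) for text in texts):
        raise ValueError("The inputs must be strings.")
    await asyncio.sleep(0.01)  # simulated request latency
    return [[float(len(text))] for text in texts]  # dummy values, not real embeddings

def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

async def embed_all(texts: List[str], max_inputs: int) -> List[List[float]]:
    batches = chunk(texts, max_inputs)
    # One call per batch, all awaited concurrently; gather preserves batch order.
    results = await asyncio.gather(
        *(generate_embeddings_stub(batch, max_inputs) for batch in batches)
    )
    return [vector for batch in results for vector in batch]

texts = ["one", "three", "fifteen", "ab", "x"]
embeddings = asyncio.run(embed_all(texts, max_inputs=2))
print(embeddings)
```

In a real application the number of concurrent batches would usually also be limited, for example to stay within the quota that the method checks.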

Which method is best?

The best method to use depends on the specific application. If the application needs to generate embeddings for a small number of texts, the synchronous method may be a good choice. If the application needs to generate embeddings for a large number of texts, the asynchronous method may be a better choice.