How to Create an Index in Azure AI Search?

In the era of big data, the ability to efficiently search and retrieve information from vast volumes of data is crucial. Whether you’re developing a web application, mobile app, or a software-as-a-service (SaaS) app, implementing a robust and efficient search capability can significantly enhance the user experience. This is where Azure AI Search comes into play.


Azure AI Search, a powerful cloud-based search-as-a-service solution by Microsoft, provides developers with the tools to add sophisticated search capabilities to their applications. One of the key components of Azure AI Search is the search index. A search index in Azure AI Search is akin to a database table that stores and organizes your searchable data.


This article will guide you through creating an Azure AI Search index. We will cover connecting to your data source, defining your index schema, and creating and loading your index.


What is an Index in Azure AI Search?

An index in Azure AI Search is a data structure that contains searchable information from one or more data sources. It is a structured representation of the data that needs to be searched, and that structure is what makes querying and retrieving relevant information efficient.


Role of indexes in storing and organizing searchable information:

Indexes are the foundation of every search operation in Azure AI Search. Queries run against the index, not against the original data source, so the index stores a preprocessed, structured copy of your content.


Here are the key roles of indexes:

  1. Storage: Indexes store searchable data in a structured format, allowing for efficient storage and retrieval of information.

  2. Organization: Indexes organize the data into documents and fields so that information is easy to search and retrieve. Each field in the index represents an attribute or property of the data, such as title, description, or date.

  3. Searchability: By indexing the data, Azure AI Search enables fast and accurate search operations. The indexed data can be queried using keywords, filters, and other search parameters to retrieve relevant results quickly.

  4. Scalability: Indexes are designed to scale with the size of the data and the complexity of search queries. They can handle large volumes of data efficiently and provide fast search results even as the dataset grows.
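
To make the "keywords, filters, and other search parameters" point concrete, here is a minimal Python sketch that builds the JSON body of a search request. It only constructs the payload and does not call the service. The field names (HotelName, Rating) are assumptions borrowed from the hotels sample used later in this article, and the helper function is hypothetical, not part of any SDK.

```python
import json

def build_search_request(text, filter_expr=None, top=10, select=None):
    """Build the JSON body for a search query (payload only, no network call)."""
    body = {"search": text, "top": top}
    if filter_expr:
        # OData filter expression; it can only reference filterable fields.
        body["filter"] = filter_expr
    if select:
        # Comma-separated list of fields to return in the results.
        body["select"] = ",".join(select)
    return body

request_body = build_search_request(
    "beach view",
    filter_expr="Rating ge 4",
    top=5,
    select=["HotelName", "Rating"],
)
print(json.dumps(request_body, indent=2))
```

A body like this would typically be sent with a POST request to the index's docs/search endpoint, along with an API key header.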


How to Create an Azure AI Search Index

There are two ways to create an index in the Azure portal:

  1. Using the "+ Add index" option

  2. Using the "Import data" option


Option 1: Create Azure AI Search Index Using the "+ Add index" option

STEP 1: Sign in to the Azure portal. In the search box at the top, type “Azure AI Search” and select it from the dropdown menu.


STEP 2: Create an Azure AI Search service if you haven’t already. Once the service is created, go to your resource.


STEP 3: On the Overview page, click the "+ Add index" option and select "Add index".


STEP 4: This opens an embedded editor where you enter the index name and define the schema for your index.



This includes:

  • Specifying fields: Click “+ add field” to add a new field.

  • Setting data types for each field.

  • Configuring indexing options for each field.



STEP 5: Identify a document key. A document key is a unique identifier for each document in your index. It must be a single string field, populated from a source field that contains unique values.


STEP 6: Once you’ve defined the schema and identified a document key, click “Create” to create your index.
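
The same schema can also be expressed as a JSON index definition. The sketch below builds one in Python and checks the document key rule from STEP 5 before you would submit it. The index name and the fields (id, title, price) are made-up examples, and the helper function is hypothetical.

```python
import json

def build_index_definition(name, fields):
    """Return an index definition after checking the document key rule."""
    keys = [f for f in fields if f.get("key")]
    if len(keys) != 1:
        raise ValueError("an index needs exactly one key field")
    if keys[0]["type"] != "Edm.String":
        raise ValueError("the key field must be of type Edm.String")
    return {"name": name, "fields": fields}

index_definition = build_index_definition(
    "products-index",  # hypothetical index name
    [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "price", "type": "Edm.Double", "filterable": True, "sortable": True},
    ],
)
print(json.dumps(index_definition, indent=2))
```

Validating locally like this is optional, since the service rejects an invalid definition anyway, but it catches a missing or non-string key before you leave your editor.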


Option 2: Create Azure AI Search Index Using the "Import data" option

STEP 1: Click on the "Import data" option.


STEP 2: Connect to Data Source

Expand the Data source dropdown and select "Samples". From the list of samples, choose the hotel sample.



You can also connect to your own data source. Azure AI Search supports various data sources such as:

  1. Azure SQL Database

  2. SQL Server on Azure VMs

  3. Azure Cosmos DB

  4. Azure Blob Storage

  5. Azure Data Lake Storage Gen2

  6. Azure Table Storage

  7. SharePoint Online (Preview)

  8. Azure File Storage (Preview)

  9. Azure Database for MySQL  (Preview)
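
Outside the wizard, a data source connection is described by a small JSON definition. The following sketch builds one for a few of the sources listed above. The mapping from source names to "type" strings reflects my understanding of the REST API and should be verified against the current documentation; the data source name, container name, and connection string are placeholders.

```python
import json

# Assumed mapping from the names in the list above to REST "type" values.
SOURCE_TYPES = {
    "Azure SQL Database": "azuresql",
    "Azure Cosmos DB": "cosmosdb",
    "Azure Blob Storage": "azureblob",
    "Azure Data Lake Storage Gen2": "adlsgen2",
    "Azure Table Storage": "azuretable",
}

def build_data_source(name, source, connection_string, container):
    """Build a data source definition (payload only, no network call)."""
    if source not in SOURCE_TYPES:
        raise ValueError(f"no type mapping for {source!r}")
    return {
        "name": name,
        "type": SOURCE_TYPES[source],
        "credentials": {"connectionString": connection_string},
        # For blob storage this is the container; for SQL it is a table or view.
        "container": {"name": container},
    }

data_source = build_data_source(
    "docs-blob-ds", "Azure Blob Storage", "<your-connection-string>", "documents"
)
print(json.dumps(data_source, indent=2))
```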


Click "Next: Add cognitive skills (optional)".


STEP 3: Configure Cognitive Skills

You can configure cognitive skills here to add AI enrichment to your data. This step is optional and can be skipped if not needed.


Click "Skip to: Customize Target index".


STEP 4: Customize target index

The wizard automatically creates a schema based on the built-in hotels-sample data.


Accept the suggested values for the Index name (hotels-sample-index) and Key field (HotelId).


Accept the system-generated field attributes (unless you're rerunning the wizard with an existing data source).


An index requires at least an Index name and a collection of Fields. Each document needs a unique identifier defined by a Key field (always a string). The wizard automatically selects a suitable field for the key.


Each field has the following properties:

  • Name: A descriptive name for the field.

  • Data type: The type of data the field holds (e.g., string, integer).

  • Attributes: These control how the field is used in search:

      • Retrievable: Whether the field is returned in search results.

      • Filterable: Whether the field can be used for filtering searches.

      • Sortable: Whether the field can be used for sorting search results.

      • Facetable: Whether the field can be used for faceted navigation.

      • Searchable: Whether the field is used in full-text search (strings are searchable by default).

  • Analyzer/Suggester: Optional settings. An analyzer controls how text is tokenized for full-text search (for example, language-specific analysis), and a suggester enables autocomplete and suggested queries.
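
As a rough illustration of how these properties look in an index definition, the sketch below declares a handful of fields from the hotels sample with their attributes, plus one suggester. The attribute choices are illustrative and are not necessarily what the wizard generates; the `field` and `names_with` helpers and the suggester name "sg" are hypothetical.

```python
ALLOWED_ATTRIBUTES = {
    "key", "searchable", "filterable", "sortable",
    "facetable", "retrievable", "analyzer",
}

def field(name, data_type, **attributes):
    """Build one field definition, rejecting attribute names we don't know."""
    unknown = set(attributes) - ALLOWED_ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return {"name": name, "type": data_type, **attributes}

fields = [
    field("HotelId", "Edm.String", key=True, filterable=True),
    field("HotelName", "Edm.String", searchable=True, sortable=True),
    field("Description", "Edm.String", searchable=True, analyzer="en.lucene"),
    field("Category", "Edm.String", filterable=True, facetable=True),
    field("Rating", "Edm.Double", filterable=True, sortable=True, facetable=True),
]

# A suggester enables autocomplete over the listed string fields.
suggesters = [
    {"name": "sg", "searchMode": "analyzingInfixMatching", "sourceFields": ["HotelName"]}
]

def names_with(attribute):
    """List the fields that have a given attribute switched on."""
    return [f["name"] for f in fields if f.get(attribute)]

print("Filterable:", names_with("filterable"))
print("Facetable:", names_with("facetable"))
```

Only enable the attributes you need: each one tells the service to build additional structures for that field, which affects index size.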


Click "Next: Create an indexer".


STEP 5: Create an Indexer

In this step, you’ll create an indexer that will connect to your data source, read the data, and pass it to the search engine for indexing.

  1. Specify the Indexer Name: This is a unique identifier for the indexer within the indexer collection.

  2. Set the Schedule: You can set the indexer to run once, hourly, daily, or on a custom schedule. This determines how often the indexer will run to update the index with any changes in the data source.


Configure Advanced Options:

Expand the advanced options to configure the following:

  • Base-64 Encode keys: If your document keys contain special characters, you can choose to Base-64 encode them.

  • Max Failed Items: This is the maximum number of items that can fail to be indexed before the entire indexer run is considered a failure.

  • Max Failed Items Per Batch: This is the maximum number of items that can fail in a single batch while that batch is still considered successful.

  • Batch Size: The number of items the indexer will attempt to index in a single batch.
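
The wizard settings above map onto an indexer definition. Here is a hedged Python sketch that assembles one, including a schedule and the advanced options. The property names follow the REST API as I understand it, the indexer and data source names are placeholders, and the helper function is hypothetical. The schedule interval is an ISO 8601 duration, so "PT1H" means hourly.

```python
import json

def build_indexer(name, data_source, index, interval=None, batch_size=None,
                  max_failed_items=0, max_failed_items_per_batch=0,
                  base64_encode_key=None):
    """Build an indexer definition (payload only, no network call)."""
    parameters = {
        "maxFailedItems": max_failed_items,
        "maxFailedItemsPerBatch": max_failed_items_per_batch,
    }
    if batch_size is not None:
        parameters["batchSize"] = batch_size
    indexer = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "parameters": parameters,
    }
    if interval:
        indexer["schedule"] = {"interval": interval}
    if base64_encode_key:
        # Encode key values that contain characters not allowed in document keys.
        indexer["fieldMappings"] = [{
            "sourceFieldName": base64_encode_key,
            "targetFieldName": base64_encode_key,
            "mappingFunction": {"name": "base64Encode"},
        }]
    return indexer

indexer = build_indexer(
    "hotels-sample-indexer", "hotels-sample-ds", "hotels-sample-index",
    interval="PT1H", batch_size=100, max_failed_items=10,
    max_failed_items_per_batch=5, base64_encode_key="HotelId",
)
print(json.dumps(indexer, indent=2))
```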


Once you have configured all the settings, click “Submit” to create the indexer.


The indexer will start running according to the schedule, and you can monitor its progress in the Azure portal.
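
If you prefer to trigger or monitor an indexer from code instead of the portal, the sketch below prepares the two HTTP requests involved. It only builds the request objects and does not send them. The service name, indexer name, API key, and the API version string are all assumptions you would replace with your own values.

```python
import urllib.request

API_VERSION = "2023-11-01"  # assumed; use the version your service supports

def indexer_request(service, indexer, action, api_key):
    """Prepare a 'run' (POST) or 'status' (GET) request for an indexer."""
    methods = {"run": "POST", "status": "GET"}
    if action not in methods:
        raise ValueError("action must be 'run' or 'status'")
    url = (
        f"https://{service}.search.windows.net/indexers/{indexer}/{action}"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url, method=methods[action], headers={"api-key": api_key}
    )

status_request = indexer_request(
    "my-search-service", "hotels-sample-indexer", "status", "<your-admin-key>"
)
print(status_request.get_method(), status_request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; the status response is a JSON document describing the last run and the execution history.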


Indexing Data in Azure AI Search

Azure AI Search uses indexers, specialized crawlers that streamline data ingestion. An indexer extracts textual data from a cloud data source and populates a search index. This approach, often called the pull model, eliminates the need for custom code to add data to the index.


Enriching Data with AI and Skills:

Indexers also drive skillset execution and AI enrichment. Skills are configurable modules that perform additional processing on content before it's indexed. Examples include:

  • Optical Character Recognition (OCR) for extracting text from images

  • Text Split Skill for chunking large data into manageable pieces

  • Text Translation Skill for multilingual search capabilities
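
For orientation, here is a sketch of what a skillset definition containing these three skills might look like. The "@odata.type" identifiers and input/output names follow the built-in skills as I understand them, while the skillset name, target names, and page length are illustrative assumptions; check the skill reference before using it.

```python
import json

skillset = {
    "name": "demo-skillset",  # hypothetical name
    "description": "OCR, chunking, and translation",
    "skills": [
        {   # Extract text from images found in the source documents.
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "ocr_text"}],
        },
        {   # Split long content into smaller pages (chunks).
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "textItems", "targetName": "pages"}],
        },
        {   # Translate the content to English for multilingual search.
            "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
            "defaultToLanguageCode": "en",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "translatedText", "targetName": "translated_text"}],
        },
    ],
}

skill_types = [skill["@odata.type"].rsplit(".", 1)[-1] for skill in skillset["skills"]]
print(skill_types)
print(json.dumps(skillset, indent=2)[:200], "...")
```

An indexer picks up a skillset through its skillset name property, and the enriched outputs still need to be mapped to index fields before they become searchable.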


Supported Data Sources and Configuration:

Indexers target specific data sources. This involves defining a data source (origin) and a target search index (destination). Specific data sources, like Azure Blob Storage, might require additional configuration options tailored to their content type.


Scheduling Data Refresh:

You can run indexers either on-demand or on a recurring schedule. Schedules can be as frequent as every five minutes. For even more frequent updates, a push model is necessary: your application sends each change to Azure AI Search at the same time it updates the external source, so the two stay in sync.
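
In the push model, your code sends documents to the index in batches, each document tagged with an action. The sketch below builds such batches in Python without sending them. The action names and the 1,000-document batch limit reflect the REST API as I understand it, and the hotel documents and helper function are made up for illustration.

```python
import json

VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}

def build_batches(documents, action="mergeOrUpload", batch_size=1000):
    """Split documents into push-API batches, tagging each with an action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    batches = []
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        batches.append(
            {"value": [{"@search.action": action, **doc} for doc in chunk]}
        )
    return batches

changed_hotels = [
    {"HotelId": "1", "Rating": 4.6},
    {"HotelId": "2", "Rating": 3.9},
]
push_batches = build_batches(changed_hotels)
print(json.dumps(push_batches[0], indent=2))
```

Each batch would be posted to the index's docs/index endpoint. With "mergeOrUpload", existing documents are updated field by field and missing ones are created.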


Indexer Performance and Scalability:

A search service assigns a single indexer job per search unit. To achieve concurrent processing, ensure you have sufficient replicas allocated. Indexers are foreground processes, meaning heavy indexing activity might temporarily increase query throttling.


Data Ingestion Strategies:

Indexers offer flexibility for data ingestion. You can use them as the sole means of loading data or combine them with other techniques. The search index can accept content from various sources, with each indexer contributing new data from its respective provider. Each source can contribute entire documents or populate specific fields within documents.


For parallelized indexing of massive datasets, consider a multi-indexer strategy. This approach assigns subsets of the data to individual indexers, enabling faster and more efficient processing.
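
One way to sketch a multi-indexer strategy for blob storage is to give each indexer its own data source that points at a different folder of the same container. The example below generates such pairs. It assumes the data is already split into folders and that the blob data source accepts a folder path in the container's "query" property; all names are placeholders and the helper function is hypothetical.

```python
import json

def partitioned_indexers(container, folders, index, connection_string):
    """Create one (data source, indexer) pair per folder, all targeting one index."""
    definitions = []
    for folder in folders:
        data_source = {
            "name": f"{container}-{folder}-ds",
            "type": "azureblob",
            "credentials": {"connectionString": connection_string},
            # "query" restricts this data source to one virtual folder.
            "container": {"name": container, "query": folder},
        }
        indexer = {
            "name": f"{container}-{folder}-indexer",
            "dataSourceName": data_source["name"],
            "targetIndexName": index,
        }
        definitions.append((data_source, indexer))
    return definitions

pairs = partitioned_indexers(
    "documents", ["2022", "2023", "2024"], "docs-index", "<your-connection-string>"
)
for data_source, indexer in pairs:
    print(indexer["name"], "->", json.dumps(data_source["container"]))
```

How many of these indexers actually run at the same time depends on the capacity of your search service, as described in the performance section above.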


Azure AI Search Schema Definition

In Azure AI Search, defining the schema is a critical step in creating a searchable index. It involves specifying the structure of the data that will be stored in the index, including fields, data types, and indexing options.


Importance of defining the schema for an index:

  1. Data Consistency: By defining the schema, you ensure that the data stored in the index follows a consistent structure. This consistency is essential for accurate search results and efficient query processing.

  2. Search Relevance: The schema allows you to specify which fields are searchable, filterable, and sortable. By defining these properties, you can control the relevance of search results and provide users with more meaningful insights.

  3. Index Optimization: A well-defined schema enables Azure AI Search to optimize indexing and query processing. By specifying data types and enabling only the indexing options you need, you can improve performance and resource utilization, leading to faster search operations.

  4. Data Enrichment: The schema can include fields for storing additional metadata or derived attributes. This enables data enrichment processes, such as adding computed fields or extracting insights from the indexed data.
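
To illustrate the data consistency point, here is a small local check that compares a document against a schema before it is sent for indexing. This is my own sketch, not a feature of Azure AI Search: the type mapping covers only a few simple types, and the field names are borrowed from the hotels sample.

```python
EDM_TO_PYTHON = {
    "Edm.String": str,
    "Edm.Int32": int,
    "Edm.Double": (int, float),
    "Edm.Boolean": bool,
}

schema_fields = [
    {"name": "HotelId", "type": "Edm.String", "key": True},
    {"name": "HotelName", "type": "Edm.String"},
    {"name": "Rating", "type": "Edm.Double"},
    {"name": "ParkingIncluded", "type": "Edm.Boolean"},
]

def validate_document(document, fields):
    """Return a list of problems found when comparing a document to the schema."""
    problems = []
    by_name = {f["name"]: f for f in fields}
    for name, value in document.items():
        if name not in by_name:
            problems.append(f"unknown field: {name}")
            continue
        if value is None:
            continue  # missing values are checked for the key below
        expected = EDM_TO_PYTHON[by_name[name]["type"]]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {name}")
    key_name = next(f["name"] for f in fields if f.get("key"))
    if not document.get(key_name):
        problems.append(f"missing document key: {key_name}")
    return problems

print(validate_document({"HotelId": "1", "Rating": 4.5}, schema_fields))
print(validate_document({"Rating": "high", "Stars": 5}, schema_fields))
```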


Conclusion

In this article, we’ve walked you through creating an Azure AI Search index, a fundamental building block of the Azure AI Search service. We’ve covered everything from connecting to your data source and defining your index schema to creating and loading your index.


If you found this article helpful, share it with others who might benefit. Also, if you have any additional or updated information, please leave a comment. Your feedback and contributions are greatly appreciated! 😊
