Azure search: Introduction (part 1 of 3)

Searchable - Fields marked as searchable can be matched by full-text queries through the REST API. When a field is marked as searchable it undergoes lexical analysis, such as word-breaking and stemming. In the index definition sketched after this list, the description of the house is searchable. If the description were "Spacious house with large garden", the field would be broken down into words and undergo further lexical analysis, such as including word inflections in the index. A search for "gardening" or "gardens" would then match the example description, as these are inflections of "garden". An important point about searchable fields is that they take up more space in the index, because Azure stores the different variations of each word.
Filterable - Fields marked as filterable can be filtered with classic comparison operators, such as equals, less than or greater than. In the example below, "lastRenovationDate" is marked as filterable. This allows the user to filter for houses that were renovated during a certain time frame, or renovated recently.
Sortable - Fields marked as sortable can be used to tell Azure Search how to order the results returned. By default, Azure returns results in order of search score (based on how closely the search text matches a document in the index).
Analysers - Different analysers can be specified to tell Azure Search how to analyse the input data. For example, "fr.lucene" is used for the "description_fr" field in the example below. This means the text will be tokenised, stemmed and otherwise analysed in a way that better suits the French language. A range of languages can be chosen, as well as a range of analysers.
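To make the attributes above concrete, here is a minimal sketch of an index definition of the kind these descriptions refer to. The index name, field names and api-version are illustrative assumptions rather than a copy of the original example:

```
PUT https://[service name].search.windows.net/indexes/houses?api-version=2017-11-11
{
  "name": "houses",
  "fields": [
    { "name": "houseId", "type": "Edm.String", "key": true },
    { "name": "description", "type": "Edm.String", "searchable": true },
    { "name": "description_fr", "type": "Edm.String", "searchable": true, "analyzer": "fr.lucene" },
    { "name": "lastRenovationDate", "type": "Edm.DateTimeOffset", "filterable": true, "sortable": true }
  ]
}
```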
Custom Analysers
In some scenarios you may want to analyse text differently from the standard approach taken by Azure Search. This can be done with predefined or custom analysers. Analysers are configurations of tokenisers and filters that transform, filter or replace characters and tokens in the input text. Consider a custom analyser called "phonetic_ascii_analyzer", sketched below. It uses the standard tokeniser but a custom combination of token filters: lowercase (so matching ignores case), ASCII folding (which normalises accented characters such as é or ü to their ASCII equivalents for easier matching) and phonetic (which matches phonetically similar words).
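A minimal sketch of such a definition, assuming the index and field names used earlier; the analyser section follows the Azure Search custom analysis schema:

```
{
  "name": "houses",
  "fields": [
    { "name": "description", "type": "Edm.String", "searchable": true, "analyzer": "phonetic_ascii_analyzer" }
  ],
  "analyzers": [
    {
      "name": "phonetic_ascii_analyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": [ "lowercase", "asciifolding", "phonetic" ]
    }
  ]
}
```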
As well as custom analysers, a custom tokeniser can be created. A tokeniser defines how the input text is split into independent tokens, for example separating a sentence into words.
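A custom tokeniser is defined in a "tokenizers" section alongside "analyzers" in the index definition, by configuring one of the predefined tokeniser types. The name and parameter values below are assumptions for the sake of the example:

```
"tokenizers": [
  {
    "name": "my_edge_tokenizer",
    "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenizer",
    "minGram": 2,
    "maxGram": 10
  }
]
```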
Data Sources and Indexers
A data source can be used (alongside an indexer) to sync data between a database and the Azure Search index. This can be done manually as a one-off job, or as a scheduled job running at intervals as short as 5 minutes. When defining a data source, you are defining the connection information for your database. This connection information is used by the indexer to sync the data.
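A minimal sketch of a data source definition, assuming a hypothetical Azure SQL database with a "Houses" table (the names and connection string are placeholders):

```
POST https://[service name].search.windows.net/datasources?api-version=2017-11-11
{
  "name": "houses-datasource",
  "type": "azuresql",
  "credentials": { "connectionString": "Server=...;Database=...;" },
  "container": { "name": "Houses" }
}
```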
Currently, there are 4 different types of data source that can be used: "azuresql", "documentdb" (Azure Cosmos DB), "azureblob" and "azuretable". A more advanced feature that can be specified as part of the data source definition is the high watermark change detection policy. This policy names a column that Azure Search can use to work out whether a row has changed, such as a row version or a last-updated column (for example a timestamp). Another policy that can be specified is the SQL integrated change tracking policy. This is the most efficient change detection policy, but it can only be used with data sources that support change tracking (e.g. Azure SQL Database V12). This policy does not require a column name; change detection is handled automatically by SQL Server.
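For illustration, the two policies are added to the data source definition like this (the column name "lastUpdated" is an assumption):

```
"dataChangeDetectionPolicy": {
  "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
  "highWaterMarkColumnName": "lastUpdated"
}
```

or, for databases with change tracking enabled:

```
"dataChangeDetectionPolicy": {
  "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
}
```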
Once the data source has been defined, an indexer can be defined. The indexer extracts information from the data source by crawling through it. A schedule can be added as a parameter when creating the indexer, telling Azure how often to run the indexer and check for changes; this can be as often as every 5 minutes. There are also some additional settings, such as 'batchSize' (the number of items per batch, which can be tweaked to improve performance), and 'maxFailedItems' and 'maxFailedItemsPerBatch' (the number of failures tolerated; 0 means no failures are allowed, -1 means an unlimited number).
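A sketch of an indexer definition tying the earlier pieces together; the schedule interval is an ISO 8601 duration ("PT5M" is every 5 minutes) and the parameter values are illustrative:

```
POST https://[service name].search.windows.net/indexers?api-version=2017-11-11
{
  "name": "houses-indexer",
  "dataSourceName": "houses-datasource",
  "targetIndexName": "houses",
  "schedule": { "interval": "PT5M" },
  "parameters": { "batchSize": 100, "maxFailedItems": 10, "maxFailedItemsPerBatch": 5 }
}
```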
If the fields in the index and the fields in the data source do not match, field mappings can be defined. These field mappings map the names of fields in the data source to differently named fields in the index. Through the REST API you can create, update, delete and list indexers and data sources. You can also check on the status of an indexer, to view information on any failures that occurred during indexing.
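For example, a field mapping can be added to the indexer definition (the source field name "Desc" is a made-up example), and the indexer's status can be queried afterwards:

```
"fieldMappings": [
  { "sourceFieldName": "Desc", "targetFieldName": "description" }
]
```

```
GET https://[service name].search.windows.net/indexers/houses-indexer/status?api-version=2017-11-11
```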
Next up, I will describe the basics of how to use Azure Search through the REST API in part 2 of this series, 'Using Azure Search'.