Elasticsearch: The Definitive Guide (2015)

Part I. Getting Started

Chapter 10. Index Management

We have seen how Elasticsearch makes it easy to start developing a new application without requiring any advance planning or setup. However, it doesn’t take long before you start wanting to fine-tune the indexing and search process to better suit your particular use case. Almost all of these customizations relate to the index, and the types that it contains. In this chapter, we introduce the APIs for managing indices and type mappings, and the most important settings.

Creating an Index

Until now, we have created a new index by simply indexing a document into it. The index is created with the default settings, and new fields are added to the type mapping by using dynamic mapping. Now we need more control over the process: we want to ensure that the index has been created with the appropriate number of primary shards, and that analyzers and mappings are set up before we index any data.

To do this, we have to create the index manually, passing in any settings or type mappings in the request body, as follows:

PUT /my_index
{
    "settings": { ... any settings ... },
    "mappings": {
        "type_one": { ... any mappings ... },
        "type_two": { ... any mappings ... },
        ...
    }
}

In fact, if you want to, you can prevent the automatic creation of indices by adding the following setting to the config/elasticsearch.yml file on each node:

action.auto_create_index: false

NOTE

Later, we discuss how you can use “Index Templates” to preconfigure automatically created indices. This is particularly useful when indexing log data: you log into an index whose name includes the date and, as midnight rolls over, a new properly configured index automatically springs into existence.
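
As a preview, a minimal index template for this logging scenario might look something like the following sketch (the template name and the logs-* pattern are illustrative placeholders):

PUT /_template/my_logs
{
    "template": "logs-*",
    "settings": {
        "number_of_shards": 1
    }
}

Any newly created index whose name matches the logs-* pattern would then pick up these settings automatically.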

Deleting an Index

To delete an index, use the following request:

DELETE /my_index

You can delete multiple indices with this:

DELETE /index_one,index_two

DELETE /index_*

You can even delete all indices with this:

DELETE /_all

Index Settings

There are many, many knobs that you can twiddle to customize index behavior, which you can read about in the Index Modules reference documentation, but…

TIP

Elasticsearch comes with good defaults. Don’t twiddle these knobs until you understand what they do and why you should change them.

Two of the most important settings are as follows:

number_of_shards

The number of primary shards that an index should have, which defaults to 5. This setting cannot be changed after index creation.

number_of_replicas

The number of replica shards (copies) that each primary shard should have, which defaults to 1. This setting can be changed at any time on a live index.

For instance, we could create a small index—just one primary shard—and no replica shards with the following request:

PUT /my_temp_index
{
    "settings": {
        "number_of_shards":   1,
        "number_of_replicas": 0
    }
}

Later, we can change the number of replica shards dynamically using the update-index-settings API as follows:

PUT /my_temp_index/_settings
{
    "number_of_replicas": 1
}

Configuring Analyzers

The third important index setting is the analysis section, which is used to configure existing analyzers or to create new custom analyzers specific to your index.

In “Analysis and Analyzers”, we introduced some of the built-in analyzers, which are used to convert full-text strings into an inverted index, suitable for searching.

The standard analyzer, which is the default analyzer used for full-text fields, is a good choice for most Western languages. It consists of the following:

§ The standard tokenizer, which splits the input text on word boundaries

§ The standard token filter, which is intended to tidy up the tokens emitted by the tokenizer (but currently does nothing)

§ The lowercase token filter, which converts all tokens into lowercase

§ The stop token filter, which removes stopwords—common words that have little impact on search relevance, such as a, the, and, is.

By default, the stopwords filter is disabled. You can enable it by creating a custom analyzer based on the standard analyzer and setting the stopwords parameter. Either provide a list of stopwords or tell it to use a predefined stopwords list from a particular language.

In the following example, we create a new analyzer called the es_std analyzer, which uses the predefined list of Spanish stopwords:

PUT /spanish_docs
{
    "settings": {
        "analysis": {
            "analyzer": {
                "es_std": {
                    "type":      "standard",
                    "stopwords": "_spanish_"
                }
            }
        }
    }
}

The es_std analyzer is not global—it exists only in the spanish_docs index where we have defined it. To test it with the analyze API, we must specify the index name:

GET /spanish_docs/_analyze?analyzer=es_std
El veloz zorro marrón

The abbreviated results show that the Spanish stopword El has been removed correctly:

{
    "tokens" : [
        { "token" : "veloz",  "position" : 2 },
        { "token" : "zorro",  "position" : 3 },
        { "token" : "marrón", "position" : 4 }
    ]
}

Custom Analyzers

While Elasticsearch comes with a number of analyzers available out of the box, the real power comes from the ability to create your own custom analyzers by combining character filters, tokenizers, and token filters in a configuration that suits your particular data.

In “Analysis and Analyzers”, we said that an analyzer is a wrapper that combines three functions into a single package, which are executed in sequence:

Character filters

Character filters are used to “tidy up” a string before it is tokenized. For instance, if our text is in HTML format, it will contain HTML tags like <p> or <div> that we don’t want to be indexed. We can use the html_strip character filter to remove all HTML tags and to convert HTML entities like &Aacute; into the corresponding Unicode character Á.

An analyzer may have zero or more character filters.

Tokenizers

An analyzer must have a single tokenizer. The tokenizer breaks up the string into individual terms or tokens. The standard tokenizer, which is used in the standard analyzer, breaks up a string into individual terms on word boundaries, and removes most punctuation, but other tokenizers exist that have different behavior.

For instance, the keyword tokenizer outputs exactly the same string as it received, without any tokenization. The whitespace tokenizer splits text on whitespace only. The pattern tokenizer can be used to split text on a matching regular expression.
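
You can try a tokenizer on its own by passing it to the analyze API. The following sketch (any short phrase will do) shows the whitespace tokenizer splitting only on spaces, leaving punctuation attached to the tokens:

GET /_analyze?tokenizer=whitespace
You're the 1st runner-up!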

Token filters

After tokenization, the resulting token stream is passed through any specified token filters, in the order in which they are specified.

Token filters may change, add, or remove tokens. We have already mentioned the lowercase and stop token filters, but there are many more available in Elasticsearch. Stemming token filters “stem” words to their root form. The ascii_folding filter removes diacritics, converting a term like "très" into "tres". The ngram and edge_ngram token filters can produce tokens suitable for partial matching or autocomplete.
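
Token filters can be tested in the same way by chaining them onto a tokenizer in the analyze API. This sketch runs the output of the standard tokenizer through the lowercase and stop filters:

GET /_analyze?tokenizer=standard&filters=lowercase,stop
The QUICK Brown Foxes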

In Part II, we discuss examples of where and how to use these tokenizers and filters. But first, we need to explain how to create a custom analyzer.

Creating a Custom Analyzer

In the same way as we configured the es_std analyzer previously, we can configure character filters, tokenizers, and token filters in their respective sections under analysis:

PUT /my_index
{
    "settings": {
        "analysis": {
            "char_filter": { ... custom character filters ... },
            "tokenizer":   { ... custom tokenizers ... },
            "filter":      { ... custom token filters ... },
            "analyzer":    { ... custom analyzers ... }
        }
    }
}

As an example, let’s set up a custom analyzer that will do the following:

1. Strip out HTML by using the html_strip character filter.

2. Replace & characters with " and ", using a custom mapping character filter:

   "char_filter": {
       "&_to_and": {
           "type":     "mapping",
           "mappings": [ "&=> and "]
       }
   }

3. Tokenize words, using the standard tokenizer.

4. Lowercase terms, using the lowercase token filter.

5. Remove a custom list of stopwords, using a custom stop token filter:

   "filter": {
       "my_stopwords": {
           "type":      "stop",
           "stopwords": [ "the", "a" ]
       }
   }

Our analyzer definition combines the predefined tokenizer and filters with the custom filters that we have configured previously:

"analyzer": {

"my_analyzer": {

"type": "custom",

"char_filter": [ "html_strip", "&_to_and" ],

"tokenizer": "standard",

"filter": [ "lowercase", "my_stopwords" ]

}

}

To put it all together, the whole create-index request looks like this:

PUT /my_index
{
    "settings": {
        "analysis": {
            "char_filter": {
                "&_to_and": {
                    "type":     "mapping",
                    "mappings": [ "&=> and "]
                }
            },
            "filter": {
                "my_stopwords": {
                    "type":      "stop",
                    "stopwords": [ "the", "a" ]
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type":        "custom",
                    "char_filter": [ "html_strip", "&_to_and" ],
                    "tokenizer":   "standard",
                    "filter":      [ "lowercase", "my_stopwords" ]
                }
            }
        }
    }
}

After creating the index, use the analyze API to test the new analyzer:

GET /my_index/_analyze?analyzer=my_analyzer
The quick & brown fox

The following abbreviated results show that our analyzer is working correctly:

{
    "tokens" : [
        { "token" : "quick", "position" : 2 },
        { "token" : "and",   "position" : 3 },
        { "token" : "brown", "position" : 4 },
        { "token" : "fox",   "position" : 5 }
    ]
}

The analyzer is not much use unless we tell Elasticsearch where to use it. We can apply it to a string field with a mapping such as the following:

PUT /my_index/_mapping/my_type
{
    "properties": {
        "title": {
            "type":     "string",
            "analyzer": "my_analyzer"
        }
    }
}

Types and Mappings

A type in Elasticsearch represents a class of similar documents. A type consists of a name—such as user or blogpost—and a mapping. The mapping, like a database schema, describes the fields or properties that documents of that type may have, the datatype of each field—such as string, integer, or date—and how those fields should be indexed and stored by Lucene.

In “What Is a Document?”, we said that a type is like a table in a relational database. While this is a useful way to think about types initially, it is worth explaining in more detail exactly what a type is and how types are implemented on top of Lucene.

How Lucene Sees Documents

A document in Lucene consists of a simple list of field-value pairs. A field must have at least one value, but any field can contain multiple values. Similarly, a single string value may be converted into multiple values by the analysis process. Lucene doesn’t care if the values are strings or numbers or dates—all values are just treated as opaque bytes.

When we index a document in Lucene, the values for each field are added to the inverted index for the associated field. Optionally, the original values may also be stored unchanged so that they can be retrieved later.

How Types Are Implemented

Elasticsearch types are implemented on top of this simple foundation. An index may have several types, each with its own mapping, and documents of any of these types may be stored in the same index.

Because Lucene has no concept of document types, the type name of each document is stored with the document in a metadata field called _type. When we search for documents of a particular type, Elasticsearch simply uses a filter on the _type field to restrict results to documents of that type.
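
In other words, searching a single type behaves roughly like adding that filter yourself, as in this sketch (the index and type names are placeholders):

GET /my_index/_search
{
    "query": {
        "filtered": {
            "filter": { "term": { "_type": "my_type" }}
        }
    }
}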

Lucene also has no concept of mappings. Mappings are the layer that Elasticsearch uses to map complex JSON documents into the simple flat documents that Lucene expects to receive.

For instance, the mapping for the name field in the user type may declare that the field is a string field, and that its value should be analyzed by the whitespace analyzer before being indexed into the inverted index called name:

"name": {

"type": "string",

"analyzer": "whitespace"

}

Avoiding Type Gotchas

The fact that documents of different types can be added to the same index introduces some unexpected complications.

Imagine that we have two types in our index: blog_en for blog posts in English, and blog_es for blog posts in Spanish. Both types have a title field, but one type uses the english analyzer and the other type uses the spanish analyzer.

The problem is illustrated by the following query:

GET /_search
{
    "query": {
        "match": {
            "title": "The quick brown fox"
        }
    }
}

We are searching in the title field in both types. The query string needs to be analyzed, but which analyzer does it use: spanish or english? It will use the analyzer for the first title field that it finds, which will be correct for some docs and incorrect for the others.

We can avoid this problem either by naming the fields differently—for example, title_en and title_es—or by explicitly including the type name in the field name and querying each field separately:

GET /_search
{
    "query": {
        "multi_match": { 1
            "query":  "The quick brown fox",
            "fields": [ "blog_en.title", "blog_es.title" ]
        }
    }
}

1

The multi_match query runs a match query on multiple fields and combines the results.

Our new query uses the english analyzer for the field blog_en.title and the spanish analyzer for the field blog_es.title, and combines the results from both fields into an overall relevance score.

This solution can help when both fields have the same datatype, but consider what would happen if you indexed these two documents into the same index:

§ Type: user

{ "login": "john_smith" }

§ Type: event

{ "login": "2014-06-01" }

Lucene doesn’t care that one field contains a string and the other field contains a date. It will happily index the byte values from both fields.

However, if we now try to sort on the event.login field, Elasticsearch needs to load the values in the login field into memory. As we said in “Fielddata”, it loads the values for all documents in the index regardless of their type.

It will try to load these values either as a string or as a date, depending on which login field it sees first. This will either produce unexpected results or fail outright.
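
For instance, a sort request along the lines of this sketch (the index name is a placeholder) would be enough to trigger the fielddata loading described above:

GET /my_index/event/_search
{
    "query": { "match_all": {}},
    "sort":  [ "login" ]
}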

TIP

To avoid these conflicts, it is advisable to ensure that fields with the same name are mapped in the same way in every type in an index.

The Root Object

The uppermost level of a mapping is known as the root object. It may contain the following:

§ A properties section, which lists the mapping for each field that a document may contain

§ Various metadata fields, all of which start with an underscore, such as _type, _id, and _source

§ Settings, which control how the dynamic detection of new fields is handled, such as analyzer, dynamic_date_formats, and dynamic_templates

§ Other settings, which can be applied both to the root object and to fields of type object, such as enabled, dynamic, and include_in_all

Properties

We have already discussed the three most important settings for document fields or properties in “Core Simple Field Types” and “Complex Core Field Types”:

type

The datatype that the field contains, such as string or date

index

Whether a field should be searchable as full text (analyzed), searchable as an exact value (not_analyzed), or not searchable at all (no)

analyzer

Which analyzer to use for a full-text field, both at index time and at search time
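
Pulled together, a field mapping that uses all three settings might look something like this sketch (the status and body field names are invented for illustration):

PUT /my_index/_mapping/my_type
{
    "properties": {
        "status": { "type": "string", "index": "not_analyzed" },
        "body":   { "type": "string", "index": "analyzed", "analyzer": "english" }
    }
}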

We will discuss other field types such as ip, geo_point, and geo_shape in the appropriate sections later in the book.

Metadata: _source Field

By default, Elasticsearch stores the JSON string representing the document body in the _source field. Like all stored fields, the _source field is compressed before being written to disk.

This is almost always desired functionality because it means the following:

§ The full document is available directly from the search results—no need for a separate round-trip to fetch the document from another data store.

§ Partial update requests will not function without the _source field.

§ When your mapping changes and you need to reindex your data, you can do so directly from Elasticsearch instead of having to retrieve all of your documents from another (usually slower) data store.

§ Individual fields can be extracted from the _source field and returned in get or search requests when you don’t need to see the whole document.

§ It is easier to debug queries, because you can see exactly what each document contains, rather than having to guess their contents from a list of IDs.

That said, storing the _source field does use disk space. If none of the preceding reasons is important to you, you can disable the _source field with the following mapping:

PUT /my_index
{
    "mappings": {
        "my_type": {
            "_source": {
                "enabled": false
            }
        }
    }
}

In a search request, you can ask for only certain fields by specifying the _source parameter in the request body:

GET /_search
{
    "query":   { "match_all": {}},
    "_source": [ "title", "created" ]
}

Values for these fields will be extracted from the _source field and returned instead of the full _source.

STORED FIELDS

Besides indexing the values of a field, you can also choose to store the original field value for later retrieval. Users with a Lucene background use stored fields to choose which fields they would like to be able to return in their search results. In fact, the _source field is a stored field.

In Elasticsearch, setting individual document fields to be stored is usually a false optimization. The whole document is already stored as the _source field. It is almost always better to just extract the fields that you need by using the _source parameter.
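
For reference only, explicitly storing and then retrieving a field would look roughly like the following sketch (the title field is illustrative), although the advice above still stands:

PUT /my_index/_mapping/my_type
{
    "properties": {
        "title": { "type": "string", "store": true }
    }
}

GET /my_index/_search
{
    "query":  { "match_all": {}},
    "fields": [ "title" ]
}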

Metadata: _all Field

In “Search Lite”, we introduced the _all field: a special field that indexes the values from all other fields as one big string. The query_string query clause (and searches performed as ?q=john) defaults to searching in the _all field if no other field is specified.

The _all field is useful during the exploratory phase of a new application, while you are still unsure about the final structure that your documents will have. You can throw any query string at it and you have a good chance of finding the document you’re after:

GET /_search
{
    "query": {
        "match": {
            "_all": "john smith marketing"
        }
    }
}

As your application evolves and your search requirements become more exacting, you will find yourself using the _all field less and less. The _all field is a shotgun approach to search. By querying individual fields, you have more flexibility, power, and fine-grained control over which results are considered to be most relevant.

NOTE

One of the important factors taken into account by the relevance algorithm is the length of the field: the shorter the field, the more important. A term that appears in a short title field is likely to be more important than the same term that appears somewhere in a long content field. This distinction between field lengths disappears in the _all field.

If you decide that you no longer need the _all field, you can disable it with this mapping:

PUT /my_index/_mapping/my_type
{
    "my_type": {
        "_all": { "enabled": false }
    }
}

Inclusion in the _all field can be controlled on a field-by-field basis by using the include_in_all setting, which defaults to true. Setting include_in_all on an object (or on the root object) changes the default for all fields within that object.

You may find that you want to keep the _all field around to use as a catchall full-text field just for specific fields, such as title, overview, summary, and tags. Instead of disabling the _all field completely, disable include_in_all for all fields by default, and enable it only on the fields you choose:

PUT /my_index/my_type/_mapping
{
    "my_type": {
        "include_in_all": false,
        "properties": {
            "title": {
                "type":           "string",
                "include_in_all": true
            },
            ...
        }
    }
}

Remember that the _all field is just an analyzed string field. It uses the default analyzer to analyze its values, regardless of which analyzer has been set on the fields where the values originate. And like any string field, you can configure which analyzer the _all field should use:

PUT /my_index/my_type/_mapping
{
    "my_type": {
        "_all": { "analyzer": "whitespace" }
    }
}

Metadata: Document Identity

There are four metadata fields associated with document identity:

_id

The string ID of the document

_type

The type name of the document

_index

The index where the document lives

_uid

The _type and _id concatenated together as type#id

By default, the _uid field is stored (can be retrieved) and indexed (searchable). The _type field is indexed but not stored, and the _id and _index fields are neither indexed nor stored, meaning they don’t really exist.

In spite of this, you can query the _id field as though it were a real field. Elasticsearch uses the _uid field to derive the _id. Although you can change the index and store settings for these fields, you almost never need to do so.
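
For example, a term query on _id such as the following sketch works as expected (the ID value is a placeholder):

GET /my_index/my_type/_search
{
    "query": {
        "term": { "_id": "123" }
    }
}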

The _id field does have one setting that you may want to use: the path setting tells Elasticsearch that it should extract the value for the _id from a field within the document itself.

PUT /my_index
{
    "mappings": {
        "my_type": {
            "_id": {
                "path": "doc_id" 1
            },
            "properties": {
                "doc_id": {
                    "type":  "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

1

Extract the doc _id from the doc_id field.

Then, when you index a document:

POST /my_index/my_type
{
    "doc_id": "123"
}

the _id value will be extracted from the doc_id field in the document body:

{
    "_index":   "my_index",
    "_type":    "my_type",
    "_id":      "123", 1
    "_version": 1,
    "created":  true
}

1

The _id has been extracted correctly.

WARNING

While this is very convenient, be aware that it has a slight performance impact on bulk requests (see “Why the Funny Format?”). The node handling the request can no longer use the optimized bulk format to parse just the metadata line in order to decide which shard should receive the request. Instead, it has to parse the document body as well.

Dynamic Mapping

When Elasticsearch encounters a previously unknown field in a document, it uses dynamic mapping to determine the datatype for the field and automatically adds the new field to the type mapping.

Sometimes this is the desired behavior and sometimes it isn’t. Perhaps you don’t know what fields will be added to your documents later, but you want them to be indexed automatically. Perhaps you just want to ignore them. Or—especially if you are using Elasticsearch as a primary data store—perhaps you want unknown fields to throw an exception to alert you to the problem.

Fortunately, you can control this behavior with the dynamic setting, which accepts the following options:

true

Add new fields dynamically—the default

false

Ignore new fields

strict

Throw an exception if an unknown field is encountered

The dynamic setting may be applied to the root object or to any field of type object. You could set dynamic to strict by default, but enable it just for a specific inner object:

PUT /my_index
{
    "mappings": {
        "my_type": {
            "dynamic": "strict", 1
            "properties": {
                "title": { "type": "string"},
                "stash": {
                    "type":    "object",
                    "dynamic": true 2
                }
            }
        }
    }
}

1

The my_type object will throw an exception if an unknown field is encountered.

2

The stash object will create new fields dynamically.

With this mapping, you can add new searchable fields into the stash object:

PUT /my_index/my_type/1
{
    "title": "This doc adds a new field",
    "stash": { "new_field": "Success!" }
}

But trying to do the same at the top level will fail:

PUT /my_index/my_type/1
{
    "title":     "This throws a StrictDynamicMappingException",
    "new_field": "Fail!"
}

NOTE

Setting dynamic to false doesn’t alter the contents of the _source field at all. The _source will still contain the whole JSON document that you indexed. However, any unknown fields will not be added to the mapping and will not be searchable.

Customizing Dynamic Mapping

If you know that you are going to be adding new fields on the fly, you probably want to leave dynamic mapping enabled. At times, though, the dynamic mapping “rules” can be a bit blunt. Fortunately, there are settings that you can use to customize these rules to better suit your data.

date_detection

When Elasticsearch encounters a new string field, it checks to see if the string contains a recognizable date, like 2014-01-01. If it looks like a date, the field is added as type date. Otherwise, it is added as type string.

Sometimes this behavior can lead to problems. Imagine that you index a document like this:

{ "note": "2014-01-01" }

Assuming that this is the first time that the note field has been seen, it will be added as a date field. But what if the next document looks like this:

{ "note": "Logged out" }

This clearly isn’t a date, but it is too late. The field is already a date field and so this “malformed date” will cause an exception to be thrown.

Date detection can be turned off by setting date_detection to false on the root object:

PUT /my_index
{
    "mappings": {
        "my_type": {
            "date_detection": false
        }
    }
}

With this mapping in place, a string will always be a string. If you need a date field, you have to add it manually.

NOTE

Elasticsearch’s idea of which strings look like dates can be altered with the dynamic_date_formats setting.
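
A sketch of such a change, with illustrative date formats, might look like this on the root object:

PUT /my_index
{
    "mappings": {
        "my_type": {
            "dynamic_date_formats": [ "yyyy-MM-dd", "dd/MM/yyyy" ]
        }
    }
}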

dynamic_templates

With dynamic_templates, you can take complete control over the mapping that is generated for newly detected fields. You can even apply a different mapping depending on the field name or datatype.

Each template has a name, which you can use to describe what the template does, a mapping to specify the mapping that should be applied, and at least one parameter (such as match) to define which fields the template should apply to.

Templates are checked in order; the first template that matches is applied. For instance, we could specify two templates for string fields:

§ es: Field names ending in _es should use the spanish analyzer.

§ en: All others should use the english analyzer.

We put the es template first, because it is more specific than the catchall en template, which matches all string fields:

PUT /my_index
{
    "mappings": {
        "my_type": {
            "dynamic_templates": [
                { "es": {
                      "match":              "*_es", 1
                      "match_mapping_type": "string",
                      "mapping": {
                          "type":     "string",
                          "analyzer": "spanish"
                      }
                }},
                { "en": {
                      "match":              "*", 2
                      "match_mapping_type": "string",
                      "mapping": {
                          "type":     "string",
                          "analyzer": "english"
                      }
                }}
            ]
        }
    }
}

1

Match string fields whose name ends in _es.

2

Match all other string fields.

The match_mapping_type setting allows you to apply the template only to fields of the specified type, as detected by the standard dynamic mapping rules (for example, string or long).

The match parameter matches just the field name, and the path_match parameter matches the full path to a field in an object, so the pattern address.*.name would match a field like this:

{
    "address": {
        "city": {
            "name": "New York"
        }
    }
}

The unmatch and path_unmatch patterns can be used to exclude fields that would otherwise match.
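
A template keyed on the full path could be slotted into the dynamic_templates array shown earlier, roughly as in this sketch (the template name and analyzer are invented for illustration):

{ "full_names": {
    "path_match": "address.*.name",
    "mapping": {
        "type":     "string",
        "analyzer": "simple"
    }
}}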

More configuration options can be found in the reference documentation for the root object.

Default Mapping

Often, all types in an index share similar fields and settings. It can be more convenient to specify these common settings in the _default_ mapping, instead of having to repeat yourself every time you create a new type. The _default_ mapping acts as a template for new types. All types created after the _default_ mapping will include all of these default settings, unless explicitly overridden in the type mapping itself.

For instance, we can disable the _all field for all types, using the _default_ mapping, but enable it just for the blog type, as follows:

PUT /my_index
{
    "mappings": {
        "_default_": {
            "_all": { "enabled": false }
        },
        "blog": {
            "_all": { "enabled": true }
        }
    }
}

The _default_ mapping can also be a good place to specify index-wide dynamic templates.

Reindexing Your Data

Although you can add new types to an index, or add new fields to a type, you can’t add new analyzers or make changes to existing fields. If you were to do so, the data that had already been indexed would be incorrect and your searches would no longer work as expected.

The simplest way to apply these changes to your existing data is to reindex: create a new index with the new settings and copy all of your documents from the old index to the new index.

One of the advantages of the _source field is that you already have the whole document available to you in Elasticsearch itself. You don’t have to rebuild your index from the database, which is usually much slower.

To reindex all of the documents from the old index efficiently, use scan-and-scroll to retrieve batches of documents from the old index, and the bulk API to push them into the new index.

REINDEXING IN BATCHES

You can run multiple reindexing jobs at the same time, but you obviously don’t want their results to overlap. Instead, break a big reindex down into smaller jobs by filtering on a date or timestamp field:

GET /old_index/_search?search_type=scan&scroll=1m
{
    "query": {
        "range": {
            "date": {
                "gte": "2014-01-01",
                "lt":  "2014-02-01"
            }
        }
    },
    "size": 1000
}
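
To complete the loop, the _scroll_id returned by each response is fed back into the scroll endpoint, and each batch of hits is pushed into the new index with the bulk API. The following is only a sketch; the index name, type, IDs, and document bodies are placeholders standing in for each hit's _source:

GET /_search/scroll?scroll=1m
<scroll_id from the previous response>

POST /new_index/_bulk
{ "index": { "_type": "my_type", "_id": "1" }}
{ "title": "First reindexed document" }
{ "index": { "_type": "my_type", "_id": "2" }}
{ "title": "Second reindexed document" }

Repeat until the scroll request returns no more hits.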

If you continue making changes to the old index, you will want to make sure that you include the newly added documents in your new index as well. This can be done by rerunning the reindex process, but again filtering on a date field to match only documents that have been added since the last reindex process started.

Index Aliases and Zero Downtime

The problem with the reindexing process described previously is that you need to update your application to use the new index name. Index aliases to the rescue!

An index alias is like a shortcut or symbolic link, which can point to one or more indices, and can be used in any API that expects an index name. Aliases give us an enormous amount of flexibility. They allow us to do the following:

§ Switch transparently between one index and another on a running cluster

§ Group multiple indices (for example, last_three_months)

§ Create “views” on a subset of the documents in an index

We will talk more about the other uses for aliases later in the book. For now we will explain how to use them to switch from an old index to a new index with zero downtime.

There are two endpoints for managing aliases: _alias for single operations, and _aliases to perform multiple operations atomically.

In this scenario, we will assume that your application is talking to an index called my_index. In reality, my_index will be an alias that points to the current real index. We will include a version number in the name of the real index: my_index_v1, my_index_v2, and so forth.

To start off, create the index my_index_v1, and set up the alias my_index to point to it:

PUT /my_index_v1 1
PUT /my_index_v1/_alias/my_index 2

1

Create the index my_index_v1.

2

Set the my_index alias to point to my_index_v1.

You can check which index the alias points to:

GET /*/_alias/my_index

Or which aliases point to the index:

GET /my_index_v1/_alias/*

Both of these return the following:

{
    "my_index_v1" : {
        "aliases" : {
            "my_index" : { }
        }
    }
}

Later, we decide that we want to change the mappings for a field in our index. Of course, we can’t change the existing mapping, so we have to reindex our data. To start, we create my_index_v2 with the new mappings:

PUT /my_index_v2
{
    "mappings": {
        "my_type": {
            "properties": {
                "tags": {
                    "type":  "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

Then we reindex our data from my_index_v1 to my_index_v2, following the process described in “Reindexing Your Data”. Once we are satisfied that our documents have been reindexed correctly, we switch our alias to point to the new index.

An alias can point to multiple indices, so we need to remove the alias from the old index at the same time as we add it to the new index. The change needs to be atomic, which means that we must use the _aliases endpoint:

POST /_aliases
{
    "actions": [
        { "remove": { "index": "my_index_v1", "alias": "my_index" }},
        { "add":    { "index": "my_index_v2", "alias": "my_index" }}
    ]
}

Your application has switched from using the old index to the new index transparently, with zero downtime.

TIP

Even when you think that your current index design is perfect, it is likely that you will need to make some change later, when your index is already being used in production.

Be prepared: use aliases instead of indices in your application. Then you will be able to reindex whenever you need to. Aliases are cheap and should be used liberally.