Elasticsearch: The Definitive Guide (2015)

Part V. Geolocation

Chapter 39. Geo-shapes

Geo-shapes use a completely different approach than geo-points. A circle on a computer screen does not consist of a perfect continuous line. Instead it is drawn by coloring adjacent pixels as an approximation of a circle. Geo-shapes work in much the same way.

Complex shapes (such as points, lines, polygons, multipolygons, and polygons with holes) are “painted” onto a grid of geohash cells, and the shape is converted into a list of the geohashes of all the cells that it touches.

NOTE

Actually, two types of grids can be used with geo-shapes: geohashes, which we have already discussed and which are the default encoding, and quad trees. Quad trees are similar to geohashes except that there are only four cells at each level, instead of 32. The difference comes down to a choice of encoding.

All of the geohashes that compose a shape are indexed as if they were terms. With this information in the index, it is easy to determine whether one shape intersects with another, as they will share the same geohash terms.

That is the extent of what you can do with geo-shapes: determine the relationship between a query shape and a shape in the index. The relation can be one of the following:

intersects

The query shape overlaps with the indexed shape (default).

disjoint

The query shape does not overlap at all with the indexed shape.

within

The indexed shape is entirely within the query shape.

Geo-shapes cannot be used to calculate distance, cannot be used for sorting or scoring, and cannot be used in aggregations.
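The painting idea can be sketched in a few lines of Python. This is an illustration only, not how Elasticsearch or Lucene implements geo-shapes: it paints the bounding box of a shape rather than the shape itself, and the helper names (geohash, cells_for_box, intersects) and the Eiffel Tower box are hypothetical. The geohash encoding itself follows the standard algorithm.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lon, lat, length):
    """Encode a lon/lat point as a geohash of `length` characters."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, value = [], 0
    for bit in range(5 * length):
        # Bits alternate between longitude (even positions) and latitude.
        rng, coord = (lon_range, lon) if bit % 2 == 0 else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        if bit % 5 == 4:  # every 5 bits form one base-32 character
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)


def cells_for_box(min_lon, min_lat, max_lon, max_lat, length):
    """Return the set of geohash cells touched by a bounding box."""
    lon_bits, lat_bits = (5 * length + 1) // 2, (5 * length) // 2
    width, height = 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits
    x0 = math.floor((min_lon + 180) / width)
    x1 = math.floor((max_lon + 180) / width)
    y0 = math.floor((min_lat + 90) / height)
    y1 = math.floor((max_lat + 90) / height)
    # Encode the center of every grid cell that the box overlaps.
    return {
        geohash(-180 + (x + 0.5) * width, -90 + (y + 0.5) * height, length)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    }


def intersects(cells_a, cells_b):
    """Two shapes intersect (at this precision) if they share any cell term."""
    return not cells_a.isdisjoint(cells_b)


# Bounding boxes of the Dam Square and central Amsterdam polygons used in
# this chapter, plus a hypothetical box around the Eiffel Tower.
dam_square = cells_for_box(4.89205, 52.37250, 4.89431, 52.37356, length=7)
central_amsterdam = cells_for_box(4.87463, 52.35755, 4.93368, 52.38617, length=7)
eiffel_tower = cells_for_box(2.2930, 48.8570, 2.2960, 48.8590, length=7)

print(intersects(dam_square, central_amsterdam))
print(intersects(dam_square, eiffel_tower))
```

The two Amsterdam boxes share cells, while the Paris box shares none with Dam Square. Note how many more cells the larger box needs: that count is what the precision setting trades against accuracy.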

Mapping geo-shapes

Like fields of type geo_point, geo-shapes have to be mapped explicitly before they can be used:

PUT /attractions
{
  "mappings": {
    "landmark": {
      "properties": {
        "name": {
          "type": "string"
        },
        "location": {
          "type": "geo_shape"
        }
      }
    }
  }
}

There are two important settings that you should consider changing: precision and distance_error_pct.

precision

The precision parameter controls the maximum length of the geohashes that are generated. It defaults to a precision of 9, which equates to a geohash with dimensions of about 5m x 5m. That is probably far more precise than you need.

The lower the precision, the fewer terms that will be indexed and the faster the search will be. But of course, the lower the precision, the less accurate are your geo-shapes. Consider just how accurate you need your shapes to be—even one or two levels of precision can represent a significant savings.

You can specify precisions by using distances—for example, 50m or 2km—but ultimately these distances are converted to the same levels as described in Chapter 37.
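To get a feel for what each precision level means on the ground, the cell dimensions can be estimated from the geohash bit layout (5 bits per character, alternating between longitude and latitude). The following is a rough sketch: the function name geohash_cell_size is hypothetical, and the figure of 111,320 meters per degree is an approximation that treats the earth as a sphere.

```python
import math

METERS_PER_DEGREE = 111_320.0  # rough length of one degree at the equator (assumption)


def geohash_cell_size(length, latitude=0.0):
    """Approximate (width_m, height_m) of a geohash cell of the given length."""
    bits = 5 * length
    # Longitude gets the extra bit when the total is odd.
    lon_bits, lat_bits = (bits + 1) // 2, bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    # Cells narrow towards the poles because meridians converge.
    width_m = width_deg * METERS_PER_DEGREE * math.cos(math.radians(latitude))
    return width_m, height_deg * METERS_PER_DEGREE


for level in range(5, 10):
    width_m, height_m = geohash_cell_size(level)
    print(f"precision {level}: {width_m:,.1f} m x {height_m:,.1f} m")
```

Each extra level divides a cell into 32 smaller cells, so dropping a single level can cut the number of cells needed to fill the interior of a shape by a similar factor.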

distance_error_pct

When indexing a polygon, the big central continuous part can be represented cheaply by a short geohash. It is the edges that matter. Edges require much smaller geohashes to represent them with any accuracy.

If you’re indexing a small landmark, you want the edges to be quite accurate. It wouldn’t be good to have one monument overlapping with the next. When indexing an entire country, you don’t need quite as much precision. Fifty meters here or there isn’t likely to start any wars.

The distance_error_pct specifies the maximum allowable error based on the size of the shape. It defaults to 0.025, or 2.5%. In other words, big shapes (like countries) are allowed to have fuzzier edges than small shapes (like monuments).

The default of 0.025 is a good starting point, but the more error that is allowed, the fewer terms that are required to index a shape.
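Both settings belong on the geo_shape field in the mapping. As a sketch, the request body for the PUT /attractions mapping shown earlier could be built like this. The values are placeholders chosen for illustration ("50m" is one of the distance examples mentioned above and 0.025 is the stated default), not recommendations.

```python
import json

# Illustrative values only: tune precision and distance_error_pct to your data.
mapping = {
    "mappings": {
        "landmark": {
            "properties": {
                "name": {"type": "string"},
                "location": {
                    "type": "geo_shape",
                    "precision": "50m",           # coarser than the default
                    "distance_error_pct": 0.025,  # allowed error relative to shape size
                },
            }
        }
    }
}

# Print the JSON body to send with PUT /attractions.
print(json.dumps(mapping, indent=2))
```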

Indexing geo-shapes

Shapes are represented using GeoJSON, a simple open standard for encoding two-dimensional shapes in JSON. Each shape definition contains the type of shape (such as point, line, polygon, or envelope) and one or more arrays of longitude/latitude points.

Caution

In GeoJSON, coordinates are always written as longitude followed by latitude.

For instance, we can index a polygon representing Dam Square in Amsterdam as follows:

PUT /attractions/landmark/dam_square
{
  "name" : "Dam Square, Amsterdam",
  "location" : {
    "type" : "polygon", 1
    "coordinates" : [[ 2
      [ 4.89218, 52.37356 ],
      [ 4.89205, 52.37276 ],
      [ 4.89301, 52.37274 ],
      [ 4.89392, 52.37250 ],
      [ 4.89431, 52.37287 ],
      [ 4.89331, 52.37346 ],
      [ 4.89305, 52.37326 ],
      [ 4.89218, 52.37356 ]
    ]]
  }
}

1

The type parameter indicates the type of shape that the coordinates represent.

2

The list of lon/lat points that describe the polygon.

The excess of square brackets in the example may look confusing, but the GeoJSON syntax is quite simple:

1. Each lon/lat point is represented as an array:

[lon,lat]

2. A list of points is wrapped in an array to represent a polygon:

[[lon,lat],[lon,lat], ... ]

3. A shape of type polygon can optionally contain several polygons; the first represents the polygon proper, while any subsequent polygons represent holes in the first:

[
  [[lon,lat],[lon,lat], ... ], # main polygon
  [[lon,lat],[lon,lat], ... ], # hole in main polygon
  ...
]

See the Geo-shape mapping documentation for more details about the supported shapes.
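Because the nesting is easy to get wrong, it can help to check polygon coordinates before indexing them. The sketch below is a hypothetical client-side helper (polygon_problems is not part of Elasticsearch). It applies two rules from the GeoJSON specification, namely that each ring has at least four points and ends where it started, and it also flags points that fall outside the valid lon/lat range, which often indicates swapped coordinates.

```python
def polygon_problems(coordinates):
    """Return a list of problems found in GeoJSON polygon coordinates."""
    problems = []
    for i, ring in enumerate(coordinates):
        # The first ring is the polygon proper; later rings are holes.
        label = "main polygon" if i == 0 else f"hole {i}"
        if len(ring) < 4:
            problems.append(f"{label}: needs at least 4 points")
        if ring and ring[0] != ring[-1]:
            problems.append(f"{label}: first and last points differ")
        for lon, lat in ring:
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(
                    f"{label}: [{lon}, {lat}] is out of range (order is lon, lat)"
                )
    return problems


dam_square = [[
    [4.89218, 52.37356], [4.89205, 52.37276], [4.89301, 52.37274],
    [4.89392, 52.37250], [4.89431, 52.37287], [4.89331, 52.37346],
    [4.89305, 52.37326], [4.89218, 52.37356],
]]

print(polygon_problems(dam_square))             # the Dam Square polygon from above
print(polygon_problems([dam_square[0][:-1]]))   # same ring with the closing point dropped
```

A check like this cannot catch every swapped pair: for Amsterdam, [52.37, 4.89] is still a valid point, just one located in a very different part of the world.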

Querying geo-shapes

The unusual thing about the geo_shape query and geo_shape filter is that they allow us to query using shapes, rather than just points.

For instance, if our user steps out of the central train station in Amsterdam, we could find all landmarks within a 1km radius with a query like this:

GET /attractions/landmark/_search
{
  "query": {
    "geo_shape": {
      "location": { 1
        "shape": { 2
          "type": "circle", 3
          "radius": "1km",
          "coordinates": [ 4
            4.89994,
            52.37815
          ]
        }
      }
    }
  }
}

1

The query looks at geo-shapes in the location field.

2

The shape key indicates that the shape is specified inline in the query.

3

The shape is a circle, with a radius of 1km.

4

This point is situated at the entrance of the central train station in Amsterdam.

By default, the query (or filter, which does the same job) looks for indexed shapes that intersect with the query shape. The relation parameter can be set to disjoint to find indexed shapes that don’t intersect with the query shape, or within to find indexed shapes that are completely contained by the query shape.

For instance, we could find all landmarks in the center of Amsterdam with this query:

GET /attractions/landmark/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "relation": "within", 1
        "shape": {
          "type": "polygon",
          "coordinates": [[ 2
            [4.88330,52.38617],
            [4.87463,52.37254],
            [4.87875,52.36369],
            [4.88939,52.35850],
            [4.89840,52.35755],
            [4.91909,52.36217],
            [4.92656,52.36594],
            [4.93368,52.36615],
            [4.93342,52.37275],
            [4.92690,52.37632],
            [4.88330,52.38617]
          ]]
        }
      }
    }
  }
}

1

Match only indexed shapes that are completely within the query shape.

2

This polygon represents the center of Amsterdam.

Querying with Indexed Shapes

With shapes that are often used in queries, it can be more convenient to store them in the index and to refer to them by name in the query. Take the central Amsterdam polygon from the previous example: we could store it as a document of type neighborhood.

First, we set up the mapping in the same way as we did for landmark:

PUT /attractions/_mapping/neighborhood
{
  "properties": {
    "name": {
      "type": "string"
    },
    "location": {
      "type": "geo_shape"
    }
  }
}

Then we can index a shape for central Amsterdam:

PUT /attractions/neighborhood/central_amsterdam
{
  "name" : "Central Amsterdam",
  "location" : {
    "type" : "polygon",
    "coordinates" : [[
      [4.88330,52.38617],
      [4.87463,52.37254],
      [4.87875,52.36369],
      [4.88939,52.35850],
      [4.89840,52.35755],
      [4.91909,52.36217],
      [4.92656,52.36594],
      [4.93368,52.36615],
      [4.93342,52.37275],
      [4.92690,52.37632],
      [4.88330,52.38617]
    ]]
  }
}

After the shape is indexed, we can refer to it by index, type, and id in the query itself:

GET /attractions/landmark/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "relation": "within",
        "indexed_shape": { 1
          "index": "attractions",
          "type": "neighborhood",
          "id": "central_amsterdam",
          "path": "location"
        }
      }
    }
  }
}

1

By specifying indexed_shape instead of shape, Elasticsearch knows that it needs to retrieve the query shape from the specified document and path.
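The inline and indexed forms differ only in which key sits next to relation. If you build these request bodies in application code, a small helper keeps that choice explicit. This is a sketch under stated assumptions: geo_shape_query is a hypothetical function, not part of any Elasticsearch client library, and it simply returns the JSON body shown in the examples above.

```python
import json


def geo_shape_query(field, relation="intersects", shape=None, indexed_shape=None):
    """Build a geo_shape query body from an inline shape or an indexed-shape reference."""
    if (shape is None) == (indexed_shape is None):
        raise ValueError("pass exactly one of shape or indexed_shape")
    clause = {"relation": relation}
    if shape is not None:
        clause["shape"] = shape  # shape given inline as GeoJSON
    else:
        clause["indexed_shape"] = indexed_shape  # shape fetched from a stored document
    return {"query": {"geo_shape": {field: clause}}}


# Inline circle around the central train station.
inline = geo_shape_query(
    "location",
    shape={"type": "circle", "radius": "1km", "coordinates": [4.89994, 52.37815]},
)

# Reference to the stored central Amsterdam shape.
stored = geo_shape_query(
    "location",
    relation="within",
    indexed_shape={
        "index": "attractions",
        "type": "neighborhood",
        "id": "central_amsterdam",
        "path": "location",
    },
)

print(json.dumps(stored, indent=2))
```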

There is nothing special about the shape for central Amsterdam. We could equally use our existing shape for Dam Square in queries. This query finds neighborhoods that intersect with Dam Square:

GET /attractions/neighborhood/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "indexed_shape": {
          "index": "attractions",
          "type": "landmark",
          "id": "dam_square",
          "path": "location"
        }
      }
    }
  }
}

Geo-shape Filters and Caching

The geo_shape query and filter perform the same function. The query simply acts as a filter: any matching documents receive a relevance _score of 1. Query results cannot be cached, but filter results can be.

The results are not cached by default. Just as with geo-points, any change in the coordinates of a shape is likely to produce a different set of geohashes, so there is little point in caching filter results. That said, if you filter using the same shapes repeatedly, it can be worth caching the results by setting _cache to true:

GET /attractions/neighborhood/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_shape": {
          "_cache": true, 1
          "location": {
            "indexed_shape": {
              "index": "attractions",
              "type": "landmark",
              "id": "dam_square",
              "path": "location"
            }
          }
        }
      }
    }
  }
}

1

The results of this geo_shape filter will be cached.