Elasticsearch: The Definitive Guide (2015)
Part V. Geolocation
Chapter 37. Geohashes
Geohashes are a way of encoding lat/lon points as strings. The original intention was to have a URL-friendly way of specifying geolocations, but geohashes have turned out to be a useful way of indexing geo-points and geo-shapes in databases.
Geohashes divide the world into a grid of 32 cells (4 rows and 8 columns), each represented by a letter or number. The g cell covers half of Greenland, all of Iceland, and most of Great Britain. Each cell can be further divided into another 32 cells, which can be divided into another 32 cells, and so on. The gc cell covers Ireland and England, gcp covers most of London and part of Southern England, and gcpuuz94k is the entrance to Buckingham Palace, accurate to about 5 meters.
In other words, the longer the geohash string, the more accurate it is. If two geohashes share a prefix, such as gcpuuz and any other geohash that starts with gcpuu, then it implies that they are near each other. The longer the shared prefix, the closer they are.
That said, two locations that are right next to each other may have completely different geohashes. For instance, the Millennium Dome in London has geohash u10hbp, because it falls into the u cell, the next top-level cell to the east of the g cell.
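The repeated subdivision described above can be sketched in a few lines of Python. This is a minimal illustration of the standard geohash encoding (alternating longitude and latitude bits, five bits per base-32 character), not the code Elasticsearch itself runs; the function name geohash_encode is made up for this example.

```python
# Geohash alphabet: digits plus lowercase letters, without a, i, l, and o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, length=12):
    """Encode a lat/lon point as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    geohash = ""
    value = 0
    for i in range(5 * length):
        # Even bits halve the longitude range, odd bits the latitude range.
        if i % 2 == 0:
            rng, coord = lon_range, lon
        else:
            rng, coord = lat_range, lat
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if coord >= mid else 0
        # Keep only the half of the range that contains the point.
        rng[0 if bit else 1] = mid
        value = (value << 1) | bit
        # Every five bits select one of the 32 cells at the current level.
        if i % 5 == 4:
            geohash += BASE32[value]
            value = 0
    return geohash

# Entrance to Buckingham Palace, truncated to six characters.
print(geohash_encode(51.501568, -0.141257, 6))  # gcpuuz
```

Because each extra character only narrows the ranges further, a shorter geohash of the same point is always a prefix of a longer one.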
Geo-points can index their associated geohashes automatically, but more important, they can also index all geohash prefixes. Indexing the location of the entrance to Buckingham Palace—latitude 51.501568 and longitude -0.141257—would index all of the geohashes listed in the following table, along with the approximate dimensions of each geohash cell:
Geohash      | Level | Dimensions
g            | 1     | ~ 5,004km x 5,004km
gc           | 2     | ~ 1,251km x 625km
gcp          | 3     | ~ 156km x 156km
gcpu         | 4     | ~ 39km x 19.5km
gcpuu        | 5     | ~ 4.9km x 4.9km
gcpuuz       | 6     | ~ 1.2km x 0.61km
gcpuuz9      | 7     | ~ 152.8m x 152.8m
gcpuuz94     | 8     | ~ 38.2m x 19.1m
gcpuuz94k    | 9     | ~ 4.78m x 4.78m
gcpuuz94kk   | 10    | ~ 1.19m x 0.60m
gcpuuz94kkp  | 11    | ~ 14.9cm x 14.9cm
gcpuuz94kkp5 | 12    | ~ 3.7cm x 1.8cm
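The dimensions in the table follow from the number of bits at each level: a geohash of length n carries 5n bits, split between longitude and latitude, with longitude getting the extra bit when 5n is odd. The sketch below recomputes approximate cell sizes under two assumptions that are not stated in the text: a spherical Earth with a mean radius of 6,371km, and widths measured at the equator. The function name cell_size_km is hypothetical, and the results may differ slightly from the table in the last digit.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def cell_size_km(level):
    """Approximate (width, height) in km of a geohash cell at the equator."""
    total_bits = 5 * level
    lon_bits = (total_bits + 1) // 2  # longitude gets the odd bit
    lat_bits = total_bits // 2
    equator_km = 2 * math.pi * EARTH_RADIUS_KM
    width = equator_km / 2 ** lon_bits          # 360 degrees of longitude
    height = (equator_km / 2) / 2 ** lat_bits   # 180 degrees of latitude
    return width, height

for level in range(1, 13):
    width, height = cell_size_km(level)
    print(f"level {level:2d}: ~{width:.5f}km x {height:.5f}km")
```

Odd levels produce roughly square cells and even levels produce cells about twice as wide as they are tall, which matches the alternating pattern in the table.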
The geohash_cell filter can use these geohash prefixes to find locations near a specified lat/lon point.
Mapping Geohashes
The first step is to decide just how much precision you need. Although you could index all geo-points with the default full 12 levels of precision, do you really need to be accurate to within a few centimeters? You can save yourself a lot of space in the index by reducing your precision requirements to something more realistic, such as 1km:
PUT /attractions
{
  "mappings": {
    "restaurant": {
      "properties": {
        "name": {
          "type": "string"
        },
        "location": {
          "type": "geo_point",
          "geohash_prefix": true,
          "geohash_precision": "1km"
        }
      }
    }
  }
}
Setting geohash_prefix to true tells Elasticsearch to index all geohash prefixes, up to the specified precision.
The precision can be specified as an absolute number, representing the length of the geohash, or as a distance. A precision of 1km corresponds to a geohash of length 7.
With this mapping in place, geohash prefixes of lengths 1 to 7 will be indexed, providing geohashes accurate to about 150 meters.
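To see why a distance of 1km ends up as length 7 rather than 6, it helps to compare the distance with the cell dimensions at each level. The following sketch uses an assumed rule, namely picking the shortest geohash whose cell is no wider and no taller than the requested distance. It is not taken from the Elasticsearch source, but it reproduces the two conversions mentioned in this chapter (1km to length 7, and 2km to length 6). The function name and the spherical-Earth radius are assumptions.

```python
import math

# Assumed spherical Earth, mean radius 6,371,000 meters.
EQUATOR_M = 2 * math.pi * 6371000.0

def geohash_length_for(distance_m, max_length=12):
    """Shortest geohash length whose cell fits within distance_m on both sides."""
    for length in range(1, max_length + 1):
        lon_bits = (5 * length + 1) // 2
        lat_bits = (5 * length) // 2
        width = EQUATOR_M / 2 ** lon_bits
        height = (EQUATOR_M / 2) / 2 ** lat_bits
        if width <= distance_m and height <= distance_m:
            return length
    return max_length  # nothing smaller than the full 12 levels

print(geohash_length_for(1000))  # 7
print(geohash_length_for(2000))  # 6
```

A level-6 cell is about 1.2km wide, which is too large for a 1km precision, so the next level (about 150 meters) is used instead.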
geohash_cell Filter
The geohash_cell filter simply translates a lat/lon location into a geohash with the specified precision and finds all locations that contain that geohash—a very efficient filter indeed.
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geohash_cell": {
          "location": {
            "lat": 40.718,
            "lon": -73.983
          },
          "precision": "2km"
        }
      }
    }
  }
}
The precision cannot be more precise than that specified in the geohash_precision mapping.
This filter translates the lat/lon point into a geohash of the appropriate length—in this example dr5rsk—and looks for all locations that contain that exact term.
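Conceptually, the combination of indexed prefixes and an exact term lookup behaves like the toy inverted index below. This is only a sketch of the idea, not how Elasticsearch stores terms. The class name, document IDs, and the seven-character geohashes dr5rskq and dr5rss0 are hypothetical values chosen for illustration; gcpuuz9 comes from the table earlier in this chapter.

```python
from collections import defaultdict

def prefixes(geohash):
    """All prefixes of a geohash, from length 1 up to its full length."""
    return [geohash[:n] for n in range(1, len(geohash) + 1)]

class GeohashIndex:
    """Toy inverted index mapping each geohash prefix to document IDs."""

    def __init__(self):
        self.terms = defaultdict(set)

    def add(self, doc_id, geohash):
        # Index every prefix, like geohash_prefix: true in the mapping.
        for term in prefixes(geohash):
            self.terms[term].add(doc_id)

    def cell(self, geohash):
        # A cell lookup is a single exact-term match.
        return self.terms.get(geohash, set())

index = GeohashIndex()
index.add("palace", "gcpuuz9")   # level-7 geohash from the table
index.add("cafe", "dr5rskq")     # hypothetical point inside dr5rsk
index.add("diner", "dr5rss0")    # hypothetical point in another cell

print(index.cell("dr5rsk"))          # {'cafe'}
print(sorted(index.cell("dr5r")))    # ['cafe', 'diner']
```

The lookup cost does not depend on the size of the cell: a shorter geohash simply matches a term that more documents were indexed under.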
However, the geohash_cell request shown earlier may not return all restaurants within 2km of the specified point. Remember that a geohash is just a rectangle, and the point may fall anywhere within that rectangle. If the point happens to fall near the edge of a geohash cell, the filter may well exclude any restaurants in the adjacent cell.
To fix that, we can tell the filter to include the neighboring cells, by setting neighbors to true:
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geohash_cell": {
          "location": {
            "lat": 40.718,
            "lon": -73.983
          },
          "neighbors": true,
          "precision": "2km"
        }
      }
    }
  }
}
This filter will look for the resolved geohash and all surrounding geohashes.
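One way to picture what the surrounding geohashes are is to decode a cell into its bounding box, step one cell width or height in each direction, and encode the resulting points again. The sketch below does exactly that. It is an illustrative approach and makes no claim about how Elasticsearch computes neighbors internally; all function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode(lat, lon, length):
    """Standard geohash encoding: alternate lon/lat bits, 5 bits per char."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    geohash, value = "", 0
    for i in range(5 * length):
        rng, coord = (lon_rng, lon) if i % 2 == 0 else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if coord >= mid else 0
        rng[0 if bit else 1] = mid
        value = (value << 1) | bit
        if i % 5 == 4:
            geohash += BASE32[value]
            value = 0
    return geohash

def bbox(geohash):
    """Decode a geohash into ([lat_lo, lat_hi], [lon_lo, lon_hi])."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    i = 0
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon_rng if i % 2 == 0 else lat_rng
            rng[0 if bit else 1] = (rng[0] + rng[1]) / 2
            i += 1
    return lat_rng, lon_rng

def neighbors(geohash):
    """Geohashes of the up-to-eight cells surrounding the given cell."""
    (lat_lo, lat_hi), (lon_lo, lon_hi) = bbox(geohash)
    lat_c, lon_c = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    height, width = lat_hi - lat_lo, lon_hi - lon_lo
    result = set()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            lat = lat_c + dy * height
            if not -90.0 < lat < 90.0:
                continue  # there is no cell beyond a pole
            # Wrap longitude around the antimeridian.
            lon = (lon_c + dx * width + 180.0) % 360.0 - 180.0
            result.add(encode(lat, lon, len(geohash)))
    return result

print(encode(40.718, -73.983, 6))  # dr5rsk, the cell from the query
print(encode(40.718, -73.980, 6))  # dr5rss, about 250 meters to the east
print("dr5rss" in neighbors("dr5rsk"))  # True
```

The query point in the example sits close to the eastern edge of dr5rsk, so a location only a few hundred meters to the east already falls into dr5rss and is matched only when the neighboring cells are included.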
Clearly, looking for a geohash with precision 2km plus all the neighboring cells results in quite a large search area. This filter is not built for accuracy, but it is very efficient and can be used as a prefiltering step before applying a more accurate geo-filter.
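The prefiltering idea can be sketched as a two-step process: take the coarse candidates that the cell filter returns, then keep only those within an exact distance. The example below uses the haversine great-circle formula as a stand-in for the more accurate geo-filter. The candidate names, their coordinates, and the 1km radius are hypothetical, and the Earth radius is an assumed mean value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def refine(center, candidates, max_km):
    """Keep only the candidates within max_km of the center point."""
    lat, lon = center
    return [name for name, (clat, clon) in candidates.items()
            if haversine_km(lat, lon, clat, clon) <= max_km]

# Hypothetical candidates, as if returned by the coarse cell filter.
candidates = {
    "near": (40.718, -73.980),
    "corner": (40.710, -74.003),
}
print(refine((40.718, -73.983), candidates, max_km=1.0))  # ['near']
```

The exact distance is computed only for the few documents that survive the cheap term lookup, which is where the efficiency of the combination comes from.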
TIP
Specifying the precision as a distance can be misleading. A precision of 2km is converted to a geohash of length 6, which actually has dimensions of about 1.2km x 0.6km. You may find it more understandable to specify an actual length such as 5 or 6.
The other advantage that this filter has over a geo_bounding_box filter is that it supports multiple locations per field. The lat_lon option that we discussed in “Optimizing Bounding Boxes” is efficient, but only when there is a single lat/lon point per field.