Tutorial

This document describes how to get started with geocodr.

This tutorial assumes that you have a working geocodr and SolrCloud installation.

For geocodr, make sure that the geocodr command works; otherwise, read Installation. For SolrCloud, we assume that you followed the Getting Started With SolrCloud tutorial, that Apache Solr is running on localhost:8983, and that the internal Apache ZooKeeper is running on localhost:2181. Adapt the host names and ports if your installation differs.

Note

We will create all required Apache Solr collections in this tutorial (i.e. it is sufficient to run bin/solr -e cloud -noprompt).

Note

The SolrCloud installation from the Getting Started with SolrCloud tutorial is not meant for production. Please refer to the Apache Solr documentation for production setups.

Example data

The example directory contains a minimal set of Apache Solr schema documents, geocodr mappings and example data. The example dataset uses open data (CC0) from the German city of Rostock. Please note that field names are in German.

We will create two collections:

  • boroughs with polygon geometries of statistical boroughs with the city name (gemeinde_name) and borough name (bezeichnung).
  • streets with line geometries of streets with street name (strasse_name), city name (gemeinde_name) and borough name (stat_bezirk_name).

Both collections will also contain a geometry as WKT and a JSON dump of all available fields for retrieval (but not for search).

Both collections will belong to the same address class.

We will use the schema files example/solr/boroughs-schema.xml and example/solr/streets-schema.xml. These are standard Apache Solr XML schemas.

For geocodr, it does not matter how the schema is managed or how the data is imported into Apache Solr. However, you can use the geocodr-zk and geocodr-post tools for a simplified workflow. This workflow requires that your input data is available as CSV and that complete re-imports, rather than live updates, are sufficient.

Create config sets

We use geocodr-zk to upload our configurations for both collections. The following command creates the boroughs and streets config sets [1] in Apache ZooKeeper and uploads the solrconfig.xml and the corresponding schema file.

geocodr-zk --zk-hosts localhost:2181 --config-dir example/solr/ --push ALL

[1] Config sets are described in the Apache Solr documentation. However, for SolrCloud the config sets are managed with Apache ZooKeeper and not as files in $SOLR_HOME/configsets.

Warning

You can update existing config sets with the same command. Be aware that if you change your schema, Apache Solr will remove your index as soon as you restart Apache Solr or reload the collection. To be safe, re-import the data with geocodr-post immediately afterwards.

Import data

We use geocodr-post to upload the example data into the appropriate collection.

geocodr-post --url http://localhost:8983/solr --csv example/csv/boroughs.csv --collection boroughs
geocodr-post --url http://localhost:8983/solr --csv example/csv/streets.csv --collection streets

Please note that the first call imports the boroughs into the boroughs-1 collection. If the data is imported successfully, an alias boroughs is created that points to boroughs-1. A second call imports the boroughs into the boroughs-2 collection and atomically updates the alias to point to the new collection. Further calls alternate between the -1 and -2 suffixes. This allows you to re-import the data in production without any downtime.
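
The alternation between the two suffixes can be sketched in a few lines of Python. This is only an illustration of the naming scheme described above, not the code used by geocodr-post; the function name next_collection is hypothetical.

```python
def next_collection(alias, current_target=None):
    """Return the collection name the next import would write to.

    Assumption: targets are named "<alias>-1" and "<alias>-2",
    as described in the text above.
    """
    if current_target == f"{alias}-1":
        return f"{alias}-2"
    # First import (no alias yet) or the alias currently points to "-2".
    return f"{alias}-1"


# Simulate three consecutive imports of the boroughs data.
target = None
history = []
for _ in range(3):
    target = next_collection("boroughs", target)
    history.append(target)

print(history)
```

Because the alias is only switched after a successful import, queries keep hitting the old collection while the new one is being filled.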

First queries

Geocodr comes with a command line tool for testing and debugging.

Query for the borough Stadtmitte:

% geocodr --mapping example/conf/geocodr_mapping.py 'stadtmitte'
Rostock, Hanse- und Universitätsstadt, Stadtmitte I
Rostock, Hanse- und Universitätsstadt, Stadtmitte II
Rostock, Hanse- und Universitätsstadt, Stadtmitte III
Rostock, Hanse- und Universitätsstadt, Stadtmitte IV
Rostock, Hanse- und Universitätsstadt, Stadtmitte V
Rostock, Hanse- und Universitätsstadt, Stadtmitte I, Aalstecherstr.
Rostock, Hanse- und Universitätsstadt, Stadtmitte I, Am Kanonsberg
Rostock, Hanse- und Universitätsstadt, Stadtmitte I, An der Oberkante
...

Note

In our example dataset, larger statistical boroughs are split and enumerated with Roman numerals. The dataset contains only streets starting with ‘A’.

To output the result as GeoJSON call:

% geocodr --mapping example/conf/geocodr_mapping.py 'stadtmitte' --geojson
{
"features": [
   {
      "geometry": {
      "coordinates": [
         [
            [
            12.142946,
            54.092925
            ],
...
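
If you want to post-process the GeoJSON output, the following Python sketch computes a bounding box over all features. It assumes that the output is a FeatureCollection whose features carry a geometry with nested coordinates, as the excerpt above suggests. The sample data is made up for illustration (only the first coordinate pair is taken from the output above), and the helper names are hypothetical.

```python
import json


def iter_positions(coords):
    """Yield [x, y] positions from arbitrarily nested coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def bounding_box(feature_collection):
    """Return (min_x, min_y, max_x, max_y) over all feature geometries."""
    xs, ys = [], []
    for feature in feature_collection["features"]:
        for pos in iter_positions(feature["geometry"]["coordinates"]):
            xs.append(pos[0])
            ys.append(pos[1])
    return min(xs), min(ys), max(xs), max(ys)


# Made-up sample; real output would come from `geocodr ... --geojson`.
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {}, "geometry": {"type": "Polygon",
   "coordinates": [[[12.142946, 54.092925], [12.15, 54.092925],
                    [12.15, 54.1], [12.142946, 54.092925]]]}}
]}
""")

bbox = bounding_box(sample)
print(bbox)
```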

Fuzzy search:

% geocodr --mapping example/conf/geocodr_mapping.py 'schulzestrasse'
Rostock, Hanse- und Universitätsstadt, Reutershagen IV, Alfred-Schulze-Str.
Rostock, Hanse- und Universitätsstadt, Brinckmansdorf III, Albert-Schulz-Str.

Multiple terms:

% geocodr --mapping example/conf/geocodr_mapping.py 'schulzestrasse reuter'
Rostock, Hanse- und Universitätsstadt, Reutershagen IV, Alfred-Schulze-Str.

Debugging

The --debug option shows the score and the ID of each match. You can pass an ID to the --explain option to see how the score was calculated by Solr:

% geocodr --mapping example/conf/geocodr_mapping.py 'schulze hinri' --debug --explain 3034daaa-3ef5-11e5-9ffb-0050569b7e95
3034daaa-3ef5-11e5-9ffb-0050569b7e95 Rostock, Hanse- und Universitätsstadt, Hinrichshagen, An der alten Baumschule

7.0742044 = sum of:
2.3968432 = max of:
   2.3968432 = sum of:
      0.5397643 = weight(strasse_name_ngram:sch in 45) [SchemaSimilarity], result of:
      0.5397643 = score(doc=45,freq=1.0 = termFreq=1.0
), product of:
         0.2 = boost
         1.9924302 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
            10.0 = docFreq
            76.0 = docCount
         1.3545375 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.75 = parameter b
            8.328947 = avgFieldLength
            3.0 = fieldLength
      0.92853945 = weight(strasse_name_ngram:chu in 45) [SchemaSimilarity], result of:
      0.92853945 = score(doc=45,freq=1.0 = termFreq=1.0
), product of:
...
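
To make the explain output easier to read, the following Python sketch recomputes the first weight (strasse_name_ngram:sch) from the values shown above. It uses the idf and tfNorm formulas exactly as they are printed in the output. The function names are hypothetical, and Solr computes with single precision, so the results agree only up to rounding.

```python
import math


def bm25_idf(doc_freq, doc_count):
    # idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    # tfNorm = (freq * (k1 + 1))
    #          / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


# Values copied from the explain output above.
boost = 0.2
idf = bm25_idf(doc_freq=10.0, doc_count=76.0)
tf_norm = bm25_tf_norm(
    freq=1.0, k1=1.2, b=0.75, field_length=3.0, avg_field_length=8.328947
)

# The term weight is the product of boost, idf and tfNorm.
score = boost * idf * tf_norm
print(idf, tf_norm, score)
```

The three results should be close to 1.9924302, 1.3545375 and 0.5397643, the values reported by Solr. The low boost of 0.2 shows that a match on a single n-gram contributes only a small part of the total score.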

HTTP API

geocodr-api starts a web server. Refer to Query API for full documentation of the API.

You can start the server with:

geocodr-api --mapping example/conf/geocodr_mapping.py

You can use your browser or a tool like curl to make queries to the API:

curl "http://127.0.0.1:5000/query?type=search&class=address&query=schulzestr"
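
The same request can be built from Python. The sketch below only uses the parameters type, class and query from the curl call above; the helper name build_query_url is hypothetical. It constructs the URL without sending it, so it runs without a geocodr-api server.

```python
from urllib.parse import urlencode


def build_query_url(query, base="http://127.0.0.1:5000/query",
                    type_="search", class_="address"):
    """Build a geocodr-api query URL with properly escaped parameters."""
    params = {"type": type_, "class": class_, "query": query}
    return base + "?" + urlencode(params)


url = build_query_url("schulzestr")
print(url)

# To send the request against a running server:
#   import urllib.request
#   with urllib.request.urlopen(url) as resp:
#       body = resp.read().decode("utf-8")
```

Using urlencode matters as soon as the query contains spaces or umlauts, which a hand-written URL would not escape.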

geocodr-api uses Waitress, a production-quality pure-Python web server. However, it is still recommended to put it behind an HTTP proxy (like Nginx or Apache mod_proxy) for features like HTTPS.

For developing Geocodr and configuring your Geocodr mapping, you can use the geocodr-api --develop option. This automatically reloads Geocodr whenever the application or your mapping file changes.

API keys

Geocodr can restrict API requests to calls with a valid API key. See API documentation.

Checking for API keys can be enabled with the --api-keys option. The option takes a CSV file with all valid API keys.

The CSV file requires the fields key and domains. domains is a semicolon-separated list of one or more domains. Only requests originating from these domains are permitted; this is checked against the HTTP Referer header. Subdomains of the configured domains are also permitted.

Example CSV file:

key,domains
key1,example.org
multikey,example.org;example.com
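
The check described above can be sketched as follows. This is a minimal illustration of the rules (semicolon-separated domains, subdomains permitted, comparison against the Referer header) and not geocodr's actual implementation; the function names are hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit


def load_api_keys(csv_text):
    """Map each key to its list of permitted domains."""
    keys = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        keys[row["key"]] = [
            d.strip().lower() for d in row["domains"].split(";") if d.strip()
        ]
    return keys


def referer_allowed(keys, key, referer):
    """Return True if the referer host is a permitted domain or a subdomain."""
    domains = keys.get(key)
    if not domains or not referer:
        return False
    host = (urlsplit(referer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


# The example CSV file from above.
API_KEYS_CSV = "key,domains\nkey1,example.org\nmultikey,example.org;example.com\n"
keys = load_api_keys(API_KEYS_CSV)

print(referer_allowed(keys, "key1", "https://maps.example.org/page"))
print(referer_allowed(keys, "key1", "https://example.com/"))
```

Matching on "." + domain rather than on a plain suffix prevents a host such as notexample.org from passing as example.org.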

Note

The Referer header can be forged, so this check only limits where the API can be used in public. It does not prevent use by automated scripts, etc.

User/Password Authentication

Geocodr can pass client-provided usernames and passwords to Solr if you enable this with the --enable-solr-basic-auth option.

Please refer to the Solr documentation on how to enable Basic Authentication. Basic Authentication can be used in combination with the Rule-Based Authorization Plugin for fine-grained access control to specific collections.

For convenience, you can use geocodr-zk to pull and push the security.json file.

To pull the security.json file to example/solr/:

geocodr-zk --zk-hosts localhost:2181 --config-dir example/solr/ --pull --security

To push the security.json file from example/solr/:

geocodr-zk --zk-hosts localhost:2181 --config-dir example/solr/ --push --security

You can add users by editing and pushing the security.json file, or by using the Solr REST API:

curl http://localhost:8983/solr/admin/authentication -H 'Content-type:application/json' \
   -d '{"set-user": {"tom" : "TomIsCool",
                     "harry":"HarrysSecret"}}'

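The curl call above can also be expressed in Python. The following sketch builds the same POST request with the standard library and does not send it, so it runs without a Solr instance. The helper name build_set_user_request is hypothetical; the endpoint and payload are taken from the curl example.

```python
import json
import urllib.request


def build_set_user_request(users, solr_url="http://localhost:8983/solr"):
    """Build (but do not send) the request that adds or updates Solr users."""
    body = json.dumps({"set-user": users}).encode("utf-8")
    return urllib.request.Request(
        solr_url + "/admin/authentication",
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


req = build_set_user_request({"tom": "TomIsCool", "harry": "HarrysSecret"})
print(req.get_method(), req.full_url)

# To send it against a running Solr:
#   urllib.request.urlopen(req)
```

Depending on your security.json, Solr may require this request itself to be authenticated.
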
Unauthenticated Requests

To permit requests without a username and password (e.g. in combination with API keys), you can either set the "blockUnknown": false option of the solr.BasicAuthPlugin, or set a default username and password in the --solr-url (e.g. --solr-url http://user:passwd@localhost:8983/solr). Usernames and passwords provided via the API override these default values.
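
The precedence rule for the second option can be sketched like this. It is an illustration of the behaviour described above, assuming the default credentials are read from the user-info part of the URL; it is not geocodr's actual code, and effective_credentials is a hypothetical name.

```python
from urllib.parse import urlsplit


def effective_credentials(solr_url, api_user=None, api_password=None):
    """Return the (username, password) pair to use for a Solr request.

    Credentials provided via the API take precedence over the defaults
    embedded in the Solr URL.
    """
    if api_user is not None:
        return api_user, api_password
    parts = urlsplit(solr_url)
    return parts.username, parts.password


solr_url = "http://user:passwd@localhost:8983/solr"
print(effective_credentials(solr_url))
print(effective_credentials(solr_url, "tom", "TomIsCool"))
```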