The problem:
Elasticsearch can prove unwieldy when the data being indexed does not have a consistent schema. While it is possible to auto-create indices in Elasticsearch by posting JSON data, you may run into validation exceptions when the schema of incoming data changes.
As an example, post this message to an index that does not yet exist in order to auto-create it:
curl -XPOST http://localhost/eventindex_uinak/event -d '{"timestamp":"2017-03-22T21:59:34 UTC","host":"uinak" ,"data":"a string" }'
returns:
{"_index":"eventindex_uinak","_type":"event","_id":"AVr4DDXJ6XT8Rp6bGeEA","_version":1,"created":true}
Let's change the schema a little by making "data" a JSON object:
curl -XPOST http://localhost/eventindex_uinak/event -d '{"timestamp":"2017-03-22T21:59:34 UTC","host":"uinak" ,"data": {"key":"my value"} }'
returns:
{"error":"RemoteTransportException[[Super-Nova][inet[/x.x.x.x:y]][indices:data/write/index]]; nested: MapperParsingException[failed to parse [data]]; nested: ElasticsearchIllegalArgumentException[unknown property [key]]; ","status":400}
What happened here? Elasticsearch created the index with a schema inferred from the first payload, and when the next message is posted it fails validation because the data type of the "data" element has changed.
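You can confirm this by asking Elasticsearch for the mapping it inferred when the index was auto-created:
curl -XGET http://localhost/eventindex_uinak/_mapping
The response should show "data" mapped as a string field, which is why a JSON object is subsequently rejected.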
Elasticsearch is described as 'schema-free', but is it really? It depends at what level: it is schema-free when defining an index, but not when updating an index or adding data to an existing schema.
Solutions:
There are three possible solutions, each sketched below this list:
1. Rename the conflicting field.
2. Enforce data types with a schema beforehand, e.g. as part of a data pipeline.
3. Not recommended, but you may map the conflicting attribute as a 'not analyzed' string and then stringify all values, escaping embedded quotes and other JSON structural characters ( \{, \} etc. ).
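For option 1, a minimal sketch: repost the object under a new field name (data_obj is just an illustrative name), so that dynamic mapping gives it its own object mapping instead of clashing with the existing string field:
curl -XPOST http://localhost/eventindex_uinak/event -d '{"timestamp":"2017-03-22T21:59:34 UTC","host":"uinak" ,"data_obj": {"key":"my value"} }'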
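For option 2, one way to enforce types up front is to create the index with an explicit mapping before any data is posted; the field types below are assumptions based on the payloads above:
curl -XPUT http://localhost/eventindex_uinak -d '{ "mappings": { "event": { "properties": { "timestamp": { "type": "string" }, "host": { "type": "string" }, "data": { "type": "object" } } } } }'
Documents whose "data" element is not an object will then be rejected consistently, rather than the outcome depending on which payload happened to arrive first.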
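For option 3, the relevant mapping fragment would declare "data" as a not-analyzed string, for example:
"data": { "type": "string", "index": "not_analyzed" }
and every payload would then carry the object as an escaped JSON string:
curl -XPOST http://localhost/eventindex_uinak/event -d '{"timestamp":"2017-03-22T21:59:34 UTC","host":"uinak" ,"data": "{\"key\":\"my value\"}" }'
Note that it is the embedded quotes that must be escaped here; the downside is that the object's inner structure is no longer queryable as individual fields.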