:imagesdir: /assets/resources/exploring-data-sets-with-kibana/
:icons: font
:experimental:

In this post, I’d like to explore a sample data set using Kibana.

This requires some data to start with: let’s index some tweets. It’s quite straightforward: follow the explanations in my good friend David’s http://david.pilato.fr/blog/2015/06/01/indexing-twitter-with-logstash-and-elasticsearch/[blog post^], then wait for some time while the index fills with data.

== Basic metric

Let’s start with something basic, the number of tweets indexed so far.

In Kibana, go to menu:Visualize[Metric], then choose the twitter index. For the Aggregation field, choose “Count”; then click on btn:[Save] and name the visualization accordingly e.g. “Number of tweets”.

image:basic-metric-create.png[Create a basic metric,351,146] image:basic-metric-display.png[Display the number of tweets,320,202]
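
For reference, a Count metric boils down to a search that matches documents but returns none of them, only the number of hits. Here is a minimal Python sketch of such a request body; it assumes the index is simply named `twitter`, and the helper name `count_request` is made up for illustration. It is not necessarily the exact request Kibana sends.

```python
import json


def count_request():
    """Build a search body that matches every document but returns none."""
    return {
        "size": 0,                   # no documents wanted, only the hit count
        "query": {"match_all": {}},  # no filtering at all
    }


if __name__ == "__main__":
    # The body would be POSTed to /twitter/_search (assumed index name);
    # the count is then reported in the "hits" section of the response.
    print(json.dumps(count_request(), indent=2))
```
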

== Geo-map

Another simple visualization is to display the tweets based on their location on a world map.

In Kibana, go to menu:Visualize[Tile map], then choose the twitter index.

Select Geo Coordinates for the bucket type and keep the default values: Geohash for Aggregation and coordinates.coordinates for Field.

image::geo-map-display.png[Localized map of tweets,637,402,align=center]
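
Under the hood, the Geohash aggregation groups documents into cells of a grid. The sketch below builds a comparable request body in Python. It assumes `coordinates.coordinates` is mapped as a `geo_point`; the aggregation name `tweets_by_cell`, the helper name and the default precision are arbitrary choices made for the example.

```python
import json


def geohash_grid_request(field="coordinates.coordinates", precision=3):
    """Bucket documents into geohash cells; precision ranges from 1 to 12."""
    if not 1 <= precision <= 12:
        raise ValueError("geohash precision must be between 1 and 12")
    return {
        "size": 0,  # only the buckets are of interest
        "aggs": {
            "tweets_by_cell": {  # hypothetical aggregation name
                "geohash_grid": {"field": field, "precision": precision}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(geohash_grid_request(), indent=2))
```

A higher precision yields smaller cells, hence more buckets: this is what changes when zooming in on the map.
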

== Bucket metric

For this kind of metric, suppose a business requirement is to display the top 5 users. Unfortunately, as some (most?) business requirements go, this is not precise enough: it specifies neither the time range nor the aggregation period. Let’s agree that the time range is a sliding window over the last day, and that the period is one hour.

In Kibana, go to menu:Visualize[Vertical bar chart], then choose the twitter index. Then:

* For the Y-Axis, keep Count for the Aggregation field
* Choose X-Axis for the buckets type
** Select Date histogram for the Aggregation field
** Keep the value @timestamp for the Field field
** Set the Interval field to Hourly
* Click on btn:[Add sub-buckets]
* Choose Split bars for the buckets type
** Select Terms for the Sub Aggregation field
** Select user.screen.name for the Field field
** Keep the default values for the other fields
* Don’t forget to click on btn:[Apply changes]
* Click on btn:[Save] and name the visualization accordingly, e.g. “Top 5 users hourly”.

image:bucket-metric-create.png[Create a bucket metric,234,472] image:bucket-metric-display.png[Display the top 5 users hourly,637,202]

=== Equivalent visualizations

Other visualizations can be used with the exact same configuration: Area chart and Data table.

With this data set, the output of the Area chart is not as readable, but the Data table offers interesting options.

From a visualization, click on the bottom right arrow icon to display a table view of the data instead of a graphic.

image::tabular-metric.png[Alternative tabular metric display,635,200,align=center]

Visualizations make use of Elasticsearch’s public API. From the tabular view, the JSON request can also be displayed by clicking on the btn:[Request] button (oh, surprise…). This way, Kibana can be used as a playground to quickly prototype requests before using them in one’s own applications.

image::request-metric.png[Executed API request,635,200,align=center]
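
To give an idea of how such a prototyped request could be reused, here is a hedged Python sketch that rebuilds the “top 5 users hourly” request by hand. It is an approximation, not a copy of what Kibana generates: the aggregation names `per_hour` and `top_users` are made up, the index is assumed to be called `twitter`, and newer Elasticsearch versions expect `fixed_interval` or `calendar_interval` instead of `interval`.

```python
import json


def top_users_hourly_request(user_field="user.screen.name", size=5):
    """Date histogram per hour, with the top users nested in each hour."""
    return {
        "size": 0,
        # Sliding window over the last day
        "query": {"range": {"@timestamp": {"gte": "now-1d", "lte": "now"}}},
        "aggs": {
            "per_hour": {  # hypothetical name: one bucket per hour
                "date_histogram": {"field": "@timestamp", "interval": "1h"},
                "aggs": {
                    "top_users": {  # hypothetical name: top users of that hour
                        "terms": {
                            "field": user_field,
                            "size": size,
                            "order": {"_count": "desc"},
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # The body would be POSTed to /twitter/_search (assumed index name)
    print(json.dumps(top_users_hourly_request(), indent=2))
```
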

=== Changing requirements a bit

The above visualization picks out the 5 users who tweeted the most during each hour, and displays them over the last day. That’s the reason why more than 5 users are displayed. But the requirement can be interpreted in another way: take the top 5 users over the course of the last day, and break down their number of tweets by hour.

To do that, just move the X-Axis bucket below the Split bars bucket. This will change the output accordingly.

image::another-bucket-metric-display.png[Display the top 5 users over the last day,637,203,align=center]
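
The difference between the two bucket orders can be illustrated without Elasticsearch at all. The following Python sketch mimics both interpretations in memory, on a tiny made-up sample of `(hour, user)` pairs; the function names are invented for the example and only approximate what the nested aggregations compute.

```python
from collections import Counter, defaultdict


def top_per_hour(tweets, n):
    """Date histogram first, then terms: the top n users of each hour."""
    per_hour = defaultdict(Counter)
    for hour, user in tweets:
        per_hour[hour][user] += 1
    return {hour: dict(counts.most_common(n)) for hour, counts in per_hour.items()}


def top_overall_by_hour(tweets, n):
    """Terms first, then date histogram: the top n users overall, split by hour."""
    totals = Counter(user for _, user in tweets)
    result = {user: Counter() for user, _ in totals.most_common(n)}
    for hour, user in tweets:
        if user in result:
            result[user][hour] += 1
    return {user: dict(hours) for user, hours in result.items()}


# Made-up sample data: (hour, user) pairs
tweets = [(0, "alice"), (0, "alice"), (0, "bob"),
          (1, "carol"), (1, "carol"), (1, "alice")]

print(top_per_hour(tweets, 1))         # a different winner for each hour
print(top_overall_by_hour(tweets, 1))  # a single winner, split by hour
```

With `n` set to 1, the first function returns two distinct users across the two hours, whereas the second returns only one: the same effect that makes the first chart display more than 5 users.
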

=== Filtering irrelevant data

As can be seen in the above histogram, top users are mostly about recruiting and/or job offers. This is not really what was wanted in the first place. It’s possible to remove this noise by adding a filter: in the Split bars section, click on btn:[Advanced] to display additional parameters and type the desired regex in the Exclude field.

image::filtered-bucket-metric-create.png[Filter out a bucket metric,342,69,align=center]

The new visualization is quite different:

image::filtered-bucket-metric-display.png[Display the top 5 users hourly without any recruitment-related user,637,202,align=center]
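
In terms of the Elasticsearch API, this kind of filter corresponds to the `exclude` parameter of a terms aggregation. Below is a minimal Python sketch; the pattern is a made-up example, not the one used for the screenshot above, and the aggregation name `top_users` is arbitrary. Note that Lucene regular expressions are anchored, so the pattern has to match the whole term, hence the leading and trailing `.*`. Depending on the Elasticsearch version, `exclude` may also be expressed as an object rather than a plain string.

```python
import json


def filtered_top_users_request(exclude_pattern, user_field="user.screen.name", size=5):
    """Terms aggregation that skips every term matching the given regex."""
    if not exclude_pattern:
        raise ValueError("an exclusion pattern is required")
    return {
        "size": 0,
        "aggs": {
            "top_users": {  # hypothetical aggregation name
                "terms": {
                    "field": user_field,
                    "size": size,
                    "exclude": exclude_pattern,  # Lucene regex, matched on the whole term
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical pattern: drop users whose name mentions jobs or recruiting
    print(json.dumps(filtered_top_users_request(".*(job|recruit).*"), indent=2))
```
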

== Putting it all together

With the above visualizations available and configured, it’s time to put them together on a dedicated dashboard. Go to menu:Dashboard[Add] to list all available visualizations.

image::add-visualization-dashboard.png[Add visualizations to a dashboard,732,391,align=center]

It’s as simple as clicking on the desired one, laying it out on the board and resizing it. Rinse and repeat until happy with the result and then click on btn:[Save].

image::configured-dashboard.png[A configured dashboard,910,429,align=center]

Icing on the cake: using the btn:[Rectangle] tool on the map visualization will automatically add a filter that only displays data bounded by the rectangle coordinates, for all visualizations found on the dashboard.

image::filtered-dashboard.png[A filtered dashboard,910,453,align=center]
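
A rectangle filter of this kind can be expressed with Elasticsearch’s `geo_bounding_box` query. The Python sketch below shows one plausible shape for it; it assumes `coordinates.coordinates` is mapped as a `geo_point`, the corner values are made up, and the filter actually generated by Kibana may be structured differently.

```python
import json


def bounding_box_filter(top, left, bottom, right, field="coordinates.coordinates"):
    """Keep only documents located inside the given rectangle."""
    if top <= bottom:
        raise ValueError("top latitude must be greater than bottom latitude")
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        field: {
                            "top_left": {"lat": top, "lon": left},
                            "bottom_right": {"lat": bottom, "lon": right},
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Made-up rectangle, roughly covering Western Europe
    print(json.dumps(bounding_box_filter(51.0, -5.0, 42.0, 8.0), indent=2))
```
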

That trick is not limited to the map visualization (try playing with other ones) but filtering on location quickly gives insights when exploring data sets.

== Conclusion

This post only scratches the surface of what Kibana has to offer: there are more visualizations available, as well as Timelion, the new, powerful (but sadly under-documented) “time series expression interface”. In any case, even the basic features shown above already provide plenty of options to make sense of one’s data sets.