:imagesdir: /assets/resources/exploring-data-sets-with-kibana/
:icons: font
:experimental:
In this post, I’d like to explore a sample data set using Kibana.
This requires some data to start with: let’s index some tweets. It’s quite straightforward: follow the explanations in my good friend David’s http://david.pilato.fr/blog/2015/06/01/indexing-twitter-with-logstash-and-elasticsearch/[blog post^], then wait for some time while the index fills with data.
== Basic metric
Let’s start with something basic, the number of tweets indexed so far.
In Kibana, go to menu:Visualize[Metric], then choose the
image:basic-metric-create.png[Create a basic metric,351,146] image:basic-metric-display.png[Display the number of tweets,320,202]
Another simple visualization is to display the tweets based on their location on a world map.
In Kibana, go to menu:Visualize[Tile map], then choose the
Select Geo Coordinates for the bucket type and keep the default values, including `Geohash` for the Aggregation field.
image::geo-map-display.png[Localized map of tweets,637,402,align=center]
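To get a feeling for what such a map relies on, here is a minimal sketch of a search body using Elasticsearch's `geohash_grid` aggregation, built as a plain Python dictionary. The field name `coordinates`, the aggregation name `tweets_by_cell` and the precision are assumptions made for illustration; the request Kibana actually sends may differ.

```python
import json


def geohash_grid_request(geo_field, precision=3):
    """Build a search body that buckets documents by geohash cell."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            # "tweets_by_cell" is an arbitrary name chosen for this sketch
            "tweets_by_cell": {
                "geohash_grid": {"field": geo_field, "precision": precision}
            }
        },
    }


# "coordinates" is a hypothetical geo_point field; use the one from your mapping
body = geohash_grid_request("coordinates")
print(json.dumps(body, indent=2))
```

Each returned bucket stands for one geohash cell together with its document count, which is the kind of data a tile map can plot.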
== Bucket metric
For this kind of metric, suppose a business requirement is to display the top 5 users. Unfortunately, as some (most?) business requirements go, this is not precise enough: it specifies neither the time range nor the aggregation period. Let’s agree on the range being a sliding window over the last day, and the period being an hour.
In Kibana, go to menu:Visualize[Vertical bar chart], then choose the
- For the Y-Axis, keep `Count` for the Aggregation field
- Choose X-Axis for the buckets type
** Choose `Date histogram` for the Aggregation field
** Keep the value `@timestamp` for the Field field
** Set the Interval field to one hour, the period agreed above
- Click on btn:[Add sub-buckets]
- Choose Split bars for the buckets type
** Choose `Terms` for the Sub Aggregation field
** Choose `user.screen.name` for the Field field
** Keep the default values for the other fields
- Don’t forget to click on btn:[Apply changes]
- Click on btn:[Save] and name the visualization accordingly e.g. “Top 5 users hourly”.
image:bucket-metric-create.png[Create a bucket metric,234,472] image:bucket-metric-display.png[Display the top 5 users hourly,637,202]
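The configuration above roughly corresponds to a `date_histogram` aggregation with a nested `terms` aggregation. The following is only a sketch under a few assumptions: the aggregation names `per_hour` and `top_users` are made up, the last-day window is expressed as a `range` query, and the `interval` parameter matches older Elasticsearch releases (recent ones use `fixed_interval` or `calendar_interval` instead).

```python
import json


def top_users_per_hour(size=5):
    """Top `size` users for each hourly bucket over the last day."""
    return {
        "size": 0,  # aggregation results only
        "query": {"range": {"@timestamp": {"gte": "now-1d", "lte": "now"}}},
        "aggs": {
            "per_hour": {
                # one bucket per hour (the X-Axis)
                "date_histogram": {"field": "@timestamp", "interval": "1h"},
                "aggs": {
                    # within each hour, the most active users (Split bars)
                    "top_users": {
                        "terms": {"field": "user.screen.name", "size": size}
                    }
                },
            }
        },
    }


print(json.dumps(top_users_per_hour(), indent=2))
```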
=== Equivalent visualizations
Other visualizations can be used with the exact same configuration: Area chart and Data table.
For the explored data set, the output of the Area chart is not as readable, but the Data table offers interesting options.
From a visualization, click on the bottom right arrow icon to display a table view of the data instead of a graphic.
image::tabular-metric.png[Alternative tabular metric display,635,200,align=center]
Visualizations make use of the Elasticsearch public API. From the tabular view, the JSON request can also be displayed by clicking on the btn:[Request] button (oh, surprise…). This way, Kibana can be used as a playground to quickly prototype requests before using them in one’s own applications.
image::request-metric.png[Executed API request,635,200,align=center]
=== Changing requirements a bit
The above visualization picks the 5 users who tweeted the most during each hour and displays them over the last day. That’s why more than 5 users are displayed. But the above requirement can be interpreted in another way: take the top 5 users over the course of the last day, and break down their number of tweets by hour.
To do that, just move the X-Axis bucket below the Split bars bucket. This will change the output accordingly.
image::another-bucket-metric-display.png[Display the top 5 users over the last day,637,203,align=center]
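In aggregation terms, swapping the bucket order means the `terms` aggregation becomes the outer one and the `date_histogram` is nested inside it. Here is a sketch of that shape, with the same caveats as any hand-written request: the aggregation names are arbitrary, and `interval` is the parameter name used by older Elasticsearch releases.

```python
import json


def hourly_breakdown_of_top_users(size=5):
    """Top `size` users over the last day, each broken down by hour."""
    return {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": "now-1d", "lte": "now"}}},
        "aggs": {
            # outer bucket: the most active users over the whole window
            "top_users": {
                "terms": {"field": "user.screen.name", "size": size},
                "aggs": {
                    # inner bucket: their tweets, hour by hour
                    "per_hour": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "interval": "1h",
                        }
                    }
                },
            }
        },
    }


print(json.dumps(hourly_breakdown_of_top_users(), indent=2))
```

Because the `terms` bucket is computed first, at most 5 users can appear in the whole chart.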
=== Filtering irrelevant data
As can be seen in the above histogram, the top users are mostly about recruiting and/or job offers. This is not really what was wanted in the first place. It’s possible to remove this noise by adding a filter: in the Split bars section, click on btn:[Advanced] to display additional parameters and type the desired regex in the Exclude field.
image::filtered-bucket-metric-create.png[Filter out a bucket metric,342,69,align=center]
The new visualization is quite different:
image::filtered-bucket-metric-display.png[Display the top 5 users hourly without any recruitment-related user,637,202,align=center]
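On the Elasticsearch side, a `terms` aggregation accepts an `exclude` parameter holding a regular expression. The pattern below is purely hypothetical (the one I typed is only visible in the screenshot), so treat this as a sketch of the mechanism rather than the exact request.

```python
import json


def top_users_excluding(pattern, size=5):
    """Terms aggregation on user names, skipping those matching `pattern`."""
    return {
        "size": 0,
        "aggs": {
            "top_users": {
                "terms": {
                    "field": "user.screen.name",
                    "size": size,
                    # the regex must match the whole term, hence the
                    # leading and trailing ".*"
                    "exclude": pattern,
                }
            }
        },
    }


# hypothetical pattern dropping job-related accounts
print(json.dumps(top_users_excluding(".*(job|recruit).*"), indent=2))
```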
== Putting it all together
With the above visualizations available and configured, it’s time to put them together on a dedicated dashboard. Go to menu:Dashboard[Add] to list all available visualizations.
image::add-visualization-dashboard.png[Add visualizations to a dashboard,732,391,align=center]
It’s as simple as clicking on the desired one, laying it out on the board and resetting its size. Rinse and repeat until happy with the result and then click on btn:[Save].
image::configured-dashboard.png[A configured dashboard,910,429,align=center]
Icing on the cake, using the btn:[Rectangle] tool on the map visualization will automatically add a filter that only displays data bound by the rectangle coordinates for all visualizations found on the dashboard.
image::filtered-dashboard.png[A filtered dashboard,910,453,align=center]
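A rectangle selection of this kind can be expressed in Elasticsearch with a `geo_bounding_box` query. The sketch below assumes a hypothetical `coordinates` geo_point field and arbitrary corner coordinates; the filter Kibana generates may be structured differently.

```python
import json


def bounding_box_filter(geo_field, top_left, bottom_right):
    """Keep only documents located inside the given rectangle.

    Corners are (latitude, longitude) tuples.
    """
    top, left = top_left
    bottom, right = bottom_right
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        geo_field: {
                            "top_left": {"lat": top, "lon": left},
                            "bottom_right": {"lat": bottom, "lon": right},
                        }
                    }
                }
            }
        }
    }


# hypothetical field name and an arbitrary rectangle over western Europe
body = bounding_box_filter("coordinates", (51.5, -5.0), (42.0, 8.5))
print(json.dumps(body, indent=2))
```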
That trick is not limited to the map visualization (try playing with other ones) but filtering on location quickly gives insights when exploring data sets.
While this post only scratches the surface of what Kibana has to offer, there are more visualizations available, as well as Timelion, the new, powerful (but sadly under-documented) “time series expression interface”. In any case, even the basic features shown above already provide plenty of options to make sense of one’s data sets.