Tuesday, November 28, 2017

Visual approaches to gaining knowledge from big data using Spark, ELK and directed graphs

Background


Traditional methods of information extraction require the quintessential Extract, Transform, Load (ETL) process, followed by complex analytical queries and, finally, visualization.
In this article we look at a streamlined process that gets us to the visual aids more quickly, if not instantaneously.


The problem


Let us suppose data has been dropped on your screen for you to gain business insight or solve a problem. Your first questions may be: what is the source of the data, and what does each data point represent? How granular is the data? What is the global relationship between the data points?

We propose three steps to answering these questions:
1. Randomly sample your data and profile the attributes.
2. Extract information from the rest of your data based on the profile from (1).
3. Generate a holistic view of the information by drawing links between the pieces of information extracted. Repeat (3) to satisfaction.
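Step 1 can be sketched in a few lines of Python. The record layout here (host, status, latency fields) is an illustrative assumption, not part of the article:

```python
import random
from collections import Counter

def profile_sample(records, k, seed=None):
    """Step 1: randomly sample k records, then profile each attribute
    by counting the value types observed per field."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(k, len(records)))
    profile = {}
    for record in sample:
        for field, value in record.items():
            profile.setdefault(field, Counter())[type(value).__name__] += 1
    return sample, profile

# Hypothetical data points: each one is a small dict of attributes.
records = [
    {"host": "web-01", "status": 200, "latency_ms": 12.5},
    {"host": "web-02", "status": 500, "latency_ms": 98.1},
    {"host": "web-01", "status": 404, "latency_ms": 7.3},
]
sample, profile = profile_sample(records, k=2, seed=42)
```

The resulting profile (which fields exist, and what types they hold) is what drives the extraction in step (2) and the link-drawing in step (3).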


Consideration 


In order to streamline this process, let us consider an API integration exercise and some code. We want to integrate a stream-processing API, an indexing engine, a backend database, and the visual aids.

Spark for extraction and transformation
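In Spark the per-record extraction logic can be written as an ordinary Python function and handed to `rdd.map`. A minimal sketch, assuming a comma-separated log format (the field layout and HDFS path are hypothetical):

```python
def parse_line(line):
    """Extract structured fields from one raw, comma-separated log line.
    Returns None for malformed lines so they can be filtered out."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    host, status, latency = parts
    try:
        return {"host": host, "status": int(status), "latency_ms": float(latency)}
    except ValueError:
        return None

# With PySpark the same function plugs straight into the pipeline, e.g.:
#   parsed = sc.textFile("hdfs://.../logs").map(parse_line) \
#              .filter(lambda r: r is not None)
# Here we apply it locally to show the transformation itself:
parsed = [r for r in map(parse_line, ["web-01,200,12.5", "bad line"]) if r is not None]
```

Keeping the extraction logic in a plain function makes it testable outside the cluster before it is shipped to the executors.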


Profiling data using ELK
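To profile the sampled records visually in Kibana, they first need to be indexed into Elasticsearch. One way is the `_bulk` REST endpoint, whose newline-delimited body can be built with the standard library alone; the index name below is an assumption:

```python
import json

def to_bulk_body(records, index):
    """Build an Elasticsearch _bulk request body: one action line
    followed by one document line per record, newline-delimited."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"

body = to_bulk_body(
    [{"host": "web-01", "status": 200}, {"host": "web-02", "status": 500}],
    index="profile-sample",  # hypothetical index name
)
# POST this body to http://localhost:9200/_bulk with
# Content-Type: application/x-ndjson, then explore the fields in Kibana.
```

Once indexed, Kibana's field summaries and aggregations give the attribute profile of step (1) with no extra query-writing.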


Spark for directed graph loading into JanusGraph
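JanusGraph is loaded through Gremlin traversals, so the job reduces to turning extracted records into a directed edge list and rendering each edge as a traversal to submit (e.g. via the gremlinpython driver). A sketch, with assumed field names and a `name` vertex property that is not prescribed by the article:

```python
def build_edges(records, src_field, dst_field):
    """Derive a directed edge list from records: one edge per record,
    pointing from the value of src_field to the value of dst_field."""
    return [(r[src_field], r[dst_field]) for r in records]

def gremlin_statements(edges, label="links_to"):
    """Render each edge as a string-based Gremlin traversal that could
    be submitted to a JanusGraph/Gremlin server."""
    stmts = []
    for src, dst in edges:
        stmts.append(
            "g.V().has('name','%s').as('a')."
            "V().has('name','%s').addE('%s').from('a')" % (src, dst, label)
        )
    return stmts

edges = build_edges(
    [{"client": "web-01", "server": "db-01"}],
    src_field="client", dst_field="server",  # hypothetical field names
)
stmts = gremlin_statements(edges)
```

In a Spark job, `build_edges` would run per partition and the statements would be submitted from a `foreachPartition` so each executor holds one graph connection.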


Gephi for analytics
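Gephi can ingest a directed edge list directly from a CSV with `Source` and `Target` columns, which the standard library can emit. A minimal sketch (node names are the same hypothetical hosts as above):

```python
import csv
import io

def edges_to_gephi_csv(edges):
    """Serialize a directed edge list in the Source,Target CSV format
    that Gephi's spreadsheet import understands."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Source", "Target"])
    for src, dst in edges:
        writer.writerow([src, dst])
    return buf.getvalue()

csv_text = edges_to_gephi_csv([("web-01", "db-01"), ("web-02", "db-01")])
# Save as edges.csv, import it into Gephi as an edges table,
# then run layout and centrality analytics on the resulting digraph.
```

From there, Gephi's layouts and built-in metrics (degree, PageRank, community detection) supply the holistic view called for in step (3).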