When working with time series data, the amount of data is overwhelming. You usually are dealing with millions of rows of data. To consume all this data, we'll need a database to help us sort, filter, and group it so that it's meaningful. Once we have the data in the correct format, we'll need to graph it, so it's easily digestible.

Top JS Graphing Libraries for Time Series Data

There are a lot of graphing libraries out there. Pick one based on your specific requirements.

Here are a few libraries that are highly rated for different purposes:

C3

After looking at a few of these and trying to plot time series data, I came across C3. It is compelling and customizable. We use it for data visualization within Hyprcubd's console.

The Data

Now that we have a graphing library picked out, we need some data. Any data here will do with only one requirement - time! I've chosen the NYC Taxi and Limousine Commission (TLC) trip data since it's free and easily accessible. You can find more about it here: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

Using the High Volume For Hire Vehicle data for January 2020, there are over 20 million rows. This amount of data is pretty standard when dealing with time series data sets. We are going to take the first 1M rows for use in our test database.

Here is our data model:  

column type
license string
dispatching_loc string
pickup_datetime datetime
dropoff_datetime datetime
pickup_loc int
dropoff_loc int
sr_flag int

To get some insights out of this data, we need to put this into a database. A database will give us lots of options for grouping, filtering, sorting, and aggregating. Let's put this into MySQL for our example.

create table trip (
    id int not null primary key auto_increment,
    license varchar(10),
    dispatch_loc varchar(10),
    pickup_time datetime,
    dropoff_time datetime,
    pickup_loc int,
    dropoff_loc int,
    sr_flag int,
    index (pickup_time)
);

Loading the data is beyond the scope of this article. MySQL has a useful reference here on loading data from a CSV. Be warned - loading the full 20M rows will take a significant amount of time. Using the smallest MySQL instance size on DigitalOcean, I was able to load about 1M rows in about 10 minutes.

What answers do we want to figure out? Let's start with some basic questions.

  • How many trips occurred in January?
  • How many trips occur each day?
  • What are the peak trip times?
  • What is the market share of trips by service? (Uber, Lyft, etc.)

Since we aren't working with the full data set, let's start with the second question:

How many trips per day?

select date(pickup_time), count(1) from trip group by date(pickup_time);

This query truncates the time to the current date and counts for that day.  This is relatively straightforward. Let's go to the next question:

What are the peak hours for the first day?

select 
 from_unixtime(t.t*3600) as hour,
 t.c as trips
from
( select 
    round(unix_timestamp(pickup_time)/3600) as t, 
    count(1) as c
from trip
where day(pickup_time) = 1 
group by round(unix_timestamp(pickup_time)/3600)
) as t;

MySQL does not have an easy way (that I know of) to extract out a specific time interval and group it by that interval. If anyone knows of a better way, please leave a comment below, and I'll update it!

Here is the data returned:

hour trips
2020-01-01 00:00:00 24397
2020-01-01 01:00:00 65923
2020-01-01 02:00:00 66546
2020-01-01 03:00:00 62902
2020-01-01 04:00:00 49413
2020-01-01 05:00:00 31640
2020-01-01 06:00:00 20167
2020-01-01 07:00:00 15311
2020-01-01 08:00:00 12778
2020-01-01 09:00:00 12275
2020-01-01 10:00:00 14259
2020-01-01 11:00:00 16844
2020-01-01 12:00:00 19387
2020-01-01 13:00:00 22699
2020-01-01 14:00:00 25149
2020-01-01 15:00:00 26082
2020-01-01 16:00:00 26992
2020-01-01 17:00:00 28300
2020-01-01 18:00:00 29036
2020-01-01 19:00:00 28884
2020-01-01 20:00:00 27006
2020-01-01 21:00:00 25082
2020-01-01 22:00:00 23932
2020-01-01 23:00:00 20809
2020-01-02 00:00:00 9034

Let's graph it:

This looks pretty good! Since we aren't working with the full dataset, we can't draw any meaningful conclusions. With that said, the time series graph is looking pretty.

C3 has excellent features for time series data, such as the Flow API. The API allows you to stream data into the chart as it comes to your UI in real-time. This is very useful with time series databases.

What is the market share between the different services?

Although this is not a time series graph, it is useful to see the approximate rankings of each service. Uber receives the lion's share of trips in NYC.

Time Series Database - Hyprcubd

How does a time series database differ when handling this data, and would it make your life easier? Absolutely!

First, the schema would have to change slightly:

create table trip (
    time time,
    license string,
    dispatch_loc string,    
    dropoff_time time,
    pickup_loc int,
    dropoff_loc int,
    sr_flag int    
);

Hyprcubd supports basic data types. One improvement is from varchar to string. Since we are a column-oriented database, a string can be any length, and it's variable length encoded to maximize space.

The first time column is required. We are using pickup_datetime as time in this example.

How many trips per day?  

select time, count(time) from trip timeseries 1d;

This query should look pretty familiar to most developers who know SQL. The only difference is the time series expression. This expression is unique to Hyprcubd. It tells the query engine to group the results by one day. This version is a slight improvement over the MySQL query.

What are the peak hours on the first day?  

select time, count(time) from trip timeseries 1h; 

Wow! That's much easier. This expression is configurable to other intervals and amounts. You can read more about the time series expression here.

Conclusion

Are you interested in learning more about Hyprcubd? Contact us for a demo!