Over the past few years there has been an explosion of data sources and data types that need to be brought into a company’s analytics. One can argue that sensor data, and in particular the time series aspect of sensor data, is probably the richest and most useful of these. The big question is how a company can effectively integrate the time series data it is generating with its other analytics in order to drive profitable actions.
Again, it is about integration
As has long been the case, data needs to be integrated to derive the most value. Companies can certainly capture time series data into a raw data file, have programmers run sophisticated algorithms and get some insights, but then what? If I see that an engine is overheating, what should I do about it? What parts are going to be affected? Is this an anomaly or part of a larger pattern in my fleet? Is this happening everywhere or only in certain parts of the country? Answering these questions requires data, and if that data is not readily and easily integrated, then the answers may be too hard to find. Delayed action can be costly. Time series data needs to be treated like any other data in your ecosystem: data that must be managed, accessible and performant so that end users can capitalize on the opportunities.
Welcome to time series in Teradata
Recognizing that the Internet of Things would present just such challenges, Teradata began working to incorporate time series data into the Teradata Database, and now, with Release 16.20 (available in Q4 2017), it is ready for customer use. Time series data can now be loaded and managed inside the Teradata Database like any other data type. There have been significant optimizations to allow for easy integration with other tables, faster access to targeted time periods and a host of time-aware aggregate functions that support a wide array of analytics.
What is time series data?
Simply put, time series data is generated in a multitude of ways, but the end result is a time stamp, some identifier of what created the data, and the observation or measurement. An example would be a car sensor that tracks oil temperature every second. In this case, we get one row for every second that indicates the time, the sensor ID and a temperature reading; every minute we get 60 rows of sensor data. There could be other data elements, like the vehicle identification number of the car that is sending the data, or even a multitude of sensor readings in JSON or CSV format. The key is that this data has the time stamp.
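To make that concrete, here is what a few seconds of readings from one such sensor might look like, with one row per time stamp (the column names and values are purely illustrative):

    sensor_ts                sensor_id   oil_temp_c
    2017-10-01 08:15:01      S-4417      96.2
    2017-10-01 08:15:02      S-4417      96.3
    2017-10-01 08:15:03      S-4417      96.5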
A second aspect of time series data is that it can be regular, meaning a reading arrives every second without any missing data, or irregular, meaning the frequency is not consistent and there can be missing measurements. For example, instead of getting a temperature reading every second, we might only get a reading when the temperature exceeds a certain threshold.
The last aspect to discuss is that a time series can be bounded or unbounded. Perhaps we have a sensor that continually sends a message from now until the end of the device’s life. Other series may be bounded by a defined period of time (i.e., I ran a test for 20 minutes) or by a logical overlay of boundaries (i.e., a car trip bounded by start and stop times).
You can quickly see that time series data can become complicated to manage and hard to align in analytics if you simply store the data in files rather than in a mature, optimized database.
Time series in the Teradata Database
Teradata has integrated time series data by introducing a new type of storage structure, the Primary Time Index (PTI). This is very similar to our Primary Index and Partitioned Primary Index concepts. With a PTI, time series data can be bucketed such that data from the same sensor or device is kept together in time intervals for faster analytics. The PTI can be just a timestamp or can include other attributes like the sensor ID.
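As a rough sketch of what the DDL can look like (the table and column names here are made up, and the exact options should be checked against the Release 16.20 documentation), a table of engine temperature readings bucketed into one-hour intervals per sensor might be defined along these lines:

    CREATE TABLE engine_temps (
        sensor_id   INTEGER,       -- which sensor produced the reading
        vin         VARCHAR(17),   -- vehicle the sensor is installed in
        oil_temp_c  FLOAT          -- the measurement itself
    )
    -- timestamp type, "time zero" origin date, bucket width, bucketing column
    PRIMARY TIME INDEX (TIMESTAMP(6), DATE '2017-01-01', HOURS(1),
                        COLUMNS(sensor_id), NONSEQUENCED);

The timestamp itself is carried in a system-generated TD_TIMECODE column, so rows for the same sensor land together in the same one-hour buckets and queries that target a time range only need to touch the relevant buckets.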
Once the data is inside the database, queries get all of the scalability, manageability and optimization capabilities of the platform. Time series is just another data construct to be leveraged.
For analytics, Teradata has optimized aggregate functions to work on time boundaries. This means you can easily take two different time series tables and align them against each other. For example, one table may be capturing sensor data every two minutes and the other every five minutes. Using the time-aware aggregate functions, you can run analytics that compute 15-minute averages and correlate the two sets of data.
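As a hedged sketch of what that alignment can look like (again using the hypothetical engine_temps table from above; the exact GROUP BY TIME syntax, pseudo-columns and fill options are spelled out in the 16.20 time series documentation), rolling a two-minute feed up to a 15-minute grain might be written along these lines:

    -- 15-minute average oil temperature per sensor
    SELECT $TD_TIMECODE_RANGE AS time_bucket,
           sensor_id,
           AVG(oil_temp_c)    AS avg_temp
    FROM engine_temps
    GROUP BY TIME (MINUTES(15) AND sensor_id);

Running the equivalent query against the five-minute table produces rows on the same 15-minute boundaries, so the two result sets can be joined on the time bucket and sensor ID and then correlated.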
But time is just the beginning….
Additional benefits come from integrating time series data with geospatial data to understand the changing nature of data over time and space. Then factor in your reference data, which may have temporal characteristics, so now you can understand which vendor supplied what part, at what time, and how that part was used over time in a vehicle. This can provide insights about normal versus abnormal usage or wear and tear. These insights lead to targeted and timely actions.
Operationalizing these new insights across departments is simplified, as the processes still get the scale, optimizations and workload management that have been the core of the Teradata Database for decades.
To learn more….
I have only begun to scratch the surface of this new and exciting capability. To learn more about the Teradata Analytics Platform, I invite you to visit our website.
Gotta go, lost track of time, and I’m late for my next meeting …
Starting with Teradata in 1987, Rob Armstrong has contributed to virtually every aspect of the data warehouse and analytical processing arenas. Rob’s work in the computer industry has been dedicated to data-driven business improvement and more effective business decisions and execution. His roles have encompassed the design, justification, implementation and evolution of enterprise data warehouses.
In his current role, Rob continues the Teradata tradition of integrating data and enabling end-user access for true self-driven analysis and data-driven actions. Increasingly, he incorporates the world of non-traditional “big data” into the analytical process. He also has expanded the technology environment beyond the on-premises data center to include the world of public and private clouds to create a total analytic ecosystem.
Rob earned a B.A. degree in Management Science with an emphasis in mathematics and relational theory at the University of California, San Diego. He resides and works from San Diego.