Predictive analytics gold rush due
We are headed towards a more connected, more instrumented and more data-driven world. This is underscored once again by Cisco’s latest Visual Networking Index, whose statistics are truly mind-boggling.
According to Cisco, by 2016, 130 exabytes of data will rip through the internet. The number of mobile devices will exceed the human population of seven billion this year. By 2016 the number of connected devices will reach almost ten billion.
The devices connected to the net range from mobile phones, laptops and tablets to sensors and the millions of devices that make up the “internet of things”. All these devices will constantly spew data onto the internet, and business and strategic decisions will be made by finding patterns, trends and outliers in these mountains of data.
In this future of swirling data, predictive analytics will be a key discipline, and experts in this domain will be much sought after. Predictive analytics uses statistical methods to mine information and patterns from structured data, unstructured data and data streams. The data can be anything from click streams and browsing patterns to tweets and sensor readings, and it can be static or dynamic. Predictive analytics will have to identify trends in streams of mobile call records, retail store purchases, social network status messages and the like.
Analytics and predictive analytics will be applied across many domains, including banking, insurance, retail, telecom and energy. In fact, predictive analytics will be the language of the future, much as C was a couple of decades ago, when it was used in all sorts of applications spanning the whole gamut from finance to telecom.
While analytics can mine data for patterns, trends and outliers, predictive analytics can model the behavior of the system under study and come up with future trends and outcomes.
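The distinction can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: the daily visit counts are made up, the function names (`fit_line`, `find_outliers`, `forecast`) are hypothetical, and a straight-line model stands in for whatever model a real system would need. The first two steps are descriptive analytics (find the trend, flag the outlier); the last step is the predictive part (project the trend forward).

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def find_outliers(xs, ys, slope, intercept, threshold=2.0):
    """Return the x values whose residual from the fitted line is larger
    than `threshold` standard deviations of all residuals."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []
    return [x for x, r in zip(xs, residuals) if abs(r) > threshold * spread]


def forecast(slope, intercept, x):
    """Predict y at a future x by extending the fitted line."""
    return slope * x + intercept


# Made-up data: site visits per day, with an unusual spike on day 7.
days = list(range(1, 11))
visits = [102, 110, 118, 131, 139, 150, 240, 171, 179, 190]

# Analytics: describe what has already happened.
slope, intercept = fit_line(days, visits)
outliers = find_outliers(days, visits, slope, intercept)

# Predictive analytics: use the fitted model to estimate a future value.
predicted_day_12 = forecast(slope, intercept, 12)

print(f"trend: about {slope:.1f} extra visits per day")
print(f"outlier days: {outliers}")
print(f"forecast for day 12: about {predicted_day_12:.0f} visits")
```

In practice the model would be chosen to suit the data (seasonal, nonlinear, and so on), and the spike would usually be investigated or removed before fitting, since it pulls the trend line upwards.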
In this context it is worth mentioning the R language, which is used for statistical programming and graphics. Wikipedia describes R as a language that “provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others”.
Predictive analytics is already being used in traffic management to identify and prevent gridlock. Applications have also been identified in energy grids and water management, as well as in determining user sentiment by mining data from social networks.
One very ambitious undertaking is the Data-Scope project, which is based on the belief that the universe is made of information and that a “new eye” is needed to look at this data. The Data-Scope is described as “a new scientific instrument, capable of ‘observing’ immense volumes of data from various scientific domains such as astronomy, fluid mechanics, and bioinformatics”.
According to the project description, the system will have over 6 petabytes of storage, about 500 GB per second of aggregate sequential IO, about 20 million IOPS, and about 130 TFlops. The Data-Scope is not a traditional multi-user computing cluster but a new kind of instrument that enables people to do science with datasets ranging between 100 TB and 1,000 TB. The project rests on the premise that new discoveries will come from the analysis of large amounts of data. Analytics is all about analyzing large datasets, and predictive analytics takes it one step further by making intelligent predictions based on the available data.
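The throughput figure quoted for the Data-Scope gives a feel for what working with such datasets means. The back-of-the-envelope Python sketch below estimates how long a single sequential pass over a dataset would take. It assumes decimal units (1 TB = 1,000 GB) and that the full aggregate rate is sustained for the whole scan, which is an idealization; the function name `scan_seconds` is hypothetical.

```python
# Aggregate sequential IO quoted for the Data-Scope, in GB per second.
AGGREGATE_IO_GB_PER_S = 500


def scan_seconds(dataset_tb, io_gb_per_s=AGGREGATE_IO_GB_PER_S):
    """Idealized time for one sequential pass over a dataset.

    Assumes 1 TB = 1,000 GB and a fully sustained aggregate IO rate.
    """
    return dataset_tb * 1000 / io_gb_per_s


# The two ends of the dataset range mentioned for the instrument.
for size_tb in (100, 1000):
    seconds = scan_seconds(size_tb)
    print(f"{size_tb} TB: {seconds:.0f} s (about {seconds / 60:.1f} minutes)")
```

Under these assumptions a full pass takes a few minutes at the low end of the range and roughly half an hour at the high end, which is what makes interactive exploration of such datasets plausible.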
Predictive analytics opens up a whole new universe of possibilities, and its applications are endless. It will be a key tool in our data-intensive future.
Disclaimer: This article represents the author's viewpoint only and doesn't necessarily represent IBM's positions, strategies or opinions
Tinniam V Ganesh is an Infrastructure Architect at IBM India, Global Technology Services. You can write to him at [email protected] and read his blog at http://gigadom.wordpress.com