Data Takes Center Stage
Nov 25, 2011
Both you and I leave a big trail of data as we consume information, conduct trade, or interact with our friends on the internet.
Take the simple example of ripping a newly purchased CD into your iTunes library. iTunes connects to a database called Gracenote, a central service that provides more information about the CD being ripped. Each CD has a unique signature, which could be a combination of the number of tracks, the length of each track, and the digital encoding of the leading part of the individual tracks. Using this data from the CD, Gracenote serves the title of the CD, the track names, artist names, artwork, and other relevant metadata. That makes music ripping a piece of cake.
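To make the signature idea concrete, here is a minimal sketch in Python. It is not Gracenote's actual algorithm or API: the function names, the hashing scheme, and the in-memory catalog are all assumptions made for illustration. It derives a signature from only the track count and track lengths, then uses it as a lookup key.

```python
import hashlib

# Hypothetical in-memory stand-in for the central metadata database.
CATALOG = {}

def disc_signature(track_lengths_sec):
    """Build an illustrative signature from track count and track lengths.

    This is an assumed scheme, not the one any real service uses.
    """
    parts = [str(len(track_lengths_sec))]
    parts += [str(int(t)) for t in track_lengths_sec]
    digest = hashlib.sha1("|".join(parts).encode("ascii")).hexdigest()
    return digest[:16]

def register(track_lengths_sec, title, artist):
    """Store metadata for a disc under its signature."""
    CATALOG[disc_signature(track_lengths_sec)] = {"title": title, "artist": artist}

def lookup(track_lengths_sec):
    """Return the stored metadata for a disc, or None if it is unknown."""
    return CATALOG.get(disc_signature(track_lengths_sec))
```

Registering a disc with track lengths `[215, 187, 242]` and then calling `lookup([215, 187, 242])` returns its metadata, while a disc with a different track layout returns `None`.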
Because popular applications like iTunes use Gracenote, much of the world goes to it for music metadata: the server takes several thousand hits by the hour and millions of hits per year. The data that Gracenote serves is central to the music on the disc. However, there is also a great deal of surrounding data (the user trail) that Gracenote's servers can capture and put to good use. The surrounding data in this case is the internet address of the ripping computer, its location, and the time of the request. As millions around the world buy discs (a diminishing trend) and rip them, Gracenote can spot which albums are being purchased more and at which locations. That is great market intelligence, available in near real time! It will be music to the ears of production houses, artists, and distribution companies. Analysis of surrounding data can also help spot piracy as it unfolds.
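As a rough sketch of how such a user trail might be mined, the Python below counts lookups per album within each region. The log format, a list of `(album, region, hour)` tuples, and the function name are assumptions for illustration; a real service would work from its own request logs and a proper geolocation step.

```python
from collections import Counter

def top_albums_by_region(lookups, n=1):
    """Return the n most looked-up albums in each region.

    lookups: iterable of (album, region, hour) tuples (a hypothetical log format).
    """
    per_region = {}
    for album, region, _hour in lookups:
        # One Counter per region, incremented once per lookup request.
        per_region.setdefault(region, Counter())[album] += 1
    return {region: counts.most_common(n) for region, counts in per_region.items()}
```

Feeding it the requests from the last hour, rather than the whole log, would give the near-real-time view described above.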
A URL-shortening service like bit.ly is another great example. It can use surrounding data (the browser trails that pass through bit.ly servers) to spot trending topics. It may also be able to spot relationships between two or more seemingly isolated events happening far apart in the world, and it can assess cause and effect with more confidence because the assessment is based on actual user trails. It is said that Google was able to spot and report the spread of the swine flu epidemic ahead of the official agencies that have established tracking mechanisms, simply by tracking searches on swine-flu-related topics.
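One simple way to flag a trending link is to compare its latest click count with its own recent average. The sketch below assumes hourly click counts per short link are already available; the function name, the `factor` and `min_clicks` thresholds, and the data layout are all illustrative choices, not anything bit.ly or Google is known to use.

```python
def trending_links(hourly_clicks, factor=3.0, min_clicks=10):
    """Return links whose latest hourly count spikes above their baseline.

    hourly_clicks: dict mapping a short link to a list of click counts
    per hour, oldest first (a hypothetical data layout).
    """
    trending = []
    for link, counts in hourly_clicks.items():
        if len(counts) < 2:
            continue  # no history to compare against
        *history, latest = counts
        baseline = sum(history) / len(history)
        # Floor the baseline at 1.0 so rarely clicked links need a real spike.
        if latest >= min_clicks and latest > factor * max(baseline, 1.0):
            trending.append(link)
    return sorted(trending)
```

A link that is always busy does not count as trending here; only a jump relative to its own history does.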
Enterprises have realized the potential and importance of this surrounding data. Amazon correlates buying behavior, spots trends and patterns, and makes intelligent recommendations that fuel more buying. Applications that derive value from data are called data-centric applications, and they are mushrooming around the internet. Data-centric applications, as opposed to compute-centric applications, are taking center stage. Enterprises that don’t understand the importance of data have started fearing those that do. Data Scientist is the future killer job. Statistics is the next killer subject.