Mayur Gupta Growth @ Spotify. Dreamer + Doer + Connecting Dots.. Angel Investor. Blogger. Cricket, Tennis & Nichiren Buddhism. My 3 girls. All views are my own.

BIG DATA – More Than Just BIG!!!

If the hype & deluge of headlines, articles & advanced analytics and reporting material are anything to go by, BIG DATA is the next big thing. At times you may even wonder what we have been doing in the name of analytics & insight generation thus far. So how much of the hype is warranted? Is it just a bunch of data & BI companies that like to use the 2 magic words, BIG DATA & “Hadoop”, or is there a solution ready for an enterprise-level need?

So what is BIG DATA, one more time?
To begin with, BIG DATA is NOT just “BIG”. The name is a misnomer that implies it is only about size. Put simply, it is big, fast & diverse data that can come from varying sources and channels (offline & online) but cannot be processed or analyzed using traditional processes, databases or even data warehouses. It is also a methodology and approach (not just a technology solution) to collect, store, analyze and convert that volume, velocity and variety of data into business-critical, actionable insights that help organizations get ahead of the competition. A quick view of the key characteristics, the 3 Vs:

Volume – A shift from managing terabytes to petabytes, exabytes & zettabytes of data. Facebook & Twitter alone generate approx 20 terabytes of data each day.
Variety – Complex combination of raw, structured, semi-structured and unstructured data from web pages, log files, indexes, social media, emails, documents, sensor data from active & passive systems generated due to the explosion of sensors, smart devices, communication & social collaboration.
Velocity – The speed at which the data can flow and provide near real time analysis & actionable insights. A capability to parse the data in motion and not just the data at rest.
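To make “data in motion” a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical stream of sensor readings and a made-up helper called rolling_average; the point is only that each reading updates the insight as it arrives, instead of waiting for the full data set to be stored first.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_average(stream: Iterable[float], window: int = 3) -> Iterator[float]:
    """Yield the average of the last `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # older readings drop off automatically
    for reading in stream:
        recent.append(reading)
        yield sum(recent) / len(recent)


# Hypothetical sensor readings, consumed one at a time as if arriving live.
readings = [10.0, 20.0, 30.0, 40.0]
averages = list(rolling_average(readings, window=2))
print(averages)
```

A real deployment would read from a message queue or event stream rather than a list, but the shape of the logic stays the same.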

What is it trying to solve?
For me, it is not so much about solving a problem as about tapping an opportunity that has been around for a while but has never been exploited. It is an attempt to provide businesses with insights, hidden behaviors and patterns that they didn’t know that they didn’t know. If executed successfully, organizations could benefit by:

• Applying predictive models and scoring against fast-moving data and complex event streams for smarter decisions in real time
• Turning massive amounts of data from online customer behavior and social media activity into valuable and timely business insight
• Becoming a proactive organization by using big data analytics to speed recognition and resolution of problems in customer experiences, supply chains, and business processes
• Addressing new challenges posed by streaming data, social media data, content, events and so on

How is it (BD) different from a conventional Data Warehouse (DW)?
There are fundamental differences like:

• Variety – A DW is better suited to analyzing structured data, while BD solves the “variety” challenge
• Processing – Data in a DW is usually cleansed, enriched and modeled before being stored, giving it a higher value per byte. Data in BD does not go through the same quality controls & checks because of the obvious cost, and is typically stored in its native format.
• Shelf Life – Data in DW can have a much longer shelf life as compared to BD

In a well laid out enterprise solution, a BD solution could push its “reduced” data from a “MapReduce” program permanently into a DW. In other words, BIG DATA will never replace a DW but will complement it.
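
As a rough illustration of that hand-off, the sketch below reduces a few raw events to a small summary and loads only the summary into a warehouse table. Everything here is an assumption made for illustration: the event format, the product_views table, and the in-memory SQLite database standing in for a real DW.

```python
import sqlite3
from collections import Counter

# Hypothetical raw events, kept in their native format on the big data side.
raw_events = [
    "view product=42",
    "view product=42",
    "view product=7",
]

# "Reduce" the raw events to one view count per product.
views = Counter(event.split("product=")[1] for event in raw_events)

# An in-memory SQLite database stands in for the warehouse in this sketch.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE product_views (product_id TEXT PRIMARY KEY, views INTEGER)"
)
warehouse.executemany(
    "INSERT INTO product_views VALUES (?, ?)", sorted(views.items())
)
warehouse.commit()

# The warehouse now answers structured queries over the reduced data only.
rows = warehouse.execute(
    "SELECT product_id, views FROM product_views ORDER BY views DESC"
).fetchall()
print(rows)
```

The raw events never enter the warehouse; only the cleansed, modeled summary does, which is what keeps the value per byte high on the DW side.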

How Does It Work – Technology?
The technology could be summarized as an implementation of Hadoop, Apache’s Java-based open source computing environment built on top of a distributed, clustered file system designed for large-scale data operations. It is based on the “MapReduce” programming paradigm, which breaks a massive job into sub-tasks (mappers and reducers) that manipulate data stored across a cluster of hundreds or thousands of servers for massive parallelism.
Hadoop provides a base platform with Java APIs and requires applications to be built on top of it, often using higher-level languages like Pig, Hive and Jaql that abstract some of the internal complexities. An ideal solution would extend the Hadoop platform & framework with enterprise-grade security, governance, availability and scalability, along with a data visualization tool for easy analysis and insight generation.
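
The map and reduce idea is easier to see in a toy example. The sketch below is plain Python running on one machine, not Hadoop’s actual API; the function names (mapper, shuffle, reducer, word_count) and the input lines are hypothetical, chosen only to show how a job splits into independent sub-tasks.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce step: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


# Hypothetical input; on a real cluster each line could sit on a different server.
lines = ["big data is big", "data is fast"]
counts = word_count(lines)
print(counts)
```

Because each call to mapper looks at one line only, and each call to reducer looks at one word only, both steps can be spread across many servers, which is where the parallelism comes from.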

It would be premature to either write it off or treat it as a universal solution to all measurement, analytics & insight needs, but it has enough momentum to deserve attention, investment & a well-planned and well-architected execution. It is definitely a step in the right direction for any business, and an initiative that is here to stay.

One Reply to “BIG DATA – More Than Just BIG!!!”

  1. Process as a class of Master Data is a concept so strange to most people that it can raise eyebrows or invoke zero reaction. Back in 2005, at the inaugural Gartner BPM summit in Washington DC, I used the concept in a workshop that I was running, and it was a useful linchpin for the workshop, which covered our Process approach at a major US tech company I was working for then. And it’s a solid idea. Process is a class of master data. Everybody knows the chaos that can ensue in Order to Cash if you don’t have such fundamentals in place as Customer, Product and Vendor master data. You have one single data set, you limit who can maintain it but let everyone use it. Now apply that thinking to process (and, by the way, I mean ALL process, not just the 20% that is automated but the 80% that is people doing things manually) and you can start to bring efficiencies that will benefit your customers and stockholders. If you don’t have master data, your audit reports will likely be uncomfortable reading. Isn’t that what marks out Master Data sets? If it is, then there’s no question that Process is another class of master data.
