CIO

Why big data is a big deal

But 38 per cent of organisations still don’t understand what it is

Big data is a relative term. We have been managing data growth since the 1880s, when the American statistician Herman Hollerith developed a mechanical tabulator, based on punched cards, to compile statistics from millions of pieces of data.

Hollerith’s tabulator was used in the 1890 United States Census, which as a result was completed months ahead of schedule and significantly under budget. Hollerith went on to found the company that would later become IBM.

From those early days of data collection and analysis, organisations and individuals have shown an increasing appetite for collecting and analysing data for purposes such as operational efficiency, sales and marketing, and forecasting.

As organisations found new uses for data, and began to ask more complex questions of it, technology evolved to meet the demands of managing it, leading to a symbiotic relationship between information and technology.

Until then, data had been a measurable quantity, organised in a structured manner that made it easily identifiable and retrievable. Technology evolved at a commensurate pace, keeping the equation balanced and relatively unchanged.

However, this relied on all relevant data being managed centrally, where it could be neatly indexed, filed in rows and columns across multiple layers, and then called upon and manipulated to give any number of results based on the questions asked.

This is the typical modus operandi of a relational database management system (RDBMS), which stores an organisation’s operational data and is used to derive insights at various stages of the information lifecycle.

Regardless of where, when and how information was collected, it was rendered to the database structure so it could be queried with structured query language (SQL). This model of information management gave rise to the database administrator (DBA), who became something of a demigod in the IT department as he managed the loads and bottlenecks in the flow of information to an organisation’s mission-critical applications.

This also brought about bespoke technologies that automated many of the DBA’s functions and allowed him to concentrate on more important tasks in managing an ever-growing data warehouse.

As queries became more complex and multi-dimensional, business intelligence tools came to the forefront, turning the data organisations collected and stored into meaningful insights for managing their business operations.

Thus far the DBA existed in relative comfort, orchestrating the flow of information in and out of the data warehouse with the help of many management tools that kept the organisation running at an optimum level. Then something unexpected happened. To understand this, we need to look quickly at what IDC classifies as the third platform of technology.

Burning platforms

The first platform saw users connecting to a central computer, either a mainframe or some other host system through a terminal. The second platform saw this evolve to the use of personal computers in a client-server relationship, and then as the Internet came into the equation, application servers and web-enabled applications.

The third platform, the current iteration, sees a democratisation of technology across enterprises and consumers through such trends as mobile devices, cloud computing, social media, application development platforms and analytics.

The third platform does not restrict access to any of the above. Where the first and second platforms were the domain of the enterprise, the third platform now lies in the palms of consumers of all ages, both to create and access information. This contributes to what IDC calls the digital universe, and it is the exponential growth of the digital universe which has led to the phenomenon known as big data.

By IDC’s estimation, 90 per cent of the world’s total data has been created in the last two years and 70 per cent of it by individuals. IDC predicts the digital universe will expand in 2013 by almost 50 per cent to just under 4 trillion gigabytes. Despite this, 38 per cent of organisations still don’t understand what big data is.

Big data is a term that has been thrown around quite extensively over the last couple of years, and in the process has been misused, misaligned, misconceived and misinterpreted.

At its heart, big data describes the sudden explosion of data through the proliferation of smartphones, tablets, sensors, scanners, machines and other sources of electronic information, but the concept is far more encompassing than that.

IDC defines big data as a new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis.

The real problem with big data is not so much volume. Technologies are continually evolving to manage the growth of data, and the Hadoop Distributed File System seems to be the emerging standard most solutions are adopting. The real problem lies in the variety and velocity of data.

Big data is messy. It is unstructured and does not fit neatly into the rows and columns of the relational database. It is varied and comes in different types and from different sources.

Organisations are now collecting social media feeds, images, streaming video, text files, documents, telemetry data and so on, reading everything from sentiment, to expression, to electronic forms, to genomes, to soil temperatures and pH levels. This variety of data is hard to render into a structured format and almost impossible for a standard query language to interpret.

Data is being created as fast as it is being collected. High-velocity and streaming data can become obsolete minutes after it is created, as in the movement of markets on a stock trading floor, or multimedia streams used for surveillance and security. The challenge is to be able to act on insights from information that is ever changing.

However, even the variety and velocity of data may be the least of an organisation’s concerns. In a recent IDC study, which polled 300 organisations from all industries across Australia, 47 per cent of respondents revealed they do not have the skill sets required to manage big data.

While the DBA was quite adept at, and trusted with, managing the structured data in a relational database, he is suddenly out of his depth when it comes to mapping and contextualising large volumes of data of different types and sources.

The dilemma is that the skill sets required to manage big data are not those a DBA can typically up-skill for, which leaves many organisations exposed when it comes to dealing with the new unstructured data coming into their environment.

Skills required

The skill sets required to manage big data are less typical of our preconceived image of what an IT professional should be. While there is still an element of technical acumen needed, the actual tasks involved in measuring and interpreting unstructured data are more suited to people skilled in mathematics and physics. It is more scientific than technical, giving rise to such titles as data scientist, data modeller or analytician.

In an attempt to accelerate the availability of these skill sets in the market, many vendors are partnering with universities on programs and courses to attract students to big data as a lucrative career option, but it remains to be seen how quickly supply will meet demand in a landscape that is becoming more competitive.

Furthermore, people who have these skill sets, or come out of universities where these skills have been honed to perfection, will be in high demand. We are more likely to see them make a name for themselves through new start-ups in the style of industry luminaries such as Gates, Jobs, Ellison and Page.

This raises a very interesting scenario for end-user organisations. If skilled graduates are absorbed by the vendor community or go into business for themselves, this leaves a significant skills gap for end-user organisations to fill; and until they are able to do so, they face the very real possibility of costly consulting engagements in their big data projects. End-user organisations will be challenged to circumvent this scenario with strategies to attract skilled professionals.

Big data has been correctly labelled a disruptive force in IT, as it caught the market by surprise and left organisations flummoxed about how to deal with it. No one could have anticipated that the third platform technologies they had adopted as a way of doing business would result in the big bang that created the digital universe.

A sense of ownership

Discussion about big data continues as technology vendors vie with one another to “own” the concept, each claiming a certain exclusivity over it through some piece of cutting-edge technology in an attempt to harness its commercial potential. This is not necessarily a bad thing.

The phenomenon of big data, along with other disruptive forces such as cloud computing, mobility and social media, has fuelled a new renaissance in innovation and technological advancement that has implications far beyond the corporate sphere.

The discussion around big data has created a certain air of expectation. The Australian market was initially seen as slow to adopt big data technologies, but this perception was fuelled by the misconception that big data is only about volume.

As yet, most Australian organisations do not have anywhere near the hundreds of terabytes of data typical of organisations in Europe and the US. The focus for Australian organisations revolves more around the variety of data they collect and what use they put it to.

There are many in the market and in the industry who dismiss big data as mere hype. However, this does not make the reality of the situation go away. We are still faced with a growing volume and variety of data at a pace never experienced before.

Where technology has traditionally been able to evolve to meet the demands of the collection and analysis of information, the data deluge we are seeing currently is like a tsunami we were unprepared for.

It is not difficult to see how hype and misconception around big data have led some to become disillusioned with the whole concept. I would suggest, however, that the disillusionment only results from not being able to see its full potential.

There are already examples of big data being used outside the corporate sphere, such as in crime prevention, where law enforcement agencies can identify crime hotspots through crowdsourcing and take pre-emptive action that ultimately contributes to a better quality of life in the neighbourhood.

While this is just one example, there are many other use cases for big data that would have a tremendous impact on the quality, and indeed the value, of human life. The digital universe is constantly expanding and, if it does not already, will soon hold the formula to eradicate terminal disease, to predict the weather patterns that lead to drought and famine, and to identify and trace the roots of terrorism.

While big data may still have a way to go in solving many of the world’s problems, it was the individual, through the use of third platform technologies, who gave rise to big data. And it will be the individual, in the form of the data scientist, the data modeller and the analytician, who drives innovation in the footsteps of Herman Hollerith.

Big data is a relative term. It has come full circle and is likely to do so many more times yet.