Big Data: The Next Big Thing in IT

Big Data is about to become the next Big Thing in IT.

Almost every industry will touch Big Data in some way.

Big Data analytics is being used everywhere, from retailers like Target to farming.

So let’s explore this technology, the challenges it faces, and the opportunities it’s creating.

(Subscribe to the iTunes Podcast to receive future episodes)


First of all, what is Big Data?

Big Data is a data set so large or complex that it can’t be handled with traditional data processing.

Challenges in dealing with Big Data include transporting, storing, analyzing, processing, sharing, visualizing, updating, and searching.

As you can see, the challenges are many. This also means the opportunities are many.

Although it’s called Big Data, the “Big” doesn’t necessarily refer to size, though the data sets are often very large.

The term is being thrown around a lot these days. Often it really means user behavior analytics or predictive analytics.

Done right, Big Data analysis offers new insights into complex systems including business trends, crime fighting, city planning, meteorology, genomics, and the environment.

It’s not a walk in the park yet, though. Fueled by low-cost sources including social media and IoT devices, Big Data is growing by several exabytes per day. That’s millions of terabytes of new data generated every day, and it’s accelerating.

These numbers alone point to emerging constraints in bandwidth, storage, and processing.

You might need hundreds or thousands of computers to process the data.

Too many computers for your budget? That’s where the cloud comes in. Check out the Enter the Microsoft Cloud podcast for more on that.

There are many cloud providers with different capabilities. Being able to dynamically spin up thousands of processors for a few hours to crunch some Big Data numbers in the cloud is relatively cheap compared to buying, installing, and maintaining thousands of computers yourself.

Big Data has three primary dimensions: Volume, Velocity, and Variety. Volume is how much of it you have. Velocity is how fast it’s moving; it can be continuous, providing a real-time data feed. Variety is how many different kinds of data you have and how many sources it comes from.

Big Data has been defined as high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.

Machine Learning, a subset of Artificial Intelligence (AI), can be applied to Big Data to identify patterns.
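
To make that concrete, here’s a toy sketch of my own (not from the episode) using scikit-learn’s k-means clustering to pull a pattern out of a data set. The customer numbers are made up for the example.

```python
# A minimal sketch: k-means clustering (scikit-learn) surfacing
# groups, i.e. patterns, in a data set. Sample numbers are invented.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer records: [visits_per_month, avg_spend]
customers = np.array([
    [2, 20], [3, 25], [2, 22],       # light shoppers
    [12, 180], [11, 200], [13, 190]  # heavy shoppers
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(model.labels_)           # which cluster each customer fell into
print(model.cluster_centers_)  # the "patterns" the algorithm found
```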

Business Intelligence uses descriptive statistics to measure and identify trends in Big Data.

Inductive Statistics can be used to show relationships and dependencies, and predict future behavior.
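
Here’s a small, hypothetical sketch of the difference: descriptive statistics summarize what already happened, while inductive statistics extrapolate to future behavior. The sales figures are invented for the example.

```python
# Descriptive vs. inductive statistics on made-up monthly sales figures.
import statistics
from scipy import stats

months = [1, 2, 3, 4, 5, 6]
sales  = [100, 110, 125, 130, 150, 160]

# Descriptive statistics: measure and summarize the trend.
print("mean:", statistics.mean(sales))
print("stdev:", statistics.stdev(sales))

# Inductive statistics: fit a line and predict next month's sales.
fit = stats.linregress(months, sales)
print("predicted month 7:", fit.slope * 7 + fit.intercept)
```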

Other features of Big Data include its variability. Often there’s missing information in the data set, and methods must be used to fill in or ignore the gaps.

Veracity refers to the trustworthiness of the data. Gaps or noise in the data flow itself can muddy the results.
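
As an illustration of filling in or ignoring the gaps, here’s a generic pandas sketch with made-up sensor readings; it’s one common approach, not a prescription.

```python
# A minimal sketch of handling gaps in a data set with pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({"temp": [21.0, np.nan, 23.5, np.nan, 22.0]})

filled  = df.fillna(df["temp"].mean())  # fill gaps with the column mean
ignored = df.dropna()                   # or simply drop incomplete rows

print(filled)
print(ignored)
```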

If you look under the hood of Big Data, you will see the underlying technology required to process all this stuff.

Hadoop is the first thing that stands out.

Over a decade ago, Google engineers looking to optimize parallel processing on their servers created a framework called MapReduce. It allowed them to split processing across multiple servers.

Open-source versions followed, most notably the Apache project called Hadoop.
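
To give a feel for the model, here’s the classic word-count example written as map and reduce steps in plain Python. It’s a single-machine illustration, not Google’s or Hadoop’s actual code; in a real cluster, the map and reduce phases run in parallel across many servers.

```python
# The classic MapReduce word count, simulated on one machine.
from itertools import groupby
from operator import itemgetter

documents = ["big data is big", "data is everywhere"]

# Map: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group pairs by key (Hadoop does this between the phases).
mapped.sort(key=itemgetter(0))

# Reduce: sum the counts for each word.
for word, pairs in groupby(mapped, key=itemgetter(0)):
    print(word, sum(count for _, count in pairs))
```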

Today, these frameworks are meant to make the processing engine transparent to the end user.

It’s the role of a Big Data Architect to design and set up the Hadoop nodes.

So what will this do for your business? By applying Big Data analytics to your large data sets, you can discover trends that aren’t visible using other methods.

Without Big Data, you essentially cannot see the forest for the trees.

Big Data leverages technology to supercharge your analytics and help you spot hidden patterns.

Although Big Data is no longer a cool new buzzword, it’s beginning to reveal its true value to those who know how to deploy and use it properly.

Even the most obscure data can be viewed as part of a significant pattern when used with the right combination of people, processes, and tools.

If you take all your Big Data assets and put them together perfectly, but ignore external web data sources, you will be at a competitive disadvantage.

The wider your data sets, the more precise your measurements will be, and the more powerful your insights.

Once you’ve reached the limit of your data, you’ll have two options for moving forward: dig deeper into your existing data, or acquire more data.

Digging deeper into your existing data can be labor-intensive. For example, you might decide to go after dark data, such as papers in filing cabinets. Digitizing that material and adding it to your data set can take forever.

If you are considering this path, make sure the outcome is worth it.

There is tremendous value for organizations of all sizes in harvesting Big Data. It will enable data-driven decision making and put your organization on the road to success.

2016 was a tremendous year for Big Data. More organizations were storing, processing, and extracting value from data of more shapes and sizes than ever before.

In 2017, the number of systems that support both structured and unstructured data will continue to grow. The market will demand platforms that help data custodians govern and protect Big Data while enabling end users to extract meaningful insights. These systems will mature and find their way into organizations, becoming standard offerings.

The need for speed and usability will drive the adoption of faster databases like Exasol and MemSQL, Hadoop-based stores like Kudu, query engines like Apache Impala, Hive LLAP, Phoenix, and Drill, and OLAP-on-Hadoop technologies like AtScale, Jethro Data, and Kyvos Insights. Together, these will enable faster queries and make Big Data feel like a traditional data warehouse.

We are seeing non-Hadoop solutions to Big Data as well. Relational databases are becoming Big Data ready. Microsoft SQL Server 2016 supports JSON, which stands for JavaScript Object Notation. It’s easy for people to read and easy for machines to process, and Big Data sets can be stored in it.
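
Here’s a quick illustration of why that’s handy: a generic Python sketch, with an invented record.

```python
# JSON is human-readable text that machines can parse in one call.
import json

record = '{"customer": "C-1001", "visits": 12, "tags": ["loyal", "mobile"]}'

data = json.loads(record)          # parse the JSON text into a Python dict
print(data["visits"])              # -> 12
print(json.dumps(data, indent=2))  # serialize it back, pretty-printed
```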

We are heading in the direction of analytics on all data. Platforms that can take in data from any source will rise, while those built just for Hadoop will fall.

Big Data uses a concept called a data lake. The water in this lake is, of course, data. To build one, you build a dam (a cluster of servers) and fill it up. You can then use the data for predictive analytics, machine learning, and more.

Hydrating the lake has been a goal in its own right. This will change going forward, as the business need for Hadoop narrows.
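
To picture the idea, here’s a toy sketch of hydrating a lake and reading from it later. The paths and fields are hypothetical, and a real lake would sit on something like HDFS or cloud storage rather than a local folder.

```python
# A toy "data lake": raw events land in files as-is; analysis comes later.
import json
from pathlib import Path

lake = Path("lake/raw/events")
lake.mkdir(parents=True, exist_ok=True)

# Hydrate the lake: write raw events without forcing a schema up front.
events = [{"sensor": "s1", "temp": 21.5}, {"sensor": "s2", "temp": 19.8}]
(lake / "2017-01-01.json").write_text(
    "\n".join(json.dumps(e) for e in events)
)

# Later, an analytics job reads whatever is in the lake.
for path in lake.glob("*.json"):
    for line in path.read_text().splitlines():
        print(json.loads(line))
```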

Organizations will require faster, repeatable answers to their questions, agile style.

Better outcome management will drive smarter investment in people, data, and infrastructure, forging a closer union between business and IT. Big Data assets will become self-serve.

In 2017, architectural flexibility is king. Variety, not volume or velocity, will drive the biggest investments.

You can bet on Apache Spark and Machine Learning to light up Big Data. Microsoft Azure ML is taking off as THE user-friendly platform that integrates well with existing Microsoft investments.
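
Here’s a minimal PySpark sketch of the kind of thing Spark makes easy. It assumes a local Spark installation, and it reuses the hypothetical lake file from the earlier sketch.

```python
# A minimal PySpark sketch: load JSON data and aggregate it with Spark.
# Assumes pyspark is installed; the file and column names are made up.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("BigDataSketch").getOrCreate()

# Spark reads newline-delimited JSON straight into a DataFrame.
events = spark.read.json("lake/raw/events/2017-01-01.json")

# Spark distributes this aggregation across however many nodes you have.
events.groupBy("sensor").avg("temp").show()

spark.stop()
```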

Making the data approachable to the end user is increasingly critical. Self-service software providers will be well positioned to solve this problem.

The business plan for most Silicon Valley organizations in 2016 was to add AI to X and create new IoT devices. The convergence of IoT, Cloud, and Big Data will create new opportunities for self-service analytics companies.

One of the biggest challenges now is to make Hadoop data accessible to users. There are platforms helping with this, but there is plenty of room for improvement.

Some companies focusing on this area include Alteryx, Trifacta, and Paxata. They are lowering the entry bar for late Hadoop adopters.

Big Data isn’t some future thing anymore. It’s moving into the IT department’s core. The data needs to be secured and governed. Apache Sentry and Apache Atlas have stepped up to the plate to provide security and governance, respectively. Security administration can be handled with Apache Ranger.

Clients will simply expect these capabilities, thus breaking down adoption barriers.

Gone will be the days when data gets thrown out. We are going to keep everything. This will create a new problem: how do we find it all? Metadata catalogs will be the key. Companies like Alation and Waterline are already aligning themselves in this space. They are using Machine Learning to find data in Hadoop.

Following the demand for self-serve access to data will come demand for self-service data discovery.

How do we know for sure this is the direction we are heading?

Because one of the best ways to predict future change is to look at the forces that aren’t changing. These forces include our desire to get everything faster and cheaper.

This is what will drive Big Data to be the Next Big Thing in IT.

Thank you for reading.

