Data analytics 101: What it means, and why it matters

Mike Perrow, Technology Evangelist, Vertica
 

Incorporating analytics into the enterprise toolkit can be straightforward or complex, relatively inexpensive or a multimillion-dollar initiative. It all depends on what you want to accomplish.

Analytics is simply a set of practices designed to aid analysis. It can be as basic as creating a paper report that shows monthly snowfall in a particular region, a spreadsheet that uses formulas to calculate entries from columns A and F and then write the results to column Y, or the results of far more complex methods of data reading, sorting, and discovery. 

Big data analytics, a subset of analytics overall, is about the process of managing and investigating large datasets, typically stored in a repository such as a data warehouse or data store. These may reside on one or more public clouds or on an organization's own on-premises hardware.

The sources of that data can be nearly any kind of system. Traditionally, these sources provide structured data, including transactional (sales-related) data from point-of-sale, inventory, and customer relationship management systems, plus records from related systems. Structured data also includes information from IT operational systems and machine data from sources such as remote sensors and other parts of the Internet of Things.

Less traditional is the storage of unstructured data for analytical purposes. Unstructured data—photos, video, audio, etc.—has been considered viable for analysis only over the past 15 years or so.

Here's what you need to know to start to make sense of all of your data.

The modes of data query and analysis

Regardless of the data type, you store it to eventually learn what it may reveal. And you need an analytics process to help make sense of what's there by asking questions that a database can interpret. This involves writing a query, done via a language that a database understands, such as SQL (Structured Query Language).

 

Querying a database via SQL long predates the big data phenomenon. SQL database vendors in the early '90s made it easier for businesses to use SQL to understand trends in their acquired data. Another part of that shift was the stored procedure, which external applications could call to automate processing of back-end data storage and end-user results.

Nowadays, a big data query can certainly be an ad hoc question. A business user might want to know, for example, how many bicycles the company sold to medium-income buyers in Illinois during the month of October.
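As a rough illustration, the bicycle question above could be written as a single SQL query. The sketch below runs it from Python against an in-memory SQLite database. The table names, column names, and sample rows are all hypothetical, invented only to make the example runnable.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT, income_band TEXT);
    CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_type TEXT, sold_on TEXT);
    INSERT INTO customers VALUES
        (1, 'IL', 'medium'), (2, 'IL', 'high'),
        (3, 'WI', 'medium'), (4, 'IL', 'medium');
    INSERT INTO sales VALUES
        (1, 1, 'bicycle', '2023-10-03'),
        (2, 1, 'helmet',  '2023-10-03'),
        (3, 2, 'bicycle', '2023-10-12'),
        (4, 3, 'bicycle', '2023-10-20'),
        (5, 1, 'bicycle', '2023-11-02'),
        (6, 4, 'bicycle', '2023-10-28');
""")


def bikes_sold(conn, state, income_band, start, end):
    """Count bicycle sales to one income band in one state, start <= date < end."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM sales AS s
        JOIN customers AS c ON c.id = s.customer_id
        WHERE s.product_type = 'bicycle'
          AND c.state = ?
          AND c.income_band = ?
          AND s.sold_on >= ? AND s.sold_on < ?
        """,
        (state, income_band, start, end),
    ).fetchone()
    return row[0]


# Dates are stored as ISO-8601 text, so string comparison orders them correctly.
october_count = bikes_sold(conn, "IL", "medium", "2023-10-01", "2023-11-01")
print(october_count)
```

Passing the state, income band, and date range as parameters means the same query can answer the next ad hoc question without being rewritten.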

Setting up standard queries

But a big data query is not always a "question" in the traditional sense of one that occurs to an analyst to ask relative to some emerging situation. Often, queries are set up to work automatically for routine reporting, or as some event triggers the need for analyzing the data on hand and producing some behavior.

In other words, standard queries can be set up to produce reports on a regular basis, for inventory and supply chain correlations, or about any number of routine operating statistics.

But standard queries can be set up for dynamic situations as well. Say a customer comes to your site and begins clicking on the images and high-level descriptions of bicycles you have for sale. A single click on a bike's name and photo will take the customer to more details about the particular product.

But that same click can cause a variety of related items to display in the margins of the page—helmets, toe clips, side packs, lights, and other accessories. Most of us have experienced this kind of automated behavior when interacting with a commercial web page.

In this example, a user's click on a specific bike triggers a preset query that's designed to read a database of related items and display those items for the user to consider. Simple enough.
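One way to picture such a preset query is as a fixed SQL statement whose only variable is the product that was clicked. The sketch below assumes a hypothetical `related_items` table that maps each bike to its accessories. The schema, the sample data, and the `on_product_click` function name are illustrative assumptions, not a description of any particular e-commerce system.

```python
import sqlite3

# Hypothetical catalog, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE related_items (product_id INTEGER, accessory_id INTEGER,
                                display_order INTEGER);
    INSERT INTO products VALUES
        (1, 'Trail bike'), (2, 'Road bike'),
        (10, 'Helmet'), (11, 'Toe clips'), (12, 'Lights');
    INSERT INTO related_items VALUES
        (1, 10, 1), (1, 12, 2), (1, 11, 3), (2, 10, 1);
""")

# The preset query: written once, then run on every click with a new parameter.
RELATED_ITEMS_SQL = """
    SELECT p.name
    FROM related_items AS r
    JOIN products AS p ON p.id = r.accessory_id
    WHERE r.product_id = ?
    ORDER BY r.display_order
    LIMIT ?
"""


def on_product_click(conn, product_id, limit=3):
    """Return accessory names to show in the page margin for the clicked product."""
    rows = conn.execute(RELATED_ITEMS_SQL, (product_id, limit)).fetchall()
    return [name for (name,) in rows]


margin_items = on_product_click(conn, 1)
print(margin_items)
```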

Now, imagine this is a returning customer, George, and you have some data accumulated on him—one or more items he's bought in the past, or items he moved to the shopping cart but didn't purchase, or any items clicked on during past site visits. In this case, you have even more intelligence regarding what to display in your margins for George.

That means your preset query not only returns items related to George's current search, but items that specifically match his past behavior, and possibly his continued interest.
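A minimal sketch of that idea: take the related items the preset query returned and reorder them using what is known about the visitor. The event types, the weights, and the category labels below are assumptions chosen for illustration. A real system would likely use a richer signal than a weighted count.

```python
from collections import Counter

# Assumed weights: a purchase says more about interest than a click.
EVENT_WEIGHTS = {"purchased": 3, "carted": 2, "clicked": 1}


def personalize(related, history, limit=3):
    """Reorder (name, category) items by the visitor's past interest per category.

    history is a list of (event, category) pairs from earlier visits.
    """
    interest = Counter()
    for event, category in history:
        interest[category] += EVENT_WEIGHTS.get(event, 0)
    # sorted() is stable, so items with equal interest keep their original order.
    ranked = sorted(related, key=lambda item: interest[item[1]], reverse=True)
    return [name for name, _category in ranked[:limit]]


related = [
    ("Helmet", "safety"),
    ("Toe clips", "pedals"),
    ("Side pack", "bags"),
    ("Lights", "safety"),
]
george_history = [("clicked", "bags"), ("carted", "bags"), ("clicked", "safety")]

for_george = personalize(related, george_history)
for_new_visitor = personalize(related, [])
print(for_george)
print(for_new_visitor)
```

With no history, the function falls back to the default order, which matches the first-time-visitor case described earlier.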

Analytics' role in tracking consumer behavior

Keeping track of consumer behavior is how video or music services make movie and song recommendations for you. Netflix, for example, uses prior data from customers to provide you with these kinds of educated guesses. It's also how your favorite outdoor-gear retailer "knows" that you're more likely to purchase clothing than climbing equipment, for instance, and therefore sends you targeted email when it begins a two-week sale.

In all cases, some system is running in the background, making recommendations for users based on accumulated data, predicting likely patterns of behavior, and adjusting offerings based on further data that the system gathers as the user browses the website in response to an email campaign.

Being able to set up this sort of cause-and-effect sequence of likely consumer paths is essential to intelligent business operations. But speed is also a major factor, since the faster a system can respond to these events—e.g., a click here, returning a new display there—the faster a business can act on that insight.

When you compare that capability to what happens on a static web page, where just the details of a bike the user clicks on get displayed, you see the opportunities that a more dynamic, data-driven process can create for the site owner.

Why more data is always better

Big data analytics can also show previously unrecognized patterns based not just on buying habits, but also on equipment failure rates, sentiment analysis of massive volumes of text messages, or health indicators such as heart rate and blood pressure. The more data, the more correlations between various factors you can verify and use to predict and achieve the desired outcome.

Understanding consumer behavior and how to respond is one thing. Businesses can also please customers and help ensure repeat business by understanding their own operational needs and dependencies.

Here's a simple example: Suppose you operate a pizza delivery business that supports multiple restaurants in a given area. Depending on geography, driving times based on traffic patterns vary throughout the day, from morning to midday to evening rush hour. Ideally, you can analyze where driving complexities lie during certain times of day. Google can provide basic route data, and estimate time to delivery.

But an accumulation of local data (road construction work, current traffic conditions, and alternative routes) lets you get more specific. You might learn, say, that delivery times for some restaurants vary by plus or minus five minutes, and you can advise your customers accordingly.

You can also incorporate weather data: What's the forecast? Those adjustments can allow more accurate delivery-time predictions. Without a big data analytics tool, you can't predict your delivery time with much accuracy.
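To make the "plus or minus five minutes" idea concrete, here is a deliberately small sketch that groups past deliveries by restaurant and time of day, then reports a typical time and the largest deviation seen. The record layout, the restaurant names, and the numbers are hypothetical. Traffic and weather are not modeled here, though under the same approach they could become additional grouping keys.

```python
from collections import defaultdict
from statistics import mean


def delivery_windows(records):
    """Map (restaurant, period) to (typical minutes, plus-or-minus minutes).

    records is an iterable of (restaurant, period, minutes) tuples.
    """
    grouped = defaultdict(list)
    for restaurant, period, minutes in records:
        grouped[(restaurant, period)].append(minutes)

    windows = {}
    for key, durations in grouped.items():
        center = mean(durations)
        # Largest observed distance from the mean, used as a simple tolerance.
        spread = max(abs(d - center) for d in durations)
        windows[key] = (round(center), round(spread))
    return windows


# Hypothetical delivery log, for illustration only.
log = [
    ("Northside", "evening", 30),
    ("Northside", "evening", 35),
    ("Northside", "evening", 40),
    ("Downtown", "midday", 20),
    ("Downtown", "midday", 22),
]

windows = delivery_windows(log)
typical, tolerance = windows[("Northside", "evening")]
print(f"Northside, evening: about {typical} minutes, plus or minus {tolerance}")
```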

Common modes of analytics

The key word in the previous paragraph is "predict." Predictive analytics is the art and science of probability-based data assessment and decision making. But it isn't the only type of analysis that big data techniques can help with. 

 

Descriptive analytics

This type of analysis "describes" a state or condition that needs to be understood. One simple example: a printed report that shows how many trucks within a fleet of vehicles are available for deployment next Thursday. Electronic dashboards showing system throughput and computational workloads across a business's IT environment are another; they use data to help IT teams balance the loads properly and reduce bottlenecks.

Returning to the bicycle website example, when George clicks on a bike image and all he gets is a set of details about that particular model, such as the average customer rating, that's another example of descriptive analytics.

This sort of user experience is little more than a hierarchical display of goods for sale—a report, in a sense—with higher-level elements (bike photo, name, price) leading to lower-level elements (specifications, construction details, colors). This is certainly helpful information, but it's not nearly the extent of what can be accomplished with data. 

To be clear, descriptive analytics is extremely useful in some circumstances. It can help pinpoint what is wrong in a given situation so that corrective steps might be taken. When you feel sick and visit a doctor, you expect the doctor to know what to do about your condition so you'll feel better.

As long as your condition can be described and it's nothing out of the ordinary, it's likely something that can be fairly easily diagnosed. For instance, your blood pressure can be compared against others in your age group, your doctor can show you the average (mean) statistics, and either assure you all is well or advise a new regimen to improve your numbers.
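The blood pressure comparison is descriptive analytics at its simplest: compute a group average and see where one reading falls relative to it. The sketch below assumes ten-year age bands and uses made-up readings purely to show the arithmetic. It is not clinical guidance.

```python
from statistics import mean


def age_band(age, width=10):
    """Return the (low, high) age band containing age, e.g. 44 -> (40, 49)."""
    start = (age // width) * width
    return (start, start + width - 1)


def compare_to_age_group(age, systolic, population):
    """Compare one systolic reading with the mean of the same age band.

    population is a list of (age, systolic) pairs.
    Raises statistics.StatisticsError if no peers fall in the band.
    """
    band = age_band(age)
    peers = [s for a, s in population if age_band(a) == band]
    group_mean = mean(peers)
    return band, group_mean, systolic - group_mean


# Illustrative sample values only.
population = [(42, 118), (45, 124), (48, 130), (33, 115), (57, 135)]

band, group_mean, difference = compare_to_age_group(44, 132, population)
print(f"Ages {band[0]}-{band[1]}: group mean {group_mean}, patient is {difference:+} from it")
```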

Predictive analytics

This type of analytics is designed to predict an upcoming event, or at least understand the probability of something occurring. As such, it gives you a powerful tool for succeeding in a specific endeavor. In e-commerce, for example, the ability to predict your customers' buying habits based on known patterns is a great way to plan for your next steps.

If George, your prospective bike purchaser, is a returning customer, great. You know something about his buying habits, his preferences, and his decision-making process. But even if George is a new customer, the data you've retained from all your other customers can keep George interested and more likely to buy from you, or at least return at some later date. How does that work?

This requires you to build a predictive model, based on buckets of customer information acquired over months or years of interaction with bicycle customers. If George comes to the site and you have a collection of data that fits his demographics, what do customers like him typically buy? Can you assign a score to George based on his apparent interests and offer him something that has proved successful with other customers at his interest level?

If you have hundreds of customers like him, you have the basis for a model to use for displaying the accessories discussed earlier, plus other bikes, plus any number of elements from your retail bag of tricks that might keep him browsing. There are no guarantees, of course. But a user experience based on predictive analytics is clearly superior to one based on simple descriptive analytics.
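As a sketch of the scoring idea, the code below uses segment-level purchase rates, a deliberately simple stand-in for a real predictive model. It records how often each offer led to a purchase among past customers in a segment, then picks the offer with the best rate for a new visitor in the same segment. The segment labels, offers, and history are all hypothetical.

```python
from collections import defaultdict


def build_model(history):
    """Return {segment: {offer: purchase rate}} from past interactions.

    history is an iterable of (segment, offer, bought) tuples.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for segment, offer, bought in history:
        tally = counts[segment][offer]
        tally[0] += int(bought)  # purchases
        tally[1] += 1            # times the offer was shown
    return {
        segment: {offer: b / n for offer, (b, n) in offers.items()}
        for segment, offers in counts.items()
    }


def best_offer(model, segment):
    """Return (offer, score) with the highest purchase rate, or None if unknown."""
    offers = model.get(segment)
    if not offers:
        return None
    return max(offers.items(), key=lambda pair: pair[1])


# Hypothetical interaction history, for illustration only.
history = [
    ("30s-urban", "helmet", True),
    ("30s-urban", "helmet", True),
    ("30s-urban", "helmet", True),
    ("30s-urban", "helmet", False),
    ("30s-urban", "lights", True),
    ("30s-urban", "lights", False),
    ("30s-urban", "lights", False),
    ("30s-urban", "lights", False),
]

model = build_model(history)
offer_for_george = best_offer(model, "30s-urban")
print(offer_for_george)
```

The score is only as reliable as the number of past customers behind it, which is why the size of each segment matters.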

Use predictive models to manage the unknown 

Now, here's a great thing about big data analytics: There are patterns lurking within the data that you're likely not even aware of but that you can possibly discover and put to work. That's where predictive analytics based on truly big data comes into play.

In the bike website example, the datasets amassed over time might not be huge, but they have helped you and your team understand some essential patterns that reveal buying behavior, including some patterns that describe a customer who is not interested. When you apply those same patterns to truly large datasets and use a system that can identify similar correlations between one activity and another, wholly new patterns that can lead to success may become evident.

Simply put, patterns are maps to behavior, human or otherwise. But why does it matter?

Imagine being able to sort through hundreds of thousands of details from your sales transactions to discover whether or not men in Canada between the ages of 25 and 40 tend to buy your thicker, dark socks in October. You do some correlations, and you find out—no! They tend to buy brighter colors during that month.

You can now run a sale and web promotion on those items—and watch your sales increase by 30%. You can add more variables, too, and test theories about buying habits based on income range or seasonal conditions. Is this going to be a harsh winter? Will that increase the probability of my sock sales or not?
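The sock question comes down to filtering transactions to one group and one month, then comparing the share of each color family. The sketch below shows that calculation on a handful of hypothetical rows. The field names and data are assumptions, and a real analysis would run over far more records and check whether the difference is statistically meaningful.

```python
from collections import Counter


def color_share(transactions, country, gender, min_age, max_age, month):
    """Return {color_family: share of purchases} for one customer group and month."""
    counts = Counter(
        t["color_family"]
        for t in transactions
        if t["country"] == country
        and t["gender"] == gender
        and min_age <= t["age"] <= max_age
        and t["month"] == month
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {color: n / total for color, n in counts.items()}


# Hypothetical sock sales, for illustration only.
transactions = [
    {"country": "CA", "gender": "M", "age": 28, "month": 10, "color_family": "bright"},
    {"country": "CA", "gender": "M", "age": 35, "month": 10, "color_family": "bright"},
    {"country": "CA", "gender": "M", "age": 39, "month": 10, "color_family": "dark"},
    {"country": "CA", "gender": "M", "age": 31, "month": 10, "color_family": "bright"},
    {"country": "CA", "gender": "M", "age": 52, "month": 10, "color_family": "dark"},
    {"country": "US", "gender": "M", "age": 30, "month": 10, "color_family": "dark"},
    {"country": "CA", "gender": "M", "age": 30, "month": 7, "color_family": "dark"},
]

october_share = color_share(transactions, "CA", "M", 25, 40, 10)
print(october_share)
```

Adding variables such as income range or weather would mean adding fields to each record and further conditions to the filter.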

Removing some of the guesswork

You might argue this is just an example of the educated guesswork that businesses have used for years, and you might be right. But big data analytics can take guesswork out of the picture when you encounter situations such as this.
