
7 tips for designing successful big data applications

Paul Korzeniowski Blogger, Independent

Big data applications are becoming a major force in many industries. Healthcare technology company Cerner works with doctors to more accurately diagnose potentially fatal bloodstream infections. Farm management software company FarmLogs relies on real-time analytics to improve growing conditions, vegetative health, and harvest yields. Online dating site eHarmony analyzes personal information with the goal of making the right match.

As a result of such applications, big data technology is hot, hot, hot: market research firm International Data Corporation (IDC) projects a 26.4 percent compound annual growth rate for the market, with revenue reaching $41.5 billion by 2018. As evidence of big data's significant impact, that growth rate is about six times higher than that of the overall information technology (IT) market, which is growing at 3.8 percent in 2015, according to IDC.

Despite all the Hadoopla, enterprises discover that big data deployments are often strewn with potential pitfalls. These applications don't follow the typical deployment process, so developers must think and act outside the box. Initial roll-out costs can be high and return on investment (ROI) can be amorphous, so getting a new project off the ground can be challenging. Working with ginormous volumes of data means programmers must guard against potential performance issues.

But programmers can take steps to increase the likelihood of successful development by setting clear expectations, starting small, and cleansing data near its source. Here are seven recommendations from the experts.

1. Don't treat big data like other projects

"Deploying a big data application is different from working with other systems," said Nick Heudecker, research director at Gartner. Big data vendors don't offer off-the-shelf solutions but instead sell various components (database management systems, analytical tools, data cleaning solutions) that businesses tie together in distinct ways. Consequently, developers find few shortcuts (canned applications or usable components) that speed up deployments.

In addition, each firm's data and the value the firm associates with that data are unique, so there's no simple, straight line from project conception to production. Instead, developers have to work closely with business units to craft and constantly refine design requirements. The end result is that a lot of the development work falls on the business's shoulders. In fact, 72 percent of the costs associated with big data come from personnel, according to Anne Moxie, analyst at Nucleus Research, Inc. When beginning a project, developers need to get ready to hunker down, roll up their sleeves, and dig in for a long, sometimes tedious process.

2. Write specs in pencil, not pen

Defining clear project objectives is another area where big data is an odd duck for IT pros. Typically, management sets clear goals at the start of a project—for example, improving the user interface of a web page. But targets are often murky in the beginning of a big data project, which is often simply about exploration. Companies mine large sets of data with the hope (and usually no guarantee) of discovering valuable business insights that will streamline processes or increase sales. At the project's beginning, the potential benefits are often largely uncertain, and they only become clearer as the work unfolds.

Big data application development is an iterative process requiring patience and faith. "A corporation may start down the wrong track 19 times before hitting pay dirt on the 20th attempt," said Gartner's Heudecker. Developers need to prepare for a process where the end goal is a vague hope rather than a clear objective, and where the next step often alters (and sometimes scraps) the previous one.

3. Think long-term rather than short-term ROI

Normally, before top managers approve a new project, they want to understand its potential pay-off. A common cost-justification methodology is ROI, where one measures a project's potential value versus its initial costs. "Typically, new projects promise increased revenue or decreased expenses," said Nucleus Research's Moxie.

In most cases, the return is clear at the start of a project, but as noted, big data comes with no such assurances. In fact, firms initially lose a lot of money on their big data projects: Wikibon.com found that first-time projects deliver $0.55 for every $1.00 spent.
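To make that figure concrete, here is a minimal sketch of the arithmetic. It assumes the common definition of ROI as (return minus cost) divided by cost; the function name `roi` is illustrative and does not come from Wikibon or Nucleus Research.

```python
def roi(total_return: float, total_cost: float) -> float:
    """Return ROI as a fraction: (return - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_return - total_cost) / total_cost


# Figure cited in the article: $0.55 returned for every $1.00 spent.
first_project_roi = roi(0.55, 1.00)
print(f"First-project ROI: {first_project_roi:.0%}")
```

Under that definition, a first-time project that returns $0.55 per dollar loses 45 cents on every dollar invested, which is the gap developers have to explain when they ask management to look past the first project.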

Such results are unwelcome news to top management. Consequently, developers need to shift the executive focus from now to the future. "Big data projects carry significant risks but they also deliver big rewards," noted Samar Forzley, managing director at Market Drum Corporation. Dramatic returns do occur (eventually) in some cases; for example, a vacation resort cut its labor costs by more than 200 percent by syncing its scheduling processes with National Weather Service data, according to Moxie.

4. Start small and cheap

Big data is, not surprisingly, big. "One client had 50 terabytes of information that they were working with," said Dave Beulke, president of Dave Beulke & Associates, which specializes in big data application development. As the Internet of Things takes shape, even more information will be gathered. One way to doom a new project is by shooting for the stars. Large projects can cost millions of dollars. The board of directors won't easily sign off on such expenditures, especially since the return is so tenuous.

Instead, developers must work with the business unit and convince it to start small with a limited proof-of-concept project. "There is no need to immediately buy a new Hadoop database and the infrastructure needed to support it," said Market Drum's Forzley. "In many cases, developers can piggyback on existing pools of departmental data and limit initial big data investments." Starting small enables programmers and business users to become more comfortable with the technology and build on their experience.

5. Let users play

Big data involves more art than science compared to typical IT projects. Developers need to ensure that their systems are flexible, so employees can "play" with information. One way to meet that need is by constructing sandboxes, practice areas where data scientists and business users experiment with data—ideally with tools, languages, and environments they're familiar with, according to Gartner's Heudecker.

Faceted search can be another helpful tool. Traditionally, database management systems housed information in strict hierarchies that allowed only one way of accessing the data. Faceted systems classify each information element along multiple independent dimensions, called facets. Taking this step enables data to be accessed and ordered in multiple ways rather than by a single, predetermined method.
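A minimal sketch of the idea, assuming a small in-memory dataset: each record is indexed under every facet value it carries, so users can combine filters in any order. The records, field names, and helper functions here are hypothetical and only illustrate the technique; production systems typically rely on a search engine for this.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "region": "east", "product": "loans", "year": 2014},
    {"id": 2, "region": "west", "product": "loans", "year": 2015},
    {"id": 3, "region": "east", "product": "cards", "year": 2015},
]
FACETS = ("region", "product", "year")


def build_facet_index(records, facets):
    """Map each (facet, value) pair to the set of matching record ids."""
    index = defaultdict(set)
    for rec in records:
        for facet in facets:
            index[(facet, rec[facet])].add(rec["id"])
    return index


def facet_search(index, all_ids, **filters):
    """Return the ids that match every facet=value filter given."""
    result = set(all_ids)
    for facet, value in filters.items():
        result &= index.get((facet, value), set())
    return result


def facet_counts(index, ids, facet):
    """Count how many of the given ids fall under each value of one facet."""
    counts = {}
    for (name, value), members in index.items():
        if name == facet:
            overlap = len(members & ids)
            if overlap:
                counts[value] = overlap
    return counts


index = build_facet_index(RECORDS, FACETS)
all_ids = {rec["id"] for rec in RECORDS}

east = facet_search(index, all_ids, region="east")
east_2015 = facet_search(index, all_ids, region="east", year=2015)
print(sorted(east), sorted(east_2015))
print(facet_counts(index, east, "product"))
```

The same index answers "east region first, then by product" and "2015 first, then by region" equally well, which is the flexibility a single fixed hierarchy lacks.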

Annotation tools are a good feature to include in a big data system. This functionality enables employees to add insights and interpretations to data and then send them along to coworkers for comments. Such interactions are critical for identifying areas that need further evaluation and ideally lead to "aha" moments, where managers work together to gain new insights into business operations.

6. Spend extra time on user interface design

The success or failure of a big data project revolves around employees' ability to tinker with information. One challenge is translating a large volume of complex data into simple, actionable business information. "The developer needs to be sure that the application algorithms are sound and that the system is easy to use," stated Moxie.

In the background, developers work with data scientists to fine-tune complex mathematical formulas. In the foreground is a user, who often isn't skilled technically and may be mathematically challenged. Therefore, the application has to filter the data and present it to the employee in an easy-to-follow manner so they can probe further. "Many times companies will present too much information to the user and overwhelm them," said Beulke.

In response, user interface designers have increasingly become key members of the big data development team. These individuals are experts at understanding how users interact with information and therefore help cut through the potential clutter and present sleek interfaces to users.

7. Keep an eye on performance

Today, employees using big data applications expect instant results, even when they enter complex queries that sift through millions of records. Consequently, developers must ensure that no performance bottlenecks arise with their big data applications. Storage systems are one potential problem area. "Developers need to keep an eye on system I/O; big data apps generate a lot of reads and writes," noted Beulke.

One way to cut down on potential delays is to cleanse information near the source. Organizations work with information from a variety of different database management systems, which categorize data in different ways. The accounting department may have a nine-field customer record and the services department may have a 15-field record. As information is consolidated, developers need to make sure the data looks the same, a process called "data cleansing." Making these changes near the data source means less traffic is added to the company infrastructure.
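As a rough sketch of what cleansing near the source can look like, the snippet below maps two departmental record layouts onto one shared schema before anything is sent over the network. The field names, mappings, and sample rows are all hypothetical; a real project would derive them from the actual source systems.

```python
# Hypothetical mappings: source field name -> shared field name.
ACCOUNTING_MAP = {
    "cust_no": "customer_id",
    "cust_name": "name",
    "zip": "postal_code",
}
SERVICES_MAP = {
    "CustomerID": "customer_id",
    "FullName": "name",
    "PostCode": "postal_code",
}


def cleanse(record, field_map):
    """Keep only mapped fields, rename them, and normalize string values."""
    clean = {}
    for source_field, target_field in field_map.items():
        value = record.get(source_field)
        if isinstance(value, str):
            # Trim the ends and collapse runs of whitespace.
            value = " ".join(value.split())
        clean[target_field] = value
    if clean.get("customer_id") is not None:
        # Use one canonical form for the join key.
        clean["customer_id"] = str(clean["customer_id"]).upper()
    return clean


# Illustrative rows; unmapped fields ("ledger", "tier") are dropped.
accounting_row = {"cust_no": "a-1001", "cust_name": "  Acme   Corp ",
                  "zip": "02139", "ledger": "GL7"}
services_row = {"CustomerID": "A-1001", "FullName": "Acme Corp",
                "PostCode": "02139", "tier": "gold"}

from_accounting = cleanse(accounting_row, ACCOUNTING_MAP)
from_services = cleanse(services_row, SERVICES_MAP)
print(from_accounting == from_services)
```

Because unmapped fields are dropped and values are normalized before transfer, only the compact, consistent records travel across the company infrastructure.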

How data is laid out in storage also affects performance. As datasets become larger, the challenge of processing them quickly increases. A developer may partition data, separating older or "almost stale" data from newer information. Another option is a tiered storage solution. Here, the currency of the data determines its storage location. For example, frequently used data is housed in flash or fast hard disk systems. Less frequently used data can be placed in a second, less expensive tier. Stale data can be placed on slower bulk media, perhaps even on tape.
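A tiering rule can be sketched in a few lines. This example uses the last-accessed date as a stand-in for how current the data is; the cutoffs (30 days and 365 days) and the tier names are assumptions chosen for illustration, not recommendations from the experts quoted here.

```python
from datetime import date

# Assumed cutoffs; real thresholds depend on access patterns and budget.
HOT_DAYS = 30
WARM_DAYS = 365


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from the number of days since last access."""
    age_in_days = (today - last_accessed).days
    if age_in_days <= HOT_DAYS:
        return "flash"  # frequently used data on the fastest media
    if age_in_days <= WARM_DAYS:
        return "disk"   # less frequently used data on a cheaper tier
    return "tape"       # stale data on slow bulk media


today = date(2015, 6, 1)
for last_accessed in (date(2015, 5, 20), date(2014, 12, 1), date(2013, 1, 1)):
    print(last_accessed, choose_tier(last_accessed, today))
```

The same function could drive a periodic job that moves partitions between tiers as they age.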

Worthy of the challenges

Big data applications have the potential to profoundly impact how businesses function. Consequently, organizations are dabbling with these systems and finding unique challenges. Developers can clear these hurdles by recognizing how the applications differ from traditional systems and accommodating those differences.
