
5 things software engineers need to know about big data

Walter Maguire, Chief Field Technologist, HPE

Do you ever take a look at the journals published by business schools? The article titles are full of stock phrases such as “the data-driven company,” “the insight-driven enterprise,” “analytics as a differentiator,” and so on. The idea seems to be that businesses can become sleek, fast-moving analytic predators, consuming data 24x7x365, producing insights, and adapting continuously in an ever-changing world.

Not only that, but IT infrastructure, processes, and applications also need to change right along with everything else, providing 24x7x365 agility, availability, scalability, performance, and new insights—all in line with what the business needs.

This is what gives engineers gray hair.

And by engineers, I mean folks who figure out how to deliver digital information to the people who need it. In the context of big data, this sort of engineering includes core IT people, application developers, and data engineers who specialize in the plumbing needed for data and analytics. It also includes QA people (testers), of course. If you are currently in any of these roles, then you're the sort of engineer I'm referring to.

If you work for an organization that's embarking on a big data project, or thinking about it, you as an engineer may well be involved, because your expertise in one of these areas could help your organization turn terabytes of seemingly useless information into something valuable.

If you see yourself in that picture, this article might help you in the transition to a big data mindset. Read on!


The data revolution reminds me of agile

At a lot of companies I work with, the person who does data engineering/analytics and the person who does app dev are the same person, or they're part of the same team, working together. At the larger companies we see those two roles along with a third role, "IT specialist," more clearly differentiated, but still working together, still engineering. 

For example, the folks who program the iPhone's Siri are also the folks who have to analyze the data—because it’s a data-driven product. At a number of mobile ad firms, it’s the same thing—the product is a data-driven platform. I suppose this points to the larger trend of data and applications becoming increasingly intertwined. 

In the early 2000s, I was working as an architect at a large HMO. Our team designed and managed the databases that kept the business running. As part of an initiative to become more competitive, the company decided to embrace the web and develop a number of customer- and partner-facing web portals. So it created a new team and hired developers who, in one of our first meetings, pointed us to a document titled “The Agile Manifesto.” 

Our team had a good laugh over that, and I think we might’ve sent an email or two along the lines of recommending that we hire a fairy godmother, a unicorn, and a few leprechauns to help things along. But then we realized that they were serious. And that we had a massive mismatch between data infrastructure, operations, and web development. It’s a much longer story, and it took us years to sort it all out. 


Five ideas for engineers in transition

Along the way, I picked up a few things about the new world of data that are worth sharing with engineers. We didn’t call it "big data" then, but looking back, I can see clearly that our challenges were the same ones many of our customers struggle with today. Here is my top-five list of things every engineer (data, application, QA, or IT) should know about big data, along with some tips. So without further ado…

1. Build it or buy it?

IT shops have had a hard time since Y2K. I remember the years before it well, when budgets were growing, outsourcing wasn’t widespread, and an IT veteran could make a good living without working 80 hours a week. Fast-forward 16 years, and relentless outsourcing, flat budgets, and constant cost pressures have driven many out of the field. 

For those who stuck around, and for the new generation entering the field, inventing new stuff has been one way to have fun and advance careers, especially since most IT shops aren’t prepared to compare a clear expenditure, such as a software license, to the hidden expense of building something from scratch. Got to reduce budget? The team says it can replace that vendor’s database with something it will build itself, so you can stop paying the vendor’s exorbitant licensing fees. Wow!

Don’t do it.

It would be easy to make this mistake in the big data world, since it’s a relatively young challenge. My recommendation is to evaluate whether the new technology is strategic to the business (analogous to the Google search algorithm) or not. If it isn’t, it likely isn’t worth getting into the business of building it. Just buy it.

2. Cope with the mix of data streams

This is probably apparent to anyone looking at the state of enterprise data, but yesterday’s data hasn’t gone away. It's just that now, that data is surrounded by an ocean of new data. So while the big data marketing brochures make it seem as if the problem du jour is all about big data, it’s still necessary to do all the familiar things with the old-school, unglamorous transactional data as well as figure out how to store, process, and (hopefully) monetize all the new data. 

The challenge is that this new world is very dynamic—the data might constantly change, have no specs, be very sparse or unstructured, and so on. And it has to be processed faster than ever. Try telling the CIO that it will take three months to understand that new data flow before it can be made available for analysis, and see how that conversation goes. So don’t lose sight of today’s challenges. But coping with the mix will require some new thinking. Read on for suggestions.

3. Embrace your inner 'craftsman'

We are at an interesting time in data. From the first days of computing, it was all about structure: arrays, lists, strongly typed data, tables, columns, rows, and so on. In the last 15 years or so, we’ve seen an explosion in the ways computers record information created by us and about us. Video, audio, social media, blogs, electronic media of all forms, smartphones with mobile applications, search services, and more all turn information that used to go unrecorded into data a computer can potentially address, if only we can figure out what to do with it (and how to do it). 

This offers a dizzying world of possibilities, but it's a recent enough phenomenon that the tools, technologies, and skills are racing to catch up. It resembles the early days of the automobile, when each car was produced by hand. The automobile and the paradigm it represented were new enough that engineers in those days were also craftsmen, building each car by hand and designing it as they went. That handmade process continued until Henry Ford finally applied standardized production techniques to manufacture cars at scale and at lower cost. 

Today's engineers working with big data must become craftsmen when they work with new data for the first time. Many of my customers tell me that designing a big data flow is as much art as science, an idea that is anathema to many hard-core engineers. The early stages often resemble craftsmanship: exploring, trying something, breaking it, redoing it many times, and finally getting it just right. Without specs. Without a clear idea of the outcome. So while we big data practitioners aren’t exactly Magellan, in a very real sense we are pioneers. Don’t be afraid to experiment.
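To make that exploratory first pass a little more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a new feed arrives as newline-delimited JSON with no spec; the function name profile_records and the sample records are hypothetical, not from any particular product. It tallies how often each field appears and which value types show up, the kind of quick profiling that can reveal how sparse or inconsistent a new source is before you commit to a design.

```python
import json
from collections import Counter, defaultdict


def profile_records(lines):
    """Hypothetical helper: profile newline-delimited JSON records that have no spec.

    Returns (record_count, rejected_count, summary), where summary maps each
    field name to the fraction of records containing it and the sorted list
    of Python type names observed for its values.
    """
    presence = Counter()          # field name -> number of records containing it
    types = defaultdict(set)      # field name -> set of observed type names
    total = 0
    rejected = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1  # malformed input is counted, not fatal
            continue
        if not isinstance(record, dict):
            rejected += 1  # only top-level JSON objects are profiled here
            continue
        total += 1
        for key, value in record.items():
            presence[key] += 1
            types[key].add(type(value).__name__)
    summary = {
        key: {"fraction": presence[key] / total, "types": sorted(types[key])}
        for key in presence
    }
    return total, rejected, summary


# Made-up sample records: fields come and go, and "user" changes type.
sample = [
    '{"user": "a1", "event": "click", "ts": 1700000000}',
    '{"user": "b2", "event": "view", "geo": {"lat": 1.0, "lon": 2.0}}',
    '{"user": 3, "ts": "2023-11-14"}',
    "not json at all",
]

total, rejected, summary = profile_records(sample)
print(f"{total} records profiled, {rejected} rejected")
for field, info in sorted(summary.items()):
    print(f"{field}: in {info['fraction']:.0%} of records, types {info['types']}")
```

The sketch is deliberately forgiving: bad lines are counted rather than raised, because with a brand-new feed you usually want to see how messy it is before deciding how strict the pipeline should be.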

4. Agility can mean patience, but never anarchy

I mentioned my days at the HMO, many years ago, when we’d just seen “The Agile Manifesto” for the first time. As I said, we first had a good laugh over it, then we had a collective panic attack. Over the ensuing months, we literally had to retool almost everything to get more agile.

At the time, we were lacking some key technologies. Big data tech such as Vertica or Netezza would have been a godsend, except that Vertica hadn’t been invented yet and Netezza was too expensive. Still, we made progress. We reorganized. We updated our development tools. We completely changed the development, test, and release processes. We rewrote portions of our data warehouse and ancillary environments to simplify the ability to stage and update environments. We hired and built new skills, and we brought in a few entirely new technologies. 

The process took a long time and was still not finished when I moved on to other opportunities. But it was never chaos. We had messy web releases, to be sure; some of our earliest were three-day-weekend affairs, with me restoring databases in the wee hours of the morning more than once. But the lessons were not lost, and we stayed on the path. 

Along the way, we had to invent new processes. I spent a year or so with a small team designing and automating processes for synchronizing the dev, test, and prod data environments, all key to successful testing. The need for that work only became clear as we worked through the change process. 

The major takeaway for me from this? That while agile can often feel chaotic, when managed properly, it can be a great model for coping with big data challenges. 

And for the reader wondering why the multiyear effort was important, here’s the before and after:

 

Time between releases: 6 months+ before, 1 month after

Time required to perform a release: 3 days before, 30 minutes after

Time to stabilize a release (post-production): weeks before, none after

Lag time between business idea and implementation: about a year before, about two months after

5. Always get executive buy-in

Since I started off by poking fun at business school publications, this might seem an odd point. But buy-in from senior leadership is fundamentally important to successfully building any sort of big data program, and it goes well beyond IT. Sponsorship, leadership, and strategy are crucial to success.  

My colleagues are sometimes taken aback when I talk with C-level teams about big data. I usually start by telling them that their biggest challenge is not technology. It’s the will to change and the fortitude to see it through. In 20 years of IT work, I did not see a single instance where magic technology overcame a lack of vision, sponsorship, or motivation. In fact, I watched as some great technologies sank under the morass of a rudderless project with no business buy-in. 

Try this litmus test every once in a while in a big data meeting. Ask the question “Why does this matter to the business?” If nobody has an answer, the whole program might just be in trouble.

Engineers are agents of change

Ten years ago, I changed my career path and moved to the vendor side of things. After two decades in IT, my personal challenge became making my customers successful—not just with technology, but with all the other things required to use it successfully. 

Across the hundreds of IT teams I’ve worked with over the last decade, I’ve found that engineers often have more influence than they think. They are the gurus. And while the business often has no clue how its engineers do what they do, it absolutely respects the results they deliver. That respect offers the potential for influence, as long as you're willing to learn a few new tricks. 

Take a negotiation or communications class. Learn how to speak the language the business speaks. Think outside the engineering box to the human dynamics involved in decision-making. An engineer who ups his game in these ways can have outsize influence on the setup and direction of big data programs. In fact, I’d argue that someone who thinks out of the box in this way can be instrumental to success by helping to bridge the gap between business and IT.
