
How I jumped from software testing to data science

Ken Johnston, Principal Data Science Manager, Microsoft

My journey from testing into data science was pretty straightforward. First, I learned how to break software, then to monitor production services for regressions, and finally to build models to optimize user experience. Today, I even build artificial intelligence (AI) models for recommender systems, chatbots, and cutting-edge privacy protection algorithms.

Good testers have an innate feel for data that makes them great candidates to become data scientists. I have worked with many testers who have made the transition, and if you are an inquisitive, data-driven tester, you can become a data scientist too. Here's how.


A misguided belief

On my first day as a tester, my new boss stopped by and dropped a copy of Kaner, Falk, and Nguyen's book—Testing Computer Software, second edition—on my desk. It hit with a loud thud, and I jumped half out of my chair.

He was the kind of test manager who constantly thought of ways to torture software, suck the lifeblood out of code, and make developers tremble. He also had a quirky sense of humor, such as dropping books to scare new testers half to death.

Two weeks later, I attended my first software testing SIG (back then we called them special interest groups, not meetups). I can't remember who was presenting, but he told a story about being the test manager at the maker of an early word-processing application.

The presenter went on and on about how many test cases and how much automation his team had, and how, from release candidate to signoff by test, they could execute a complete test pass within four weeks (pretty fast back then).

He also said that if they found a showstopper, the signoff process would stop and the clock would reset. He seemed to be bragging about how he and his test team had caused a nearly six-month slip in the release of the Windows version of the app because they would find another showstopper during every test pass.

"Wow," I thought to myself, "the goal of testing is to protect the customer from evil bugs created by evil developers, and the ultimate manifestation of that goal is to slip the ship date!" That misguided belief stuck with me and drove many of my behaviors around software testing for nearly a decade.


Learning to love data

The first job that got me excited about data was serving as the scale and performance test lead for Microsoft Commerce Server. It was a great product. We were betting on the web, confident that millions of companies would want to buy our product, buy expensive hardware from the big vendors, install our software, invest in custom code, and put the server on the Internet so they could sell their products.

We got that business model wrong.

Regardless, I received an invaluable education in biggish data. When my team and I would set up for a test run, we deployed bits to multiple test clusters. Each cluster consisted of four quad-processor servers. When everything was set, we would gather in a corner of the lab. The fans were blowing hard to control the heat, and it was hard to hear.

On one side, two team members waited, fingers on buttons. Another colleague and I were on the other side. (If you remember the episode of The Big Bang Theory where they feverishly try to get San Diego Comic-Con tickets by refreshing browsers, it was kind of like that.) After the countdown, at zero, we all pressed "enter" at the same time, and our bank of 500 test automation clients went into motion, hammering our test environments.

Those tests were huge. Every run generated several gigs of performance data. To store it all, we had a direct-attached disk array with nearly 10TB of storage. After each run we would spend the next day or two analyzing the results.

It was a heady time. My annual hardware budget for the performance and scale lab was nearly half a million dollars. Because we had to test at Internet scale, my lab was the envy of just about every other team at Microsoft.

I never planned to move on from testing. When I was the director of test excellence for Microsoft, my team and I worked on developing all the training materials for our 10,000-plus software engineers. Bj Rollison, Alan Page, Harry Robinson, Ron Grumbach, Tracy Monteith, and I collaborated on all the materials. We developed classes: intro to testing, advanced testing, test management, and test automation.

My passion for testers, test culture, and software testing was nearly insatiable. Somehow I convinced Rollison and Page to collaborate on our masterwork, How We Test Software at Microsoft. In that book we captured the height of formal, structured software testing at Microsoft. Page will eagerly tell you that the cloud has changed so much about software testing that we no longer recommend most of the techniques covered in our book.


What drove me to data science

What propelled the changes to testing at Microsoft, and moved me away from testing and toward data science, were the cloud, agile development, testing in production, and the MVP (minimum viable product) movement.

All of these changes meant that products and services had to evolve and release faster, and that we needed to rely on real-world telemetry over formal, structured testing and lab-based results.

Once I started to see data from real customer usage, I realized that getting products into customers' hands earlier and faster was better than slipping ship dates.

The goal of testing was no longer just protecting the customer—it was finding a way to release as much functionality as possible in a safe way to customers as fast as possible. I called it minimum viable quality (MVQ).

For high-risk products such as spaceships, this still means a lot of lab testing and redundant systems. For large-scale web services such as email or search results, your MVQ is a very low bar, pretty much down to the level of whether or not the service crashes.

This is real testing in production: you try the new code on a small slice of production traffic, monitor it for regressions, and slowly increase the load. If all looks good, you roll the new code out to all of production.
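A staged rollout like this can be sketched in a few lines. This is a minimal, hypothetical illustration, not Microsoft's actual deployment tooling; the stage percentages, baseline error rate, and function names are all made up for the example:

```python
BASELINE = 0.005                  # known-good error rate of the old code
STAGES = [1, 5, 25, 50, 100]      # percent of production traffic per stage

def canary_rollout(observed_rates):
    """Ramp new code through traffic stages, comparing the error rate
    observed at each stage against the baseline; halt the ramp (and
    roll back) at the first regression. Returns the percentage of
    traffic the new code safely reached."""
    reached = 0
    for pct, rate in zip(STAGES, observed_rates):
        if rate > BASELINE:       # regression detected: stop and roll back
            return reached
        reached = pct             # stage looks healthy; widen exposure
    return reached

# Healthy readings ride the ramp all the way to full deployment.
print(canary_rollout([0.001, 0.002, 0.001, 0.003, 0.002]))  # -> 100
# A spike at the 25% stage halts the rollout at the last good stage.
print(canary_rollout([0.001, 0.002, 0.009]))                # -> 5
```

In a real pipeline, the observed rates would come from live monitoring rather than a list, and the rollback itself would be automated.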

How I made the transition

By embracing MVQ and shifting my focus on quality from the lab to production, I took my first real steps toward data science. I no longer needed a lab to produce gigs of telemetry; instead, I had terabytes of live production telemetry every hour. We didn't have days to pore over test results and find root causes; we had to detect issues in near real time.
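The jump from after-the-fact analysis to near-real-time detection can be sketched with a simple rolling-statistics check. This is an illustrative toy, not the monitoring we actually ran; the window size, warmup, and threshold are arbitrary:

```python
from collections import deque
import math

def make_detector(window=50, threshold=3.0, warmup=10):
    """Return a function that flags a telemetry reading as anomalous
    when it sits more than `threshold` standard deviations from the
    rolling mean of recent readings."""
    history = deque(maxlen=window)

    def check(value):
        anomaly = False
        if len(history) >= warmup:    # need some history before judging
            mean = sum(history) / len(history)
            std = math.sqrt(sum((x - mean) ** 2 for x in history) / len(history))
            if std > 0 and abs(value - mean) / std > threshold:
                anomaly = True
        history.append(value)
        return anomaly

    return check

check = make_detector()
readings = [101 if i % 2 == 0 else 99 for i in range(30)]  # steady metric
print(any(check(v) for v in readings))  # -> False
print(check(200))                       # -> True (sudden spike flagged)
```

Production systems use far more sophisticated detectors, but the principle is the same: judge each new reading against a live baseline instead of waiting for a post-run report.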

The second step in my journey was joining the Microsoft Bing team. There, I was fortunate enough to work as the group manager and product manager for several of our vertical search products, including shopping, images, video, TV, movies, and local.

These products needed tens of thousands of labels across more than a dozen languages and markets, all fed into large-scale machine-learning models. In this case, quality was as much about the data and the algorithms as about the user experience that presented the search results.

Who would have thought it? Data quality is the key component to quality of experience.

From there it was on to the Windows Store and, for the past five years, working within the Windows Core Data Science team, where we bridge the world of academic research and applied research.

My most recent work balances commercial business insights with responsible AI, where we emphasize fairness and privacy in modern algorithms. The team functions a lot like a high-powered AI consulting team. You can read more about how we function in my blog post, "Using Agile & Kanban to Manage Your Data Science Projects."

Follow in my footsteps

If you want to transition from test to data science, you will need to take time to learn new concepts and tools, and to refresh your math skills, especially more advanced statistics. Trust me, it's fun and worth it.

Want tips on how you can move from testing to data science? Come to my talk, "Making the Career Transition from Software Testing to Data Science," at STARWEST 2019, which runs from September 29 to October 4 in Anaheim, California. ​​​TechBeacon readers can save $200 on registration fees by using promo code SWCM.
