Data management on the edge: Time to get partitioning

David Linthicum, Chief Cloud Strategy Officer, Deloitte Consulting
 

The idea behind edge computing is simple: Place the data near where it's gathered. That eliminates the need to send all data gathered over the network, sometimes over the open internet, to some centralized resource.

The challenge is where to store all that data.

The recent IoT boom introduced the world to connected thermostats, automobiles, factory robots, and pretty much everything we touch these days. Coupled with the need to process data as close to the device as possible, IoT naturally led to the birth of computing at the "edge" of a larger, more comprehensive system, or edge computing.

Do you send all that data back to some centralized server? Perhaps one that exists on a public cloud? Or do you keep as much data as possible at the edge device?

This is the architectural challenge that’s now facing those considering the use of edge computing. The potential application could be a one-off enterprise application, such as purpose-built devices to monitor a factory floor, or new IoT-enabled devices placed in farm fields to provide better data for crop management. 

At the heart of the problem is how to orient edge computing's data partitioning. While the use cases are as numerous as the types of edge-based devices and systems available, a few core patterns are beginning to emerge.

Response-oriented edge data

This is the main reason to place data at the edge. This category includes any applications where data needs to be stored and viewed immediately, and you also need to avoid both latency and connectivity issues. 

One example is edge-based systems on jet airplanes. While some of the data, such as maintenance and performance data, is centrally stored, the data related to the working of the core avionics is stored onboard. 

The result is little or no latency when that data needs to be accessed, such as when the avionics system needs to automatically correct problems with the engine (e.g., fuel mixture) in near real time.

The resulting actions are typically simple, procedural responses to simple data, such as: If this is out of whack, then do this. They can also be more complex: If this is out of whack, then try this, and if that fails, try this, and if that fails, try this. Those who have done basic programming will recognize these patterns.

The idea is to make the simple data instantaneously accessible, with the ability to react to the simple data. The goal is to make tactical transaction-oriented decisions and take actions based upon the state or values of the data in the edge-based device or system. 
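The "if this is out of whack, try this, and if that fails, try this" pattern can be sketched as an ordered list of fallback actions. This is only an illustration: the function name, the acceptable range, and the two adjustment actions are hypothetical, and the numbers are made up rather than taken from any real avionics system.

```python
from typing import Callable, Sequence


def correct_reading(
    read_value: Callable[[], float],
    low: float,
    high: float,
    actions: Sequence[Callable[[], None]],
) -> bool:
    """Try each corrective action in order until the reading is back in range.

    Returns True if the reading is (or becomes) in range, and False if every
    action has been tried and the reading is still out of range.
    """
    if low <= read_value() <= high:
        return True  # nothing is out of whack
    for action in actions:
        action()  # "try this"
        if low <= read_value() <= high:
            return True
        # "...and if that fails, try the next one"
    return False


# Hypothetical device state: a value that should stay between 14.0 and 15.0.
state = {"mixture": 16.2}


def adjust_small() -> None:
    state["mixture"] -= 0.5


def adjust_large() -> None:
    state["mixture"] -= 1.0


fixed = correct_reading(
    read_value=lambda: state["mixture"],
    low=14.0,
    high=15.0,
    actions=[adjust_small, adjust_large],
)
```

Everything the rule needs, both the current value and the corrective actions, lives on the device, so the decision never waits on a network round trip.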

Analytics-oriented edge data

This is a bit more complex. This pattern allows you to take immediate actions based upon the simple state of the data, such as the current internal temperature of a robotic welder. 

However, you can also analyze the data at the edge. This allows you to derive more value from the data, by doing such things as considering a million records of temperature data from the robotic welder to determine patterns that may alert maintenance to issues that need to be addressed to avoid a failure that stops production.

This type of deep analytics normally occurs on centralized systems, such as in public clouds. However, in this case, it's more efficient to do the analytics on the edge. 
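As a rough illustration of analysis running on the device itself, the sketch below keeps a sliding window of welder temperatures and flags any reading that deviates sharply from recent history. The class name, window size, and three-standard-deviation threshold are assumptions chosen for the example; real predictive-maintenance analytics would be considerably more involved.

```python
from collections import deque
from statistics import mean, stdev


class TemperatureMonitor:
    """Keeps a sliding window of readings on the device and flags outliers."""

    def __init__(self, window_size: int = 50, threshold: float = 3.0) -> None:
        self.window = deque(maxlen=window_size)  # oldest readings fall off automatically
        self.threshold = threshold  # how many standard deviations counts as unusual

    def add(self, reading: float) -> bool:
        """Record a reading; return True if it deviates sharply from recent history."""
        unusual = False
        if len(self.window) >= 10:  # wait for some history before judging
            mu = mean(self.window)
            sigma = stdev(self.window)
            if sigma > 0 and abs(reading - mu) > self.threshold * sigma:
                unusual = True
        self.window.append(reading)
        return unusual


# Made-up data: 40 steady readings around 200-202 degrees, then a spike.
monitor = TemperatureMonitor(window_size=50, threshold=3.0)
readings = [200.0 + (i % 5) * 0.5 for i in range(40)] + [260.0]
alerts = [i for i, r in enumerate(readings) if monitor.add(r)]
```

Only the alert (here, the index of the spike) would need to leave the device; the raw readings can stay at the edge.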

As edge-based devices become more powerful, analytics-oriented edge data is becoming a more feasible option. Even deep analytics, normally reserved for cloud servers or traditional on-premises servers, work just fine at the edge device or system. 

Today's analytics-oriented devices can cost less than $100 for the hardware and have a footprint similar to a deck of cards. 

So how do you partition the data?

Emerging issues to consider when deciding whether to partition data at the edge or at the central server include:

  • The usage patterns of the data at the edge, and the capabilities of the edge-based device or system
  • Data security and governance requirements
  • Performance and reliability
  • Physical data storage on both ends 

Central to the partitioning decision is looking at the core requirements of the system at the edge, including how you'll use the data. Moreover, you need to understand if data storage at the edge is even viable. 

For example, say you're looking to store GPS and soil data for a mapping system hosted in a drone that's programmed to fly around a cornfield to do soil moisture assessments for the optimization of irrigation systems. You need to look at a few things:

  • Can the drone's onboard storage, memory, and CPU store the data gathered during each flight?  
  • How much data will be stored, and how will it be shared with external systems, such as systems that exist on public clouds?  
  • Is the data needed to allow for responsive actions, such as going over the same part of the field twice, if data collection failed for some reason during a flight?  
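The first two questions come down to simple arithmetic, which can be sketched as follows. The sample rate, record size, flight length, and safety margin are invented for illustration; actual figures depend on the drone's sensors and storage.

```python
def flight_data_bytes(
    flight_minutes: float, samples_per_second: float, bytes_per_sample: int
) -> int:
    """Estimate how many bytes one flight produces."""
    return int(flight_minutes * 60 * samples_per_second * bytes_per_sample)


def fits_onboard(
    required_bytes: int, free_bytes: int, safety_margin: float = 0.2
) -> bool:
    """Check the estimate against free storage, holding back a safety margin."""
    return required_bytes <= free_bytes * (1 - safety_margin)


# Hypothetical flight: 30 minutes, 10 GPS/soil samples per second, 64 bytes each.
needed = flight_data_bytes(flight_minutes=30, samples_per_second=10, bytes_per_sample=64)

small_card_ok = fits_onboard(needed, free_bytes=1_000_000)
large_card_ok = fits_onboard(needed, free_bytes=2_000_000)
```

If the estimate does not fit, the data has to be offloaded mid-mission, sampled less often, or summarized on the drone before storage.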

Ask the tough questions about what really needs to be done, as well as how those tasks could fail to get done, to build a successful edge-based system. How will you likely use the data? This leads you to how the data should be partitioned between some central system and data store, and storage that exists at the edge. 

Security issues with data gathered

While most people don't think of security as a primary driver for edge/central data partitioning, for most applications it's going to be at the forefront. Sometimes this is due to the sensitive nature of the data being gathered at the edge. Other times, if the data is changed or corrupted in any part of the partition, bad things can happen. Consider the jet engine example above.

Edge-based security technology is still emerging, and thus much of the security you'll have to implement will be tactical in nature. You can count on encryption and key management to be core to data security on the edge device, and in some cases multi-factor authentication will play a role.
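Encryption itself needs a vetted cryptography library and a real key-management scheme, both of which are beyond a short example. The sketch below covers only the narrower problem of detecting data that was changed or corrupted in transit, using an HMAC tag from Python's standard library. The record fields and the hard-coded key are placeholders; in practice the key would come from whatever key management the device uses.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, tag: str, key: bytes) -> bool:
    """Check that the record still matches its tag (constant-time comparison)."""
    expected = sign_record(record, key)
    return hmac.compare_digest(expected, tag)


# Placeholder key for illustration only; never hard-code keys on a real device.
key = b"example-device-key"

record = {"sensor": "engine-temp", "value": 612.4}
tag = sign_record(record, key)

tampered = dict(record, value=100.0)
```

The edge device attaches the tag when it writes or sends a record, and the central system recomputes it with the same key before trusting the data.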

Governance, performance, other concerns

Data governance also has to fit at either the edge or central systems. This means dealing with change management, data policies, and other ways to put guardrails around how the data is used. For example, in the case of the drone, data can be accessed only from inside the drone's onboard systems during flight. 

Consider performance and reliability. Look at how to optimize edge-based data partitioning for usage. Remember, you typically leave the data inside the edge-based device or system for better performance. You don't have to transmit the data over the network to a central cloud-based computer for processing, and thus the device has the ability to react with little or no latency. 

As always, there are tradeoffs to consider. Some edge-based systems will be slower because more processing is performed at the edge, or because of latency associated with slower I/O systems on the edge devices. 

Know your data

Considering physical data storage at both ends would seem to be the first step. But first you need to understand everything I just discussed before you can make an accurate assessment. Issues to consider include:

  1. Data growth over time. Edge-based devices and systems are typically limited and difficult to upgrade, whereas cloud-based storage can be increased at the push of a button. 
  2. Data access across partitions during processing. Will you run processes that span both the edge-based computer and central cloud-based servers? If so, what access patterns will you employ?
  3. Data storage systems. Will you leverage a traditional database on both sides, such as MySQL? Or will you leverage a purpose-built database, such as those built for embedded systems? There once were few options, but many edge-based systems now run traditional operating systems and thus can host traditional databases.
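One way to picture the edge side of the partition is a small local database that holds readings until they have been handed to the central system. The sketch below uses SQLite only because it ships with Python's standard library; the article does not name it, and the table layout, `synced` flag, and function names are assumptions made for the example.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local edge store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " id INTEGER PRIMARY KEY,"
        " sensor TEXT NOT NULL,"
        " value REAL NOT NULL,"
        " synced INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record(conn: sqlite3.Connection, sensor: str, value: float) -> None:
    """Store a reading locally; it starts out unsynced."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO readings (sensor, value) VALUES (?, ?)", (sensor, value)
        )


def pending(conn: sqlite3.Connection) -> list:
    """Return readings that have not yet been sent to the central system."""
    return conn.execute(
        "SELECT id, sensor, value FROM readings WHERE synced = 0 ORDER BY id"
    ).fetchall()


def mark_synced(conn: sqlite3.Connection, ids: list) -> None:
    """Flag readings as delivered once the central system has accepted them."""
    with conn:
        conn.executemany(
            "UPDATE readings SET synced = 1 WHERE id = ?", [(i,) for i in ids]
        )


store = open_store()
record(store, "soil-moisture", 0.31)
record(store, "soil-moisture", 0.28)

to_send = pending(store)  # what the uploader would transmit
mark_synced(store, [row[0] for row in to_send])
remaining = pending(store)
```

The upload step itself is left out on purpose: how and when the edge talks to the central store is exactly the access-pattern question raised in item 2.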

Your goals: Access, plus reduced latency

So how do you separate cloud-based data from edge-based data? There are many pros and cons to consider, but you have more choices than you did just a few years ago. 

Edge devices continue to get more powerful, with more storage, faster processing, and features that mimic a traditional server, such as running standard operating systems. That means the opportunity to partition data and data processing on these devices will continue to increase. 

You need to allow access to the data in reliable ways, and with as little latency as possible. It's time to get better at partitioning across these systems. 
