IoT monitoring and control risks every developer should understand
During his AWS re:Invent conference keynote last October, Amazon.com chief technology officer Werner Vogels introduced AWS IoT, a new system that puts a control loop for a "smart thing" into the Amazon cloud using a fairly straightforward set of services, transports, and certificates. To demonstrate that the response time of such a long loop isn't necessarily bad, a colleague guided the movement of a toy robot arm with a Leap Motion Controller. The controller sent its state to the AWS cloud, and logic running in the cloud updated the "shadow state" of the robot arm. Other logic noticed that the shadow state differed from the actual state of the arm and sent motion commands to it, enabling it to move more or less in sync with the demonstrator's hand.
My first reaction was to be impressed. My second was to want to scream bloody murder.
Developers face many challenges with nascent IoT technology, and standards are still evolving, but this is a particularly important issue. I have personal experience with full-size industrial robot arms going astray and causing major equipment damage and bodily harm, from back in the days when I worked on cyclotrons. In that same period, I had personal experience with other things that can go wrong when a control loop is too long or badly tuned. Sending a control loop through the cloud is just asking for trouble. If you're developing apps for the Internet of Things, you'd better understand these issues and how to deal with them or risk a giant lawsuit later.
Lest you think I’m picking on Amazon, Azure IoT and other cloud IoT architectures suffer from the same flaw. The good news is that all of this can be fixed.
What the IoT can learn from robotics
To give you some historical perspective on the problem, James Clerk Maxwell, whose name might be familiar from his thought experiment called "Maxwell's Demon," famously studied the effects of lag in a centrifugal governor in 1868, a study that laid the groundwork for control theory. Maxwell described and analyzed the phenomenon of self-oscillation, in which lags in a feedback system can lead to overcompensation and unstable behavior. This is not new stuff: It's well known to anyone who has taken a course in control theory or even rubbed elbows with a controls engineer.
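To make the effect of lag concrete, here is a minimal sketch in Python. It is a toy model, not Maxwell's governor and not the AWS demo: an integrating "plant" driven by a proportional controller that sees measurements a few steps late. The names `simulate` and `peak_error`, the gain of 0.5, and the delay of 5 steps are all assumptions chosen for illustration.

```python
def simulate(gain, delay_steps, steps=200, target=1.0):
    """Position history of an integrating plant under delayed proportional control."""
    history = [0.0]
    for _ in range(steps):
        # The controller only sees a measurement that is delay_steps old
        # (clamped to the first sample while the history is still short).
        seen_index = max(0, len(history) - 1 - delay_steps)
        observed = history[seen_index]
        # Proportional correction based on the stale measurement.
        history.append(history[-1] + gain * (target - observed))
    return history


def peak_error(history, target=1.0, tail=50):
    """Largest distance from the target over the last `tail` samples."""
    return max(abs(x - target) for x in history[-tail:])


if __name__ == "__main__":
    for delay in (0, 5):
        err = peak_error(simulate(gain=0.5, delay_steps=delay))
        print(f"delay={delay} steps, late-run peak error={err:.3g}")
```

With no delay, the same gain settles smoothly on the target. With a five-step delay, the controller keeps correcting for errors that no longer exist, and the oscillation grows instead of dying out.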
The sneaky illusion of Vogels' demo was that the small size of the robot arm and the irregular movement of the handheld controller device were hiding the overcompensation and unstable behavior that would have been obvious — and dangerous — in a full-scale robot arm operating at speed. So let's think about the optimal design of such a system.
In a real industrial robot, the control loop never goes outside the room where the robot operates. You might download a program from your computer or the cloud to the robot's local controller, which connects to the robot by way of a shielded cable, but that program is no more than a series of desired points in phase space. The robot's local controller is responsible for stripping out any unreasonable points and excessive speeds, and for following the described translations, rotations, grips, and welds in accordance with its own safety constraints and limits, as well as any external limit switches installed in the room. The principle here is that you've pushed the control loop down to the lowest possible level.
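As a rough illustration of what "stripping out unreasonable points and excessive speeds" might look like, here is a hedged Python sketch. It is not taken from any real robot controller: the `Limits` type, the `sanitize` function, the box-shaped work envelope, and the per-tick step limit are all hypothetical simplifications.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    lo: tuple        # minimum x, y, z of the allowed work envelope
    hi: tuple        # maximum x, y, z of the allowed work envelope
    max_step: float  # longest move allowed in one control tick


def sanitize(path, start, limits):
    """Drop points outside the envelope and break long moves into short steps."""
    safe = []
    current = start
    for point in path:
        inside = all(l <= c <= h for c, l, h in zip(point, limits.lo, limits.hi))
        if not inside:
            continue  # refuse the point locally, whatever the sender intended
        distance = math.dist(current, point)
        pieces = max(1, math.ceil(distance / limits.max_step))
        for i in range(1, pieces + 1):
            t = i / pieces
            # Linear interpolation keeps every sub-step within max_step.
            safe.append(tuple(a + t * (b - a) for a, b in zip(current, point)))
        current = point
    return safe


if __name__ == "__main__":
    limits = Limits(lo=(0, 0, 0), hi=(1, 1, 1), max_step=0.25)
    requested = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (1.0, 0.5, 0.0)]
    for p in sanitize(requested, start=(0.0, 0.0, 0.0), limits=limits):
        print(p)
```

The point of the sketch is where the check runs: on the local controller, so a bad waypoint or a too-fast move is refused even if the upstream program, network, or cloud service misbehaves.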
That could be done in the AWS system, with the addition of some local closed-loop controllers. For example, at the time I was introduced to the industrial robot, I had implemented computer control of a cyclotron. The computer, a Digital Equipment Corp. PDP-11 minicomputer, was in the same room as the cyclotron's control panel, but despite my youthful eagerness to tear out the manual controls and hook the cyclotron up to the computer directly, saner heads prevailed.
For one thing, it was a high-power isotope production cyclotron, delivering something like 500 mA of 30 MeV protons to a target: about 15 megawatts. Second, it was one of only four cyclotrons producing all the thallium-201 for the company, and radioactive thallium (for cardiac scans) was the company's most profitable product. So no, I couldn't hook the cyclotron up directly to the computer for control — only for monitoring.
To be safe, the computer control had to fail gracefully and yield to manual control at the flick of a switch, or even faster. What we built was essentially a hierarchical control system. We put the stepping motors for the computer controls behind the normal manual control potentiometers, on the same shafts, with electromagnetic clutches in between. The operator could grab the dial if he thought the computer was messing it up, and the clutch would slip. He could flip the switch powering the clutch, and the computer wouldn't be able to affect the shaft at all. And the stepping motors had their own local controllers, run by 6502-based single-board computers, so that when the cyclotron was in computer-controlled mode and the PDP-11 rebooted, the 6502 held the desired state without the possibility of disturbance.
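The behavior of that lowest tier can be sketched in a few lines of Python. This is a loose software analogy, not the 6502 firmware we actually ran: the class name `LocalSetpointHolder`, its methods, and the use of `None` to stand for "the supervisor is silent or rebooting" are all assumptions made for the example.

```python
class LocalSetpointHolder:
    """Lowest tier of a hierarchical controller: holds state on its own."""

    def __init__(self, initial_setpoint):
        self.setpoint = initial_setpoint
        self.computer_enabled = True  # analogous to the clutch being powered

    def set_manual(self):
        """Operator flips the switch: supervisor commands are ignored."""
        self.computer_enabled = False

    def set_computer(self):
        """Operator hands control back to the supervisor."""
        self.computer_enabled = True

    def tick(self, supervisor_value=None):
        """Run one control cycle and return the setpoint being held.

        supervisor_value is None when the supervisor is silent,
        for example while it reboots or the link is down.
        """
        if self.computer_enabled and supervisor_value is not None:
            self.setpoint = supervisor_value
        return self.setpoint


if __name__ == "__main__":
    axis = LocalSetpointHolder(10.0)
    print(axis.tick(12.0))  # supervisor command accepted
    print(axis.tick(None))  # supervisor silent: last setpoint held
    axis.set_manual()
    print(axis.tick(99.0))  # manual mode: supervisor command ignored
```

The two properties that matter are both local: silence from above changes nothing, and the manual switch wins over any command from the supervisor.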
Better device control for the cloud
All of these tactics could be applied just as easily to an IoT control scheme with top-level software running in the cloud as to one where the top-level software is running on a computer across the room. But the tactics are even more important when your software is in the cloud, because there are many more potential points of failure in between the "plant" (the thing being controlled) and the controller.
Many smart, cloud-connected devices meet the criteria for conservative control design. One is the soon-to-be-released Yale Linus lock. This battery-powered keypad lock connects to a Google Nest controller locally over a low-power protocol. The Nest controller in turn communicates with the Nest cloud, and you can control any Nest device in your house with your smartphone running the Nest app.
The convenient parts of this are that you don't need a key; you can open the door with the keypad even if the house power is out or the Nest controller can't connect to the Internet; you can track who comes and goes; and you can lock and unlock the door with your phone.
There's just one thing wrong with this design: If the batteries fail, how do you get in the door? I'm sure there's a low-battery alarm, but what if that happens when the occupants are away on vacation? One would hope that the standard installation advises that the house have a back door that opens to a key.
I wish that IoT vendors had a better understanding of control theory and Internet security. If they don't get up to speed soon, exploits such as the hack of a Jeep Cherokee will be just the beginning.