There’s been a lot of buzz about AI (artificial intelligence) recently, especially about generative AI products like ChatGPT, Midjourney, Stable Diffusion, Bard, DALL-E and several others, so it might be tempting to think that these tools are the only way AI is being used. Nothing could be further from the truth.
AI has numerous applications, many of which won’t be consumer-facing themselves but will make other technologies more powerful, capable and intuitive. Today, we discuss how AI is being used at the network edge to enable other applications, particularly where end users might not have much computing power at their disposal.
In the predominant applications using AI today, most of the computation happens in the cloud. These apps are usually built by connecting big data sources to an AI algorithm and training it. The trained model is then used to make predictions, a step known in the industry as “inference.”
Because most of these tools rely on cloud computing resources, the best-known AI tools are offered by large companies. These companies have the resources (both technological and financial) to build large GPU clusters and train models with billions of parameters. Smaller companies simply don’t have the same capabilities.
This is changing. Some AI capabilities are beginning to move toward the edge of the network. This is happening because even large companies are realizing that doing all the computing in the cloud doesn’t scale, partly because it also generates a lot of data traffic.
The associated transmission costs are high, because all AI data gets sent back and forth between the user and the cloud. Sending more data over longer distances also increases security risks and the associated privacy concerns, two factors that are increasingly important in many use cases.
Putting AI capability at the edge means that machine learning computations generally happen on the mobile device itself or on a nearby local server instead of in a remote cloud data center.
Tradeoffs: Latency and Speed vs Accuracy
Latency is a primary concern for many applications, and one where edge AI can have a big impact. Say you are developing a navigation application for an autonomous car. You can’t rely on cloud AI to deliver navigational data, because round-trip times for large data sets would be too long. At highway speeds, every millisecond matters. The edge brings computational power closer to the user and reduces latency to manageable, near-real-time levels.
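To put “every millisecond matters” into numbers, the short Python sketch below converts a processing delay into the distance a car covers while it waits for an answer. The function name, the 110 km/h speed and the 100 ms and 10 ms delays are illustrative assumptions, not measured figures for any real system.

```python
def distance_during_delay(speed_kmh: float, delay_ms: float) -> float:
    """Return how far (in meters) a vehicle travels while waiting delay_ms."""
    speed_m_per_s = speed_kmh * 1000 / 3600  # convert km/h to m/s
    return speed_m_per_s * (delay_ms / 1000)  # convert ms to s


# Hypothetical delays, for illustration only; real latencies vary widely.
for label, delay in [("cloud round trip", 100), ("on-board inference", 10)]:
    d = distance_during_delay(110, delay)
    print(f"{label}: {delay} ms -> {d:.2f} m traveled")
```

Under those assumed numbers, the car travels about 3 m during the cloud round trip and about 0.3 m during on-board inference.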
Edge AI can involve some compromises in accuracy. Application developers can decide whether edge AI suits them by weighing accuracy requirements against speed requirements. For a virtual assistant, for example, speed is often more important than accuracy. It’s okay for Alexa or Siri to make the occasional mistake if they respond in real time, which is what makes them feel more human and usable.
An application with less tolerance for inaccuracy may be able to accept the slightly longer response times of sending requests to the cloud. And while an autonomous car could technically be considered a mobile device, it is much larger than a handheld one and can carry enough onboard computing power to help offset reductions in accuracy.
As devices have gotten smaller and people have come to rely on them for more of their daily lives, a shift has happened. When app-based smartphones first arrived around 2007, most apps relied solely on computing resources on the device. Around 2010, apps grew more complex and started to rely on cloud resources, with just a thin layer of computation on the device itself.
Earlier AI technologies could run on these older smartphones, but they weren’t very powerful. In 2017, something big happened: Apple and Huawei released the first smartphone chips with hardware dedicated to deep learning (Apple’s A11 Bionic with its Neural Engine and Huawei’s Kirin 970 with its neural processing unit), marking the first time modern AI was practical on a mobile device.
Since 2017, AI on mobile devices has been gaining momentum, as has AI on low-power hardware such as IoT (Internet of Things) microcontrollers and Raspberry Pi computers. These low-power devices can now run AI models, which extends edge AI to a wider range of device types.
New Models, New Approaches
For AI to work on smaller, less powerful devices, new technological approaches have emerged. We will look at three of them here.
The first method is pruning. Large deep-learning models have many parameters and many ways to make decisions. These can be envisioned almost like neural pathways in the brain, or branches on a tree. When the model targets a specific problem, some connections in the neural network are not useful for that scenario or inquiry.
Pruning removes the pathways, or branches on the “tree,” that aren’t useful in the given scenario. This shrinks the model, along with the resources it requires, typically with little or no loss of accuracy. The smaller model can then run on less powerful devices, including mobile devices.
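One common way to decide which connections are not useful is magnitude pruning: weights closest to zero contribute least to the output, so they are set to zero and can then be skipped or stored sparsely. The Python sketch below applies that rule to a flat list of weights. The function name, the sample weights and the 50% sparsity target are hypothetical; real frameworks prune per layer or per tensor and usually fine-tune afterward to recover accuracy.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:n_prune])
    # Keep the large weights; replace the small ones with exact zeros.
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]  # hypothetical layer weights
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

Half of the connections are gone, yet the three weights that dominate the layer’s output are untouched, which is why accuracy often changes little.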
Another method is quantization. Typical machine learning models store their values as 32-bit, or sometimes 64-bit, floating-point numbers. Floating-point arithmetic is precise but expensive in both computation and memory, especially when it must be performed across millions or billions of parameters.
The most aggressive form of quantization, often called binarization, reduces each 32-bit floating-point value to a single bit (less aggressive variants use 8-bit integers). The weights then take up only 1/32 of the memory, and many multiplications can be replaced with much cheaper bitwise operations. Accuracy drops somewhat with this method, but can stay within acceptable limits for many tasks.
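Reducing each 32-bit value to a single bit is usually called binarization, an extreme form of quantization. The Python sketch below shows the idea: each weight keeps only its sign, and one shared scaling factor (here the mean absolute weight, a common choice in binary networks) preserves the overall magnitude. The function names and sample weights are hypothetical, and the speedups seen in practice depend on hardware kernels that this sketch does not model.

```python
def binarize(weights: list[float]) -> tuple[list[int], float]:
    """Map each float weight to a single bit plus one shared scaling factor."""
    # One float per layer keeps the binarized weights on roughly the original scale.
    alpha = sum(abs(w) for w in weights) / len(weights)
    bits = [1 if w >= 0 else 0 for w in weights]  # 1 stands for +alpha, 0 for -alpha
    return bits, alpha


def dequantize(bits: list[int], alpha: float) -> list[float]:
    """Approximate the original weights from the bits and the scale."""
    return [alpha if b else -alpha for b in bits]


weights = [0.42, -0.17, 0.05, -0.33]  # hypothetical layer weights
bits, alpha = binarize(weights)
approx = dequantize(bits, alpha)

fp32_bits = 32 * len(weights)  # storage for the original 32-bit floats
binary_bits = len(bits)        # one bit per weight (ignoring the single scale)
print(bits, round(alpha, 4), approx)
print(f"storage: {fp32_bits} bits -> {binary_bits} bits")
```

The memory saving is exact (32 bits become 1 per weight), while the gap between `weights` and `approx` is the accuracy cost of the method.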
A third method is knowledge distillation. Rather than trimming an existing model, it trains a smaller “student” model to reproduce the outputs of a large “teacher” model on the essential parts of the task. Like pruning, it produces a smaller model, but it doesn’t work by examining every node or connection of the original. The student is usually trained on a specific domain, so its task is more narrowly defined. As a result, the student model is much smaller and more resource efficient.
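The core of knowledge distillation is its training signal: the student is penalized for diverging from the teacher’s “softened” probability outputs. The Python sketch below computes that penalty as a temperature-scaled KL divergence, the commonly used formulation. The logits and the temperature of 2.0 are hypothetical, and a real training loop would combine this term with the usual loss on true labels and use it to update the student’s weights.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert raw scores to probabilities; a higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float], student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL divergence from the teacher's softened outputs to the student's."""
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)  # predictions from the small model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


teacher = [4.0, 1.5, 0.2]        # hypothetical logits from the "teacher"
close_student = [3.8, 1.6, 0.1]  # a student that mimics the teacher well
far_student = [0.2, 1.5, 4.0]    # a student that disagrees with the teacher
print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The student that tracks the teacher gets a much smaller loss, so minimizing this term pushes the small model toward the large model’s behavior.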
These methods and others are making it possible to run AI on edge devices. In the coming years, as AI continues to proliferate, novel approaches will enable numerous applications across business and consumer technology sectors. Watch this space, because I’ll be writing another article that talks about some of the applications for edge AI you may see in the not-too-distant future.