Radar is an 85-year-old technology that, until recently, has not been actively deployed in human-machine interfaces. Radar-based gesture sensing allows user intent to be inferred across more contexts than optical-only tracking currently allows.
Google’s use of Project Soli, a radar-based gesture recognition system, in the Pixel 4 series of phones is most likely the first step in further adoption of radar as an input for interacting with our devices.
Background on Project Soli
At Google I/O 2015, the ATAP (Advanced Technology and Projects) group announced several new initiatives. These included:
After going through several prototype iterations, Google integrated Soli into the Pixel 4 as part of the Motion Sense feature, which allows the phone to start the facial-authentication process before the owner even touches the phone.
Shortly after the announcement at I/O, ATAP put out a call for third-party developers to apply to the Alpha Developer program for Project Soli, as a way to gain feedback on their early-stage development kit. I filled out an application to develop music-based interactions with Soli, and was accepted into the program.
I wrote more about my experience as a member of the Alpha Developer program here; with this blog post, I want to provide an overview of the capabilities of millimeter-wave radar, and how they enable new experiences and experiments in the field of human-computer interaction.
Several academic papers exploring different application areas have been written since Project Soli was announced, so we’ll look at those, along with a quick overview of what millimeter-wave radar is and the properties it affords.
First, though, let’s take a look at the first commercial product to use Project Soli: the Pixel 4.
The first commercial product to integrate Project Soli is the Pixel 4, released by Google in October 2019. The teaser ad, with its touchless air gestures, hinted that the new phone would be the first product to integrate Soli:
The Soli chip affords three new types of capabilities for the Pixel 4:
Presence — thanks to radar’s ability to detect motion in the area around it, the Pixel 4 will turn off the always-on display if the user isn’t nearby while the phone sits on a table, both saving battery power and avoiding intrusions on the user’s attention
Reach — the Soli sensor detects a hand moving toward the phone, waking the screen and activating the front-facing cameras for face unlock
Gestures — air swipes above the phone let users skip music tracks, snooze alarms, and silence incoming calls
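A hypothetical sketch of how presence and reach signals might gate the phone’s behavior (the state names and transitions here are my own invention for illustration, not Google’s implementation):

```python
from enum import Enum, auto

class PhoneState(Enum):
    DORMANT = auto()  # no one nearby: always-on display off to save power
    AWARE = auto()    # presence detected: ambient display on
    ENGAGED = auto()  # reach detected: screen on, face unlock armed

def next_state(state: PhoneState, presence: bool, reach: bool) -> PhoneState:
    """Advance the hypothetical state machine from radar-derived signals."""
    if not presence:
        return PhoneState.DORMANT
    if reach:
        return PhoneState.ENGAGED
    return PhoneState.AWARE
```

For example, a phone sitting alone on a table stays `DORMANT`; when someone walks up it moves to `AWARE`, and a hand reaching toward it triggers `ENGAGED`, where face unlock can begin before the phone is even touched.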
9to5Google did an analysis of the Pokémon Wave Hello game that comes bundled with the Pixel 4, and discovered a Unity plug-in in the game that connects to a “Motion Sense Bridge” application running on the phone, giving the game’s developers access to various gesture parameters:
Right now, third-party developers don’t have access to the Motion Sense gestures unless Google has given them access to the internal Motion Sense Bridge app. Hopefully, Google will open up full access to the Soli sensor so developers can explore the gesture recognition capabilities it affords in new and innovative ways.
(The Pixel 4’s Soli sensor; from iFixit’s Pixel 4 XL Teardown https://www.ifixit.com/Teardown/Google+Pixel+4+XL+Teardown/127320)
The location of the Soli sensor on the Pixel 4 (from https://ai.googleblog.com/2020/03/Soli-radar-based-perception-and.html)
Challenges of Creating a Training Dataset for Radar-Based Gesture Recognition
In a post on the Google AI Blog, Google ATAP engineers describe some of the challenges and considerations for embedding radar into a smartphone, such as making the radar chip small and modular enough that it can fit at the top of a phone, adding filters to account for vibrational noise that occurs when music is playing from the phone, and machine learning algorithms that are able to run at a low power level.
One of the challenges with creating any robust machine learning model, especially one that will be in a device in the hands of millions of consumers, is making sure the model can accurately predict a gesture across a wide and diverse population of users. At the semantic level, it’s easy for humans to differentiate a swipe from a flick. However, since each person makes those gestures in slightly different ways, through variations in speed, hand angle, and gesture length, the model needs to be robust enough to correctly infer the user’s gesture regardless of these differences.
To make sure their models were accurate, the Soli team trained their TensorFlow model on millions of gestures made by thousands of volunteers. These models were then optimized to run directly on the Pixel 4’s DSP, enabling the phone to recognize gestures even when the main processor is powered down; this is how the Pixel 4 detects, via Motion Sense, that someone is reaching toward the phone, and then powers on the face unlock sensors.
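As a toy illustration of why per-user variation matters, consider classifying gesture feature vectors drawn from many simulated users. This is my own sketch, not ATAP’s pipeline: the two-dimensional features and the nearest-centroid model are invented for illustration, whereas the production system is a much larger TensorFlow network compiled for the DSP.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_samples(center, n=200, spread=0.3):
    """Simulate n users performing the same gesture slightly differently."""
    return center + rng.normal(scale=spread, size=(n, len(center)))

# Invented feature space: e.g. (lateral speed, vertical speed) per gesture.
centers = {"swipe": np.array([1.0, 0.2]), "flick": np.array([0.2, 1.0])}
train = {g: make_samples(c) for g, c in centers.items()}

# Training over many noisy samples averages out per-user variation.
centroids = {g: s.mean(axis=0) for g, s in train.items()}

def classify(sample):
    """Assign a sample to the nearest gesture centroid."""
    return min(centroids, key=lambda g: np.linalg.norm(sample - centroids[g]))
```

A new user’s slightly-off swipe still lands nearer the swipe centroid than the flick one, which is the same robustness property the real model needs, at a vastly larger scale.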
While Google developed the machine learning algorithms, signal processing, and UX patterns for interacting with Soli, German company Infineon developed the radar chip that is part of the Project Soli system. While it is possible to purchase development kits from Infineon, they only stream raw radar data — no processed signal features that could be used to train a machine learning model to recognize gestures or presence.
In their SIGGRAPH paper, Soli: Ubiquitous Gesture Sensing with Millimeter Wave Radar, the ATAP authors describe a HAL (Hardware Abstraction Layer) as a set of abstractions that would allow Project Soli to work across radar sensor architectures from different manufacturers. This would give Google the flexibility to use the same set of Soli feature primitives across various types of radar while keeping the same high-level interaction patterns.
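A minimal sketch of what such a HAL might look like. The class names, method signatures, and the toy feature pipeline below are all my own invention, not taken from the paper; the point is only that higher layers depend on the abstraction, not on any vendor’s chip:

```python
from abc import ABC, abstractmethod

class RadarHAL(ABC):
    """Hypothetical hardware abstraction layer: vendor chips differ,
    but all expose the same primitives to the layers above."""

    @abstractmethod
    def configure(self, frame_rate_hz: int) -> None: ...

    @abstractmethod
    def read_frame(self) -> list:
        """Return one frame of raw radar samples."""

class FakeVendorChip(RadarHAL):
    """Stand-in for a real vendor driver (e.g. an Infineon part)."""

    def configure(self, frame_rate_hz: int) -> None:
        self.frame_rate_hz = frame_rate_hz

    def read_frame(self) -> list:
        return [0.0] * 64  # placeholder frame of 64 samples

def feature_pipeline(sensor: RadarHAL) -> float:
    """Feature extraction written against the HAL, not the chip."""
    sensor.configure(frame_rate_hz=1000)
    frame = sensor.read_frame()
    return sum(abs(s) for s in frame)  # e.g. total reflected energy
```

Swapping in a different manufacturer’s chip would then only require a new `RadarHAL` subclass; the feature primitives and interaction patterns above it stay unchanged.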
Participants in the Soli Alpha Developer program were encouraged to publish their work in academic venues; some members also created demos that were showcased on various blogs, including:
The HCI department at the University of St. Andrews produced a robust body of work as members of the Alpha Dev program, including:
Some of the projects from the Alpha Developer program were showcased in a video that was presented in the update from ATAP at the following year’s I/O event (2016):
Members of Google ATAP also published papers on their work with Project Soli:
Radar sensing is based on detecting the changing patterns of motion of an object in space. Radio waves are transmitted from the radar, bounce off a target (such as a human hand in motion), and are then received back by the radar’s antennas. The time difference between when the waves are sent and when they are received is used to build a profile of the object in the radar’s path.
In the case of human gestures, the hand moves its position through 3D space while in the line of sight of a radar sensor. The changes of position produce different profiles for the bounced off radar signals, allowing for different gestures to be detected.
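The time-of-flight arithmetic behind radar ranging can be sketched in a few lines. This is a simplified model; real millimeter-wave radars measure delay indirectly (for example via frequency modulation), but the underlying relationship is the same:

```python
C = 299_792_458.0  # speed of light, m/s

def range_from_round_trip(delay_s: float) -> float:
    """Radar range: the wave travels to the target and back,
    so the one-way distance is half the total path."""
    return C * delay_s / 2

# A hand 30 cm away returns an echo after roughly 2 nanoseconds:
delay = 2 * 0.30 / C
```

The tiny delays involved are why radar hardware works with modulated waveforms rather than timing individual pulses directly at these short ranges.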
Since radar detects gestures based on motion characteristics, it’s not well suited to detecting static hand poses, such as a peace sign or many sign-language handshapes. However, it is well suited to detecting dynamic, motion-based gestures, like a finger snap or a key-turning motion.
Unlike optical sensors, radar’s performance isn’t dependent on lighting; it can work through materials, and it can even detect gestures when fingers occlude each other.
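Radar’s preference for dynamic gestures falls out of the Doppler relationship: a hand that isn’t moving produces no frequency shift, so a static pose is nearly invisible to a motion-based pipeline. A small sketch, assuming Soli’s 60 GHz operating band:

```python
C = 299_792_458.0  # speed of light, m/s

def doppler_shift_hz(radial_velocity_mps: float,
                     carrier_hz: float = 60e9) -> float:
    """Doppler shift for a target moving toward (positive velocity)
    or away from (negative) a radar at the given carrier frequency.
    The factor of 2 accounts for the round trip to the target and back."""
    return 2 * radial_velocity_mps * carrier_hz / C

# A perfectly still hand produces zero shift -- no motion signature.
static_hand = doppler_shift_hz(0.0)

# A finger moving at 0.5 m/s produces a shift of roughly 200 Hz,
# which is easily measurable at 60 GHz.
moving_finger = doppler_shift_hz(0.5)
```

The high carrier frequency is the point: at 60 GHz even slow, subtle finger motion produces a measurable shift, which is what makes micro-gesture sensing feasible.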
Micro-gestures can be defined as “interactions involving small amounts of motion and those that are performed primarily by muscles driving the fingers and articulating the wrist, rather than those involving larger muscle groups to avoid fatigue over time”. Some examples of these types of gestures are making the motion of pushing a button by tapping your forefinger against your thumb, making a slider motion by moving your thumb against the surface of your forefinger, and making a motion similar to turning a dial with your fingers and wrist.
These gestures could be used in a variety of contexts (IoT, AR/VR, etc) for interacting with user interface elements.
In the first published paper for Project Soli, the authors (from Google ATAP) list several possible application areas:
If all of these device types were to integrate Project Soli, Google could leverage a universal gestural framework that all of them would have in common. This would make it easy for people to quickly use these new devices, all interacting with Google’s array of services.
Ben Thompson’s article on Stratechery, “Google and Ambient Computing”, analyzes Google’s recent shift from a company that wants to help organize the world’s information to one that helps you get things done.
In his opening remarks at Made by Google 2019, Google Senior VP of Devices and Services Rick Osterloh (who was formerly the head of Google ATAP) outlined a vision of Google as a company that wants “to bring a more helpful Google to you.” Sundar Pichai stated in the 2019 I/O keynote that “We are moving from a company that helps you find answers to a company that helps you get things done”.
The term “ambient computing” was first coined by tech journalist Walt Mossberg in his final column, “The Disappearing Computer”. It’s also referred to as ubiquitous or pervasive computing.
For some additional reading on this area of computing, check out the work of Mark Weiser, a Chief Scientist at Xerox PARC, especially his 1991 Scientific American article, “The Computer for the 21st Century”. Weiser coined the term ubiquitous computing, which he described as computing able to occur using “any device, in any location, and in any format”.
Thompson makes the point that Google’s vision of ambient computing “does not compete with the smartphone, but rather leverages it”. Google isn’t trying to find whatever the next hardware platform is (as Facebook did by acquiring Oculus for VR, or Apple with its full-on push into AR); rather, they are looking to create an ecosystem of ambient devices that all seamlessly connect (possibly using the smartphone as a hub?) and are intuitive to interact with, all connected to the services that Google provides.
Having a unified way to interact with devices that exist in a variety of contexts would be hugely beneficial to Google in furthering adoption of their ambient computing vision. A small, easily embeddable sensor that can detect gestures from people regardless of lighting or other atmospheric conditions would bring this vision much closer to reality. This would make it easier for users to engage with a wide variety of devices that would offer access to Google’s services.
With the recent release of a LiDAR-capable iPad Pro in service of AR, Apple seems to be showing a willingness to put sensors of ever-increasing complexity (and utility) into its products. Additionally, Apple has put up at least one posting for roles related to radar; a now-inactive posting on LinkedIn for a Radar Signal Processing Engineer includes the following in its description.
It feels fair to say that, at the very least, Apple is looking at millimeter-wave radar as a sensing modality; when, how, and most importantly whether a radar-enabled Apple product ever leaves the labs in Cupertino is something only time will tell.
My personal speculation is that Apple will release an AR headset with radar built in for micro-gesture detection, to augment its hand tracking capabilities. Additionally, as radar becomes better known as a sensing modality (thanks mostly to Project Soli, and whatever products Google and its partners decide to integrate it into), other AR and VR headset makers will begin integrating millimeter-wave radar chips into their headsets as a way to solve the “missing interface” problem mentioned earlier: making sure that the real-world physical objects people interact with via AR/VR have a way to map to the digital information presented via the headset.
There is at least one startup working on millimeter-wave radar for human-machine interfaces: Taiwan’s KaiKuTek (“CoolTech”). They claim that their radar-based gesture sensing system can match, if not surpass, Google’s Project Soli.
A machine-learning inference chip is integrated with the radar sensor, so all inference happens on the sensor itself, unlike the Pixel 4’s Motion Sense system, in which the sensor (Soli) and the inference engine are separate chip components. This, KaiKuTek claims, is how they are able to achieve such a low power rating (1 mW).
With Project Soli, Google has advanced the conversation on how we interact with computers across a wide range of modalities and contexts. Millimeter-wave radar offers a promising way to gesturally interact with computers without having to worry about occlusion, lighting conditions, or similar limiting conditions imposed on camera-based systems.
As computers are embedded in ever more devices, millimeter-wave radar could end up enabling a more universal gestural language that’s familiar across all of them. Of course, manufacturers will inevitably differ from one another (though Google is the first to use mm-wave radar as a sensor for gestural interaction, it will not be the last); still, radar could afford “similar enough” gestural interactions, much the way touchscreens are nearly universal even though each OEM enables different gestures for its touch screens.
I’ve included additional publications dealing with millimeter-wave radar and its applications in HCI (not necessarily involving Project Soli). A good portion of these focus on the machine learning techniques used to enable gesture recognition with a radar pipeline.
Patents related to Project Soli:
Press Articles on the Pixel 4 Launch and Integration with Project Soli: