Machine learning (ML) and artificial intelligence (AI) now permeate nearly every feature on the iPhone, but Apple hasn’t been touting these technologies like some of its competitors have. I wanted to understand more about Apple’s approach, so I spent an hour talking with two Apple executives about the company’s strategy–and the privacy implications of all the new features based on AI and ML.
Historically, Apple has not had a public reputation for leading in this area. That’s partially because people associate AI with digital assistants, and reviewers frequently call Siri less useful than Google Assistant or Amazon Alexa. And with ML, many tech enthusiasts say that more data means better models–but Apple is not known for data collection in the same way as, say, Google.
Despite this, Apple has included dedicated hardware for machine learning tasks in most of the devices it ships. Machine intelligence-driven functionality increasingly dominates the keynotes where Apple executives take the stage to introduce new features for iPhones, iPads, or the Apple Watch. The introduction of Macs with Apple silicon later this year will bring many of the same machine intelligence developments to the company’s laptops and desktops, too.
In the wake of the Apple silicon announcement, I spoke at length with John Giannandrea, Apple’s Senior Vice President for Machine Learning and AI Strategy, as well as with Bob Borchers, VP of Product Marketing. They described Apple’s AI philosophy, explained how machine learning drives certain features, and argued passionately for Apple’s on-device AI/ML strategy.
What is Apple’s AI strategy?
Both Giannandrea and Borchers joined Apple in the past couple of years; each previously worked at Google. Borchers actually rejoined Apple after time away; he was a senior director of marketing for the iPhone until 2009. And Giannandrea’s defection from Google to Apple in 2018 was widely reported; he had been Google’s head of AI and search.
Google and Apple are quite different companies. Google has a reputation for participating in, and in some cases leading, the AI research community, whereas Apple used to do most of its work behind closed doors. That has changed in recent years, as machine learning powers numerous features in Apple’s devices and Apple has increased its engagement with the AI community.
“When I joined Apple, I was already an iPad user, and I loved the Pencil,” Giannandrea (who goes by “J.G.” to colleagues) told me. “So, I would track down the software teams and I would say, ‘Okay, where’s the machine learning team that’s working on handwriting?’ And I couldn’t find it.” It turned out the team he was looking for didn’t exist–a surprise, he said, given that machine learning is one of the best tools available for the feature today.
“I knew that there was so much machine learning that Apple should do that it was surprising that not everything was actually being done. And that has changed dramatically in the last two to three years,” he said. “I really honestly think there’s not a corner of iOS or Apple experiences that will not be transformed by machine learning over the coming few years.”
I asked Giannandrea why he felt Apple was the right place for him. His answer doubled as a succinct summary of the company’s AI strategy:
I think that Apple has always stood for that intersection of creativity and technology. And I think that when you’re thinking about building smart experiences, having vertical integration, all the way down from the applications, to the frameworks, to the silicon, is really essential… I think it’s a journey, and I think that this is the future of the computing devices that we have, is that they be smart, and that, that smart sort of disappear.
Borchers chimed in too, adding, “This is clearly our approach, with everything that we do, which is, ‘Let’s focus on what the benefit is, not how you got there.’ And in the best cases, it becomes automagic. It disappears… and you just focus on what happened, as opposed to how it happened.”
Speaking again of the handwriting example, Giannandrea made the case that Apple is best positioned to “lead the industry” in building machine intelligence-driven features and products:
We made the Pencil, we made the iPad, we made the software for both. It’s just unique opportunities to do a really, really good job. What are we doing a really, really good job at? Letting somebody take notes and be productive with their creative thoughts on digital paper. What I’m interested in is seeing these experiences be used at scale in the world.
He contrasted this with Google. “Google is an amazing company, and there’s some really great technologists working there,” he said. “But fundamentally, their business model is different and they’re not known for shipping consumer experiences that are used by hundreds of millions of people.”
How does Apple use machine learning today?
Apple has made a habit of crediting machine learning with improving some features in the iPhone, Apple Watch, or iPad in its recent marketing presentations, but it rarely goes into much detail–and most people who buy an iPhone never watch those presentations, anyway. Contrast this with Google, for example, which places AI at the center of much of its messaging to consumers.
While computers can process certain data more quickly or accurately than humans can, they are still ultimately not intelligent. Traditional computer programming involves telling the computer what to do at all times, and in advance: if precisely this happens, then do exactly this. But what if something else happens–even a minor variation? Programmers can get quite creative in defining sophisticated behaviors, but the machine remains incapable of making judgments of its own.
With machine learning, in addition to telling a computer what to do, programmers give it a data set relevant to the task and a methodology for analyzing that data set. They then give it time to spin its cycles getting more accurate at labeling or interpreting that data over time, based on positive or negative feedback. This allows the machine to algorithmically make informed guesses about data it hasn’t previously encountered, if the new data is similar to that with which it was trained.
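That feedback loop can be sketched in a few lines of Python. The example below is purely illustrative–a toy perceptron with invented data, not anything Apple ships–that nudges its weights whenever a guess earns negative feedback, then makes informed guesses about points it never saw during training:

```python
# Toy illustration: instead of hand-coding a rule like "if x > 3 then
# class 1", a perceptron learns a boundary from labeled examples,
# adjusting its weights in response to positive or negative feedback.

def train(samples, labels, epochs=20, lr=0.1):
    """Learn weights from (point, label) pairs via error feedback."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if (w[0] * x[0] + w[1] * x[1] + b) > 0 else 0
            err = y - pred                  # the feedback signal
            w[0] += lr * err * x[0]         # nudge weights toward
            w[1] += lr * err * x[1]         # reducing the error
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if (w[0] * x[0] + w[1] * x[1] + b) > 0 else 0

# Trained on four points, it generalizes to unseen but similar data.
xs = [(1, 1), (2, 1), (4, 5), (5, 4)]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
```

After training, `predict(w, b, (6, 6))` lands in class 1 even though `(6, 6)` never appeared in the training data–the "informed guess about data it hasn't previously encountered" described above.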
When big tech companies talk about artificial intelligence today, they often mean machine learning. Machine learning is a subset of AI. Many lauded gadget features–like image recognition–are driven by a subset of machine learning called “deep” learning.
There are numerous examples of machine learning being used in Apple’s software and devices, most of them new in just the past couple of years.
Machine learning is used to help the iPad’s software distinguish between a user accidentally pressing their palm against the screen while drawing with the Apple Pencil, and an intentional press meant to provide an input. It’s used to monitor users’ usage habits to optimize device battery life and charging, both to improve the time users can spend between charges and to protect the battery’s long-term viability. It’s used to make app recommendations.
Then there’s Siri, which is perhaps the one thing any iPhone user would immediately perceive as artificial intelligence. Machine learning drives several aspects of Siri, from speech recognition to attempts by Siri to offer useful answers.
Savvy iPhone owners might also notice that machine learning is behind the Photos app’s ability to automatically sort pictures into pre-made galleries, or to accurately give you photos of a friend named Jane when her name is entered into the app’s search field.
In other cases, few users may realize that machine learning is at work. For example, your iPhone may take multiple pictures in rapid succession each time you tap the shutter button. An ML-trained algorithm then analyzes each image and can composite what it deems the best parts of each image into one result.
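Apple hasn’t disclosed how its compositing pipeline scores frames, but the basic idea can be sketched with a simple, hypothetical heuristic: rate each burst frame by a crude sharpness proxy and keep the one that scores highest. (A real pipeline would be learned and would composite regions, not whole frames.)

```python
# Illustrative only: pick the "best" of several burst frames by scoring
# each one. This stands in for a trained model with a hand-rolled
# sharpness proxy: the sum of differences between neighboring pixels.

def sharpness(frame):
    """Higher when adjacent pixels differ more, i.e. more fine detail."""
    total = 0
    for row in frame:
        for c in range(len(row) - 1):
            total += abs(row[c + 1] - row[c])
    return total

def best_frame(frames):
    return max(frames, key=sharpness)

blurry = [[10, 10, 10], [10, 10, 10]]   # flat: little local contrast
sharp  = [[0, 50, 0], [50, 0, 50]]      # strong local contrast
```

Here `best_frame([blurry, sharp])` selects the second frame, since the flat one scores zero.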
Phones have long included image signal processors (ISPs) for improving the quality of photos digitally and in real time, but Apple accelerated the process in 2018 by making the ISP in the iPhone work closely with the Neural Engine, the company’s recently added machine learning-focused processor.
I asked Giannandrea to name some of the ways that Apple uses machine learning in its recent software and products. He gave a laundry list of examples:
There’s a whole bunch of new experiences that are powered by machine learning. And these are things like language translation, or on-device dictation, or our new features around health, like sleep and hand washing, and stuff we’ve released in the past around heart health and things like this. I think there are increasingly fewer and fewer places in iOS where we’re not using machine learning.
It’s hard to find a part of the experience where you’re not doing some predictive [work]. Like, app predictions, or keyboard predictions, or modern smartphone cameras do a ton of machine learning behind the scenes to figure out what they call “saliency,” which is like, what’s the most important part of the picture? Or, if you imagine doing blurring of the background, you’re doing portrait mode.
All of these things benefit from the core machine learning features that are built into the core Apple platform. So, it’s almost like, “Find me something where we’re not using machine learning.”
Borchers also pointed out accessibility features as important examples. “They are fundamentally made available and possible because of this,” he said. “Things like the sound detection capability, which is game-changing for that particular community, is possible because of the investments over time and the capabilities that are built in.”
Further, you may have noticed Apple’s software and hardware updates over the past couple of years have emphasized augmented reality features. Most of those features are made possible thanks to machine learning. Per Giannandrea:
Machine learning is used a lot in augmented reality. The hard problem there is what’s called SLAM, so Simultaneous Localization And Mapping. So, trying to understand if you have an iPad with a lidar scanner on it and you’re moving around, what does it see? And building up a 3D model of what it’s actually seeing.
That today uses deep learning and you need to be able to do it on-device because you want to be able to do it in real time. It wouldn’t make sense if you’re waving your iPad around and then perhaps having to do that at the data center. So in general I would say the way I think about this is that deep learning in particular is giving us the ability to go from raw data to semantics about that data.
Increasingly, Apple performs machine learning tasks locally on the device, on hardware like the Apple Neural Engine (ANE) or on the company’s custom-designed GPUs (graphics processing units). Giannandrea and Borchers argued that this approach is what makes Apple’s strategy distinct amongst competitors.
Why do it on the device?
Both Giannandrea and Borchers made an impassioned case in our conversation that the features we just went over are possible because of–not in spite of–the fact that all the work is done locally on the device.
There’s a common narrative that boils machine learning down to the idea that more data means better models, which in turn means better user experiences and products. It’s one of the reasons why onlookers often point to Google, Amazon, or Facebook as likely rulers of the AI roost; those companies operate massive data collection engines, in part because they operate, and have total visibility into, what has become key digital infrastructure for much of the world. By that measure, some deem Apple unlikely to perform as well, because its business model is different and the company has publicly committed to limiting its data collection.
When I presented these perspectives to Giannandrea, he didn’t hold back:
Yes, I understand this perception of bigger models in data centers somehow are more accurate, but it’s actually wrong. It’s actually technically wrong. It’s better to run the model close to the data, rather than moving the data around. And whether that’s location data–like what are you doing– [or] exercise data–what’s the accelerometer doing in your phone–it’s just better to be close to the source of the data, and so it’s also privacy preserving.
Borchers and Giannandrea both repeatedly made points about the privacy implications of doing this work in a data center, but Giannandrea said that local processing is also about performance.
“One of the other big things is latency,” he said. “If you’re sending something to a data center, it’s really hard to do something at frame rate. So, we have lots of apps in the app store that do stuff, like pose estimation, like figure out the person’s moving around, and identifying where their legs and their arms are, for example. That’s a high-level API that we offer. That’s only useful if you can do it at frame rate, essentially.”
He gave another consumer use case example:
You’re taking a photograph, and the moments before you take a photograph with the camera, the camera’s seeing everything in real time. It can help you make a decision about when to take a photograph. If you wanted to make that decision on the server, you’d have to send every single frame to the server to make a decision about how to take a photograph. That doesn’t make any sense, right? So, there are just lots of experiences that you would want to build that are better done at the edge device.
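The frame-rate argument comes down to simple arithmetic. Using assumed, illustrative numbers–a 30 fps camera feed, a 50 ms network round trip, and made-up inference times–a quick budget check shows why a server round trip can’t keep up even when the server’s model is faster:

```python
# Back-of-envelope latency budget for per-frame ML. All latency numbers
# below are assumptions for illustration, not measured figures.

fps = 30
frame_budget_ms = 1000 / fps          # ~33.3 ms available per frame

on_device_inference_ms = 10           # assumed on-device model latency
server_round_trip_ms = 50             # assumed network round trip alone
server_inference_ms = 5               # even a faster server-side model...

fits_on_device = on_device_inference_ms <= frame_budget_ms
fits_on_server = (server_round_trip_ms + server_inference_ms) <= frame_budget_ms

print(fits_on_device)   # on-device work fits inside the frame budget
print(fits_on_server)   # the network alone already blows the budget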
Asked how Apple chooses when to do something on-device, Giannandrea’s answer was straightforward: “When we can meet, or beat, the quality of what we could do on the server.”
Further, both Apple executives credited Apple’s custom silicon–specifically the Apple Neural Engine (ANE) included in iPhones since the iPhone 8 and iPhone X–as a prerequisite for this on-device processing. The Neural Engine is an octa-core neural processing unit (NPU) that Apple designed to handle certain kinds of machine learning tasks.
“It’s a multi-year journey because the hardware had not been available to do this at the edge five years ago,” Giannandrea said. “The ANE design is entirely scalable. There’s a bigger ANE in an iPad than there is in a phone, than there is in an Apple Watch, but the CoreML API layer for our apps and also for developer apps is basically the same across the entire line of products.”
When Apple has talked publicly about the Neural Engine, the company has shared performance numbers, like 5 trillion operations per second in 2018’s A12 chip. But it hasn’t gotten specific about the architecture of the chip. It’s literally a black box on the slides in Apple’s presentations.
Given that, I wanted to know if Giannandrea would shed more light on how the Neural Engine works under the hood, but he declined to go into much detail. Instead, he said that app developers can glean all they need to know from CoreML–a software development API that provides developers with access to the iPhone’s machine learning capabilities.
The CoreML developer API outlines very clearly the kinds of machine learning models, runtime models that we support… We have an increasing set of kernels that we support. And you target CoreML from any of the popular machine learning things, like PyTorch or TensorFlow, and then you essentially compile down your model and then you give it to CoreML.
CoreML’s job is to figure out where to run that model. It might be that the right thing to do is run the model on ANE but it might also be the right thing to run the model on the GPU or to run the model on the CPU. And our CPU has optimizations for machine learning as well.
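Core ML’s actual placement logic is internal to Apple, but the decision Giannandrea describes can be caricatured as a fallback chain: run the model on the most specialized unit that supports every operation it contains, otherwise fall back to a more general one. This is a hypothetical sketch with invented operation names, not Core ML’s real dispatcher:

```python
# Hypothetical sketch of compute-unit dispatch: prefer the most
# specialized processor whose supported-operation set covers the model.
# Operation names and support sets here are invented for illustration.

ANE_OPS = {"conv", "matmul", "relu"}          # narrow but fastest
GPU_OPS = ANE_OPS | {"custom_blur"}           # broader coverage
CPU_OPS = GPU_OPS | {"string_lookup"}         # the CPU runs everything

def place_model(ops):
    """Return the first (most specialized) unit supporting all ops."""
    for unit, supported in [("ANE", ANE_OPS), ("GPU", GPU_OPS), ("CPU", CPU_OPS)]:
        if set(ops) <= supported:
            return unit
    raise ValueError("model contains an unsupported operation")

print(place_model(["conv", "relu"]))          # lands on the ANE
print(place_model(["conv", "custom_blur"]))   # falls back to the GPU
```

A model built only from widely supported operations lands on the Neural Engine; one exotic operation is enough to push the whole thing (or, in the real system, part of it) to the GPU or CPU.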
Throughout our conversation, both executives pointed as much to third-party developers’ apps as to Apple’s own. The strategy here isn’t just driving Apple-made services and features; it’s opening at least some of that capability up to the large community of developers. Apple has relied on developers to innovate on its platforms since the App Store first opened in 2008. The company often borrows ideas those developers came up with when updating its own, internally made apps.
Apple’s devices are not the only ones with machine learning chips built in, of course. Samsung, Huawei, and Qualcomm all include NPUs on their systems-on-a-chip, for example. And Google, too, offers machine learning APIs to developers. Still, Google’s strategy and business model are markedly different. Android phones don’t do nearly as wide an array of machine learning tasks locally.
Macs with Apple Silicon
The focus of my interview with Giannandrea and Borchers wasn’t on the big announcement the company made at WWDC just a few weeks ago–the imminent launch of Macs with Apple silicon. But when I speculated that one of Apple’s many reasons for designing Macs around its own chips might be the inclusion of the Neural Engine, Borchers said:
We will for the first time have a common platform, a silicon platform that can support what we want to do and what our developers want to do…. That capability will unlock some interesting things that we can think of, but probably more importantly will unlock lots of things for other developers as they go along.
Giannandrea gave one specific example for how Apple’s machine learning tools and hardware will be used on the Mac:
I don’t know if you saw that demo in the State of the Union, but basically the idea was: given a video, go through the video frame or frame-by-frame and do object detection. And you can do it more than an order of magnitude faster on our silicon than you could on the legacy platform.
And then, you say, “Well, that’s interesting. Well, why is that useful?” Imagine a video editor where you had a search box and you could say, “Find me the pizza on the table.” And it would just scrub to that frame… Those are the kinds of experiences that I think you will see people come up with. We very much want developers to use these frameworks and just surprise us by what they can actually do with it.
Apple said at its developer conference that it plans to ship Macs with its own silicon starting later this year.
What about privacy?
Privacy has been front-and-center in Apple’s messaging to users over the past couple of years. It’s brought up again and again in keynotes and marketing materials; there are reminders about it peppered throughout iOS; and it often comes up in interviews–as was the case with this one.
“People are worried about AI writ large because they don’t know what it is,” Giannandrea told me. “They think it’s more capable than it is, or they think about this sci-fi view of AI, and you have influential people like Bill Gates and Elon Musk and others saying that this is a dangerous technology.”
He believes the hype around AI from other big tech companies is a negative, not a positive, for those companies’ marketing efforts “because people are worried about this technology.”
The term “AI” may not be helpful here. It evokes malicious synthetic villains from pop culture, like Skynet or HAL 9000. But most experts in applied artificial intelligence will tell you that this dark outcome is far from reality. Tech driven by machine learning carries many risks–inheriting and amplifying human prejudices, for example–but going rogue and violently attacking humanity doesn’t seem likely in the immediate future.
Machine learning doesn’t actually make machines intelligent in the same way that humans are. For this reason and others, many AI experts (Giannandrea included) have suggested alternative terms like “machine intelligence” that don’t draw parallels to human intelligence.
Whatever the nomenclature, machine learning can bring with it a very real and present danger: the undermining of users’ privacy. Some companies aggressively collect personal data from users and upload it to data centers, using machine learning and training as a justification.
As noted above, Apple does a lot of this collection and processing locally on the user’s device. Giannandrea explicitly tied this decision to privacy concerns. “I think that we have a very clear position on this, which is we are going to do this machine learning advanced technology in as many cases as possible on your device, and the data’s not going to leave your device,” he said. “We have a very clear statement about why we think our devices are safer or better or should be more trusted.”
He used text-to-speech as a specific example of this philosophy in action:
If you say something like, “Read me my messages from Bob.” The synthesis of the text to speech is happening on the device, on the Neural Engine–the combination of the Neural Engine and the CPU. And because of that, we never saw the content of your message from Bob because your phone is reading it out–it’s not the servers reading it out. So, the content of that message never made it to the server…
So that’s a great example of advanced technology actually improving both the user utility because the voice is being synthesized on the device, so even if you’re disconnected, it’ll still work. But also the privacy story. It’s actually really hard to do. A lot of really hard engineering went into making modern high quality [text to speech] be synthesized on a device that you can put in your pocket.
Of course, you must use some user data for machine learning in many cases. So how exactly does Apple use the user data it does handle? Giannandrea explained:
Generally speaking, we have two ways that we build models. One is where we collect and label data, which is appropriate in many, many circumstances. And then there’s the case where we ask users to donate their data. The most notable example of that would be Siri where, when you set up an iPhone, we say, “Would you like to help make Siri better?”
That’s a case where some amount of data is donated to us and then a very small percentage of that may be used for training. But many, many things we’re talking about here–like say, handwriting–we can gather enough data to train that model to work with basically everybody’s handwriting without having to use any consumer data at all.
Some of these prompts requesting to use your data have been added recently. Last summer, a report indicated that Siri was recording what users were saying after accidental activations; contractors who were tasked with quality assurance for Siri’s functionality were hearing some of those recordings.
Apple responded by committing to only storing Siri-related audio after users explicitly opted in to make Siri better by sharing recordings (this behavior was rolled out in iOS 13.2) and then brought all of the quality assurance in-house. I asked what Apple is doing differently than the contractors were with this data. Giannandrea replied:
We have a lot of safeguards. For example, there is a process to identify whether or not audio was intended for the assistant, which is completely separate from the process to actually review the audio. So we do a lot of stuff internally to make sure that we are not capturing–and then discarding, in fact–any accidental audio.
But if you’re not willing to actually QA, to your point, the feature, then you’ll never make the accidental recordings any better. As you know, machine learning requires that you continually improve it. So we actually overhauled a lot of our workflows and processes at the same time as we brought the work in-house. I’m very confident that we have one of the very best processes for improving the assistant in a privacy-preserving way.
It’s clear that Apple is looking to push privacy protections as a key feature in its devices; from Giannandrea, this came across as genuine conviction. But it could also help Apple in the marketplace, as its biggest competitor in the mobile space has a far worse track record on privacy, and that leaves an opening as users become more and more concerned about the privacy implications of AI.
Throughout our conversation, both Giannandrea and Borchers came back to two points of Apple’s strategy: 1) it’s more performant to do machine learning tasks locally, and 2) it’s more “privacy preserving”–a specific wording Giannandrea repeated a few times in our conversation–to do so.
Inside the black box
After a long track record of mostly working on AI features in the dark, Apple’s emphasis on machine learning has greatly expanded over the past few years.
The company is publishing regularly, it’s doing academic sponsorships, it has fellowships, it sponsors labs, it goes to AI/ML conferences. It recently relaunched a machine learning blog where it shares some of its research. It has also been on a hiring binge, picking up engineers and others in the machine learning space–including Giannandrea himself just two years ago.
It’s not leading the research community in the ways that Google is, but Apple makes the case that it is leading at least in bringing the fruits of machine learning to more users.
Remember when Giannandrea said he was surprised that machine learning wasn’t used for handwriting with the Pencil? He went on to see the creation of the team that made it happen. And in tandem with other teams, they moved forward with machine learning-driven handwriting–a cornerstone in iPadOS 14.
“We have a lot of amazing machine learning practitioners at Apple, and we continue to hire them,” Giannandrea said. “I find it very easy to attract world-class people to Apple because it’s becoming increasingly obvious in our products that machine learning is critical to the experiences that we want to build for users.”
After a brief pause, he added: “I guess the biggest problem I have is that many of our most ambitious products are the ones we can’t talk about and so it’s a bit of a sales challenge to tell somebody, ‘Come and work on the most ambitious thing ever but I can’t tell you what it is.'”
If big tech companies and venture capital investments are to be believed, AI and machine learning will only become more ubiquitous in the coming years. However it shakes out, Giannandrea and Borchers made one thing clear: machine learning now plays a part in much of what Apple does with its products, and many of the features consumers use daily. And with the Neural Engine coming to Macs starting this fall, machine learning’s role at Apple will likely continue to grow.