Posted by Terence Zhang – Developer Relations Engineer and Kristi Bradford – Product Manager
Google Pixel’s Recorder app allows people to record, transcribe, save, and share audio. To make it easier for users to manage and revisit their recordings, Recorder’s developers turned to Gemini Nano, a powerful on-device large language model (LLM). This integration introduces an AI-powered audio summarization feature to help users more easily find the right recordings and quickly grasp key points.
Earlier this month, Gemini Nano got a power boost with the introduction of the new Gemini Nano with Multimodality model. The Recorder app is already leveraging this update to summarize longer voice recordings, with improved processing for grammar and nuance.
Meeting user needs with on-device AI
Recorder developers initially experimented with a cloud-based solution, achieving impressive levels of performance and quality. However, to prioritize accessibility and privacy for their users, they sought an on-device solution. The development of Gemini Nano presented a perfect opportunity to build the concise audio summaries users were looking for, all while keeping data processing on the device.
Gemini Nano is Google’s most efficient model for on-device tasks. “Having the LLM on-device is beneficial to users because it provides them with more privacy, less latency, and it works wherever they need since there’s no internet required,” said Kristi Bradford, the product manager for Pixel’s essential apps.
To achieve better results, Recorder also fine-tuned the model using data that matches its use case. This is done using low-rank adaptation (LoRA), which enables Gemini Nano to consistently output three-bullet-point descriptions of the transcript that include any speaker names, key takeaways, and themes.
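For readers unfamiliar with LoRA, the general idea can be sketched in a few lines: the pretrained weight matrix stays frozen, and training only updates two small matrices whose product is added to it. The code below is a toy, pure-Python illustration of that idea, not Recorder's or Gemini Nano's actual implementation. The function names, the `alpha` scaling factor, and the matrix sizes are all assumptions chosen for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_adapter(d_in, d_out, rank, seed=0):
    """Create the two small trainable matrices of a LoRA adapter.

    `down` (d_in x rank) gets small random values and `up` (rank x d_out)
    starts at zero, so the adapter initially leaves the model unchanged.
    """
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
    up = [[0.0] * d_out for _ in range(rank)]
    return down, up


def lora_forward(x, w, down, up, alpha):
    """Compute x @ w + (alpha / rank) * (x @ down @ up).

    `w` is the frozen pretrained weight; only `down` and `up` would be trained.
    """
    rank = len(up)
    scale = alpha / rank
    base = matmul(x, w)
    delta = matmul(matmul(x, down), up)
    return [
        [b + scale * d for b, d in zip(b_row, d_row)]
        for b_row, d_row in zip(base, delta)
    ]


def trainable_params(d_in, d_out, rank):
    """Number of adapter parameters, versus d_in * d_out for full fine-tuning."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical layer size: compare adapter size with the full matrix.
    print("full matrix:", 1024 * 1024, "adapter:", trainable_params(1024, 1024, 8))
```

Because the adapter is so much smaller than the matrix it modifies, a task-specific behavior such as a fixed summary format can be trained and shipped without retraining or duplicating the base model.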
AICore, an Android system service that centralizes runtime, delivery, and critical safety components for LLMs, significantly streamlined Recorder’s adoption of Gemini Nano. The availability of a developer SDK for running GenAI workloads allowed the team to build the transcription summary feature in just four months, with only four developers. This efficiency was achieved by eliminating the need to maintain in-house models.
Since its release, Recorder users have been using the new AI-powered summarization feature an average of 2 to 5 times daily, and the number of overall saved recordings increased by 24%. This feature has contributed to a significant increase in app engagement and user retention overall. The Recorder team also noted that feedback about the new feature has been positive, with many users citing the time it saves them.
The next big evolution: Gemini Nano with multimodality
Recorder developers also implemented the latest Gemini Nano model, known as Gemini Nano with multimodality, to further improve its summarization feature on Pixel 9 devices. The new model is significantly larger than the previous one on Pixel 8 devices, and it’s more capable, accurate, and scalable. The new model also has expanded token support that lets Recorder summarize much longer transcripts than before. Gemini Nano with multimodality is currently only available on Pixel 9 devices.
Integrating Gemini Nano with multimodality required another round of fine-tuning. However, Recorder developers were able to use the original Gemini Nano model’s fine-tuning dataset as a foundation, streamlining the development process.
To fully leverage the new model’s capabilities, Recorder developers expanded their dataset with support for longer voice recordings, implemented refined evaluation methods, and established launch criteria metrics focused on grammar and nuance. The inclusion of grammar as a new metric for assessing inference quality was made possible solely by the enhanced capabilities of Gemini Nano with Multimodality.
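The post does not describe how these launch criteria are scored or enforced. As one possible illustration, a launch gate could compare the mean score for each metric against a minimum and report which metrics fall short. Everything in this sketch is hypothetical: the `Criterion` type, the score scale, and the thresholds are assumptions for the example, with only the metric names "grammar" and "nuance" taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """A hypothetical launch criterion: a metric name and its minimum mean score."""
    metric: str
    minimum: float


def failed_criteria(scores, criteria):
    """Return the metrics whose mean score is missing or below its minimum.

    `scores` maps a metric name to a list of per-summary scores, for example
    ratings assigned by human reviewers or an automated grader.
    """
    failures = []
    for criterion in criteria:
        values = scores.get(criterion.metric, [])
        if not values or sum(values) / len(values) < criterion.minimum:
            failures.append(criterion.metric)
    return failures


if __name__ == "__main__":
    # Made-up scores and thresholds, purely for illustration.
    criteria = [Criterion("grammar", 0.9), Criterion("nuance", 0.7)]
    scores = {"grammar": [0.9, 1.0], "nuance": [0.5, 0.6]}
    print("blocking metrics:", failed_criteria(scores, criteria))
```

Treating a metric with no scores as a failure keeps a newly added criterion from passing silently before any evaluation data exists for it.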
Doing more with on-device AI
“Given the novelty of GenAI, the whole team had fun learning how to use it,” said Kristi. “Now, we’re empowered to push the boundaries of what we can accomplish while meeting emerging user needs and opportunities. It’s truly brought a new level of creativity to problem-solving and experimentation. We’ve already demoed at least two more GenAI features that help people get time back internally for early feedback, and we’re excited about the possibilities ahead.”
Get started
Learn more about how to bring the benefits of on-device AI with Gemini Nano to your apps.