Technical reports indicate that Google is working on an audio file analysis feature for the Gemini app on Android, a step that could make the app more interactive and extend its ability to process multiple media types, including recorded audio clips. The latest beta of the Gemini app, version 16.30.59sa.arm64, adds an option that lets users upload audio files within a conversation. The feature is not officially enabled yet, but its presence in the app interface indicates that it is under development. After uploading a file, users see an option suggesting that the AI will attempt to interact with the file’s content, but the response is inaccurate or missing altogether, which indicates that backend audio processing is not yet complete.
The feature fits Google’s broader AI efforts: the Gemini API already supports audio analysis on the web, including extracting text from audio files, identifying precise timestamps within clips, and handling formats such as MP3, WAV, and FLAC. This raises expectations that the Android app will soon gain similar capabilities, allowing audio files to serve as a starting point for a conversation without prior transcription. Although the upload interface is visible, the app does not yet actually process or interpret audio clips; it sometimes ignores them entirely or returns fabricated responses unrelated to their content. This suggests that the supporting infrastructure has not been activated, and that what is visible now is likely an internal test or preliminary code added ahead of an official launch.
Since its launch, the Gemini app has handled images and text flexibly, but the lack of audio support has remained a weakness compared with other AI apps. Google appears to be aware of this gap and is now working to close it by expanding the range of media the app supports. Audio input will give users a more natural way to interact, especially when typing is difficult or when audio conveys more than written text.