Android: Allow switching the voice typing library to Whisper #11158
base: dev
The React Native ONNX library was failing to load Whisper (reporting an incorrect sequence_length). Similar code that uses the Kotlin/Java ONNX library works fine.
provide that to Whisper at each time step, rather than only the latest 5-10s
// Needed for Whisper speech-to-text
implementation 'com.microsoft.onnxruntime:onnxruntime-android:latest.release'
implementation 'com.microsoft.onnxruntime:onnxruntime-extensions-android:latest.release'
- The implementation uses the Kotlin/Java bindings to ONNX. Originally, I tried recording audio in the correct format with Kotlin, sending the recorded data to React Native, then starting Whisper using the ONNX React Native bindings (implementation). However, this was failing with an error similar to the following:
max_length > sequence_length was false. max_length (448) shall be greater than input sequence length (80000)
Here, the input sequence was the raw audio data being provided to ONNX. max_length is the maximum output sequence length.
- Using the React Native ONNX bindings also requires transferring the audio data from the audio recorder to React Native, then from React Native to ONNX, which adds overhead.
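The length check quoted in the error above can be illustrated with a minimal sketch. This is not ONNX Runtime's code; `checkMaxLength` is a hypothetical name, and the function only reproduces the condition and wording that the error message itself states.

```typescript
// Hypothetical illustration of the condition named in the error message.
// Returns null when the check passes, or the error text when it fails.
function checkMaxLength(maxLength: number, inputSequenceLength: number): string | null {
	if (maxLength > inputSequenceLength) return null;
	return (
		'max_length > sequence_length was false. ' +
		`max_length (${maxLength}) shall be greater than input sequence length (${inputSequenceLength})`
	);
}

// Values taken from the error reported above: the raw audio buffer
// (80000 samples) was being treated as the input sequence.
const failure = checkMaxLength(448, 80000);
console.log(failure);
```

With the values from the report, the check fails because the raw audio buffer is far longer than the 448-token output limit, which is consistent with the audio being interpreted as the wrong kind of sequence.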
let urlTemplate = rtrimSlashes(Setting.value('voiceTypingBaseUrl').trim());

if (!urlTemplate) {
	urlTemplate = 'https://github.com/personalizedrefrigerator/joplin-voice-typing-test/releases/download/test-release/{task}.zip';
I'm currently hosting the built Whisper models on this GitHub repository under my personal account. It may make sense to host these files elsewhere (e.g. under a GitHub repo in the Joplin organization).
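As a rough sketch of how a URL template like the one in the diff above might be resolved, the snippet below falls back to the default template when the setting is empty and then fills in the `{task}` placeholder. The `resolveModelUrl` helper, the local `rtrimSlashes` stand-in, the substitution step, and the example task name are all assumptions for illustration; the diff excerpt only shows the trimming and the fallback.

```typescript
// Local stand-in for the rtrimSlashes helper used in the diff
// (assumed behavior: remove trailing slashes).
const rtrimSlashes = (path: string): string => path.replace(/[/\\]+$/, '');

const defaultUrlTemplate =
	'https://github.com/personalizedrefrigerator/joplin-voice-typing-test/releases/download/test-release/{task}.zip';

// Hypothetical helper: picks the template, then substitutes {task}.
function resolveModelUrl(settingValue: string, task: string): string {
	let urlTemplate = rtrimSlashes(settingValue.trim());
	if (!urlTemplate) {
		urlTemplate = defaultUrlTemplate;
	}
	return urlTemplate.replace(/\{task\}/g, encodeURIComponent(task));
}

// 'whisper_tiny' is a made-up task name used only for this example.
console.log(resolveModelUrl('', 'whisper_tiny'));
console.log(resolveModelUrl(' https://example.com/models/{task}.zip/ ', 'whisper_tiny'));
```

Keeping the host in a user-configurable template means the models can later be moved (for example, to a repository in the Joplin organization) without changing the download logic.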
Summary
Important
I'm currently hosting the built Whisper models on this GitHub repository under my personal account. It may make sense to host these files elsewhere.
This pull request allows switching the voice typing library to Whisper (whisper-tiny) as an alternative to Vosk. For now, this makes Whisper the default voice typing library and lets the user switch back to Vosk in settings > note.
Unlike Vosk, whisper-tiny supports punctuation in the generated text.
This pull request uses ONNX runtime (https://github.com/microsoft/onnxruntime) to run Whisper on Android (model release and build instructions). ONNX runtime is a general-purpose runtime for trained machine learning models.
This pull request uses whisper-tiny rather than whisper-small or whisper-large. On my Android tablet, whisper-small seems to crash the application (out of memory?).
Screen recordings
Whisper
whisper.webm
In the above screen recording:
Vosk
vosk.webm
In the above screen recording:
Size & performance notes
CPU usage comparison
Whisper uses significantly more CPU than Vosk. On my tablet, Whisper uses roughly 50% CPU while converting speech to text, and currently runs a conversion every few seconds (this interval can be adjusted). By comparison, Vosk uses about 10% CPU.
Whisper CPU usage (Android/debug mode):
Vosk CPU usage (Android/debug mode):
Note: The graphs above are for different audio input.
Download size
whisper-tiny has a download size of 47.1 MB (zipped) and uses a single model for all languages. This is slightly larger than Vosk's 41.2 MB download size for English.
Note: Currently, the "downloading" message still says "Downloading [locale] language files".
APK size increase
ONNX runtime adds libonnxruntime.so and libortextensions.so for all output platforms.
libonnxruntime.so:
libortextensions.so:
Joplin builds a single APK that targets all four platforms. With the current build output, this corresponds to roughly a 75 MB increase in size.
Removing react-native-vosk causes the APK size to decrease from 183.3 MB to 146.2 MB. This, however, is still significantly larger than the previous APK size of 100 MB.
Note
If I enable per-ABI builds (diff), each per-ABI APK is still smaller than the current APK size of about 100 MB:
Demo APK
https://github.com/personalizedrefrigerator/joplin/releases/tag/whisper-typing-0.0.2