Can we have speech_segments or even silence_segments in seconds in TranscriptionInfo? #481
Replies: 2 comments 2 replies
-
When calling `model.transcribe`, you can pass these arguments:

```python
vad_filter=True,
vad_parameters={
    "threshold": 0.2,
    "min_speech_duration_ms": 10,
    "min_silence_duration_ms": 1000,
    "speech_pad_ms": 400,
},
```

You can tune these parameters as needed.
-
This idea is not about tuning the parameters; it is about having access to what the model did afterwards. The VAD model can output the speech segments it detected, as dicts with "start" and "end" keys. With those I could calculate the total amount of silence, and even derive the opposite, that is, the silence segments.
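A minimal sketch of the idea, assuming the VAD returns `{"start", "end"}` dicts in samples: the segment values, the clip length, and the 16 kHz sampling rate below are all illustrative assumptions, not output from a real run.

```python
SAMPLING_RATE = 16000  # assumed internal VAD sampling rate

# Hypothetical speech segments as the VAD might return them, in samples.
speech_segments = [
    {"start": 8000, "end": 40000},    # 0.5 s .. 2.5 s
    {"start": 64000, "end": 112000},  # 4.0 s .. 7.0 s
]
audio_samples = 160000  # hypothetical 10 s clip

# Total speech in samples; everything else in the clip is silence.
speech = sum(seg["end"] - seg["start"] for seg in speech_segments)
silence_seconds = (audio_samples - speech) / SAMPLING_RATE
```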
-
Since we already have it as a DEBUG log, I believe it would be a great addition to TranscriptionInfo so we can access it easily. The best way to expose it is probably to divide the sample indices by the sampling rate, so that we have the segments in seconds.
For silence_segments, we would need to build a function that iterates over the speech segments and returns what is not speech.
What do you guys think? I believe the speech segments would be easy to implement.
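The two steps above could be sketched like this. The helper names, the segment values, and the 16 kHz sampling rate are assumptions for illustration, not the library's actual API:

```python
SAMPLING_RATE = 16000  # assumed internal VAD sampling rate

def speech_segments_seconds(segments):
    """Convert sample-based {"start", "end"} segments to seconds."""
    return [
        {"start": s["start"] / SAMPLING_RATE, "end": s["end"] / SAMPLING_RATE}
        for s in segments
    ]

def silence_segments_seconds(segments, audio_samples):
    """Iterate over the speech segments and return what is not speech,
    in seconds, as the complement over the full audio."""
    silences = []
    cursor = 0  # position in samples after the last speech segment
    for seg in segments:
        if seg["start"] > cursor:
            silences.append({"start": cursor / SAMPLING_RATE,
                             "end": seg["start"] / SAMPLING_RATE})
        cursor = seg["end"]
    if cursor < audio_samples:  # trailing silence
        silences.append({"start": cursor / SAMPLING_RATE,
                         "end": audio_samples / SAMPLING_RATE})
    return silences
```

For example, with speech at samples 8000-40000 and 64000-112000 in a 160000-sample clip, the silences come out as 0.0-0.5 s, 2.5-4.0 s, and 7.0-10.0 s.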