Can we have speech_segments or even silence_segments in seconds in TranscriptionInfo? #481
Replies: 2 comments 2 replies
-
When calling `model.transcribe`, you can pass these arguments:

```python
vad_filter=True,
vad_parameters={
    "threshold": 0.2,
    "min_speech_duration_ms": 10,
    "min_silence_duration_ms": 1000,
    "speech_pad_ms": 400,
},
```

You can tune these parameters as needed.
-
This idea is not about tuning the parameters; it is about having access to what the model did afterwards. The VAD model can output the speech segments it detected, as dicts with "start" and "end" keys. With those I could calculate the total amount of silence, and even derive the opposite, that is, the silence segments.
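A minimal sketch of the idea, assuming the VAD returns `{"start", "end"}` dicts in samples: the segment values, the clip length, and the 16 kHz sampling rate below are all illustrative assumptions, not output from a real run.

```python
SAMPLING_RATE = 16000  # assumed internal VAD sampling rate

# Hypothetical speech segments as the VAD might return them, in samples.
speech_segments = [
    {"start": 8000, "end": 40000},    # 0.5 s .. 2.5 s
    {"start": 64000, "end": 112000},  # 4.0 s .. 7.0 s
]
audio_samples = 160000  # hypothetical 10 s clip

# Total speech in samples; everything else in the clip is silence.
speech = sum(seg["end"] - seg["start"] for seg in speech_segments)
silence_seconds = (audio_samples - speech) / SAMPLING_RATE
```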
-
Since we already have it as a DEBUG log, I believe it would be a great addition to TranscriptionInfo so we can access it easily. The best way to expose it is probably to divide the sample indices by the sampling rate, so that we have the segments in seconds.
For silence_segments, we would need to build a function that iterates over the speech segments and returns what is not speech.
What do you guys think? I believe the speech segments would be easy to implement.
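The two steps above could be sketched like this. The helper names, the segment values, and the 16 kHz sampling rate are assumptions for illustration, not the library's actual API:

```python
SAMPLING_RATE = 16000  # assumed internal VAD sampling rate

def speech_segments_seconds(segments):
    """Convert sample-based {"start", "end"} segments to seconds."""
    return [
        {"start": s["start"] / SAMPLING_RATE, "end": s["end"] / SAMPLING_RATE}
        for s in segments
    ]

def silence_segments_seconds(segments, audio_samples):
    """Iterate over the speech segments and return what is not speech,
    in seconds, as the complement over the full audio."""
    silences = []
    cursor = 0  # position in samples after the last speech segment
    for seg in segments:
        if seg["start"] > cursor:
            silences.append({"start": cursor / SAMPLING_RATE,
                             "end": seg["start"] / SAMPLING_RATE})
        cursor = seg["end"]
    if cursor < audio_samples:  # trailing silence
        silences.append({"start": cursor / SAMPLING_RATE,
                         "end": audio_samples / SAMPLING_RATE})
    return silences
```

For example, with speech at samples 8000-40000 and 64000-112000 in a 160000-sample clip, the silences come out as 0.0-0.5 s, 2.5-4.0 s, and 7.0-10.0 s.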