Wrong checksums for Common Voice Corpus 13.0 #21

Open
paniedziela opened this issue Mar 31, 2023 · 1 comment

Comments

@paniedziela

Hello, I usually verify the checksum after each download, and up to Common Voice Corpus 12.0 this worked without problems. With Common Voice Corpus 13.0, however, the downloads complete without errors but the checksums don't match, so I suspect the published checksums are wrong. I have neither the resources nor the time to check more datasets, but I can provide a few examples (I suppose all checksums for this version were calculated incorrectly):

DATASET -> PROVIDED_CHECKSUM -> REAL_CHECKSUM
CV13-Icelandic -> 48db6e809f5b6eb0c00b077e6b736aeeee5d544ee3f2fdd059244da88926c040 -> 33e4c68fe2b4501f358a4762487f9c2b9d8c509a304a3288d31fd24ba6e3c451
CV13-Danish -> 6c85261bcf8dffe5c06ad29c82760cda5cd1fdc7d9c1c99b6285a425f11d105e -> 5b39bb325b76043a57b8735621dd6c8b68b615d49903c18bfa9cb4b783df01af
CV13-Occitan -> e241c12159ac7b3d880f41d5e91d804775da188a3ac413c775341eef3406001b -> 59480c122de507e4f8ce94120a726ea042040c0750ceaf16d9d708c485a6288d
CV13-German -> wrong checksum (can't provide the values)
@HarikalarKutusu

There was a similar report on delta segments here:

https://discourse.mozilla.org/t/sha256-checksum-seems-to-be-wrong-for-common-voice-delta-segment-xx-x-and-what-is-delta-segment/111765/5

A possible cause: the original release of v13.0 lacked the default splits, although the splits were produced to generate the records here. You downloaded these datasets more than a couple of days ago, when the splits were still missing. As of a couple of days ago the splits are included, but this time the "reported.tsv" files were taken from up-to-date copies. So whether the download is old or new, its checksum should differ from the records here.

I think the only way to correct them is to recalculate the checksums and update the values here.
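
For anyone who wants to recompute a value locally, here is a minimal sketch in Python. It assumes the checksums in question are SHA-256 hex digests of the downloaded archive file (the linked Discourse thread mentions SHA256). The function names `sha256_of_file` and `matches_published` and the example path are hypothetical, and this is not necessarily how the values recorded here were produced.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so that large dataset archives do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_checksum):
    """Compare the locally computed digest with a published one,
    ignoring surrounding whitespace and letter case."""
    return sha256_of_file(path) == published_checksum.strip().lower()


# Example usage (hypothetical file name, replace with the real archive path):
# print(sha256_of_file("cv-corpus-13.0-is.tar.gz"))
```

Reading in chunks keeps memory use constant regardless of archive size, which matters for the larger language datasets.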
