How to rebuild a language model and why it is necessary #1667

Open
Sasaki303 opened this issue Nov 25, 2024 · 2 comments
Comments

@Sasaki303

We would like to add vocabulary to the small Japanese model so that it recognizes specific names, but we are having trouble building a working custom dictionary.
First, I added the words I want to register to words.txt. (For example, for “nasas”, I added “nasas 400000”.)
Then I put the vocabulary I want to add into text.txt. (For example, for “nasas”, I added “nasas n_B a_I s_I a_I s_I u_E”.)
In phones.txt, the phone symbols are registered as “..., m_B 115, m_E 116, m_I 117, ...”.
When I then run “farcompilestrings --fst_type=compact --symbols=words.txt --keep_symbols text.txt > text.far” in WSL, the following is displayed:
ERROR: ConvertSymbolToLabel: Symbol “n_B” is not mapped to any integer label, symbol table = words.txt
FATAL: FarCompileStrings: Compiling string number 1 in file text.txt failed with token_type = symbol and entry_type = line
Am I doing something fundamentally wrong? What is the best way to add vocabulary to the Japanese model?
If I want to add a specific person's name that is not in the language model, do I need to rebuild the model?
Please let us know.
Thank you in advance.
(This message was translated with DeepL.)
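
The error message above indicates that farcompilestrings looks up every whitespace-separated token of text.txt in the table passed via --symbols, so phone symbols such as n_B fail because words.txt only maps words. A minimal Python sketch of that check follows, assuming the symbol table has one "symbol id" pair per line. The helper names and the miniature sample data are hypothetical and are not part of Vosk or OpenFst.

```python
from typing import Dict, List, Tuple


def load_symbols(table_text: str) -> Dict[str, int]:
    """Parse a symbol table with one "symbol id" pair per line (assumed layout)."""
    symbols: Dict[str, int] = {}
    for line in table_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbols[parts[0]] = int(parts[1])
    return symbols


def find_unmapped(corpus_text: str, symbols: Dict[str, int]) -> List[Tuple[int, str]]:
    """Return (line_number, token) for every token that is missing from the table."""
    missing: List[Tuple[int, str]] = []
    for lineno, line in enumerate(corpus_text.splitlines(), start=1):
        for token in line.split():
            if token not in symbols:
                missing.append((lineno, token))
    return missing


# Hypothetical miniature inputs modelled on the post above.
words_txt = "<eps> 0\nnasas 400000\n"
text_txt = "nasas n_B a_I s_I a_I s_I u_E\n"

symbols = load_symbols(words_txt)
unmapped = find_unmapped(text_txt, symbols)
for lineno, token in unmapped:
    print(f"line {lineno}: {token!r} is not in the symbol table")
```

Running a check like this on the real words.txt and text.txt before calling farcompilestrings would list every token that triggers the ConvertSymbolToLabel error. In this sample, only the word nasas is found and all the phone symbols are reported.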

@nshmyrev
Collaborator

You need to request an update package from us: mail [email protected] and we can add it for you. See also https://alphacephei.com/vosk/lm

@Sasaki303
Author

Since I will be handling personal information, I was planning to build a local dictionary so the names can be recognized. Is that too difficult to do?
