This repository has been archived by the owner on Sep 4, 2023. It is now read-only.
Describe the bug
Markdown formatting will in most cases not survive the translation process, either being mangled, closed improperly, mapped to other characters or outright omitted in the final translation.
Related issues #486 (Requires additional training to fix/implement)
To Reproduce
Steps to reproduce the behavior:
Select any model, behavior will vary depending on the model used
Translate any piece of text containing markdown formatting
Example piece of markdown:
---
# Markdown test
This is a *test* to see how **well** Bergamot _handles_ the [Markdown](https://www.markdownguide.org/) syntax.
1. The **bergamot orange**, is a fragrant citrus fruit the size of an orange
2. Has a *yellow* or *green* color similar to a lime, depending on ripeness
- The word bergamot is derived from the Italian word _bergamotto_
- It is a small tree that blossoms during the winter
\```js
variable = 10
if (variable == "10")
variable = "10" + 1
\```
> “Beware of bugs in the above code; I have only proved it correct, not tried it.”
> — Donald E. Knuth.
---
Expected behavior
Markdown formatting should survive the translation process.
Actual behavior
French:
# turns into -
** disappears or gets changed into a single quote
Dutch:
--- turns into -- ---
# turns into •
Numbering gets repeated
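To make this kind of breakage easier to reproduce across models, the check could be automated. The following is only a rough sketch and not part of Bergamot: `marker_counts` and `lost_markers` are hypothetical helper names, and the regular expression covers just a handful of the markers from the example above (fences, emphasis, headings, quotes, horizontal rules).

```python
import re
from collections import Counter

# Hypothetical, deliberately small set of Markdown markers.
# Longer tokens come first so that "**" is not read as two "*".
MARKER_RE = re.compile(r"```|\*\*|\*|_|^#{1,6}(?= )|^>|^---$", re.MULTILINE)


def marker_counts(text):
    """Count how often each recognised marker occurs in the text."""
    return Counter(MARKER_RE.findall(text))


def lost_markers(source, translation):
    """Return markers that occur fewer times in the translation than in the source."""
    # Counter subtraction keeps only positive differences.
    return dict(marker_counts(source) - marker_counts(translation))
```

Running `lost_markers` on the original text and on the translated output gives a quick per-model summary of which markers were dropped. It only counts markers, so it will not notice markers that moved to the wrong word, and underscores inside URLs or identifiers are counted as emphasis.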
Additional context
This would be a nice-to-have, though I do realize that some parts of the syntax will never translate properly (e.g. code blocks and quotes).
Improving the Markdown handling would probably entail randomly adding Markdown syntax to words in the training data (similar to what was mentioned in #486). Admittedly, I have no experience in this field, and it might not be feasible to re-train the models for this small use case.
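For illustration, the "randomly adding Markdown syntax to words" idea might look roughly like the sketch below. This is an assumption about how such augmentation could be done, not how the Bergamot students are actually trained: `add_random_markdown`, `MARKERS` and the `rate` parameter are made-up names, and the sketch only handles one side of a sentence pair.

```python
import random

# Emphasis markers taken from the example in this issue (assumed set).
MARKERS = ["*", "**", "_"]


def add_random_markdown(sentence, rate=0.15, rng=None):
    """Wrap a random subset of whitespace-separated words in emphasis markers.

    `rate` is the probability that any given word gets wrapped.
    """
    rng = rng or random.Random()
    out = []
    for word in sentence.split():
        if rng.random() < rate:
            marker = rng.choice(MARKERS)
            out.append(f"{marker}{word}{marker}")
        else:
            out.append(word)
    return " ".join(out)
```

For a parallel corpus, the same markers would also have to be placed on the corresponding words of the target sentence, which needs word alignments and is not shown here.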
Apologies if this bug report is misplaced; please let me know if I should move this issue to the students training repo.
Hi @Fevol, filing this in the students repo would definitely help bring the issue to the attention of the maintainers there, but I'll leave this open here for reference.
I'm guessing the issue is more related to browser/students than to the Bergamot engine itself. Pre- or post-processing the translation will probably not be able to fix it. I've also seen that there is a Firefox-specific repo for model training. I'm not entirely sure whether that repository is more suitable for the issue; the two seem closely related.