Validation split option #54

Open
johnml1135 opened this issue Nov 1, 2023 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@johnml1135
Collaborator

Machine.py should allow a validation split to be made. This should be passed in as a parameter but should default to off. We may want a BLEU score out of this, or an estimate of how many training steps we should use.

@johnml1135 johnml1135 added the enhancement New feature or request label Nov 1, 2023
@johnml1135 johnml1135 added this to Serval Nov 1, 2023
@github-project-automation github-project-automation bot moved this to 🆕 New in Serval Nov 1, 2023
@johnml1135 johnml1135 added this to the 1.2 Mother Tongue release 2 milestone Nov 1, 2023
@johnml1135 johnml1135 moved this from 🆕 New to 🔖 Ready in Serval Dec 2, 2023
@johnml1135 johnml1135 modified the milestones: Serval API 1.1, Serval API 1.2 Dec 13, 2023
@johnml1135 johnml1135 removed this from the Serval API 1.2 milestone Jan 3, 2024
@ddaspit
Contributor

ddaspit commented Feb 6, 2024

We don't have a specific requirement for this, so I am moving it to the backlog.

@johnml1135
Collaborator Author

While the filtering may change, I think we can make an initial stab at this. Specifically, I am assuming that:

  • We will have the same type of filter for validation as we have for training in general
  • We will be able to select a percentage or a number, but default to something (500 segments?)
  • We will fail if the number of segments for training is less than the number used for validation
  • We will do the validation split in Machine (not in Python)
  • Machine.py will need to be updated to handle the validation split
  • We will need a data format for passing the data back to Machine via the S3 bucket
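
To make the assumptions above concrete, here is a rough sketch of the split logic. It is written in Python purely for illustration; per the list above, the real split would live in Machine. All names here (`split_validation`, `resolve_validation_size`, `count`, `fraction`, `seed`) are hypothetical and do not exist in Machine or Machine.py, and the shuffle-based selection is just one possible choice.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Assumption: the "500 segments?" default floated in the list above.
DEFAULT_VALIDATION_SIZE = 500


def resolve_validation_size(
    total: int, count: Optional[int] = None, fraction: Optional[float] = None
) -> int:
    """Turn a count or a fraction into a number of validation segments."""
    if count is not None and fraction is not None:
        raise ValueError("Specify either count or fraction, not both.")
    if fraction is not None:
        if not 0 < fraction < 1:
            raise ValueError("fraction must be between 0 and 1 (exclusive).")
        return int(total * fraction)
    if count is None:
        return DEFAULT_VALIDATION_SIZE
    if count < 0:
        raise ValueError("count must not be negative.")
    return count


def split_validation(
    segments: Sequence[T],
    count: Optional[int] = None,
    fraction: Optional[float] = None,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Return (train, validation), preserving the original segment order."""
    size = resolve_validation_size(len(segments), count, fraction)
    train_size = len(segments) - size
    # Fail if fewer segments remain for training than are used for validation.
    if train_size < size:
        raise ValueError(
            f"Only {train_size} training segments left for {size} validation segments."
        )
    # Hypothetical choice: a seeded shuffle so the split is reproducible.
    indices = list(range(len(segments)))
    random.Random(seed).shuffle(indices)
    validation_indices = set(indices[:size])
    train = [s for i, s in enumerate(segments) if i not in validation_indices]
    validation = [s for i, s in enumerate(segments) if i in validation_indices]
    return train, validation
```

Since the option should default to off, a caller would only invoke something like `split_validation(segments)` when the validation parameter is explicitly enabled. The segments passed in are assumed to have already gone through the same filter used for training.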

Projects
Status: 🔖 Ready
Development

No branches or pull requests

4 participants