-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat 1078 modelschema #1109
Feat 1078 modelschema #1109
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this structure is too specific and nested.
More generally, how do others describe models? We should talk to Laura.
name: str = Field(..., title="Name") | ||
developer_full_name: Optional[str] = Field(default=None, title="Name of developer") | ||
developer_institution: Optional[Organization.ONE_OF] = Field(default=None, title="Institute where developed") | ||
modality: Modality.ONE_OF = Field(..., title="Modality") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be a list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we want a more generic version of modality as well, especially for models like an LLM that is not taking experimental data specifically as input (ie text, image, video etc). Not sure what an alternative name for that would be though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is meant to include all models, I agree with Tom that there won't always be training data. In 'task trained' RNNs, the model is trained on a supervised target output. I imagine some models might be 'pre-trained' in this setup and then trained on data at a later stage of training.
src/aind_data_schema/core/model.py
Outdated
std: Decimal = Field(..., title="Standard deviation") | ||
|
||
|
||
class PerformanceScore(AindModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these only apply to supervised models?
value: Any = Field(..., title="Metric value") | ||
|
||
|
||
class ModelEvaluation(AindModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be clearer to have a distinct ModelTraining as a subclass of ModelEvaluation. This would add a cross_validation_method (instead of validation_folds) along with a list of training_parameters and anything else we want to highlight (perhaps a specific field for data augmentation if present?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, now I'm wondering if these should both subclass DataProcess, with ModelEvaluation just adding the performance field? It's a bit off conceptually in the case of unsupervised methods, but I think we're already considering simulation as within the scope of DataProcess so maybe no different from that.
name: str = Field(..., title="Name") | ||
developer_full_name: Optional[str] = Field(default=None, title="Name of developer") | ||
developer_institution: Optional[Organization.ONE_OF] = Field(default=None, title="Institute where developed") | ||
modality: Modality.ONE_OF = Field(..., title="Modality") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we want a more generic version of modality as well, especially for models like an LLM that is not taking experimental data specifically as input (ie text, image, video etc). Not sure what an alternative name for that would be though.
closing in favor of #1166 |
closes #1078