Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat 1078 modelschema #1109

Closed
wants to merge 15 commits into from
Closed

Feat 1078 modelschema #1109

wants to merge 15 commits into from

Conversation

saskiad
Copy link
Collaborator

@saskiad saskiad commented Oct 16, 2024

closes #1078

@saskiad saskiad marked this pull request as draft October 16, 2024 23:01
Copy link
Member

@dyf dyf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this structure is too specific and nested.

More generally, how do others describe models? We should talk to Laura.

name: str = Field(..., title="Name")
developer_full_name: Optional[str] = Field(default=None, title="Name of developer")
developer_institution: Optional[Organization.ONE_OF] = Field(default=None, title="Institute where developed")
modality: Modality.ONE_OF = Field(..., title="Modality")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we want a more generic version of modality as well, especially for models like an LLM that is not taking experimental data specifically as input (ie text, image, video etc). Not sure what an alternative name for that would be though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is meant to include all models, I agree with Tom that there won't always be training data. In 'task trained' RNNs, the model is trained on a supervised target output. I imagine some models might be 'pre-trained' in this setup and then trained on data at a later stage of training.

src/aind_data_schema/core/model.py Outdated Show resolved Hide resolved
src/aind_data_schema/core/model.py Outdated Show resolved Hide resolved
src/aind_data_schema/core/model.py Show resolved Hide resolved
src/aind_data_schema/core/model.py Outdated Show resolved Hide resolved
std: Decimal = Field(..., title="Standard deviation")


class PerformanceScore(AindModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these only apply to supervised models?

@tmchartrand tmchartrand marked this pull request as ready for review October 22, 2024 20:42
@tmchartrand tmchartrand marked this pull request as draft October 22, 2024 20:42
@tmchartrand tmchartrand self-requested a review October 22, 2024 20:43
value: Any = Field(..., title="Metric value")


class ModelEvaluation(AindModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be clearer to have a distinct ModelTraining as a subclass of ModelEvaluation. This would add a cross_validation_method (instead of validation_folds) along with a list of training_parameters and anything else we want to highlight (perhaps a specific field for data augmentation if present?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, now I'm wondering if these should both subclass DataProcess, with ModelEvaluation just adding the performance field? It's a bit off conceptually in the case of unsupervised methods, but I think we're already considering simulation as within the scope of DataProcess so maybe no different from that.

name: str = Field(..., title="Name")
developer_full_name: Optional[str] = Field(default=None, title="Name of developer")
developer_institution: Optional[Organization.ONE_OF] = Field(default=None, title="Institute where developed")
modality: Modality.ONE_OF = Field(..., title="Modality")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we want a more generic version of modality as well, especially for models like an LLM that is not taking experimental data specifically as input (ie text, image, video etc). Not sure what an alternative name for that would be though.

@saskiad saskiad requested review from tmchartrand and dyf November 17, 2024 03:56
@saskiad saskiad marked this pull request as ready for review November 17, 2024 05:03
@tmchartrand
Copy link
Member

closing in favor of #1166

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Schema for models
4 participants