Dataset recipes #153
Comments
Do you think we can pull some or all of this into MLDatasets.jl? Obviously some parts like the block API won't be applicable, but it would be nice to expose the registry functionality, for example. Edit: ref. JuliaML/MLDatasets.jl#73 as well. |
It might be worth also looking at DataSets.jl announced at JuliaCon. |
At some point, all the dataset functionality should be merged down to MLDatasets.jl and MLDataPattern.jl. The registry itself is pretty barebones; if you take away the functionality related to blocks, then you could replace it with a |
At some point we'll have to think about iterable datasets, and at that point some rearchitecting around DataSets.jl could be useful. It should also not be too hard to add iterable support to DataLoaders.jl. For now I want to provide a useful core of offline datasets here in FastAI.jl with this simple approach. Rearchitecting should probably flow into the efforts in MLDatasets.jl (or perhaps a DLDatasets.jl if everything will be deprecated anyway?). I'll give a larger reply in JuliaML/MLDatasets.jl#73 later. In any case, any recipe logic associated with the fastai datasets here should be easily relocatable later. 👍 |
Some are being added in #163 |
Hey, I'd like to work on this issue. Since it is labeled good first issue, I believe I can help. Could you please specify what still has to be done? I see the list above hasn't been updated. |
Hey! The list above is up to date. The easiest thing to get started with should be adding recipes for the csv datasets and registering some |
Next I want to add recipes for |
Might need a new recipe type that wraps |
fastai-dbpedia_csv/ This is the folder structure for both datasets (dbpedia_csv, ag_news_csv). |
Is it necessary to make a new recipe for datasets that have folder structures similar to the one above? Or is it possible to tweak the existing ones to get the job done? |
I think in this case it may be possible to create a new recipe that wraps |
I'll work on this. |
After the community meet, I explored FastAI, MLUtils and a couple of other libraries and tried to understand the codebase specifically. I would love to get started with adding a dataset. Can you please specify which one of the above would be a good one to start with? Also, I believe the list above isn't updated. |
With #151, FastAI.jl is getting high-level interfaces for searching datasets (`finddatasets`) and loading datasets into task-specific data containers (`loaddataset`). There is also a new `DatasetRecipe` that encapsulates configuration for loading a data container and the block information from a path. These recipes can be registered with a dataset so that they can be found using the above high-level functions.

The fastai dataset collection comes with quite a lot of datasets, so only a few have recipes yet. This issue tracks the progress on adding recipes to all the datasets. Contributions of recipe types and recipe configs for datasets are welcome.

See `src/datasets/recipes.jl` for example recipe implementations and `src/datasets/fastairegistry` for how recipes are registered. `listdatasources()` gives you a list of all dataset sources and `datasetpath(name)` downloads them and returns the download folder.

Progress

For datasets that can be used for multiple tasks, the tasks are listed below. Otherwise, a checked dataset means that at least one recipe is already implemented.
(Image{2}, LabelMulti)
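For anyone new to the idea, here is a rough, self-contained sketch of how a recipe and a registry could fit together. It is not FastAI.jl's implementation: `ToyRecipe`, `FolderLabelRecipe`, `loadrecipe`, `registerrecipe!`, `toyfinddatasets` and `TOY_REGISTRY` are all hypothetical names made up for illustration, and blocks are stood in for by plain symbols. It only uses Julia's standard library.

```julia
# Toy sketch only: every name here is hypothetical, not part of FastAI.jl.
abstract type ToyRecipe end

# A recipe bundles loading configuration with the blocks it produces.
struct FolderLabelRecipe <: ToyRecipe
    blocks::Tuple{Symbol, Symbol}   # stand-in for block types, e.g. (:Image, :Label)
    extensions::Vector{String}      # file extensions that count as samples
end

# Walk `path` and return (filepath, label) pairs plus the recipe's blocks.
# The label is taken from the name of the file's parent folder.
function loadrecipe(recipe::FolderLabelRecipe, path::AbstractString)
    samples = Tuple{String, String}[]
    for (root, _, files) in walkdir(path)
        for file in files
            if any(ext -> endswith(file, ext), recipe.extensions)
                push!(samples, (joinpath(root, file), basename(root)))
            end
        end
    end
    return sort!(samples), recipe.blocks
end

# Registry mapping a dataset name to the recipes registered for it.
const TOY_REGISTRY = Dict{String, Vector{ToyRecipe}}()

registerrecipe!(name, recipe) =
    push!(get!(TOY_REGISTRY, name, ToyRecipe[]), recipe)

# Names of all datasets that have at least one recipe with matching blocks.
toyfinddatasets(blocks) =
    sort([name for (name, recipes) in TOY_REGISTRY
          if any(r -> r.blocks == blocks, recipes)])

# Usage: build a tiny folder dataset in a temp dir and load it via the registry.
dir = mktempdir()
for (label, file) in [("cat", "a.jpg"), ("dog", "b.jpg")]
    mkpath(joinpath(dir, label))
    touch(joinpath(dir, label, file))
end
registerrecipe!("toy-pets", FolderLabelRecipe((:Image, :Label), [".jpg"]))
samples, blocks = loadrecipe(TOY_REGISTRY["toy-pets"][1], dir)
```

Keeping the block information on the recipe rather than on the dataset is what lets one dataset carry several recipes for different tasks, which is the case the progress list calls out.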