- Log error for unsupported input_format (thanks @dmvaldman)
- Add open_clip support (thanks @cat-state)
- fix mclip in clip back
- add violence detector to clip back
- add feature to pass options in config file
- safety model for ViT-B/32
- replace safety heuristic by safety model
- re-enable dedup of images
- turn off image dedup by default temporarily
- fix range search use
- restore node build in publish
- new arrow provider in clip back
- index combiner script
- parquet to arrow script
- deduplication of results feature
- one more fix for text only
- fix image_tensor_count vs text_counter count in runner
- fix file count check for input format files
- go back to autofaiss main
- switch to fork of autofaiss
- properly close the wandb run at the end
- fix pex building
- fix version ranges
- fix sample_count == 0 issue in logger and handle no text sample properly in main
- improve logger by checking the file exists before reading
- use zero padding for output file names
- add proper multi gpu support in pyspark distributor
- improve printing of error in logger
- fix another small issue with logger reporting
- small fix in logger computation
- Fix race condition when using mkdir in writer
- Refactor clip inference, make it support distributed inference
- add use_jit option to back and inference (now True by default); add clip_model option to back
- mclip support in clip back and front
- replace null bytes while transforming parquet to hdf5
- Use collate_fn to skip corrupt images without using recursion (thanks @afiaka87)
- truncate text inputs in clip back
- fix url column option bug
- add url column option
- use torch no grad to fix a memleak in clip back
- add default backend url in clip back
- add option in clip end2end to avoid running the back
- update for autofaiss
- add missing front building in python publish
- clip retrieval end2end
- minor bug fix about missing .npy extension in output of clip inference
- mclip support
- use fsspec to make it possible to output to any fs
- add deduplication of indices in the output of clip back
- use the npy mapping in all cases for ivf reordering since it's fast enough
- save ivf_old_to_new_mapping for the text index to use
- implement ivf re-ordering for much faster metadata fetching
- add download button in front
- fix filterDuplicateUrls issue when there is no url, only images
- fix default columns_to_return
- add a simple filter ipynb notebook
- implement infinite scroll feature
- fix limiting of results in clip back
- fix absence of caption in clip front
- fix an issue in clip front handling of default
- limit the number of results to the number available in clip back
- add compression by default when creating the hdf5 cache file
- add columns_to_return in clip back
- safe mode in front
- fix metrics sorting in metrics summary
- add download url time and descriptions in metrics summary endpoint
- add prometheus endpoint in clip back
- properly display errors in clip index
- add nb cores option in clip index
- add folder name option and catch errors in clip index
- package front in npm
- implement image url search in clip back
- add memory mapping option in clip back: 0 memory usage to load an index!
- add copy metadata option to clip index
- allow controlling the amount of RAM used during index creation
- add logs in clip back to inform when each thing is loaded
- fix PIL call (thanks @pvl)
- expose max_index_memory_usage
- add --wds_image_key and --wds_caption_key options (thanks @afiaka87)
- implement h5py caching in clip back
- fix clip back and filter to use sorted metadata
- fix finding the last batch number (continuing output)
- add warn and continue handler to avoid crashing
- add missing webdataset dep
- webdataset input format
- save in batch
- test files in tests folder
- save metadata as parquet
- use autofaiss in a new clip index
- remove indexing from clip batch and rename to clip inference
- fixes
- it works