This repository explores how two aspects of visual perception that are vital for early language learning can be captured algorithmically. It consists of two sub-projects:
The library code is shared between the two sub-projects. Below is a guide for installing and running it locally or in a Docker image, and for obtaining the required data.
Click on a sub-project documentation link above to read more about its setting and to see a step-by-step guide for preparing and running experiments.
This code was developed by Mikayel Samvelyan under the direction of Ryan Gabbard and Marjorie Freedman as part of the Information Sciences Institute's DARPA GAILA research effort on algorithmic models of child language learning. Should you have any questions, please reach out to Mikayel Samvelyan and Marjorie Freedman.
See `install.md`.
We have gathered a large number of videos of educational children's television series, such as Mister Rogers' Neighborhood and Sesame Street, and created the initial version of the benchmark.
The video files are downloaded from the Internet Archive, a non-profit library of millions of free books, movies, and more. Here are the links:
- Mister Rogers' Neighborhood (50 episodes, 8.2 GB) - Link, Download Link
- Sesame Street (Episode 3037, 444 MB) - Link, Download Link
- Sesame Street (Episode 2257, 567 MB) - Link, Download Link
- Sesame Street (Episode 2517, 846 MB) - Link, Download Link
Metadata on the video segments can be found in the `benchmark` directory.

- `motion_raw.tsv` stores information on videos for the motion sub-project prior to preprocessing.
- `motion.tsv` contains information on video segments after preprocessing.
- `gaze_raw.tsv` stores information on videos for the gaze sub-project prior to preprocessing.
- `gaze.tsv` contains information on video segments after preprocessing.
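These metadata files are plain tab-separated tables, so they can be loaded with the standard library alone. A minimal sketch of reading one (the column names `segment_id`, `video_file`, `start`, and `end` below are illustrative assumptions, not the actual schema; the real headers are defined by the TSV files themselves):

```python
import csv
import io

def load_segments(tsv_text):
    """Parse benchmark TSV text into a list of row dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Illustrative sample with hypothetical columns; substitute the contents
# of benchmark/motion.tsv or benchmark/gaze.tsv for real data.
sample = (
    "segment_id\tvideo_file\tstart\tend\n"
    "seg_001\tepisode_1.mp4\t12.5\t18.0\n"
)
rows = load_segments(sample)
print(rows[0]["segment_id"])  # seg_001
```

To read an actual file, pass an open file handle (e.g. `open("benchmark/motion.tsv", newline="")`) directly to `csv.DictReader` instead of a `StringIO` wrapper.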
Preprocessed video segments can be found in the `data` directory.

- The `videos_motion` directory contains the preprocessed video segments for the motion sub-project.
- The `videos_gaze` directory contains the preprocessed video segments for the gaze sub-project.
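Given that layout, the segments for either sub-project can be enumerated with a short helper. A sketch, assuming only the directory naming shown above (the repository does not document a specific file extension here, so the helper matches any file):

```python
from pathlib import Path

def list_segments(data_dir, subproject):
    """Return sorted paths of video segments for a sub-project ('motion' or 'gaze')."""
    segment_dir = Path(data_dir) / f"videos_{subproject}"
    # Match any regular file; the actual container format is not specified.
    return sorted(p for p in segment_dir.glob("*") if p.is_file())
```

For example, `list_segments("data", "motion")` would list everything under `data/videos_motion`.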