- Extract pose coordinates from dance videos using OpenPose human pose estimation.
- Train an LSTM network on the extracted coordinates, using audio features from the songs as input and pose coordinates as output.
- Use the trained LSTM to predict dance coordinates for the remainder of the song (95% of the audio is used for training, the remaining 5% for prediction).
- Display the output video by joining the predicted coordinates into a dancing human stick figure.
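The training setup described above can be sketched with NumPy: per-frame audio features are windowed into fixed-length sequences for the LSTM, then split 95/5 along the timeline. The window length, feature count, and joint count below are assumptions for illustration, not values from the repository.

```python
import numpy as np

# Hypothetical shapes: `features` are per-frame audio features and
# `coords` are the matching per-frame pose coordinates (18 joints x 2).
rng = np.random.default_rng(0)
n_frames, n_features, n_coords = 1000, 20, 36
features = rng.normal(size=(n_frames, n_features))
coords = rng.normal(size=(n_frames, n_coords))

def make_sequences(x, y, window=30):
    """Slice per-frame data into fixed-length windows for the LSTM."""
    xs = np.stack([x[i:i + window] for i in range(len(x) - window)])
    ys = y[window:]  # target: the pose right after each window
    return xs, ys

X, Y = make_sequences(features, coords)

# 95/5 split: train on the first 95% of the song, predict the rest.
split = int(0.95 * len(X))
X_train, Y_train = X[:split], Y[:split]
X_test = X[split:]
```

The split is taken along the timeline rather than shuffled, so the predicted 5% corresponds to the unseen tail of the song.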
opencv-contrib-python==4.7.0.72
pandas==2.0.1
librosa==0.10.0.post2
moviepy==1.0.3
yt-dlp==2023.3.4
tensorflow==2.12.0
keras==2.12.0
- Run get_data.py to download videos and audio to the data folder. You can add YouTube video links to the "video_links.txt" file for downloading. Alternatively, you can copy videos ('.mp4' format) and audio ('.wav' format) directly to the data folder.
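The download step could look like the following sketch using yt-dlp's Python API; get_data.py's actual implementation is not shown here, so the output template and format option are assumptions.

```python
from pathlib import Path

def read_links(path="video_links.txt"):
    """Return non-empty, non-comment lines from the links file."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.strip().startswith("#")]

def download_all(links, out_dir="data"):
    """Download each link with yt-dlp (imported lazily, so the
    link parser above works even without yt-dlp installed)."""
    import yt_dlp
    opts = {"outtmpl": f"{out_dir}/%(title)s.%(ext)s", "format": "mp4"}
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download(links)
```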
- Download the pretrained weights for pose estimation from here. Download pose_iter_440000.caffemodel and save it in the "models" folder.
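For reference, OpenPose-style Caffe models of this kind output one confidence heatmap per body part, and a keypoint is recovered by scaling the heatmap's argmax back to frame size. Below is a sketch of that decoding plus the model loading via OpenCV's DNN module; the prototxt filename and input size are assumptions based on the common COCO setup.

```python
import numpy as np

def heatmaps_to_points(heatmaps, frame_w, frame_h, threshold=0.1):
    """Convert heatmaps of shape (n_parts, H, W) to (x, y) pixel coords.

    Returns None for a part whose peak confidence is below `threshold`.
    """
    points = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        if hm[y, x] < threshold:
            points.append(None)
            continue
        points.append((int(x * frame_w / hm.shape[1]),
                       int(y * frame_h / hm.shape[0])))
    return points

def extract_pose(frame):
    """Sketch: run the downloaded weights with cv2.dnn on one frame."""
    import cv2  # lazy import; the decoder above is pure NumPy
    net = cv2.dnn.readNetFromCaffe(
        "models/pose_deploy_linevec.prototxt",   # assumed prototxt name
        "models/pose_iter_440000.caffemodel")
    blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (368, 368), (0, 0, 0))
    net.setInput(blob)
    out = net.forward()  # shape: (1, n_parts, H, W)
    return heatmaps_to_points(out[0], frame.shape[1], frame.shape[0])
```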
- Run main.py to train the LSTM and display the predicted dance video.
python main.py --video "path to input video" --audio "path to input audio" --background "path to background image" --display
Example - python main.py --video data/0.mp4 --audio data/0.wav --background inputs/bg0.jpg --display
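The command line above suggests an argparse setup along these lines (a sketch inferred from the example invocation; help strings and which flags are required are assumptions).

```python
import argparse

def build_parser():
    """CLI matching the flags in the example invocation."""
    p = argparse.ArgumentParser(
        description="Train the LSTM and render the predicted dance video.")
    p.add_argument("--video", required=True, help="path to input video (.mp4)")
    p.add_argument("--audio", required=True, help="path to input audio (.wav)")
    p.add_argument("--background", help="path to background image")
    p.add_argument("--display", action="store_true",
                   help="show the output video while rendering")
    return p
```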
Note - If your GPU has 3 GB of memory or less, reduce the memory-limit in this line to a value less than your GPU's memory.
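A memory limit of this kind is typically set with TensorFlow's logical-device configuration; the fragment below is a sketch (the 2048 MB value is an example, not the repository's setting).

```python
import tensorflow as tf

# Cap TensorFlow's GPU allocation; lower `memory_limit` (in MB)
# if your GPU has less memory than the configured value.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])
```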
- https://www.learnopencv.com/deep-learning-based-human-pose-estimation-using-opencv-cpp-python/
- https://github.com/CMU-Perceptual-Computing-Lab/openpose
- https://python-pytube.readthedocs.io/en/latest/
- https://zulko.github.io/moviepy/
- https://librosa.org/librosa/
- https://www.youtube.com/channel/UCX9y7I0jT4Q5pwYvNrcHI_Q