- Extraction of pose coordinates from dance videos using OpenPose human pose estimation.
- Training an LSTM network on the extracted coordinates, with the songs as input and the pose coordinates as output.
- Using the trained LSTM to predict dance coordinates for the remainder of each song (the first 95% of the audio is used for training and the remaining 5% for prediction).
- Generating output videos of dancing human stick figures by joining the predicted coordinates.
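The 95/5 train/predict split described above can be sketched as follows. This is a minimal illustration, not the repository's code: the frame count, the 40-dimensional audio feature, and the 36 coordinate values (18 OpenPose keypoints, flattened x/y) are all assumed for the example.

```python
import numpy as np

# Hypothetical data: 1000 synchronized frames of audio features and
# pose coordinates (18 keypoints -> 36 flattened (x, y) values per frame).
n_frames = 1000
audio_features = np.random.rand(n_frames, 40)   # e.g. per-frame spectral features
pose_coords = np.random.rand(n_frames, 36)      # flattened keypoint coordinates

# The first 95% of the song trains the LSTM; for the remaining 5%,
# only the audio is given and the network predicts the dance.
split = int(0.95 * n_frames)
X_train, y_train = audio_features[:split], pose_coords[:split]
X_predict = audio_features[split:]              # audio only; coords are predicted

print(X_train.shape, y_train.shape, X_predict.shape)
```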
keras==2.3.1
librosa==0.7.2
moviepy==1.0.1
opencv-python==4.2.0.34
pytube3==9.6.4
tensorflow==2.2.0
- Run get_data.py to download videos and audio to the data folder. You can add YouTube video links to the "video_links.txt" file for downloading. Alternatively, you can copy videos ('.mp4' format) and audio files ('.wav' format) directly into the data folder.
- Download the pretrained weights for pose estimation from here. Download pose_iter_440000.caffemodel and save it in the "models" folder.
- Run main.py to train the LSTM and display the predicted dance video.
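Once pose_iter_440000.caffemodel is loaded (e.g. via OpenCV's DNN module), the network produces one heatmap per keypoint, and each (x, y) coordinate is recovered from the peak of its heatmap. A minimal sketch of that post-processing step, assuming a confidence threshold of 0.1 and scaling back to the original frame size (both are illustrative assumptions, not taken from this repo):

```python
import numpy as np

def heatmaps_to_points(heatmaps, frame_w, frame_h, threshold=0.1):
    """Convert per-keypoint heatmaps of shape (n_points, h, w) into
    frame-space coordinates.

    Returns a list of (x, y) tuples, or None where the peak confidence
    falls below `threshold`.
    """
    points = []
    n_points, h, w = heatmaps.shape
    for i in range(n_points):
        heatmap = heatmaps[i]
        # Location of the strongest activation for this keypoint.
        y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        if heatmap[y, x] > threshold:
            # Scale heatmap coordinates up to the original frame size.
            points.append((int(x * frame_w / w), int(y * frame_h / h)))
        else:
            points.append(None)
    return points

# Dummy example: one 46x46 heatmap with a single peak at column 23, row 10.
hm = np.zeros((1, 46, 46))
hm[0, 10, 23] = 0.9
print(heatmaps_to_points(hm, frame_w=368, frame_h=368))
```

The recovered points are what get joined into stick figures in the final rendering step.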
python main.py --video "path to input video" --audio "path to input audio" --background "path to background image"
Example - python main.py --video data/0.mp4 --audio data/0.wav --background inputs/bg0.jpg
#Note - If your GPU has 3 GB of RAM or less, reduce the memory limit in this line to a value below your GPU's RAM.
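For reference, a GPU memory limit in TensorFlow 2.2 is typically set with the virtual-device configuration shown below; this is a configuration sketch, and the 2048 MB value is an example to be replaced with a limit below your GPU's RAM.

```python
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Cap TensorFlow's GPU memory usage; pick a limit below your GPU's RAM (in MB).
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)]
    )
```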
- https://www.learnopencv.com/deep-learning-based-human-pose-estimation-using-opencv-cpp-python/
- https://github.com/CMU-Perceptual-Computing-Lab/openpose
- https://python-pytube.readthedocs.io/en/latest/
- https://zulko.github.io/moviepy/
- https://librosa.org/librosa/
- https://www.youtube.com/channel/UCX9y7I0jT4Q5pwYvNrcHI_Q