Video Scribe is a web application that processes video files: it extracts audio, transcribes speech, and captures key frames. It provides a visual interface for analyzing video content and generating Markdown transcripts with embedded images.
- Video upload and processing
- Audio extraction and transcription
- Key frame extraction
- Interactive transcript editing
- Markdown export with embedded images
- Dark mode support
- Node.js (v14 or later)
- FFmpeg
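
To confirm both prerequisites before starting the server, a small check script along the following lines may help. It is a minimal sketch and not part of Video Scribe: the helper names `meetsMinimumNode` and `hasFfmpeg` are hypothetical, and the script assumes that `ffmpeg` is expected on the `PATH`.

```javascript
// Prerequisite check: an illustrative sketch, not shipped with Video Scribe.
const { spawnSync } = require("child_process");

// Return true when a version string such as "v18.17.0" is at least `minMajor`.
function meetsMinimumNode(version, minMajor = 14) {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Return true when an `ffmpeg` binary on the PATH answers `ffmpeg -version`.
function hasFfmpeg() {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return !result.error && result.status === 0;
}

console.log("Node.js OK:", meetsMinimumNode(process.version));
console.log("FFmpeg OK:", hasFfmpeg());
```
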
No installation required. Start the server via:
npx videoscribe
or:
deno run --allow-read --allow-write --allow-net --allow-env --allow-run https://raw.githubusercontent.com/gramener/videoscribe/refs/heads/main/cli.js
Log in to LLM Foundry, then open http://localhost:3000 in your browser and upload a video file.
This will transcribe the audio and extract key frames from the video. You can then:
- View and edit the transcript
- Play the extracted audio
- Toggle key frames on/off
- Export the result as a ZIP file containing the key frames, a Markdown transcript, and a JSON transcript
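
To illustrate what a Markdown transcript with embedded key frames could look like, the sketch below interleaves transcript segments and frames by timestamp. The data shapes (`segments` with `start` and `text`, `keyframes` with `time` and `file`) and the function names are assumptions made for this example; the format Video Scribe actually exports may differ.

```javascript
// Hypothetical data shapes; Video Scribe's real export format may differ.

// Format a time in seconds as mm:ss.
function formatTimestamp(seconds) {
  const total = Math.floor(seconds);
  const mm = String(Math.floor(total / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// Merge transcript segments and key frames by time into one Markdown string.
// Frames are listed first, so on a tie the image precedes the text
// (Array.prototype.sort is stable in current Node.js releases).
function toMarkdown(segments, keyframes) {
  const items = [
    ...keyframes.map((k) => ({
      time: k.time,
      line: `![Frame at ${formatTimestamp(k.time)}](${k.file})`,
    })),
    ...segments.map((s) => ({
      time: s.start,
      line: `**${formatTimestamp(s.start)}** ${s.text.trim()}`,
    })),
  ];
  items.sort((a, b) => a.time - b.time);
  return items.map((item) => item.line).join("\n\n") + "\n";
}

const segments = [
  { start: 0, text: "Welcome." },
  { start: 65.4, text: " Next topic." },
];
const keyframes = [
  { time: 0, file: "frame-001.jpg" },
  { time: 60, file: "frame-002.jpg" },
];
console.log(toMarkdown(segments, keyframes));
```
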
- `POST /audio`: Extract audio from uploaded video
- `POST /keyframes`: Extract key frames from uploaded video
Both endpoints accept multipart form data with a `file` field containing the video file.
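
A client for these endpoints might look like the sketch below. It assumes the server is running at `http://localhost:3000` and that Node.js 18 or later is available for the built-in `fetch`, `FormData`, and `Blob` globals. The helper names `buildUploadForm` and `uploadVideo` are hypothetical. The response format is not documented here, so the sketch returns the raw `Response` and leaves parsing to the caller.

```javascript
// Sketch of a client for the two endpoints; needs Node.js 18+ for the
// built-in fetch, FormData, and Blob globals.
const { readFile } = require("fs/promises");
const path = require("path");

// Assumption: the server listens on the address given in the usage steps.
const BASE_URL = "http://localhost:3000";

// Wrap video bytes in multipart form data under the `file` field.
function buildUploadForm(bytes, filename) {
  const form = new FormData();
  form.append("file", new Blob([bytes]), filename);
  return form;
}

// POST a local video to "/audio" or "/keyframes" and return the raw Response.
async function uploadVideo(endpoint, videoPath) {
  const bytes = await readFile(videoPath);
  const form = buildUploadForm(bytes, path.basename(videoPath));
  const response = await fetch(`${BASE_URL}${endpoint}`, {
    method: "POST",
    body: form,
  });
  if (!response.ok) {
    throw new Error(`${endpoint} failed: HTTP ${response.status}`);
  }
  return response;
}
```

For example, `uploadVideo("/keyframes", "talk.mp4")` would send the hypothetical local file `talk.mp4` and resolve with the server's response.
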
- Backend: Node.js, Express.js
- Frontend: HTML, CSS, JavaScript (ES6+)
- UI Framework: Bootstrap 5
- Templating: lit-html
- Audio Processing: FFmpeg
- Transcription: Groq API (distil-whisper-large-v3-en model)
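
As a rough illustration of the FFmpeg steps in this stack, the sketch below builds argument lists for audio extraction and key frame extraction. These are not necessarily the commands Video Scribe runs. The function names are hypothetical, and treating "key frames" as the video's I-frames is an assumption.

```javascript
// Argument builders for two FFmpeg invocations; an illustrative sketch,
// not Video Scribe's actual commands.
const { spawnSync } = require("child_process");

// Drop the video stream (-vn) and write the audio track to `output`.
// FFmpeg picks the audio codec from the output file extension.
function audioArgs(input, output) {
  return ["-y", "-i", input, "-vn", output];
}

// Keep only I-frames and write them as numbered images, e.g. frame-001.jpg.
// Newer FFmpeg releases deprecate -vsync in favor of -fps_mode.
function keyframeArgs(input, pattern) {
  return [
    "-y", "-i", input,
    "-vf", "select='eq(pict_type,I)'",
    "-vsync", "vfr",
    pattern,
  ];
}

// Run ffmpeg with the given arguments; returns true on exit code 0.
function runFfmpeg(args) {
  const result = spawnSync("ffmpeg", args, { stdio: "inherit" });
  if (result.error) throw result.error;
  return result.status === 0;
}

// Example calls (file names are placeholders):
// runFfmpeg(audioArgs("talk.mp4", "talk.mp3"));
// runFfmpeg(keyframeArgs("talk.mp4", "frame-%03d.jpg"));
```

Passing the arguments as an array to `spawnSync` avoids shell quoting issues; the single quotes inside the `select` expression are interpreted by FFmpeg's own filter parser.
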
This project is licensed under the MIT License.