It's like a Large Language Model, but smol.
This project uses no dependencies (except for rand). All subprojects are built from scratch, entirely in Rust:
- BPE Tokenizer
- Interactive tokenizer and vocab viewer in HTML/WASM
- Simple language model (e.g. using a Markov chain)
- Telegram bot and web app for models
- Simple neural network implementation
- Neural network training/inference framework
- Word embedding, word2vec
- Interactive word similarity viewer in HTML/WASM
- Generative model for text
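To illustrate the BPE tokenizer item above, here is a minimal sketch of a single BPE merge step: count adjacent symbol pairs and merge every occurrence of the most frequent pair. The function name `bpe_merge_step` is illustrative, not the project's actual API, and a real tokenizer would repeat this step until a target vocabulary size is reached.

```rust
use std::collections::HashMap;

// One BPE merge step: find the most frequent adjacent symbol pair
// in the token sequence and merge every occurrence into one symbol.
// (Illustrative sketch; not the project's actual API.)
fn bpe_merge_step(tokens: &[String]) -> Vec<String> {
    // Count how often each adjacent pair occurs.
    let mut counts: HashMap<(String, String), usize> = HashMap::new();
    for w in tokens.windows(2) {
        *counts.entry((w[0].clone(), w[1].clone())).or_default() += 1;
    }
    // Pick the most frequent pair; if there are no pairs, return unchanged.
    let Some(((a, b), _)) = counts.into_iter().max_by_key(|(_, c)| *c) else {
        return tokens.to_vec();
    };
    // Rebuild the sequence, merging every occurrence of that pair.
    let mut out = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        if i + 1 < tokens.len() && tokens[i] == a && tokens[i + 1] == b {
            out.push(format!("{a}{b}"));
            i += 2;
        } else {
            out.push(tokens[i].clone());
            i += 1;
        }
    }
    out
}

fn main() {
    // "banana" split into single-character symbols.
    let tokens: Vec<String> = "banana".chars().map(|c| c.to_string()).collect();
    let merged = bpe_merge_step(&tokens);
    println!("{:?}", merged); // the most frequent pair ("an" or "na") is merged
}
```

Repeatedly applying this step, while recording each chosen pair as a merge rule, yields the learned BPE vocabulary.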
Run the tokenizer CLI:

```
$ cargo run --bin tokenizer_cli
```

Interactive mode:

```
$ cargo run --bin markov_chain -- content/vocab.vcb content/corpus.txt
```
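The Markov-chain language model behind the `markov_chain` binary could work along these lines. This is a minimal, dependency-free sketch using bigram counts; it picks the most frequent successor greedily instead of sampling (sampling would pull in the rand crate), and the function names are illustrative, not the project's actual API.

```rust
use std::collections::HashMap;

// Build a bigram table: each word maps to counts of the words that follow it.
// (Illustrative sketch; not the project's actual API.)
fn build_bigrams(corpus: &str) -> HashMap<String, HashMap<String, usize>> {
    let words: Vec<&str> = corpus.split_whitespace().collect();
    let mut table: HashMap<String, HashMap<String, usize>> = HashMap::new();
    for pair in words.windows(2) {
        *table
            .entry(pair[0].to_string())
            .or_default()
            .entry(pair[1].to_string())
            .or_default() += 1;
    }
    table
}

// Greedy generation: always follow the most frequent successor.
// A real model would sample from the successor distribution instead.
fn generate(table: &HashMap<String, HashMap<String, usize>>, start: &str, len: usize) -> String {
    let mut out = vec![start.to_string()];
    let mut cur = start.to_string();
    for _ in 1..len {
        match table.get(&cur).and_then(|m| m.iter().max_by_key(|(_, c)| **c)) {
            Some((next, _)) => {
                out.push(next.clone());
                cur = next.clone();
            }
            None => break, // dead end: no known successor
        }
    }
    out.join(" ")
}

fn main() {
    let corpus = "the cat sat on the mat the cat ran";
    let table = build_bigrams(corpus);
    println!("{}", generate(&table, "the", 4));
}
```

Swapping the greedy choice for weighted random sampling over the counts turns this into the usual stochastic Markov-chain text generator.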