https://github.com/ggerganov/llama.cpp
![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)
Roadmap / Project status / Manifesto / ggml
Inference of Meta's LLaMA model (and others) in pure C/C++
### Recent API changes

- `llama_token_to_piece` can now optionally render special tokens #6807 (sketch below)
- State and session file functions reorganized under `llama_state_*` #6341 (sketch below)
- Add `llama_synchronize()` + `llama_context_params.n_ubatch` #6017 (sketch below)
- `llama_kv_cache_seq_rm()` returns a `bool` instead of `void`, and new `llama_n_seq_max()` returns the upper limit of acceptable `seq_id` in batches (relevant when dealing with multiple sequences) #5328 (sketch below)
- `struct llama_context_params` #5849
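
A minimal sketch of the special-token rendering flag from #6807. It assumes the five-argument signature `llama_token_to_piece(model, token, buf, length, special)` of that period; later revisions of `llama.h` may differ.

```c
#include <stdio.h>

#include "llama.h"

// Print the text of a single token, rendering special tokens (BOS/EOS, chat
// markers, ...) instead of skipping them. Assumes the 5-argument signature
// introduced around #6807.
static void print_token(const struct llama_model * model, llama_token token) {
    char buf[256];
    const int32_t n = llama_token_to_piece(model, token, buf, sizeof(buf), /*special =*/ true);
    if (n >= 0) {
        printf("%.*s", (int) n, buf);
    }
}
```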
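A sketch of capturing and restoring context state under the reorganized `llama_state_*` names from #6341; it assumes `llama_state_get_size`, `llama_state_get_data`, and `llama_state_set_data` as the renamed equivalents of the earlier state-copy calls, with error handling kept minimal.

```c
#include <stdint.h>
#include <stdlib.h>

#include "llama.h"

// Capture the full context state (KV cache, RNG, etc.) into a heap buffer.
// Assumes the post-#6341 names llama_state_get_size / llama_state_get_data.
static uint8_t * state_snapshot(struct llama_context * ctx, size_t * size_out) {
    const size_t size = llama_state_get_size(ctx);
    uint8_t * buf = malloc(size);
    if (buf != NULL) {
        llama_state_get_data(ctx, buf);
        *size_out = size;
    }
    return buf;
}

// Restore a previously captured state into the same (or a compatible) context.
static void state_restore(struct llama_context * ctx, const uint8_t * buf) {
    llama_state_set_data(ctx, buf);
}
```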
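The batching-related items (#6017 and #5328) can be combined into one sketch: it sets the new `n_ubatch` field, queries `llama_n_seq_max()`, and checks the `bool` result of `llama_kv_cache_seq_rm()`. The model path is a placeholder, the decode loop is elided, and the setup calls (`llama_backend_init`, `llama_load_model_from_file`, `llama_new_context_with_model`) reflect the API of that period, so treat this as an assumption-laden outline rather than a definitive example.

```c
#include <stdio.h>

#include "llama.h"

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    struct llama_model * model = llama_load_model_from_file("model.gguf", mparams); // placeholder path
    if (model == NULL) {
        return 1;
    }

    struct llama_context_params cparams = llama_context_default_params();
    cparams.n_batch  = 512; // logical batch size submitted to llama_decode()
    cparams.n_ubatch = 256; // physical micro-batch size, added in #6017

    struct llama_context * ctx = llama_new_context_with_model(model, cparams);

    // seq_id values used in llama_batch must stay below this limit (#5328)
    printf("max sequences: %u\n", (unsigned) llama_n_seq_max(ctx));

    // ... submit batches with llama_decode() here ...

    // wait for any in-flight computation before reading results (#6017)
    llama_synchronize(ctx);

    // removing a sequence from the KV cache now reports success/failure (#5328)
    if (!llama_kv_cache_seq_rm(ctx, 0, -1, -1)) {
        fprintf(stderr, "failed to clear sequence 0\n");
    }

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```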