llama3.np is a pure NumPy implementation of the Llama 3 model. To verify the implementation, I ran the stories15M model trained by Andrej Karpathy.
- For a detailed explanation in English, see Llama 3 implemented in pure NumPy.
- If you're interested in a CUDA implementation, see Llama 3 implemented in pure C/CUDA.
$ python llama3.py "I have a dream"
"""
I have a dream. He dream of a big, beautiful garden full of flower and tree. He dream of playing with hi friend and eating yummy snack.
One day, he wa walking in the garden when he saw
Token count: 50, elapsed: 1.53s, 33 tokens/s
"""
If you use or discuss llama3.np in your academic research, please cite the project to help spread awareness:
@misc{llama3.np,
  title = {llama3.np: pure NumPy implementation for Llama 3 model},
  author = {Sang Park},
  howpublished = {\url{https://github.com/likejazz/llama3.np}},
  note = {llama3.np, MIT License},
  year = {2024},
}
Thanks to the creators and contributors of the following libraries and tools:
- llama2.c - @karpathy
- llama.np - @hscspring
- modeling_llama.py - Hugging Face's Transformers
I got a lot of information from the articles below:
- 42dot LLM 1.3B - 42dot
- Exploring and building the LLaMA 3 Architecture: A Deep Dive into Components, Coding, and Inference Techniques - @vi.ai_
- Rotary Embeddings: A Relative Revolution - EleutherAI
- Mastering LLM Techniques: Inference Optimization - NVIDIA
The title image was generated by DALL-E.
MIT