Comments (3)
I feel that this issue is targeted towards the work I did writing the matrix multiply code and fixing the cache alignment issue.
It is a little disheartening, of course, given the amount of work involved. There is also a comment by @karpathy on #95 saying he doesn't mind the matrix multiplies being more complicated, because that is where the work gets done, and that anyone who wants maximum speed should use something else.
Yet the march for performance moves on, with exploration of half floats and other data types that are sure to add complexity.
Perhaps it would be good to state in the README what kinds of PRs the maintainers will accept and which they won't, so that other people don't waste time in the future.
I still believe there is educational value in seeing the guts of a matrix multiplication, since those are the guts of the whole system.
Maybe the right thing to do would be just to leave it frozen in time like nanoGPT, so it preserves its simplicity, and then do additional versions with more performance or features as a separate thing, idk.
In any case, I quite enjoyed writing the code, so not to worry.
All the best
from llama2.c.
@kroggen agree ty. If people want the fastest thing they should take a look at the excellent llama.cpp.
Hi @Foundation42 thanks for your thoughts, I adjusted the readme with contributor guidelines.
Related Issues (20)
- Llama-shepherd-cli, a small tool to keep track of implementations in various languages
- Keras-based tiny llama implementations
- Code/script to reproduce val loss using the shared models
- How to quantize stories15M.bin
- Train/val split
- Running llama2.c on a microcontroller
- Understanding "multiple_of"
- Mobile React Native support ported
- New visual walkthrough of llama2.c
- Please implement a project
- Could anyone port deepseek-moe to llama2.c?
- I'm doing an experiment with image generation, but my script outputs a binary file; how can I train a model using llama2.c?
- Can you make a sora (diffusion transformer) tutorial similar to llama2.c?
- [Suggestion] Enable Discussions
- Add feature: export (quantize) from llama2.c format
- RuntimeError with CUDA assertion failure when resuming model training from checkpoint
- Could llama2.c be adapted to BitNet?
- The export model and read_checkpoint conflict
- Tokenizer errors out when inferencing llama2
- Can the Huggingface model be converted to ckpt.pt to support training?