
tinyllama's People

Contributors

chaoscodes, green-sky, hunter-lee1, joennlae, jzhang38, koalazf99, tianduowang, tridao


tinyllama's Issues

Very poor performance using Faraday and an AMD GPU?

Hello, TinyLlama takes all my RAM and has very poor performance, even lower than 7B models. It takes a very long time to load and is worse than most models, and I don't understand what I'm doing wrong. I usually use GGML/GGUF versions, but you provide a .bin that is 4 GB for 1B parameters, so I guess that's the issue. Is the GGML or GGUF model available somewhere?
I'm pretty sure something is wrong. Maybe I can convert it myself? (The real underlying issue is that I have a high-end AMD GPU, which is close to useless here.)

I used the latest version of the base model, not the chat model. Since it's only 1B parameters, maybe I can convert it to GGUF?
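For what it's worth, a ~4 GB pytorch_model.bin for 1.1B parameters is consistent with fp32 weights (1.1B x 4 bytes). A minimal sketch, assuming you only need to shrink the Hugging Face-side load: read the checkpoint in half precision (GGUF conversion itself is done separately with llama.cpp's convert.py, as discussed in later issues). The model ID below is an example; substitute the checkpoint you downloaded.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PY007/TinyLlama-1.1B-intermediate-step-480k-1T"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 weights: roughly 2.2 GB instead of ~4.4 GB in fp32
    low_cpu_mem_usage=True,
)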

Hardware requirements

Hello, this is a very nice and much-needed development. How much storage is required for complete model training, given that around 1.9 TB is needed for the datasets alone? And how much RAM is required?
Best wishes to the team!

Replay Finetuning & store as GGML

Hi

I have been trying to redo the TinyLlama fine-tuning, starting from PY007/TinyLlama-1.1B-intermediate-step-480k-1T and using both finetuning.py and the last command from script.sh.
I used one A100 40G (only 27 GB of VRAM was used). Everything apparently went well.

I just added:

from transformers import AutoModelForCausalLM, AutoTokenizer

final_model = "path to last checkpoint"

tokenizer = AutoTokenizer.from_pretrained(final_model)

model = AutoModelForCausalLM.from_pretrained(
    final_model,
    device_map="auto",
    trust_remote_code=True,
)

model.save_pretrained("TinyLlama-1.1B-chat-hf")
tokenizer.save_pretrained("TinyLlama-1.1B-chat-hf")

Then I tried to convert the model to a GGML format using convert.py from llama.cpp

!python convert.py <path to TinyLlama-1.1B-chat-hf>

This led to the following error:

Loading model file <path to TinyLlama-1.1B-chat-hf/pytorch_model.bin>
params = Params(n_vocab=32003, n_embd=2048, n_layer=22, n_ctx=2048, n_ff=5632, n_head=32, n_head_kv=4, f_norm_eps=1e-05, f_rope_freq_base=10000.0, f_rope_scale=None, ftype=None, path_model=PosixPath('/content/drive/MyDrive/TinyLlama/TinyLlama/sft/TinyLlama-1.1B-chat-hf'))
Loading vocab file '/content/drive/MyDrive/TinyLlama/TinyLlama/sft/TinyLlama-1.1B-chat-hf/tokenizer.model', type 'spm'
Traceback (most recent call last):
  File "/content/drive/MyDrive/TinyLlama/llama.cpp/convert.py", line 1193, in <module>
    main()
  File "/content/drive/MyDrive/TinyLlama/llama.cpp/convert.py", line 1175, in main
    vocab = load_vocab(vocab_dir, args.vocabtype)
  File "/content/drive/MyDrive/TinyLlama/llama.cpp/convert.py", line 1086, in load_vocab
    return SentencePieceVocab(path, added_tokens_path if added_tokens_path.exists() else None)
  File "/content/drive/MyDrive/TinyLlama/llama.cpp/convert.py", line 372, in __init__
    raise Exception(f"Expected added token IDs to be sequential and start at {len(added_tokens)}; got {actual_ids}")
Exception: Expected added token IDs to be sequential and start at 6; got [0, 1, 2, 32000, 32001, 32002]

Any idea what I am doing wrong?

Some more info contained in related files:

special_tokens_map.json

{
  "additional_special_tokens": [
    "<unk>",
    "<s>",
    "</s>",
    "[PAD]",
    "<|im_end|>",
    "<|im_start|>"
  ],
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "[PAD]",
  "unk_token": "<unk>"
}

added_tokens.json

{
  "</s>": 2,
  "<s>": 1,
  "<unk>": 0,
  "<|im_end|>": 32001,
  "<|im_start|>": 32002,
  "[PAD]": 32000
}

Those files are slightly different from what can be found in PY007/TinyLlama-1.1B-Chat-v0.3 and I don't understand why.
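One workaround that may apply here (an assumption, not an official fix): that version of llama.cpp's convert.py treats added_tokens.json as containing only tokens appended after the base SentencePiece vocab, so the IDs must run sequentially from 32000. Dropping the entries that merely restate base tokens (<unk>=0, <s>=1, </s>=2) leaves [PAD], <|im_end|>, and <|im_start|>, which do satisfy that. A minimal sketch (paths are examples; back the file up first):

import json

path = "TinyLlama-1.1B-chat-hf/added_tokens.json"
with open(path) as f:
    added = json.load(f)

base_vocab_size = 32000
added = {tok: idx for tok, idx in added.items() if idx >= base_vocab_size}

with open(path, "w") as f:
    json.dump(added, f, indent=2, ensure_ascii=False)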


Running on CPU using llama.cpp

Hi,

Posting here even though this is not related to the code itself.

Context:
Context:
I have tried to use Chat-v0.3 directly from the checkpoints ([code](https://gist.github.com/galleon/ca73c87542e9110dea4220bb143e70a5)). I just added eos_token_id=tokenizer.eos_token_id to the example to make it finish as expected.

I get an answer that I consider OK, even though it is made of three sentences. (I have not looked into the details of how you generated the chat version. Is any info available?)

Then I decided to move to llama.cpp making sure to update my version to get the fix for the issue you recently ran into.

I generated the F32 version (which should be identical to the checkpoint).
Here is the result I got with this CLI:
./main -m ~/.cache/llama.cpp/models/TinyLlama-1.1B-Chat-v0.3.gguf -p "Please answer in one sentence to this question: What is a Large Language Model?" --n-gpu-layers 0 --temp 0 --escape --seed 42 --color --n-predict -2

Do you know why it continues to generate after the EOS?

Then I moved to the Q5_K quantized version and got the following output.

It is completely AWOL, which makes me think I have done something wrong. Has anyone had similar issues?

Downloading the datasets is inconvenient

The required datasets are huge, and downloading them requires getting around network restrictions here. Could you provide an alternative download channel for the datasets, such as a cloud-drive mirror or a torrent?

Any plans for the ONNX runtime?

One of your stated potential use cases is deployment on edge devices.

For that, the ONNX Runtime is probably the most likely candidate, since it supports a wide range of platforms, APIs, architectures, and hardware accelerators.
It is supposedly easy to convert any Hugging Face hosted model to ONNX with optimum, though I haven't done it personally.

Any thoughts?

Release format + artefact

Dear Authors,
Thanks so much for your amazing project.
Would it be possible for you to release the following:

  1. the optimizer states
  2. the scheduler
  3. a checkpoint just before cooling down the model

These would be highly valuable artefacts for anyone who wants to keep training the model!

Thanks so much, and congratulations on your work!
Pierre

Why does a dimension mismatch occur when I use AutoModelForCausalLM to load a model?

model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path)

File "/usr/local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2795, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3173, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).
size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).
size mismatch for model.layers.1.self_attn.k_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).
size mismatch for model.layers.1.self_attn.v_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).
size mismatch for model.layers.2.self_attn.k_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).
size mismatch for model.layers.2.self_attn.v_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).
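For context (hedged, since the environment is not shown): a k_proj/v_proj of shape [256, 2048] is exactly what grouped-query attention with 4 KV heads x 64-dim heads produces, while [2048, 2048] is what an older transformers build without num_key_value_heads support would allocate. A quick sketch to check both the library version and the checkpoint config; the model ID is an example:

import transformers
from transformers import AutoConfig

print("transformers version:", transformers.__version__)      # GQA support in LlamaConfig needs a recent release
config = AutoConfig.from_pretrained("PY007/TinyLlama-1.1B-intermediate-step-240k-503b")  # example checkpoint
print("num_attention_heads:", config.num_attention_heads)                    # expected: 32
print("num_key_value_heads:", getattr(config, "num_key_value_heads", None))  # expected: 4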

TinyLlama-1.1B-orca-gpt4

First, I want to express my gratitude for this project. I think TinyLlama has a lot of potential, and we're just starting to see it. Kudos!

I'm pretty new to this exciting field, and this is the first time I have fine-tuned a model. I took the "base" TinyLlama model (step-240k) and fine-tuned it on the sam-mosaic/orca-gpt4-chatml dataset, but the result does not seem as good as your v0.2 chat model.

I will keep working on this and will share the models I create. I think the RAG approach you are experimenting with now is the right direction, and I'm going to run some experiments with that too.

Anyway the model I produced is here in case you want to take a look: TinyLlama-1.1B-orca-gpt4

Request: Finetune the Model on more Data?

This might be unorthodox, but I had to ask.

I've been trying to run the SFT script on a Colab T4 and on Kaggle (dual T4, P100), and it instantly ran out of memory.
I've also been trying a QLoRA run, which worked for a very small dataset, but the dataset I want to fine-tune on is around 20 GB and takes anywhere from 81 to 135 hours to map; streaming the dataset makes it load nothing, and I can't run CPU or GPU instances for that long.

If the SFT script isn't meant to use that much memory, could you please fix it?
If it is meant to use that much memory, I would like to request that you train a checkpoint or the final model on the UnagamiData dataset.

It's the dataset I used to train my previous model, Unagami. It's a mixture of several high-quality datasets, including Open-Platypus, OASST1, and OpenOrca. It also has some QA-from-context data, such as Databricks Dolly, which could make it better for RAG.

It's currently formatted with HTML-like tokens, such as <system> and <human>; I can switch to ### System: / ### Human: if needed.

Would you consider it?

TinyLlama-chat outputs truncated/small?

From vLLM
Colab --> https://colab.research.google.com/drive/1HOxyJVxo0NeVk8oidvR3dvouGBTYO60X?usp=sharing

I've noticed that the outputs are rather short/truncated compared to the usual models trained on OpenAssistant data.

'### Human: Give me a hello world in python? ### Assistant:' 'Sure, here is a simple "hello world" program in Python:\n\n'
'### Human: Give me a hello world in python? ### Assistant:' 'Sure! Here\'s a simple Python program that says "Hello, world!"'
'### Human: Give me a hello world in python? ### Assistant:' 'Here\'s a simple "hello world" program in Python:\n\n```'
'### Human: Give me a hello world in python? ### Assistant:' 'Sure! Here is a sample code in Python:\n```python\nprint("'
'### Human: Give me a hello world in python? ### Assistant:' "Sure, here's a simple `print()` statement:\n```python\n"
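A guess worth checking (not confirmed from the notebook): the examples above are cut off at roughly 16 tokens, which matches vLLM's default SamplingParams max_tokens of 16, so the truncation may have nothing to do with the model. A minimal sketch that raises the limit explicitly:

from vllm import LLM, SamplingParams

llm = LLM(model="PY007/TinyLlama-1.1B-Chat-v0.3")
params = SamplingParams(temperature=0.7, max_tokens=256)   # the default max_tokens is only 16
prompt = "### Human: Give me a hello world in python? ### Assistant:"
print(llm.generate([prompt], params)[0].outputs[0].text)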

Resuming training

I am training a 120M model from scratch because I would like to do some experiments myself. When I stop and try to resume, I have to drop the batch size significantly, otherwise I get an out-of-memory error. Any idea why?

Also, please consider creating a Discord server where people can discuss the project.

Info message when loading the model

Hi, when I load the model for training:

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    # device_map="auto",
    trust_remote_code=True,
)

I get the following info message. Is this expected?
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at /root/bert_path/TinyLlama-1.1B-intermediate-step-240k-503b/TinyLlama-1.1B-intermediate-step-240k-503b and are newly initialized: ['model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.17.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

A guide to adding more datasets

One of the requirements is

  • Add scripts for pretraining on other datasets.

I'm assuming that the pretraining dataset script would still work as a fine-tuning dataset script, since the data is processed the same way?

I was looking through prepare_slimpajama.py and, from what I can tell:

  • Data is read in as JSONL files and tokenized into a "packed dataset".

When I looked into the packed dataset, I noticed it appears to be a custom-format dataset.

I think it would be very useful if you wrote a guide on preparing a dataset, for example a small dataset on Colab, because most of our PCs can't handle the sheer size of the tokenized SlimPajama and Starcoder datasets.
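Not the repo's exact binary format, but a hedged sketch of the general idea behind a "packed dataset": tokenize each JSONL document, append an EOS token, and concatenate everything into fixed-length blocks. The file names and model ID below are placeholders.

import json
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PY007/TinyLlama-1.1B-intermediate-step-480k-1T")  # example tokenizer
block_size = 2048
buffer, blocks = [], []

with open("my_dataset.jsonl") as f:          # hypothetical small JSONL file with a "text" field
    for line in f:
        doc = json.loads(line)
        buffer.extend(tokenizer.encode(doc["text"]) + [tokenizer.eos_token_id])
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]

np.save("packed_blocks.npy", np.asarray(blocks, dtype=np.uint16))  # a 32k vocab fits in uint16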

Model mirror for users in China

Very interesting work! However, connections to Hugging Face have been timing out a lot recently. Could you provide a download link that is accessible from within China?

Can it run on CPU?

Hello, can it run on a CPU with 4 GB of RAM?
Could you please guide me regarding the minimum hardware requirements?

Thanks in advance.

Why train three epochs? not one epoch?

Hi, thanks for your great work!
All previous works (e.g. the GPT and LLaMA families) pre-train for a single epoch, but I see that you train for three epochs. Why three?
Looking forward to hearing from you when you have time. Thank you very much.

Colab

Can it be run on colab?

Notes on chat fine-tuning and data content

I adapted TimDettmers' filtered OpenAssistant dataset so that it uses the Llama 2 prompt format (i.e. with [INST]); see here.

I then fine-tuned TinyLlama at the 1T-token checkpoint (with LoRA applied to all modules); see here.

Observations:
A. TinyLlama seems to have issues emitting an EOS (</s>) token. For example:

<s> [INST] What planets are in our solar system? [/INST] 1. Mercury

2. Venus

3. Earth

4. Mars

5. Jupiter

6. Saturn

7. Uranus

8. Neptune

9. Pluto

10. Ceres

11. Callisto

12. ...

This leads me to wonder: are BOS and, particularly, EOS tokens (<s> and </s>) used during pre-training?
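A small, hedged check on the tokenizer side (it does not answer what the pre-training loop did, only what the tokenizer inserts by default); the checkpoint ID is an example:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("PY007/TinyLlama-1.1B-intermediate-step-480k-1T")  # example checkpoint
ids = tok("What planets are in our solar system?")["input_ids"]
print(ids)                                   # a leading 1 (<s>) is typical for Llama tokenizers
print(tok.bos_token, tok.bos_token_id)       # <s>, 1
print(tok.eos_token, tok.eos_token_id)       # </s>, 2
print(getattr(tok, "add_bos_token", None), getattr(tok, "add_eos_token", None))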

B. I notice that when running inference with the raw 1T checkpoint (i.e. not chat fine-tuned), it is common to see ### in the response:

<s> [INST] Generate a python code snippet to add two numbers. [/INST] 

### [INST] Generate a python code snippet to add two numbers.

### [INST] Generate a python code snippet to add two numbers.

...

I'm somewhat surprised to see this '###'. Does this mean some chat or instruction fine-tuning datasets are mixed into the pre-training data?

How to speedup tokenizer.encode?

I found that the pre-training datasets contain some documents with a huge number of characters, which take a very long time to encode. For example, a document with 15,955,671 characters takes 6.6 hours to encode.

How do you speed this up? By splitting the document into many sub-documents? I use Megatron for pre-training; any ideas?

Looking forward to hearing from you when you have time. Thank you very much.
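One hedged option (not necessarily how this repo does it): split a very long document into character chunks and batch-encode them with the fast (Rust) tokenizer, then concatenate the IDs. Chunk boundaries may split a word, which changes at most a couple of tokens per boundary; the chunk size and model ID are arbitrary examples.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PY007/TinyLlama-1.1B-intermediate-step-480k-1T", use_fast=True)  # example

def encode_long_doc(text, chunk_chars=100_000):
    # split the document into fixed-size character chunks
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    # batch-encode the chunks, then flatten back into one token-id list
    encoded = tokenizer(chunks, add_special_tokens=False)["input_ids"]
    return [tok_id for ids in encoded for tok_id in ids]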

Getting gibberish output when running on llama.cpp

Hi, I see the mention of running this model on llama.cpp in the README. Did you manage to get it to run and quantize with good output? I'm trying to evaluate whether this model can be used for speculative decoding with Llama 2 7B.

With the first checkpoint, https://huggingface.co/PY007/TinyLlama-1.1B-step-50K-105b, there seems to be some issue when converting to GGUF:

python convert.py ../TinyLlama-1.1B-step-50K-105b/

./main -m ../TinyLlama-1.1B-step-50K-105b/ggml-model-f32.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -ngl 0 --temp 0

This results in the following. Both F16 and F32 produce it, and adding a <s> token at the beginning didn't help either:

(...)
Building a website can be done in 10 simple steps:\nStep 1:12000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
(...)

Running with huggingface/torch gives a more reasonable result, although it quickly becomes repetitive:

<s> Building a website can be done in 10 simple steps:
Step 1: Create a website.
Step 2: Add a logo.
Step 3: Add a contact form.
Step 4: Add a blog.
Step 5: Add a social media links.
Step 6: Add a contact page.
Step 7: Add a contact form.
Step 8: Add a contact form.
Step 9: Add a contact form.

Not sure where this mismatch is coming from

Thanks

eval loss become nan after a single batch

Hello,
I am trying to fine-tune the model with the script you provided, on four RTX 3090 GPUs.
However, I was getting a CUDA out-of-memory error, so I made the following change:

model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    device_map=device_map,
    trust_remote_code=args.trust_remote_code
)
model = model.half()

The model now fits on my GPUs, but the training loss becomes 0 after a single batch, and the evaluation loss is NaN.
I tried to check the model's predictions after training, but the output contains NaN, so it does not work.
What I have already tried to solve the issue:

  • different hyper-parameters (lr and wd)
  • different datasets (alpaca-cleaned and osst1)
  • different checkpoints (TinyLlama-1.1B-intermediate-step-240k-503b and TinyLlama-1.1B-step-50K-105b)

But I get the same result every time. I assume this is due to the use of float16, since that is the main difference between my code and the original. Do you have an idea of what is happening, and of what I could do about it?
Thank you!
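For what it's worth, a common first thing to try here (an assumption, not a confirmed fix): fp16 produced by .half() has a much smaller exponent range than the bfloat16 the model was pre-trained in, and training in it without loss scaling often collapses to 0/NaN. The RTX 3090 (Ampere) supports bfloat16, so loading the weights in bf16 instead of casting to fp16 is a minimal change to test; the checkpoint ID below is an example from this thread.

import torch
from transformers import AutoModelForCausalLM

model_name_or_path = "PY007/TinyLlama-1.1B-intermediate-step-240k-503b"  # example checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,   # replaces the explicit model.half()
    trust_remote_code=True,
)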

Why is the vocab size of `TinyLlama-1.1B-Chat-V0.1` 32001?

Makes it somewhat more annoying to use.

Also, were there any changes in how the weights are saved between TinyLlama-1.1B-intermediate-step-240k-503b and TinyLlama-1.1B-intermediate-step-50k-105b? I'm getting incorrect output with the newer checkpoint for code that worked with the first checkpoint.

Minimum learning rate

The minimum learning rate is set to the same value as the maximum. Is this intentional or a mistake? If intentional, why? (You can skip the explanation if it is too bothersome.)

Problem with TinyLlama-1.1B-Chat-v0.3 tokenizer

I am wondering if this behavior is correct:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PY007/TinyLlama-1.1B-Chat-v0.3")
vocab = tokenizer.get_vocab()
print(f"vocab_size: {tokenizer.vocab_size}")
print(f"length get_vocab: {len(vocab)}")
print(list(vocab.keys())[list(vocab.values()).index(32000)])
print(list(vocab.keys())[list(vocab.values()).index(32001)])
print(list(vocab.keys())[list(vocab.values()).index(32002)])

which prints:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
vocab_size: 32000
length get_vocab: 32003
[PAD]
<|im_start|>
<|im_end|>

I compared the tokenizers from TinyLlama-1.1B, TinyLlama-1.1B-Chat, Llama-2-7b-hf, and Llama-2-7b-chat-hf, which are supposed to be the same, and only TinyLlama-1.1B-Chat has this discrepancy.

Could you share a ready-made Python environment package?

It may be basic, but it really is important and time-consuming. The project does not seem to pin stable package versions, and building the environment myself runs into all kinds of problems; it currently gets stuck building xformers from source. Would it be possible to share a link to a packaged Python environment, for example a conda environment uploaded to a cloud drive? Thanks.
