
yarn's Introduction

YaRN

This repo contains the code and data for the YaRN context window extension method.

Paper

Paper (ICLR 2024): YaRN: Efficient Context Window Extension of Large Language Models
Old Preprint (arXiv)

Models

LLaMA

We publish variants of Llama 2 fine-tuned with YaRN at 32K, 64K and 128K context window length. They are available under the Llama 2 license on 🤗 Hugging Face.

Size Context Link
7B 64K NousResearch/Yarn-Llama-2-7b-64k
7B 128K NousResearch/Yarn-Llama-2-7b-128k
13B 64K NousResearch/Yarn-Llama-2-13b-64k
13B 128K NousResearch/Yarn-Llama-2-13b-128k
70B 32K NousResearch/Yarn-Llama-2-70b-32k

In addition, we also publish 8K context window versions of Llama 2 7B fine-tuned with NTK-aware and YaRN (Table 1 in the conference paper).

Mistral

With the release of v2 of our paper, we are also publishing 64K and 128K variants of Mistral 7B v0.1.

Size Context Link
7B 64K NousResearch/Yarn-Mistral-7b-64k
7B 128K NousResearch/Yarn-Mistral-7b-128k

SOLAR

The SOLAR 10.7B v1.0 model uses depth up-scaling to add layers to Mistral 7B v0.1, which may improve long-context performance on a per-parameter basis. We publish 32K and 64K variants.

Size Context Link
10.7B 32K NousResearch/Yarn-Solar-10b-32k
10.7B 64K NousResearch/Yarn-Solar-10b-64k

Reproduction

We strongly believe in open science, and thus publish all code and data to reproduce the results in our paper. To reproduce, clone the repository and perform a local installation.

git clone https://github.com/jquesnelle/yarn
cd yarn
pip install -e .

Training

To train the models, run accelerate config and enable DeepSpeed acceleration. deepspeed/zero3.json was the configuration file used for training.

# ./train.sh

The tokenized training data is available on 🤗 Hugging Face and was derived from the pg19 dataset. For the Mistral models, a mix of the pretrain and fine-tune splits of Long-Data-Collections was used, and the tokenized dataset is also available on 🤗 Hugging Face.
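
For reference, the tokenized datasets can be pulled directly with the datasets library. The sketch below is illustrative: the dataset name is one of those referenced in train.sh and the issues below, and the exact split and column names are assumptions.

from datasets import load_dataset

# Load one of the pre-tokenized training sets (name taken from train.sh); rows are
# already chunked to the training sequence length.
train_dataset = load_dataset("emozilla/yarn-train-tokenized-16k-mistral", split="train")
print(train_dataset.column_names)  # expected to include input_ids (and attention_mask)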

Evaluation

To reproduce the evaluations, install lm-evaluation-harness with pip install git+https://github.com/EleutherAI/lm-evaluation-harness and then run the two provided scripts.

# ./eval.sh
# ./eval-harness.sh

Citation

@inproceedings{
      peng2024yarn,
      title={Ya{RN}: Efficient Context Window Extension of Large Language Models},
      author={Bowen Peng and Jeffrey Quesnelle and Honglu Fan and Enrico Shippole},
      booktitle={The Twelfth International Conference on Learning Representations},
      year={2024},
      url={https://openreview.net/forum?id=wHBfxhZu1u}
}

yarn's People

Contributors

bloc97, cebtenzzre, honglu2875, jquesnelle


yarn's Issues

OOM on two 80GB GPUs

accelerate launch finetune.py \
    --output-dir output/mistral-yarn-7b-64k \
    --model mistralai/Mistral-7B-v0.1 \
    --architecture mistral \
    --scaling-factor 2 \
    --max-position-embeddings 16384 \
    --dataset emozilla/yarn-train-tokenized-8k-mistral \
    --sliding-window-attention-schedule 4096 \
    --lr-schedule constant \
    --learning-rate 0.000001 \
    --max-train-steps 1000

Both with and without LoRA I hit the OOM error. This is on only an 8K sequence length, so memory consumption should be around 4x smaller compared with training on a 16K sequence length.

accelerate is configured to use two GPUs and FSDP.

What is the purpose of `finetuned` parameter in `LlamaDynamicYaRNScaledRotaryEmbedding`?

I see that the __init__ method of LlamaDynamicYaRNScaledRotaryEmbedding has a boolean parameter called finetuned. What is the purpose of that parameter? Should we set it to False while fine-tuning the model and then set it to True for inference after fine-tuning? What could be the problem if we keep it False regardless of whether the model is fine-tuned or not?

Could this repository be used for SFT based on YaRN?

Thank you for your team's open-source contributions!
From the code, it seems to only support pre-training. I want to conduct extrapolation training in the SFT phase, taking the Instruct version from 4k to 16k. How should I proceed?

dataset preprocessing script

Hi, can you also share the preprocessing script used to convert the dataset to the standard format? Also, why is the attention_mask required in the dataset?

Compute Requirements

Great stuff!

Just out of curiosity, what was the compute setup used?
I couldn't seem to find details such as GPU type and cluster size in the paper.

Thanks!

inv_freq seems not calculated right

Hello, I'm thrilled to see that linear and NTK interpolation have been elegantly combined to create a much stronger interpolation strategy, YaRN. However, while going through the code in modeling_llama.py, I find myself a bit confused by the calculation of inv_freq, particularly at line 398.

According to the YaRN paper, in equation 23, it is stated as follows:

$$ \lambda_d'=(1-\gamma_d)s\lambda_d+\gamma_d\lambda_d $$

Consequently, we can derive:

$$ h(\theta_d) = \frac{2\pi}{\lambda_d'} = \frac{2\pi}{(1-\gamma_d)s\lambda_d+\gamma_d\lambda_d} = \frac{\theta_d}{(1-\gamma_d)s+\gamma_d} $$

However, in the paper, the calculation of $h(\theta_d)$ in equation 25 is different:

$$ h(\theta_d) = \left(\frac{(1-\gamma_d)}{s}+\gamma_d\right)\theta_d \neq \frac{2\pi}{\lambda_d'} $$

Hence, I think there might be some problem with equation 25 and also with line 398. Perhaps we can revise the yarn function as follows, since I've empirically found that this fix can further enhance performance:

def revised_yarn(self, device):
    inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))

    low, high = _yarn_find_correction_range(self.beta_fast, self.beta_slow, self.dim, self.base, self.original_max_position_embeddings)
    inv_freq_mask = (1 - _yarn_linear_ramp_mask(low, high, self.dim // 2).float().to(device)) * self.extrapolation_factor
    inv_freq = inv_freq / ((1 - inv_freq_mask) * self.scale + inv_freq_mask)

    self.register_buffer("inv_freq", inv_freq, persistent=False)
    self.mscale = float(_yarn_get_mscale(self.scale) * self.attn_factor)
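
To make the discrepancy concrete, here is a small numeric check of the two forms discussed above (illustrative only; the gamma, s, and theta values are arbitrary):

# Compare Eq. 25 (as implemented at line 398) with the form derived from Eq. 23.
s = 16.0        # scale factor
theta = 0.01    # an example frequency theta_d
for gamma in (0.0, 0.5, 1.0):
    eq25 = ((1 - gamma) / s + gamma) * theta        # paper's Eq. 25
    eq23 = theta / ((1 - gamma) * s + gamma)        # derived from Eq. 23 above
    print(f"gamma={gamma}: eq25={eq25:.6f}  eq23={eq23:.6f}")
# The two agree only at gamma = 0 and gamma = 1; in between they differ,
# which is exactly the discrepancy this issue raises.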

deepspeed config crashed for `auto` and OOM

1. Using DeepSpeed

The config file deepspeed/zero3.json throws an error and `auto` cannot be used. I modified the config myself (not sure whether it is correct) just to get things running:

(screenshot of the modified config)

Command used:

accelerate launch finetune.py \
    --output-dir output/yarn-7b-32k \
    --model NousResearch/Llama-2-7b-hf \
    --learning-rate 0.00001 \
    --lr-schedule constant \
    --scaling-factor 8 \
    --deepspeed

Then: OOM.

2. Without DeepSpeed

In accelerate config I disabled DeepSpeed and dynamo; by default the first configuration in train.sh should be the 64k-length one. OOM.

# run `accelerate config` first. pass --deepspeed to finetune.py if using DeepSpeed

accelerate launch finetune.py \
    --output-dir output/yarn-7b-64k \
    --model NousResearch/Llama-2-7b-hf

3. Code error

Does the DDP part need to be changed? This is my first time using it, so I'm not sure whether it is correct.
(screenshot of the code change)

Question

I checked: x.shape is torch.Size([1, 65536, 4096]), so even a single 80 GB GPU does not seem to have enough memory.

So should tensor parallelism (tp) be configured somewhere? The README is not very friendly to newcomers, though. QAQ

An OOM error occurred while computing the perplexity of 128k Proof-pile documents with a maximum token count set to 128k.

Thank you so much for your open source work.

I evaluated the 128K context capacity of the LLaMA-2 7B model using an NVIDIA A100 (80 GB) GPU. However, I encountered an OOM error. Here is my script:

PG19="--tokenized emozilla/pg19-test-tokenized"
PROOFPILE_LONG_SMALL="--tokenized emozilla/proofpile-test-tokenized --dataset-min-tokens 131072 --samples 10 --truncate"
CUSTOM="--custom-model-together"

python eval/perplexity.py \
    ${PROOFPILE_LONG_SMALL} ${CUSTOM} \
    --output-file data/proofpile-long-small.csv \
    --min-tokens 131072 --max-tokens 131072 --tokens-step 2048 --aggressive-memory \
    -m llama2_7b_yarn_64k

(screenshot of the OOM error)

How to generate plot?

Thank you for sharing this!

I'd like to review your steps for generating the plots I've seen on Twitter.

Could you please include your plot-generation script? I know it's calling perplexity.py, but I'd like to retrace your steps exactly. Then I can tweak it :)
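
In the meantime, a minimal plotting sketch along these lines may help. It is hedged: the CSV layout is assumed to be one column of context lengths followed by one column of perplexities, as written by eval/perplexity.py --output-file, and the column positions are placeholders.

import pandas as pd
import matplotlib.pyplot as plt

# Assumed layout: first column = context length in tokens, second column = perplexity.
df = pd.read_csv("data/proofpile-long-small.csv")
x_col, y_col = df.columns[0], df.columns[1]
plt.plot(df[x_col], df[y_col], marker="o")
plt.xlabel("Context window (tokens)")
plt.ylabel("Perplexity")
plt.title("Sliding-window perplexity vs. context length")
plt.savefig("perplexity.png", dpi=150)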

A potential bug in scaled_rope/LlamaDynamicScaledRotaryEmbedding.py

  1. The comment "# This if block is unlikely to be run after we build sin/cos in __init__. Keep the logic here just in case." might be incorrect. From what I understand, the code following this comment calculates the scale value based on the actual length of the input. However, the value cached in __init__ is unscaled. Therefore, this branch should be executed frequently.

  2. The new values for cos_cached and sin_cached shouldn't be cached. If they are, after encountering a long sample, all subsequent samples will use the scaled values, regardless of their length.
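
To illustrate point 2, here is a deliberately simplified sketch (not the repo's actual code) of how caching the rescaled cos/sin values leaks the scaling into later, shorter inputs:

import torch

class DynamicRotaryCacheSketch:
    def __init__(self, dim=8, max_pos=2048, base=10000.0):
        self.dim, self.max_pos, self.base = dim, max_pos, base
        self._build(max_pos, scale=1.0)              # unscaled cache built in __init__

    def _build(self, seq_len, scale):
        base = self.base * scale                     # stand-in for the dynamic rescale
        inv_freq = 1.0 / (base ** (torch.arange(0, self.dim, 2).float() / self.dim))
        freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)
        self.cos_cached, self.sin_cached = freqs.cos(), freqs.sin()
        self.cached_len = seq_len

    def forward(self, seq_len):
        if seq_len > self.cached_len:                # point 1: runs for any input longer than the cache
            self._build(seq_len, scale=seq_len / self.max_pos)
        # Point 2: a short input arriving after a long one reuses the *scaled* cache here
        # instead of falling back to the original, unscaled values.
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]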

Finetune Example

Awesome job on this

Do you have any examples of a fine-tune cli / setup to show llama3b 4096 | 6144?

Error about eval/passkey.py

/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [21,0,0], thread: [61,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

When I run eval/passkey.py, it reports the above error.
How can I solve it?

Why the updated cache is initialized with seqlen=256?

Hi~
I am currently following the HF version for exploration, but I find that when the KV cache is updated in Llama (NousResearch/Yarn-Llama-2-7b-128k), the length of the newly appended empty cache is always 256 (line 528):

past_kv = torch.cat([past_kv, torch.empty(bsz, 256, 2, kv.size(3), kv.size(4), dtype=kv.dtype, device=kv.device)], 1)

I think it should be

past_kv = torch.cat([past_kv, torch.empty(bsz, kv.size(1), 2, kv.size(3), kv.size(4), dtype=kv.dtype, device=kv.device)], 1)

Is that right? Or am I misunderstanding this procedure?

Linear Scaled Embedding Has Different Implementation?

I compared your code with The Bloke's code for the linear scaled embedding. There are some differences:

  1. Your code changes the scale to self.scale = 1/scale, making it a fraction, and then divides t by the fractioned scale (t /= self.scale). But The Bloke's code multiplies t by the fractioned scale. Which one is right?
  2. Your code's max_position_embeddings seems to stay at 2048, but The Bloke's code changes it according to the max context length. Or did you actually change max_position_embeddings in the config file?

Which one follows the implementation from kaiokendev?

RoPE scaling config confusing

Hi Yarn team,
thank you guys for the awesome work. Currently I'm trying to evaluate several RoPE scaling methods, and fortunately they are all available in this repo. I have some questions about the config for RoPE scaling.
I see that in requirements.txt you already include transformers >= 4.34.0, so it means I could use "linear" and "dynamic-ntk" out of the box with transformers, just by adding the rope scaling in AutoConfig.from_pretrained() like this:
config.rope_scaling = { "type": "linear", "factor": args.linear }
or
config.rope_scaling = { "type": "dynamic", "factor": args.dynamic_ntk }
I tried that, removed the patch for linear & dynamic-ntk, and the results look identical to those from your implemented patch.
Moreover, it also supports the Falcon architecture. (https://github.com/huggingface/transformers/blob/main/src/transformers/models/falcon/modeling_falcon.py#L162)
So my question is: is there any difference between these two implementations, or is your linear & dynamic-ntk patch there to keep the reproduction evals consistent?
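
For reference, the built-in path described above looks roughly like this. It is a sketch: whether "linear" and "dynamic" RoPE scaling is supported for a given architecture depends on the transformers version, and the factor value here is an arbitrary example.

from transformers import AutoConfig, AutoModelForCausalLM

model_id = "NousResearch/Llama-2-7b-hf"
config = AutoConfig.from_pretrained(model_id)
# Built-in RoPE scaling, no external patch needed:
config.rope_scaling = {"type": "linear", "factor": 4.0}   # or {"type": "dynamic", "factor": 4.0}
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)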

OSError: [Errno 28] No space left on device

Resolving data files: 100%|████████████████████| 136/136 [00:00<00:00, 296941.88it/s]
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:02<00:00,  1.27s/it]
/root/miniconda3/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/root/miniconda3/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
Resolving data files: 100%|████████████████████| 136/136 [00:00<00:00, 284501.42it/s]
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:01<00:00,  1.01it/s]
(the same two UserWarnings are printed again)
Resolving data files: 100%|████████████████████| 136/136 [00:00<00:00, 121937.87it/s]
Generating train split: 61410 examples [01:49, 561.64 examples/s]
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 1940, in _prepare_split_single
    writer.write_table(table)
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/arrow_writer.py", line 577, in write_table
    self.pa_writer.write_table(pa_table, writer_batch_size)
  File "pyarrow/ipc.pxi", line 525, in pyarrow.lib._CRecordBatchWriter.write_table
  File "/root/miniconda3/lib/python3.8/site-packages/fsspec/implementations/local.py", line 365, in write
    return self.f.write(*args, **kwargs)
OSError: [Errno 28] No space left on device

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "finetune.py", line 193, in <module>
    main(args.parse_args())
  File "finetune.py", line 67, in main
    train_dataset = load_dataset('/root/autodl-tmp/data/emozilla___pg_books-tokenized-bos-eos-chunked-65536/default/0.0.0/9107755b15521c04', split='train',
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/load.py", line 2136, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 954, in download_and_prepare
    self._download_and_prepare(
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 1049, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 1813, in _prepare_split

Sliding window perplexity with truncated documents

Hi! Thanks for sharing your nice work.
I have some questions about the perplexity evaluation setup.

In Figure 1, it is mentioned that the sliding window perplexity is reported, with documents truncated to the evaluation context length.
I was wondering if that makes sense, because (as far as I understood) sliding window evaluation is something that you do when the document length is longer than the evaluation context length.

Also, is there a particular reason you used truncation for proof-pile and not for gov_report?

Thanks in advance!
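
For context, sliding-window perplexity with stride S is usually computed along these lines. This is a generic illustrative sketch (not the repo's eval/perplexity.py), where model is assumed to be a Hugging Face causal LM that returns a mean cross-entropy loss.

import torch

def sliding_window_ppl(model, input_ids, window=4096, stride=256):
    # Score each token once, conditioning on up to `window` tokens of context,
    # advancing the window by `stride` tokens at a time.
    nlls, prev_end = [], 0
    for begin in range(0, input_ids.size(1), stride):
        end = min(begin + window, input_ids.size(1))
        new_tokens = end - prev_end                 # tokens not scored in a previous step
        ids = input_ids[:, begin:end]
        labels = ids.clone()
        labels[:, :-new_tokens] = -100              # mask out already-scored context
        with torch.no_grad():
            nlls.append(model(ids, labels=labels).loss * new_tokens)
        prev_end = end
        if end == input_ids.size(1):
            break
    return torch.exp(torch.stack(nlls).sum() / end)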

Question about Yarn environment configuration (v2)

Hi Yarn team,

I hope this issue finds you well. I cloned your code (v2, from two weeks ago) on our machine and ran into the following error:

Traceback (most recent call last):
  File "/app/yarn_4/finetune.py", line 293, in <module>
    main(args.parse_args())
  File "/app/yarn_4/finetune.py", line 52, in main
    from scaled_rope.modeling_llama_yarn import LlamaForCausalLM
  File "/app/yarn_4/scaled_rope/modeling_llama_yarn.py", line 34, in <module>
    from transformers.utils import (
ImportError: cannot import name 'is_flash_attn_2_available' from 'transformers.utils' (/opt/conda/lib/python3.10/site-packages/transformers/utils/__init__.py)

In our current environment, we are using the following versions:

  • Python: 3.10
  • PyTorch: 2.1.0
  • CUDA: 11.8
  • Transformers: 4.34.0
  • PyTorch-CUDA: 11.8
  • Torchtriton: 2.1.0
  • Torchvision: 0.16.0
  • Accelerate: 0.24.1
  • Deepspeed: 0.12.3
  • flash-attn: 2.3.3

We would like to adapt the YaRN environment to our specific setup. Specifically, we would like to ask which versions of transformers, accelerate, and deepspeed were used in the YaRN environment. Could you please provide details on how these tools are configured in your environment?

Any guidance or information you can offer regarding this matter would be greatly appreciated.

Thank you for your time and assistance!

Testing yarn on practical tasks.

Hello, this is Chenxin.

I am so excited to see the first open-source model with more than 100k context!!! This is undoubtedly very significant progress by the open-source community on LCLMs.
I've noticed that the current version of YaRN only has PPL (perplexity) experiments, which do not always correlate with practical long-context understanding tasks. I am glad to help test llama2-yarn-128k on LEval, but I do not have the resources to do SFT based on llama2-yarn-128k. Would you mind providing an instruction-following version?

Thanks again for the great work!

Running Error

When launching finetune.py using the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4 accelerate launch finetune.py --output-dir output/yarn-7b-64k --model /data/wy/llm_base/Llama-2-7b-hf --dataset /data/wy/LLMScaledData/pg_books-tokenized-bos-eos-chunked-6/data

The following error occurred:
Traceback (most recent call last):
File "/data/wy/yarn/finetune.py", line 293, in <module>
main(args.parse_args())
File "/data/wy/yarn/finetune.py", line 156, in main
model.gradient_checkpointing_enable()
File "/home/centos/anaconda3/envs/llm_sacled/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'gradient_checkpointing_enable'

Need to modify 'model.gradient_checkpointing_enable()' to 'model.module.gradient_checkpointing_enable()'
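
A hedged sketch of that fix, unwrapping only when the model is actually DDP-wrapped (model is the object prepared in finetune.py):

from torch.nn.parallel import DistributedDataParallel

def enable_gradient_checkpointing(model):
    # Call gradient checkpointing on the underlying HF model when DDP has wrapped it.
    target = model.module if isinstance(model, DistributedDataParallel) else model
    target.gradient_checkpointing_enable()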

Confirmation of License

Nice work making this.

Could you clarify/confirm the license here? I see an MIT License here on GitHub and no license on Hugging Face.

I would have assumed this has to at least be Meta Community License as that would transfer through because of using Llama 2.

It looks like the only new training data added is PG-19, which seems to be Apache 2.0, so it seems that YaRN could take on a Meta Community License.

Mistral-train error on deepspeed config

File "/workspace/long/yarn/finetune.py", line 143, in main
model = accelerator.prepare(model)
File "/root/miniconda3/envs/yarn/lib/python3.10/site-packages/accelerate/accelerator.py", line 1280, in prepare
result = self._prepare_deepspeed(*args)
File "/root/miniconda3/envs/yarn/lib/python3.10/site-packages/accelerate/accelerator.py", line 1515, in _prepare_deepspeed
raise ValueError(
ValueError: When using DeepSpeed accelerate.prepare() requires you to pass at least one of training or evaluation dataloaders or alternatively set an integer value in train_micro_batch_size_per_gpu in the deepspeed config file or assign integer value to AcceleratorState().deepspeed_plugin.deepspeed_config['train_micro_batch_size_per_gpu'].
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1523510) of binary: /root/miniconda3/envs/yarn/bin/python
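
One workaround is the option the error message itself points at. A hedged sketch (the value 1 is an assumption; it must be set before accelerator.prepare(model) is called):

from accelerate.state import AcceleratorState

# Set an explicit micro-batch size instead of "auto" so DeepSpeed can be initialized
# even when prepare() is called without a dataloader.
AcceleratorState().deepspeed_plugin.deepspeed_config["train_micro_batch_size_per_gpu"] = 1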

Training takes a long time

Why does it take so long for me to fine-tune llama2-7b-64k?
Each epoch takes 300+ seconds.
I used 8xA100, turned on DeepSpeed, and used "yarn" as the RoPE type.
Is it a problem with flash attention? But I see that modeling_llama_together_yarn.py uses flash attention by default.
Thanks a lot.

Phi 2

Hi,
Thank you for releasing this code! Are there any plans to train a Phi 2 model?
Thanks!

Runtime error

Hi,
I am trying to fine-tune a 7B model for a 16k context length on an 8-GPU A100 (40 GB) machine, but I am getting the following runtime error:

Traceback (most recent call last):
File "/home/ec2-user/data/yarn/finetune.py", line 222, in <module>
    main(args.parse_args())
  File "/home/ec2-user/data/yarn/finetune.py", line 150, in main
    loss = model(**batch).loss
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1801, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/data/yarn/scaled_rope/modeling_llama_together_yarn.py", line 985, in forward
    outputs = self.model(
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/data/yarn/scaled_rope/modeling_llama_together_yarn.py", line 860, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/home/ec2-user/data/yarn/scaled_rope/modeling_llama_together_yarn.py", line 856, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/data/yarn/scaled_rope/modeling_llama_together_yarn.py", line 620, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/data/yarn/scaled_rope/modeling_llama_together_yarn.py", line 555, in forward
    ).reshape(bsz, q_len, h_size)
RuntimeError: shape '[1, 16384, 4096]' is invalid for input of size 13459456

Here is the command:

accelerate launch finetune.py --wandb yarn --output-dir output/yarn-7b-16k --model meta-llama/Llama-2-7b-chat-hf --max-train-steps 20 --scaling-factor 4 --scaling-type yarn --seed 31337 --dataset shossain/govreport-qa-5-16384 --gradient-accumulate-every 1

Please suggest.

Trying to set a tensor of shape torch.Size([257, 1024]) in "weight" (which has shape torch.Size([1226, 1024])), this look incorrect

Could everyone please take a look: when deploying the model, I get a message that quantizing the layer failed, along with the error "Trying to set a tensor of shape torch.Size([257, 1024]) in "weight" (which has shape torch.Size([1226, 1024])), this look incorrect".
I found suggestions online saying this is due to a Transformers update, so I followed the official documentation and ran pip install auto_gptq transformers==4.33.1, but I still get the same error.
(screenshots: quantization failed)

cannot connect to hugging face

My server cannot connect to Hugging Face, but I have already downloaded your model from Hugging Face. How can I run the code in your repository? Thanks
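
Loading from a local directory is typically enough. A hedged sketch (the path is a placeholder, and trust_remote_code may be needed because the YaRN models ship custom modeling code):

from transformers import AutoModelForCausalLM, AutoTokenizer

local_path = "/path/to/Yarn-Llama-2-7b-64k"   # placeholder: directory containing the downloaded files
tokenizer = AutoTokenizer.from_pretrained(local_path)
model = AutoModelForCausalLM.from_pretrained(local_path, trust_remote_code=True)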

Are 7B and 13B Models fine-tuned?

I've been running on a 40 GB A100 using transformers and GPTQ. To get the model working at all, there seems to be a specific order in which the packages have to be installed.

!pip3 install git+https://github.com/huggingface/transformers.git
!pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
!pip3 install git+https://github.com/huggingface/optimum.git
!pip3 install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
!pip3 install flash-attn==2.1.1 --no-build-isolation

With the above, I'm running out of memory around 8000 tokens of input (using the 7B model) and the output becomes garbled.

I've tried GPTQ, bnb nf4, and bf16 loading.

On bf16 loading, the output is garbled at 4k tokens of input.

Are the 7B and 13B yarn models fine-tuned? Do you have recommendations on how better to run them?

Questions about DynamicNTK

(self.scaling_factor * seq_len / self.max_position_embeddings) - (self.scaling_factor - 1)

Could you please explain: if I want to extend from 2K to 16K, then the factor multiplied by the base here is
$(8 * 16K / 2K) - (8 - 1) = 57$.
Is this multiple reasonable? Are there any problems here?
Please correct me if I'm wrong.

@bloc97
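
Writing out the arithmetic from the expression quoted above (illustrative only):

scaling_factor = 8
seq_len = 16 * 1024                   # target 16K context
max_position_embeddings = 2 * 1024    # original 2K context
base_multiplier = (scaling_factor * seq_len / max_position_embeddings) - (scaling_factor - 1)
print(base_multiplier)   # 57.0 -- the multiple the question asks about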

context length and dataset size

I am looking at the training command for mistral:

yarn/train.sh, line 60 (at commit 0ae3b2d):

--dataset emozilla/yarn-train-tokenized-16k-mistral \

Can I train a 64k context-length model with a 16k-long dataset, or is this just an example?

cannot load safetensor: Trying to set a tensor of shape torch.Size([0]) in "weight" (which has shape torch.Size([32000, 4096]))

After checking #45 and #40 and making some hard-coded modifications, the following commands ran:

# training
accelerate launch finetune.py \
    --output-dir output/yarn-7b-8k \
    --model NousResearch/Llama-2-7b-hf \
    --scaling-factor 2 \
    --wandb yarn \
    --dataset emozilla/yarn-train-tokenized-8k-llama \
    --deepspeed

# save
accelerate launch finetune.py \
    --output-dir output/yarn-7b-8k \
    --model NousResearch/Llama-2-7b-hf \
    --save-only \
    --scaling-factor 2 \
    --wandb yarn \
    --output-dir output-8k-save \
    --dataset emozilla/yarn-train-tokenized-8k-llama \
    --deepspeed

And I got these files:

(torch2) root@9b2ed2383075:/workspace/yarn/output/yarn-7b-8k# tree
.
|-- config.json
|-- model-00001-of-00003.safetensors
|-- model-00002-of-00003.safetensors
|-- model-00003-of-00003.safetensors
|-- model.safetensors
`-- model.safetensors.index.json

To load it with passkey.py, I merged these safetensors into the original NousResearch/Llama-2-7b-hf and got this error:

(torch2) root@9b2ed2383075:/workspace/yarn# python3 eval/passkey.py -m /workspace/models/Llama-2-7b-hf/
Determining sequence lengths: 100%|████████████████████| 6/6 [00:04<00:00,  1.48it/s]
Model:   0%|                                                                                                                                                                                                                                            | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):                                                                                                                                                                                                                                            
  File "/workspace/yarn/eval/passkey.py", line 127, in <module>
    main(add_args(parser).parse_args())
  File "/workspace/yarn/eval/passkey.py", line 90, in main
    loaded = load_model_and_apply_patches(model, args)
  File "/workspace/yarn/eval/model_loader.py", line 215, in load_model_and_apply_patches
    return apply_patches(load_model(model, args), args)
  File "/workspace/yarn/eval/model_loader.py", line 90, in load_model
    loaded = model_cls.from_pretrained(
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3480, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3870, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 743, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 285, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([0]) in "weight" (which has shape torch.Size([32000, 4096])), this look incorrect.

I noticed that your official https://huggingface.co/NousResearch/Yarn-Llama-2-7b-64k does not need any safetensor merging and can be tested successfully.

Did I miss a model conversion script?

Inquiry Regarding Evaluation Metrics in Your Paper

@bloc97 @jquesnelle Dear Authors,

Firstly, I would like to extend my sincere appreciation for your remarkable work. It is truly commendable and has served as a valuable resource for the community.

Upon reading your paper, I encountered some confusion regarding the evaluation metrics employed. Specifically, in Section 4.3.1, you state: "...selected 10 random samples from Proof-pile with at least 128k tokens each and evaluated the perplexity of each of these samples when truncated at 2k steps from a sequence length of 2k tokens through 128k tokens." Could you kindly clarify what is meant by "2k steps" in this context?

Additionally, the term "Sliding window perplexity (S = 256) of ten 128k Proof-pile documents truncated to evaluation context window size" is used multiple times. However, I am uncertain how sliding window perplexity is applied if the documents are truncated to the evaluation context window size. Does it mean the documents are truncated to the maximum evaluation context window size (128k)?

Your insights and clarifications on these points would be greatly appreciated, as they might resolve some misunderstandings I have regarding the paper.

Thank you for your time and consideration.

How should I proceed with conducting an evaluation for lm-evaluation-harness?

Hello developers, I've been trying to run the lm-evaluation-harness evaluation from your paper, but I'm encountering an error stating that a directory doesn't exist.

Could you provide more detailed instructions on how to conduct the evaluation?

Here is the command I've been using and the error that occurs.

command
pip install git+https://github.com/EleutherAI/lm-evaluation-harness
./eval-harness.sh

error
python: can't open file '/workspace/yarn/../lm-evaluation-harness/main.py': [Errno 2] No such file or directory

Your assistance would be greatly appreciated!
(help me plz..!!!)

Hardware equipment and training time?

I am very curious about the hardware you used for training and how long the training took. Is there a detailed write-up? If so, I would be extremely grateful.

Unexpected larger perplexity on PG19

Hi Yarn team,

I hope this finds you well. I've been using your code jquesnelle/yarn for testing the PG19 dataset. While reviewing the eval.sh script, I noticed some definitions related to the PG19 dataset, but the code for testing perplexity results seems somewhat unclear.

Settings:

  • Base Model: llama2-7b
  • Base Context Size: 4096
  • Sliding Window: 256, 4096
  • Scale to: 8192

In eval.sh, I found the following definition for the PG19 dataset:

# python eval/perplexity.py -m meta-llama/Llama-2-7b-hf --dataset pg19 --split test --feature text --save-tokenized output/pg19-test-tokenized
PG19="--tokenized emozilla/pg19-test-tokenized"

However, I did not find the actual command for computing perplexity results, so I attempted to test with my own command:

python eval/perplexity.py --dataset pg19 --feature "text" --samples 5 -m meta-llama/Llama-2-7b-hf --max-tokens $max_tokens --min-tokens $max_tokens --tokens-step 4000 --tokenized emozilla/pg19-test-tokenized --yarn $((max_tokens / 4096)) --max-position-embeddings 4096 --original-max-position-embeddings 4096 --dataset-min-tokens $max_tokens --sliding-window 4096 --custom-model --aggressive-memory --flash-attention

I observed that the results differ when the sliding window is set to 4096 and 256. In comparison to other PI and dy-ntk methods, the performance is unstable with a sliding window set to 256 and stable with a sliding window set to 4096.

Results:

  • --sliding-window 4096:
    • meta-llama/Llama-2-7b-hf: 8192=9.89344
  • --sliding-window 256:
    • meta-llama/Llama-2-7b-hf: 8192=32.76145

In contrast, other PI and dy-ntk methods maintain relatively stable performance when the sliding window is set to 256 and 4096:

  • Sliding window: 4096 / 256
    • PI: 10.79598 / 10.65644
    • dy-ntk: 10.19125 / 10.214816

I would appreciate your insights on this phenomenon. Is this behavior considered normal, or could there be potential configuration issues? If possible, could you provide more detailed information about the PG19 dataset testing script to help me better understand and adjust the testing configuration?

Thank you very much for your time and assistance. I look forward to your response.

Best regards,
Yiran

OOM when doing text generation

Hi,

I have been running into out of memory issues when trying to generate some text using the model "NousResearch/Yarn-Llama-2-7b-128k". I am using a prompt with 126k tokens and running things on 1 GPU. The script that I am using is the "eval/prompt-loop.py". I tried to set load_in_4bit = True but it didn't help.

Do you have advice to solve this issue?

Thanks !

License

Currently this repository doesn't contain a license file. It would be great if you could add one to clarify under which license the code is made available. Thanks!
