
Comments (11)

zhentaocc commented on May 26, 2024

We found the root cause: chatglm uses skip_init to initialize the model, but the device defaulted to cpu. The subsequent quantization steps then treat the model as one with initialized weights and allocate buffers for the linear layers, so chatglm ends up with two sets of linear weights, which explains the observation.

The reason it appears normal on Linux is that Linux seems to release the duplicate memory automatically, while Windows does not.
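For reference, a minimal plain-PyTorch sketch (not the bigdl code path) of why the device argument matters here: skip_init skips weight initialization but still allocates real storage on its target device, which defaults to CPU, whereas a module built directly on the meta device allocates nothing.

```python
import torch
from torch import nn

# torch.nn.utils.skip_init builds the module on the "meta" device, then moves
# it to the target device WITHOUT running weight initialization. The target
# device defaults to CPU, so real CPU storage is still allocated.
lin = nn.utils.skip_init(nn.Linear, 16, 16)
print(lin.weight.device.type)       # "cpu" -- storage allocated, values undefined

# Building directly on the meta device allocates no real storage at all; the
# weights can be materialized on the final device later.
meta_lin = nn.Linear(16, 16, device="meta")
print(meta_lin.weight.device.type)  # "meta" -- shape only, no storage
```

If the quantization step later attaches its own buffers to such a CPU-resident model, the original uninitialized CPU weights linger until something frees them, matching the two-copies behavior described above.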

from bigdl.

hkvision commented on May 26, 2024

Hi, this is the result on our machine (Windows 11 + Arc 770):

  • Original CPU memory occupied before running bigdl: 10G
  • When loading the chatglm3-6b int4 model from disk to CPU: peak memory 18G
  • After putting the loaded int4 model on the XPU: back to 10G
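For anyone reproducing these numbers, here is one minimal way to bracket the load with peak-RSS measurements using only the stdlib `resource` module (Unix-only; on Windows `psutil` would be needed instead). The commented-out model lines are placeholders for the bigdl-llm calls discussed in this thread, not verified API usage:

```python
import resource  # Unix-only stdlib; use psutil.Process().memory_info() on Windows

def peak_rss_gb():
    # On Linux, ru_maxrss is the peak resident set size in kilobytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024 ** 2)

print(f"peak RSS before load: {peak_rss_gb():.2f} GB")
# model = AutoModel.load_low_bit(load_path, ...)  # load the int4 model to CPU
# model = model.to("xpu")                         # move it to the Arc GPU
print(f"peak RSS after load:  {peak_rss_gb():.2f} GB")
```

Note that ru_maxrss is a high-water mark, so it captures the transient 18G spike even if memory later drops back.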

Regarding "the CPU memory suddenly rises to 18GB and then drops back to 11GB", we want to ask:

  • Was any memory already occupied by other applications beforehand?
  • Is the drop back to 11G observed while loading to CPU, or when putting the loaded model on the GPU?

from bigdl.

hkvision commented on May 26, 2024

The memory increase when loading the model is not reasonable; we are looking into this.


KiwiHana commented on May 26, 2024

> The memory increase when loading the model is not reasonable; we are looking into this.

Thanks! The excessive CPU memory increase occurs with chatglm3-6B on Arc 750 with Windows 10, but the CPU memory increase is fine for baichuan2-7B. Therefore, it is due to the chatglm3-6B model.


hkvision commented on May 26, 2024

> The memory increase when loading the model is not reasonable; we are looking into this.

> Thanks! The excessive CPU memory increase occurs with chatglm3-6B on Arc 750 with Windows 10, but the CPU memory increase is fine for baichuan2-7B. Therefore, it is due to the chatglm3-6B model.

Sure, we also observe that llama is OK but chatglm3 is abnormal (seemingly only on Windows; Ubuntu is fine). We are looking into this and will find the cause as soon as possible.


zhentaocc commented on May 26, 2024

@KiwiHana, please verify whether the latest build fixes your issue. I suppose it would be the 20240123 version.


KiwiHana commented on May 26, 2024

> @KiwiHana, please verify whether the latest build fixes your issue. I suppose it would be the 20240123 version.

OK, I will check the day after tomorrow, when the A750 Windows device comes back.


KiwiHana commented on May 26, 2024

Hi @zhentaocc, version 2.5.0b20240123 does not solve the problem. The issue also exists on the MTL iGPU.

bigdl-core-xe-21 2.5.0b20240123
bigdl-llm 2.5.0b20240123
intel-extension-for-pytorch 2.1.10+git8ff85d6 with oneapi 2024.0
torch 2.1.0a0+cxx11.abi
torchvision 0.16.0a0+cxx11.abi

finish to xpu
stream_chat-----input_ids: {'input_ids': tensor([[64790, 64792,  4155,  2488,   260,   622, 30932,   627, 13519,   260,
          1332,  2689,   554,  7364,   289,   431, 15672, 30930,  1165,  2456,
           289,   490,   289,  3727,   293,  1630,   623,   705, 30932,   293,
           431,   817]], device='xpu:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1]], device='xpu:0'), 'position_ids': tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
         18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]],
       device='xpu:0')}
Exception in thread Thread-1 (generate):
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\AIGC Assistant\resources\audiollm\llmsd_env_asr\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Administrator\AppData\Local\Programs\AIGC Assistant\resources\audiollm\llmsd_env_asr\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\AIGC Assistant\resources\audiollm\llmsd_env_asr\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\AIGC Assistant\resources\audiollm\llmsd_env_asr\lib\site-packages\transformers\generation\utils.py", line 1335, in generate
    and torch.sum(inputs_tensor[:, -1] == generation_config.pad_token_id) > 0
RuntimeError: Allocation is out of device memory on current platform.
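A side note on reading this traceback: the transformers streaming pattern runs generate() on a background thread, so the device-memory error surfaces as "Exception in thread Thread-1 (generate)" rather than propagating to the main thread. A minimal illustration of that failure mode (the RuntimeError here is simulated, not a real allocation failure):

```python
import threading

def generate():
    # Stand-in for model.generate(); simulate the device-memory failure.
    raise RuntimeError("Allocation is out of device memory on current platform.")

# The streaming pattern launches generate() on a worker thread; its traceback
# is printed to stderr as "Exception in thread ... (generate)" while the main
# thread keeps running.
worker = threading.Thread(target=generate, name="Thread-1 (generate)")
worker.start()
worker.join()
print("main thread continues; the worker's exception went to stderr")
```

This is why the caller may appear to hang or return nothing instead of crashing: the main thread never sees the exception unless the worker reports it explicitly.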


zhentaocc commented on May 26, 2024

I have tested with the 20240118 version and a loading method like:

model = AutoModel.load_low_bit(load_path, optimize_model=True, device="meta",
                               trust_remote_code=True, use_cache=True,
                               cpu_embedding=cpu_embedding)

which works. But the 20240122 version is not working; it seems other changes caused this issue.
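For context, a plain-PyTorch sketch of the meta-device loading pattern that device="meta" relies on. This is illustrative only, with a toy Linear layer standing in for the quantized model that load_low_bit actually handles:

```python
import torch
from torch import nn

# 1) Build the skeleton on "meta": parameter shapes exist, no RAM is allocated.
with torch.device("meta"):
    model = nn.Linear(16, 16)

# 2) Materialize empty (uninitialized) storage on the real target device.
model = model.to_empty(device="cpu")

# 3) Copy the saved weights in; only one real copy of each tensor ever exists.
state = {"weight": torch.ones(16, 16), "bias": torch.zeros(16)}
model.load_state_dict(state)
print(model.weight.device.type)  # "cpu"
```

The point of the pattern is that the skeleton costs nothing, so peak memory is bounded by the weights themselves rather than by skeleton plus weights.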


zhentaocc commented on May 26, 2024

0118 is confirmed to be good with the previous fix; the new memory issue starts from 0122. Memory status is normal before loading weights to the meta model:

[screenshot]

but it increases a lot once loading is finished:

[screenshot]


zhentaocc commented on May 26, 2024

This issue will be resolved in today's new wheel.

