System Info TGI docker image on GCP. GPU: A100 Model: Phi-

Below is the error I'm getting <div class="snippet-clipboard-content notranslate p

TGI hard crashes after 1 OOM error about text-generation-inference HOT 4 OPEN

pranavthombare commented on June 22, 2024

TGI hard crashes after 1 OOM error

from text-generation-inference.

Comments (4)

pranavthombare commented on June 22, 2024

Below is the error I'm getting

    "timestamp": "2024-05-27T12:04:51.372064Z",
    "level": "ERROR",
    "fields": {
        "message": """'Shard complete standard error output:

        The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
        /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed inversion 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
        warnings.warn(
        Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
        A new version of the following files was downloaded from https://huggingface.co/pranavthombare/Phi-3-mini-4k-construct:
        - configuration_phi3.py
        . Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
        /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class \'text_generation_server.utils.dist.FakeGroup\'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
        warnings.warn(
        Exception ignored in: <function Server.__del__ at 0x7c5ff5530550>
        Traceback (most recent call last):
        File "/opt/conda/lib/python3.10/site-packages/grpc/aio/_server.py", line 186, in __del__
            cygrpc.schedule_coro_threadsafe(
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 120, in grpc._cython.cygrpc.schedule_coro_threadsafe
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 112, in grpc._cython.cygrpc.schedule_coro_threadsafe
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 436, in create_task
            self._check_closed()
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
            raise RuntimeError(\'Event loop is closed\')
        RuntimeError: Event loop is closed
        sys:1: RuntimeWarning: coroutine \'AioServer.shutdown\' was never awaited
        Task exception was never retrieved
        future: <Task finished name=\'Task-2218\' coro=<<coroutine without __name__>()> exception=SystemExit(1)>
        Traceback (most recent call last):
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
            return await response
        File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
            raise error
        File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
            return await behavior(request_or_iterator, context)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 144, in Prefill
            generations, next_batch, timings = self.model.generate_token(batch)
        File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
            return func(*args, **kwds)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 960, in generate_token
            raise e
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 957, in generate_token
            out, speculative_logits = self.forward(batch)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 900, in forward
            return self.model.forward(
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 394, in forward
            hidden_states = self.model(
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
            return self._call_impl(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
            return forward_call(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 340, in forward
            hidden_states, residual = layer(
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
            return self._call_impl(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
            return forward_call(*args, **kwargs)
        File"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 279, in forward
            mlp_output = self.mlp(normed_attn_res_output)
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
            return self._call_impl(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
            return forward_call(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 226, in forward
            return self.down_proj(self.act(gate_up_states[:, 0]) * gate_up_states[:, 1])
        torch.cuda.OutOfMemoryError:CUDA out of memory. Tried to allocate 256.00 MiB. GPU 

        During handling of the above exception, another exception occurred:

        Traceback (most recent call last):
        File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
            return get_command(self)(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
            return self.main(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
            return _main(
        File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
            rv = self.invoke(ctx)
        File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
            return _process_result(sub_ctx.command.invoke(sub_ctx))
        File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
            return ctx.invoke(self.callback, **ctx.params)
        File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
            return __callback(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
            return callback(**use_params)  # type: ignore
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
            server.serve(
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 258, in serve
            asyncio.run(
        File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
            return loop.run_until_complete(main)
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
            self.run_forever()
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
            self._run_once()
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
            handle._run()
        File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
            self._context.run(self._callback, *self._args)
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 702, in _handle_exceptions
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 689, in grpc._cython.cygrpc._handle_exceptions
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 821, in _handle_rpc
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 554, in _handle_unary_unary_rpc
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 408, in _finish_handler_with_unary_response
        File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
            return await self.intercept(
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 28, in intercept
            exit(1)
        File "/opt/conda/lib/python3.10/_sitebuiltins.py", line 26, in __call__
            raise SystemExit(code)
        SystemExit: 1'"""
    },
    "target": "text_generation_launcher",
    "span": {"rank": 0, "name": "shard-manager"},
    "spans": [{"rank": 0, "name": "shard-manager"}],
}

from text-generation-inference.

pranavthombare commented on June 22, 2024

I don't think its a model specific issue. I need to reproduce it with other models although this never used to happen pre TGI 2.0.

from text-generation-inference.

pranavthombare commented on June 22, 2024

Am able to reproduce it with mistral and llama models

from text-generation-inference.

pranavthombare commented on June 22, 2024

https://github.com/huggingface/text-generation-inference/pull/1736/files#diff-d92dc83f92b9c93839931357ef40af2ba48f62e5598a59e7478beebce4e5688eR26

I think this is the reason why.

from text-generation-inference.

TGI hard crashes after 1 OOM error about text-generation-inference HOT 4 OPEN

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent