
Comments (4)

pranavthombare commented on June 22, 2024

Below is the error I'm getting:

    "timestamp": "2024-05-27T12:04:51.372064Z",
    "level": "ERROR",
    "fields": {
        "message": """'Shard complete standard error output:

        The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
        /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
        warnings.warn(
        Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
        A new version of the following files was downloaded from https://huggingface.co/pranavthombare/Phi-3-mini-4k-construct:
        - configuration_phi3.py
        . Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
        /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class \'text_generation_server.utils.dist.FakeGroup\'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
        warnings.warn(
        Exception ignored in: <function Server.__del__ at 0x7c5ff5530550>
        Traceback (most recent call last):
        File "/opt/conda/lib/python3.10/site-packages/grpc/aio/_server.py", line 186, in __del__
            cygrpc.schedule_coro_threadsafe(
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 120, in grpc._cython.cygrpc.schedule_coro_threadsafe
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 112, in grpc._cython.cygrpc.schedule_coro_threadsafe
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 436, in create_task
            self._check_closed()
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
            raise RuntimeError(\'Event loop is closed\')
        RuntimeError: Event loop is closed
        sys:1: RuntimeWarning: coroutine \'AioServer.shutdown\' was never awaited
        Task exception was never retrieved
        future: <Task finished name=\'Task-2218\' coro=<<coroutine without __name__>()> exception=SystemExit(1)>
        Traceback (most recent call last):
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
            return await response
        File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
            raise error
        File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
            return await behavior(request_or_iterator, context)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 144, in Prefill
            generations, next_batch, timings = self.model.generate_token(batch)
        File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
            return func(*args, **kwds)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 960, in generate_token
            raise e
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 957, in generate_token
            out, speculative_logits = self.forward(batch)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 900, in forward
            return self.model.forward(
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 394, in forward
            hidden_states = self.model(
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
            return self._call_impl(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
            return forward_call(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 340, in forward
            hidden_states, residual = layer(
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
            return self._call_impl(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
            return forward_call(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 279, in forward
            mlp_output = self.mlp(normed_attn_res_output)
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
            return self._call_impl(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
            return forward_call(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 226, in forward
            return self.down_proj(self.act(gate_up_states[:, 0]) * gate_up_states[:, 1])
        torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU 

        During handling of the above exception, another exception occurred:

        Traceback (most recent call last):
        File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
            return get_command(self)(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
            return self.main(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
            return _main(
        File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
            rv = self.invoke(ctx)
        File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
            return _process_result(sub_ctx.command.invoke(sub_ctx))
        File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
            return ctx.invoke(self.callback, **ctx.params)
        File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
            return __callback(*args, **kwargs)
        File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
            return callback(**use_params)  # type: ignore
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
            server.serve(
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 258, in serve
            asyncio.run(
        File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
            return loop.run_until_complete(main)
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
            self.run_forever()
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
            self._run_once()
        File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
            handle._run()
        File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
            self._context.run(self._callback, *self._args)
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 702, in _handle_exceptions
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 689, in grpc._cython.cygrpc._handle_exceptions
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 821, in _handle_rpc
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 554, in _handle_unary_unary_rpc
        File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 408, in _finish_handler_with_unary_response
        File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
            return await self.intercept(
        File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 28, in intercept
            exit(1)
        File "/opt/conda/lib/python3.10/_sitebuiltins.py", line 26, in __call__
            raise SystemExit(code)
        SystemExit: 1'"""
    },
    "target": "text_generation_launcher",
    "span": {"rank": 0, "name": "shard-manager"},
    "spans": [{"rank": 0, "name": "shard-manager"}],
}

from text-generation-inference.

pranavthombare commented on June 22, 2024

I don't think it's a model-specific issue. I still need to reproduce it with other models, although this never used to happen pre-TGI 2.0.

pranavthombare commented on June 22, 2024

I am able to reproduce it with Mistral and Llama models.
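Not from the original thread, but since the shard is dying with `torch.cuda.OutOfMemoryError` during prefill, one common mitigation is to cap the batch and token budgets when launching. A minimal sketch, assuming the standard TGI Docker image and the `text-generation-launcher` flags as of TGI 2.x (model ID and limit values here are placeholders to tune for your GPU):

```shell
# Hypothetical example: constrain prefill memory usage via launcher flags.
docker run --gpus all --shm-size 1g -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:2.0 \
    --model-id mistralai/Mistral-7B-Instruct-v0.2 \
    --max-input-tokens 2048 \
    --max-total-tokens 4096 \
    --max-batch-prefill-tokens 2048 \
    --cuda-memory-fraction 0.9
```

Lowering `--max-batch-prefill-tokens` in particular bounds the largest prefill batch the shard will attempt, which is where the allocation above fails.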

pranavthombare commented on June 22, 2024

https://github.com/huggingface/text-generation-inference/pull/1736/files#diff-d92dc83f92b9c93839931357ef40af2ba48f62e5598a59e7478beebce4e5688eR26

I think this is the reason why.
