Comments (5)

stefanobranco commented on August 17, 2024

Sometimes this also seems to cause the server to hang indefinitely. I get a debug entry for `generate`, but nothing further happens:

DEBUG generate{parameters=GenerateParameters { best_of: None, temperature: Some(0.1), repetition_penalty: Some(1.2), frequency_penalty: None, top_k: None [...]
text_generation_router::server: router/src/server.rs:185: Input: [...]

edit:
From what I can tell, this is the final output before the server gets stuck:

2024-06-26T10:51:06.583336Z DEBUG next_batch{min_size=None max_size=None prefill_token_budget=96000 token_budget=177600}: text_generation_router::infer::v3::queue: router/src/infer/v3/queue.rs:318: Accepting entry
2024-06-26T10:51:06.583498Z DEBUG batch{batch_size=1}:prefill:prefill{id=36 size=1}:prefill{id=36 size=1}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-06-26T10:51:06.583502Z DEBUG batch{batch_size=1}:prefill:prefill{id=36 size=1}:prefill{id=36 size=1}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-06-26T10:51:06.583497Z DEBUG batch{batch_size=1}:prefill:prefill{id=36 size=1}:prefill{id=36 size=1}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-06-26T10:51:06.583513Z DEBUG batch{batch_size=1}:prefill:prefill{id=36 size=1}:prefill{id=36 size=1}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-06-26T10:51:06.583519Z DEBUG batch{batch_size=1}:prefill:prefill{id=36 size=1}:prefill{id=36 size=1}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-06-26T10:51:06.583531Z DEBUG batch{batch_size=1}:prefill:prefill{id=36 size=1}:prefill{id=36 size=1}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-06-26T10:51:06.583666Z DEBUG batch{batch_size=1}:prefill:prefill{id=36 size=1}:prefill{id=36 size=1}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-06-26T10:51:06.583798Z DEBUG batch{batch_size=1}:prefill:prefill{id=36 size=1}:prefill{id=36 size=1}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-06-26T10:51:06.584074Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(24717), flags: (0x4: END_HEADERS) }
2024-06-26T10:51:06.584080Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717) }
2024-06-26T10:51:06.584079Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(24717), flags: (0x4: END_HEADERS) }
2024-06-26T10:51:06.584087Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717) }
2024-06-26T10:51:06.584107Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(24717), flags: (0x4: END_HEADERS) }
2024-06-26T10:51:06.584111Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717), flags: (0x1: END_STREAM) }
2024-06-26T10:51:06.584120Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717) }
2024-06-26T10:51:06.584127Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717), flags: (0x1: END_STREAM) }
2024-06-26T10:51:06.584135Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(24717), flags: (0x4: END_HEADERS) }
2024-06-26T10:51:06.584140Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717) }
2024-06-26T10:51:06.584148Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(24717), flags: (0x4: END_HEADERS) }
2024-06-26T10:51:06.584162Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717) }
2024-06-26T10:51:06.584165Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717), flags: (0x1: END_STREAM) }
2024-06-26T10:51:06.584176Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717), flags: (0x1: END_STREAM) }
2024-06-26T10:51:06.584206Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717), flags: (0x1: END_STREAM) }
2024-06-26T10:51:06.584223Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(24717), flags: (0x4: END_HEADERS) }
2024-06-26T10:51:06.584231Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717) }
2024-06-26T10:51:06.584265Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717), flags: (0x1: END_STREAM) }
2024-06-26T10:51:06.584273Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(24717), flags: (0x4: END_HEADERS) }
2024-06-26T10:51:06.584277Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717) }
2024-06-26T10:51:06.584319Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717), flags: (0x1: END_STREAM) }
2024-06-26T10:51:06.584416Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(24717), flags: (0x4: END_HEADERS) }
2024-06-26T10:51:06.584429Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717) }
2024-06-26T10:51:06.584473Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(24717), flags: (0x1: END_STREAM) }
2024-06-26T10:51:06.654835Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [230, 203, 84, 34, 176, 210, 115, 2] }
2024-06-26T10:51:06.654847Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [230, 203, 84, 34, 176, 210, 115, 2] }
2024-06-26T10:51:08.983960Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [96, 75, 106, 55, 0, 178, 95, 167] }
2024-06-26T10:51:08.983972Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [96, 75, 106, 55, 0, 178, 95, 167] }
2024-06-26T10:51:09.583407Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [255, 94, 167, 52, 201, 71, 56, 69] }
2024-06-26T10:51:09.583418Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [255, 94, 167, 52, 201, 71, 56, 69] }
2024-06-26T10:51:10.209552Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [194, 95, 72, 29, 208, 65, 68, 93] }
2024-06-26T10:51:10.209563Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [194, 95, 72, 29, 208, 65, 68, 93] }
2024-06-26T10:51:10.464379Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [12, 206, 178, 92, 23, 251, 21, 144] }
2024-06-26T10:51:10.464390Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [12, 206, 178, 92, 23, 251, 21, 144] }
2024-06-26T10:51:10.641784Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [243, 91, 107, 187, 113, 48, 53, 194] }
2024-06-26T10:51:10.641795Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [243, 91, 107, 187, 113, 48, 53, 194] }
2024-06-26T10:51:10.903416Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [89, 55, 95, 85, 205, 74, 65, 44] }
2024-06-26T10:51:10.903427Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [89, 55, 95, 85, 205, 74, 65, 44] }
2024-06-26T10:51:11.489977Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [23, 239, 155, 130, 199, 243, 20, 8] }
2024-06-26T10:51:11.489988Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [23, 239, 155, 130, 199, 243, 20, 8] }

And then nothing, other than log entries showing that further requests come in, as described above.
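To pin down where a hang like this starts, it can help to find the last log line that is not an HTTP/2 Ping and measure how long only Pings follow it. The sketch below assumes the timestamp format shown in the log above (RFC 3339 with microseconds and a trailing `Z`). The function name `stall_report` is hypothetical, and treating Ping frames as keepalive traffic that says nothing about generation progress is an assumption.

```python
import re
from datetime import datetime

# Leading timestamp of a tracing line, e.g. "2024-06-26T10:51:06.583336Z DEBUG ..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6})Z\s")


def stall_report(lines):
    """Return (last_non_ping_line, idle_seconds), or None if no such line exists.

    idle_seconds is the gap between the last line that is not an HTTP/2 Ping
    and the last timestamped line overall. A large gap filled only with Pings
    suggests the connection is alive but no work is progressing (assumption).
    """
    last_ts = None
    progress_ts = None
    progress_line = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip untimestamped lines such as the Allocation dump
        ts = datetime.fromisoformat(m.group(1))
        last_ts = ts
        if "frame=Ping" not in line:
            progress_ts = ts
            progress_line = line
    if progress_ts is None:
        return None
    return progress_line, (last_ts - progress_ts).total_seconds()
```

Applied to the excerpt above, the last non-Ping entry is the `END_STREAM` Data frame at 10:51:06.584473, followed by roughly 4.9 seconds of Ping exchanges only. Passing an open log file works too, since the function only iterates over lines.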

edit 2:
Right before that I get what seems like a very large block allocation:
Allocation: BlockAllocation { blocks: [9100, [...], 177598], block_allocator: BlockAllocator { block_allocator: UnboundedSender { chan: Tx { inner: Chan { tx: Tx { block_tail: 0x7f77f0004800, tail_position: 73 }, semaphore: Semaphore(0), rx_waker: AtomicWaker, tx_count: 2, rx_fields: "..." } } } } }

Sorry if this is not relevant; I'm just trying to provide every bit of information that stands out to me.
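One way to reason about a "very large block allocation": if the router reserves KV-cache blocks for the prompt plus the full `max_new_tokens` up front (an assumption about v2.1.0, not confirmed anywhere in this thread), the number of blocks grows with `max_new_tokens` rather than with what the model actually generates. The hypothetical helpers below sketch that arithmetic; `block_size`, the example numbers, and the function names are illustrative only.

```python
def blocks_needed(input_tokens: int, max_new_tokens: int, block_size: int) -> int:
    """Blocks a paged KV cache would reserve for one request (hypothetical model).

    Assumes the prompt plus max_new_tokens is reserved up front and rounded up
    to whole blocks.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    tokens = input_tokens + max_new_tokens
    return (tokens + block_size - 1) // block_size  # integer ceiling division


def fits(free_blocks: int, input_tokens: int, max_new_tokens: int, block_size: int) -> bool:
    """True if the request's reservation fits in the currently free blocks."""
    return blocks_needed(input_tokens, max_new_tokens, block_size) <= free_blocks
```

Under these assumptions, with `block_size=1` a request whose prompt plus `max_new_tokens` approaches the `token_budget=177600` shown in the `next_batch` log line would claim a block range comparable to that budget on its own.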

from text-generation-inference.

erfanium commented on August 17, 2024

Same here, after upgrading from v2.0.1 to v2.1.0.

bwhartlove commented on August 17, 2024

I've seen a similar issue: in my case, multi-GPU support seemed non-functional after upgrading to v2.1.0. Once I disabled sharding, the issue went away.

RohanSohani30 commented on August 17, 2024

When I load the model using Docker on a single GPU, it takes 11250MB of GPU memory, and with 2 shards it takes approximately the same amount on each GPU, which doubles the total compared with a single shard.
Sharding is supposed to split my model across the two GPUs, with each holding approximately half of the original size.
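A back-of-the-envelope check on that expectation: tensor-parallel sharding should roughly halve the weight memory per GPU, but if the server also reserves a KV cache out of whatever memory is left after loading (an assumption about TGI's warmup behavior, not something stated in this thread), the total each GPU reports can stay roughly the same. The sketch below uses hypothetical numbers (a 7B-parameter fp16 model on 24 GB GPUs, with 90% of free memory reserved for the cache); the helper names are illustrative.

```python
def weight_bytes_per_gpu(n_params: int, bytes_per_param: int, num_shards: int) -> int:
    """Approximate weight memory on each GPU under tensor-parallel sharding.

    Assumes the weights split evenly across shards and ignores replicated
    tensors, activations, and CUDA context overhead.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return n_params * bytes_per_param // num_shards


def reported_bytes_per_gpu(weight_bytes: int, gpu_bytes: int, kv_percent_of_free: int) -> int:
    """Weights plus a KV-cache reservation taken from the memory left over.

    Models the assumption that the server pre-allocates kv_percent_of_free
    percent of the remaining memory for its KV cache after loading weights.
    """
    free = gpu_bytes - weight_bytes
    if free < 0:
        raise ValueError("weights do not fit on this GPU")
    return weight_bytes + free * kv_percent_of_free // 100
```

With these hypothetical inputs, per-GPU weights drop from 14 GB to 7 GB when going from 1 to 2 shards, yet the reported usage per GPU barely moves (23 GB versus 22.3 GB) because the cache reservation grows into the freed space. Under that assumption, similar per-GPU readings alone would not show that sharding failed.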

Sharding works perfectly using the TGI CLI, but inference is slower with the CLI. This may be because exllama, vllm, and other libraries are not installed.
Do you happen to have any idea what could cause this?

github-actions commented on August 17, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
