
Comments (3)

tomaarsen commented on May 18, 2024

You may see improved speed if you use SpanMarkerModel.from_pretrained(..., torch_dtype=torch.float16) or torch.bfloat16. For example:

import time
import torch
from span_marker import SpanMarkerModel

# Load in bfloat16 on GPU; uncomment the second line for the float32 baseline.
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super", torch_dtype=torch.bfloat16, device_map="cuda")
# model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super", device_map="cuda")

text = [
    "Leonardo da Vinci recently published a scientific paper on combatting Mitocromulent disease. Leonardo da Vinci painted the most famous painting in existence: the Mona Lisa.",
    "Leonardo da Vinci scored a critical goal towards the end of the second half. Leonardo da Vinci controversially vetoed a bill regarding public health care last Friday. Leonardo da Vinci was promoted to Sergeant after his outstanding work in the war."
]
BS = 64
N = 500

# Warmup pass, so one-time CUDA initialization does not skew the timing.
model.predict(text * 50, batch_size=BS)

# Timed pass: N * 2 sentences in batches of BS.
start_t = time.time()
model.predict(text * N, batch_size=BS)
print(f"{time.time() - start_t:8f}s for {N * 2} samples with batch_size={BS} and torch_dtype={model.dtype}.")

This gave me:

20.745640s for 1000 samples with batch_size=64 and torch_dtype=torch.float16.
16.534876s for 1000 samples with batch_size=64 and torch_dtype=torch.bfloat16.

and, for comparison, the float32 baseline:

39.655506s for 1000 samples with batch_size=64 and torch_dtype=torch.float32.

Note that float16 is not available on CPU, though! I'm not sure about bfloat16.
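
A quick way to probe which dtypes your CPU build of PyTorch handles (a minimal sketch; the matmul is just a stand-in for real model ops, and support varies by PyTorch version):

import torch

# Rough probe: attempt a matmul in each dtype on CPU.
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    try:
        a = torch.randn(8, 8, dtype=dtype)
        _ = a @ a
        print(f"{dtype}: matmul works on CPU")
    except RuntimeError as err:
        print(f"{dtype}: not supported on CPU ({err})")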

If you have a Linux (or possibly Mac) device, then you can also use load_in_8bit=True or load_in_4bit=True after installing bitsandbytes, but I don't know whether that improves inference speed, and it is also CUDA-only. A minimal sketch of what that could look like follows.
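
This sketch assumes SpanMarkerModel forwards the quantization kwargs to transformers; it requires bitsandbytes plus accelerate and a CUDA GPU, and quantization mainly reduces memory rather than latency:

from span_marker import SpanMarkerModel

# Hedged sketch: load_in_8bit / load_in_4bit come from bitsandbytes and are CUDA-only.
model_8bit = SpanMarkerModel.from_pretrained(
    "tomaarsen/span-marker-roberta-large-fewnerd-fine-super",
    load_in_8bit=True,  # or load_in_4bit=True
    device_map="cuda",
)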

Beyond that, the steps to increase inference speed become pretty challenging. Hope this helps a bit.

Also, you can process about 8 sentences per second on CPU and about 110 sentences per second on GPU; is that not sufficiently fast yet?

  • Tom Aarsen


ganga7445 commented on May 18, 2024

Thank you, @tomaarsen.
Using torch.float16 worked for me. It would be excellent if the operation could be completed in less than one second with a batch size of 256.

Batch Size | Average Inference Time (s) | New Inference Time (s)
16         | 0.14945                    | 0.09211015701
32         | 0.28                       | 0.1645913124
64         | 0.51582                    | 0.2973537445
128        | 1.10669                    | 0.6381671429
256        | 2.24729                    | 1.238643169
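
For context, the throughput implied by the new-time column (a small sketch, assuming the times above are in seconds):

# Sentences per second implied by the "new inference time" column above.
times = {16: 0.09211015701, 32: 0.1645913124, 64: 0.2973537445,
         128: 0.6381671429, 256: 1.238643169}
for bs, t in times.items():
    print(f"batch_size={bs}: {bs / t:.1f} sentences/s")  # 256 -> ~206.7/s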


tomaarsen commented on May 18, 2024

@polodealvarado started working on ONNX support here: #26 (comment)
If we can make it work, perhaps we can improve the speed even further. Until then, it will be hard to get even faster results. Less than a second for a batch size of 256 means more than 256 sentences per second, which is already quite efficient.
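
For reference, a very rough sketch of what exporting just the underlying transformers encoder might look like (the model.encoder attribute, input shapes, and output names here are assumptions; the actual ONNX work in #26 also has to cover SpanMarker's marker preprocessing, which this skips):

import torch

# Assumption: the wrapped transformers encoder is reachable as `model.encoder`.
encoder = model.encoder.eval().cpu().float()
dummy = {
    "input_ids": torch.ones(1, 16, dtype=torch.long),
    "attention_mask": torch.ones(1, 16, dtype=torch.long),
}
torch.onnx.export(
    encoder,
    (dummy,),  # a trailing dict is passed as keyword arguments
    "span_marker_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
)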

  • Tom Aarsen

