

bitsandbytes's People

Contributors

lsz-fb, sirrob1997, timdettmers


bitsandbytes's Issues

import bitsandbytes as bnb error

import bitsandbytes as bnb
produces the following:
OSError: /home/anaconda3/envs/ner/lib/python3.6/site-packages/bitsandbytes/libbitsandbytes.so: undefined symbol: __fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37

Hello, how can I resolve this?

bnb.optim.AdamW

Hey @TimDettmers,

Awesome library! bnb.optim.Adam saved me from having to use model parallelism 😍

Do you think it would be easy to also add a bnb.optim.AdamW version for https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW ?

Happy to give it a try if you think it's easily feasible :-)
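
A sketch of the drop-in usage this would enable, assuming the requested bnb.optim.AdamW mirrors torch.optim.AdamW's constructor (the bnb.optim.AdamW name is the hypothetical class being asked for here):

import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()

# Today: optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
# Requested 8-bit-state counterpart, assumed to take the same arguments:
optimizer = bnb.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

loss = model(torch.randn(16, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()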

Optimizer2State: Unsafe use of eval() in __init__

The Optimizer2State class accepts a string for the optional betas parameter during initialization. The string is passed to eval() without prior validation, potentially leading to execution of arbitrary code.

class Optimizer2State(Optimizer8bit):
    def __init__(self, optimizer_name, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0.0, optim_bits=32, args=None,
                 min_8bit_size=4096, percentile_clipping=100, block_wise=True, max_unorm=0.0,
                 skip_zeros=False):
        if not 0.0 <= lr:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if not 0.0 <= eps:
            raise ValueError("Invalid epsilon value: {}".format(eps))
        if isinstance(betas, str):
            betas = eval(betas)
            print(betas, 'parsed')

bnb.optim.Adam, bnb.optim.Adam8bit and bnb.optim.Adam32bit exhibit the same behaviour.

#!/usr/bin/env python3

hello = "exec(\"import os;os.system('/usr/bin/id');\")"

try:
    from bitsandbytes.optim.optimizer import Optimizer2State
    Optimizer2State('test', 'test', betas=hello)
except:
    pass

try:
    import bitsandbytes as bnb
    bnb.optim.Adam('test', betas=hello)
except:
    pass

try:
    import bitsandbytes as bnb
    bnb.optim.Adam8bit('test', betas=hello)
except:
    pass

try:
    import bitsandbytes as bnb
    bnb.optim.Adam32bit('test', betas=hello)
except:
    pass
$ id
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
$ ./test.py 
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
None parsed
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
None parsed
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
None parsed
uid=1000(asdf) gid=1000(asdf) groups=1000(asdf)
None parsed
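
A minimal sketch of a safer way to accept the same string input, using ast.literal_eval (which only evaluates Python literals) in place of eval; this is a suggested fix, not the library's current code:

import ast

def parse_betas(betas):
    """Accept either a (beta1, beta2) tuple or its string form, without eval()."""
    if isinstance(betas, str):
        try:
            betas = ast.literal_eval(betas)  # literals only; raises on arbitrary code
        except (ValueError, SyntaxError) as exc:
            raise ValueError(f"Invalid betas string: {betas!r}") from exc
    if (not isinstance(betas, (tuple, list)) or len(betas) != 2
            or not all(isinstance(b, (int, float)) for b in betas)):
        raise ValueError(f"betas must be two numbers, got {betas!r}")
    return tuple(betas)

# parse_betas("(0.9, 0.999)") -> (0.9, 0.999)
# parse_betas("exec(\"import os;os.system('/usr/bin/id')\")") -> raises ValueError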

Quantization functions test fail on Pascal

The following tests fail on Pascal:

tests/test_functional.py::test_estimate_quantiles[float] FAILED
tests/test_functional.py::test_estimate_quantiles[half] FAILED
tests/test_functional.py::test_quantile_quantization FAILED

My guess is that this is due to atomicAdd for floats behaving differently on Pascal.

Is AdamW8bit compatible with OSS in fairscale?

Thank you for the nice project.

When I use the AdamW8bit optimizer on its own, it saves GPU memory.
However, when I combine the optimizer with OSS in fairscale,
GPU memory is not reduced.

Is this library not compatible with OSS in fairscale, or is something else the issue?
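
For concreteness, a sketch of the combination in question, assuming fairscale's documented OSS(params, optim=..., **defaults) interface; note that OSS shards optimizer state across data-parallel ranks, so a single-rank run would not see any reduction from sharding:

import torch
import torch.distributed as dist
import bitsandbytes as bnb
from fairscale.optim.oss import OSS  # assumes fairscale is installed

# OSS must run inside an initialized torch.distributed process group;
# it shards the wrapped optimizer's state across the data-parallel ranks.
assert dist.is_initialized()

model = torch.nn.Linear(4096, 4096).cuda()

optimizer = OSS(
    params=model.parameters(),
    optim=bnb.optim.AdamW8bit,  # optimizer class; OSS instantiates it per shard
    lr=1e-3,
    weight_decay=0.01,
)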

Feature request: Please, add implementation for Novograd algorithm

8-bit optimizer crashes when fine-tuning gpt2-large

Using the bnb.optim.Adam8bit optimizer in place of torch.optim.Adam causes a crash after a handful of batches:

12it [00:22, 1.82s/it]Error an illegal memory access was encountered at line 198 in file /home/alyssa/gpt_math/bitsandbytes/csrc/ops.cu

I am fine-tuning Hugging Face's gpt2-large model on an Ampere RTX 3090 GPU with CUDA 11.6 and NVIDIA driver 510.73.05. I have tried compiling bitsandbytes from source on my machine, as well as the set_optim_to_run_embedding_in_fp32 trick from huggingface/transformers#14819; neither affected the behavior. Running with the standard PyTorch Adam optimizer works fine. nvidia-smi shows 16 GB of memory used on a GPU with 24 GB, so it should not be running out of memory or anywhere close to it.
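
For reference, a sketch of the bitsandbytes-side variant of that trick, keeping the embedding's optimizer state in 32 bits via GlobalOptimManager (this mirrors the override API used by the transformers integration; since the trick itself did not change the behavior above, it is shown only for completeness):

import bitsandbytes as bnb
from transformers import GPT2LMHeadModel  # the model family from the report

model = GPT2LMHeadModel.from_pretrained("gpt2-large").cuda()

# Keep optimizer state for the (large, sparsely updated) token embedding in 32 bits,
# while all other parameters use the 8-bit state.
manager = bnb.optim.GlobalOptimManager.get_instance()
manager.register_module_override(model.transformer.wte, "weight", {"optim_bits": 32})

optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5)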

'NoneType' object has no attribute 'cdequantize_blockwise_cpu_fp32'

I am trying to train GPT-J with 8-bit weights. It works well on GPU, but when I try to use it on CPU, it gives this error:

'NoneType' object has no attribute 'cdequantize_blockwise_cpu_fp32'

I am using dequantize_blockwise from bitsandbytes.functional. The following is the class in which it is used:

import torch
import torch.nn.functional as F
from bitsandbytes.functional import dequantize_blockwise

class DequantizeAndLinear(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input: torch.Tensor, weights_quantized: torch.ByteTensor,
                absmax: torch.FloatTensor, code: torch.FloatTensor, bias: torch.FloatTensor):
        weights_deq = dequantize_blockwise(weights_quantized, absmax=absmax, code=code)
        ctx.save_for_backward(input, weights_quantized, absmax, code)
        ctx._has_bias = bias is not None
        return F.linear(input, weights_deq, bias)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        assert not ctx.needs_input_grad[1] and not ctx.needs_input_grad[2] and not ctx.needs_input_grad[3]
        input, weights_quantized, absmax, code = ctx.saved_tensors
        # grad_output: [*batch, out_features]
        weights_deq = dequantize_blockwise(weights_quantized, absmax=absmax, code=code)
        grad_input = grad_output @ weights_deq
        grad_bias = grad_output.flatten(0, -2).sum(dim=0) if ctx._has_bias else None
        return grad_input, None, None, None, grad_bias

Is it possible to run this on the CPU, or do I have to run it only on the GPU?
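
A sketch of the work-around I would expect to be needed, assuming the CPU kernels simply are not available in this build: guard on where the tensors live and run the blockwise dequantization on the GPU (dequantize_blockwise is used exactly as in the class above):

import torch
from bitsandbytes.functional import dequantize_blockwise

def dequantize_weights(weights_quantized, absmax, code):
    """Blockwise-dequantize on the GPU when possible; the CPU kernel may be unavailable."""
    if weights_quantized.is_cuda:
        return dequantize_blockwise(weights_quantized, absmax=absmax, code=code)
    if torch.cuda.is_available():
        # Round-trip through the GPU, then bring the fp32 result back to the CPU.
        deq = dequantize_blockwise(
            weights_quantized.cuda(), absmax=absmax.cuda(), code=code.cuda()
        )
        return deq.cpu()
    raise RuntimeError("bitsandbytes CPU dequantization kernel not available in this build")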

Does it work on Windows?

I get OSError: [WinError 193] %1 is not a valid Win32 application from lib = ct.cdll.LoadLibrary(os.path.dirname(__file__) + '/libbitsandbytes.so') in functional.py. What am I doing wrong?

Support for Tesla Architecture

First of all, great work!

Secondly, I can see that you specify that Maxwell Architecture is necessary, and I am wondering if

  1. it's possible to do 8-bit optimization on Tesla Architecture
  2. there are plans to implement it

I ask because Kaggle and Colab notebooks use Tesla-series GPUs (P100, K80), and I'm sure those communities, myself included, would be interested in using bitsandbytes.

The code uses more GPU memory with Multi-scale Vision Transformers

Hi,

Thanks for the great work! I'm currently trying to apply your code to vision transformers, specifically, on this code base:
https://github.com/facebookresearch/SlowFast/tree/main/projects/mvit
When using torch.optim.SGD(momentum=0.9), the code consumes 9221 MiB of GPU memory during training. After changing it to bnb.optim.SGD8bit() with the same arguments, it consumes slightly more GPU memory: 9235 MiB. Do you have any idea why this would happen? Thank you! My CUDA version is 10.2 and my torch version is 1.9.1.

Best,
Junwei
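
One way to localize the difference is to measure the optimizer state directly rather than reading nvidia-smi, which also counts the CUDA context and PyTorch's caching allocator; note that SGD with momentum keeps only one state buffer per parameter, so the 8-bit variant can only shrink that buffer. A generic sketch (not specific to the SlowFast code base):

import torch

def optimizer_state_bytes(optimizer: torch.optim.Optimizer) -> int:
    """Sum the sizes of all tensors held in the optimizer's per-parameter state."""
    total = 0
    for per_param_state in optimizer.state.values():
        for value in per_param_state.values():
            if torch.is_tensor(value):
                total += value.numel() * value.element_size()
    return total

# Call after at least one optimizer.step(), since state is created lazily:
# print(f"optimizer state: {optimizer_state_bytes(optimizer) / 2**20:.1f} MiB")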

undefined symbol: __fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37

(torch1.8-py3.8) jiaofangkai@dell-PowerEdge-T640:/home/share/jiaofangkai$ python check_bnb_install.py
Traceback (most recent call last):
  File "check_bnb_install.py", line 1, in <module>
    import bitsandbytes as bnb
  File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 5, in <module>
    from .optim import adam
  File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/optim/__init__.py", line 5, in <module>
    from .adam import Adam, Adam8bit, Adam32bit
  File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/optim/adam.py", line 6, in <module>
    from bitsandbytes.optim.optimizer import Optimizer2State
  File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/optim/optimizer.py", line 6, in <module>
    import bitsandbytes.functional as F
  File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/functional.py", line 13, in <module>
    lib = ct.cdll.LoadLibrary(os.path.dirname(__file__) + '/libbitsandbytes.so')
  File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/ctypes/__init__.py", line 459, in LoadLibrary
    return self._dlltype(name)
  File "/home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/ctypes/__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/share/jiaofangkai/anaconda3/envs/torch1.8-py3.8/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes.so: undefined symbol: __fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37

Hi, I have encountered a problem similar to #5. I have tested with a Tesla T4 and an RTX 2080 Ti, but both failed.

The environments are as follows:

# TeslaT4
Ubuntu 18.04.6, Tesla T4, cuda-10.1, driver version: 418.197.02, python=3.8, torch=1.8.1+cu101

# RTX 2080Ti
Ubuntu 20.04.3, RTX 2080Ti, cuda-10.1, driver version: 435.21, python=3.8, torch=1.8.1+cu101

python setup.py install error

(bitsandbytes) chenxin@chenxin-Nitro-AN515-52:~/disk1/github/bitsandbytes$ python setup.py install
Traceback (most recent call last):
  File "setup.py", line 15, in <module>
    name = f"bitsandbytes-cuda{os.environ['CUDA_VERSION']}",
  File "/home/chenxin/disk1/anaconda3/envs/bitsandbytes/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'CUDA_VERSION'
(bitsandbytes) chenxin@chenxin-Nitro-AN515-52:~/disk1/github/bitsandbytes$ conda list | grep cudatoolkit
cudatoolkit 11.1.1 h6406543_8 conda-forge

Errors when training reaches the third epoch, every time

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=29 error=1 : invalid argument
Traceback (most recent call last):
  File "train_pointunet.py", line 211, in <module>
    loss_seg = lossfunc_seg(outputs_seg, labels)+lossfunc_dice(outputs_seg,labels)
  File "/home/why/miniconda3/envs/3.6.8/lib/python3.6/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/why/miniconda3/envs/3.6.8/lib/python3.6/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (1) : invalid argument at /pytorch/aten/src/THC/generic/THCTensorMath.cu:29

I'm very confused because it works fine for the first several epochs.

undefined symbol: __fatbinwrap_38

With some CUDA versions and on some architectures this error occurs:

Traceback (most recent call last):
  File "check_bnb_install.py", line 1, in <module>
    import bitsandbytes as bnb
  File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/__init__.py", line 5, in <module>
    from .optim import adam
  File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/optim/__init__.py", line 5, in <module>
    from .adam import Adam, Adam8bit, Adam32bit
  File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/optim/adam.py", line 5, in <module>
    from bitsandbytes.optim.optimizer import Optimizer2State
  File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/optim/optimizer.py", line 6, in <module>
    import bitsandbytes.functional as F
  File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/functional.py", line 13, in <module>
    lib = ct.cdll.LoadLibrary(os.path.dirname(__file__) + '/libbitsandbytes.so')
  File "/miniconda/envs/pytorch_env/lib/python3.7/ctypes/__init__.py", line 442, in LoadLibrary
    return self._dlltype(name)
  File "/miniconda/envs/pytorch_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /miniconda/envs/pytorch_env/lib/python3.7/site-packages/bitsandbytes/libbitsandbytes.so: undefined symbol: __fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37

Confirmed for CUDA 10.1 for compute capability 7.5 (V100).
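
For anyone debugging this, a slightly fuller check than a bare import: one 8-bit Adam step forces the CUDA kernels in libbitsandbytes.so to actually be called (a sketch; the original check_bnb_install.py is not reproduced above beyond its first line):

import torch
import bitsandbytes as bnb

# A single large parameter and one Adam8bit step exercise both the library load
# and an 8-bit optimizer kernel on the GPU.
p = torch.nn.Parameter(torch.rand(4096, 4096, device="cuda"))
optimizer = bnb.optim.Adam8bit([p], lr=1e-3)

(p ** 2).sum().backward()
optimizer.step()
print("bitsandbytes 8-bit Adam step succeeded")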

Llama?

New Model out. Any chance it'll be supported by you guys?

bfloat16 grads are not supported

Are there any plans to support models/gradients in bfloat16? bfloat16 has gained quite a bit of popularity lately, since every Ampere GPU supports the type and it eliminates the need for loss scaling compared to float16.
This is what I get when I try to initialize bnb.AdamW with a model cast to bfloat16:
ValueError: Gradient+optimizer bit data type combination not supported: grad torch.bfloat16, optimizer torch.uint8
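
A minimal sketch of the failing combination (module sizes are made up; exactly where the ValueError surfaces may vary, but it appears once bfloat16 gradients meet the uint8 optimizer state):

import torch
import bitsandbytes as bnb

# Model, and therefore its gradients, in bfloat16; optimizer state in 8 bits.
model = torch.nn.Linear(4096, 4096).to(device="cuda", dtype=torch.bfloat16)
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-3)

out = model(torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16))
out.sum().backward()
optimizer.step()  # ValueError: Gradient+optimizer bit data type combination not supported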

Did you ever try MNMT systems?

As reported in the paper, when training a bidirectional transformer model on WMT14 or WMT16, the performance of 8-bit Adam stays relatively consistent with the 32-bit counterpart. I was also able to verify this on other data sources for training bidirectional models with my own setup.

However, I've also tried multiple variations of the 8-bit optimizers on multilingual neural machine translation (MNMT) models in fairseq, and there it seems that even with --no-scale-embedding as well as the StableEmbedding, performance is roughly 3 BLEU behind the counterparts. The --no-scale-embedding flag accounts for roughly a 7 BLEU gain, while the Xavier init accounts for roughly a 0.4 BLEU gain. I haven't looked into the effect of the layer norm in the stable embeddings yet.

Did you do any testing on that, and do you have practical tips for getting the performance up?
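
For context, the StableEmbedding swap referred to above looks roughly like this (a sketch with illustrative sizes; the surrounding fairseq model code is omitted):

import torch.nn as nn
import bitsandbytes as bnb

vocab_size, embed_dim, padding_idx = 32000, 1024, 1  # illustrative sizes

# Standard fairseq-style token embedding:
baseline_embed = nn.Embedding(vocab_size, embed_dim, padding_idx=padding_idx)

# StableEmbedding: Xavier-uniform init plus a layer norm, intended for 8-bit optimizers;
# same constructor arguments, so it is a drop-in replacement.
stable_embed = bnb.nn.StableEmbedding(vocab_size, embed_dim, padding_idx=padding_idx)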

no difference in memory usage

Hi.
I am training my network with bnb.optim.Adam8bit vs. torch.optim.Adam, but I don't see any difference in memory consumption.

Running on a GTX 2080 Ti (single GPU or DDP),
with cudatoolkit 11.1.74 and
bitsandbytes-cuda111.

Looking at nvidia-smi, I see 9.6 GB in both cases.
Am I missing something here?
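
Two things worth checking, sketched below with assumed layer sizes: bitsandbytes only keeps 8-bit state for tensors at or above the min_8bit_size threshold (4096 elements by default), and nvidia-smi includes the CUDA context plus memory cached by PyTorch's allocator, so torch.cuda.max_memory_allocated gives a cleaner comparison:

import torch
import bitsandbytes as bnb

def peak_mib_for(optimizer_cls):
    """Peak allocated memory (MiB) over one forward/backward/step with the given optimizer."""
    model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)]).cuda()
    optimizer = optimizer_cls(model.parameters(), lr=1e-3)
    torch.cuda.reset_peak_memory_stats()
    model(torch.randn(32, 4096, device="cuda")).sum().backward()
    optimizer.step()  # optimizer state is allocated here
    torch.cuda.synchronize()
    peak = torch.cuda.max_memory_allocated() / 2**20
    del model, optimizer
    torch.cuda.empty_cache()
    return peak

print("torch.optim.Adam   :", peak_mib_for(torch.optim.Adam))
print("bnb.optim.Adam8bit :", peak_mib_for(bnb.optim.Adam8bit))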
