Comments (29)
We'd hoped that the nvcomp team would improve the APIs in 2.1.0, but instead they dropped them.
from kvikio.
I don't know why the compressor is not reducing the file size of something like the kjv. I'm going to ask around to see who in the C++ group on nvcomp can comment.
Thanks @ajs-88, it sure looks like a bug in the nvcomp bindings.
Yup, #106
cc @thomcom
Hi all,
Any help or advice would be greatly appreciated.
Also, when I use `CascadedCompressor` in the same program, the result is different:

```python
compressor = nvc.CascadedCompressor(data_gpu.dtype)
```

```
File : /home/arul/Downloads/kjv10.txt Size: 4.4 MB
Compressed Size: 5.8 MB
```
What is the size of the uncompressed data?

```python
print('Uncompressed Size: ', humanize.naturalsize(data_gpu.nbytes))
```

Also, is `/home/arul/Downloads/kjv10.txt` available somewhere?
Hi @madsbk,
I used the same file as the notebook example,
https://github.com/rapidsai/kvikio/blob/branch-22.08/notebooks/nvcomp_vs_zarr_lz4.ipynb
http://textfiles.com/etext/NONFICTION/kjv10.txt
As mentioned above, when I use the `LZ4Compressor` the compressed size is 17.4 MB, and when I use the `CascadedCompressor` the compressed size is 5.8 MB. But the actual file size is 4.4 MB.
For `data_gpu.nbytes`:
- CascadedCompressor: 17.3 MB
- LZ4Compressor: 17.3 MB
Please let me know if my answer is not clear.
Thanks,
> 4.4MB

is the actual file size on disk. I am wondering what the size of the data (on device) is before compression?
> 4.4MB is the actual file size on disk. I am wondering what the size of the data (on device) is before compression?

```shell
[arul@arul-desktop-01 Downloads]$ md5sum kjv10.txt
582b1be7059586fb01ce75ff51e8e0a3  kjv10.txt
```
In the following, I set `DTYPE = cp.int8` to avoid up-scaling from 8 to 32 bits.

```python
import os
import cupy as cp
import numpy as np
import kvikio.nvcomp as nvc
import humanize

DTYPE = cp.int8
dtype = cp.dtype(DTYPE)
filename = 'kjv10.txt'
print('File :', filename, ' Size: ', humanize.naturalsize(os.path.getsize(filename)))

testfile = open(filename).read()
data = np.frombuffer(bytes(testfile, 'utf-8'), dtype=np.int8)
data_gpu = cp.array(data, dtype=DTYPE)
print('Uncompressed Size: ', humanize.naturalsize(data_gpu.nbytes))

compressor = nvc.LZ4Compressor(data_gpu.dtype)
compressed = compressor.compress(data_gpu)
with open('kjv10-gpu-compressed.txt.lz4', 'wb') as gpu_comp_file:
    gpu_comp_file.write(compressed.tobytes())
print('Compressed Size: ', humanize.naturalsize(compressed.size))

del compressor
del compressed
```

Which gives:

```
File : kjv10.txt Size: 4.4 MB
Uncompressed Size: 4.3 MB
Compressed Size: 4.4 MB
```

However, I don't know enough about the LZ4 compression algorithm to say why it fails to compress the data :/
@thomcom do you know?
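The ~17.3 MB device-side sizes reported earlier in the thread are consistent with this up-scaling: creating the array with a 32-bit dtype stores each input byte in four bytes. A quick sketch of the effect (NumPy rather than CuPy so it runs without a GPU; the exact byte count is illustrative):

```python
import numpy as np

# Roughly the in-memory size of kjv10.txt after newline translation
# (~4.3 MB as reported in the thread; the exact count is illustrative).
n = 4_325_000
data = np.zeros(n, dtype=np.int8)
assert data.nbytes == n             # int8: one byte per element

upscaled = data.astype(np.int32)    # up-cast to 32 bits
assert upscaled.nbytes == 4 * n     # four bytes per element

print(f'int8:  {data.nbytes / 1e6:.1f} MB')      # 4.3 MB
print(f'int32: {upscaled.nbytes / 1e6:.1f} MB')  # 17.3 MB
```

So the 17.4 MB "compressed" figure from the earlier runs was an up-scaled input, not just a compressor failure.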
@madsbk Added to that, with `DTYPE = cp.int8` as you advised:

LZ4Compressor returns:

```
Compressed File Size: 4.4 MB
Compressed data_gpu.nbytes Size: 4.3 MB
```

CascadedCompressor returns:

```
Compressed File Size: 5.8 MB
Compressed data_gpu.nbytes Size: 4.3 MB
```

On the other hand, I tried compressing the file with the Linux command-line lz4:

```shell
$ lz4 kjv10.txt > kjv10.txt.lz4
```

which produces a 2.2 MB file. MD5Sum:

```
b79b6c349ee0e9be23c33d3d618fbe0f  kjv10.txt.lz4
```
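The CLI result confirms the text itself is highly compressible, so a working compressor should shrink it well below 4.4 MB. A quick standard-library sanity check (zlib here as a stand-in, not LZ4, so the exact ratio differs; the repeated verse is a toy substitute for the King James text):

```python
import zlib

# English prose is highly redundant; a short repeated verse stands in
# for the King James text here.
text = 'In the beginning God created the heaven and the earth. ' * 2000
data = text.encode('utf-8')

compressed = zlib.compress(data, level=6)
ratio = len(data) / len(compressed)

print(f'original:   {len(data)} bytes')
print(f'compressed: {len(compressed)} bytes')
print(f'ratio:      {ratio:.1f}x')
assert len(compressed) < len(data)  # any working compressor should shrink this
```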
I even tried validating the file encoding, but the compressed file size is no different and the output is still corrupted. :(

```python
import os
import cupy as cp
import numpy as np
import kvikio.nvcomp as nvc
import humanize
import chardet

DTYPE = cp.int8
dtype = cp.dtype(DTYPE)
filename = '/home/arul/Downloads/kjv10.txt'

# Detect the file encoding before decoding.
detector = chardet.UniversalDetector()
for line in open(filename, 'rb'):
    detector.feed(line)
    if detector.done:
        break
detector.close()
print(detector.result)
file_encoding = detector.result['encoding']

print('File :', filename, ' Size: ', humanize.naturalsize(os.path.getsize(filename)))
testfile = open(filename).read()
data = np.frombuffer(bytes(testfile, file_encoding), dtype=np.int8)
data_gpu = cp.array(data, dtype=dtype)

compressor = nvc.LZ4Compressor(data_gpu.dtype)
# compressor = nvc.CascadedCompressor(data_gpu.dtype)
compressed = compressor.compress(data_gpu)
with open('/home/arul/Downloads/kjv10-gpu-compressed.txt.lz4', 'wb') as gpu_comp_file:
    gpu_comp_file.write(compressed.tobytes())
print('Compressed File Size: ', humanize.naturalsize(compressed.size))
print('Compressed data_gpu.nbytes Size: ', humanize.naturalsize(data_gpu.nbytes))

del compressor
del compressed
```
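One way to separate "larger than expected" from "corrupted" is a round-trip check: decompress the output and compare checksums against the input. A generic sketch using stdlib `hashlib` (demonstrated with zlib, since the nvcomp path itself is what's in question; `roundtrip_ok` is a hypothetical helper, not part of kvikio):

```python
import hashlib
import zlib

def roundtrip_ok(data: bytes, compress, decompress) -> bool:
    """Compress, decompress, and compare MD5 digests of input and output."""
    original_md5 = hashlib.md5(data).hexdigest()
    restored = decompress(compress(data))
    return hashlib.md5(restored).hexdigest() == original_md5

payload = b'For God so loved the world... ' * 1000
assert roundtrip_ok(payload, zlib.compress, zlib.decompress)
print('round-trip OK')
```

The same helper could be pointed at any compressor/decompressor pair; a broken binding would fail either in `decompress` or in the digest comparison.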
I tried the C++ example from https://github.com/NVIDIA/nvcomp. With this example, the results are:

```shell
$ ./benchmark_lz4_chunked -f /home/arul/Downloads/kjv10.txt
files: 1
uncompressed (B): 4432803
comp_size: 2591439, compressed ratio: 1.7106
compression throughput (GB/s): 0.4120
decompression throughput (GB/s): 1.0734
```
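The reported ratio checks out against the raw byte counts, and it puts the expected LZ4 output at roughly 2.6 MB, in line with the 2.2 MB from the CLI tool and far from the up-scaled sizes the Python bindings produced:

```python
# Byte counts taken from the benchmark output above.
uncompressed = 4_432_803
compressed = 2_591_439

ratio = uncompressed / compressed
print(f'ratio: {ratio:.4f}')                     # 1.7106
print(f'compressed: {compressed / 1e6:.1f} MB')  # 2.6 MB
assert abs(ratio - 1.7106) < 1e-3
```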
I'm looking into the issue now, based on the example you shared, @ajs-88.
I'm confused as to why our nvcomp bindings work at all, since the current version of nvcomp from github.com/NVIDIA/nvcomp doesn't have the same classes in `include/nvcomp/lz4.hpp` as the ones that kvikio expects. I also don't see how nvcomp is installed in the kvikio environment. @ajs-88, how did you install nvcomp and kvikio?
@thomcom First of all, thanks for looking into it. I downloaded the latest nvcomp library from https://developer.nvidia.com/nvcomp, then copied the headers and libraries into /usr/local/cuda (include and lib64, respectively).
For the https://github.com/NVIDIA/nvcomp examples, I followed their build instructions:
https://github.com/NVIDIA/nvcomp#building-cpu-and-gpu-examples-gpu-benchmarks-provided-on-github
For the kvikio installation, I followed https://github.com/rapidsai/kvikio#conda .
I also exported the paths in ~/.bashrc:

```shell
export CPATH=/usr/local/cuda/targets/x86_64-linux/include:$CPATH
export LD_LIBRARY_PATH=/usr/local/cuda/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
```
By chance, what is your CUDA version exactly?
I ask because I'm trying to figure out why the current nvcomp bindings work at all; they are out of date. I found the old versions in cuda-11.0, and for some reason my environment is able to find them (I don't know why yet). I'm wondering, specifically, whether you have an old version of nvcomp lurking in your include path, just like mine.
@thomcom Kindly find the details below.

```shell
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
```
@thomcom It's a fresh RHEL 8.6 installation. All I have is the latest nvcomp library.
Drive-by comment: https://github.com/rapidsai/kvikio/blob/branch-22.08/python/cmake/thirdparty/get_nvcomp.cmake should be updated to use `rapids_cpm_nvcomp` from rapids-cmake. That will keep kvikio from getting out of sync with the rest of RAPIDS and better ensure that it stays up to date. I believe there have been a lot of relevant bugfixes/improvements (for cudf) to nvcomp in the last couple of releases.
@vyasr could you please file that as a new issue?
The good news is that it looks like, at its core, the issue is that kvikio installs nvcomp version 2.1.0, which I think has a bug in the `LZ4Compressor` class and family. In version 2.3.0, `LZ4Compressor` has been dropped by nvcomp in favor of `LZ4Manager`. The less good news is that the bindings will have to be updated to version 2.3.0 to get rid of the issue.
@thomcom May I ask when I can expect a fix/release for this? Kindly advise.
It will take a week or two. I'm starting work on it today or tomorrow. :)
> It will take a week or two. I'm starting work on it today or tomorrow. :)

Many thanks @thomcom!
Updated and improved bindings are up for review at #120