Comments (29)
We'd hoped that the nvcomp team would improve the APIs in 2.1.0, but instead they dropped them.
from kvikio.
I don't know why the compressor is not reducing the file size of something like the kjv. I'm going to ask around to see who in the C++ group on nvcomp can comment.
Thanks @ajs-88, it sure looks like a bug in the nvcomp bindings.
Yup, #106
cc @thomcom
Hi all,
Any help or advice would be greatly appreciated.
Also, when I use `CascadedCompressor` in the same program, the result is different:

```python
compressor = nvc.CascadedCompressor(data_gpu.dtype)
```

```
File : /home/arul/Downloads/kjv10.txt Size: 4.4 MB
Compressed Size: 5.8 MB
```
What is the size of the uncompressed data?

```python
print('Uncompressed Size: ', humanize.naturalsize(data_gpu.nbytes))
```

Also, is `/home/arul/Downloads/kjv10.txt` available somewhere?
Hi @madsbk,
I used the same file as the notebook example,
https://github.com/rapidsai/kvikio/blob/branch-22.08/notebooks/nvcomp_vs_zarr_lz4.ipynb
http://textfiles.com/etext/NONFICTION/kjv10.txt
As mentioned above, when I use the `LZ4Compressor` the compressed size is 17.4 MB, and when I use the `CascadedCompressor` the compressed size is 5.8 MB. But the actual file size is 4.4 MB.
For `data_gpu.nbytes`:
- CascadedCompressor: 17.3 MB
- LZ4Compressor: 17.3 MB
Please let me know if my answer is not clear.
Thanks,
> 4.4MB

is the actual file size on disk. I am wondering what the size of the data (on device) is before compression?
> 4.4MB is the actual file size on disk. I am wondering what the size of the data (on device) is before compression?

```shell
[arul@arul-desktop-01 Downloads]$ md5sum kjv10.txt
582b1be7059586fb01ce75ff51e8e0a3  kjv10.txt
```
In the following, I set `DTYPE = cp.int8` to avoid up-scaling from 8 to 32 bits.

```python
import os
import cupy as cp
import numpy as np
import kvikio.nvcomp as nvc
import humanize

DTYPE = cp.int8
dtype = cp.dtype(DTYPE)
filename = 'kjv10.txt'
print('File :', filename, ' Size: ', humanize.naturalsize(os.path.getsize(filename)))

testfile = open(filename).read()
data = np.frombuffer(bytes(testfile, 'utf-8'), dtype=np.int8)
data_gpu = cp.array(data, dtype=DTYPE)
print('Uncompressed Size: ', humanize.naturalsize(data_gpu.nbytes))

compressor = nvc.LZ4Compressor(data_gpu.dtype)
compressed = compressor.compress(data_gpu)
with open('kjv10-gpu-compressed.txt.lz4', 'wb') as gpu_comp_file:
    gpu_comp_file.write(compressed.tobytes())
print('Compressed Size: ', humanize.naturalsize(compressed.size))

del compressor
del compressed
```

Which gives:

```
File : kjv10.txt Size: 4.4 MB
Uncompressed Size: 4.3 MB
Compressed Size: 4.4 MB
```

However, I don't know enough about the LZ4 compression algorithm to say why it fails to compress the data :/
@thomcom do you know?
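The ~17.3 MB device-side sizes reported earlier in the thread are consistent with this up-scaling: creating the array with a 32-bit dtype stores each input byte in four bytes. A quick sketch of the effect (NumPy rather than CuPy so it runs without a GPU; the exact byte count is illustrative):

```python
import numpy as np

# Roughly the in-memory size of kjv10.txt after newline translation
# (~4.3 MB as reported in the thread; the exact count is illustrative).
n = 4_325_000
data = np.zeros(n, dtype=np.int8)
assert data.nbytes == n             # int8: one byte per element

upscaled = data.astype(np.int32)    # up-cast to 32 bits
assert upscaled.nbytes == 4 * n     # four bytes per element

print(f'int8:  {data.nbytes / 1e6:.1f} MB')      # 4.3 MB
print(f'int32: {upscaled.nbytes / 1e6:.1f} MB')  # 17.3 MB
```

So the 17.4 MB "compressed" figure from the earlier runs was an up-scaled input, not just a compressor failure.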
@madsbk Added to that, with `DTYPE = cp.int8` as you advised:

LZ4Compressor returns:

```
Compressed File Size: 4.4 MB
Compressed data_gpu.nbytes Size: 4.3 MB
```

CascadedCompressor returns:

```
Compressed File Size: 5.8 MB
Compressed data_gpu.nbytes Size: 4.3 MB
```

On the other hand, I tried compressing the file with the Linux command-line lz4:

```shell
$ lz4 kjv10.txt > kjv10.txt.lz4
```

which produces a 2.2 MB file. MD5Sum:

```
b79b6c349ee0e9be23c33d3d618fbe0f  kjv10.txt.lz4
```
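The CLI result confirms the text itself is highly compressible, so a working compressor should shrink it well below 4.4 MB. A quick standard-library sanity check (zlib here as a stand-in, not LZ4, so the exact ratio differs; the repeated verse is a toy substitute for the King James text):

```python
import zlib

# English prose is highly redundant; a short repeated verse stands in
# for the King James text here.
text = 'In the beginning God created the heaven and the earth. ' * 2000
data = text.encode('utf-8')

compressed = zlib.compress(data, level=6)
ratio = len(data) / len(compressed)

print(f'original:   {len(data)} bytes')
print(f'compressed: {len(compressed)} bytes')
print(f'ratio:      {ratio:.1f}x')
assert len(compressed) < len(data)  # any working compressor should shrink this
```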
I even tried validating the file encoding, but the compressed file size is no different and the output is still corrupted. :(

```python
import os
import cupy as cp
import numpy as np
import kvikio.nvcomp as nvc
import humanize
import chardet

DTYPE = cp.int8
dtype = cp.dtype(DTYPE)
filename = '/home/arul/Downloads/kjv10.txt'

# Detect the file encoding before decoding.
detector = chardet.UniversalDetector()
for line in open(filename, 'rb'):
    detector.feed(line)
    if detector.done:
        break
detector.close()
print(detector.result)
file_encoding = detector.result['encoding']

print('File :', filename, ' Size: ', humanize.naturalsize(os.path.getsize(filename)))
testfile = open(filename).read()
data = np.frombuffer(bytes(testfile, file_encoding), dtype=np.int8)
data_gpu = cp.array(data, dtype=dtype)

compressor = nvc.LZ4Compressor(data_gpu.dtype)
# compressor = nvc.CascadedCompressor(data_gpu.dtype)
compressed = compressor.compress(data_gpu)
with open('/home/arul/Downloads/kjv10-gpu-compressed.txt.lz4', 'wb') as gpu_comp_file:
    gpu_comp_file.write(compressed.tobytes())
print('Compressed File Size: ', humanize.naturalsize(compressed.size))
print('Compressed data_gpu.nbytes Size: ', humanize.naturalsize(data_gpu.nbytes))

del compressor
del compressed
```
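One way to separate "larger than expected" from "corrupted" is a round-trip check: decompress the output and compare checksums against the input. A generic sketch using stdlib `hashlib` (demonstrated with zlib, since the nvcomp path itself is what's in question; `roundtrip_ok` is a hypothetical helper, not part of kvikio):

```python
import hashlib
import zlib

def roundtrip_ok(data: bytes, compress, decompress) -> bool:
    """Compress, decompress, and compare MD5 digests of input and output."""
    original_md5 = hashlib.md5(data).hexdigest()
    restored = decompress(compress(data))
    return hashlib.md5(restored).hexdigest() == original_md5

payload = b'For God so loved the world... ' * 1000
assert roundtrip_ok(payload, zlib.compress, zlib.decompress)
print('round-trip OK')
```

The same helper could be pointed at any compressor/decompressor pair; a broken binding would fail either in `decompress` or in the digest comparison.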
I tried the C++ example from https://github.com/NVIDIA/nvcomp. With this example, the results are:

```shell
$ ./benchmark_lz4_chunked -f /home/arul/Downloads/kjv10.txt
files: 1
uncompressed (B): 4432803
comp_size: 2591439, compressed ratio: 1.7106
compression throughput (GB/s): 0.4120
decompression throughput (GB/s): 1.0734
```
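The reported ratio checks out against the raw byte counts, and it puts the expected LZ4 output at roughly 2.6 MB, in line with the 2.2 MB from the CLI tool and far from the up-scaled sizes the Python bindings produced:

```python
# Byte counts taken from the benchmark output above.
uncompressed = 4_432_803
compressed = 2_591_439

ratio = uncompressed / compressed
print(f'ratio: {ratio:.4f}')                     # 1.7106
print(f'compressed: {compressed / 1e6:.1f} MB')  # 2.6 MB
assert abs(ratio - 1.7106) < 1e-3
```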
I'm looking into the issue now, based on the example you shared, @ajs-88.
I'm confused as to why our nvcomp bindings work at all, since the current version of nvcomp from github.com/NVIDIA/nvcomp doesn't have the same classes in `include/nvcomp/lz4.hpp` as the ones that kvikio expects. I also don't see how nvcomp is installed in the kvikio environment. @ajs-88, how did you install nvcomp and kvikio?
@thomcom First of all, thanks for looking into it. I downloaded the latest nvcomp library from https://developer.nvidia.com/nvcomp, then copied the headers and libraries into /usr/local/cuda (include and lib64, respectively).
For the https://github.com/NVIDIA/nvcomp examples, I followed their build instructions:
https://github.com/NVIDIA/nvcomp#building-cpu-and-gpu-examples-gpu-benchmarks-provided-on-github
For the kvikio installation, I followed https://github.com/rapidsai/kvikio#conda .
I also exported the paths in ~/.bashrc:

```shell
export CPATH=/usr/local/cuda/targets/x86_64-linux/include:$CPATH
export LD_LIBRARY_PATH=/usr/local/cuda/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
```
By chance, what is your CUDA version exactly?
I ask because I'm trying to figure out why the current nvcomp bindings work at all; they are out of date. I found the old versions in cuda-11.0, and for some reason my environment is able to find them (I don't know why yet). I'm wondering, specifically, whether you have an old version of nvcomp lurking in your include path, just like mine.
@thomcom Kindly find the details below.

```shell
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
```
@thomcom It's a fresh RHEL 8.6 installation. All I have is the latest nvcomp library.
Drive-by comment: https://github.com/rapidsai/kvikio/blob/branch-22.08/python/cmake/thirdparty/get_nvcomp.cmake should be updated to use `rapids_cpm_nvcomp` from rapids-cmake. That will keep kvikio from getting out of sync with the rest of RAPIDS and better ensure that it stays up to date. I believe there have been a lot of relevant bugfixes/improvements (for cudf) to nvcomp in the last couple of releases.
@vyasr could you please file that as a new issue?
The good news is that it looks like, at its core, the issue is that kvikio installs nvcomp version 2.1.0, which I think has a bug in the `LZ4Compressor` class and family. In version 2.3.0, `LZ4Compressor` has been dropped by nvcomp in favor of `LZ4Manager`. The less good news is that the bindings will have to be updated to version 2.3.0 to get rid of the issue.
@thomcom May I ask when I can expect a fix/release for this? Kindly advise.
It will take a week or two. I'm starting work on it today or tomorrow. :)
> It will take a week or two. I'm starting work on it today or tomorrow. :)

Many thanks @thomcom!
Updated and improved bindings are up for review at #120