
Comments (29)

thomcom commented on August 16, 2024

We'd hoped that the nvcomp team would improve the APIs in 2.1.0, but instead they dropped them.

from kvikio.

thomcom commented on August 16, 2024

I don't know why the compressor is not reducing the file size of something like the kjv. I'm going to ask around to see who in the C++ group on nvcomp can comment.

madsbk commented on August 16, 2024

Thanks @ajs-88, it sure looks like a bug in the nvcomp bindings.

vyasr commented on August 16, 2024

Yup, #106

jakirkham commented on August 16, 2024

cc @thomcom

ajs-88 commented on August 16, 2024

Dear guys,

Any help / advice will be greatly appreciated.

ajs-88 commented on August 16, 2024

Also, when I use CascadedCompressor in the same program, the result is different.

compressor = nvc.CascadedCompressor(data_gpu.dtype)

File : /home/arul/Downloads/kjv10.txt Size: 4.4 MB
Compressed Size: 5.8 MB

madsbk commented on August 16, 2024

What is the size of the uncompressed data?

print('Uncompressed Size: ', humanize.naturalsize(data_gpu.nbytes))

Also, is /home/arul/Downloads/kjv10.txt available somewhere?

ajs-88 commented on August 16, 2024

Hi @madsbk,

I used the same file from the notebook example,
https://github.com/rapidsai/kvikio/blob/branch-22.08/notebooks/nvcomp_vs_zarr_lz4.ipynb

http://textfiles.com/etext/NONFICTION/kjv10.txt

As mentioned above, with LZ4Compressor the compressed size is 17.4 MB, and with CascadedCompressor it is 5.8 MB, but the actual file size is 4.4 MB.

For data_gpu.nbytes:
CascadedCompressor: Compressed Size: 17.3 MB
LZ4Compressor: Compressed Size: 17.3 MB

Kindly let me know if my answer is not clear.

Thanks,

madsbk commented on August 16, 2024

4.4 MB is the actual file size on disk. What I am wondering is: what is the size of the data (on device) before compression?

ajs-88 commented on August 16, 2024

> 4.4 MB is the actual file size on disk. What I am wondering is: what is the size of the data (on device) before compression?

[arul@arul-desktop-01 Downloads]$ md5sum kjv10.txt
582b1be7059586fb01ce75ff51e8e0a3 kjv10.txt

madsbk commented on August 16, 2024

In the following, I set DTYPE = cp.int8 to avoid up-scaling from 8 to 32 bits.

import os
import cupy as cp
import numpy as np
import kvikio.nvcomp as nvc
import humanize

DTYPE = cp.int8
dtype = cp.dtype(DTYPE)

filename = 'kjv10.txt'
print('File :', filename, ' Size: ', humanize.naturalsize(os.path.getsize(filename)))
testfile = open(filename).read()
data = np.frombuffer(bytes(testfile, 'utf-8'), dtype=np.int8)

data_gpu = cp.array(data, dtype=DTYPE)
print('Uncompressed Size: ', humanize.naturalsize(data_gpu.nbytes))

compressor = nvc.LZ4Compressor(data_gpu.dtype)
compressed = compressor.compress(data_gpu)
with open('kjv10-gpu-compressed.txt.lz4', 'wb') as gpu_comp_file:
    gpu_comp_file.write(compressed.tobytes())

print('Compressed Size: ', humanize.naturalsize(compressed.size))
del compressor
del compressed

Which gives:

File : kjv10.txt  Size:  4.4 MB
Uncompressed Size:  4.3 MB
Compressed Size:  4.4 MB

However, I don't know enough about the LZ4 compression algorithm to say why it fails to compress the data :/
@thomcom do you know?
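
To illustrate the up-scaling mentioned at the top of this comment, here is a minimal NumPy sketch (NumPy stands in for CuPy here, and the sample bytes are a hypothetical stand-in for the file contents). Casting 8-bit data to a 32-bit dtype keeps the values but stores each one in four bytes, which would be consistent with the roughly fourfold gap between the 17.3 MB and 4.3 MB sizes reported in this thread.

```python
import numpy as np

# Hypothetical stand-in for the file contents: one byte per character.
raw = np.frombuffer(
    b"In the beginning God created the heaven and the earth.",
    dtype=np.int8,
)

# Casting to a 32-bit dtype preserves the values but uses 4 bytes per element.
widened = raw.astype(np.int32)

print("int8 nbytes: ", raw.nbytes)
print("int32 nbytes:", widened.nbytes)
```

Any compressor fed the widened array starts from four times as many input bytes, so sizes reported for it are not directly comparable to the on-disk file size.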

ajs-88 commented on August 16, 2024

@madsbk, to add to that, with
DTYPE = cp.int8

as you advised,

LZ4Compressor returns,
Compressed File Size: 4.4 MB
Compressed data_gpu.nbytes Size: 4.3 MB

Cascaded Compressor returns,
Compressed File Size: 5.8 MB
Compressed data_gpu.nbytes Size: 4.3 MB

On the other hand, I tried compressing the file with the Linux command-line lz4 tool:

$ lz4 kjv10.txt > kjv10.txt.lz4

2.2MB

MD5Sum
b79b6c349ee0e9be23c33d3d618fbe0f kjv10.txt.lz4

ajs-88 commented on August 16, 2024

I even tried detecting the file encoding and using it explicitly, but the compressed file size is still no different and the output is corrupted. :(

import os
import cupy as cp
import numpy as np
import kvikio.nvcomp as nvc
import humanize
import chardet

DTYPE = cp.int8
dtype = cp.dtype(DTYPE)

filename = '/home/arul/Downloads/kjv10.txt'
detector = chardet.UniversalDetector()

for line in open(filename, 'rb'):
    detector.feed(line)
    if detector.done:
        break
detector.close()

print(detector.result)
file_encoding = detector.result['encoding']

print('File :', filename, ' Size: ', humanize.naturalsize(os.path.getsize(filename)))
testfile = open(filename).read()

data = np.frombuffer(bytes(testfile, file_encoding), dtype=np.int8)

data_gpu = cp.array(data, dtype=dtype)
compressor = nvc.LZ4Compressor(data_gpu.dtype)
#compressor = nvc.CascadedCompressor(data_gpu.dtype)
compressed = compressor.compress(data_gpu)
with open('/home/arul/Downloads/kjv10-gpu-compressed.txt.lz4', 'wb') as gpu_comp_file:
    gpu_comp_file.write(compressed.tobytes())

print('Compressed File Size: ', humanize.naturalsize(compressed.size))
print('Compressed data_gpu.nbytes Size: ', humanize.naturalsize(data_gpu.nbytes))
del compressor
del compressed
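
A generic way to check for the two symptoms described here (no size reduction and corrupted output) is to round-trip the data and compare. The sketch below is only an illustration: it uses `zlib` from the Python standard library as a stand-in compressor, not nvcomp or kvikio, and `roundtrip_report` and the sample text are hypothetical names introduced for this example.

```python
import zlib


def roundtrip_report(data, compress, decompress):
    """Compress, decompress, and report sizes plus whether the data survived."""
    packed = compress(data)
    restored = decompress(packed)
    return {
        "uncompressed_bytes": len(data),
        "compressed_bytes": len(packed),
        "ratio": len(data) / len(packed),
        "roundtrip_ok": restored == data,
    }


# Repetitive English-like text, a hypothetical stand-in for kjv10.txt.
sample = b"And God said, Let there be light: and there was light.\n" * 1000

# zlib is a stand-in here; any compress/decompress pair on bytes can be passed.
report = roundtrip_report(sample, zlib.compress, zlib.decompress)
print(report)
```

For plain English text, a ratio at or below 1, or `roundtrip_ok` being false, would suggest a problem in how the compressor is being called rather than in the input.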

ajs-88 commented on August 16, 2024

Hi @thomcom and @madsbk,

I tried the C++ examples from https://github.com/NVIDIA/nvcomp
With these examples, the results are:

$ ./benchmark_lz4_chunked -f /home/arul/Downloads/kjv10.txt

files: 1
uncompressed (B): 4432803
comp_size: 2591439, compressed ratio: 1.7106
compression throughput (GB/s): 0.4120
decompression throughput (GB/s): 1.0734
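
The compressed ratio in this output appears to be the uncompressed size divided by the compressed size. A short check using the two byte counts printed above (the variable names are just for illustration):

```python
uncompressed_bytes = 4432803  # "uncompressed (B)" from the benchmark output
compressed_bytes = 2591439    # "comp_size" from the benchmark output

# Ratio above 1 means the output is smaller than the input.
ratio = uncompressed_bytes / compressed_bytes
print(f"compression ratio: {ratio:.4f}")  # prints "compression ratio: 1.7106"
```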

thomcom commented on August 16, 2024

I'm looking at the issue based on the example you shared now @ajs-88

thomcom commented on August 16, 2024

I'm confused as to why our nvcomp bindings work at all, as the current version of nvcomp from github.com/NVIDIA/nvcomp doesn't have the same classes in include/nvcomp/lz4.hpp as the ones that kvikio expects. I also don't see how nvcomp is installed in the kvikio environment. @ajs-88, how did you install nvcomp and kvikio?

ajs-88 commented on August 16, 2024

@thomcom, first of all, thanks for looking into it. I downloaded the latest nvcomp library from https://developer.nvidia.com/nvcomp and then copied its files into /usr/local/cuda [include, lib64] respectively.

For the https://github.com/NVIDIA/nvcomp examples, I followed their build instructions:
https://github.com/NVIDIA/nvcomp#building-cpu-and-gpu-examples-gpu-benchmarks-provided-on-github

For the kvikio installation, I followed https://github.com/rapidsai/kvikio#conda

I also exported the following paths in ~/.bashrc:

export CPATH=/usr/local/cuda/targets/x86_64-linux/include:$CPATH
export LD_LIBRARY_PATH=/usr/local/cuda/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH

thomcom commented on August 16, 2024

By chance, what is your CUDA version exactly?

thomcom commented on August 16, 2024

I ask because I'm trying to figure out why the current nvcomp bindings work at all. They are out of date. I found the old versions in cuda-11.0, and for some reason my environment is able to find them (I don't know why, yet).

I'm wondering, specifically, if you have an old version of nvcomp that is lurking in your include path, just like mine.
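
One rough way to look for stray copies of a header is to check each include directory for it. The sketch below is only an illustration under stated assumptions: `find_header_copies` is a hypothetical helper, the extra directories are guesses at common locations, and it checks fixed directories rather than reproducing the compiler's actual search order.

```python
import os
from pathlib import Path


def find_header_copies(relative_header, search_dirs):
    """Return every existing copy of relative_header under the given directories."""
    hits = []
    for directory in search_dirs:
        if not directory:
            continue  # skip empty entries, e.g. from a trailing ':' in CPATH
        candidate = Path(directory) / relative_header
        if candidate.is_file():
            hits.append(candidate)
    return hits


# Directories from CPATH, plus a few commonly used roots (assumptions).
search_dirs = os.environ.get("CPATH", "").split(os.pathsep)
search_dirs += ["/usr/local/cuda/include", "/usr/include"]
conda_prefix = os.environ.get("CONDA_PREFIX")
if conda_prefix:
    search_dirs.append(os.path.join(conda_prefix, "include"))

for path in find_header_copies("nvcomp/lz4.hpp", search_dirs):
    print(path)
```

More than one hit would suggest that an older copy of the header could be picked up ahead of the intended one.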

ajs-88 commented on August 16, 2024

@thomcom kindly find the details below.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

ajs-88 commented on August 16, 2024

@thomcom It's a fresh RHEL 8.6 installation. All I have is the latest nvcomp library.

vyasr commented on August 16, 2024

Drive-by comment: https://github.com/rapidsai/kvikio/blob/branch-22.08/python/cmake/thirdparty/get_nvcomp.cmake should be updated to use rapids_cpm_nvcomp from rapids-cmake. That will keep kvikio from getting out of sync with the rest of RAPIDS and better ensure that it stays up to date. I believe that there have been a lot of relevant bugfixes/improvements (for cudf) to nvcomp in the last couple of releases.

jakirkham commented on August 16, 2024

@vyasr could you please file that as a new issue?

thomcom commented on August 16, 2024

The good news is that, at its core, the issue appears to be that kvikio installs nvcomp version 2.1.0, which I think has a bug in the LZ4Compressor class and family. In version 2.3.0, nvcomp dropped LZ4Compressor in favor of LZ4Manager. The less good news, I think, is that the bindings will have to be updated to version 2.3.0 to get rid of the issue.

ajs-88 commented on August 16, 2024

@thomcom May I ask when I can expect a fix / release for this? Kindly advise.

thomcom commented on August 16, 2024

It will take a week or two. I'm starting work on it today or tomorrow. :)

ajs-88 commented on August 16, 2024

> It will take a week or two. I'm starting work on it today or tomorrow. :)

Many Thanks @thomcom

thomcom commented on August 16, 2024

Updated and improved bindings are up for review at #120
