TRACE: A Fast Transformer-based General-Purpose Lossless Compressor

Introduction

This repository contains the source code and a link to the dataset used in the WWW 2022 paper "TRACE: A Fast Transformer-based General-Purpose Lossless Compressor". TRACE is a deep-learning-based lossless compressor for byte streams. The model estimates the probability of each incoming byte, and an arithmetic coder uses this probability to encode it. We focus on the model's sequential representation ability, so our method has no pretraining stage: the program starts compressing while the model knows nothing and adaptively adjusts the model parameters during compression. If you want a higher compression ratio, pretraining for several epochs would help a lot.
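To make the link between predicted probabilities and compressed size concrete: an arithmetic coder spends about -log2 p bits on a byte that the model predicted with probability p, so sharper predictions mean smaller output. The sketch below is not TRACE's model. It swaps in a hypothetical order-0 adaptive byte-count model (`AdaptiveByteModel` and `ideal_code_length_bits` are illustrative names) that, like TRACE, starts with no knowledge and is updated after every byte.

```python
import math


class AdaptiveByteModel:
    """Hypothetical stand-in for the transformer: adaptive byte counts.

    It starts knowing nothing (all 256 bytes equally likely) and is updated
    after every byte, mirroring the "no pretraining, adapt while compressing" setup.
    """

    def __init__(self):
        self.counts = [1] * 256  # Laplace smoothing: each byte starts with count 1
        self.total = 256

    def prob(self, byte):
        return self.counts[byte] / self.total

    def update(self, byte):
        self.counts[byte] += 1
        self.total += 1


def ideal_code_length_bits(data):
    """Sum of -log2 p(byte) under the adaptive model.

    A well-implemented arithmetic coder produces output close to this total.
    """
    model = AdaptiveByteModel()
    bits = 0.0
    for b in data:
        bits += -math.log2(model.prob(b))  # cost of coding b with the current prediction
        model.update(b)  # adapt only after coding, so a decoder can replay the same update
    return bits
```

The first byte always costs 8 bits because the model starts uniform; on repetitive input such as `b"abababab" * 100`, the average cost per byte drops well below 8 bits as the counts adapt.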

The Performer code is taken from https://github.com/google-research/google-research/tree/master/performer/models/slim_performer

Requirements

Nvidia driver 455.38, CUDA 11.1, cuDNN 7605, pytorch==1.7.0, numpy==1.18.5

Usage

git clone https://github.com/mynotwo/A-Fast-Transformer-based-General-Purpose-LosslessCompressor.git
cd ./A-Fast-Transformer-based-General-Purpose-LosslessCompressor
mkdir data
cd data

Then download the dataset from https://drive.google.com/file/d/18qvfbeeOwD1Fejq9XtgAJwYoXjSV8UaC/view?usp=sharing, put it into ./data, and extract it with tar zxf compression_data.tar.gz
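As a portable alternative to the tar command, the archive can also be unpacked with Python's standard-library tarfile module. This is a minimal sketch: `extract_dataset` is a hypothetical helper, and only the archive name compression_data.tar.gz comes from the instructions above.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive="compression_data.tar.gz", dest="."):
    """Extract a gzip-compressed tar archive into dest (same effect as `tar zxf`)."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and "../" members.
            tar.extractall(path=dest_path, filter="data")
        else:
            tar.extractall(path=dest_path)
        return sorted(tar.getnames())  # names of the extracted members
```

Run from inside ./data, `extract_dataset()` unpacks the archive in place and returns the member names.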

To compress data, e.g. the file 'dickens', which is already provided in this repository, run:

python compressor.py --input_dir ./dickens --batch_size 512 --gpu_id 0 --prefix dickens --hidden_dim 256 --ffn_dim 4096 --seq_len 8 --learning_rate 1e-3 --vocab_dim 64

and the compressed file would be dickens_64_256_4096_bs512_random_seq32.compress.combined

Issues

Does it need to execute the transformer model to decompress files?

Hello, I am interested in this work, but I am not familiar with DNN-based compressors. I know that we need the transformer model to compress the data.

Standard compressors, such as zlib and 7-Zip, use deflate and inflate algorithms to compress and decompress. So, do DNN-based compressors also need the model to decompress the compressed data?
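For arithmetic-coding-based compressors in general, the answer is yes: the decoder must recompute exactly the same probability for every byte that the encoder used, so it has to run the same model, starting from the same initial state and applying the same updates in the same order. The sketch below illustrates that symmetry under stated assumptions. It is not the repository's coder: it uses a hypothetical adaptive byte-count model in place of TRACE's transformer and the classic integer arithmetic coder described by Witten, Neal and Cleary, and it assumes the original length n is stored separately.

```python
PRECISION = 32
FULL = (1 << PRECISION) - 1
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)


class AdaptiveByteModel:
    """Hypothetical stand-in for the neural model: adaptive byte counts."""

    def __init__(self):
        self.counts = [1] * 256  # no prior knowledge: every byte equally likely

    def total(self):
        return sum(self.counts)

    def interval(self, byte):
        lo = sum(self.counts[:byte])
        return lo, lo + self.counts[byte]

    def lookup(self, target):
        cum = 0
        for byte, count in enumerate(self.counts):
            if target < cum + count:
                return byte, cum, cum + count
            cum += count
        raise ValueError("target outside cumulative range")

    def update(self, byte):
        self.counts[byte] += 1


def encode(data):
    model = AdaptiveByteModel()
    low, high, pending = 0, FULL, 0
    bits = []

    def emit(bit):
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)  # resolve deferred underflow bits
        pending = 0

    for b in data:
        c_lo, c_hi = model.interval(b)
        total, span = model.total(), high - low + 1
        high = low + span * c_hi // total - 1
        low = low + span * c_lo // total
        while True:  # renormalize, shifting out settled leading bits
            if high < HALF:
                emit(0)
            elif low >= HALF:
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                pending += 1
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
        model.update(b)  # adapt after coding, exactly as the decoder will
    pending += 1  # flush: two bits pin a value inside the final interval
    emit(0 if low < QUARTER else 1)
    return bits


def decode(bits, n):
    model = AdaptiveByteModel()  # must start in the encoder's initial state
    pos = 0

    def next_bit():
        nonlocal pos
        pos += 1
        return bits[pos - 1] if pos <= len(bits) else 0  # pad past the end

    low, high, value = 0, FULL, 0
    for _ in range(PRECISION):
        value = (value << 1) | next_bit()
    out = bytearray()
    for _ in range(n):
        total, span = model.total(), high - low + 1
        target = ((value - low + 1) * total - 1) // span
        sym, c_lo, c_hi = model.lookup(target)  # same probabilities as the encoder
        high = low + span * c_hi // total - 1
        low = low + span * c_lo // total
        while True:
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = (value << 1) | next_bit()
        out.append(sym)
        model.update(sym)  # replay the encoder's update
    return bytes(out)
```

`decode(encode(data), len(data))` returns `data` only because both sides build and update an identical model. A decoder whose model started from different counts would generally produce garbage. With a neural model, this also means the decoder's floating-point computations must match the encoder's exactly; otherwise the probabilities drift and decoding fails.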
