data-compressor's Introduction

Data Compressor - Framework for smart meter compression algorithms

This software implements a number of lossless smart meter data compression algorithms and can also be used to evaluate them. If you use this software, please cite:

[1] Andreas Unterweger and Dominik Engel. Resumable Load Data Compression in Smart Grids. IEEE Transactions on Smart Grid, 6(2):919-929, March 2015.

[2] Andreas Unterweger, Dominik Engel, and Martin Ringwelski. The Effect of Data Granularity on Load Data Compression. In Energy Informatics 2015 - 4th D-A-CH Conference, EI 2015, volume 9424 of Lecture Notes in Computer Science, pages 69-80. Springer International Publishing, Switzerland, November 2015.

These papers are also available as BibTeX entries in the papers.bib file for convenience.

This software comes with no warranty whatsoever. You may use it without charge, as long as the original copyright notices remain and the papers listed above are cited. See the LICENSE file for details.

How to build the software

On Windows, start Visual Studio 2013 and open DataCompressor/build/MSVC/DataCompressor.sln. Build the solution, i.e., all projects. Other versions of Visual Studio may work fine, but are currently not supported explicitly.

On Linux, make sure that GNU Make 3.81 and gcc 4.8.4 are installed. Navigate to DataCompressor/build/gcc and type make. Other versions of GNU Make and gcc may work fine, but are currently not supported explicitly.

This software has been tested on Intel x86-64 and ARM v7 processors (Raspberry Pi 2). On Windows, both 32-bit and 64-bit versions of the software work out of the box. On Linux, gcc-specific parameters can be added to DataCompressor/common/build/gcc/common.mak (e.g., COMMON_CFLAGS) to build for a different "bitness" than the compiler/target default.

How to use the software

The front-end of the software is DCCLI, a command line interface for Data Compressor, which resides in the folder named DCCLI. There is a short description of the software in DataCompressor/DCCLI/doc/readme.md. Apart from that, the software itself outputs notes on usage when called without arguments or with incorrect ones.

Always use the release version of the software when evaluating algorithms. On Linux, you can use make test in either DataCompressor/build/gcc/ or DataCompressor/DCCLI/build/gcc/ to compress the supplied test file with the DEGA algorithm [1] and decompress it again for verification.

Here are some example calls for the evaluation from [2] for the MIT REDD data set, where each channel is first pre-processed using

./DCCLI "$c" "$temp_ref" decode csv separator_char=' ' column=2 # encode csv

where $c is the input channel file name and $temp_ref is the output file name of the pre-processed file. To process this file to a compressed output file, $temp_out, proceed as follows:

For DEGA coding, use ./DCCLI $temp_ref $temp_out decode csv # encode normalize # encode diff # encode seg # encode bac adaptive

For LZMH coding, use ./DCCLI $temp_ref $temp_out encode lzmh

For A-XDR coding, use ./DCCLI $temp_ref $temp_out decode csv # encode normalize

For combined compression and decompression, use, e.g., for DEGA: ./DCCLI $temp_ref $temp_out decode csv # encode normalize # encode diff # encode seg # encode bac adaptive # decode bac adaptive # decode seg # decode diff # decode normalize # encode csv

How to use the software on less powerful hardware

The software allows specifying small(er) bit sizes for I/O and processing.

For I/O, the option IO_SIZE_BITS documented in DataCompressor/common/doc/overview.txt can be set to reduce the I/O bit size. On Linux, you can set IO_SIZE_BITS in DataCompressor/common/build/gcc/ to a corresponding value for convenience. In the debug version of the software, the number of usable bits for I/O is printed on application startup.

For processing, most encoders/decoders have parameters like the block size or the value size (in bits). The parameters of each encoder are described in DataCompressor/DCLib/doc/readme.md. Other parameters like memory buffer sizes are documented in DataCompressor/DCCLI/doc/readme.md.

How to modify the software

The software is split into separate projects with a corresponding folder structure. Each project has a short description in $project_name/doc/overview.txt. DCCLI is the Data Compressor command line interface for the data compression library, DCLib, which uses the I/O library DCIOLib. The commonly shared code can be found in the folder DataCompressor/common/.

When modifying the software, make sure to build the debug versions of all projects. In Visual Studio, this can be done by choosing the Debug configuration and rebuilding the solution. With GNU make, make debug builds a debug version.

Note that, by default, debug and release versions cannot currently be built simultaneously using the Makefiles. When switching between debug and release versions, make sure to call make clean in between. If you wish to build only some of the projects in their debug version, you can use the Makefiles in $project_name/build/gcc individually with make debug (or make).

If you modify the software, please keep it mostly C90-compliant, as assured by the Makefile options in DataCompressor/common/build/gcc/. This is done for compatibility reasons (older Visual Studio versions and architectures with compilers that do not support C99).

data-compressor's People

Contributors

dustsigns

Forkers

brhat

data-compressor's Issues

Update to newer Visual Studio versions

Either add an additional solution for newer Visual Studio versions (2017, 2019) or verify that the current solution also works fine with these versions.

Add .gitignore file

Add a .gitignore file so that .o files, .a files and the like are not tracked in version control.

Add signed value support for diff module

The diff encoder currently only supports unsigned input values. Similarly, the diff decoder only supports unsigned output values.

An option, e.g., signed, should be added which allows switching between signed and unsigned values. It should default to unsigned to retain backwards compatibility.
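
A minimal sketch of what differencing over signed values could look like is given below. It is not the DCLib implementation: the function names sketch_diff_encode and sketch_diff_decode, the fixed 32-bit input width and the 64-bit difference type are assumptions made for illustration. The sketch uses C99's stdint.h for brevity, although the project itself targets C90.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of DCLib. */

/* Stores the difference between each value and its predecessor.
   The first value is differenced against 0. Differences are kept in
   64 bits so that subtracting two 32-bit values cannot overflow. */
void sketch_diff_encode(const int32_t *in, int64_t *out, size_t count)
{
  size_t i;
  int64_t previous = 0;
  for (i = 0; i < count; i++)
  {
    out[i] = (int64_t)in[i] - previous;
    previous = in[i];
  }
}

/* Reverses sketch_diff_encode by accumulating the differences. */
void sketch_diff_decode(const int64_t *in, int32_t *out, size_t count)
{
  size_t i;
  int64_t previous = 0;
  for (i = 0; i < count; i++)
  {
    previous += in[i];
    out[i] = (int32_t)previous;
  }
}
```

For the input 5, 3, -2, 4, this sketch yields the differences 5, -2, -5, 6, and decoding them restores the input.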

Add tests for different platforms

Add automated tests for different platforms (x86, x86-64, ARM) to validate compilation and encoding/decoding test data with IO_SIZE_BITS of 16, 32 and 64.

Implement file reading/writing with fractional bytes

DCCLI reads and writes full bytes from and to input and output files, respectively. If only a fraction of the last byte is used, this may lead to decoding errors. Adding, e.g., a module which adds/parses a file header can solve this issue by reserving additional space to store the number of used bits.

From the documentation:

If a fractional number of bytes (i.e., a number of bits not divisible by eight) is written to the output file, decoding said output file later may lead to errors at the last byte when processing the superfluous bits at the end of the file.
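
The bookkeeping that such a header would need can be sketched as follows. This is only an illustration under assumptions: the helper names are hypothetical and no header format is defined by the project. The sketch shows how the number of bytes on disk and the number of used bits in the last byte follow from a bit count.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; not part of DCCLI. */

/* Number of whole bytes needed to store bit_count bits. */
size_t sketch_bytes_for_bits(size_t bit_count)
{
  return bit_count / 8 + (bit_count % 8 != 0 ? 1 : 0);
}

/* Number of meaningful bits in the last byte (1 to 8), or 0 if there
   is no data at all. A file header could store this value so that a
   decoder knows which trailing bits to ignore. */
unsigned int sketch_used_bits_in_last_byte(size_t bit_count)
{
  if (bit_count == 0)
    return 0;
  return (bit_count % 8 != 0) ? (unsigned int)(bit_count % 8) : 8;
}
```

For example, 13 bits of output occupy 2 bytes, of which only 5 bits of the last byte carry data; the remaining 3 bits are the superfluous ones mentioned above.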

Range checks do not work when IO_SIZE_BITS==8*sizeof(size_t)

When sizeof(io_int_t) is exactly sizeof(size_t), i.e., as large as IO_SIZE_BITS in bytes, range checks and other operations do not work properly anymore, some of them without any warnings or errors. As documented in the common documentation:

If IO_SIZE_BITS is the same size as size_t, the Read/Write functions in dependent libraries do not work properly if the MSB of a size_t variable specifying the size to be read/written is used. For example, if IO_SIZE_BITS is 32 and sizeof(size_t) is 4, the maximum size (parameter value) that the Read/Write function can work with is 2^31 - 1, i.e., the 32nd bit cannot be used. If it is used, the return value of the functions will be interpreted as an error (since it is interpreted as a negative number).
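
The 32-bit case from the quoted documentation can be illustrated with a small check. The type name sketch_io_int_t and the function below are hypothetical and assume a signed 32-bit return type; they are not taken from the project's sources.

```c
#include <stdint.h>

/* Hypothetical stand-in for a signed 32-bit return type that carries
   either a size (non-negative) or an error (negative). */
typedef int32_t sketch_io_int_t;

/* Returns 1 if requested_size can be returned as a non-negative
   sketch_io_int_t, and 0 if its MSB is set, in which case the value
   would fall into the range reserved for errors. */
int sketch_size_fits_return_value(uint32_t requested_size)
{
  return requested_size <= (uint32_t)INT32_MAX ? 1 : 0;
}
```

Under these assumptions, 2^31 - 1 is the largest size that passes the check, and 2^31 is the first one that does not.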

Two solutions are possible:

  1. Disallow parameters such as valuesize from being as large as IO_SIZE_BITS and issue an error instead. This may have undesired side effects, as the machine word size could then not be used for I/O operations, e.g., 64-bit reads would not be supported on 64-bit machines.
  2. (preferred) Add more sophisticated range checks that issue warnings or errors whenever there are overflows/underflows. This might require a change in architecture (e.g., additional out parameters instead of return values, reserving the latter for errors only).
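
The out-parameter idea from the second solution could look roughly like the sketch below. The names sketch_status_t and sketch_read are hypothetical and the byte-wise buffer copy merely stands in for a real read operation; the point is only that the size travels through an out parameter, so the full size_t range stays usable and the return value is reserved for status codes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; the return value carries nothing else. */
typedef enum
{
  SKETCH_OK = 0,
  SKETCH_ERROR_RANGE = 1
} sketch_status_t;

/* Copies `requested` bytes from source to target and reports the
   number of bytes copied through bytes_read. Fails with a range
   error, copying nothing, if more bytes are requested than are
   available. */
sketch_status_t sketch_read(const unsigned char *source, size_t available,
                            unsigned char *target, size_t requested,
                            size_t *bytes_read)
{
  *bytes_read = 0;
  if (requested > available)
    return SKETCH_ERROR_RANGE;
  memcpy(target, source, requested);
  *bytes_read = requested;
  return SKETCH_OK;
}
```

A caller would test the returned status first and only then use the value written to bytes_read.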
