
Comments (8)

TerraTech commented on May 17, 2024

Has there been any implementation research work done on this since 2/2019?

from moosefs.

onlyjob commented on May 17, 2024

Btrfs supports compression. But on large file systems its performance degrades significantly over time due to fragmentation. Btrfs also requires periodic balancing. Unfortunately, Btrfs is not the best fit for chunkservers on rotational HDDs, but I would consider using it on SSDs.


marcin-github commented on May 17, 2024

If the chunkserver sent compressed data directly from storage to the client (and the client decompressed it), we could gain network throughput, at the cost of higher CPU usage on the client side.


onlyjob commented on May 17, 2024

LZ4 overhead is negligible and its speed is close to RAM-to-RAM copy:

LZ4 is a very fast lossless compression algorithm, providing compression speed > 500 MB/s per core, scalable with multi-cores CPU. It also features an extremely fast decoder, with speed in multiple GB/s per core, typically reaching RAM speed limits on multi-core systems.

Also, LZ4 has been implemented natively in the Linux kernel since 3.11.


chogata commented on May 17, 2024

There are two separate issues raised in this thread:

  • compression of chunk data on chunkserver - this is on our roadmap, but it doesn't have a high priority, as we feel that you can use other tools (like local filesystem with compression) if this is something you really need. Most MooseFS installations we know of store data that is already compressed, meaning this feature would be useless for them anyway. There are other features we are currently implementing that we feel will be useful in a wider range of installations.
  • compression of chunk data in chunkserver - client communication - well, the problem is, we see more and more MooseFS installations using 10Gb networks and no compression algorithms are fast enough for that...
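For a rough sense of scale, the arithmetic behind the 10Gb concern can be sketched as below. The 500 MB/s per-core figure is the lower bound from the LZ4 description quoted earlier in this thread; the function name is hypothetical, and real throughput depends heavily on the data and the hardware.

```python
import math


def cores_to_saturate(link_gbit_s: float, per_core_mb_s: float) -> int:
    """Estimate how many cores a compressor needs to keep a link busy.

    Assumes throughput scales linearly with cores, which is optimistic.
    """
    link_mb_s = link_gbit_s * 1000 / 8  # Gbit/s -> MB/s (decimal units)
    return math.ceil(link_mb_s / per_core_mb_s)


# 10 Gbit/s is 1250 MB/s, so at 500 MB/s per core this prints 3.
print(cores_to_saturate(10, 500))
```

This only counts the compression side; decompression on the client is described in the same quote as several times faster.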


onlyjob commented on May 17, 2024

  • compression of chunk data in chunkserver - client communication - well, the problem is, we see more and more MooseFS installations using 10Gb networks and no compression algorithms are fast enough for that...

That depends on the type of data. Highly compressible data might benefit from compression, which would also reduce traffic congestion.
Also, a 10Gb network may not be used exclusively, so compression might still yield improvements under many circumstances.
Let's just make it configurable, perhaps via a chunkserver config option, so compression could be enabled where required (e.g. on 100Mb links).
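A minimal sketch of what such an opt-in could look like on the wire, assuming a hypothetical one-byte tag in front of each block. None of these names come from MooseFS, and zlib level 1 stands in for LZ4 only because it ships with Python.

```python
import zlib

# Hypothetical frame tags, not part of any MooseFS protocol.
RAW, DEFLATE = 0, 1


def encode_block(data: bytes, enabled: bool, min_saving: float = 0.05) -> bytes:
    """Compress a block only if the option is on and it actually shrinks."""
    if enabled and data:
        packed = zlib.compress(data, 1)  # fast level, stand-in for LZ4
        if len(packed) <= len(data) * (1 - min_saving):
            return bytes([DEFLATE]) + packed
    # Disabled, empty, or incompressible: send the block untouched.
    return bytes([RAW]) + data


def decode_block(frame: bytes) -> bytes:
    """Reverse encode_block on the client side."""
    if not frame:
        raise ValueError("empty frame")
    tag, body = frame[0], frame[1:]
    if tag == DEFLATE:
        return zlib.decompress(body)
    if tag == RAW:
        return body
    raise ValueError(f"unknown frame tag {tag}")
```

Falling back to the raw block when the saving is below `min_saving` means already-compressed data costs one compression attempt on the sender and nothing extra on the client.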


jkiebzak commented on May 17, 2024

Curious, when you say:

we feel that you can use other tools (like local filesystem with compression)

what tools or filesystems have been used? ZFS is the only one I know of that supports compression. Are there others?


inkdot7 commented on May 17, 2024

Compression could leverage the very handy storage classes, such that compression happens lazily:

A user could request data to be cheaply compressed using e.g. lz4 when the chunks get into 'keep' mode, and using something gzip-like if/when files become old enough to reach archive. Even two archive levels could then be useful, to request even more expensive xz-like compression for files that are even older.

The compression also would not need to happen at the time of the storage class move. It would be enough to just consider chunks that are at least that old eligible for compression, whenever the chunkserver feels it has spare CPU cycles.

Also, all copies of a chunk need not have the same compression. Reads could then prefer the less compressed copy, while the redundant copies use higher compression.

At least for the last stage, compression could also be avoided when the previous stage did not manage to squeeze out e.g. even a percent, as such files are likely already compressed, or contain incompressible data.
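The policy described above could be sketched roughly as follows. The stage ages, the 1% threshold and all names are assumptions for illustration, zlib level 1 stands in for lz4, and the skip rule is applied to every later stage rather than only the last one.

```python
import lzma
import zlib
from typing import Optional, Tuple

# Hypothetical stages: (minimum chunk age in days, label, compressor).
STAGES = [
    (1, "lz4-like", lambda b: zlib.compress(b, 1)),    # stand-in for lz4
    (30, "gzip-like", lambda b: zlib.compress(b, 9)),
    (365, "xz-like", lambda b: lzma.compress(b)),
]
MIN_SAVING = 0.01  # skip later stages if the previous one saved under 1%


def next_stage(age_days: float, stages_done: int,
               last_saving: Optional[float], cpu_idle: bool) -> Optional[int]:
    """Return the index of the stage to run now, or None to leave the chunk."""
    if not cpu_idle or stages_done >= len(STAGES):
        return None  # lazy: only work with spare CPU, and stop after the last stage
    if age_days < STAGES[stages_done][0]:
        return None  # chunk is not old enough for the next stage yet
    if last_saving is not None and last_saving < MIN_SAVING:
        return None  # data looks already compressed or incompressible
    return stages_done


def compress_stage(raw: bytes, stage: int) -> Tuple[bytes, float]:
    """Compress the original chunk data and report the fraction saved."""
    packed = STAGES[stage][2](raw)
    saving = 1 - len(packed) / len(raw) if raw else 0.0
    return packed, saving
```

Because eligibility depends only on chunk age and idle CPU, the work does not have to coincide with the storage class move itself.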

