Comments (8)
Has there been any implementation research work done on this since 2/2019?
from moosefs.
Btrfs supports compression, but on large file systems its performance degrades significantly over time due to fragmentation. Btrfs also requires periodic balancing. Unfortunately, Btrfs is not the best fit for chunkservers on rotational HDDs, but I would consider using it on SSDs.
from moosefs.
If the chunkserver sent compressed data directly from storage to the client (and the client decompressed it), we could gain network throughput, at the cost of higher CPU usage on the client side.
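To make the idea concrete, here is a minimal sketch in Python, assuming blocks are stored compressed on the chunkserver and shipped unchanged. zlib stands in for whatever codec would actually be used, and `chunkserver_read` / `client_receive` are hypothetical names, not MooseFS functions.

```python
import zlib


def chunkserver_read(stored_block: bytes) -> bytes:
    """Hypothetical chunkserver side: the block is already compressed on disk,
    so it is sent as-is and no CPU is spent here."""
    return stored_block


def client_receive(wire_bytes: bytes) -> bytes:
    """Hypothetical client side: pays the CPU cost of decompression."""
    return zlib.decompress(wire_bytes)


# A highly compressible 64 KiB payload, compressed once at write time.
original = b"moosefs " * 8192
stored = zlib.compress(original, 1)  # level 1 = fastest zlib setting

wire = chunkserver_read(stored)
restored = client_receive(wire)

print(f"payload bytes: {len(original)}, bytes on the wire: {len(wire)}")
```

The only work added on the read path is the client-side decompression; the network carries `len(wire)` bytes instead of `len(original)`.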
from moosefs.
LZ4 overhead is negligible and its speed is close to RAM-to-RAM copy:
"LZ4 is a very fast lossless compression algorithm, providing compression speed > 500 MB/s per core, scalable with multi-cores CPU. It also features an extremely fast decoder, with speed in multiple GB/s per core, typically reaching RAM speed limits on multi-core systems."
Also, LZ4 has been available natively in the Linux kernel since 3.11.
from moosefs.
There are two separate issues raised in this thread:
- compression of chunk data on chunkserver - this is on our roadmap, but it doesn't have a high priority. We feel that you can use other tools (like local filesystem with compression) if this is something you really need. Also, most MooseFS installations we know of store data that is already compressed, so this feature would be useless for them anyway. There are other features we are currently implementing that we feel will be useful in a wider range of installations.
- compression of chunk data in chunkserver - client communication - well, the problem is, we see more and more MooseFS installations using 10Gb networks and no compression algorithms are fast enough for that...
from moosefs.
"compression of chunk data in chunkserver - client communication - well, the problem is, we see more and more MooseFS installations using 10Gb networks and no compression algorithms are fast enough for that..."
That depends on the type of data. Highly compressible data might benefit from compression, which would also reduce traffic congestion.
Also, a 10Gb network may not be used exclusively, so compression might still yield improvements under many circumstances.
Let's just make it configurable, perhaps via a chunkserver config option, so compression can be enabled where required (e.g. on 100 Mbit/s links).
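A rough back-of-the-envelope model shows why both sides of this argument can be right. This is a sketch under stated assumptions: data is compressed on the fly by one core, the pipeline is streaming (so the slowest stage sets the pace), and the figures are illustrative only. The 500 MB/s compression speed comes from the LZ4 description quoted above; the 2000 MB/s decompression speed and the 2:1 ratio are my assumptions.

```python
def effective_throughput(link_mb_s, ratio, compress_mb_s, decompress_mb_s):
    """Payload throughput (MB/s of uncompressed data) of a streaming
    compress -> send -> decompress pipeline; the slowest stage wins.

    ratio is uncompressed size / compressed size, so the link can carry
    link_mb_s * ratio MB/s worth of original data.
    """
    return min(compress_mb_s, link_mb_s * ratio, decompress_mb_s)


def compression_helps(link_mb_s, ratio, compress_mb_s, decompress_mb_s):
    """True when the compressed pipeline beats sending raw data over the link."""
    return effective_throughput(
        link_mb_s, ratio, compress_mb_s, decompress_mb_s
    ) > link_mb_s


# Assumed figures: one core, 2:1 compressible data.
COMPRESS, DECOMPRESS, RATIO = 500.0, 2000.0, 2.0

for name, link in [("100 Mbit/s", 12.5), ("10 Gbit/s", 1250.0)]:
    verdict = compression_helps(link, RATIO, COMPRESS, DECOMPRESS)
    print(f"{name}: compression helps = {verdict}")
```

With these numbers the slow link is the bottleneck and compression doubles the payload rate, while on 10 Gbit/s a single compressing core becomes the bottleneck. That is the case for a per-chunkserver switch rather than a global default. If data is already stored compressed, the compression stage drops out of the `min()` entirely.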
from moosefs.
Curious, when you say:
"we feel that you can use other tools (like local filesystem with compression)"
what tools or filesystems have been used? I only know of ZFS that supports compression. Are there others?
from moosefs.
Compression could leverage the very handy storage classes, such that compression happens lazily:
A user could request data to be cheaply compressed using e.g. lz4 when the chunks get into 'keep' mode, and using something gzip-like if/when files become old enough to reach archive. Even two archive levels could then be useful, to request even more expensive xz-like compression for files that are even older.
The compression also would not need to happen at the time of the storage class move. It would be enough to just consider chunks that are at least that old eligible for compression, whenever the chunkserver feels it has spare CPU cycles.
Also, the copies of a chunk need not all use the same compression. Reads could then prefer the less compressed copy, while the redundant copies use higher compression.
At least for the last stage, compression could also be skipped when the previous stage did not manage to squeeze out even a percent, as such files are likely already compressed or contain incompressible data.
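Roughly what I have in mind, as a Python sketch and not a proposal for the actual chunkserver code. The tier names, age thresholds, and `recompress` function are all made up for illustration. zlib level 1 stands in for lz4 (which is not in the Python standard library), zlib level 9 for "gzip-like", and lzma for "xz-like".

```python
import lzma
import os
import zlib

# Hypothetical tiers: (minimum chunk age in days, label, compressor).
TIERS = [
    (1, "fast", lambda d: zlib.compress(d, 1)),        # stand-in for lz4
    (30, "gzip-like", lambda d: zlib.compress(d, 9)),
    (365, "xz-like", lambda d: lzma.compress(d)),
]
MIN_GAIN = 0.01  # a tier must save at least 1% of the raw size


def recompress(data: bytes, age_days: int):
    """Return (tier_label, payload); label None means 'keep uncompressed'.

    Tiers are tried cheapest first. If a tier saves less than MIN_GAIN,
    the more expensive tiers are not attempted at all.
    """
    best_label, best = None, data
    for min_age, label, compress in TIERS:
        if age_days < min_age:
            break  # chunk is not old enough for this tier yet
        candidate = compress(data)
        if len(candidate) > len(data) * (1 - MIN_GAIN):
            break  # likely already compressed or incompressible
        if len(candidate) < len(best):
            best_label, best = label, candidate
    return best_label, best


text = b"log line\n" * 10000
noise = os.urandom(4096)  # incompressible stand-in for already-compressed data

print(recompress(text, 0)[0])     # too young for any tier
print(recompress(text, 2)[0])     # only the cheap tier is eligible
print(recompress(noise, 400)[0])  # old, but the cheap tier gains nothing
```

A real implementation would record the result of the earlier tier instead of recomputing it, and would run this only when the chunkserver has idle CPU. The sketch only shows the eligibility and the "less than a percent" cut-off logic.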
from moosefs.