Comments (7)
By the way, here's the current output from -stats
(I'm not exactly sure how to interpret the rates and multipliers):
PROC          INST  RATE       USE      QUOTA  FILL
zero          x1    6.8 MiB/s  2.4 TiB  ∞      ∞
one           x3    5.9 MiB/s  2.4 TiB  ∞      ∞
two           x0    11 MiB/s   2.4 TiB  ∞      ∞
three         x0    8.1 MiB/s  2.4 TiB  ∞      ∞
split         x1    12 MiB/s
backlog       x24   15 MiB/s
checksum      x0    26 MiB/s
index         x0    11 MiB/s
gzip          x0    12 MiB/s
parity        x0    16 MiB/s
cmd           x4    14 MiB/s
group         x0    0 B/s
concur        x21   15 MiB/s
(goroutines)  x296
from scat.
Thanks for the detailed report!
I too noticed the ever-increasing memory usage, which definitely looks like a memory leak.
Could you try recompiling with a newer version of Go?
Thanks for the quick reply :-)
This was freshly compiled with Go 1.10.3, which I believe is the latest.
However, there's one important mistake I made: I had tested that other memory increase with restic, not zbackup (I bailed on zbackup early because it seemed too slow). So it's entirely possible it's due to restic's chunker (and it was restic that died after 6 TiB, not zbackup).
Later I'll try to run it with perf and pprof and see if I can figure out where the leak is coming from. I'm a Go newbie though, so it might be hard ;-)
OK, so I ran some initial tests with pprof.
I started with a simple proc of split | { checksum | index - }, and the memory was increasing, although not as fast as in the original post. I fed it totally random data, so there were no duplicate chunks.
I discovered it seems to be leaky by design: at procs/index.go:62, it assigns the chunk hash into an in-memory map, which it later consults to see whether the chunk was already processed. I originally imagined it wouldn't do that, since it could check the same thing by seeing whether the corresponding filename exists in the output directory.
So I don't see a simple way of fixing that, short of changing how it works and possibly making it slower in the process (although the filesystem checks could well be cached by the OS).
I then ran it again with the original proc and some real data from tar, and there were many more places where large chunks of data were allocated. Some of them shrank along the way, but overall memory consumption grew, of course. Most notable were scat/split.(*splitter).Next, scat/stores/copies.(*Reg).List and scat/stores/copies.(*List).Add.
My plan is to rewrite it so that it uses an on-disk database of chunks. I'll need this for other features anyway, such as being able to restore only particular files rather than an entire backup, or being able to keep track of tape/disk changes (i.e. backing up a huge filesystem to many smaller BluRays, tapes, or USB HDDs, only a few of which are connected at a given time). This should also help with #23, as it'll then be easier to rename the output chunks (and group them into bigger ones, to also hide the individual chunks' sizes).
Hi @goblin - glad you're still active on this project, and thanks for having investigated the leak. I must admit, though, that I'm not using scat at the moment and have forgotten most of the internals, nor would I have the incentive to look at them in detail. However, from what I understand, your idea of rewriting procs/index to use an on-disk database seems sensible. Index history would have to be stored within that database instead of git (since it wouldn't be a simple text file anymore), but other than that, why not. Good luck! I'd be curious to see if this fixes the leak. Hopefully it will 🍀
May I add, I still believe in the idea behind the project and still need such a tool. I've since fallen back to cleartext syncing to Google Drive 😫, to at least have some kind of backup despite the privacy issues and risk of loss. It's just that some open issues were preventing me from using scat as I initially envisioned it, and I didn't have the guts to address them head-on. For the past few years I've had it brewing in my mind to either give it another go in the current code base, or rewrite the whole thing in Ruby. Yes, single-threaded, slow Matz Ruby - so enjoyable to code in that everything feels possible: easy to experiment with, tinker with, tear apart and rewrite, or even... make performant, paradoxically. Should that last point prove infeasible, there's Crystal, hehe.
I tried Ruby a few years ago, and I'm much more fond of learning Go at the moment ;-) Especially given that you've done so much work on it in Go.
> It's just that some open issues were preventing me from using scat as I initially envisioned it and I didn't have the guts to address them head on.
Which issues, specifically?