
martymac commented on August 25, 2024

Hello Alex,

Thanks for your feedback.

Yes, you are understanding correctly, and that behaviour is by design. It is explained in fpart(1); see the description of option '-s'.

The problem is this: when you crawl the files with a maximum partition size set, you never know whether every single file you will encounter will fit. As fpart's job is to ensure that no file is left out, it has to put such a file somewhere. Special partition '0' was chosen because it provides a fixed, known partition number for such cases, and it is the only partition whose size can exceed the size you set with option -s. No option is provided to change that 'special partition' number.
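To make the rule concrete, here is a toy Python model of size-capped partitioning with an overflow partition 0. It is a simplified sketch, not fpart's actual code: the name `partition_by_size`, the in-memory `(path, size)` input, and the first-fit placement of regular files are all assumptions made for illustration. Only the rule itself comes from the paragraph above: a file that cannot fit under the -s limit lands in partition 0, the only partition allowed to exceed that limit.

```python
from typing import Dict, List, Tuple


def partition_by_size(files: List[Tuple[str, int]], max_size: int) -> Dict[int, List[str]]:
    """Toy model: cap regular partitions at max_size, overflow goes to partition 0.

    First-fit placement is an assumption of this sketch, not necessarily
    the strategy fpart uses.
    """
    parts: Dict[int, List[str]] = {0: []}  # partition 0 always exists in this model
    sizes: Dict[int, int] = {}             # current size of each regular partition
    for path, size in files:
        if size > max_size:
            parts[0].append(path)  # can never fit: put it in the special partition
            continue
        # Reuse the first regular partition that still has enough room.
        for num in sorted(sizes):
            if sizes[num] + size <= max_size:
                parts[num].append(path)
                sizes[num] += size
                break
        else:
            # No room anywhere: open a new regular partition, numbered from 1.
            num = len(sizes) + 1
            parts[num] = [path]
            sizes[num] = size
    return parts
```

With a 10-byte limit, `partition_by_size([("a", 5), ("big", 50), ("b", 4), ("c", 6)], 10)` returns `{0: ["big"], 1: ["a", "b"], 2: ["c"]}`.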

There is no such behaviour with option -n because it is not needed: you do not limit the size of the produced partitions. So you are right: when option -n is used, partition 0 has no special meaning.

If the presence of that partition is a problem when it is empty, it could be removed in a second pass. That idea is already in the TODO list, see: https://github.com/martymac/fpart/blob/master/TODO#L23. I may work on it but, to be honest, this is not a high priority feature right now.

I'll close that issue for now. Feel free to re-open it if needed :)

Best regards,

Ganael.

from fpart.

alexhunsley commented on August 25, 2024

Hi Ganael, thanks for the explanation.

Can I suggest a scheme that would make the output more consistent? How about if partition 0 is only used for files that are too big? And then partitions 1 and up contain the files that were ok. This way, the output of fpart is always interpretable and understandable without needing to know what flags fpart was run with, and there is no ambiguity.

You see, I'm writing a script that uses fpart and it feels strange that the part of my script that uses the output from fpart has to worry about whether I passed -n or -s etc into the fpart command. Ideally, that detail would be irrelevant, the output data would stand on its own.

It's also foreseeable that I'd store the output of fpart to come back to later. With the current use of partition 0, its contents can mean one of two things, and I don't think it's possible to tell which if you don't have the original invocation handy. It seems to be a bit of arbitrary complexity where it's not needed, if that makes sense.

alexhunsley commented on August 25, 2024

If you don't feel that's a change worth making, I'll probably fork the repo and have a shot myself!
I appreciate that if you just made this change to the default behaviour it would break backwards compatibility.

martymac commented on August 25, 2024

Hello Alex,

I am not sure I understand exactly what you mean. Currently, I think fpart's output already stands on its own: you can easily skip partition 0 if it is empty (and if it is, you know for sure that option -s has been used). If it is not empty, your consumer program has to take every output partition into account if you want to reach all files.
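A consumer following this advice might look like the sketch below. It assumes the partitions have already been loaded into a dict mapping partition number to file list, and `iter_partitions` is a hypothetical name. It skips empty partitions (with -s, typically an empty partition 0) and otherwise treats partition 0 like any other partition, so that every file is reached.

```python
from typing import Dict, Iterator, List, Tuple


def iter_partitions(parts: Dict[int, List[str]]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (partition number, files) for every non-empty partition, in order."""
    for num in sorted(parts):
        files = parts[num]
        if not files:
            # Nothing to process; with -s this is typically an empty partition 0.
            continue
        # A non-empty partition 0 holds oversized files and must not be dropped.
        yield num, files
```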

Adding a partition 0 for option -n would create a bucket that would never be used in that mode and would seem odd to users. Moreover, you still wouldn't be able to tell afterwards whether option -s or -n had been used. It seems to me that it would complicate the code and only shift the problem.

Also, as you mentioned, it would break backwards compatibility with existing tools.

Maybe the easiest way to clean up an empty partition 0 would be to add a second pass that removes it if it is empty? That way, you would get really consistent output, but I am not sure that's exactly what you want...

hjmangalam commented on August 25, 2024

martymac commented on August 25, 2024

Hello Harry,

OK, I think I better understand the problem.

There are two cases.

In non-live mode, fpart with -s option uses partition 0 only for files bigger than the given partition size (just 'because you have to put them somewhere'). No partition can exceed the max size given, except partition 0.

In live mode, fpart's behaviour is different: it produces partitions on the fly and does not cache them (so we are really talking about a single partition: the current one). The -s flag is used to check whether the given maximum partition size has been reached; it cannot be as strict as in non-live mode because, again, you have to put the current file somewhere (and it has to be the current partition), so the maximum size is more informational and may (in fact, will) be exceeded.
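The live-mode behaviour can be modelled as a generator that keeps only the current partition in memory. This is a hedged sketch, not fpart's implementation: the name `live_partitions` is hypothetical, and checking the limit after adding each file is simply one easy way to reproduce the effect described above, where the current file always goes into the current partition and the maximum size can therefore be exceeded.

```python
from typing import Iterable, Iterator, List, Tuple


def live_partitions(files: Iterable[Tuple[str, int]], max_size: int) -> Iterator[List[str]]:
    """Toy model of live mode: hand off each partition as soon as it is full.

    Only the current partition is kept; nothing else is cached.
    """
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # The current file has to go somewhere, and only the current partition exists.
        current.append(path)
        current_size += size
        if current_size >= max_size:
            # Limit reached (possibly exceeded): emit now so work can start.
            yield current
            current, current_size = [], 0
    if current:
        yield current  # flush the last, partially filled partition at end of crawl
```

With a limit of 10, the input `[("a", 4), ("b", 4), ("c", 4), ("big", 20), ("d", 1)]` yields `["a", "b", "c"]` (12 bytes, over the limit), then `["big"]`, then `["d"]`.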

(as a side note: I should probably add details about that in the man page)

That behaviour in live mode would be hard to change, as we would have to generate partition 0 and cache it so as to produce it at the end of the run (there is no other choice, as you don't know which files you will encounter during FS crawling; the last one could be a huge one). As live mode has been designed to let you start syncing the file tree while fpart is still running, you would have to sync all those big files in a single run at the end of the fpart pass. That's probably not a good idea :/

On the other hand, non-live mode output could be fixed by always numbering output partitions from 1, except when option -s is used and partition 0 contains files. In that case, a partition 0 (containing only big files) could appear, with option -s only.

Alex, is that what you meant? Harry, what do you think?

alexhunsley commented on August 25, 2024

Hi Ganael,

Yes, Harry has given a good example of what I’m talking about: the ambiguity of any files listed in a partition 0.

I can’t speak to how live mode would be impacted by any change, as I’ve never used it yet, but I’m a definite supporter of the idea of “regular” partitions starting at index 1 and reserving partition 0 for only files that were too big.

If it reduces ambiguity, you could also rename partition 0 in a second pass, if non-empty, to be e.g. files.overflow instead of files.0. Just to make it very clear it is not just a regular partition like the others.
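This second-pass idea could also be applied outside fpart as a small post-processing step. The sketch assumes the partitions were written with -o as files named `<prefix>.0`, `<prefix>.1`, and so on (matching the `files.0` example above), and that the run used -s, since with -n partition 0 was a regular partition at the time of this discussion. The name `tidy_overflow` and the `.overflow` suffix are hypothetical.

```python
from pathlib import Path


def tidy_overflow(prefix: str) -> None:
    """Second pass over -o output: drop an empty <prefix>.0, rename a non-empty one.

    Assumes a run made with -s, so <prefix>.0 can only hold files that were
    too big for any regular partition.
    """
    zero = Path(f"{prefix}.0")
    if not zero.exists():
        return  # nothing to do, e.g. partition 0 was never written
    if zero.stat().st_size == 0:
        zero.unlink()  # empty overflow partition: remove it
    else:
        # Make it obvious this is not a regular partition like the others.
        zero.rename(Path(f"{prefix}.overflow"))
```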

hjmangalam commented on August 25, 2024

martymac commented on August 25, 2024

Hello,

Thanks for your feedback.

Harry, I get the idea but it seems a bit odd to me to put big files in a dedicated partition in live mode. The original idea of that mode was to go fast, not cache anything and act as quickly as possible on generated file lists. That would break that paradigm because partition 0 would have to be cached, and would only be complete at the end of the run. As a consequence it means that fpart handlers for partition 0 would only be triggered at the end of the run too. If you end up with a really huge partition 0 (think about someone using '-s 1k'... nearly all files would end up in there) you would have to start acting on it (start a sync for example) only after FS crawling, which is the worst possible scenario.

For the consumer side, it would also complicate the code, as you would have to handle that special partition manually and probably re-code a splitting scheme, while fpart can already do a good part of the job.

That's why I think that current handling, if not perfect, is a good balance between simplicity (KISS) and efficiency.

Anyway, if you think it's really necessary to handle big files separately, another, simpler approach could be to add an option that just excludes files bigger than the maximum partition size and logs them to stdout (even with option -o enabled, to avoid caching an additional partition), leaving the consumer program to do whatever it wants with them. That would probably be a compromise and would better fit fpart's original design. But I am still not convinced this is something we want for live mode.

In any case, I got the idea: for both modes, I will start numbering regular partitions from 1. Partition 0 may appear only in non-live mode, when option -s is used and it contains files. I'll put that on the TODO list and work on it ASAP.
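The numbering scheme agreed on here fits in a few lines. This is an illustrative sketch, not fpart's code: `number_partitions` is a hypothetical name, and it takes the regular partitions and the oversized files as already-computed lists.

```python
from typing import Dict, List


def number_partitions(regular: List[List[str]], overflow: List[str]) -> Dict[int, List[str]]:
    """Regular partitions are always numbered from 1; partition 0 appears only if non-empty.

    Only a non-live run with -s can produce a non-empty overflow list.
    """
    numbered: Dict[int, List[str]] = {}
    if overflow:
        numbered[0] = overflow  # oversized files only
    for num, files in enumerate(regular, start=1):
        numbered[num] = files
    return numbered
```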

Merry Christmas to both of you,

Ganael.

hjmangalam commented on August 25, 2024

alexhunsley commented on August 25, 2024

Ganael,
That sounds good! Always starting the regular data at partition 1 definitely will make the output easier to consume.

Merry Christmas both!

martymac commented on August 25, 2024

Hello,

Harry, I've added suggested changes for live mode to the TODO list. I'll work on that ASAP.

Thanks again to both of you for your feedback!

(I'll leave that issue open for now)

hjmangalam commented on August 25, 2024

martymac commented on August 25, 2024

Hello,

I've pushed a first update that makes fpart start numbering partitions at '1' instead of '0', as requested.

Could you try it and tell me if it fits your needs? Future updates will skip an empty partition '0' and allow excluding too-big files when option -s is used, but I still have to work on those.

Cheers,

Ganael.

hjmangalam commented on August 25, 2024

hjmangalam commented on August 25, 2024

martymac commented on August 25, 2024

Hello Harry,

Thanks for your feedback.

Building fpart from source is explained here:

https://www.fpart.org/#installing-from-source

As mentioned in my previous message, I still have to work on two changes:

  • skip partition 0 if it is empty
  • provide an option to exclude (and print) too big files when in live mode and option -s is used

so please be patient, I'll work on that ASAP :)

Best regards,

Ganael.

hjmangalam commented on August 25, 2024

martymac commented on August 25, 2024

Hello,

I've pushed the missing bits. They add option -S (Skip big files). Skipped files are printed immediately on stdout as belonging to a special partition 'S' (as in 'S'kipped). I hope that matches what you were looking for.
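A script consuming fpart's stdout could pick out the skipped files by their partition identifier. The exact line format is an assumption here: the sketch expects lines shaped like `<partition> (<size>): <path>`, e.g. a hypothetical `S (52428800): ./big.iso`, so check it against the output of your fpart version before relying on it. `split_skipped` and `LINE_RE` are hypothetical names.

```python
import re
from typing import Dict, Iterable, List

# ASSUMED line shape: "<partition> (<size>): <path>"; verify against real fpart output.
LINE_RE = re.compile(r"^(\w+) \((\d+)\): (.*)$")


def split_skipped(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group paths by partition id; skipped files end up under the 'S' key."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # ignore lines that do not match the assumed format
        part_id, _size, path = m.groups()
        groups.setdefault(part_id, []).append(path)
    return groups
```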

I'll close that PR for now, any feedback welcome :)

(and thanks again for your suggestions, which help fpart get better!)

hjmangalam commented on August 25, 2024

martymac commented on August 25, 2024

Hi Harry,

Sorry, this is a mistake, there's no point in forbidding printing excluded files when option -o is used. This is fixed (I don't know what I had in mind!).

Regarding the other part of the question, as already discussed, sending skipped files to a specific partition file could be feasible but would require more work and I don't know if it would make sense in live mode where partition files have to be created fast. It would break that paradigm.

Also, you would end up with a huge partition file for which triggers would probably not make sense either (so we would have to skip them for that partition, introducing an exception in their handling).

Printing them to STDOUT as skipped makes more sense to me; as I wrote before, I think it is a good balance between simplicity and efficiency, while respecting the overall spirit of the tool.

Cheers,

Ganael.

hjmangalam commented on August 25, 2024

martymac commented on August 25, 2024

Hello Harry,

Skipped files are printed in the same way as standard partitions/files. I understand your concern: that format was chosen at the beginning of the project and has never been tuned since. I've never had feedback about it before; you're right that there may be a simpler display format.

As that request may not be that urgent and is not specific to the special partition (if it should be changed, let's change it for the other partitions too), can you open a separate bug report? I'll work on it a bit later...

Best regards,

Ganael.
