
Comments (42)

casperdcl avatar casperdcl commented on May 10, 2024

Yes, I'm aware of this and working on a fix. It's good to still have a dynamic miniters (we don't want to check mininterval every iteration, because that would imply a call to time() every iteration, which is rubbish for overhead). My idea is to decrease miniters dynamically (as well as increase it dynamically, like in the current version).
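The gating idea can be sketched roughly as follows (a minimal illustration with hypothetical names, not tqdm's actual internals):

```python
from time import time

def bar_iter(iterable, mininterval=0.1, miniters=1):
    """Hypothetical sketch: only call time() every `miniters`
    iterations, and adapt miniters dynamically so the expensive
    clock check lands roughly every mininterval seconds."""
    last_t = time()
    last_n = 0
    for n, item in enumerate(iterable, 1):
        yield item
        if n - last_n >= miniters:   # cheap integer comparison
            cur_t = time()           # costly call, made rarely
            delta_t = cur_t - last_t
            if delta_t >= mininterval:
                # a display refresh would happen here;
                # adapt miniters (up or down) toward the target rate
                miniters = (n - last_n) * mininterval / delta_t
                last_t, last_n = cur_t, n
```

The integer comparison runs every iteration; the time() call only runs once per `miniters` iterations.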

from tqdm.

BertrandBordage avatar BertrandBordage commented on May 10, 2024

Additional thoughts:

  • Why not remove miniters? This parameter seems useless; I really can’t imagine a case where one would use it instead of mininterval. Even mininterval is not really useful: it’s cool to have and we should keep it, but it’s unlikely that someone wants a slower or faster update rate than 0.1 s.
  • mininterval, like ncols (and miniters if we keep it), is not a PEP 8 name, and these could be easier to understand. I suggest refresh_time for mininterval and width for ncols (the number of columns being the unit, not what it means to users).


casperdcl avatar casperdcl commented on May 10, 2024

I never really intended miniters or mininterval for end users to make use of... like I said, they are really internal parameters for the sake of optimisation.


BertrandBordage avatar BertrandBordage commented on May 10, 2024

Ah, I didn’t realize it was a way to avoid executing time() too often.
That makes the problem more complex, indeed.
So if the underlying problem is optimization, the solution is simply to write tqdm in Cython instead of Python. It will be way faster than it currently is, and calling time directly from C takes a ridiculously small amount of time. Function and method calls in Python are way slower than getting a timestamp in C.

If needed, I can help on this part, as I’ve been using Cython for years and I’m specialized in Python/Cython/algorithm optimisation.


casperdcl avatar casperdcl commented on May 10, 2024

We've tested... getting the timestamp is by far the slowest thing. I only use Cython for my local installs; it doesn't help tqdm significantly.


casperdcl avatar casperdcl commented on May 10, 2024

Unless you're talking about a different way of getting the time that is only possible in pure Cython/C?


lrq3000 avatar lrq3000 commented on May 10, 2024

About mininterval being an internal parameter, I disagree. I modify it for time-sensitive applications to minimize the overhead of tqdm significantly, for example by setting it to 1 s or even more. This is very useful to keep a progress bar displayed while having virtually no overhead. Also, it's a very simple parameter, very easy for the end user to change and to understand the impact of (actually, I think it's currently the easiest way to minimize tqdm's overhead).
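Raising mininterval is a one-line change; a sketch (mininterval is a real tqdm parameter, the loop body is arbitrary):

```python
from time import sleep
from tqdm import tqdm

# Refresh the display at most once per second instead of the default
# 0.1 s, keeping the bar while cutting display overhead.
for _ in tqdm(range(200), mininterval=1):
    sleep(0.001)
```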

About a Cython implementation, I think that's useless, for one reason: PyPy. tqdm is fully compatible with PyPy, and PyPy usually gets as fast as Cython, or even faster in some applications. In the case of tqdm, I can't see what Cython could optimize that PyPy can't already optimize automatically. Cython is nowadays mainly useful for vectorized operations since it can work on numpy, but since we don't do any fancy mathematical operations, I think PyPy already optimizes all tqdm operations as much as possible.

About the present issue, I think the previous tqdm behavior should be restored; it was easier to understand even if it incurs a performance hit. However, if @casperdcl can find a better fix, that would of course be better :)


BertrandBordage avatar BertrandBordage commented on May 10, 2024

@lrq3000 I agree with you, the previous behaviour should be restored. tqdm will become a bit slower (for a total of about 120 to 200 ns per non-display iteration), but it’s only a small performance issue that won’t really affect most cases. In the rare case of a real performance problem, people should just not use tqdm.

But I don’t totally agree with you about PyPy. Yes, PyPy could make this problem disappear, but maybe not as much as a Cython implementation. For me, there are two advantages to using Cython: it makes a library that works on CPython as well as PyPy, and it allows direct access to C functions. PyPy can optimise by guessing types and compiling on the fly, but it can’t optimise Python’s internal functions (like time.time) AFAIK. And that’s what will mainly block us.

Now, is it really worth writing tqdm in Cython? I don’t think so. Even with Cython, system calls like getting a timestamp, writing to stderr, or getting the terminal width will still take a vast amount of time.

We could also run a poll to find out whether users would mind a 120 ns cost per iteration instead of 60 ns.
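The figures being debated here are easy to re-measure; a rough sketch using the standard library (exact numbers vary by machine):

```python
import timeit

# Estimate the cost of a single time.time() call by averaging over a
# million calls; on typical hardware this lands in the tens of ns.
n = 1_000_000
per_call_ns = timeit.timeit("time()", setup="from time import time",
                            number=n) / n * 1e9
print(f"time() ~ {per_call_ns:.0f} ns/call")
```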


lrq3000 avatar lrq3000 commented on May 10, 2024

About Cython vs PyPy, I doubt Cython will do better optimization for type inference and the like, so that's not where we would gain performance. As for system calls, if we can get faster access using Cython, then yes, there may be some significant gain, but I'm not sure about that: I guess Python already makes system calls such as time in C, so Cython would probably give us no gain at all.

About reverting to the previous behavior, you can check the alternative fix proposed by @casperdcl here, which preserves the current performance (theoretically): #55

I will review it soon (sorry about that, I should have reviewed it by now, but I'm a bit too busy; it will be done by Monday evening at the latest :S)


casperdcl avatar casperdcl commented on May 10, 2024

Just tried PyPy. Wonderful stuff. I can confirm tqdm's pure Python code runs faster with PyPy than Cython-compiled (this may not be true for a pure Cython re-implementation). Anyway, I think we're all agreed that we're definitely going to stick with pure Python for this project.


lrq3000 avatar lrq3000 commented on May 10, 2024

@BertrandBordage please check whether the latest version fixes your issue; you can tune the smoothing parameter to your liking. If not, we'll think something else out :)


casperdcl avatar casperdcl commented on May 10, 2024

Interesting that it says I closed this issue even though I didn't :)


BertrandBordage avatar BertrandBordage commented on May 10, 2024

Sorry to say it, but it still freezes on the minimal example from the original post (using tqdm 3.1.0 and Python 3.5).
Without changing miniters, the only workaround is to change smoothing to 0 or a really small value, like 1e-5.

@casperdcl Of course, the idea of a Cython implementation is not to just compile the Python source ;) You only win about 30% performance on average by doing that. If you carefully code the Cython implementation, a lot of programs become several orders of magnitude faster depending on what the program does (I even saw a 5000× speedup on a real use case once!). In tqdm’s case the speedup would be small, maybe 2-3 times, but nothing groundbreaking, because most of the work is system calls.


casperdcl avatar casperdcl commented on May 10, 2024

The example you gave (3k near-instantaneous iterations, then a long pause) is so wildly odd that it should require you to specify a lower miniters manually (set it to 1 to recover the original behaviour). I've noted in the documentation that erratic rates (e.g. downloading over the internet) should use miniters=1.
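For reference, the workaround looks like this (a sketch; miniters is a real tqdm parameter, the bursty loop is illustrative):

```python
from time import sleep
from tqdm import tqdm

# miniters=1 re-checks mininterval on every iteration, so the bar
# cannot stall for a long time after a burst of fast iterations.
for i in tqdm(range(100), miniters=1):
    sleep(0.0001 if i < 30 else 0.001)
```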


BertrandBordage avatar BertrandBordage commented on May 10, 2024

That’s right, but even with a realistic change of speed, the problem stays the same:

from time import sleep
from tqdm import tqdm

for i in tqdm(range(10000)):
    if i < 3000:
        sleep(1e-3)
    else:
        sleep(1)

Waiting 1 ms vs 1 s is realistic: when you check that an image exists and convert it otherwise, you get waits of this type.

It’s great that you added a note to the documentation, but to me that’s still a regression from 1.0. Sorry to bother you :s


casperdcl avatar casperdcl commented on May 10, 2024

Not a bother; it's just that if we make miniters=1 the default (restoring the old behaviour), the performance hit for a bunch of other cases becomes noticeable. Of course, if enough people think the use case you describe is more common than quick iterations, we can revert.


BertrandBordage avatar BertrandBordage commented on May 10, 2024

OK!


lrq3000 avatar lrq3000 commented on May 10, 2024

I'm reopening the issue because there's a bug: the exponential moving average is not behaving as it should. Normally, miniters should converge towards 1, whatever the initial big step of iterations. For example, using this code:

from tqdm import tqdm
from time import sleep

t = tqdm(total=10000)
t.update(1000)
for i in range(10000-1000):
    sleep(0.01)
    t.update()
t.close()

What we get is this sequence of iteration updates: [0, 1010, 1313, 1616, 1919, etc]. You can see that after the first smoothing step reduces miniters=1000 to miniters=303, it then does nothing at all, while it should continually reduce miniters over time...

So this clearly needs fixing. Plus, I have an idea to fix this kind of issue for good: adding a maxinterval argument to automatically reduce miniters when it is so high that updating the bar takes longer than maxinterval.


lrq3000 avatar lrq3000 commented on May 10, 2024

Ah, and I forgot to say that I agree with @BertrandBordage that this use case should be fixed right now: as I've read and referenced in #48, it's quite easy to make progress bars for constant-rate tasks, but it becomes interesting when you face variable-rate tasks, and those are quite common in lots of applications (the best example being downloading over the internet). So yes, we should fix that, but @casperdcl's patch should do the trick; it seems there's only a small glitch somewhere to fix.


casperdcl avatar casperdcl commented on May 10, 2024

Right. All fixed and merged and tagged and... ready to push v3.1.1?


casperdcl avatar casperdcl commented on May 10, 2024

btw the sequence of updates in the case you described, @lrq3000, is now [0, 1010, 1308, 1520, 1671, 1780, 1859, 1962, 2079, 2163, ...] on my machine...


lrq3000 avatar lrq3000 commented on May 10, 2024

Wow @casperdcl, that was fast! I intended to try to fix it myself today, but your fix is a lot better and more elegant than what I had in mind 👍

I'm trying this now.

About maxinterval, my idea was to use it to do the following:

if delta_t > maxinterval:
    miniters = maxinterval * miniters / delta_t

After one overly long update, this should effectively prevent any subsequent update interval from overflowing beyond maxinterval. In addition, this would cost little more than an additional if statement. Do you think I should add this, @casperdcl?
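As a standalone function, that clamp might look like this (a sketch using the names above, not tqdm's actual code):

```python
def clamp_miniters(miniters, delta_t, maxinterval):
    """If the last update interval exceeded maxinterval, scale
    miniters down so the next interval lands near maxinterval."""
    if delta_t > maxinterval:
        miniters = maxinterval * miniters / delta_t
    return miniters
```

For example, a 30 s gap with miniters=1000 and maxinterval=10 would cut miniters to about 333; a gap shorter than maxinterval leaves miniters untouched.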


casperdcl avatar casperdcl commented on May 10, 2024

Yeah, @lrq3000 sounds good. Maybe with a default value of 1 second or something...


lrq3000 avatar lrq3000 commented on May 10, 2024

@casperdcl I thought about a default value of 60 seconds; we don't want to force the bar to update too often by default, I guess.

About your latest code change, could you please explain why you used mininterval / delta_t? I know it's there to weight delta_it's influence on miniters depending on the time spent on the last update, but why this ratio? The issue I can see is when delta_t << mininterval: in this case, delta_it's influence will be multiplied by a factor >> 1. Was this on purpose?


lrq3000 avatar lrq3000 commented on May 10, 2024

Finally, I think I'll put the default at 10 seconds; I guess it's a good compromise.


casperdcl avatar casperdcl commented on May 10, 2024

@lrq3000 yes to both your comments... when dynamic_miniters is true, my current implementation uses an EMA to set miniters to a value which corresponds as closely as possible to mininterval... this includes both increasing and decreasing.
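That convergence can be illustrated with a toy simulation (an assumed EMA form for illustration, not the exact implementation):

```python
def ema_step(miniters, delta_it, delta_t, mininterval=0.1, smoothing=0.3):
    # Nudge miniters toward the iteration count that spans one mininterval.
    return (smoothing * delta_it * mininterval / delta_t
            + (1 - smoothing) * miniters)

# A steady 100 it/s loop: miniters converges to ~10 (= 100 it/s * 0.1 s),
# even if it starts three orders of magnitude too high.
m = 1000.0
for _ in range(60):
    m = ema_step(m, delta_it=m, delta_t=m / 100)
print(round(m, 2))
```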

About maxinterval, I don't think 1 s is too much, as the overhead would be negligible for such loops - but 10 s is OK too, I suppose.


lrq3000 avatar lrq3000 commented on May 10, 2024

@casperdcl About maxinterval, I know; the issue isn't that the overhead would be non-negligible, it's just that I think we should not constrain the user too much: with a default of 1, if the user wants to set a mininterval higher than 1 s, he will also have to raise maxinterval. I guess most users won't want a mininterval higher than 10 s, so that's why I propose 10 s. Anyway, we can set this for now and see in the future if there's any issue :)
I'm now working on how to test this new feature lol...

About smoothing dynamic_miniters, OK, that's what I thought: you tried to make miniters converge towards mininterval. That's a very nice idea :) I've done some more tests, and everything's OK. The way it works may seem a bit counterintuitive at first, but it works correctly, as we expect. Great job 👍


casperdcl avatar casperdcl commented on May 10, 2024

np. thanks @BertrandBordage for the suggestions!


lrq3000 avatar lrq3000 commented on May 10, 2024

OK, done with maxinterval, and I added all the relevant unit tests (I hope they will be stable enough; since we need time.sleep() to test them, it was quite tricky to design those tests...).


lrq3000 avatar lrq3000 commented on May 10, 2024

Well, the test test_smoothed_dynamic_min_iters_with_min_interval() sometimes fails on Travis on PyPy for the iteration-based tqdm test... So I have commented out the asserts; we'll need to find a more reliable way to do that test...
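One way to make such timing tests deterministic is to replace the clock instead of really sleeping; a sketch of a hypothetical fake timer (not what tqdm's test suite actually uses):

```python
class FakeTime:
    """Controllable clock: sleep() advances time instantly, so tests
    that depend on elapsed time become deterministic and fast."""
    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, duration):
        self.now += duration

# 100 "sleeps" of 10 ms pass in no real time at all.
clock = FakeTime()
for _ in range(100):
    clock.sleep(0.01)
```

Code under test would then take the timer as a parameter (or be monkeypatched) so the suite controls every timestamp it sees.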


casperdcl avatar casperdcl commented on May 10, 2024

@lrq3000 this really should've been done on a branch and PR, not directly on master. Now the commit history looks ugly and includes tagged releases which may be broken... And since they're tagged, it's not a good idea to rebase and remove them.

As a general rule of thumb, I only push to master if it's a small one-commit change that isn't likely to break anything, and I don't tag until it definitely passes on Travis and all my systems. Also, I tend to merge (not rebase) PRs into master, then amend to include the version bump in the merge commit. I think this is best for tracking which branches things came from etc... let me know if you guys don't agree.


casperdcl avatar casperdcl commented on May 10, 2024

Thanks for the implementation and trying to write the tests, though!


lrq3000 avatar lrq3000 commented on May 10, 2024

@casperdcl Yes, sorry, you're totally right. I don't know why, but I thought this would be a one-commit change and just a small change from the last PR. I hadn't foreseen it would be such a mess to get working right with unit tests...

I will now be a lot more conservative; I will branch for anything but very small "cosmetic" changes.

About merging PRs into master, yes, it's very nice indeed to see where they come from. How do you do that? Do you use GitHub's interface or the command line?


lrq3000 avatar lrq3000 commented on May 10, 2024

@casperdcl Also, can I ask what "rules" you usually use to decide when to merge which commits together? (This will help me manage my commits better; I'm not experienced working in teams with git :) )


lrq3000 avatar lrq3000 commented on May 10, 2024

OK, I fixed the issue with the iteration-based tqdm test; all should be OK now. I have uploaded the new version 3.1.4 to PyPI.


casperdcl avatar casperdcl commented on May 10, 2024

Well, I never use git's automated merge, because then I'd have to pull locally, amend and then force-push, which is not great...

  1. git-checkout master (locally)
  2. git-fetch --all
  3. git-pull
  4. git-merge new_feature_branch
  5. Update _version.py
  6. git-add _version.py
  7. git-commit --amend (add 'bump version' to the 'merge' message)
  8. Test
  9. git-push
  10. Wait for Travis to be safe
  11. If it fails, fix locally (preferably by amending commits rather than making new ones... or if you do commit new ones, do a little rebase), then continue from step (8)... force-pushing at step (9)...
  12. git-tag vx.x.x
  13. push --tags

Optional:

  14. Delete the branch from the PR page
  15. git-branch -D new_feature_branch (locally delete the branch)

General rule: once you push a tag to git (or create the tag online) you can no longer alter/mess with/rebase anything from that tag and older.

If all this is too complex, just wait for someone else to do it. It's good practice not to merge your own code anyway (because that ensures someone else looks at it). This was a longer message than I thought it was going to be.

Sent from my mobile device


casperdcl avatar casperdcl commented on May 10, 2024

Like I said, I'm happy to do all of the above if it's too complex. Can I ask what steps you follow for uploading to PyPI? I know there are rst issues (links etc.?) and dependencies for building Windows binaries...


lrq3000 avatar lrq3000 commented on May 10, 2024

Oh, thank you very much @casperdcl for detailing all your steps to merge a branch 👍 I know, I won't merge my own branches by myself (I did for the latest branch, but that's because it was a simple bugfix), but it's interesting to know how it's done :)

About uploading to PyPI, you just need to run python setup.py make buildupload; it will run the readme checks, compile, and then ask you for the credentials to upload to PyPI (it will ask you twice: first to upload the readme, and then to upload the builds).

For Windows, there's no dependency except that you need to run Windows :/ But anyway, it's not a big deal if the win build is not done; since this is a very simple library with no dependencies, there's no issue installing from the .zip.


lrq3000 avatar lrq3000 commented on May 10, 2024

PS: a tip when building: you always need to remove the folders dist and build and the file tqdm.egg-info before launching a build (this is done in the Makefile), or else your build may not include the latest file changes (there is a record inside the egg) and you may upload older versions of tqdm along with the new one (because when you upload the builds using twine, it uploads everything inside the dist folder!).
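That cleanup can be scripted; a minimal sketch using only the standard library (assumes it runs from the repo root):

```python
import shutil

# Remove stale artefacts so the egg record cannot pin old files and
# twine cannot pick up previous releases left behind in dist/.
for artefact in ("dist", "build", "tqdm.egg-info"):
    shutil.rmtree(artefact, ignore_errors=True)
```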


casperdcl avatar casperdcl commented on May 10, 2024

I think it's nice to upload to PyPI with Windows binaries... are you saying this is only possible while running Windows? No cross-compilation possible?


lrq3000 avatar lrq3000 commented on May 10, 2024

To my knowledge, no, it's not possible. To compile Windows binaries, you need the win32 Python interpreter, which is compiled with Visual Studio, so you cannot make it work on a Linux platform. From what I've read, people who do cross-compilation just install Windows in a VM and make the builds inside it (and repeat the process if you want to compile Mac OS X builds...).

If Python someday switches to MinGW or Cygwin gcc, maybe it will become possible to build win32 Python apps on Linux, but for now it seems that's not possible.


lrq3000 avatar lrq3000 commented on May 10, 2024

But anyway, you can try it for yourself if you're running Linux; you just need to run this command:

python setup.py sdist bdist_wininst

On Windows, it generates a win32 installer without requiring anything but native Python.

