Map_blocks works really well
from xmovie.
I have implemented an alternative way to save out the frames using dask:
import dask.bag as db
from dask.diagnostics import ProgressBar


# Methods of the Movie class; frame_save is defined elsewhere in xmovie.
def dask_frame_wrapper(self, tt, odir=None):
    # render frame tt and write it to odir
    fig = self.render_frame(tt)
    frame_save(
        fig, tt, odir=odir, frame_pattern=self.frame_pattern, dpi=self.dpi
    )


def save_frames_parallel(self, odir, partition_size=5, progress=False):
    """Save movie frames out to file in parallel with dask.bag.

    Parameters
    ----------
    odir : path
        path to output directory
    partition_size : int
        number of frames per dask.bag partition
    progress : bool
        Show a dask progress bar.
    """
    frame_range = range(len(self.data[self.framedim].data))
    frame_bag = db.from_sequence(
        frame_range, partition_size=partition_size
    )
    mapped_frame_bag = frame_bag.map(self.dask_frame_wrapper, odir=odir)
    if progress:
        with ProgressBar():
            mapped_frame_bag.compute(processes=False)
    else:
        mapped_frame_bag.compute(processes=False)
The speedup is very nice (needs ~1/4 of the time), but I am still getting absolutely strange plotting behavior.
This gif is rendered using the serial (simple for loop) method and it looks completely normal:
But when the frames are rendered in parallel the colorbar ticklabels and the label go bananas:
Does anyone out there have an idea what could be happening? Is there a better way to execute figure plotting using dask?
CC: @dcherian @rabernat
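A sketch of one commonly recommended way to keep parallel frames from interfering through pyplot's global state: build each frame with matplotlib's object-oriented `Figure` class instead of `plt.figure()`, so no frame depends on pyplot's "current figure". The helper name `render_frame_oo`, the file-name pattern and the plotting choices are illustrative assumptions, not xmovie code, and whether this alone removes the colorbar glitches above is untested.

```python
import os
import tempfile

import numpy as np
from matplotlib.figure import Figure


def render_frame_oo(frame, odir, index, vmin=-2, vmax=4):
    """Render one 2D array to a PNG without using pyplot (hypothetical helper)."""
    fig = Figure(figsize=(4, 3))  # never registered with pyplot, so no shared "current figure"
    ax = fig.add_subplot(1, 1, 1)
    mesh = ax.pcolormesh(frame, vmin=vmin, vmax=vmax, cmap="RdBu_r")
    # the colorbar is attached explicitly to this figure's own axes
    fig.colorbar(mesh, ax=ax, extend="both", label="units")
    path = os.path.join(odir, f"frame_{index:04d}.png")
    fig.savefig(path, dpi=72)  # Figure.savefig works without pyplot in Matplotlib >= 3.1
    return path


if __name__ == "__main__":
    data = np.random.randn(10, 50, 60)
    with tempfile.TemporaryDirectory() as odir:
        paths = [render_frame_oo(data[i], odir, i) for i in range(data.shape[0])]
        print(len(paths), "frames written")
```

Each call only touches its own `Figure`, so the same function can be handed to `dask.bag`'s `map` or wrapped in `dask.delayed`.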
I couldn't reproduce it with:
import os

import dask.bag as db
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar

da = xr.DataArray(np.random.randn(50, 60, 10)).chunk({'dim_2': 1})
da.attrs['long_name'] = "what's in a name?"
da.attrs['units'] = 'units'
da.load()  # load the data into memory before plotting

os.makedirs("images", exist_ok=True)


def make_plot(tt, data):
    hfig = plt.figure()
    data.isel(dim_2=tt).plot(vmin=-2, vmax=4, cmap=mpl.cm.RdBu_r, extend='both')
    hfig.savefig(f"images/{tt}.png")
    plt.close(hfig)  # release the figure; `del hfig` leaves it registered with pyplot


frame_range = range(len(da['dim_2'].data))
frame_bag = db.from_sequence(frame_range, partition_size=2)
mapped_frame_bag = frame_bag.map(make_plot, data=da)
mapped_frame_bag.compute()
Maybe it's a cartopy thing?
I have a few questions about your approach to making embarrassingly parallel movies:
- Are you doing this with dask arrays or numpy arrays? If I remove the da.load() line in the above example, things don't work. Is that expected?
- What would your approach be to calling a plot or render function on every slice along one axis of a dask array? E.g. if my dask array has a dimension time and I want one frame per time step.
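On the second question, one way to get a frame per time step from a dask-backed array is to build one `dask.delayed` task per slice. Because xarray objects are dask collections, passing the lazily indexed slice `arr.isel(time=i)` as an argument should make dask load just that slice before calling the function. A minimal sketch under those assumptions; `plot_slice` and `per_timestep_tasks` are hypothetical names standing in for a real render function.

```python
import dask
import numpy as np
import xarray as xr


def plot_slice(frame, index):
    """Hypothetical render function; receives one in-memory time slice."""
    # a real version would draw `frame` and save it as image number `index`
    return index, float(frame.mean())


def per_timestep_tasks(arr, dim="time"):
    """Build one independent delayed task per position along `dim`."""
    return [
        dask.delayed(plot_slice)(arr.isel({dim: i}), i)
        for i in range(arr.sizes[dim])
    ]


data = xr.DataArray(
    np.random.randn(10, 50, 60), dims=("time", "y", "x")
).chunk({"time": 1})
results = dask.compute(*per_timestep_tasks(data), scheduler="synchronous")
```

Each task depends only on its own slice, so no task ever receives the whole array.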
Thanks a lot. This is helpful and encouraging.
I have been working on more 'presets', and I think that before merging this feature I will implement more basic presets, so I can confidently test several presets (#12) and see whether only the ones using cartopy (or just certain projections) fail. I think the default preset should be exactly what xarray would plot; the rotating globe should be optional (I just implemented it first because I had it available and it looks cool, hehe).
- Hmm, that is strange. I intended it so that it would not matter, but I am not currently testing with dask arrays. I will definitely add datasets with dask arrays (#11). The one difference I see is that in xmovie the data variable is set in the Movie initialization and then pulled from the class, but that should not really matter unless I am missing something.
- Movie has a kwarg framedim, which defaults to 'time' and is passed to each 'preset' plot function. However, you can also just write your own function and assign the passed frame to any dimension you like. I try to keep things as flexible as possible so that custom functions (e.g. func(ds, fig, timestamp, **kwargs)) get access to the full data structure and the only input that is definitely needed (the frame, or timestamp).
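To make the `func(ds, fig, timestamp, **kwargs)` pattern concrete, here is a hypothetical custom function that chooses its own dimension through a `dim` keyword. The driver loop is a stand-in for what Movie does internally and is an assumption, not xmovie's actual code.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr


def my_frame(ds, fig, timestamp, dim="time", **kwargs):
    """Custom plot function: full data in, one frame drawn on `fig`."""
    ax = fig.add_subplot(1, 1, 1)
    # the function, not the caller, decides which dimension `timestamp` indexes
    ds.isel({dim: timestamp}).plot(ax=ax, **kwargs)
    return ax


ds = xr.DataArray(
    np.random.randn(4, 5, 6), dims=("depth", "y", "x"), name="temp"
)
axes = []
for tt in range(ds.sizes["depth"]):  # stand-in for the Movie frame loop
    fig = plt.figure()
    axes.append(my_frame(ds, fig, tt, dim="depth", vmin=-2, vmax=2))
    plt.close(fig)
```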
I have experimented with these options quite successfully, but I realize that the most urgent step at this point is to start thorough documentation (#13) explaining the options in detail. Do you agree?
Also, random question: Why is parallelization with dask always embarrassing?
This is what the graph looks like with a dask array in the above example (from mapped_frame_bag.visualize()); clearly there's no benefit. I don't know how to get rid of that finalize step.
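About the finalize step: a bag's `compute` ends with a task that gathers all partitions into one list. One way to skip it, sketched here, is to turn the bag into one `Delayed` object per partition with `Bag.to_delayed()` and compute those directly. `render` is a hypothetical stand-in, and this only changes the final gathering step, not how the frames themselves are scheduled.

```python
import dask
import dask.bag as db


def render(tt):
    """Hypothetical stand-in for rendering and saving frame tt."""
    return tt * 10


frame_bag = db.from_sequence(range(10), partition_size=2)
mapped = frame_bag.map(render)

# one Delayed per partition; computing these skips the bag's final concatenation task
parts = mapped.to_delayed()
per_partition = dask.compute(*parts, scheduler="synchronous")
results = [value for part in per_partition for value in part]
```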
Also, random question: Why is parallelization with dask always embarrassing?
🤣
Haha, I don't think it's always embarrassing. I think the term is used for problems that are easy to parallelize because each step is independent of the others, so the problem tends to scale well with compute power (this is my amateur opinion). In MATLAB this kind of thing is easily solved by parfor.
I have a solution here: ncar-hackathons/scientific-computing#6. It basically abuses map_blocks to work on dask array chunks in parallel and passes on dims, coords, and attrs so that I can reconstruct an xarray DataArray out of the numpy-fied chunk that map_blocks provides.
Because the chunk is xarray-fied, you get all the default plotting features like automatic labelling:
Also Matt Long pointed out that since matplotlib is not thread-safe, you need to prevent dask from using multithreading. Otherwise I think you get the weird behaviour you see.
This is the line we use on Cheyenne to make that happen: cluster = dask_jobqueue.PBSCluster(cores=18, processes=18), i.e. 18 worker processes with one thread each.
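A local sketch of the same idea without a PBS cluster: dask's "processes" scheduler runs tasks in separate worker processes, so frames never share matplotlib state through threads (with dask.distributed, the analogous setting would be a cluster with `threads_per_worker=1`). `render_frame_path` and `save_frames` are hypothetical placeholders, and the rendering function is assumed to be picklable.

```python
import dask.bag as db


def render_frame_path(tt):
    """Hypothetical placeholder: draw frame tt on its own Figure, save it, return the path."""
    return f"frame_{tt:04d}.png"


def save_frames(n_frames, partition_size=5, scheduler="processes"):
    """Render all frames; the default scheduler uses worker processes, not threads."""
    bag = db.from_sequence(range(n_frames), partition_size=partition_size)
    return bag.map(render_frame_path).compute(scheduler=scheduler)
```

Called from a script, `save_frames(20)` should sit under an `if __name__ == "__main__":` guard, since the process pool may re-import the main module when it starts workers.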
Sorry I haven't had time to get back to this in a while. This looks pretty nice though!
Thanks for putting this together.
Just to clarify, my original approach did work and also sped things up (I'd have to check the dask graph to make sure it is as nicely parallel as this approach); it just produced these super weird glitches.
The one thing I can think of here is that this approach basically limits us to a single DataArray, right? Or can we pass a full Dataset? I had some plans to be able to pass a full dataset and overlay different plotting methods for several data variables in that dataset.
I will definitely return to this shortly, just have a review due and some other not so fun stuff for today/tomorrow.
Hmm... yes, probably limited to a DataArray. Did your original approach work with dask arrays? I had no luck using either dask.bag or dask.array.
Actually, map_blocks lets you pass multiple dask arrays, so wrapper and animate could be extended to do that...
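For reference, dask.array's `map_blocks` accepts several arrays and hands the function the matching block of each, so a data array and, say, a mask chunked one time step per block arrive together. A minimal sketch; `combine` and the arrays are illustrative, not the `wrapper`/`animate` code from the linked solution.

```python
import dask.array as da
import numpy as np


def combine(sst_block, mask_block):
    """Receives the corresponding blocks of both inputs (same time step)."""
    return np.where(mask_block, sst_block, np.nan)


# one time step per block in both arrays, so blocks line up one-to-one
sst = da.random.random((10, 50, 60), chunks=(1, 50, 60))
mask = da.ones((10, 50, 60), chunks=(1, 50, 60), dtype=bool)
masked = da.map_blocks(combine, sst, mask, dtype=sst.dtype)
result = masked.compute(scheduler="synchronous")
```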
Oh, then that's really cool! If we can work it out so that the internally constructed xarray object is equivalent to ds.sel(framedim=index), then things should work pretty smoothly.
For reference, this is what I have been working with... It mostly differs from your example in that it's all organized in the Movie class. Not sure if/how that would affect things.
import warnings

import dask.bag as db
from dask.diagnostics import ProgressBar

try:
    from tqdm import tqdm
    tqdm_avail = True
except ImportError:
    tqdm_avail = False


# Methods of the Movie class; frame_save is defined elsewhere in xmovie.
def save_frames_serial(self, odir, progress=False):
    """Save movie frames as picture files.

    Parameters
    ----------
    odir : path
        path to output directory
    progress : bool
        Show progress bar. Requires tqdm.
    """
    # create range of frames
    frame_range = range(len(self.data[self.framedim].data))
    if tqdm_avail and progress:
        frame_range = tqdm(frame_range)
    elif not tqdm_avail and progress:
        # `not`, not `~`: ~True and ~False are both truthy integers
        warnings.warn("Can't show progress bar at this point. Install tqdm.")
    for fi in frame_range:
        fig = self.render_frame(fi)
        frame_save(
            fig,
            fi,
            odir=odir,
            frame_pattern=self.frame_pattern,
            dpi=self.dpi,
        )


def dask_frame_wrapper(self, tt, odir=None):
    # render frame tt and write it to odir
    fig = self.render_frame(tt)
    frame_save(
        fig, tt, odir=odir, frame_pattern=self.frame_pattern, dpi=self.dpi
    )


def save_frames_parallel(self, odir, partition_size=5, progress=False):
    """Save movie frames out to file in parallel with dask.bag.

    Parameters
    ----------
    odir : path
        path to output directory
    partition_size : int
        number of frames per dask.bag partition
    progress : bool
        Show a dask progress bar.
    """
    frame_range = range(len(self.data[self.framedim].data))
    frame_bag = db.from_sequence(
        frame_range, partition_size=partition_size
    )
    mapped_frame_bag = frame_bag.map(self.dask_frame_wrapper, odir=odir)
    if progress:
        with ProgressBar():
            mapped_frame_bag.compute(processes=False)
    else:
        mapped_frame_bag.compute(processes=False)
Or could it be the processes=False? I remember that a while back it was somehow necessary to get it to work, and it is faster, but I am not sure whether we might be able to speed this up even more.
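Rather than guessing what a keyword like `processes=False` does in a given dask version, one can record inside each task which process and thread ran it. A small diagnostic sketch with dask.bag; `where_am_i` and `task_locations` are hypothetical helpers, while "threads", "processes" and "synchronous" are standard dask scheduler names.

```python
import os
import threading

import dask.bag as db


def where_am_i(tt):
    """Report which process and thread executed this frame's task."""
    return tt, os.getpid(), threading.get_ident()


def task_locations(n_frames=8, scheduler="threads"):
    bag = db.from_sequence(range(n_frames), partition_size=1)
    return bag.map(where_am_i).compute(scheduler=scheduler)


threaded = task_locations(scheduler="threads")
n_threads = len({tid for _, _, tid in threaded})
print(f"{n_threads} distinct thread id(s)")
```

Several thread ids within one process id means frames were rendered concurrently in threads, which is the situation matplotlib is not safe for; the same call with `scheduler="processes"` (under a `__main__` guard) would instead report several process ids.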
I am having trouble interpreting your example graph: Why does the plotting step only show 5 steps? Shouldn't there be 10?
If the plotting steps are run in parallel, I still expect a significant speedup, since that is what takes most of the time (at least when invoking fancy mapping etc.). But I am all for going with map_blocks if we can retain the flexibility for later features, honestly. Also, thanks again for working on this!
Urghhhh I want to work on this sooo bad now...but I need to finish this paper first hahaha.
I think you're looking at the first graph, which was me trying to run with dask.bag with 2 partitions and 5 cores (or something).
The second one has 10 in parallel (map_blocks).
- Your rendering problem is probably fixed by forcing single-threaded workers.
- Does your approach only work with numpy arrays? That was the limitation I ran into. dask.bag with a loaded dataset works awesomely well but craps out with a dask dataset.
If we can work it out so that the internally constructed xarray object is equivalent to ds.sel(framedim=index)
Currently this is like da.sel(framedim=index). I think we can extend it, but it might be painful.
I tried it with dask arrays and it worked, but I will add some explicit tests for that for sure.
Finally figured this out using the new xarray map_blocks functionality.
def save_image(block):
    import cartopy.crs as ccrs
    import matplotlib as mpl
    import matplotlib.pyplot as plt

    if block.size > 0:
        # workaround 1:
        # xarray passes a zero-sized array to infer what this function returns.
        # we can't run plot on that, so skip it
        f = plt.figure()
        ax = f.subplots(1, 1, subplot_kw={"projection": ccrs.PlateCarree()}, squeeze=True)
        # xarray plotting goodness is available here!
        block.plot(ax=ax, robust=True, vmin=5, vmax=28, cmap=mpl.cm.Spectral_r, cbar_kwargs={"extend": "both"})
        # on pangeo.io, this will need some tweaking to work with gcsfs.
        # haven't tried that. On cheyenne, it works beautifully.
        f.savefig(f"images/aqua/{block.time.values[0]}.png", dpi=180)
        plt.close(f)
    # workaround 2:
    # map_blocks expects to receive an xarray thing back.
    # Just send back one value. If we send back "block" that's like computing the whole dataset!
    return block["time"]


# `merged` is the (dask-backed) dataset containing the "sst" variable.
# I want to animate in time, so chunk so that there is 1 block per timestep.
tasks = merged.sst.chunk({"time": 1, "lat": -1, "lon": -1}).map_blocks(save_image)
tasks.compute()
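In xarray versions whose `map_blocks` accepts a `template=` argument, giving it skips the trial call on zero-sized mock data, so workaround 1 should no longer be needed. A minimal sketch on synthetic data, assuming such a version; the plotting and saving step is left as a comment, and the `calls` list exists only to show that the function sees real blocks. The template must be dask-backed, with one time value per block.

```python
import numpy as np
import xarray as xr

calls = []  # sizes of the blocks the function actually receives


def save_image(block):
    calls.append(block.size)
    # ... draw `block` on a fresh figure and save it here (omitted) ...
    return block["time"]  # small return value, as in workaround 2


sst = xr.DataArray(
    np.random.randn(4, 5, 6),
    dims=("time", "lat", "lon"),
    coords={"time": np.arange(4)},
    name="sst",
).chunk({"time": 1, "lat": -1, "lon": -1})

# template: what the computed result looks like (dask-backed, one chunk per time step)
template = xr.DataArray(
    sst["time"].values,
    dims="time",
    coords={"time": sst["time"].values},
    name="time",
).chunk({"time": 1})
tasks = sst.map_blocks(save_image, template=template)
times = tasks.compute(scheduler="synchronous")
```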
Oh sweet! Thanks for keeping at this @dcherian. Could you put a PR together? I would love to see if this takes care of those weird rendering issues I had as well!
Realistically I can maybe devote some time later this week to xmovie, but today and Mon/Tue are pretty packed already.
Did this ever get implemented?
I am trying to make a 4500-frame movie and don't want to wait 40 hours :P
I've used it on other occasions and am planning to refactor this eventually, but time is damn precious hahaha. Seems like there is more demand though, so maybe I'll get myself to do it sometime soon!
Dropping by because I'm also interested! I've been using xmovie and I love how simple it is, but waiting for matplotlib (which is very slow) to plot every single frame takes a lot of time, even for a relatively short video.
Are there any workarounds to use dask for this issue that I can use right now? Maybe modifying the plot_func()?
Thanks!
Are there any workarounds to use dask for this issue that I can use right now? Maybe modifying the plot_func()?
I don't think this would work. Xmovie still draws a full figure for each timestep, but we could use dask to do many frames in parallel (since they do not need information about each other in most cases). I had a version that was working on a branch, but it was introducing super weird artifacts in the movies. @dcherian suggested using the new xarray.map_blocks functionality and I just haven't gotten around to implementing that.
It is on the list, but unfortunately there are a few high-priority things currently occupying me. I am very keen to get back to this, however. Apologies for the delay.
@jbusecke thanks for the answer! Yes, I saw the videos with the weird label behavior. Honestly, I don't mind that behavior too much. It's a small price to pay to have something that doesn't take an hour to generate each video. Is that branch still available?
It's here: https://github.com/jbusecke/xmovie/tree/jbusecke_dasksave, but use at your own risk! It is quite old at this point!