Comments (8)

chriddyp commented on May 12, 2024

October 12th, 2017 Edit: This topic is now its own part of the user guide. Please see http://plot.ly/dash/sharing-data-between-callbacks


Ah, that's interesting. Thanks for sharing! I'm not sure if these arguments apply to Dash, but I'm curious to learn more. Some notes:

  • Note that chained dependencies / intermediate inputs / intermediate steps are already supported (see the "Multiple Outputs" section in the user guide here: https://plot.ly/dash/getting-started-part-2)
  • For the sake of code modularity, you can just use regular functions like this:
global_df = pd.read_csv('...')
app.layout = html.Div([
    dcc.Graph(id='graph'), 
    html.Table(id='table'),
    dcc.Dropdown(id='dropdown')
])

def clean_data(df, value):
    # some expensive clean data step
    [...]
    return cleaned_df

@app.callback(Output('graph', 'figure'), [Input('dropdown', 'value')])
def update_graph(value):
    dff = clean_data(global_df, value)
    figure = create_figure(dff) 
    return figure

@app.callback(Output('table', 'children'), [Input('dropdown', 'value')])
def update_table(value):
    dff = clean_data(global_df, value)
    table = create_table(dff) 
    return table
  • In this case, we're performing the clean_data step twice when the dropdown changes. In the case of something like a shared reactive expression, this could potentially only be done once. However, in Dash, all of these callbacks are executed in parallel on the server so you wouldn't end up being faster (as long as you aren't request bound).
  • If performance were really an issue, you could add caching around clean_data so that long, expensive computations are only performed once (see https://plot.ly/dash/performance for more details)
  • If we did something like intermediate expressions, we'd have to send the intermediate data back to the client (the browser), which would incur a network delay cost
  • You can sort of already do this by just serializing your data as a string and displaying it in a hidden div. Again, this will incur a network delay cost, so this solution might not be any faster than just performing the calculation (albeit twice) in 2 different callbacks (which will be executed in parallel) and/or caching the intermediate values

Here's how you would do this in a hidden div:

global_df = pd.read_csv('...')
app.layout = html.Div([
    dcc.Graph(id='graph'), 
    html.Table(id='table'),
    dcc.Dropdown(id='dropdown'),
    html.Div(id='intermediate-value', style={'display': 'none'})
])

@app.callback(Output('intermediate-value', 'children'), [Input('dropdown', 'value')])
def clean_data(value):
    # some expensive clean data step
    cleaned_df = your_expensive_clean_or_compute_step(value)
    return cleaned_df.to_json() # or, more generally, json.dumps(cleaned_df)

@app.callback(Output('graph', 'figure'), [Input('intermediate-value', 'children')])
def update_graph(jsonified_cleaned_data):
    dff = pd.read_json(jsonified_cleaned_data) # or, more generally json.loads(jsonified_cleaned_data)
    figure = create_figure(dff) 
    return figure

@app.callback(Output('table', 'children'), [Input('intermediate-value', 'children')])
def update_table(jsonified_cleaned_data):
    dff = pd.read_json(jsonified_cleaned_data) # or, more generally json.loads(jsonified_cleaned_data)
    table = create_table(dff) 
    return table
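
The serialize/deserialize round trip that the hidden div relies on can be tried outside of Dash. The sketch below uses only the `json` module mentioned in the comments above; `cleaned_rows`, `to_hidden_div`, and `from_hidden_div` are hypothetical names chosen for illustration.

```python
import json

# Hypothetical cleaned data: a list of row dictionaries.
cleaned_rows = [
    {"city": "Montreal", "temp_c": 21.5},
    {"city": "Boston", "temp_c": 18.0},
]


def to_hidden_div(rows):
    """Serialize rows to the string that would become the div's children."""
    return json.dumps(rows)


def from_hidden_div(payload):
    """Rebuild the rows from the string a downstream callback receives."""
    return json.loads(payload)


payload = to_hidden_div(cleaned_rows)
restored = from_hidden_div(payload)
print(type(payload).__name__)    # str
print(restored == cleaned_rows)  # True
```

With a DataFrame, `to_json()` and `pd.read_json()` play the same roles. Recent pandas releases warn unless the string is wrapped in `io.StringIO` before it is passed to `read_json`.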

Finally, note that when you run just app.run_server(), only a single process is running, which means that only one request can be handled at a time. If you run the app with gunicorn or, for development purposes, just add app.run_server(processes=4), then multiple requests can be handled at the same time. This means that callbacks will be executed in parallel (reducing the time cost of recomputing shared values).
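
To get a feel for why parallel execution hides the cost of repeating a step, here is a small sketch. It is not how Dash or gunicorn schedule requests: it assumes a thread pool as a stand-in for multiple worker processes, and `slow_clean` is a hypothetical step whose cost is simulated with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

STEP_SECONDS = 0.2  # assumed cost of one expensive step


def slow_clean(value):
    """Hypothetical expensive step; sleeping stands in for real work."""
    time.sleep(STEP_SECONDS)
    return value * 2


def run_one_at_a_time(values):
    # Mimics a single process: each call waits for the previous one.
    start = time.perf_counter()
    results = [slow_clean(v) for v in values]
    return results, time.perf_counter() - start


def run_in_parallel(values, workers=4):
    # Mimics several workers: the calls overlap in time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(slow_clean, values))
    return results, time.perf_counter() - start


serial_results, serial_time = run_one_at_a_time([1, 1])
parallel_results, parallel_time = run_in_parallel([1, 1])
print(serial_results == parallel_results)  # True
print(parallel_time < serial_time)         # expected True: roughly 0.2 s vs 0.4 s
```

Threads only overlap here because sleeping releases the interpreter. CPU-bound work in Python generally needs separate processes to run in parallel, which is what the gunicorn and processes=4 options above provide.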

from dash.

berndtlindner commented on May 12, 2024

As someone who works a lot in Shiny and is trying out Dash, I really expected an equivalent of this in Dash in some form or other. This is kind of a game changer for me.
I disagree that Dash overcomes this with multi-processing, specifically with this statement:

In this case, we're performing the clean_data step twice when the dropdown changes. In the case of something like a shared reactive expression, this could potentially only be done once. However, in Dash, all of these callbacks are executed in parallel on the server so you wouldn't end up being faster (as long as you aren't request bound).

What if the function (e.g. clean_data) is 1) called more times than there are cores/processes available (say, more than 4), and 2) a long-running or memory- or computationally-expensive algorithm?
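
A rough back-of-the-envelope model makes this concern concrete. The sketch below is a simplification under stated assumptions: every callback repeats the same step, each worker handles one callback at a time, and each step costs the same. The function name and the numbers are hypothetical.

```python
import math


def estimated_wall_time(n_callbacks, n_workers, step_seconds):
    """Rough wall-clock time when every callback repeats the same step.

    Simplified model: each worker handles one callback at a time and
    every callback costs exactly step_seconds.
    """
    rounds = math.ceil(n_callbacks / n_workers)
    return rounds * step_seconds


print(estimated_wall_time(2, 4, 10))  # 10
print(estimated_wall_time(6, 4, 10))  # 20
print(estimated_wall_time(6, 1, 10))  # 60
```

Under this model, parallelism hides the repeated work only while the number of callbacks stays at or below the number of workers. The total compute and memory spent still grows with every repetition.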

gaw89 commented on May 12, 2024

@chriddyp and @kmader, the hidden Div appears to be a bad idea in practice. I just spent the better (or worse) part of 2 days trying to figure out why my app was working flawlessly when run on my desktop but stumbled when pushed to the server (RHEL 7.1). As it turns out, the issue had to do with my use of a hidden Div to pass data between callbacks. When the DataFrame reached a certain number of rows (861), it would fail to execute the callbacks that depended on those Divs. I am guessing this has something to do with a size limit on Divs in HTML, but I am not certain.

I will try to post a reproducible example here in the next couple of days.

So far, Dash has been fantastic! But this has been a massive frustration. Live and learn...

kmader commented on May 12, 2024

Thanks, the Output('intermediate-value', 'children') approach seems to be the closest match. The additional step of de/serialization is a bit clumsy, but it may be a good start. The primary issue I was having in a current use case is that I had quite repetitive code, with the same input arguments being copied and pasted across multiple callbacks.

On the performance side, the additional benefit of having 'ReactiveExpressions' is that the caching could be handled invisibly and globally by Dash rather than on a function-by-function basis (this is what Shiny does). I'll need to study the code a bit more to see if there is anything else that might work.

chriddyp commented on May 12, 2024

For future readers, I have written up some solutions to this problem in a new section of the user guide: http://plot.ly/dash/sharing-data-between-callbacks

chriddyp commented on May 12, 2024

As it turns out, the issue had to do with my use of a hidden Div to pass data between callbacks. When the DataFrame reached a certain number of rows (861), it would fail to execute the callbacks that depended on those Divs. I am guessing this has something to do with a size limit on Divs in HTML, but I am not certain.

I will try to post a reproducible example here in the next couple of days.

Please try to recreate a reproducible example. I have used this method with 5MB of data successfully before, I'm not aware of any inherent limitations. Another issue could be a request or response size limitation on the server that you are deploying it on (frequently the default is like 1MB).
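
One way to check whether a size limit is a plausible culprit is to measure the serialized payload before it is sent. This is a minimal sketch that assumes a limit of about 1 MB, as suggested above; the real limit depends on the server configuration, and the helper names and sample rows are hypothetical.

```python
import json

ASSUMED_LIMIT_BYTES = 1_000_000  # hypothetical server limit, roughly 1 MB


def payload_size_bytes(rows):
    """Size of the JSON string once encoded as UTF-8 for the wire."""
    return len(json.dumps(rows).encode("utf-8"))


def fits_under_limit(rows, limit=ASSUMED_LIMIT_BYTES):
    """True when the serialized rows stay within the assumed limit."""
    return payload_size_bytes(rows) <= limit


small = [{"id": i, "label": "row"} for i in range(10)]
large = [{"id": i, "label": "x" * 100} for i in range(20_000)]

print(fits_under_limit(small))  # True
print(fits_under_limit(large))  # False
```

Logging this number inside the callback that fills the hidden div would show whether failures start once the payload crosses a particular size.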

gaw89 commented on May 12, 2024

I'll try to recreate an example. It is probably the request/response size limitation as you indicate. I'm a newb web-dev, hence why Dash has been fantastic for me!

I'm also checking with my server admin to see if they're imposing some kind of size limit.

Thanks for the speedy response.

chriddyp commented on May 12, 2024

Closing this. We have to do things differently than other frameworks because we support multiple processes. We have documented several ways to pass intermediate data around in https://dash.plot.ly/sharing-state-between-callbacks, and this will only get better with declarative client-side transformations (#266) and support for multiple outputs (#80).
