Export notebooks on save instead of polling and building them every few seconds.

Export notebooks on save event to avoid polling about pipalhub HOT 8 CLOSED

anandology commented on July 30, 2024

Export notebooks on save event to avoid polling

from pipalhub.

Comments (8)

anandology commented on July 30, 2024

I'm not able to get this post_save_hook to work. I think it might be related to how jupyterhub reads configuration for spawned instances, but I even tried putting the jupyter_notebook_config.py in the spawned user's ~/.jupyter/... and it's still unchanged.

I think it is important to figure out how to make that work. Let's dig a bit more.

from pipalhub.

anandology commented on July 30, 2024

I have used this snippet to export HTML automatically on save.

# ~/.jupyter/jupyter_notebook_config.py

import io
import os

from notebook.utils import to_api_path

# The exporter is created lazily on the first save and then reused.
_html_exporter = None


def html_post_save(model, os_path, contents_manager, **kwargs):
    """Export a notebook to HTML with nbconvert after each save."""
    from nbconvert.exporters.html import HTMLExporter

    # Only notebooks are exported; plain files and directories are skipped.
    if model['type'] != 'notebook':
        return

    global _html_exporter

    if _html_exporter is None:
        _html_exporter = HTMLExporter(parent=contents_manager)

    log = contents_manager.log

    # Write the HTML next to the notebook, e.g. foo.ipynb -> foo.html.
    base, ext = os.path.splitext(os_path)
    html, resources = _html_exporter.from_filename(os_path)
    html_fname = base + resources.get('output_extension', '.html')
    log.info("Saving HTML /%s", to_api_path(html_fname, contents_manager.root_dir))

    with io.open(html_fname, 'w', encoding='utf-8') as f:
        f.write(html)


# `c` is the config object that Jupyter provides when it loads this file.
c.FileContentsManager.post_save_hook = html_post_save

from pipalhub.

nikochiko commented on July 30, 2024

We need the export to have these things:

- A summary of each student's notebook
- An index from which all students' notebooks can be visited
- A single page for each student that displays the complete notebook

For the post_save_hook approach:

Components:
- Summary: We will have to handle this specially. A new save should not re-compile the whole summary file. At the same time, saves to multiple files should not create race conditions. I don't know if that is a case we should be worried about. The summary file would have to have separate unambiguous divs for each user that can be selected and specifically swapped out. There is some complexity - we would have to figure out how to build the first copy without any content.
- Index of students: This can be handled by simple nginx file serving
- Student page: Can be handled by nginx file serving
Pros:
- Easy to export as files
- Mostly already implemented
Cons:
- Some unknowns around race conditions to write the summary file
- Would depend on nginx and its configuration
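
One way to keep a save from re-converting every notebook is to store one pre-rendered fragment per student and reassemble the summary by concatenating those fragments. The sketch below assumes a hypothetical `build_dir` layout (`fragments/<username>.html` and `summary.html`) and hypothetical `summary-<username>` div ids; none of these names come from pipalhub. Each file is written to a temporary file and renamed into place, so nginx never serves a half-written page. It does not fully remove the race described above: two simultaneous saves can still leave the summary one update behind until the next save, and a file lock around `update_summary` would be needed to rule that out.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, text: str) -> None:
    """Write text to a temp file in the same directory, then rename it over path."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.chmod(tmp, 0o644)   # mkstemp creates 0600 files; make them readable by the web server
        os.replace(tmp, path)  # atomic when source and target are on the same filesystem
    except BaseException:
        os.unlink(tmp)
        raise


def update_summary(build_dir: Path, username: str, fragment_html: str) -> Path:
    """Store one student's fragment, then reassemble summary.html from all fragments."""
    frag_dir = build_dir / "fragments"
    frag_dir.mkdir(parents=True, exist_ok=True)
    atomic_write(frag_dir / f"{username}.html", fragment_html)

    # One unambiguous div per student, in a stable (sorted) order.
    sections = [
        f'<div id="summary-{p.stem}">\n{p.read_text(encoding="utf-8")}\n</div>'
        for p in sorted(frag_dir.glob("*.html"))
    ]
    summary = build_dir / "summary.html"
    atomic_write(summary, "<html><body>\n" + "\n".join(sections) + "\n</body></html>\n")
    return summary
```

With this layout the "first copy without any content" case needs no special handling: the summary is simply built from whatever fragments exist at that moment.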

Two other approaches that were discussed besides using a post_save_hook to directly generate HTML:

Something like notebook-html without an express build step, but rather creating the HTML/JS files with knowledge of each student's name.
- Components:
  - Summary: We will need to extend the code to be able to take a max number of code cells that we convert to HTML. This change is simple (add an extra param to settings and override the simpleBuildHTML method to use it).
  - Index of students: We will need a list of the students' user names to build this. We could either set up an HTTP endpoint that does this or write a single script that looks at a specific directory and returns the sub-directory names. I think the former is the better way to do it once we have the admin interface ready; until then it would be better to use a simple script.
  - Student page: We can return this content dynamically using a template. The notebook-html library can be used and we'll have a unique URL for each student.
- Pros:
  - No need to run build scripts in advance
  - Content is always fresh when reloaded
  - No need for a separate build component (endpoints can be part of the same webapp as notebook docs)
- Cons:
  - Summary page would load all students' complete notebooks into memory each time (from the URL fetch) before taking the first 10 cells out from it. This might have an impact on the instructors' experience.
  - Exporting as HTML files (for backups and sharing with students) would get harder. Or we'll have to use the Python exporter to do that.
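
The "simple script" for the index and the cell limit for the summary can both be sketched in a few lines of Python. This is only an illustration under assumptions: `notebooks_root` stands for a hypothetical directory with one sub-directory per student, and `first_code_cells` works on the notebook JSON directly rather than through notebook-html's settings.

```python
import json
from pathlib import Path
from typing import List


def list_students(notebooks_root: Path) -> List[str]:
    """Return the sorted names of the visible sub-directories, one per student."""
    return sorted(
        p.name
        for p in notebooks_root.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )


def first_code_cells(notebook: dict, max_code_cells: int = 10) -> dict:
    """Return a copy of a notebook that stops before code cell number max_code_cells + 1.

    Markdown cells that appear before the cut are kept; the input is not modified.
    """
    kept, seen = [], 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            if seen == max_code_cells:
                break
            seen += 1
        kept.append(cell)
    return {**notebook, "cells": kept}


def load_summary(path: Path, max_code_cells: int = 10) -> dict:
    """Read an .ipynb file from disk and truncate it for the summary page."""
    with path.open(encoding="utf-8") as f:
        return first_code_cells(json.load(f), max_code_cells)
```

Truncating on the server side like this would also avoid sending complete notebooks to the summary page, which is the first con listed above.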
Like post_save_hook, but with each cell stored as a SQL row and rendered dynamically
- Components:
  - Summary: Getting the summary for each student would be an SQL query and then the HTML exporter can convert that JSON to HTML. We would only load a limited amount of data.
  - Index: Would be served by an extra endpoint on notebook-html
  - Student page: Would be served by an endpoint on notebook-html
- Pros:
  - Everything is in database. Easy to play with
  - Rendered server-side, so easy to export
  - Easy to scale to a very large number of users
- Cons:
  - Implementation would be complex. We'd have to deal with edge cases and need to test it.
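
To make the cell-per-row idea concrete, here is a minimal sketch using SQLite from the standard library. The table name, column names, and helper functions are all hypothetical; the thread does not specify a schema or a database.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS cells (
    username  TEXT    NOT NULL,
    notebook  TEXT    NOT NULL,
    position  INTEGER NOT NULL,
    cell_json TEXT    NOT NULL,
    PRIMARY KEY (username, notebook, position)
)
"""


def save_notebook(db: sqlite3.Connection, username: str, name: str, notebook: dict) -> None:
    """Replace all stored cells of one notebook in a single transaction."""
    rows = [
        (username, name, i, json.dumps(cell))
        for i, cell in enumerate(notebook.get("cells", []))
    ]
    with db:  # commits on success, rolls back if an exception is raised
        db.execute("DELETE FROM cells WHERE username = ? AND notebook = ?", (username, name))
        db.executemany("INSERT INTO cells VALUES (?, ?, ?, ?)", rows)


def summary_cells(db: sqlite3.Connection, username: str, name: str, limit: int = 10) -> list:
    """Load only the first `limit` cells of a notebook, in notebook order."""
    rows = db.execute(
        "SELECT cell_json FROM cells WHERE username = ? AND notebook = ? "
        "ORDER BY position LIMIT ?",
        (username, name, limit),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]
```

A post_save_hook could call something like `save_notebook` after each save, and the summary endpoint would then read a bounded number of rows per student instead of whole notebooks.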

from pipalhub.

nikochiko commented on July 30, 2024

I think the most reliable way to move ahead would be to use notebook-html with JavaScript. It has the fewest unknowns and the instructor is guaranteed to have a fresh copy.
For file exports, we can use Python's HTMLExporter or write some logic on the frontend to create a download button for each file after loading it.

from pipalhub.

nikochiko commented on July 30, 2024

I wasn't able to get this exact approach to work with extensions. As an alternative, we could use a JupyterHub service instead: https://jupyterhub.readthedocs.io/en/stable/reference/services.html
and make a simple Flask app that serves the desired pages. The Flask process would be managed by JupyterHub (we won't need custom start/stop logic or systemctl) and proxied at a /services/{service_name} URL. I have tested that it works.
The notebook content cannot be fetched directly as an IPYNB; it needs an authenticated HTTP request. Because we don't want to share the auth token with the frontend, we can delegate that part to a separate unauthenticated endpoint on our service, which would in turn fetch the content with its own auth token or read it from the file system directly.
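
A rough sketch of the two pieces described here: the entry that registers a hub-managed service, and a helper that reads a notebook from the file system. The `name`, `url`, and `command` keys are JupyterHub's documented service settings; the service name, port, script path, and the one-home-directory-per-user layout are assumptions made for illustration.

```python
import json
import sys
from pathlib import Path


def export_service_config(script_path: str, port: int = 10101) -> dict:
    """Build one entry for c.JupyterHub.services (a hub-managed service)."""
    return {
        "name": "notebook-export",                 # proxied at /services/notebook-export/
        "url": f"http://127.0.0.1:{port}",         # address the Flask app listens on
        "command": [sys.executable, script_path],  # process started and stopped by JupyterHub
    }


def read_notebook(home_root: Path, username: str, filename: str) -> dict:
    """Load a student's notebook from disk, refusing paths outside their home directory."""
    root = home_root.resolve()
    user_home = (root / username).resolve()
    path = (user_home / filename).resolve()
    inside_home = user_home.parent == root and user_home in path.parents
    if not inside_home or path.suffix != ".ipynb":
        raise ValueError(f"refusing to read {filename!r} for user {username!r}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# In jupyterhub_config.py, where `c` is provided by JupyterHub (path is a placeholder):
# c.JupyterHub.services = [export_service_config("/path/to/export_service.py")]
```

Because the endpoint that serves notebook content would be unauthenticated, the path check in `read_notebook` matters: it keeps a crafted username or filename from reaching files outside the student directories.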

from pipalhub.

anandology commented on July 30, 2024

We need the export to have these things:

* A summary of each student's notebook

* An index from which all students' notebooks can be visited

* A single page for each student that displays the complete notebook

We already do these things as part of the build process. The issue is it is done repeatedly every couple of seconds.

I think it would be easier to do the same process on every save and try to optimize from there rather than taking up a completely new approach.
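
As a sketch of what "the same process on every save" could look like, the hook below starts the existing build whenever a notebook is saved. The build command is a placeholder; the thread does not say how pipalhub's build is invoked.

```python
import subprocess
import sys


def make_build_hook(build_command):
    """Return a post_save_hook that starts build_command after each notebook save."""

    def run_build_on_save(model, os_path, contents_manager, **kwargs):
        # Ignore saves of plain files and directories.
        if model["type"] != "notebook":
            return None
        contents_manager.log.info("Rebuilding export after save of %s", os_path)
        # Popen returns immediately, so the save request is not blocked by the build.
        return subprocess.Popen(build_command)

    return run_build_on_save


# In jupyter_notebook_config.py, with a placeholder command:
# c.FileContentsManager.post_save_hook = make_build_hook(["/path/to/build-script"])
```

This still rebuilds everything, only now once per save instead of every few seconds; narrowing the build to the saved notebook would be the optimization step mentioned above.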

from pipalhub.

nikochiko commented on July 30, 2024

I'm not able to get this post_save_hook to work. I think it might be related to how jupyterhub reads configuration for spawned instances, but I even tried putting the jupyter_notebook_config.py in the spawned user's ~/.jupyter/... and it's still unchanged.

from pipalhub.

nikochiko commented on July 30, 2024

This is done.
https://github.com/pipalacademy/pipalhub/compare/3871d55..c2ff9580c38dbb98c816344798614d0507b17706

from pipalhub.
