bpt_app's Introduction

Hi, my name is Sage Hahn

  • 🔭 I'm a Data Scientist / Developer currently working at DeepHealth.
  • ✨ Check out some of my projects pinned below, or on my website.

bpt_app's People

Contributors

sahahn

Forkers

harel-coffee

bpt_app's Issues

First time loading bug

On first load, the app will try to get the ML_options before setup_info.py has been run. Need to change the order of operations so that ML_options are not loaded until after setup has been called.
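One way to enforce that ordering is to make the options load lazily behind a setup guard. This is only a sketch: the `AppState` class and the two callables passed to it are hypothetical stand-ins, not the app's actual code.

```python
class AppState:
    """Hypothetical holder that runs setup before ML options are ever loaded."""

    def __init__(self, run_setup, load_ml_options):
        self._run_setup = run_setup              # stand-in for running setup_info.py
        self._load_ml_options = load_ml_options  # stand-in for reading ML_options
        self._setup_done = False
        self._ml_options = None

    def ensure_setup(self):
        # Run setup at most once, no matter how many callers ask for it.
        if not self._setup_done:
            self._run_setup()
            self._setup_done = True

    def get_ml_options(self):
        # Setup is always forced first, then the options are loaded and memoized.
        self.ensure_setup()
        if self._ml_options is None:
            self._ml_options = self._load_ml_options()
        return self._ml_options


calls = []


def fake_setup():
    calls.append("setup")


def fake_load_ml_options():
    calls.append("ml_options")
    return {"models": []}


state = AppState(fake_setup, fake_load_ml_options)
options = state.get_ml_options()
```

With this shape, any code path that asks for the options first still triggers setup before the load.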

Support for submitting jobs to remote clusters

Add in support for integrating remote clusters, i.e., ideally, you could set it up to submit jobs to a cluster. Implementation would likely be similar to the VACC_EXT setup, but could involve something different (e.g., maybe running the full docker setup + app on an interactive slurm job? Can you submit jobs from within interactive slurm jobs?).

Show 'X' not preserved on new set search

When searching for a new set, if the user had previously changed the DataTable to display more than 5 entries, this choice will be reset back to 5 upon a new search. Should look into propagating the user's choice to a new search.

Logs on start screen

Add an explicit log for data loading / proc changes that can be viewed instead of the waiting screen. Also, change the start screen to look a bit better, especially since it will pop up first every time. So maybe have it start with something like: "Checking for changes to the underlying dataset!"

Better logs for data loading

Have the logs tell you where the latest ML object from a save is stored. Or rather, there should be an option somewhere to download the pickled ML object, so one could, for example, use the GUI just for data loading.

Datatables sometimes disappearing

When switching between project tabs, sometimes when both tabs have a datatable loaded, going back to the first tab (i.e., a loaded set) will cause that table to appear empty until it is re-drawn (by switching pages or searching).

Caching w/ data loading

Changing the event shortname still seems to break data loading caching, i.e., it will still load an incorrectly cached previous copy.
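A common cause of this kind of stale hit is a cache key that leaves out one of the inputs. The sketch below builds the key from every loading parameter, including the shortname. The function name and parameter names are hypothetical and do not reflect how the app currently keys its cache.

```python
import hashlib
import json


def loading_cache_key(dataset, variable, event_name, event_shortname):
    """Build a cache key from every input that affects the loaded result (hypothetical)."""
    payload = json.dumps(
        {
            "dataset": dataset,
            "variable": variable,
            "event_name": event_name,
            "event_shortname": event_shortname,
        },
        sort_keys=True,  # stable ordering so equal inputs always hash the same
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_before = loading_cache_key("my_dataset", "age", "baseline_year_1", "base")
key_after = loading_cache_key("my_dataset", "age", "baseline_year_1", "bl")
```

Because the shortname is part of the hashed payload, renaming it produces a new key and the old cached copy is simply never looked up.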

GUI upload datasets

Create a nice GUI for uploading custom datasets, i.e., with more controls and flexibility for comma-separated vs. tab-separated files, and options for which column is the subject id and which is the event name.

Related would be a GUI screen to see which datasets are available and maybe delete some.
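The backend side of those upload controls could look roughly like the following, using only the standard `csv` module. The function name, its defaults, and the column names in the example are illustrative assumptions.

```python
import csv
import io


def read_uploaded_dataset(text, delimiter=",", subject_col="subject_id", event_col="eventname"):
    """Parse uploaded text into {(subject, event): {column: value}} (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = {subject_col, event_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required column(s): {sorted(missing)}")

    rows = {}
    for row in reader:
        # The user-chosen subject and event columns become the row index.
        key = (row.pop(subject_col), row.pop(event_col))
        rows[key] = row
    return rows


tab_text = "src_subject_id\teventname\tage\nS1\tbaseline\t10\nS2\tbaseline\t11\n"
parsed = read_uploaded_dataset(
    tab_text, delimiter="\t", subject_col="src_subject_id", event_col="eventname"
)
```

Values are left as strings here; type inference would be a separate step.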

Comparing two (or more) completed jobs

Interface/option to compare the results from two different runs. Maybe some requirement like they both need to be on the same target, and either both Evaluate or Test?

Would involve mostly generating tables? Or some plotting? Not sure.

Only show imputer if any NaN

Along the lines of automating things that people shouldn't need to think about, add in the automatic hiding of imputers if no NaN data is loaded.
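A minimal sketch of the check, assuming loaded data is available as a mapping of column name to values and that missing entries are float NaN. The piece names in the list are placeholders, not the app's actual pipeline pieces.

```python
import math


def has_missing_values(columns):
    """Return True if any loaded value is a float NaN."""
    return any(
        isinstance(value, float) and math.isnan(value)
        for values in columns.values()
        for value in values
    )


def visible_pipeline_pieces(columns, pieces=("imputers", "scalers", "model")):
    """Hide the (placeholder) 'imputers' piece when there is nothing to impute."""
    if has_missing_values(columns):
        return list(pieces)
    return [piece for piece in pieces if piece != "imputers"]


complete = {"age": [10.0, 11.0], "sex": ["F", "M"]}
with_gaps = {"age": [10.0, float("nan")], "sex": ["F", "M"]}
```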

Build in explicitly a single / multiuser mode

A multi-user mode will eventually be put on the DEAP. This means that a number of settings need to be quickly enabled/disabled based on which mode is being run.

This includes:

  • Linking to a different Sets page
  • Removing the choice of dataset
  • Changing how variables are selected
  • Changing the helper text in some cases
  • Changing how data is loaded
  • Removing the load database check + associated pieces
  • Likely more...

Add settings dataset name

On the settings page, for adding short event names, include an indicator for which dataset has that event name (how to handle cases where multiple datasets have the same one?)

Visual feedback on pressing save projects

When clicking save projects, it would be nice to have the button change for a second, just to provide some visual feedback that clicking the button actually did something.

Loading set variable display issues

Figure out a fix for what to display when loading a single set variable versus when the whole set is loaded. E.g., if a filter is set on the whole set, then the log will be flooded with before + after values for every variable in the set.

Caching settings

Add the different caching options to the settings page. Make caching more transparent, and add optional storage limits.
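One possible shape for an optional storage limit is a least-recently-used cache that evicts entries once a byte budget is exceeded. This is a sketch of the general technique only. The class is hypothetical, and it is in-memory, whereas the app's cache may well live on disk.

```python
from collections import OrderedDict


class SizeLimitedCache:
    """In-memory LRU cache bounded by total bytes stored (illustrative only)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._items = OrderedDict()  # key -> bytes, oldest entry first

    def put(self, key, data):
        # Replace any existing entry so its size is not counted twice.
        if key in self._items:
            self.total_bytes -= len(self._items.pop(key))
        self._items[key] = data
        self.total_bytes += len(data)
        # Evict least recently used entries until back under the limit.
        while self.total_bytes > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.total_bytes -= len(evicted)

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]


cache = SizeLimitedCache(max_bytes=10)
cache.put("a", b"123456")
cache.put("b", b"abcdef")  # pushes the total to 12 bytes, so "a" is evicted
```

Note that an entry larger than the whole budget is evicted immediately, which may or may not be the desired behavior.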

Deleting temp jobs

Temp jobs should be deleted somewhat regularly, to avoid taking up space, and also to handle cases where silent errors occur. One way of doing this is how validation jobs work, where if the output already exists at the start of the job, it is deleted. This won't work for jobs with names, though.
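Both strategies can be sketched with the standard library: resetting an output directory at the start of a job, and an age-based sweep that would also cover named jobs. The function names, directory layout, and age cutoff are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path


def reset_job_dir(job_dir):
    """Delete any existing output for this job, then recreate an empty directory."""
    job_dir = Path(job_dir)
    if job_dir.exists():
        shutil.rmtree(job_dir)
    job_dir.mkdir(parents=True)
    return job_dir


def delete_stale_jobs(temp_root, max_age_seconds, now=None):
    """Remove job directories not modified within max_age_seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for entry in Path(temp_root).iterdir():
        if entry.is_dir() and now - entry.stat().st_mtime > max_age_seconds:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return sorted(removed)
```

The sweep could be run on app start or on a timer. It keys on modification time rather than job name, so it does not depend on knowing which names are in use.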

Filter by percent and outlier

Right now the UI makes it seem like you can filter by both outlier + std. Make sure that the UI reflects the actual behavior, or decide on what the best behavior might be. Also make sure it is well documented in the help string.

Saving / importing / sharing pipelines

Add in support for saving pipelines, both in a version shared across multiple users and just between projects. E.g., a user should be able to "import" a pipeline from a different project into their current one.

Better ensemble support

Better integration / support / controls + options for integrating Ensembles. E.g., DES split, stacking regressor, etc...

Should have automatic detection of base model type, and show relevant options. Should be able to specify the model responsible for stacking for example.

Develop more in-depth help pages

An interesting idea would be to add in a separate set of pages which could be filled in with more advanced descriptions of certain things. E.g., descriptions on considerations for cross-validation w/ pictures or whatever.

Automatically detect if parameter search is needed

Change the behavior so that parameter search starts as None, but when any set of params requiring a search or a Select is specified, the search is automatically changed to RandomSearch.

Could alternatively change it to be a warning, i.e., cause a visual change, and have that pipeline not appear as valid.
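The first option could be sketched as below. How the app actually represents parameter distributions and Select is not shown here, so this example assumes, purely for illustration, that a value given as a list or tuple of more than one candidate means a search is required.

```python
def params_need_search(params):
    """Assumption: a list/tuple with several candidates stands in for a distribution or Select."""
    return any(
        isinstance(value, (list, tuple)) and len(value) > 1
        for value in params.values()
    )


def resolve_search(pipeline_params, current_search=None):
    """Switch the search from None to 'RandomSearch' when any piece requires one."""
    search_needed = any(params_need_search(p) for p in pipeline_params.values())
    if current_search is None and search_needed:
        return "RandomSearch"
    return current_search


fixed_only = {"model": {"alpha": 1.0}}
with_candidates = {"model": {"alpha": [0.1, 1.0, 10.0]}}
```

An explicit user choice is left untouched. Only the default of None is upgraded.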

Improve look of Sets page

Improve the look and feel of the Sets page, e.g., right now variable names can easily be too long, the top dataset selector is a little clunky, etc.

HTML entrypoints

Right now, most of the logic is handled in JavaScript. It could be helpful in the future to add meaningful entry points, e.g., /project_name/page.

Multiple logins

Investigate how to handle multiple tabs/windows open from the same user? Right now, this behavior will likely just break things in unexpected ways. Not sure the best way to fix it, seems like it would require more frequent communication with the server / re-writing a lot of how things are currently stored.

Add control for merge behavior

Current merge behavior is fixed as computing the inner overlap of subjects across the different pieces of loaded data. Alternatively, could give the user control to set any non-overlapping subjects' data to NaN and still keep those subjects, i.e., an outer merge.
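The difference between the two behaviors can be sketched in plain Python, assuming each loaded table is a mapping of subject id to a row of values. The function name and the data layout are hypothetical.

```python
import math


def merge_subjects(tables, how="inner"):
    """Merge tables of {subject: {column: value}} on subject id.

    'inner' keeps only subjects present in every table; 'outer' keeps all
    subjects and fills the columns they are missing with NaN.
    """
    if not tables:
        return {}
    id_sets = [set(table) for table in tables]
    if how == "inner":
        subjects = set.intersection(*id_sets)
    elif how == "outer":
        subjects = set.union(*id_sets)
    else:
        raise ValueError(f"Unknown merge type: {how!r}")

    merged = {}
    for subject in sorted(subjects):
        row = {}
        for table in tables:
            columns = sorted({col for r in table.values() for col in r})
            for col in columns:
                # Subjects absent from this table get NaN for its columns.
                row[col] = table.get(subject, {}).get(col, math.nan)
        merged[subject] = row
    return merged


ages = {"s1": {"age": 10}, "s2": {"age": 11}}
scores = {"s2": {"iq": 100}, "s3": {"iq": 90}}
inner = merge_subjects([ages, scores], how="inner")
outer = merge_subjects([ages, scores], how="outer")
```

Here `inner` keeps only `s2`, while `outer` keeps all three subjects with NaN where a table had no entry for them.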

More info on jobs in results

There should potentially be more entries in the main table, maybe Elapsed? But also, when opening a job, it should allow the user to see more detailed information on how that job was run, e.g., what pipeline was used, etc...

Add support for Feature Importances

This constitutes a fairly large effort and might involve, to some extent, changes on the BPt side of things. Thinking now that the place to specify which feature importances to calculate should be on the Evaluate tab. In whatever ways possible, the options should be "smartly" generated, i.e., not displaying irrelevant params. Also involved is adding a section to the results for each job to view the feature importances in different ways.

NaN threshold for loading sets

For loading sets, add in support for a NaN threshold like the one currently implemented in base BPt in Load_Data. This would also involve improved support for printing information about patterns of NaN.
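As a rough illustration of the idea, the sketch below drops subjects whose fraction of missing values exceeds a threshold and reports which ones were dropped. Applying the threshold per subject, the function name, and the data layout are all assumptions here and may not match how Load_Data defines it.

```python
import math


def is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def apply_nan_threshold(rows, threshold):
    """Split {subject: {column: value}} into kept rows and a list of dropped subjects.

    A subject is dropped when its fraction of NaN values is above threshold.
    """
    kept, dropped = {}, []
    for subject, row in rows.items():
        nan_fraction = sum(is_nan(v) for v in row.values()) / len(row) if row else 0.0
        if nan_fraction > threshold:
            dropped.append(subject)
        else:
            kept[subject] = row
    return kept, dropped


nan = float("nan")
set_rows = {
    "s1": {"v1": 1.0, "v2": 2.0, "v3": 3.0, "v4": 4.0},
    "s2": {"v1": nan, "v2": 2.0, "v3": 3.0, "v4": 4.0},
    "s3": {"v1": nan, "v2": nan, "v3": nan, "v4": 4.0},
}
kept, dropped = apply_nan_threshold(set_rows, threshold=0.5)
```

The returned `dropped` list is the kind of information that could be printed in the loading log.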

Quick copy pipeline

It would be nice to have a feature where you could choose to make a copy of an existing pipeline, e.g., similar to how param dists are set up. This could be helpful when only trying to change a few small things.

Re-visit param caching

In some cases, the current implementation might not be working 100% correctly. Also, should change it so param caching is dataset-specific rather than global.
