On first-time loading, the app will try to get the ML_options before setup_info.py has been run. The order of operations needs to change so that ML_options are not loaded until after setup has been called.
Add support for integrating remote clusters, i.e., ideally, one could set it up to submit jobs to a cluster. The implementation would likely be similar to the VACC_EXT setup, but could involve something different (e.g., maybe running the full Docker setup + app on an interactive Slurm job? Can jobs be submitted from within interactive Slurm jobs?)
When searching for a new set, if the user had previously changed the DataTable to display more than 5 entries, that choice is reset back to 5 upon a new search. Look into propagating the user's choice to the new search.
Add an explicit log for data loading / processing changes that can be viewed instead of the waiting screen. Also, make the start screen look a bit better, especially since it will pop up first every time; maybe have it start with something like: "Checking for changes to the underlying dataset!"
Have the logs report where the latest ML object from a save is stored. Or rather, there should be an option somewhere to download the pickled ML object, so one could, for example, use the GUI just for data loading.
When switching between project tabs where both tabs have a DataTable loaded, going back to the first tab (i.e., a loaded set) will sometimes cause that table to appear empty until it is re-drawn (by switching pages or searching).
Create a nice GUI interface for uploading custom datasets, i.e., with more controls and flexibility for comma-separated vs. tab-separated files, options for which column is the subject ID, and which is the event name.
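The backend for such an upload screen could be a thin wrapper over pandas, with the delimiter and index columns exposed as the GUI's controls. A sketch under assumed conventions (the function name, parameter names, and the multi-index layout are all hypothetical):

```python
import io

import pandas as pd

def load_custom_dataset(raw_text, sep=",", subject_col="subject_id",
                        event_col=None):
    """Parse an uploaded delimited file with user-chosen options.

    sep: "," for CSV or "\t" for TSV; subject_col / event_col are the
    user's picks for the subject-ID and event-name columns.
    """
    df = pd.read_csv(io.StringIO(raw_text), sep=sep)
    df = df.set_index(subject_col)
    if event_col is not None:
        # Keep the event name as a second index level alongside the subject.
        df = df.set_index(event_col, append=True)
    return df

# Example: a small tab-separated upload
raw = "subj\teventname\tscore\nS1\tbaseline\t1.5\nS2\tbaseline\t2.0\n"
df = load_custom_dataset(raw, sep="\t", subject_col="subj",
                         event_col="eventname")
```

Each GUI control then maps one-to-one onto a keyword argument, which keeps the upload form easy to extend (e.g., adding an encoding or header-row option later).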
Relatedly, a GUI screen to see which datasets are available, and maybe delete some?
Interface/option to compare the results from two different runs. Maybe with some requirement like both needing to be on the same target, and either both Evaluate or both Test?
This would mostly involve generating tables? Or some plotting? Not sure.
A multi-user mode will eventually be put on the DEAP. This means that a number of settings need to be quickly enabled/disabled based on which mode is being run.
This includes:
Linking to a different Sets page
Removing the choice of dataset
Changing how variables are selected
Changing the helper text in some cases
Changing how data is loaded
Removing the load database check + associated pieces
On the settings page, for adding short event names, include an indicator for which dataset has that event name (how should cases where multiple datasets share the same event name be handled?)
When clicking Save Projects, it would be nice to have the button change for a second, just to provide some visual feedback that clicking it actually did something.
Figure out what to display when loading a single set variable while the whole set is loaded; e.g., if a filter is set on the whole set, the log will be flooded with before + after values for every variable in the set.
Temp jobs should be deleted somewhat regularly, to avoid using up space and to handle cases where silent errors occur. One way of doing this is how validation jobs work, where if the output already exists at the start of the job, it is deleted. This won't work for jobs with names, though.
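An alternative to the delete-on-restart trick is a periodic age-based sweep, which also catches jobs that errored silently. A minimal sketch (the function name, directory layout, and 24-hour cutoff are assumptions, not the app's current behavior); named jobs would still need a separate retention policy:

```python
import os
import time

def clean_temp_jobs(temp_dir, max_age_hours=24):
    """Delete temp-job files older than max_age_hours.

    Run on a schedule or at app start; returns the names removed so
    the sweep can be logged.
    """
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for entry in os.scandir(temp_dir):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return removed
```

Because it keys off modification time rather than job state, this also reclaims space from jobs that died without writing any completion marker.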
For some reason, it seems that one of the merge settings isn't working quite as intended; e.g., ended up with overlapping subjects at the same timepoint in OVERLAP0.csv.
Right now the UI makes it seem like you can filter by both outlier + std; make sure the UI reflects the actual behavior / decide on what the best behavior might be. Also make sure it is well documented in the help string.
Add support for sharing saved pipelines, both across multiple users and between projects; e.g., a user should be able to "import" a pipeline from a different project into their current one.
Better integration / support / controls + options for Ensembles, e.g., DES split, stacking regressor, etc.
There should be automatic detection of the base model type, showing only the relevant options. It should be possible, for example, to specify the model responsible for stacking.
An interesting idea would be to add a separate set of pages that could be filled in with more advanced descriptions of certain things, e.g., considerations for cross-validation w/ pictures or whatever.
Change behavior s.t. the parameter search starts as None, but when any set of params requiring a search or Select is specified, the search automatically changes to RandomSearch.
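The defaulting rule could live in one small resolver that runs whenever the pipeline's params change. A sketch under assumed conventions (the `Param` class, its `needs_search` flag, and the string search names are all hypothetical, not the app's real objects):

```python
class Param:
    """Hypothetical stand-in for one pipeline parameter choice."""
    def __init__(self, name, needs_search=False):
        self.name = name
        # True if this value is a distribution / Select requiring a search.
        self.needs_search = needs_search

def resolve_search_type(current_search, params):
    """Start search as None; auto-upgrade to RandomSearch only when
    some param actually requires a search. Never override an explicit
    user choice."""
    if current_search is not None:
        return current_search
    if any(p.needs_search for p in params):
        return "RandomSearch"
    return None
```

The warning-based alternative mentioned below would reuse the same check, flagging the pipeline as invalid instead of silently changing the search type.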
Could alternatively make it a warning, i.e., cause a visual change and have that pipeline not appear as valid.
Every by-val, from train-only subjects to inclusions / exclusions, should support multiple values (instead of just a single value as currently implemented).
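One low-churn way to support this is to normalize every by-val input to a list at the boundary, so downstream code never cares whether the user gave one value or several. A sketch with hypothetical names (`as_values`, `apply_exclusions`, and the dict-of-subjects shape are illustrative only):

```python
def as_values(val):
    """Normalize a by-val input to a list, so fields can accept either
    a single value or several (e.g., multiple exclusion values)."""
    if val is None:
        return []
    if isinstance(val, (list, tuple, set)):
        return list(val)
    return [val]

def apply_exclusions(subjects, excluded):
    """Drop any subject whose value matches one of the excluded values."""
    excluded = set(as_values(excluded))
    return {s: v for s, v in subjects.items() if v not in excluded}
```

Since single values still work unchanged, existing saved projects would not need migrating.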
Investigate how to handle multiple tabs/windows open from the same user? Right now, this behavior will likely just break things in unexpected ways. Not sure the best way to fix it, seems like it would require more frequent communication with the server / re-writing a lot of how things are currently stored.
The current merge behavior is fixed to computing the inner overlap of subjects across different loadings. Alternatively, the user could be given control to keep non-overlapping subjects and set their missing data to NaN, i.e., an outer merge.
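In pandas terms the two behaviors are just the `join` argument to `concat`, so exposing the choice may be a one-parameter change wherever the merge happens (the tiny frames below are illustrative only):

```python
import pandas as pd

# Two loads with partially overlapping subjects
d1 = pd.DataFrame({"x": [1, 2]}, index=["S1", "S2"])
d2 = pd.DataFrame({"y": [3, 4]}, index=["S2", "S3"])

# Current behavior: keep only subjects present in every load.
inner = pd.concat([d1, d2], axis=1, join="inner")

# Proposed option: keep all subjects, filling non-overlap with NaN.
outer = pd.concat([d1, d2], axis=1, join="outer")
```

The outer variant keeps S1 and S3 with NaN in the columns they lack, which pairs naturally with the NaN-threshold idea for loading sets.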
There should potentially be more entries in the main table, maybe Elapsed? Also, when opening a job, the user should be able to see more detailed information on how that job was run, e.g., which pipeline was used, etc.
This constitutes a fairly large effort and might, to some extent, involve changes on the BPt side of things. The current thinking is that the place to specify which feature importances to calculate should be the Evaluate tab. Wherever possible, the options should be "smartly" generated, i.e., not displaying irrelevant params. Also involved is adding a section to each job's results for viewing the feature importances in different ways.
For loading sets, add support for a NaN threshold like the one currently implemented in base BPt's Load_Data. This would also involve improved support for printing information about patterns of NaN.
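The core of the idea can be sketched with pandas' `dropna(thresh=...)`; how base BPt's Load_Data parameterizes its threshold may differ, so treat the 50% cutoff and variable names below as illustrative assumptions:

```python
import numpy as np
import pandas as pd

# A small set with scattered missing values
df = pd.DataFrame({
    "v1": [1.0, np.nan, 3.0],
    "v2": [np.nan, np.nan, 6.0],
    "v3": [7.0, 8.0, 9.0],
}, index=["S1", "S2", "S3"])

# Keep only subjects with at least nan_thresh non-NaN values across
# the set's variables (here: at least 50% present).
nan_thresh = 0.5
min_present = int(np.ceil(nan_thresh * df.shape[1]))
keep = df.dropna(thresh=min_present)

# Per-variable NaN counts, a start on printing NaN-pattern info
nan_counts = df.isna().sum()
```

Summaries like `nan_counts` (or `df.isna().groupby(...)` patterns) could back the improved NaN-pattern printing mentioned above.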
It would be nice to have a feature for making a copy of an existing pipeline, similar to how param dists are set up; this could be helpful when only trying to change a few small things.
In some cases, the current implementation might not be working 100% correctly. Also, param caching should be changed to be dataset-specific rather than global.
Sometimes upon init/refresh, the distribution name does not properly appear above model / object pieces. This bug is difficult to re-create, though.