On first-time loading, the app will try to get the ML_options before setup_info.py has been run. The order of operations needs to change so that ML_options are not loaded until after setup has been called.
Add support for integrating remote clusters, i.e., ideally, one could set it up to submit jobs to a cluster. The implementation would likely be similar to the VACC_EXT setup, but could involve something different (e.g., maybe running the full Docker setup + app on an interactive Slurm job? Can jobs be submitted from within interactive Slurm jobs?)
When searching for a new set, if the user had previously changed the DataTable to display more than 5 entries, that choice is reset back to 5 upon a new search. Look into propagating the user's choice to the new search.
Add an explicit log for data loading / processing changes that can be viewed instead of the waiting screen. Also, make the start screen look a bit better, especially since it will pop up first every time; maybe have it start with something like: "Checking for changes to the underlying dataset!"
Have the logs report where the latest ML object from a save is stored. Or rather, there should be an option somewhere to download the pickled ML object, so one could, for example, use the GUI just for data loading.
When switching between project tabs where both tabs have a DataTable loaded, going back to the first tab (i.e., a loaded set) will sometimes cause that table to appear empty until it is re-drawn (by switching pages or searching).
Create a nice GUI interface for uploading custom datasets, i.e., with more controls and flexibility for comma-separated vs. tab-separated files, options for which column is the subject ID, and which is the event name.
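The backend for such an upload screen could be a thin wrapper over pandas, with the delimiter and index columns exposed as the GUI's controls. A sketch under assumed conventions (the function name, parameter names, and the multi-index layout are all hypothetical):

```python
import io

import pandas as pd

def load_custom_dataset(raw_text, sep=",", subject_col="subject_id",
                        event_col=None):
    """Parse an uploaded delimited file with user-chosen options.

    sep: "," for CSV or "\t" for TSV; subject_col / event_col are the
    user's picks for the subject-ID and event-name columns.
    """
    df = pd.read_csv(io.StringIO(raw_text), sep=sep)
    df = df.set_index(subject_col)
    if event_col is not None:
        # Keep the event name as a second index level alongside the subject.
        df = df.set_index(event_col, append=True)
    return df

# Example: a small tab-separated upload
raw = "subj\teventname\tscore\nS1\tbaseline\t1.5\nS2\tbaseline\t2.0\n"
df = load_custom_dataset(raw, sep="\t", subject_col="subj",
                         event_col="eventname")
```

Each GUI control then maps one-to-one onto a keyword argument, which keeps the upload form easy to extend (e.g., adding an encoding or header-row option later).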
Relatedly, a GUI screen to see which datasets are available, and maybe delete some?
Interface/option to compare the results from two different runs. Maybe with some requirement like both needing to be on the same target, and either both Evaluate or both Test?
This would mostly involve generating tables? Or some plotting? Not sure.
A multi-user mode will eventually be put on the DEAP. This means that a number of settings need to be quickly enabled/disabled based on which mode is being run.
This includes:
Linking to a different Sets page
Removing the choice of dataset
Changing how variables are selected
Changing the helper text in some cases
Changing how data is loaded
Removing the load database check + associated pieces
On the settings page, for adding short event names, include an indicator for which dataset has that event name (how should cases where multiple datasets share the same event name be handled?)
When clicking Save Projects, it would be nice to have the button change for a second, just to provide some visual feedback that clicking it actually did something.
Figure out what to display when loading a single set variable while the whole set is loaded; e.g., if a filter is set on the whole set, the log will be flooded with before + after values for every variable in the set.
Temp jobs should be deleted somewhat regularly, to avoid using up space and to handle cases where silent errors occur. One way of doing this is how validation jobs work, where if the output already exists at the start of the job, it is deleted. This won't work for jobs with names, though.
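An alternative to the delete-on-restart trick is a periodic age-based sweep, which also catches jobs that errored silently. A minimal sketch (the function name, directory layout, and 24-hour cutoff are assumptions, not the app's current behavior); named jobs would still need a separate retention policy:

```python
import os
import time

def clean_temp_jobs(temp_dir, max_age_hours=24):
    """Delete temp-job files older than max_age_hours.

    Run on a schedule or at app start; returns the names removed so
    the sweep can be logged.
    """
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for entry in os.scandir(temp_dir):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return removed
```

Because it keys off modification time rather than job state, this also reclaims space from jobs that died without writing any completion marker.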
For some reason, it seems that one of the merge settings isn't working quite as intended; e.g., ended up with overlapping subjects at the same timepoint in OVERLAP0.csv.
Right now the UI makes it seem like you can filter by both outlier + std; make sure the UI reflects the actual behavior / decide on what the best behavior might be. Also make sure it is well documented in the help string.
Add support for sharing saved pipelines, both across multiple users and between projects; e.g., a user should be able to "import" a pipeline from a different project into their current one.
Better integration / support / controls + options for Ensembles, e.g., DES split, stacking regressor, etc.
There should be automatic detection of the base model type, showing only the relevant options. It should be possible, for example, to specify the model responsible for stacking.
An interesting idea would be to add a separate set of pages that could be filled in with more advanced descriptions of certain things, e.g., considerations for cross-validation w/ pictures or whatever.
Change behavior s.t. the parameter search starts as None, but when any set of params requiring a search or Select is specified, the search automatically changes to RandomSearch.
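The defaulting rule could live in one small resolver that runs whenever the pipeline's params change. A sketch under assumed conventions (the `Param` class, its `needs_search` flag, and the string search names are all hypothetical, not the app's real objects):

```python
class Param:
    """Hypothetical stand-in for one pipeline parameter choice."""
    def __init__(self, name, needs_search=False):
        self.name = name
        # True if this value is a distribution / Select requiring a search.
        self.needs_search = needs_search

def resolve_search_type(current_search, params):
    """Start search as None; auto-upgrade to RandomSearch only when
    some param actually requires a search. Never override an explicit
    user choice."""
    if current_search is not None:
        return current_search
    if any(p.needs_search for p in params):
        return "RandomSearch"
    return None
```

The warning-based alternative mentioned below would reuse the same check, flagging the pipeline as invalid instead of silently changing the search type.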
Could alternatively make it a warning, i.e., cause a visual change and have that pipeline not appear as valid.
Every by-val, from train-only subjects to inclusions / exclusions, should support multiple values (instead of just a single value as currently implemented).
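One low-churn way to support this is to normalize every by-val input to a list at the boundary, so downstream code never cares whether the user gave one value or several. A sketch with hypothetical names (`as_values`, `apply_exclusions`, and the dict-of-subjects shape are illustrative only):

```python
def as_values(val):
    """Normalize a by-val input to a list, so fields can accept either
    a single value or several (e.g., multiple exclusion values)."""
    if val is None:
        return []
    if isinstance(val, (list, tuple, set)):
        return list(val)
    return [val]

def apply_exclusions(subjects, excluded):
    """Drop any subject whose value matches one of the excluded values."""
    excluded = set(as_values(excluded))
    return {s: v for s, v in subjects.items() if v not in excluded}
```

Since single values still work unchanged, existing saved projects would not need migrating.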
Investigate how to handle multiple tabs/windows open from the same user? Right now, this behavior will likely just break things in unexpected ways. Not sure the best way to fix it, seems like it would require more frequent communication with the server / re-writing a lot of how things are currently stored.
The current merge behavior is fixed to computing the inner overlap of subjects across different loadings. Alternatively, the user could be given control to keep non-overlapping subjects and set their missing data to NaN, i.e., an outer merge.
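In pandas terms the two behaviors are just the `join` argument to `concat`, so exposing the choice may be a one-parameter change wherever the merge happens (the tiny frames below are illustrative only):

```python
import pandas as pd

# Two loads with partially overlapping subjects
d1 = pd.DataFrame({"x": [1, 2]}, index=["S1", "S2"])
d2 = pd.DataFrame({"y": [3, 4]}, index=["S2", "S3"])

# Current behavior: keep only subjects present in every load.
inner = pd.concat([d1, d2], axis=1, join="inner")

# Proposed option: keep all subjects, filling non-overlap with NaN.
outer = pd.concat([d1, d2], axis=1, join="outer")
```

The outer variant keeps S1 and S3 with NaN in the columns they lack, which pairs naturally with the NaN-threshold idea for loading sets.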
There should potentially be more entries in the main table, maybe Elapsed? Also, when opening a job, the user should be able to see more detailed information on how that job was run, e.g., which pipeline was used, etc.
This constitutes a fairly large effort and might, to some extent, involve changes on the BPt side of things. The current thinking is that the place to specify which feature importances to calculate should be the Evaluate tab. Wherever possible, the options should be "smartly" generated, i.e., not displaying irrelevant params. Also involved is adding a section to each job's results for viewing the feature importances in different ways.
For loading sets, add support for a NaN threshold like the one currently implemented in base BPt's Load_Data. This would also involve improved support for printing information about patterns of NaN.
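The core of the idea can be sketched with pandas' `dropna(thresh=...)`; how base BPt's Load_Data parameterizes its threshold may differ, so treat the 50% cutoff and variable names below as illustrative assumptions:

```python
import numpy as np
import pandas as pd

# A small set with scattered missing values
df = pd.DataFrame({
    "v1": [1.0, np.nan, 3.0],
    "v2": [np.nan, np.nan, 6.0],
    "v3": [7.0, 8.0, 9.0],
}, index=["S1", "S2", "S3"])

# Keep only subjects with at least nan_thresh non-NaN values across
# the set's variables (here: at least 50% present).
nan_thresh = 0.5
min_present = int(np.ceil(nan_thresh * df.shape[1]))
keep = df.dropna(thresh=min_present)

# Per-variable NaN counts, a start on printing NaN-pattern info
nan_counts = df.isna().sum()
```

Summaries like `nan_counts` (or `df.isna().groupby(...)` patterns) could back the improved NaN-pattern printing mentioned above.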
It would be nice to have a feature for making a copy of an existing pipeline, similar to how param dists are set up; this could be helpful when only trying to change a few small things.
In some cases, the current implementation might not be working 100% correctly. Also, param caching should be changed to be dataset-specific rather than global.
Sometimes upon init/refresh, the distribution name does not properly appear above model / object pieces. This bug is difficult to re-create, though.