skywolf829 / gstk

Gaussian Splatting toolkit application. One stop shop for preprocessing your dataset, training your model with human-in-the-loop training, and editing saved GSplat PLY files.

License: MIT License

HTML 2.25% CSS 12.62% JavaScript 85.13%

gstk's Issues

Initialize model from the start

Currently, the model isn't initialized until a dataset is initialized. By default, the model should do the random blob initialization that it currently does for NeRF synthetic data as soon as the server is started. If a SfM dataset is loaded, the UI should ask if the user would like to load the SfM points.

Backend begin/pause/play training

A button will be clicked on frontend to begin training, or pause/play if training already started (#4).

Requirements

Check that everything that should be initialized is already (dataset, model, trainer)
Begin training (threading required?)

Extra

Reset training? Reinitializes optimizer, go to iteration 0, but keeps model the same?
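On the "threading required?" question, one possible shape is a worker thread gated by a threading.Event. This is only a minimal sketch: TrainingLoop and step_fn are hypothetical names, not part of the existing backend. The readiness check for dataset/model/trainer is left to the caller.

```python
import threading


class TrainingLoop:
    """Hypothetical pause/play wrapper around a single training step function."""

    def __init__(self, step_fn, max_iterations):
        self.step_fn = step_fn                # assumed: callable taking the iteration index
        self.max_iterations = max_iterations
        self.iteration = 0
        self._running = threading.Event()     # set = training, clear = paused
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        """Begin training, or resume it if it was paused."""
        if self._thread is None:
            self._thread = threading.Thread(target=self._loop, daemon=True)
            self._thread.start()
        self._running.set()

    def pause(self):
        self._running.clear()

    def stop(self):
        self._stop.set()
        self._running.set()                   # wake the loop so it can exit
        if self._thread is not None:
            self._thread.join()

    def _loop(self):
        while not self._stop.is_set() and self.iteration < self.max_iterations:
            self._running.wait()              # blocks while paused
            if self._stop.is_set():
                break
            self.step_fn(self.iteration)
            self.iteration += 1
```

A "reset" in this design could set iteration back to 0 and rebuild the optimizer while paused, leaving the model untouched.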

Camera model

In the backend, a camera model should be used to represent the user's view into the scene during training/editing. The user will control the camera with their mouse/keyboard in the frontend viewer, which will send messages to the backend describing how they'd like to move in the scene.

Requirements

Hold a persistent camera model that views into the scene
Camera should also hold the resolution to render the scene, and should listen for messages regarding the resolution change
Listen for messages about activity within the viewing screen for WSAD + mouse or mouse + modifiers, scrollwheel
Implement standard 3D viewer transformations to affect the camera model. For example, scrollwheel zooms on lookat target, click and drag rotates around lookat, shift/control + drag translates along camera x/y axis, etc. WSAD + mouse can also be implemented like a video game.
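The orbit-style part of these requirements (scroll to zoom on the look-at target, drag to rotate around it, resolution updates) could be sketched as below. OrbitCamera and its parameter names are assumptions for illustration only. Panning and the WSAD mode are omitted.

```python
import math


class OrbitCamera:
    """Hypothetical orbit camera. The position is derived from a look-at target,
    a distance, and two angles (azimuth around +Y, elevation above the XZ plane)."""

    def __init__(self, target=(0.0, 0.0, 0.0), distance=5.0,
                 azimuth=0.0, elevation=0.0, width=800, height=600):
        self.target = list(target)
        self.distance = distance
        self.azimuth = azimuth
        self.elevation = elevation
        self.width, self.height = width, height

    def position(self):
        """Camera position in world space."""
        ce = math.cos(self.elevation)
        return (
            self.target[0] + self.distance * ce * math.sin(self.azimuth),
            self.target[1] + self.distance * math.sin(self.elevation),
            self.target[2] + self.distance * ce * math.cos(self.azimuth),
        )

    def rotate(self, d_azimuth, d_elevation):
        """Click and drag: orbit around the look-at target (angles in radians)."""
        limit = math.pi / 2 - 1e-3            # stay just short of the poles
        self.azimuth = (self.azimuth + d_azimuth) % (2 * math.pi)
        self.elevation = max(-limit, min(limit, self.elevation + d_elevation))

    def zoom(self, scroll_steps, factor=0.9):
        """Scrollwheel: positive steps move the camera toward the target."""
        self.distance = max(1e-3, self.distance * factor ** scroll_steps)

    def resize(self, width, height):
        """Resolution change message from the frontend."""
        self.width, self.height = int(width), int(height)
```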

Create executable builds for frontend application

For convenience, a frontend user may prefer double clicking an application (.exe file) instead of using terminal commands. A [very] low priority item is to see how easy it might be to create an executable build people can download and double click. It should still support our frontend as it is with minimal changes.

I saw this repo and thought it could be useful: https://github.com/marcelotduarte/cx_Freeze

Web-based frontend

As we are now a server-client based model, we could try to re-create our app for web. Large undertaking, but may be even more usable for most people.

May require some backend changes as well, such as the communication scheme. Instead of multiprocessing.connection, direct HTTP or a vanilla socket connection might be necessary.

Export model (frontend)

Add a way to export the current model to storage. Perhaps with File > Export As or File > Save As. Should send a message to the backend to perform the saving.

Add training statistics view in frontend

While training, the backend Trainer sends updates to the frontend every X iterations (currently 100). It sends the iteration, maximum iteration, loss, and moving average loss (all from the most recent step). The image from the last iteration is also available but currently not sent (see backend.py on_train_step(...))

In the TrainerWindow on the frontend, a graph showing the training error may be useful. Could be the moving average and just send one update every 100 iterations or so, or maybe you can keep a running list of errors in the backend from each step and send them all at once every 100 iterations. dpg.add_line_series might be good for line chart. Other charts seem to exist like candle charts and histograms.
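The "keep a running list and send it all at once" option could look roughly like this on the backend. This is a sketch under assumptions: LossTracker is a hypothetical helper, and an exponential moving average is assumed since the averaging method is not stated above.

```python
class LossTracker:
    """Hypothetical helper: buffers per-step losses and emits one batch payload
    every `send_every` steps, together with an exponential moving average."""

    def __init__(self, send_every=100, smoothing=0.9):
        self.send_every = send_every
        self.smoothing = smoothing
        self.moving_average = None
        self._iterations = []
        self._losses = []

    def add(self, iteration, loss):
        """Record one step. Returns a payload dict when a batch is full, else None."""
        if self.moving_average is None:
            self.moving_average = loss
        else:
            self.moving_average = (self.smoothing * self.moving_average
                                   + (1.0 - self.smoothing) * loss)
        self._iterations.append(iteration)
        self._losses.append(loss)
        if len(self._losses) < self.send_every:
            return None
        payload = {
            "iterations": self._iterations,
            "losses": self._losses,
            "moving_average": self.moving_average,
        }
        self._iterations, self._losses = [], []
        return payload
```

The frontend could append each payload's lists to the x/y data of a line series, so the plot stays complete while only one message is sent per batch.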

Backend toy data

To avoid needing a full-functioning backend for testing (some aspects of) the frontend, we need some way to generate toy data and transmit it to the frontend.

Requirements

Frontend

Window with a button, which when pressed, will tell the server to begin sending only toy data. Which data to send is TBD and will change as our features progress.
Include relevant information for the test in the sent message, such as "FPS: 120" to test the backend sending images at 120 FPS. Also consider resolution, loss metric updates, etc. Include as much as possible in the message to the backend for testing useful cases.
Include a Stop button to tell the backend to stop sending toy data

Backend

Listen for the message from the frontend instructing the use of toy data, or to stop sending data.
Once signaled for toy data, the backend should begin producing data at the requested frequency/resolution within the ServerController. May require threading.
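A minimal sketch of the backend side, assuming a hypothetical send callback and message fields (none of these names come from the existing ServerController). It produces synthetic RGB frames at the requested FPS until a stop event is set.

```python
import threading
import time


def make_toy_frame(width, height, frame_index):
    """Return width*height RGB bytes whose colour shifts with frame_index."""
    row = bytearray()
    for x in range(width):
        row += bytes(((x + frame_index) % 256, frame_index % 256, 128))
    return bytes(row) * height


def toy_stream(send, fps, width, height, stop_event, max_frames=None):
    """Call send(message) at roughly `fps` frames per second until stopped.

    `send` stands in for whatever transmits a message to the frontend.
    Returns the number of frames sent."""
    period = 1.0 / fps
    frame_index = 0
    while not stop_event.is_set():
        if max_frames is not None and frame_index >= max_frames:
            break
        start = time.perf_counter()
        send({"type": "toy_frame", "index": frame_index,
              "width": width, "height": height,
              "pixels": make_toy_frame(width, height, frame_index)})
        frame_index += 1
        remaining = period - (time.perf_counter() - start)
        if remaining > 0:
            stop_event.wait(remaining)        # returns early if Stop is pressed
    return frame_index
```

Running toy_stream in a threading.Thread and setting stop_event from the "Stop" message handler would cover the start/stop behaviour described above.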

Loading window and error window modals

When the backend is loading or catches an error it sends to the frontend, the frontend should pop up a window. For the loading window, it should be modal (non-escapable) and only close when the server finishes loading or disconnects. The error popup should just be a notice of the error with the option to close.

Backend already sends messages with the type as loading or as error with a header and body in the data attachment.

Logging Window

A window for the GUI application which is used as a logging tool for useful messages the user (or debugger) may be interested in seeing.

Requirements

Debug text (either from the standard print() via intercepting standard output writes or through a custom class) should appear in a Logging window which has a scrolling pane of output from the front end python application.
The window should be moveable, resizable, dockable, closable, and re-openable. May or may not be open by default.
The window should be available to click in the "View" menu item in order to open the logger or close it if already open, with the check mark displaying if it is open or not.
Logged messages should persist throughout the entire time the application is open (perhaps up to a certain line or character limit, like a standard terminal), even after closing the logging window and reopening.

Extra

Consider features such as logging levels (verbose, warnings, errors, etc.) as well as a text input for running Python code there.
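For the "intercepting standard output writes" option with a line limit, a sketch like the following may be enough to start with. LogBuffer is a hypothetical class. The GUI window would simply render its lines, so the history survives closing and reopening the window.

```python
import collections
import contextlib


class LogBuffer:
    """Hypothetical stdout sink that keeps only the most recent `max_lines` lines."""

    def __init__(self, max_lines=1000):
        self.lines = collections.deque(maxlen=max_lines)
        self._partial = ""                    # text after the last newline

    def write(self, text):
        self._partial += text
        *complete, self._partial = self._partial.split("\n")
        self.lines.extend(complete)
        return len(text)

    def flush(self):
        pass                                  # nothing is buffered externally


log = LogBuffer(max_lines=1000)
with contextlib.redirect_stdout(log):
    print("connected to backend")            # captured instead of printed
```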

Export gaussian model (backend)

Add support for listening for a message to export/save the model.

Frontend settings window for training setup, starting/pausing training

Before training a GS model, a few things need to be defined. Mainly, the dataset location (on the remote server) and the hyperparameters for the training/model. We need a window to allow editing of the parameters and dataset loading.

The window(s) should allow this process to work

Set up Dataset (tell backend where to load dataset from remote storage), then a button to actually load the data.
Choose training hyperparameters
Button to initialize trainer/model
Buttons to play/pause/stop training.

Requirements

Dataset loading

In the backend, the dataset can be loaded with only a few parameters hosted within a Settings object (see /src/backend/settings). Part of the setup for the dataset will require the user picking the following in the frontend:
-- a path to the dataset to load settings.dataset_path (string value)
-- a settings.resolution_scale (whether the images are downsized for faster training) (float value, 0.0-1.0)
-- white_background for if the images use a white background or not (boolean)
When the "load dataset" button is clicked, it should trigger the AppController/AppCommunicator to send a message to the backend with the dataset information/settings required for the data to be loaded (#5).
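The three dataset fields above could be validated and packaged on the frontend roughly as follows. The field names follow the Settings attributes mentioned above. The DatasetSettings class, the "load_dataset" message type, and the exclusion of a zero scale are assumptions for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class DatasetSettings:
    dataset_path: str
    resolution_scale: float = 1.0
    white_background: bool = False

    def validate(self):
        if not self.dataset_path:
            raise ValueError("dataset_path must not be empty")
        # Assumption: a scale of exactly 0 is rejected since it yields no pixels.
        if not 0.0 < self.resolution_scale <= 1.0:
            raise ValueError("resolution_scale must be in (0.0, 1.0]")


def load_dataset_message(settings):
    """Build the (hypothetical) message sent when "load dataset" is clicked."""
    settings.validate()
    return {"type": "load_dataset", "data": asdict(settings)}
```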

Trainer/model setup

The trainer has many hyperparameters of different type. Please see the Settings object, starting from iterations down to random_background. Each hyperparameter should be adjustable from the trainer setup window
Clicking initialize model button should transmit the parameters to the backend, at which point the backend will initialize a trainer and model for training (#6).
Trainer/model setup will REQUIRE that a dataset is already loaded.

Start training

Buttons for start/stop/pause/etc should transmit to the backend the intent. Separate issue for handling the communication on backend (#7).

Extra

Dataset loading

With extra communication, the frontend could navigate the folders on the backend easier than typing in a direct path. Some way to show the remote folder structure would help users.

Button clicking

It would help if the buttons could only be pressed when the prerequisites are met. For instance, for the trainer/model initialization, the dataset must already exist because the trainer needs the dataset. For training start, it should only be clickable once the dataset+model+trainer are initialized. To pause training, the model must be currently training.
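The prerequisite rules in this paragraph can be written as one small function that the window consults whenever state changes. The function and key names are hypothetical. Disabling "start" while training is already running is an added assumption.

```python
def button_states(dataset_loaded, trainer_initialized, is_training):
    """Return which buttons should be clickable for the given application state."""
    return {
        # The trainer/model needs a dataset to exist first.
        "initialize_model": dataset_loaded,
        # Training needs dataset + model + trainer (and, by assumption, not already running).
        "start_training": dataset_loaded and trainer_initialized and not is_training,
        # Pausing only makes sense while training.
        "pause_training": is_training,
    }
```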

Adjusting settings during training

Sometimes it would be nice to adjust the hyperparameters during training. If the user changes a learning rate or number of iterations, a message should be sent to the backend to update those settings again and continue training. Pausing training is not necessary (but may be wise!).

Create documentation

I added Sphinx and readthedocs setup to this repo, but I have no idea how to actually get it going. Would be nice to get a handle on that so we know how to update it as this progresses.

Main viewing window

The frontend should have a primary viewing window to view the rendered GS model. Interacting in this window (mouse, WSAD, etc) should move a viewing camera in the backend, which is used to render the model.

Requirements

A window in the application that will display rendered images. AppCommunicator should listen for messages with the rendered image to display and update the window with the new image.
Click and dragging (with/without control/shift modifiers), or WSAD+mouse movement, should transmit data to the backend so it can update the camera model.
The viewing window's resolution should be kept and sent to the backend camera model each time it changes (on window resize)

Generative model training (Dream)

DreamFusion by Poole et al. and DreamGaussian by Tang et al. are two methods that enable a Text-to-3D model. This issue is to support training a GS model like this, from either a text string or a single image.

Include other packages as needed, since some NLP/LLM may be necessary, as well as other metrics like CLIP. Adjust the GaussianModel, Trainer, and Dataset objects as necessary to fit the standard GS training as well as "dream" training.

OpenGL rendering speed

The (backend) OpenGL renderer is slowed down in a number of ways.

Baseline framerate - 535 FPS to render one cube outline.

(1) When the frontend application opens (on the same machine), FPS drops to 418.
(2) When moving the RGBA + D buffers to a torch tensor (RGBA as char), FPS drops to 438.
(3) When moving the RGBA + D buffers to a torch tensor (RGBA as float), FPS drops to 277.
With both (1) and (2), FPS drops to 345.
With both (1) and (3), FPS drops to 220.

Together, this means that per frame, just the OpenGL part is using 2.2-4.5ms, which is significant considering the gaussian model itself renders as fast as that, and must wait for this.
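As a quick check of that per-frame figure, each measured framerate can be converted to a frame time. The 438 FPS and 220 FPS cases work out to roughly 2.3 ms and 4.5 ms, which appears to be where the quoted range comes from. The labels below are paraphrased from the measurements in this issue.

```python
# Framerates copied from the measurements listed in this issue.
measured_fps = {
    "baseline (one cube outline)": 535,
    "frontend open": 418,
    "buffers to torch tensor, RGBA as char": 438,
    "buffers to torch tensor, RGBA as float": 277,
    "frontend open + char copy": 345,
    "frontend open + float copy": 220,
}


def frame_time_ms(fps):
    """Milliseconds spent per frame at a given framerate."""
    return 1000.0 / fps


for label, fps in measured_fps.items():
    print(f"{label}: {frame_time_ms(fps):.2f} ms per frame")
```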

Will start by sending color data to GS CUDA kernels as uint8 instead of float32 to save time and memory.

In the future, could consider an in-kernel rendering of the OpenGL stuff we need.

Allow installation and running of backend locally without CUDA

To debug, we may want to run the backend locally even if we don't have CUDA installed. Without CUDA, we cannot install the pip libraries for the rendering and knn packages.

Requirements

Backend should be installable without the CUDA pip packages via a custom env.yml for conda
Should be able to run the generic backend.py script with the reduced backend environment
Necessary scripts should check if the CUDA packages can be imported, and handle the situation when it isn't
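The import check in the last requirement could be a small helper like the one below. The module name used in the example is a placeholder, not the real name of the CUDA rendering or knn package.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Placeholder name. Substitute the actual CUDA rasterizer / knn package here.
cuda_renderer = optional_import("hypothetical_cuda_rasterizer")
HAS_CUDA_RENDERER = cuda_renderer is not None

if not HAS_CUDA_RENDERER:
    print("CUDA renderer not available, rendering is disabled")
```

Code paths that need the renderer can then branch on HAS_CUDA_RENDERER rather than failing at import time.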

Add preferences window

There may be some default behavior users would like to change and have persist each time the app is opened. For example, the layout of windows.

From the top menu bar, a "Preferences" window should be able to be opened. I think it should be a pop-up window as opposed to a modular window, but I'm open to suggestions. In the preferences window, the user should be able to adjust any settings that may be re-loadable from run to run, such as default IP and port, a dropdown to choose from pre-built layouts, a button to save their own layout and name it and be able to load it later, etc.

Other app-wide settings may be useful here, such as VSync enabled, font size, and other style selections (dark mode, light mode, other pre-made styles, etc)

Backend trainer/model initialization

The frontend will click a button and attempt to initialize a trainer/model on the backend (see #4).
The message from the frontend will include all the hyperparameters needed in the Settings object.

Requirements

Update the settings object, initialize the model, then initialize the trainer. No training should start, but it should be ready to be started immediately.
After (trying) to initialize the model and trainer, return a message to the frontend:
-- Upon success, return error code 0 and a string with some message like "Model initialized" etc
-- Upon failure, return some nonzero error code and a message with the issue.
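The success/failure reply described above might be produced by a wrapper like this. It is a sketch: init_model and init_trainer are stand-ins for the real constructors, and the specific nonzero code is arbitrary.

```python
def try_initialize(settings, init_model, init_trainer):
    """Initialize the model, then the trainer, and build the reply for the frontend.

    Returns (reply, model, trainer). On failure, model and trainer are None."""
    try:
        model = init_model(settings)
        trainer = init_trainer(settings, model)
    except Exception as exc:
        # Any nonzero code signals failure. 1 is an arbitrary choice here.
        return {"code": 1, "message": f"Initialization failed: {exc}"}, None, None
    return {"code": 0, "message": "Model initialized"}, model, trainer
```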

Render thread slowed down by communication thread

The GIL may be impacting our rendering speed performance - when the communication thread is disabled, the backend can train/render much faster - up to 3x it seems. There may be a better way to engineer the backend threads to be more efficient.

One solution might be to use multiprocessing to launch the communicator as its own process, so it can truly send and receive messages in parallel to the rendering/training threads. Then, we just need some way for the 2 processes to talk to each other without introducing another thread and overhead.

Dataset creation from images

Some users may not be computer savvy enough to install COLMAP and run it using convert.py on their images. We should introduce a workflow to "Create a Dataset". The user should pick a folder (of images for their scene), and we should take care of running the appropriate COLMAP commands to build a dataset (essentially running convert.py behind the scenes for them).

Will require installing COLMAP (and maybe ImageMagick) in backend and integrating that location into the arguments for the convert.py command. Possibly thru conda?

Once created, the user should be notified with a popup, and the dataset should automatically be loaded.

Resizing screen doesn't update window until click

Observed

When resizing the viewport (entire window), the rendering window doesn't adjust the resolution until a mouse is clicked inside the app somewhere and released.

Internally, the render screen is listening for resizes of the window directly (not entire app), and resizes on mouse release. The mouse release callback is only triggered if the mouse was originally clicked inside the viewport, not on the edge to resize the entire window.

Known DPG bug related to this: hoffstadt/DearPyGui#2217

Desired

Resizing viewport should update the render resolution of the RenderWindow when the mouse is released.

Loading existing model

In the backend, need to support loading a GS model from a saved PLY file location (I may or may not have ripped out that code/made it incompatible, needs checking).

Display training camera locations

During training, it may be useful for the user to see the training (and/or testing) camera positions within the scene for better understanding of where they may not have sampled their dataset enough. Render the outline of camera locations and possibly the actual image as a texture like nerfstudio.

In frontend, would help to also add a setting somewhere to enable/disable viewing the train/test cameras. Maybe in renderer settings?

Renderer settings window

Create a renderer settings window that gives control over some changes to the renderer. Things to include:

Density rendering boolean (checkbox?)
Resolution scaling (float slider between 0.25-1.0?)
Global gaussian scaling parameter

Does not have to be hooked up to networking yet until pre-required components are finished.

Extra

Spherical harmonics debugger

Frontend Style/Design of UI

The default ImGUI design leaves something to be desired. Look into styles, fonts, etc and see about adjusting the look of the app to look more like a professional tool.

Remove points operation

Add an operation for removing points within the selector. Should allow a variable percentage, where 100% deletes all in the selector, and any % less than that uses some geometry-aware processing to delete the % asked, while adjusting the remaining gaussians to fit the shape/colors that were there before.

Adjustable rendering/training speed

Currently, the renderer will run as fast as it can while training is ongoing. This may not be desired if the user wishes to focus compute power on training at the moment.

We need some way for the user to be able to control which thread gets more time - the training thread or the render thread.
Can use something very direct such as a variable for how long to sleep after either a render/train update, or can use something like nerfstudio's app that had a percentage slider for if they should be 50/50 or more weight toward one or the other. May be more nuanced and prone to error, though.

Additionally, a button to completely cease backend rendering while training is also useful. Maybe just a check box in the RendererSettingsWindow.
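One way to read the percentage-slider idea is as a credit scheduler that decides, step by step, whether the next slot goes to training or rendering. The sketch below assumes both kinds of work are dispatched from a single loop, which differs from the current separate-thread design. The function name is hypothetical.

```python
def schedule(train_fraction, steps):
    """Return a list of "train"/"render" slots whose long-run share of
    training slots approaches train_fraction (0.0 = render only, 1.0 = train only)."""
    credit = 0.0
    order = []
    for _ in range(steps):
        credit += train_fraction
        if credit >= 1.0:
            credit -= 1.0
            order.append("train")
        else:
            order.append("render")
    return order
```

With this design, the "cease rendering" checkbox is just train_fraction = 1.0.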

Update GitHub landing page to look good!

This is just to make the GitHub page look good! A useful readme with updated installation and running instructions, examples of using the tool, etc. Can also create a github site if that works better.

Implement support for compression-aware training

A number of compression papers exist for gaussian splatting (see here), but it seems the current state of the art is a paper by Niedermayr et al.

This issue is related to implementing this compression training approach in the backend as an option, with it defaulted to disabled.

Can be implemented with extra arguments to the default CUDA renderer code, or extra cpp/cu files defining new methods can be added and can be called depending on whether the compression aware training is enabled or disabled.

Should support saving in the compressed format or in standard PLY format other GS renderers are using.

Rendering and sending image to frontend

Given the current viewing camera and gaussian model, the scene should be rendered and returned to the frontend as frequently as possible both during training and while it is paused.

Requirements

May need some render thread that will continue rendering while a camera and gaussian model exist
Each rendered frame should be sent to the frontend (compressed/encoded first?)

Extra

If rendering speed is slow, adjust render resolution adaptively.
If training while rendering, allow a slider for how much time to spend rendering vs training.

Undo/redo edit operations

Need to support undo/redo on edit operations.

Design "selector" paradigm in backend

In the frontend, we want to allow the user to select groups of points. This selector should be viewable in the frontend render window, and in the backend there should be some representation for a selector to be used within the renderer or other code that edits the gaussians. This issue is only related to the backend's representation of the selectors, nothing with the frontend.

Types of selectors

At the moment, there are 2 simple selectors that should be considered, but also consider expanding this:

Rectangular prism
Sphere

The selectors should be adjustable (scale, position, possibly rotation), and allow some way to see if the points are selected or not for use in rendering/point removal/other operations.
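A minimal sketch of the two selectors, assuming an axis-aligned box (no rotation) and plain Python tuples for points. The class and function names are hypothetical. A real implementation would likely operate on torch tensors of gaussian centers.

```python
class BoxSelector:
    """Axis-aligned rectangular prism given by a center and half extents."""

    def __init__(self, center, half_extents):
        self.center = tuple(center)
        self.half_extents = tuple(half_extents)

    def contains(self, point):
        return all(abs(point[i] - self.center[i]) <= self.half_extents[i]
                   for i in range(3))


class SphereSelector:
    def __init__(self, center, radius):
        self.center = tuple(center)
        self.radius = radius

    def contains(self, point):
        dist_sq = sum((point[i] - self.center[i]) ** 2 for i in range(3))
        return dist_sq <= self.radius ** 2


def selected_mask(selector, points):
    """One boolean per point, usable for rendering highlights or point removal."""
    return [selector.contains(p) for p in points]
```

Keeping a shared contains/selected_mask interface means new selector shapes can be added without touching the operations that consume the mask.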

Extra

Consider a "magic select" that allows the user to just click on a pixel in the screen and magic select will attempt to select all points in 3D similar to the point selected

Depth rendering

Similar to #11, we might want to visualize depth instead of RGB output. This ticket is to add backend CUDA support for rendering depth instead of RGB.

Sync camera settings on init

On init, the camera settings are not synced between backend and frontend. FoV/etc shown on frontend are not the ones actually being used on backend at start.

Training speed slowed from threads

Observed

Training speed in train.py for the mic test dataset reaches a max around ~220 updates per second. Meanwhile, training through the frontend app gives training speeds around ~120 fps.

I've noticed that when some threads are stopped, performance increases.

Disabling the render thread gives +40 fps (mostly due to less GPU work, frontend is rendering at ~100 fps)
Disabling communication thread gives + 80 fps
Closing app and letting only backend run the training gives full fps (~220 fps)

Desired

The communication thread should not cause a slow down for backend training speed. If the renderer is completely disabled, backend training should run at full speed.

Frontend camera path for renders

For nice renders of the scene after training, we want to allow the user to create a render path in space. On the frontend, the user should be able to place the camera, click a button in some window to snapshot the camera state, and continue doing this to build a set of camera poses that the backend will go through (with a b-spline or something similar) to smoothly render.

In the camera path editor window, there should be:

the button to add the current camera pose
a button to clear all poses
options for FPS, resolution, FoV, total length of video, and save name

Density rendering

Add support for splatting gaussian density to screen instead of RGB.

Requirements

Add a boolean flag in the forward pass of the python/CUDA code for the render for a density variable which defaults to False (so as to not disrupt existing code).
Instead of normal compositing, the density should accumulate for all gaussians splatting on each pixel. In other words, the pixel should never reach some final accumulated value and finish processing further gaussians. This may make the render with density actually slower than normal.
The gaussian value (at a pixel) multiplied by the alpha should be the corresponding "density" for each gaussian, and this should be added for each gaussian splatted to a pixel.
Result may need to be normalized [0,1.0] or [0, 255] for final display to screen in the app.
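The accumulation rule in these requirements can be illustrated on the CPU with a tiny reference implementation. It assumes isotropic 2D gaussians already projected to screen space, which is a simplification of the real CUDA rasterizer, and the function name is hypothetical.

```python
import math


def density_image(gaussians, width, height):
    """gaussians: list of (cx, cy, sigma, alpha) screen-space splats.

    Every gaussian contributes alpha * gaussian_value to every pixel. There is
    no early termination, unlike normal alpha compositing."""
    image = [[0.0] * width for _ in range(height)]
    for cx, cy, sigma, alpha in gaussians:
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                image[y][x] += alpha * math.exp(-0.5 * d2 / sigma ** 2)
    # Normalize to [0, 1] for display.
    peak = max(max(row) for row in image)
    if peak > 0:
        image = [[value / peak for value in row] for row in image]
    return image
```

A reference like this is slow, but it can serve as a ground truth when testing the CUDA flag on small inputs.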

Launch backend from frontend

For convenience, launching the backend code from the frontend (before connected to anything) could save users time.

When the user launches the backend, it should attempt to connect to it immediately and tell the user when everything is ready.
It should create a new process for the backend to run in that should automatically be killed if the frontend ever goes down.
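A sketch of the process handling, assuming the backend is started with the same Python interpreter and a hypothetical argument list. Note that atexit only covers normal interpreter shutdown. A hard kill of the frontend would need an extra mechanism (for example, a heartbeat the backend watches).

```python
import atexit
import subprocess
import sys


def launch_backend(args):
    """Start the backend in a child process and terminate it when the frontend exits.

    `args` is the argument list passed to the Python interpreter, for example
    a backend script path followed by its options (placeholder values)."""
    proc = subprocess.Popen([sys.executable, *args])

    def _cleanup():
        if proc.poll() is None:               # still running
            proc.terminate()
            proc.wait()

    atexit.register(_cleanup)
    return proc
```

After launch_backend returns, the frontend can retry its normal connection attempt until the backend accepts it, then notify the user.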

Add points edit operation

Implement the edit operation for adding points into the model using the selector. The selector should become visible when the add points edit is selected, and clicking add should add the points.

In the edit window, should have options for the number of points and the distribution the points are added using.

Backend initialize dataset

The frontend will send necessary settings to the backend to initialize a Dataset object (see #4)

Requirements

In ServerCommunicator, listen for the signal to load a dataset with certain settings. When received, attempt to update the dataset object in ServerController.
Upon (trying) to load the dataset, return an error code (int) and status (string) pair to the frontend:
If the dataset can't be created, return some nonzero error code with a string defining the issue for the frontend to see.
If the dataset is correctly initialized, return a 0 error code with something like "OK" or "Dataset loaded", etc.

Combine standard OpenGL rendering with GS model

In order to render some effects such as camera locations, current selector area, or even other objects, we want to allow other rendering to mesh seamlessly with the GS forward render. This issue is the backend work to support rendering standard OpenGL and allowing the model to still be rendered in that same scene.

Will be used for other issues like #14 #28 #26 etc.

Edit tool panel

In the app, we need an "edit tools panel" that has options for selectors (#14) as well as operations on those selectors.

Flow

Select a selector icon (box, sphere, etc) or just a mouse icon to not have an active selector
Adjust selector to where you'd like to be in the scene
Click on an operation (densify, delete, sparsify, etc)

I think it should be a narrow window with primarily clickable icons for each thing.

Extra

Consider undo with the operations - may be useful to have when you see the result of what you just did!

Backend camera path for renders

We want to support rendering a pre-designed path, given by a set of camera keyframes, interpolated with a B-spline.

In the front end, each time a keyframe is added, a message should be sent to the backend. We should keep a representation of the keyframes in the backend. There is also a button to clear the keyframes.

When the frontend clicks render, it will send a message to the backend with the total time and FPS and save name. The video should be rendered and saved, and optionally displayed in their viewport as it is being rendered.

In the viewport, we should also display the render keyframes for the camera and the b-spline connecting them so the user knows the path they created.
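For the interpolation step, a uniform cubic B-spline over the keyframe positions could be sampled as below. This sketch handles positions only (orientation and FoV would need their own interpolation), and the function names are hypothetical. A B-spline of this kind approximates the keyframes rather than passing through each one.

```python
def bspline_point(p0, p1, p2, p3, t):
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
    b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
    b3 = t ** 3 / 6
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def sample_path(keyframes, n_frames):
    """Return n_frames positions along the spline defined by >= 4 keyframes.

    n_frames would be total video length (seconds) times FPS."""
    if len(keyframes) < 4 or n_frames < 2:
        raise ValueError("need at least 4 keyframes and 2 frames")
    segments = len(keyframes) - 3
    path = []
    for i in range(n_frames):
        u = i / (n_frames - 1) * segments     # position along the whole curve
        seg = min(int(u), segments - 1)       # clamp so the last frame is valid
        path.append(bspline_point(*keyframes[seg:seg + 4], u - seg))
    return path
```

The same sampled points could be drawn as a polyline in the viewport to show the user the path they created.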

Finish networking framework

Complete an outline for the networking system, including threading and support for sending and receiving messages on both the frontend and backend.

Additionally, designing an archetype for message format should be considered. When the backend sends some loss values for iterations to the front end, how are they packaged? JSON/dict? How should the entries be named for parsing in the backend?
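As one possible answer to the packaging question, a JSON envelope with a type string and a data dict would be simple to parse on both sides. The helper names and the field names in the example are assumptions, not an agreed format.

```python
import json


def encode_message(msg_type, data):
    """Serialize a message as UTF-8 JSON with a "type" and a "data" field."""
    return json.dumps({"type": msg_type, "data": data}).encode("utf-8")


def decode_message(raw):
    """Parse bytes produced by encode_message and return (type, data)."""
    message = json.loads(raw.decode("utf-8"))
    if "type" not in message or "data" not in message:
        raise ValueError("message needs 'type' and 'data' fields")
    return message["type"], message["data"]


# Example: a training update with hypothetical field names.
raw = encode_message("train_step", {"iteration": 100, "max_iteration": 30000,
                                    "loss": 0.042, "average_loss": 0.051})
msg_type, data = decode_message(raw)
```

Large binary payloads such as rendered images do not fit JSON well and would probably need a separate binary message path.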
