danbider / lightning-pose
Accelerated pose estimation and tracking using semi-supervised convolutional networks.
License: MIT License
Hello,
Not an issue, so feel free to label as question!
Thanks for releasing lightning-pose. Is there a way to make it work with multi-agent videos, or is the added complexity of handling spatiotemporal constraints on different tracklets the difficult part here?
Hi Team,
I would like to know what post-processing approach Lightning Pose uses. I also have the following queries:
Many thanks in advance,
Currently the Lightning Pose code requires construction of a dataset/data module in order to load model parameters and perform inference. This requires users to move training datasets around with the model checkpoint, which is not ideal. Consider removing this requirement.
Hi, Lightning Pose team,
As the tutorial mentions, both the base model and the context model make use of unlabeled frames, but the context model also uses temporal context frames.
So what is the difference and relationship between unlabeled frames and temporal context frames? Are temporal context frames derived from unlabeled frames?
dali.base.train.sequence_length - number of unlabeled frames per batch in regression and heatmap models (i.e. “base” models that do not use temporal context frames)
dali.context.train.batch_size - number of unlabeled frames per batch in heatmap_mhcrnn model (i.e. “context” models that utilize temporal context frames); each frame in this batch will be accompanied by context frames, so the true batch size will actually be larger than this number
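To make the relationship between the two settings concrete, here is a small sketch of how a context batch expands. It assumes each frame in a context batch is paired with two frames before and two frames after it (half_width=2) and that windows are clipped at the start and end of the video; both are assumptions for illustration, and context_window and frames_loaded_for_batch are hypothetical helpers, not Lightning Pose functions.

```python
from typing import List


def context_window(center: int, num_frames: int, half_width: int = 2) -> List[int]:
    """Frame indices of a temporal window centred on `center` (hypothetical helper).

    Indices outside the video are clipped to the first/last frame (an assumption),
    so every window has 2 * half_width + 1 entries.
    """
    return [min(max(center + offset, 0), num_frames - 1)
            for offset in range(-half_width, half_width + 1)]


def frames_loaded_for_batch(centers: List[int], num_frames: int,
                            half_width: int = 2) -> List[List[int]]:
    """One context window per frame in the batch (hypothetical helper)."""
    return [context_window(c, num_frames, half_width) for c in centers]


# a context batch_size of 4 consecutive frames from a 100-frame video
windows = frames_loaded_for_batch([10, 11, 12, 13], num_frames=100)
unique_frames = sorted({i for w in windows for i in w})
print(len(windows), len(unique_frames))  # 4 8
```

Under these assumptions, 4 batch frames pull in 20 frame slots (8 distinct frames here because the centers are consecutive), which is why the true number of frames processed is larger than dali.context.train.batch_size.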
Hi @themattinthehatt, I have 5 labeled keypoints per frame.
Thanks for the info that heatmaps are more accurate.
Also, I have noticed that when I use DLC image augmentation with an image rotation augmentation value above 10, the code throws the error below.
Error executing job with overrides: []
Traceback (most recent call last):
File "scripts/train_hydra.py", line 175, in train
trainer.fit(model=model, datamodule=data_module)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
self._call_and_handle_interrupt(
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1168, in _run
results = self._run_stage()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1254, in _run_stage
return self._run_train()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1285, in _run_train
self.fit_loop.run()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 270, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
batch_output = self.batch_loop.run(kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
outputs = self.optimizer_loop.run(optimizers, kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1552, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 1673, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
return wrapped(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/torch/optim/optimizer.py", line 113, in wrapper
return func(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/torch/optim/adam.py", line 118, in step
loss = closure()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
closure_result = closure()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
self._result = self.closure(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 132, in closure
step_output = self._step_fn()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 407, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1706, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 358, in training_step
return self.model.training_step(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/typeguard/__init__.py", line 1033, in wrapper
retval = func(*args, **kwargs)
File "/home/walthamadmin/notebooks/projects/lightning-pose/lightning_pose/models/base.py", line 347, in training_step
loss = self.evaluate_labeled(train_batch, "train")
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/typeguard/__init__.py", line 1033, in wrapper
retval = func(*args, **kwargs)
File "/home/walthamadmin/notebooks/projects/lightning-pose/lightning_pose/models/base.py", line 321, in evaluate_labeled
data_dict = self.get_loss_inputs_labeled(batch_dict=batch_dict)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/typeguard/__init__.py", line 1033, in wrapper
retval = func(*args, **kwargs)
File "/home/walthamadmin/notebooks/projects/lightning-pose/lightning_pose/models/heatmap_tracker.py", line 233, in get_loss_inputs_labeled
predicted_keypoints, confidence = self.run_subpixelmaxima(predicted_heatmaps)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/typeguard/__init__.py", line 1033, in wrapper
retval = func(*args, **kwargs)
File "/home/walthamadmin/notebooks/projects/lightning-pose/lightning_pose/models/heatmap_tracker.py", line 143, in run_subpixelmaxima
confidences = evaluate_heatmaps_at_location(heatmaps=softmaxes, locs=preds)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/typeguard/__init__.py", line 1033, in wrapper
retval = func(*args, **kwargs)
File "/home/walthamadmin/notebooks/projects/lightning-pose/lightning_pose/data/utils.py", line 333, in evaluate_heatmaps_at_location
heatmaps_padded[i, j, k_offset, l_offset].squeeze(-1).squeeze(-1)
IndexError: index -9223372036854775808 is out of bounds for dimension 2 with size 388
Kindly check whether there is a bug and help correct it.
Originally posted by @prateekdhawalia in #56 (comment)
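The offending index, -9223372036854775808, equals -2**63, the smallest int64 value, which is what a NaN typically turns into when it is cast to a 64-bit integer on common hardware. That hints (but does not prove) that a non-finite keypoint location reached evaluate_heatmaps_at_location, possibly after a large rotation pushed a keypoint out of the frame. A minimal NumPy sketch of a defensive conversion, using a hypothetical helper safe_heatmap_indices that is not part of Lightning Pose; the 388 x 388 heatmap size is only for illustration:

```python
import numpy as np


def safe_heatmap_indices(locs: np.ndarray, height: int, width: int):
    """Convert float (x, y) heatmap locations to in-bounds integer indices (hypothetical helper).

    locs: array of shape (..., 2) holding (x, y) in heatmap pixel units.
    Non-finite entries (NaN/inf) are replaced by 0 and flagged invalid;
    finite entries are rounded and clipped to the heatmap bounds.
    Returns (rows, cols, valid) with rows/cols as int64 arrays.
    """
    locs = np.asarray(locs, dtype=np.float64)
    valid = np.all(np.isfinite(locs), axis=-1)
    clean = np.where(np.isfinite(locs), locs, 0.0)  # never cast NaN to int
    cols = np.clip(np.rint(clean[..., 0]), 0, width - 1).astype(np.int64)
    rows = np.clip(np.rint(clean[..., 1]), 0, height - 1).astype(np.int64)
    return rows, cols, valid


locs = np.array([[10.4, 20.6], [np.nan, 5.0], [500.0, -3.0]])
rows, cols, valid = safe_heatmap_indices(locs, height=388, width=388)
print(rows.tolist(), cols.tolist(), valid.tolist())
# [21, 5, 0] [10, 0, 387] [True, False, True]
```

The valid mask lets downstream code set the confidence of those keypoints to zero instead of indexing with garbage.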
Hi, Lightning Pose team,
Is there a manual refinement and retraining function in Lightning Pose for when I am unsatisfied with the predictions?
Please review Label Studio Setup Instructions
A method to convert Lightning Pose annotation data to Label Studio annotations
Lightning Pose Annotation uses:
Label Studio can import pre-annotated data
JSON-MIN version
[
{
"img": "/data/upload/1/18928a62-img1.png",
"id": 1,
"kp-1": [
{
"x": 96.71717171717172,
"y": 7.389162561576355,
"width": 0.5050505050505051,
"keypointlabels": [
"Nose"
],
"original_width": 396,
"original_height": 406
},
{
"x": 92.17171717171718,
"y": 5.41871921182266,
"width": 0.5050505050505051,
"keypointlabels": [
"Face"
],
"original_width": 396,
"original_height": 406
}
],
"annotator": 1,
"annotation_id": 1,
"created_at": "2022-07-06T12:49:47.101659Z",
"updated_at": "2022-07-06T12:49:47.101700Z",
"lead_time": 5.507
}
]
TODO:
@kathleenislee https://github.com/robert-s-lee/lightning-pose/blob/label-studio/tests/utils/csv_to_label_studio.py has sample code that reads the CSV to help get started. The script needs to export in JSON-MIN format.
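Label Studio stores keypoint x, y and width as percentages of the original image size: the sample above corresponds to Nose at pixel (383, 30) and Face at (365, 22) in a 396 x 406 image, with a 2-pixel point width. The sketch below converts a DeepLabCut-style CSV (three header rows: scorer, bodyparts, coords; image path in the first column) into records shaped like the JSON-MIN sample, assuming all images share one size. The function names, the img_prefix argument and the 2-pixel point width are assumptions for illustration; fields such as annotator, annotation_id and the timestamps are generated by Label Studio and are omitted.

```python
import csv
import io
import json
from typing import Dict, List


def keypoint_to_json_min(x_px: float, y_px: float, label: str, img_w: int, img_h: int,
                         point_width_px: float = 2.0) -> Dict:
    """One JSON-MIN-style keypoint; x, y and width are percentages of the image size."""
    return {
        "x": x_px / img_w * 100.0,
        "y": y_px / img_h * 100.0,
        "width": point_width_px / img_w * 100.0,  # 2 px is an assumption inferred from the sample
        "keypointlabels": [label],
        "original_width": img_w,
        "original_height": img_h,
    }


def csv_to_json_min(csv_text: str, img_w: int, img_h: int, img_prefix: str = "") -> List[Dict]:
    """Convert a DLC-style CSV (3 header rows, image path in column 0) to JSON-MIN-like tasks."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    bodyparts, coords = rows[1][1:], rows[2][1:]
    tasks = []
    for task_id, row in enumerate((r for r in rows[3:] if r), start=1):
        points: Dict[str, Dict[str, float]] = {}
        for bp, coord, value in zip(bodyparts, coords, row[1:]):
            if value.strip().lower() in ("", "nan"):
                continue  # keypoint not labeled in this frame
            points.setdefault(bp, {})[coord] = float(value)
        keypoints = [keypoint_to_json_min(p["x"], p["y"], bp, img_w, img_h)
                     for bp, p in points.items() if "x" in p and "y" in p]
        tasks.append({"img": img_prefix + row[0], "id": task_id, "kp-1": keypoints})
    return tasks


sample_csv = (
    "scorer,me,me,me,me\n"
    "bodyparts,Nose,Nose,Face,Face\n"
    "coords,x,y,x,y\n"
    "img1.png,383,30,365,22\n"
)
print(json.dumps(csv_to_json_min(sample_csv, 396, 406, img_prefix="/data/upload/1/"), indent=2))
```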
Hi, do you have plans to provide a Windows-compatible installation option? The installation instructions specify Linux compatibility only, and I have now run out of credits for further use of the cloud version. Thanks in advance.
Hi Team,
Cool paper.
This is a question, not an issue: I'm considering trying Lightning Pose on my data, but was hoping you could provide some information on inference speed before I do. I didn't see anything in the paper. Roughly how many frames per second can you process after training? Do you have any benchmarks I missed?
Thanks,
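Throughput depends heavily on the GPU, image size and backbone, so it is worth measuring directly on your own hardware. A minimal timing sketch, where predict_batch is a hypothetical stand-in for whatever runs the trained model on a batch of frames; on a GPU the function should only return after the GPU work has finished (for example by moving the results to the CPU), otherwise the timing is too optimistic.

```python
import time
from typing import Callable, Sequence


def measure_fps(predict_batch: Callable[[Sequence], object], frames: Sequence,
                batch_size: int = 16, warmup_batches: int = 1) -> float:
    """Frames per second of `predict_batch` over `frames`, excluding warm-up batches.

    `predict_batch` is a hypothetical callable; it must block until its work is done.
    """
    batches = [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]
    for batch in batches[:warmup_batches]:
        predict_batch(batch)  # warm-up, e.g. CUDA context creation, cuDNN autotuning
    timed = batches[warmup_batches:]
    if not timed:
        raise ValueError("need more frames than warm-up batches")
    start = time.perf_counter()
    for batch in timed:
        predict_batch(batch)
    elapsed = time.perf_counter() - start
    return sum(len(b) for b in timed) / max(elapsed, 1e-9)


# dummy model that takes ~10 ms per batch of 16 frames
print(f"{measure_fps(lambda batch: time.sleep(0.01), list(range(64))):.0f} frames/s")
```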
Hi, lightning pose team
In your preprint, Figure 4 (Unlabeled frames improve pose estimation (raw network predictions)), Fig. 4C and 4D show that with 75 labeled frames the semi-supervised context model performs best. With 631 labeled frames, however, the different models (DLC, baseline, semi-supervised, semi-supervised context) perform very similarly, and all are better than with 75 labeled frames.
So how many frames should I extract for labeling at the very beginning: tens, or hundreds?
Hi Team,
When I try to run Lightning Pose with animal data on the NeSI platform, it ends with "killed".
NeSI platform (https://support.nesi.org.nz/hc/en-gb)
Why is this happening? Please help me.
Thanks in advance :)
Performed a fresh install of lightning-pose, received the following error for all semi-supervised model tests:
def get_loss_inputs_unlabeled(self, batch_dict: UnlabeledBatchDict) -> Dict:
"""Return predicted heatmaps and their softmaxes (estimated keypoints)."""
predicted_keypoints = self.forward(batch_dict["frames"])
# undo augmentation if needed
> if batch_dict["transforms"].shape[-1] == 3:
E IndexError: tuple index out of range
lightning_pose/models/regression_tracker.py:198: IndexError
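The message "tuple index out of range" means batch_dict["transforms"].shape was an empty tuple, i.e. transforms was a 0-dimensional tensor, so shape[-1] had nothing to index (torch.Size is a tuple subclass, so NumPy arrays behave the same way here). Why a 0-d value arrives after a fresh install is not clear from this output. A guarded check in the hypothetical helper below avoids the crash; it assumes a 0-d value means "no augmentation to undo", which would need confirming against the data module:

```python
import numpy as np


def has_affine_transforms(transforms) -> bool:
    """True if `transforms` has a last dimension of size 3 (hypothetical helper).

    A 0-d array or tensor has shape (), so transforms.shape[-1] raises
    "IndexError: tuple index out of range"; checking the number of dims first avoids that.
    """
    shape = tuple(getattr(transforms, "shape", ()))
    return len(shape) > 0 and shape[-1] == 3


print(np.zeros(()).shape)                          # ()
print(has_affine_transforms(np.zeros(())))         # False
print(has_affine_transforms(np.zeros((8, 2, 3))))  # True
```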
Hi, every time I run training via the Pose-app GUI, it crashes just before finishing, as shown in the attached figure. The trained model is still saved and can be used to predict new videos, but I don't know whether the crash affects the trained model. Could you give me any suggestions? Thank you!
Hi lightning-pose team,
I have multiple body parts labeled. Some have a very limited range of movement (about 1/5 of the video width), while others move across a much larger range (nearly the full width of the video). I tried using the same epsilon for all body parts, but it does not seem to work well. Can I set different temporal loss parameters for each body part?
Or do you have any suggestions on how to adjust the parameters?
Another alternative I am considering is to train two models, one for each group of body parts. But it would definitely be easier to label once, train once, and infer once.
Thanks! Appreciate your reply!
Best,
Nora
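The config dump in another issue on this page lists temporal.epsilon as a 17-element list for the 17-keypoint toy dataset, which suggests per-keypoint values may already be accepted; this is worth confirming against the current docs. To illustrate what a per-keypoint epsilon does, here is a NumPy sketch of an epsilon-insensitive temporal penalty that only charges the part of a frame-to-frame jump exceeding each keypoint's epsilon. It is an illustration under that assumption, not Lightning Pose's loss code.

```python
import numpy as np


def temporal_loss_per_keypoint(preds: np.ndarray, epsilon) -> np.ndarray:
    """Epsilon-insensitive temporal penalty with one epsilon per keypoint (illustrative).

    preds:   (T, K, 2) predicted (x, y) for T consecutive frames and K keypoints.
    epsilon: scalar or length-K sequence of allowed per-frame displacements in pixels.
    Returns a (T - 1, K) array of max(0, ||p_t - p_{t-1}|| - epsilon_k).
    """
    preds = np.asarray(preds, dtype=float)
    eps = np.broadcast_to(np.asarray(epsilon, dtype=float), (preds.shape[1],))
    jumps = np.linalg.norm(np.diff(preds, axis=0), axis=-1)  # (T - 1, K) pixel displacements
    return np.maximum(jumps - eps, 0.0)


# keypoint 0 moves in a small range (small epsilon), keypoint 1 in a large one
preds = np.array([[[0.0, 0.0], [0.0, 0.0]],
                  [[3.0, 4.0], [3.0, 4.0]]])  # both keypoints jump 5 px
print(temporal_loss_per_keypoint(preds, epsilon=[2.0, 20.0]))  # [[3. 0.]]
```

The same 5-pixel jump is penalized for the keypoint with the tight epsilon and ignored for the one with the loose epsilon, which is the behavior a single shared epsilon cannot give.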
Currently, the initialization of all backbone models, such as ResNet 50, 101, ViT, and EffNet, is hardcoded in the models/base.py file. This approach lacks modularity and can lead to code duplication. To improve the code structure and maintainability, it is proposed to refactor the code by moving the backbone model initialization logic to a dedicated folder called 'backbones'. Each backbone model will have its own file, for example, resnets.py. Additionally, a build_backbone function will be created to handle the initialization of the backbones based on the provided configuration.
The current implementation suffers from several drawbacks. Firstly, having all backbone model initializations hardcoded in a single file makes it challenging to locate and modify specific backbone configurations. Secondly, it leads to code duplication if multiple files require the same backbone model. This lack of modularity can hinder the scalability and maintainability of the codebase.
Please let me know if you need any further clarification or have any questions regarding the proposed refactor.
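One possible shape for the proposed build_backbone is a registry keyed by the backbone name in the config. The sketch below is hypothetical: the registry, the decorator and the cfg keys (backbone, pretrained) are illustrative, and only a ResNet-50 builder is shown, using the torchvision 0.13+ weights API (ResNet50_Weights) that also appears in the training log elsewhere on this page. Dropping the last two children of a torchvision ResNet removes the global average pool and the classification head, leaving a spatial feature map with 2048 channels.

```python
from typing import Any, Callable, Dict

# hypothetical registry: backbone name -> builder(cfg) returning (module, num_out_channels)
BACKBONES: Dict[str, Callable[[Dict[str, Any]], Any]] = {}


def register_backbone(name: str):
    """Decorator that adds a builder function to the registry under `name`."""
    def decorator(fn):
        if name in BACKBONES:
            raise KeyError(f"backbone {name!r} already registered")
        BACKBONES[name] = fn
        return fn
    return decorator


def build_backbone(cfg: Dict[str, Any]):
    """Look up cfg['backbone'] in the registry and call its builder with cfg."""
    name = cfg["backbone"]
    try:
        builder = BACKBONES[name]
    except KeyError:
        raise ValueError(f"unknown backbone {name!r}; available: {sorted(BACKBONES)}") from None
    return builder(cfg)


# would live in e.g. backbones/resnets.py
@register_backbone("resnet50")
def _resnet50(cfg: Dict[str, Any]):
    import torch.nn as nn  # imported lazily so the registry itself has no torch dependency
    from torchvision.models import ResNet50_Weights, resnet50

    weights = ResNet50_Weights.DEFAULT if cfg.get("pretrained", True) else None
    model = resnet50(weights=weights)
    # drop avgpool and fc to keep the (N, 2048, H/32, W/32) feature map
    return nn.Sequential(*list(model.children())[:-2]), 2048
```

Usage would be build_backbone({"backbone": "resnet50", "pretrained": True}); adding ResNet-101, ViT or EfficientNet means adding one decorated builder per file without touching models/base.py.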
Hi, lightning pose team
Lightning Pose is a great tool, and it is very important to my project.
I previously used DeepLabCut, and now I have multiple videos with multiple CSV files.
How should I proceed with Lightning Pose?
Should I combine all the CSV files into one, or train one model per CSV file?
Hoping for your suggestions.
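The config in another issue on this page points the model at a single csv_file, so one option is to merge the DeepLabCut CSVs into one file and train a single model. A stdlib-only sketch, assuming every file has the same three header rows (scorer, bodyparts, coords) and that the image paths in the first column are relative to the same data directory; merge_label_csvs is a hypothetical helper, and files from projects with different scorer names would need their header rows harmonized first.

```python
import csv
from pathlib import Path
from typing import Iterable, Union


def merge_label_csvs(paths: Iterable[Union[str, Path]], out_path: Union[str, Path],
                     n_header_rows: int = 3) -> int:
    """Concatenate DLC-style label CSVs that share identical header rows (hypothetical helper).

    The header (scorer / bodyparts / coords) of the first file is written once;
    every other file must have the same header. Returns the number of data rows written.
    """
    header = None
    n_rows = 0
    with open(out_path, "w", newline="") as out_f:
        writer = csv.writer(out_f)
        for path in paths:
            with open(path, newline="") as in_f:
                rows = list(csv.reader(in_f))
            this_header, data = rows[:n_header_rows], rows[n_header_rows:]
            if header is None:
                header = this_header
                writer.writerows(header)
            elif this_header != header:
                raise ValueError(f"{path}: header differs from the first file")
            writer.writerows(data)
            n_rows += len(data)
    return n_rows


# example call (paths are placeholders):
# merge_label_csvs(["video1/CollectedData.csv", "video2/CollectedData.csv"], "CollectedData_merged.csv")
```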
Hi, Lightning Pose team,
I am very interested in this super cool tool.
How can I train on multiple GPUs with Lightning Pose?
I can't find any multi-GPU information in the tutorial.
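Lightning Pose trains through PyTorch Lightning, and in PyTorch Lightning 1.7+ the GPUs are chosen with the Trainer's accelerator, devices and strategy arguments (the training log elsewhere on this page shows gpus=[0] being deprecated in favor of accelerator='gpu', devices=[0]). Whether Lightning Pose's DALI video loaders and semi-supervised losses work correctly under distributed data parallel is not confirmed here. A sketch of the Trainer arguments, with a hypothetical helper:

```python
from typing import Any, Dict


def multi_gpu_trainer_kwargs(num_gpus: int) -> Dict[str, Any]:
    """Keyword arguments for pytorch_lightning.Trainer, v1.7+ style (hypothetical helper).

    Uses distributed data parallel (one process per GPU) when more than one GPU is requested.
    """
    if num_gpus < 1:
        return {"accelerator": "cpu", "devices": 1}
    kwargs: Dict[str, Any] = {"accelerator": "gpu", "devices": num_gpus}
    if num_gpus > 1:
        kwargs["strategy"] = "ddp"
    return kwargs


# usage (not executed here):
#   import pytorch_lightning as pl
#   trainer = pl.Trainer(max_epochs=300, **multi_gpu_trainer_kwargs(2))
print(multi_gpu_trainer_kwargs(2))
```

With strategy="ddp" each GPU runs its own process with its own batches, so the effective batch size is the per-GPU batch size times the number of GPUs.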
Hello,
I tried running training on the toy dataset using the default Hydra script, and it fails with the following stack trace when the loss is set to pca_singleview or pca_multiview.
Kindly help in resolving this.
train_batch_size: 16
val_batch_size: 16
test_batch_size: 16
train_prob: 0.8
val_prob: 0.1
train_frames: 1
num_gpus: 0
num_workers: 4
early_stop_patience: 3
unfreezing_epoch: 25
dropout_rate: 0.1
min_epochs: 100
max_epochs: 500
log_every_n_steps: 1
check_val_every_n_epoch: 10
gpu_id: 0
unlabeled_sequence_length: 16
rng_seed_data_pt: 42
rng_seed_data_dali: 43
rng_seed_model_pt: 44
limit_train_batches: 10
multiple_trainloader_mode: max_size_cycle
profiler: simple
accumulate_grad_batches: 2
lr_scheduler: multisteplr
lr_scheduler_params: {'multisteplr': {'milestones': [100, 200, 300], 'gamma': 0.5}}
pca_multiview: {'log_weight': 7.0, 'components_to_keep': 3, 'empirical_epsilon_percentile': 1.0, 'empirical_epsilon_multiplier': 1.0, 'epsilon': None, 'error_metric': 'reprojection_error'}
pca_singleview: {'log_weight': 7.25, 'components_to_keep': 0.99, 'empirical_epsilon_percentile': 1.0, 'empirical_epsilon_multiplier': 1.0, 'epsilon': None, 'error_metric': 'reprojection_error'}
temporal: {'log_weight': 7.5, 'epsilon': [12.9, 11.3, 10.5, 12.0, 5.0, 7.3, 0.7, 61.8, 11.2, 9.9, 9.7, 10.1, 4.8, 4.9, 1.0, 19.2, 6.8]}
unimodal_mse: {'log_weight': 6.5, 'prob_threshold': 0.0}
unimodal_kl: {'log_weight': 6.5, 'prob_threshold': 0.0}
image_orig_dims: {'width': 396, 'height': 406}
image_resize_dims: {'width': 256, 'height': 256}
data_dir: toy_datasets/toymouseRunningData
video_dir: unlabeled_videos
csv_file: CollectedData_.csv
header_rows: [1, 2]
downsample_factor: 2
num_keypoints: 17
mirrored_column_matches: [[0, 1, 2, 3, 4, 5, 6], [8, 9, 10, 11, 12, 13, 14]]
columns_for_singleview_pca: [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14]
losses_to_use: ['pca_singleview']
learn_weights: False
resnet_version: 50
model_type: heatmap
heatmap_loss_type: mse
model_name: my_base_toy_model
anneal_weight: {'attr_name': 'total_unsupervised_importance', 'init_val': 0.0, 'increase_factor': 0.01, 'final_val': 1.0, 'freeze_until_epoch': 0}
/anaconda/envs/lightning-pose/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2895.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Number of labeled images in the full dataset (train+val+test): 90
Size of -- train set: 72, val set: 9, test set: 9
Warning: the argument {farg[0]} shadows a Pipeline constructor argument of the same name.
[/opt/dali/dali/operators/reader/loader/video_loader.h:178] file_list_include_preceding_frame is set to False (or not set at all). In future releases, the default behavior would be changed to True.
[/opt/dali/dali/operators/reader/nvdecoder/nvdecoder.cc:80] Warning: Decoding on a default stream. Performance may be affected.
Results of running PCA (pca_singleview) on keypoints:
Kept 13/28 components, and found:
Explained variance ratio: [0.315 0.242 0.209 0.073 0.048 0.034 0.021 0.015 0.01 0.007 0.007 0.005
0.004 0.003 0.002 0.001 0.001 0.001 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
Variance explained by 13 components: 0.991
/home/walthamadmin/notebooks/projects/lightning-pose/lightning_pose/losses/losses.py:326: UserWarning: Using empirical epsilon=0.194 * multiplier=1.000 -> total=0.194 for pca_singleview loss
warnings.warn(
/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py:22: LightningDeprecationWarning: pytorch_lightning.core.lightning.LightningModule has been deprecated in v1.7 and will be removed in v1.9. Use the equivalent class from the pytorch_lightning.core.module.LightningModule class instead.
rank_zero_deprecation(
Initializing a SemiSupervisedHeatmapTracker instance.
/anaconda/envs/lightning-pose/lib/python3.8/site-packages/torchvision/models/_utils.py:135: UserWarning: Using 'weights' as positional parameter(s) is deprecated since 0.13 and will be removed in 0.15. Please use keyword parameter(s) instead.
warnings.warn(
/anaconda/envs/lightning-pose/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=ResNet50_Weights.IMAGENET1K_V1. You can also use weights=ResNet50_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:446: LightningDeprecationWarning: Setting Trainer(gpus=[0]) is deprecated in v1.7 and will be removed in v2.0. Please use Trainer(accelerator='gpu', devices=[0]) instead.
rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:285: LightningDeprecationWarning: The Callback.on_epoch_start hook was deprecated in v1.6 and will be removed in v1.8. Please use Callback.on_<train/validation/test>_epoch_start instead.
rank_zero_deprecation(
Missing logger folder: tb_logs/my_base_toy_model
Number of labeled images in the full dataset (train+val+test): 90
Size of -- train set: 72, val set: 9, test set: 9
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
134 K Trainable params
23.5 M Non-trainable params
23.6 M Total params
94.356 Total estimated model params size (MB)
/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:219: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 6 which is the number of cpus on this machine) in the DataLoader init to improve performance.
rank_zero_warn(
Epoch 0: 0%| | 0/10 [00:00<?, ?it/s]/home/walthamadmin/notebooks/projects/lightning-pose/lightning_pose/data/dali.py:103: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
return torch.tensor(
Error executing job with overrides: []
Traceback (most recent call last):
File "scripts/train_hydra.py", line 110, in train
trainer.fit(model=model, datamodule=data_module)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
self._call_and_handle_interrupt(
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1168, in _run
results = self._run_stage()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1254, in _run_stage
return self._run_train()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1285, in _run_train
self.fit_loop.run()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 270, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
batch_output = self.batch_loop.run(kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
outputs = self.optimizer_loop.run(optimizers, kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 240, in _run_optimization
closure()
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
self._result = self.closure(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 141, in closure
self._backward_fn(step_output.closure_loss)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 304, in backward_fn
self.trainer._call_strategy_hook("backward", loss, optimizer, opt_idx)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1706, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 191, in backward
self.precision_plugin.backward(self.lightning_module, closure_loss, optimizer, optimizer_idx, *args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 80, in backward
model.backward(closure_loss, optimizer, optimizer_idx, *args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 1418, in backward
loss.backward(*args, **kwargs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/torch/_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/anaconda/envs/lightning-pose/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 14]], which is output 0 of LinalgVectorNormBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Epoch 0: 0%|
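The error says a [16, 14] tensor produced by a vector norm (plausibly 16 frames by the 14 keypoints listed in columns_for_singleview_pca) was modified in place before backward ran, and LinalgVectorNormBackward0 needs its own output unchanged to compute the gradient. The exact line in Lightning Pose is not identified here; running with torch.autograd.set_detect_anomaly(True), as the hint suggests, would locate it. The sketch below reproduces the same class of error with a hypothetical masked error term and shows an out-of-place alternative that computes the same value:

```python
import torch


def masked_error_inplace(x: torch.Tensor, eps: float) -> torch.Tensor:
    """Reproduces the bug: zeroes small errors in place on the output of vector_norm."""
    err = torch.linalg.vector_norm(x, dim=-1)  # grad_fn=LinalgVectorNormBackward0, saves err
    err[err < eps] = 0.0                       # in-place write bumps err's version counter
    return err.mean()


def masked_error(x: torch.Tensor, eps: float) -> torch.Tensor:
    """Same value, but masked_fill (no trailing underscore) returns a new tensor."""
    err = torch.linalg.vector_norm(x, dim=-1)
    return err.masked_fill(err < eps, 0.0).mean()


torch.manual_seed(0)
x = torch.randn(16, 14, 2, requires_grad=True)  # e.g. 16 frames x 14 keypoints x (x, y)
try:
    masked_error_inplace(x, eps=0.5).backward()
except RuntimeError as e:
    print("in-place version fails:", str(e).splitlines()[0][:80])
masked_error(x, eps=0.5).backward()              # succeeds
print(x.grad.shape)                               # torch.Size([16, 14, 2])
```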
The current package only supports multiview setups that have fused views across cameras into a single frame. This does not scale well past 2-4 views.
Enabling learn_weights in model_params.yaml throws errors
learn_weights: True
Variable naming issue in factory.py
Hello, I am trying to install lightning-pose as outlined here: https://lightning-pose.readthedocs.io/en/latest/source/installation.html
I created a conda environment with Python 3.8, installed lightning_pose from git, and successfully ran python -c "import lightning_pose". I then installed the dependencies, which reported success, but when I run pytest I get the following error:
sys.exit(console_main())
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 198, in console_main
code = main()
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 156, in main
config = _prepareconfig(args, plugins)
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 338, in _prepareconfig
config = pluginmanager.hook.pytest_cmdline_parse(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pluggy/_hooks.py", line 501, in __call__
return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pluggy/_manager.py", line 119, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pluggy/_callers.py", line 138, in _multicall
raise exception.with_traceback(exception.__traceback__)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pluggy/_callers.py", line 121, in _multicall
teardown.throw(exception) # type: ignore[union-attr]
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/helpconfig.py", line 105, in pytest_cmdline_parse
config = yield
File "/home/ubuntu/.local/lib/python3.8/site-packages/pluggy/_callers.py", line 102, in _multicall
res = hook_impl.function(*args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 1096, in pytest_cmdline_parse
self.parse(args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 1449, in parse
self._preparse(args, addopts=addopts)
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 1326, in _preparse
self.pluginmanager.load_setuptools_entrypoints("pytest11")
File "/home/ubuntu/.local/lib/python3.8/site-packages/pluggy/_manager.py", line 414, in load_setuptools_entrypoints
plugin = ep.load()
File "/usr/lib/python3.8/importlib/metadata.py", line 77, in load
module = import_module(match.group('module'))
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/assertion/rewrite.py", line 178, in exec_module
exec(co, module.__dict__)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torchtyping/__init__.py", line 11, in <module>
from .typechecker import patch_typeguard
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/assertion/rewrite.py", line 178, in exec_module
exec(co, module.__dict__)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torchtyping/typechecker.py", line 4, in <module>
import typeguard
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "/home/ubuntu/.local/lib/python3.8/site-packages/_pytest/assertion/rewrite.py", line 178, in exec_module
exec(co, module.__dict__)
File "/home/ubuntu/.local/lib/python3.8/site-packages/typeguard/__init__.py", line 48, in <module>
load_plugins()
File "/home/ubuntu/.local/lib/python3.8/site-packages/typeguard/_checkers.py", line 874, in load_plugins
for ep in entry_points(group="typeguard.checker_lookup"):
TypeError: entry_points() got an unexpected keyword argument 'group'
Based on related issues 1, 2, 3, 4, it seems like there is some incompatibility with the installed version of importlib-metadata. I've tried manually installing several different versions and still get the same error. Can you recommend a compatible version?
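Two details in the traceback may matter more than the importlib-metadata pin: pytest, torchtyping and typeguard are loaded from /home/ubuntu/.local/lib/python3.8, and the standard library from /usr/lib/python3.8, rather than from a conda environment. That suggests pytest is running under the system Python with user-site packages, so installing versions inside the conda env (which, if the conda list below is from the same machine, already has importlib-metadata 7.1.0) would not affect it. The TypeError itself comes from an entry_points API difference: the selectable group= form exists in the standard library from Python 3.10 and in the importlib_metadata backport from version 3.6, while Python 3.8's stdlib entry_points() takes no arguments and returns a dict. A small diagnostic sketch with hypothetical helper names:

```python
import site
import sys
from importlib import metadata


def entry_points_for_group(group: str) -> list:
    """Entry points in `group`, working across Python versions (hypothetical helper).

    Python >= 3.10: entry_points(group=...) is supported.
    Python 3.8/3.9: entry_points() takes no arguments and returns {group: entry points},
    which is why calling entry_points(group=...) there raises TypeError.
    """
    if sys.version_info >= (3, 10):
        return list(metadata.entry_points(group=group))
    return list(metadata.entry_points().get(group, []))


def interpreter_report() -> dict:
    """Which interpreter and user site-packages directory are actually in use."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "user_site": site.getusersitepackages(),
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


print(interpreter_report())
print([ep.name for ep in entry_points_for_group("pytest11")])
```

If the reported executable is /usr/bin/python3 rather than the conda env's python, running python -m pytest from the activated environment ties pytest to that environment's interpreter, and setting PYTHONNOUSERSITE=1 stops Python from adding ~/.local packages to sys.path.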
After following all the directions on the Lightning Pose installation page (I'm on Ubuntu 20.04, so I skipped installing fiftyone-db-ubuntu2204), I get the following from conda list:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
attrs 23.2.0 pypi_0 pypi
blinker 1.7.0 pypi_0 pypi
bzip2 1.0.8 hd590300_5 conda-forge
ca-certificates 2024.2.2 hbcca054_0 conda-forge
certifi 2024.2.2 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
click 8.1.7 pypi_0 pypi
idna 3.6 pypi_0 pypi
importlib-metadata 7.1.0 pypi_0 pypi
jsonschema 4.21.1 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.2.0 h807b86a_5 conda-forge
libgomp 13.2.0 h807b86a_5 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libsqlite 3.45.2 h2797004_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
lightning-pose 1.1.0 dev_0 <develop>
markdown 3.6 pypi_0 pypi
ncurses 6.4.20240210 h59595ed_0 conda-forge
nvidia-dali-cuda110 1.34.0 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
openssl 3.2.1 hd590300_1 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pypi_0 pypi
pyasn1 0.5.1 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
python 3.8.19 hd12c33a_0_cpython conda-forge
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h8228510_1 conda-forge
referencing 0.34.0 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
rpds-py 0.18.0 pypi_0 pypi
setuptools 69.2.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pypi_0 pypi
tk 8.6.13 noxft_h4845f30_101 conda-forge
urllib3 1.26.18 pypi_0 pypi
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zipp 3.18.1 pypi_0 pypi
Thanks for any suggestions you have.
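One way to narrow this down is to check which entry_points implementation Python actually imports. The traceback paths point at /home/ubuntu/.local/lib/python3.8/site-packages (the per-user site directory), not the conda environment listed above, so the importlib-metadata 7.1.0 in the conda list may not be the copy being loaded. The sketch assumes, as we understand typeguard's behaviour, that on Python below 3.10 it takes entry_points from the importlib_metadata backport and on 3.10+ from the standard library; both helper names are hypothetical. Run it with the same interpreter that runs pytest.

```python
import inspect
import sys


def accepts_group_kwarg(func) -> bool:
    """Return True if `func` can be called as func(group=...)."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some builtins expose no signature; treat them as unsupported.
        return False
    if "group" in params:
        return True
    # Newer implementations are defined as entry_points(**params).
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def describe_entry_points_provider() -> dict:
    """Report where entry_points comes from (assumed to mirror typeguard's choice)."""
    if sys.version_info >= (3, 10):
        import importlib.metadata as provider
    else:
        try:
            import importlib_metadata as provider  # third-party backport
        except ImportError:
            return {"provider": None, "file": None, "accepts_group": False}
    return {
        "provider": provider.__name__,
        "file": getattr(provider, "__file__", None),
        "accepts_group": accepts_group_kwarg(provider.entry_points),
    }


if __name__ == "__main__":
    print(sys.executable)
    print(describe_entry_points_provider())
```

If accepts_group comes back False, or file points somewhere unexpected, that copy is the one to upgrade or remove. Running Python with -s (which skips the user site directory) is another quick way to test whether ~/.local is involved.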
Hi there, amazing work. I am wondering how essential fixed image sizes are to your training setup. Reading the code, it seems you currently require a fixed image size, but DALI's transform module should be able to scale dynamically.
Best
Jan
Hello,
Thank you so much for this amazing module/library. It's really well written and it has a very informative documentation.
(This issue should really be a pull request, but I'm quite new to GitHub and I don't have much time right now. Sorryyy)
I successfully used train_hydra.py to train the model on my dataset.
I would now like to use predict_new_vids.py to run inference on some new videos from an additional dataset. I would like to use the pca_singleview_error and the temporal_norm to select some frames from these videos to relabel manually. The problem is that predict_new_vids.py, as written, outputs only the likelihood, which is not very informative.
I modified predict_new_vids.py so that it outputs the estimated keypoints, the pca_singleview_error, and the temporal_norm in three separate CSV files, as train_hydra.py does during its "predict" phase.
import hydra
from omegaconf import DictConfig
from typeguard import typechecked

from lightning_pose.utils.scripts import (
    compute_metrics,
    export_predictions_and_labeled_video,
    get_data_module,
    get_dataset,
    get_imgaug_transform,
)
...
@typechecked
class VideoPredPathHandler:
    # ...
    def build_pred_file_basename(self, extra_str="") -> str:
        # return "%s_%s%s%s.csv" % (
        #     self.video_basename,
        #     self.model_cfg.model.model_type,
        #     self.loss_str,
        #     extra_str,
        # )
        return f"{self.video_basename}.csv"
...
@hydra.main(config_path="configs", config_name="config_mirror-mouse-example")
def predict_videos_in_dir(cfg: DictConfig):
    # ...
    for _, hydra_relative_path in enumerate(cfg.eval.hydra_paths):
        # ...
        for video_file in video_files:
            # ...
            print(f"\n\n{prediction_csv_file = }\n\n")
            export_predictions_and_labeled_video(
                video_file=video_file,
                cfg=cfg,
                ckpt_file=ckpt_file,
                prediction_csv_file=prediction_csv_file,
                labeled_mp4_file=labeled_mp4_file,
                trainer=trainer,
                model=model,
                data_module=data_module,
                save_heatmaps=cfg.eval.get(
                    "predict_vids_after_training_save_heatmaps", False
                ),
            )
            # compute and save various metrics
            try:
                compute_metrics(
                    cfg=cfg,
                    preds_file=prediction_csv_file,
                    data_module=data_module,
                )
            except Exception as e:
                print(f"Error predicting on video {video_file}:\n{e}")
                continue


if __name__ == "__main__":
    predict_videos_in_dir()
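For the frame-selection step described above, here is a minimal sketch of the idea. It assumes (our reading, not confirmed here) that the temporal norm of a keypoint is the Euclidean distance its predicted (x, y) position moves between consecutive frames, and it ranks frames by the mean of that distance across keypoints. It works on plain Python lists rather than the CSV files; temporal_norm and select_frames_to_relabel are hypothetical names, not the lightning-pose API.

```python
import math
from typing import List, Sequence, Tuple

Keypoints = Sequence[Tuple[float, float]]  # one (x, y) pair per keypoint


def temporal_norm(frames: Sequence[Keypoints]) -> List[List[float]]:
    """Per-frame, per-keypoint displacement from the previous frame.

    The first frame has no predecessor, so its entries are NaN.
    """
    if not frames:
        return []
    n_kp = len(frames[0])
    norms = [[math.nan] * n_kp]
    for prev, curr in zip(frames[:-1], frames[1:]):
        norms.append([math.hypot(cx - px, cy - py)
                      for (px, py), (cx, cy) in zip(prev, curr)])
    return norms


def select_frames_to_relabel(frames: Sequence[Keypoints], k: int) -> List[int]:
    """Indices of the k frames with the largest mean temporal norm."""
    scores = []
    for idx, row in enumerate(temporal_norm(frames)):
        valid = [v for v in row if not math.isnan(v)]
        if valid:
            scores.append((sum(valid) / len(valid), idx))
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:k]]


# Two keypoints over four frames; keypoint 0 jumps by (3, 4) at frame 2.
frames = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(0.0, 0.0), (10.0, 10.0)],
    [(3.0, 4.0), (10.0, 10.0)],
    [(3.0, 4.0), (10.0, 10.0)],
]
print(select_frames_to_relabel(frames, k=1))  # [2]
```

The same ranking could be applied to a per-frame PCA reprojection error instead of the temporal norm, or to a combination of the two.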
First of all, thank you for such well-written code and documentation. Everything was easy to read and understand, and the code is organized very nicely. As someone without a computer science degree, I really appreciate this, especially since it is rare among people in my field.
What I have tried to do is what the title says: I have a custom-trained YOLOv8n pose model in .pt format. I wanted to add this model 1) as a supervised tracking model, and 2) possibly extend it into a semi-supervised tracker.
However, even after following your detailed instructions on adding a new model, I have not been able to do so.
Here is what I did, in the order of the documentation page:
Added YOLOtracker.py, which defines two new tracker classes: YOLOtracker and SemisupervisedYOLOtracker.
Added "YOLOtracker" to ALLOWED_MODELS in models/__init__.py.
Created a new config with model_type: "YOLOtracker".
Changed line 85 of utils/scripts.py to:
if cfg.model.model_type == "regression" or cfg.model.model_type == "YOLOtracker":
Added:
elif cfg.model.model_type == "YOLOtracker":
    model = YOLOtracker(
        num_keypoints=cfg.data.num_keypoints,
        # loss_factory=loss_factories["supervised"],
        backbone=cfg.model.backbone,
        # torch_seed=cfg.training.rng_seed_model_pt,
        # lr_scheduler=lr_scheduler,
        # lr_scheduler_params=lr_scheduler_params,
        # image_size=image_h,  # only used by ViT
    )
Added in get_model_class:
elif map_type == "YOLOtracker":
    from lightning_pose.models import YOLOtracker as Model
I also added "YOLOtracker": RegressionMSELoss to losses.losses, a step that was missing from the documentation.
Then I went to the unit tests and created:
import copy


def test_supervised_YOLO(
    cfg, base_data_module, video_dataloader, trainer, remove_logs
):
    """Test the initialization and training of a supervised YOLO model."""
    # cfg = '/home/tarislada/Behavitproject/lightning-pose/scripts/configs/config_custom.yaml'
    cfg_tmp = copy.deepcopy(cfg)
    cfg_tmp.model.model_type = "YOLOtracker"
    cfg_tmp.model.losses_to_use = []
    run_model_test(
        cfg=cfg_tmp,
        data_module=base_data_module,
        video_dataloader=video_dataloader,
        trainer=trainer,
        remove_logs_fn=remove_logs,
    )
It failed with the message:
FAILED tests/models/test_custom_trackers.py::test_supervised_YOLO - RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Because I did see the "Initializing a YOLOtracker instance." message and the usual YOLO run command line messages, I assume that the implementation regarding the lightning-pose document was successful.
Any idea on how to get this working?
Hi, Lightning Pose team,
Based on your tutorial, I can run training with 'train_hydra.py' without errors:
python train_hydra.py --config-path=/root/autodl-tmp/DLC_LP --config-name=config_LP.yaml
and it creates a new directory: outputs/2024-04-07/11-48-45/
Now I want to run 'predict_new_vids.py'
python predict_new_vids.py --config-path=/root/autodl-tmp/DLC_LP --config-name=config_LP.yaml
but it gives the following error:
[2024-04-07 23:33:49,971][HYDRA] /root/miniconda3/lib/python3.8/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Error executing job with overrides: []
Traceback (most recent call last):
File "predict_new_vids.py", line 116, in predict_videos_in_dir
absolute_cfg_path = return_absolute_path(hydra_relative_path, n_dirs_back=2)
File "/root/miniconda3/lib/python3.8/site-packages/lightning_pose/utils/io.py", line 153, in return_absolute_path
raise IOError("%s is not a valid path" % abs_path)
OSError: /root/autodl-tmp/DLC_LP/outputs/outputs/2024-04-07/11-48-45/ is not a valid path
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
And here is my config file:
data:
  image_orig_dims:
    height: 2160
    width: 2160
  image_resize_dims:
    height: 512
    width: 512
  data_dir: /root/autodl-tmp/DLC_LP
  video_dir: /root/autodl-tmp/DLC_LP/videos
  csv_file: CollectedData.csv
  downsample_factor: 2
  num_keypoints: 6
  keypoint_names:
  - snout
  - forepaw_L
  - forefaw_R
  - hindpaw_L
  - hindpaw_R
  - base
  mirrored_column_matches: null
  columns_for_singleview_pca: null
training:
  imgaug: dlc
  train_batch_size: 8
  val_batch_size: 32
  test_batch_size: 32
  train_prob: 0.95
  val_prob: 0.05
  train_frames: 1
  num_gpus: 1
  num_workers: 4
  early_stop_patience: 3
  unfreezing_epoch: 20
  min_epochs: 5
  max_epochs: 10
  log_every_n_steps: 10
  check_val_every_n_epoch: 5
  gpu_id: 0
  rng_seed_data_pt: 0
  rng_seed_model_pt: 0
  lr_scheduler: multisteplr
  lr_scheduler_params:
    multisteplr:
      milestones:
      - 150
      - 200
      - 250
      gamma: 0.5
model:
  losses_to_use:
  - pca_singleview
  - temporal
  backbone: resnet50_animal_ap10k
  model_type: heatmap_mhcrnn
  heatmap_loss_type: mse
  model_name: DLC_LP
dali:
  general:
    seed: 123456
  base:
    train:
      sequence_length: 32
    predict:
      sequence_length: 96
  context:
    train:
      batch_size: 16
    predict:
      sequence_length: 96
losses:
  pca_multiview:
    log_weight: 5.0
    components_to_keep: 3
    epsilon: null
  pca_singleview:
    log_weight: 5.0
    components_to_keep: 0.99
    epsilon: null
  temporal:
    log_weight: 5.0
    epsilon: 20.0
    prob_threshold: 0.05
eval:
  hydra_paths: ["outputs/2024-04-07/11-48-45/"]
  predict_vids_after_training: true
  save_vids_after_training: false
  fiftyone:
    dataset_name: test
    model_display_names:
    - test_model
    launch_app_from_script: false
    remote: true
    address: 127.0.0.1
    port: 5151
  test_videos_directory: /root/autodl-tmp/DLC_LP/videos
  saved_vid_preds_dir: null
  confidence_thresh_for_vid: 0.9
  video_file_to_plot: null
  pred_csv_files_to_plot:
  - ' '
callbacks:
  anneal_weight:
    attr_name: total_unsupervised_importance
    init_val: 0.0
    increase_factor: 0.01
    final_val: 1.0
    freeze_until_epoch: 0
hydra:
  run:
    dir: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}
Any suggestions? Thank you.
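The doubled outputs/outputs in the error can be reproduced with plain path arithmetic under one assumption: that return_absolute_path walks n_dirs_back directories up from the current working directory and then joins the relative hydra_paths entry. Hydra changes the working directory at job runtime (see the UserWarning in the log), and with the hydra.run.dir setting in this config the new run directory is outputs/<date>/<time>. The function below is a hypothetical re-creation for illustration, not the library's code; the 23-33-49 run directory is taken from the log timestamp.

```python
import posixpath


def resolve_from_run_dir(run_dir: str, relative_path: str, n_dirs_back: int) -> str:
    """Hypothetical re-creation: go up n_dirs_back levels, then join."""
    base = run_dir.rstrip("/")
    for _ in range(n_dirs_back):
        base = posixpath.dirname(base)
    return posixpath.join(base, relative_path)


run_dir = "/root/autodl-tmp/DLC_LP/outputs/2024-04-07/23-33-49"
rel = "outputs/2024-04-07/11-48-45/"

# Going up 2 levels stops at .../DLC_LP/outputs, so "outputs" appears twice.
print(resolve_from_run_dir(run_dir, rel, n_dirs_back=2))
# Going up 3 levels would reach the project directory instead.
print(resolve_from_run_dir(run_dir, rel, n_dirs_back=3))
```

The first call prints exactly the path from the error message. If the real helper leaves absolute paths untouched (not confirmed here), an absolute entry in eval.hydra_paths would sidestep the relative lookup altogether.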
Hi, thanks a lot for developing this amazing package.
I was trying to run the demo notebook in Colab, and running !pytest gave the error below.
Also, I noticed that the config_toy-dataset.yaml file is missing from the directory where, according to the notebook, it should be.
Thanks a lot for your help,
Anto
============================= test session starts ==============================
platform linux -- Python 3.10.6, pytest-7.3.1, pluggy-1.2.0
rootdir: /content/lightning-pose
plugins: torchtyping-0.1.4, hydra-core-1.3.2, typeguard-3.0.2, anyio-3.7.1
collected 56 items / 1 error
==================================== ERRORS ====================================
__________________ ERROR collecting tests/models/test_base.py __________________
ImportError while importing test module '/content/lightning-pose/tests/models/test_base.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/models/test_base.py:3: in <module>
import segment_anything
E ModuleNotFoundError: No module named 'segment_anything'
=========================== short test summary info ============================
ERROR tests/models/test_base.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 1.79s ===============================
Lightning Pose prompted us to install an update, which went through OK with no issues.
We used LP for a while; then Firefox prompted for an update, and we updated Firefox.
Now, when we call "lightning run app app.py", nothing happens and we get: "Please call fabric run model instead".
When we run that, we now get:
Root Cause (first observed failure):
[0]:
time : 2024-04-23_13:04:26
host : lightningpose-901045
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 6409)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
We're a bit unsure how to fix our current system now.
Do we need to convert our system/code to Fabric? https://lightning.ai/docs/fabric/stable/fundamentals/convert.html
Also, where are the log files (path) for later analysis?
Any pointers or more recent documentation would be gratefully received.
A short question about the online version: how is GPU usage counted? I created an account yesterday, used it for a few minutes, and left it open; today I had to pay. How should I save GPU hours?
Anyway, the online version seems problematic in terms of GPU-hour allowance, so I would like to install it locally. Could you please help with the situation above?
Have a good day!
hello.
I'm trying to run lightning-pose on CalMS21 to test how it performs. I configured a new config .yaml file for this dataset, but the predictions on the video aren't working. The new config file for CalMS21 is mostly the same as the crim13 config file, except that I added a video to leverage the unsupervised losses (pca_singleview & temporal), and I also tested up to 1000 training epochs. The predicted keypoints stay in the top-left corner of the video. Since there is a default crim13 config file, I wonder how the model performed on the crim13 dataset, as the two datasets share many features.
Below is a frame from the predicted video; I hope it helps. Thank you in advance.
Best regards
Hello Dan,
I am very eager to try this tool. I have successfully installed lightning-pose and label-studio. I am somewhat experienced with DeepLabCut.
I have a video from which I would like to extract frames to label and train a model, but I'm stuck because I don't know how to extract frames. When I try to import my video into Label Studio, I get an error. I have spent a good deal of time reading through the examples here on your GitHub, and it seems that most of the demos assume you already have labeled data. Are there instructions somewhere for the full workflow (fig. 6 in your paper), starting from a new video?
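As a stopgap, one generic way to get frames out of a new video for labeling is to pick evenly spaced frame indices and save those frames as PNG files. This is a sketch that assumes OpenCV (opencv-python) is installed; it is not the lightning-pose or Label Studio workflow, uniform spacing is a simple stand-in for smarter selection such as clustering, and the function names and file naming are hypothetical.

```python
from pathlib import Path
from typing import List


def select_frame_indices(n_frames: int, n_select: int) -> List[int]:
    """Evenly spaced frame indices from 0 to n_frames - 1 (inclusive)."""
    if n_frames <= 0 or n_select <= 0:
        return []
    n_select = min(n_select, n_frames)
    if n_select == 1:
        return [0]
    return [i * (n_frames - 1) // (n_select - 1) for i in range(n_select)]


def extract_frames(video_path: str, n_select: int, out_dir: str) -> List[Path]:
    """Save evenly spaced frames as PNGs; requires opencv-python."""
    import cv2  # imported lazily so the index helper works without OpenCV

    cap = cv2.VideoCapture(video_path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for idx in select_frame_indices(n_frames, n_select):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seeking may be approximate for some codecs
        ok, frame = cap.read()
        if not ok:
            continue
        path = out / f"img{idx:08d}.png"
        cv2.imwrite(str(path), frame)
        saved.append(path)
    cap.release()
    return saved


print(select_frame_indices(100, 5))  # [0, 24, 49, 74, 99]
```

The saved PNGs can then be imported into a labeling tool as ordinary images.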
Hello!
I'm using Lightning Pose for my project, but it gets stuck on the model-building cell and never finishes. I installed Lightning Pose using the "conda from source" installation method. I've tried running the command from the terminal and also breaking it down step by step in Jupyter (like the demo on Colab).
Here's what I got when I ran this cell in Jupyter:
Thank you in advance!
Happy holidays!
Alan
Hello and happy Friday,
I couldn't find these topics in the documentation so some clarification would be appreciated.
When adding new videos to an existing project,
Thank you very much for your work!
Hello!
Excited to try lightning-pose, but I cannot get through pytest. Right now I am mostly interested in the features that DeepGraphPose had: avoiding paw switches and following fast-moving things better.
I am trying to install on Code Ocean, so there might be platform-specific issues. Having a Docker image would probably help; I wonder if you have one handy.
At first, pytest wouldn't start:
patch_typeguard() in /data/utils throws an error on the first try, saying there is no _CallMemo in typeguard.
This fixed the issue:
from torchtyping import patch_typeguard

try:
    patch_typeguard()  # use before @typechecked
except Exception:
    print('patch_typeguard() failed on the first try, retrying')
    patch_typeguard()
But there is probably a better way to do this. :D
Then I needed to change how CombinedLoader is imported from pytorch_lightning to the following:
from pytorch_lightning.utilities.combined_loader import CombinedLoader
Then pytest started, but threw a bunch of errors.
Before going in and trying to fix them, I wonder whether I am using the wrong versions.
Please see the pytest output below.
Best,
Marton
=============================== short test summary info ===============================
FAILED tests/test_metrics.py::test_pixel_error - UnboundLocalError: local variable 'pixel_error' referenced before assignment
FAILED tests/test_metrics.py::test_pca_singleview_reprojection_error - TypeError: cannot create weak reference to 'property' object
FAILED tests/test_metrics.py::test_pca_multiview_reprojection_error - TypeError: cannot create weak reference to 'property' object
FAILED tests/data/test_dali.py::test_video_pipe - RuntimeError: Critical error when building pipeline:
FAILED tests/data/test_dali.py::test_PrepareDALI - RuntimeError: Critical error when building pipeline:
FAILED tests/data/test_datasets.py::test_base_dataset - TypeError: cannot create weak reference to 'property' object
FAILED tests/data/test_datasets.py::test_base_dataset_context - TypeError: cannot create weak reference to 'property' object
FAILED tests/data/test_utils.py::test_generate_heatmaps_weird_shape - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_helpers.py::test_empirical_epsilon - TypeError: isinstance() arg 2 must be a type or tuple of types
FAILED tests/losses/test_helpers.py::test_convert_dict_values_to_tensor - TypeError: isinstance() arg 2 must be a type or tuple of types
FAILED tests/losses/test_losses.py::test_heatmap_mse_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_heatmap_kl_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_heatmap_js_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_pca_singleview_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_pca_multiview_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_temporal_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_unimodal_mse_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_unimodal_kl_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_unimodal_js_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_regression_mse_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/losses/test_losses.py::test_regression_rmse_loss - TypeError: cannot create weak reference to 'property' object
FAILED tests/models/test_base.py::test_backbone - OSError: [Errno 28] No space left on device
FAILED tests/models/test_base.py::test_representation_shapes_truncated_resnet - OSError: [Errno 28] No space left on device
FAILED tests/models/test_base.py::test_representation_shapes_full_resnet - OSError: [Errno 28] No space left on device
ERROR tests/data/test_datamodules.py::test_heatmap_datamodule - TypeError: cannot create weak reference to 'property' object
ERROR tests/data/test_datamodules.py::test_base_data_module_combined - TypeError: cannot create weak reference to 'property' object
ERROR tests/data/test_datamodules.py::test_heatmap_data_module_combined - TypeError: cannot create weak reference to 'property' object
ERROR tests/data/test_datasets.py::test_heatmap_dataset - TypeError: cannot create weak reference to 'property' object
ERROR tests/data/test_datasets.py::test_heatmap_dataset_context - TypeError: cannot create weak reference to 'property' object
ERROR tests/data/test_datasets.py::test_equal_return_sizes - TypeError: cannot create weak reference to 'property' object
ERROR tests/data/test_utils.py::test_data_extractor - TypeError: cannot create weak reference to 'property' object
ERROR tests/data/test_utils.py::test_generate_heatmaps - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_heatmap_tracker.py::test_supervised_heatmap - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_heatmap_tracker.py::test_supervised_heatmap_context - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_heatmap_tracker.py::test_semisupervised_heatmap_temporal - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_heatmap_tracker.py::test_semisupervised_heatmap_pcasingleview_context - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_heatmap_tracker_mhcrnn.py::test_supervised_heatmap_mhcrnn - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_heatmap_tracker_mhcrnn.py::test_semisupervised_heatmap_mhcrnn_pcasingleview - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_regression_tracker.py::test_supervised_regression - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_regression_tracker.py::test_supervised_regression_context - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_regression_tracker.py::test_semisupervised_regression_temporal - TypeError: cannot create weak reference to 'property' object
ERROR tests/models/test_regression_tracker.py::test_semisupervised_regression_pcasingleview_context - TypeError: cannot create weak reference to 'property' object
ERROR tests/utils/test_pca.py::test_train_loader_iter - TypeError: cannot create weak reference to 'property' object
ERROR tests/utils/test_pca.py::test_pca_keypoint_class - TypeError: cannot create weak reference to 'property' object
ERROR tests/utils/test_pca.py::test_singleview_format_and_loss - TypeError: cannot create weak reference to 'property' object
========== 24 failed, 13 passed, 10 warnings, 21 errors in 94.60s (0:01:34) ===========
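The _CallMemo error is consistent with a typeguard version mismatch, though that is not confirmed here: our understanding is that torchtyping's patch_typeguard() targets typeguard 2.x, which still had _CallMemo, and that typeguard 3 removed it. A quick check of the installed versions using only the standard library (Python 3.8+) might look like this; the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Leading integer of a version string such as '2.13.3' or '3.0.2'."""
    return int(version.split(".", 1)[0])


def installed_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("typeguard", "torchtyping", "pytorch-lightning"):
        print(name, installed_version(name))
    tg = installed_version("typeguard")
    if tg is not None and major_version(tg) >= 3:
        print("typeguard >= 3 detected; patch_typeguard() may not find _CallMemo")
```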
Hi lightning-pose team.
Our lab records video in AVI format, and it seems lightning-pose only supports MP4 (correct me if I am wrong!). Could you add support for AVI, as DeepLabCut does? Online video-conversion tools don't seem to be free, so this would make our work much easier.
Thank you so much!
Best,
Nora
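As a free alternative to online converters, the ffmpeg command-line tool can re-encode AVI files to MP4. This sketch assumes ffmpeg is installed and on PATH; it re-encodes with H.264 (libx264) and the yuv420p pixel format, a widely playable combination, and drops audio. The crf value (lower means higher quality and larger files) is a judgment call, and the helper names are hypothetical. Nothing here says which formats lightning-pose itself accepts.

```python
import subprocess
from pathlib import Path
from typing import List


def avi_to_mp4_command(src: str, dst: str, crf: int = 18) -> List[str]:
    """Build an ffmpeg command that re-encodes src to H.264 MP4 at dst."""
    return [
        "ffmpeg",
        "-y",                   # overwrite dst if it already exists
        "-i", src,
        "-c:v", "libx264",      # H.264 video
        "-pix_fmt", "yuv420p",  # broadly compatible pixel format
        "-crf", str(crf),
        "-an",                  # drop audio; pose videos usually do not need it
        dst,
    ]


def convert_dir(video_dir: str) -> List[Path]:
    """Convert every .avi in video_dir to an .mp4 next to it."""
    outputs = []
    for src in sorted(Path(video_dir).glob("*.avi")):
        dst = src.with_suffix(".mp4")
        subprocess.run(avi_to_mp4_command(str(src), str(dst)), check=True)
        outputs.append(dst)
    return outputs
```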
Hello Lightning Pose,
I have a 12 GB RTX 3060 GPU and I received a CUDA out-of-memory error when trying to run the "semi-supervised" model:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 11.76 GiB total capacity; 9.16 GiB already allocated; 38.31 MiB free; 9.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
This was run using Pose-app/app.py. It was training a model based on a 180 MB video and 20 labeled frames with 4 keypoints. I could upgrade to a 24 GB GPU if I knew that this would fix the issue. Is there a recommended GPU size for running Lightning Pose?
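The error message suggests PYTORCH_CUDA_ALLOC_CONF with max_split_size_mb, but that only targets fragmentation, and here allocated (9.16 GiB) and reserved (9.24 GiB) memory are close, so fragmentation does not look like the main problem. The usual lever is fewer frames per step. Assuming training ultimately runs a Hydra script such as train_hydra.py with a config shaped like the one earlier on this page, the hypothetical helper below builds command-line overrides that halve the batch-related settings; the key names come from that config, and halving is an arbitrary starting point.

```python
from typing import Dict, List

# Batch-related keys as they appear in the lightning-pose config on this page.
DEFAULTS: Dict[str, int] = {
    "training.train_batch_size": 8,
    "dali.base.train.sequence_length": 32,
    "dali.context.train.batch_size": 16,
}


def halved_overrides(values: Dict[str, int], minimum: int = 1) -> List[str]:
    """Hydra-style key=value overrides with each value halved (floor, at least minimum)."""
    return [f"{key}={max(minimum, value // 2)}" for key, value in values.items()]


if __name__ == "__main__":
    print("python train_hydra.py " + " ".join(halved_overrides(DEFAULTS)))
```

Reducing data.image_resize_dims is another option, at the cost of spatial resolution.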
trainer.fit does not crash if the unlabeled data are in .avi format; it just hangs without any errors.
This is obviously dealt with in the io module with a couple of asserts and booleans, but before digging into the details I figured it would be worthwhile to ask why here.
Hi, thanks for releasing lightning-pose.
I got an error message when generating heatmaps for the test video by calling the export_predictions_and_labeled_video() function from https://github.com/danbider/lightning-pose/blob/6135e5a50523d4a9d8f1ba986b76e28e3dcd0cf1/scripts/train_hydra.py. The error message is shown below.
export_predictions_and_labeled_video(
    video_file=video_file,
    cfg=cfg,
    ckpt_file=best_ckpt,
    prediction_csv_file=prediction_csv_file,
    labeled_mp4_file=labeled_mp4_file,
    trainer=trainer,
    model=model,
    data_module=data_module_pred,
    save_heatmaps=cfg.eval.get(
        "predict_vids_after_training_save_heatmaps", True
    ),
)
Error message:
Traceback (most recent call last):
File "/root/capsule/scratch/lightning-pose/scripts/train_hydra.py", line 297, in train
export_predictions_and_labeled_video(
File "/lightning-pose/lightning_pose/utils/scripts.py", line 675, in export_predictions_and_labeled_video
preds_df = predict_single_video(
File "/lightning-pose/lightning_pose/utils/predictions.py", line 397, in predict_single_video
keypoints, confidences, heatmaps = _predict_frames(
File "/lightning-pose/lightning_pose/utils/predictions.py", line 437, in _predict_frames
def _predict_frames(
File "/opt/conda/lib/python3.8/site-packages/typeguard/_functions.py", line 113, in check_argument_types
check_type_internal(value, expected_type, memo=memo)
File "/opt/conda/lib/python3.8/site-packages/typeguard/_checkers.py", line 680, in check_type_internal
raise TypeCheckError(f"is not an instance of {qualified_name(origin_type)}")
typeguard.TypeCheckError: argument "model" (lightning_pose.models.heatmap_tracker.HeatmapTracker) is not an instance of pytorch_lightning.core.module.LightningModule
Thanks,
Di
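One plausible cause, not confirmed here, is that two different LightningModule classes are in play: for example, the model inheriting from lightning.pytorch.LightningModule (from the unified lightning package) while the type hint refers to pytorch_lightning.LightningModule. Classes with the same name from different modules are distinct objects, so isinstance fails. Listing the model's base classes by fully qualified name shows which one it actually inherits from. The helper is hypothetical, and the demo uses stand-in classes rather than the real ones.

```python
from typing import List


def qualified_bases(obj) -> List[str]:
    """Fully qualified names of every class in type(obj)'s MRO."""
    return [f"{cls.__module__}.{cls.__qualname__}" for cls in type(obj).__mro__]


# Stand-ins: two unrelated classes that happen to share a name.
class LightningModule:  # plays the role of the class named in the type hint
    pass


def make_other_lightning_module():
    class LightningModule:  # plays the role of a second, distinct class
        pass
    return LightningModule


OtherLightningModule = make_other_lightning_module()


class HeatmapTracker(OtherLightningModule):
    pass


model = HeatmapTracker()
print(isinstance(model, LightningModule))  # False: same name, different class
print(qualified_bases(model))
```

With the real model, calling qualified_bases(model) just before export_predictions_and_labeled_video() would show whether its LightningModule comes from pytorch_lightning or from lightning.pytorch.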
Hi, Lightning Pose team,
I tried Lightning Pose with the temporal model. The model seems to converge at around 100 epochs, and the default setting in the example is around 100-300 epochs. Would a few hundred epochs of training be enough for the temporal model?
What about the basic model? Still a few hundred epochs?
With DeepLabCut, I usually have to go to 200k-500k iterations or even more. I'm not sure how an epoch in LP relates to an iteration in DLC, but LP seems to converge and finish training much faster on my dataset.
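A rough way to compare the two: an epoch is one pass over the labeled training frames, so the number of optimizer steps per epoch is ceil(n_train / batch_size), whereas a DLC iteration is a single step. The sketch below uses hypothetical numbers (200 labeled frames, train_prob 0.95, train_batch_size 8), rounds the training split, and ignores both the unlabeled frames used by semi-supervised losses and any cap on batches per epoch.

```python
import math


def steps_per_epoch(n_labeled: int, train_prob: float, batch_size: int) -> int:
    """Optimizer steps in one pass over the labeled training split."""
    n_train = round(n_labeled * train_prob)
    return math.ceil(n_train / batch_size)


def total_steps(epochs: int, n_labeled: int, train_prob: float, batch_size: int) -> int:
    """Optimizer steps over a whole training run."""
    return epochs * steps_per_epoch(n_labeled, train_prob, batch_size)


# Hypothetical project: 200 labeled frames, 95% used for training, batch size 8.
print(steps_per_epoch(200, 0.95, 8))   # 190 frames / 8 per batch -> 24 steps
print(total_steps(300, 200, 0.95, 8))  # 300 epochs -> 7200 steps
```

Step counts are only comparable at equal batch sizes. Here 300 epochs present each of the 190 training frames 300 times (57,000 labeled-image presentations), which at a batch size of 1 would take 57,000 iterations.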
When loading weights from a fine-tuned vit_b_sam backbone, if the fine-tuning frame size is not 1024x1024 the following error is raised:
RuntimeError: Error(s) in loading state_dict for HeatmapTracker:
size mismatch for backbone.pos_embed: copying a param with shape torch.Size([1, 16, 16, 768]) from checkpoint, the shape in current model is torch.Size([1, 64, 64, 768]).
The problem:
The solution:
Instead of loading the state dict directly into the model using Model.load_from_checkpoint, this step needs to be broken into several parts:
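The two shapes in the error follow from the ViT patch grid: the positional embedding holds one vector per patch, laid out as an (image_size / patch_size) x (image_size / patch_size) grid. Assuming the SAM ViT-B patch size of 16 and embedding width of 768, the default 1024 x 1024 input gives the current model's 64 x 64 grid, and the checkpoint's 16 x 16 grid corresponds to 256 x 256 fine-tuning frames. The helper is illustrative only.

```python
def pos_embed_shape(image_size: int, patch_size: int = 16, embed_dim: int = 768):
    """Shape of a ViT positional-embedding grid for a square input."""
    if image_size % patch_size:
        raise ValueError("image_size must be a multiple of patch_size")
    grid = image_size // patch_size
    return (1, grid, grid, embed_dim)


print(pos_embed_shape(1024))  # (1, 64, 64, 768): the current model
print(pos_embed_shape(256))   # (1, 16, 16, 768): the fine-tuned checkpoint
```

This is consistent with having to build the model at the fine-tuning frame size before the checkpoint weights are loaded.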
Hi, thanks for releasing lightning-pose.
I found a bug when predicting a folder of videos using the script lightning-pose/scripts/train_hydra.py.
The x and y coordinates of keypoints on the labeled videos will be shifted when the testing videos have different dimensions.
To avoid shifting the x and y coordinates, I updated the original image dimensions for each test video with the following code before calling export_predictions_and_labeled_video():
from moviepy.editor import VideoFileClip  # moviepy 1.x import path

clip = VideoFileClip(video_file)
cfg.data.image_orig_dims.width = clip.w
cfg.data.image_orig_dims.height = clip.h
clip.close()
Thanks,
Di
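Why stale dimensions shift the labels can be seen with a small sketch, assuming (as the fix above suggests) that predictions are made at image_resize_dims and scaled back to pixels using image_orig_dims. If image_orig_dims describe the training videos rather than the test video, every coordinate is multiplied by the wrong ratio. The function and the numbers are hypothetical.

```python
from typing import Tuple


def to_original_pixels(
    x: float,
    y: float,
    resize_wh: Tuple[int, int],
    orig_wh: Tuple[int, int],
) -> Tuple[float, float]:
    """Map a keypoint from resized-image coordinates back to original pixels."""
    (rw, rh), (ow, oh) = resize_wh, orig_wh
    return x * ow / rw, y * oh / rh


# A keypoint at the centre of a 256x256 network input.
resize = (256, 256)
print(to_original_pixels(128, 128, resize, (640, 480)))   # (320.0, 240.0): actual test video
print(to_original_pixels(128, 128, resize, (1280, 960)))  # (640.0, 480.0): stale training dims
```

With the stale dimensions, a point at the centre of a 640 x 480 frame is drawn at its bottom-right corner.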
We have run into this issue twice now.
We are running ~/Pose-app/app.py to run lightning pose.
It seems like something gets messed up in sqlite. The problem is we haven't been able to clear the error. We only solved this the first time with a complete re-install.
If you understand how to reset this database to somehow get lightning-pose going again, that would be helpful.
We tried removing and re-installing ~/venv-label-studio, but that did not fix the problem.
Runtime error
database disk image is malformed
Traceback (most recent call last):
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
return Database.Cursor.execute(self, query, params)
sqlite3.DatabaseError: database disk image is malformed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
response = handler(request, *args, **kwargs)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/utils/decorators.py", line 43, in _wrapper
return bound_method(*args, **kwargs)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/label_studio/projects/api.py", line 165, in get
return super(ProjectListAPI, self).get(request, *args, **kwargs)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/rest_framework/generics.py", line 239, in get
return self.list(request, *args, **kwargs)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/rest_framework/mixins.py", line 40, in list
page = self.paginate_queryset(queryset)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/rest_framework/generics.py", line 171, in paginate_queryset
return self.paginator.paginate_queryset(queryset, self.request, view=self)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/rest_framework/pagination.py", line 204, in paginate_queryset
self.page = paginator.page(page_number)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/core/paginator.py", line 76, in page
number = self.validate_number(number)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/core/paginator.py", line 54, in validate_number
if number > self.num_pages:
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/utils/functional.py", line 48, in __get__
res = instance.__dict__[self.name] = self.func(instance)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/core/paginator.py", line 103, in num_pages
if self.count == 0 and not self.allow_empty_first_page:
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/utils/functional.py", line 48, in __get__
res = instance.__dict__[self.name] = self.func(instance)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/core/paginator.py", line 97, in count
return c()
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/models/query.py", line 412, in count
return self.query.get_count(using=self.db)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/models/sql/query.py", line 528, in get_count
number = obj.get_aggregation(using, ['__count'])['__count']
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/models/sql/query.py", line 513, in get_aggregation
result = compiler.execute_sql(SINGLE)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
cursor.execute(sql, params)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/sentry_sdk/integrations/django/__init__.py", line 596, in execute
return real_execute(self, sql, params)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/home/plafave/venv-label-studio/lib/python3.8/site-packages/django/db/backends/sqlite3/base.py", line 423, in execute
return Database.Cursor.execute(self, query, params)
django.db.utils.DatabaseError: database disk image is malformed
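Before reinstalling, it may help to confirm which database file is damaged and to keep a copy of it. SQLite's PRAGMA integrity_check returns the single row "ok" for a healthy database. The location of Label Studio's database is an assumption here: we believe it defaults to label_studio.sqlite3 under ~/.local/share/label-studio on Linux and can be relocated with LABEL_STUDIO_BASE_DATA_DIR, so check your install first. If that is right, the database lives outside ~/venv-label-studio, which would explain why reinstalling the virtualenv left the error in place. The helper names are hypothetical.

```python
import shutil
import sqlite3
from pathlib import Path
from typing import List


def integrity_check(db_path: str) -> List[str]:
    """Run PRAGMA integrity_check; ['ok'] means SQLite found no corruption."""
    con = sqlite3.connect(db_path)
    try:
        return [row[0] for row in con.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # A badly damaged file can fail before the check even runs.
        return [f"error: {exc}"]
    finally:
        con.close()


def backup_file(db_path: str) -> Path:
    """Copy the raw database file aside before attempting any repair."""
    src = Path(db_path)
    dst = src.with_name(src.name + ".bak")
    shutil.copy2(src, dst)
    return dst


if __name__ == "__main__":
    # Assumed default location; adjust for your installation.
    db = Path.home() / ".local/share/label-studio/label_studio.sqlite3"
    if db.exists():
        print(backup_file(str(db)))
        print(integrity_check(str(db)))
```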