z-x-yang / segment-and-track-anything Goto Github PK

An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.

License: GNU Affero General Public License v3.0

Python 4.40% Shell 0.02% Jupyter Notebook 95.58%

interactive-segmentation segment-anything segment-anything-model video-object-segmentation visual-object-tracking

segment-and-track-anything's Introduction

A Github Pages template for academic websites. This was forked (then detached) by Stuart Geiger from the Minimal Mistakes Jekyll Theme, which is © 2016 Michael Rose and released under the MIT License. See LICENSE.md.

I think I've got things running smoothly and fixed some major bugs, but feel free to file issues or make pull requests if you want to improve the generic template / theme.

Note: if you are using this repo and now get a notification about a security vulnerability, delete the Gemfile.lock file.

Instructions

Register a GitHub account if you don't have one and confirm your e-mail (required!)
Fork this repository by clicking the "fork" button in the top right.
Go to the repository's settings (rightmost item in the tabs that start with "Code", should be below "Unwatch"). Rename the repository "[your GitHub username].github.io", which will also be your website's URL.
Set site-wide configuration and create content & metadata (see below -- also see this set of diffs showing what files were changed to set up an example site for a user with the username "getorg-testacct")
Upload any files (like PDFs, .zip files, etc.) to the files/ directory. They will appear at https://[your GitHub username].github.io/files/example.pdf.
Check status by going to the repository settings, in the "GitHub pages" section
(Optional) Use the Jupyter notebooks or python scripts in the markdown_generator folder to generate markdown files for publications and talks from a TSV file.

See more info at https://academicpages.github.io/

To run locally (not on GitHub Pages, to serve on your own computer)

Clone the repository and made updates as detailed above
Make sure you have ruby-dev, bundler, and nodejs installed: sudo apt install ruby-dev ruby-bundler nodejs
Run bundle clean to clean up the directory (no need to run --force)
Run bundle install to install ruby dependencies. If you get errors, delete Gemfile.lock and try again.
Run bundle exec jekyll liveserve to generate the HTML and serve it from localhost:4000 the local server will automatically rebuild and refresh the pages on change.

Changelog -- bugfixes and enhancements

There is one logistical issue with a ready-to-fork template theme like academic pages that makes it a little tricky to get bug fixes and updates to the core theme. If you fork this repository, customize it, then pull again, you'll probably get merge conflicts. If you want to save your various .yml configuration files and markdown files, you can delete the repository and fork it again. Or you can manually patch.

To support this, all changes to the underlying code appear as a closed issue with the tag 'code change' -- get the list here. Each issue thread includes a comment linking to the single commit or a diff across multiple commits, so those with forked repositories can easily identify what they need to patch.

segment-and-track-anything's People

Contributors

Stargazers

Watchers

Forkers

wenguanwang fanscy ange233 ai-jie01 hechang25 haorand jonnys1226 chen742 yamy-cheng nobugw hokhyk luoyizhi516 shuowang-ai benmfeng licongguan yonglinz ylqi qingqingniu crystaltianr soeaver caopulan zhulinxiaohai wenhe-jia wilyzhao8 csldali alialemimatinpour moorthi07 allenisaacjose twonp168 eisenh paperwave techthiyanes hyunjunlee-hub danielwuuuuuu anshuai666 morbi25 xqdsj healthonrails pablovela5620 billionerd rayliu-admin undarkr wangyxxjtu mistyr0se cspchong obsidian6s fskeo hnhbcc xiaokunfeng a61730607 ntt720 dimitrius-ion nicolesherwood hay-man point-1 marsexe sono-kan nitishymtpl sergejhorvat maigone e-kiss-me limzh00 minisoco dabirsagar 0xkiki qianqian121 xymfei neonestoras hsaigroup d3p10y jkd2021 webliupeng tinotinotino22 batman-do staccats closegoingaway eitanhemed mirlansmind jbluv zhoufanking iam20cm imagr-ltd cerviny vamoko luluchou herpacker wensiyuansix winterchaufr nicbair masemxiao omigeft windb3ll gsiogkas s8xy tutuna zaku-zaku luoolu atilaxavier nahidalam n0wwa

segment-and-track-anything's Issues

Testing - tracking objects with click or text

Hi,

I understand that I can track selected objects with click or with text from WebUI. But, I would like to know if there is a script which I can use it to run instead of in UI (like we have a notebook - demo.ipynb to test segment everything)?
If this is not the case, could you please tell me which code runs when we give an input text to track in WebUI? so that I can look at that part of the code and understand if I can reuse for my requirements.

Thanks in advance

Run app.py Error

Control inputs resolution

Can we control SAM/Deaotl final input resolution from the UX/params?

Can we enable Gradio zooming in the UX? Sometimes it is hard to click some fine details.

Testing on video segmentation datasets

Is it possible for this project to be tested on video segmentation datasets (YoutubeVIS)?
What are the main difficulties: SAM not only segments instances, but also segments backgrounds? Is the segmentation mask output by SAM without category labels? Each video in the dataset is a sequence of images, and this project cannot input this file format and cannot output the result file (JSON) for metric evaluation?

HIL full sequence prompting for drifts and ambiguities

Do you think that prompted frames (e.g. to fix drift) could be dinamically added in the long term memory as reference frame?
Or we could add this as an explicit option.

Interactively segment and track individual objects

I tried the web UI and it works, but I didn't find the direct selection described by demo1 that can be clicked on the object to interact with, where merged_mask == specific Id tracks the segmentation, it always splits them all.

ModuleNotFoundError: No module named 'blib2to3'

After the step of running bash script/install.sh, to run python app.py I meet this error message, then I download a likely-module under the link blib2to3, now another error message appears
ImportError: cannot import name 'Label' from 'blib2to3.pgen2.grammar
Hope anybody could help, thanks!

Python 3.9.16
pytorch 1.12.1
torchaudio 0.12.1
torchvision 0.13.1
cudatoolkit 11.3.1

SAM multiple prediction

Can you expose the multiple SAM segmentation candidates in the UX?

Test image sequences folders with a script

Hello!

Thank you for sharing your great project! I noticed in the WebUI it supports uploading one image sequences folder to evaluate. Could it support evaluating multi-image sequence folders like DAVIS/Youtube dataset using a script instead of WebUI?

Thank you for your reply!

new request

We can achieve the effect of demo1 now. Thank you very much. What method is there to keep only the selected part and output the video without retaining the rest? In other words, it is like the movie's green cloth, only retaining the main body and removing the background.

Annotation panel is not refreshed

When I change the video the annotation panel is not refreshed with the new frame 0 (e.g. text).

Tracking result looks wrong when running the cars demo, any tips for adjusting the parameters?

Hi, I tried to run web UI demo with the cars video, I used text prompt to label all cars in the first frame and expect the tracker could track them correctly in the following frames. However, the tracking result seems not correct. Not sure if it is caused by unsuitable parameter setting. Could anyone help me with this? :)
Checkpoints used: sam_vit_b_01ec64, R50_DeAOTL_PRE_YTB_DAV, groundingdino_swint_ogc
Here is the result:

Here are all parameters I used:

My email: [email protected]

run app.py error

when I run app.py，but I got a mistake ,that is "No module named 'blib2to3'". then I install "2to3" packages ,but the mistake still exist

Dataset from the experiment

Hello, can you provide the data introduced in the experiment, I always report an error when I use my own .mp4.

tracking programmatically using a bounding box?

Let's say that I have a set of videos in which I want to track an object and get masks in return. As an input I have a bounding box containing the single object to be tracked.
Is there a straightforward way to do it programmatically, without the manual user interface?

Thanks for the feedback

Is there a way to add label to object/segment?

Thx for the great work. It looks quite promising.

I am wandering whether there is a way to 'label' the segment/object, s.t. the algorithm (by labeling and training) could automatically recognize the same (or similar) object(s) in other videos, similar to how conventional object detection + tracking works.

Specify click budget and SAM size model

Can you specify the click budget and the SAM size model in your technical report?

Image sequences

It would be nice to have the web demo compatibile with image sequences input other then Video.
It will be easier to test it on the fly with Davis and YouTube VOS dataset

How to reproduce demo6 on webUI app

Hello! Thank you very much for the useful demo :)

I was wondering if I could reproduce the demo 6 described on the intro page on the web demo since there seems to be no way to navigate through frames.

Thanks in advance!

坤坤

亲爱的唱跳rap中分先生，为什么会出现在这里

Questions about the parameters in sam_args

Thank you very much for your work.Could you please briefly explain the meaning of the parameters in sam_args and how they will affect the results?

How to deal with the disappeared objects

I tried to export trackID derived by Seg-Track and got many errors.
I found that there are no codes to deal with the object disappearance.

Segment-and-Track-Anything/seg_track_anything.py

Lines 107 to 117 in 709cf2a

 elif (frame_idx % sam_gap) == 0: 

 seg_mask = SegTracker.seg(frame) 

 torch.cuda.empty_cache() 

 gc.collect() 

 track_mask = SegTracker.track(frame) 

 # find new objects, and update tracker with new objects 

 new_obj_mask = SegTracker.find_new_objs(track_mask,seg_mask) 

 save_prediction(new_obj_mask,output_dir,str(frame_idx)+'_new.png') 

 pred_mask = track_mask + new_obj_mask 

 # segtracker.restart_tracker() 

 SegTracker.add_reference(frame, pred_mask)

I think Seg-Track, as a object tracking model, should handle this case better.
My basic idea about this is to compare the track_mask and seg_mask like what find_new_objects does.

Will you handle the issue soon?

Thanks for your patience.

跟踪结束后可以输出每帧画面的标注信息吗，如VOC标注格式

grounding_caption value?

Hi @yamy-cheng

I would like to use my own video to segment and track. But, I don't understand what should be the value of grounding_caption in the below piece of code if I want to segment everything?

pred_mask, annotated_frame = segtracker.detect_and_seg(frame, grounding_caption, box_threshold, text_threshold, box_size_threshold)

With the default value I get NameError: name 'grounding_caption' is not defined

Thanks

RuntimeError: CUDA error: invalid device function

Thank you for your great work, but when run demo.py, I meet a problem like this:
Object number: 44. Inference size: 577x1041. Output size: 1080x1920.
./anaconda3/envs/seg/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1646755853042/work/aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, kwargs) # type: ignore[attr-defined]
./aot/networks/layers/position.py:64: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
dim_t = self.temperature(2 * (dim_t // 2) / self.num_pos_feats)
Traceback (most recent call last):
File "./aot/tools/demo.py", line 338, in
main()
File "./aot/tools/demo.py", line 283, in main
demo(cfg)
File "./aot/tools/demo.py", line 202, in demo
engine.add_reference_frame(current_img,
File "./aot/networks/engines/aot_engine.py", line 601, in add_reference_frame
aot_engine.add_reference_frame(img,
File "./aot/networks/engines/aot_engine.py", line 234, in add_reference_frame
self.curr_lstt_output = self.AOT.LSTT_forward(curr_enc_embs,
File "./aot/networks/models/aot.py", line 103, in LSTT_forward
lstt_embs, lstt_memories = self.LSTT(curr_emb, long_term_memories,
File "./anaconda3/envs/seg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "./aot/networks/layers/transformer.py", line 106, in forward
output, memories = layer(output,
File "./anaconda3/envs/seg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "./aot/networks/layers/transformer.py", line 342, in forward
tgt3 = self.short_term_attn(local_Q, local_K, local_V)[0]
File "./anaconda3/envs/seg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "./aot/networks/layers/attention.py", line 354, in forward
qk = qk + relative_emb
RuntimeError: CUDA error: invalid device function
Segmentation fault (core dumped)
Could you help me solve this problem, thanks a lot !

Segmenting objects based on given bounding box

Hi @yamy-cheng , @yoxu515 , @z-x-yang , @lingorX

Thanks again for the great work. I know that we can give input in text format or stroke or even click on the object which we would like to track. I would like to know if its possible with the current code base to use the object's bounding box as an input to track those objects?

Really appreciate your response.
Thanks

ModuleNotFoundError: No module named 'groundingdino'

I use the command line 'bash script/install.sh' but meet the above error. I do not know why. Maybe the version of pytorch is not compatibility with the author.

Cloud I just segment and track interested object with prompt?

Is it convenient to use GroundingDINO to provide bbox and do next step with your current code?

OutOfMemoryError: CUDA out of memory.

Hello, when I trying to another video on 'demo.ipynb' I get 'OutOfMemoryError', I trying to increase the 'sam_gap' and it's work. But I want to work with more long videos (maybe real time video) and I try another video but this time again I get same error. Should I change the 'sam_gap' value for every trying? What should I do, what is your suggestion?

WebUI cannot display tracking results

When I import a video to WebUI, WebUI cannot display the tracking results.
At first, the program will report an error:

SegTracker has been initialized
Click
Click
Start tracking !
processed frame 49, obj_num 2
finished
frame 49 writed
/home/weedy/Projects/Segment-and-Track-Anything/tracking_results/blackswan/blackswan_seg.mp4 saved

finished
Traceback (most recent call last):
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/weedy/Projects/Segment-and-Track-Anything/app.py", line 251, in tracking_objects
    return tracking_objects_in_video(Seg_Tracker, input_video, input_img_seq, fps)
  File "/home/weedy/Projects/Segment-and-Track-Anything/seg_track_anything.py", line 94, in tracking_objects_in_video
    return video_type_input_tracking(SegTracker, input_video, io_args, video_name)
  File "/home/weedy/Projects/Segment-and-Track-Anything/seg_track_anything.py", line 196, in video_type_input_tracking
    imageio.mimsave(io_args['output_gif'], masked_pred_list, fps=fps)
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/imageio/v2.py", line 484, in mimwrite
    return file.write(ims, is_batch=True, **kwargs)
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/imageio/plugins/pillow.py", line 354, in write
    raise TypeError(
TypeError: The keyword `fps` is no longer supported. Use `duration`(in ms) instead, e.g. `fps=50` == `duration=20` (1000 * 1/50).

So I changed the 196th line of the file "seg_track_anything. py":
imageio. mimsave (io_args ['output_gif '], masked_pred_list, fps=fps)
to
imageio. mimsave (io_args [" output_gif "], masked_pred_list, duration=int (1000/fps))

But WebUI still cannot output results. It will report an error:

Start tracking !
processed frame 49, obj_num 1
finished
frame 49 writed
/home/weedy/Projects/Segment-and-Track-Anything/tracking_results/blackswan/blackswan_seg.mp4 saved

finished
/home/weedy/Projects/Segment-and-Track-Anything/tracking_results/blackswan/blackswan_seg.gif saved
sh: 1: zip: not found
/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/components.py:2239: UserWarning: Video does not have browser-compatible container or codec. Converting to mp4
  warnings.warn(
Traceback (most recent call last):
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/blocks.py", line 1326, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/blocks.py", line 1260, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/components.py", line 2824, in postprocess
    "name": self.make_temp_copy_if_needed(y),
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/components.py", line 259, in make_temp_copy_if_needed
    temp_dir = self.hash_file(file_path)
  File "/home/weedy/miniconda3/envs/samtrack/lib/python3.10/site-packages/gradio/components.py", line 223, in hash_file
    with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/weedy/Projects/Segment-and-Track-Anything/tracking_results/blackswan/blackswan_pred_mask.zip'

How should I fix that bug?

RuntimeError: CUDA out of memory.

Hello, 我在运行 app.py 时遇到了以下报错，使用的是一块3070的显卡（8G），请问除了换更好的显卡之外还有什么办法解决吗，比如调小batch_size或num_workers，但是我没有在代码中找到相应的位置，可以帮我看一下该如何解决吗，非常感谢！

/home/alienware/anaconda3/envs/SAMTrack/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, kwargs) # type: ignore[attr-defined]
./aot/networks/layers/position.py:63: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
dim_t = self.temperature(2 * (dim_t // 2) / self.num_pos_feats)
Traceback (most recent call last): 0, obj_num 154
File "/home/alienware/anaconda3/envs/SAMTrack/lib/python3.9/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/home/alienware/anaconda3/envs/SAMTrack/lib/python3.9/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/home/alienware/anaconda3/envs/SAMTrack/lib/python3.9/site-packages/gradio/blocks.py", line 1025, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/alienware/anaconda3/envs/SAMTrack/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/alienware/anaconda3/envs/SAMTrack/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/alienware/anaconda3/envs/SAMTrack/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/media/alienware/新加卷1/Research/Segment-and-Track-Anything-main/app.py", line 21, in predict
return seg_track_anything(input_video_file, model, sam_gap, max_obj_num, points_per_side)
File "/media/alienware/新加卷1/Research/Segment-and-Track-Anything-main/seg_track_anything.py", line 98, in seg_track_anything
pred_mask = segtracker.track(frame,update_memory=True)
File "/media/alienware/新加卷1/Research/Segment-and-Track-Anything-main/SegTracker.py", line 75, in track
pred_mask = self.tracker.track(frame)
File "/home/alienware/anaconda3/envs/SAMTrack/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/media/alienware/新加卷1/Research/Segment-and-Track-Anything-main/aot_tracker.py", line 79, in track
pred_logit = self.engine.decode_current_logits((output_height, output_width))
File "./aot/networks/engines/aot_engine.py", line 623, in decode_current_logits
pred_id_logits = self.soft_logit_aggregation(all_logits)
File "./aot/networks/engines/aot_engine.py", line 578, in soft_logit_aggregation
merged_prob = torch.cat([bg_prob] + fg_probs,
RuntimeError: CUDA out of memory. Tried to allocate 1.24 GiB (GPU 0; 7.78 GiB total capacity; 4.38 GiB already allocated; 989.56 MiB free; 4.54 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Precomputed embedding

For the interactive version do you think that the embedding could be precomputed to speed up the demo?

README.md TODO YouTube URLs lead to private videos

The links on the TODO section 1.0-Version Interactive WebUI lead to private videos.

Could it be that the correct videos are referenced under New Features [2023/4/24] Tutorial V1.0?

'groundingdino' module not found error

I am following the setup steps and trying to run the demo.ipynb.

I believe this step in the install.sh downloads the necessary groundingdino module
pip install -e git+https://github.com/IDEA-Research/GroundingDINO.git@main#egg=GroundingDINO

It is saying that all requirements are met. and setup was successful.

However, running the fi rst cell of demo.ipynb still throws the following error:

Any one else experiencing this? I tried running on google colab and faced the same issue.

Run it with multiple bbox prompts

Hey, I need to know how can I run segtracker.seg_acc_bbox function with more than one bbox prompts as can be done in SAM?

Adaptiveaclick

You could be interested in some interesting benchmarking at:
https://github.com/lab206/AdaptiveClick

Run time performance

Hi,

Thank you for such an amazing and useful project! I am wondering what the best performance of the running time for each frame you got when tracking an object in a video? The reason I am asking is that I am thinking to use this powerful tool in a real-time AR application but not sure whether it could be capable of this. Thank you again for your time and effort!

Support different aspect ratio

Can you adapt the UX to respect the different aspect ratio of the input video?

ModuleNotFoundError: No module named 'spatial_correlation_sampler'

Hello ！Thanks for sharing your code. I transfered the demo.ipynb to demo.py and when I run it I encounted the following error: '
File "/home/xwh/Segment-and-Track-Anything/./aot/networks/layers/attention.py", line 282, in init
from spatial_correlation_sampler import SpatialCorrelationSampler
ModuleNotFoundError: No module named 'spatial_correlation_sampler'''

Could you give me some suggestions ?

What's the video resolution? What is the complete error log?

The resolution of the video is 512x720
The complete error is as follows:

Failed to load OpenH264 library: openh264-1.8.0-win64.dll
        Please check environment and/or download library: https://github.com/cisco/openh264/releases

[libopenh264 @ 0000015e95976d00] Incorrect library version loaded
[ERROR:[email protected]] global cap_ffmpeg_impl.hpp:3049 open Could not open codec libopenh264, error: Unspecified error (-22)
[ERROR:[email protected]] global cap_ffmpeg_impl.hpp:3066 open VIDEOIO/FFMPEG: Failed to initialize VideoWriter
H:\Deepfacelab\Deepface\Segment-and-Track-Anything\venv\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last): 0, obj_num 20
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\venv\lib\site-packages\gradio\routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\venv\lib\site-packages\gradio\blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\venv\lib\site-packages\gradio\blocks.py", line 1025, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\app.py", line 18, in predict
    return seg_track_anything(input_video_file, model, sam_gap, max_obj_num, points_per_side)
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\seg_track_anything.py", line 98, in seg_track_anything
    pred_mask = segtracker.track(frame,update_memory=True)
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\SegTracker.py", line 77, in track
    self.tracker.update_memory(pred_mask)
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\aot_tracker.py", line 89, in update_memory
    self.engine.update_memory(pred_label)
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\./aot\networks\engines\aot_engine.py", line 629, in update_memory
    aot_engine.update_short_term_memory(separated_mask,
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\./aot\networks\engines\deaot_engine.py", line 27, in update_short_term_memory
    curr_id_emb = self.assign_identity(curr_one_hot_mask)
  File "H:\Deepfacelab\Deepface\Segment-and-Track-Anything\./aot\networks\engines\aot_engine.py", line 173, in assign_identity
    id_emb = self.AOT.get_id_emb(one_hot_mask).view(
RuntimeError: shape '[1, -1, 1518]' is invalid for input of size 368640

NameError: name 'annotated_frame' is not defined

Hi,

Thanks for such a wonderful work.
I am trying to run demo_instseg.ipynb I get the below error. However, I can run demo.ipynb without any issues.
NameError: name 'annotated_frame' is not defined

Can someone help me with this?

tracking model not working in demo.ipynb

I am testing the demo.ipynb segmenting + tracking functions on the cell.mp4 video. However, I am only seeing the masks generated by the SAM model, and no tracking seems to be taking place. I am looking at the cell_masks directory and cell_seg output video to verify this. Anyone idea as to what might be going wrong? The only change I made to the code is changing the max_obj_num argument to the segtracker_args to minimize gpu memory usage.

Not sure if this is related but the cell is outputting this warning for the cell that runs the tracker:

final text_encoder_type: bert-base-uncased
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Model loaded from ./ckpt/groundingdino_swint_ogc.pth 
 => _IncompatibleKeys(missing_keys=[], unexpected_keys=['label_enc.weight'])
SegTracker has been initialized
processed frame 343, obj_num 32
finished

Train my own dataset

Whether could I train my own dataset based on this project?

Segmentation And Tracking For Video

I am trying to do segmentation and tracking for a particular subject in a video , but using text prompt I was able to set up the app correctly, and it was detecting the object I wanted in the first frame properly ,but it's having issues when I start tracking and it segments everything that is present in the video instead of just segmenting the required object in the video .
I wanted to know if it's possible to segment and track anything simply based on text prompt ?

'Image' object has no attribute 'select'

an error occurred while running app.py：
Traceback (most recent call last):
File "E:\Segment-and-Track-Anything\app.py", line 674, in
seg_track_app()
File "E:\Segment-and-Track-Anything\app.py", line 522, in seg_track_app
input_first_frame.select(
AttributeError: 'Image' object has no attribute 'select'

Thanks for your suggestion

I entered a video of approximately 1M, but there was an error like this

RuntimeError: shape '[1, -1, 2145]' is invalid for input of size 524288

APP.error

When I tried the latest uploaded app. py, I found an error in the program.

	elif (frame_idx % sam_gap) == 0:
	seg_mask = SegTracker.seg(frame)
	torch.cuda.empty_cache()
	gc.collect()
	track_mask = SegTracker.track(frame)
	# find new objects, and update tracker with new objects
	new_obj_mask = SegTracker.find_new_objs(track_mask,seg_mask)
	save_prediction(new_obj_mask,output_dir,str(frame_idx)+'_new.png')
	pred_mask = track_mask + new_obj_mask
	# segtracker.restart_tracker()
	SegTracker.add_reference(frame, pred_mask)