Comments (6)

yelantf commented on May 24, 2024

Thank you for your attention! As shown in the paper, our model takes the bounding box on the center frame and performs RoIAlign with it on all frames of the input video clip. This mainly follows previous works, but it is also because the AVA dataset only provides boxes annotated on the center frame. Of course, we could use a tracking algorithm to generate more accurate bounding boxes on every single frame, and then use them to get more robust results. Actually, some previous works [link] have tried that. However, we did not find a very robust tracker (especially for fast-motion scenes), so we chose the current design for our method.
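
To make the design concrete, here is a minimal sketch of reusing one center-frame box on every frame of a clip. It is not the AlphAction implementation: `pool_clip_with_center_box` is a hypothetical helper, the feature maps are plain nested lists, and an integer crop followed by averaging stands in for the bilinear sampling that real RoIAlign performs.

```python
from typing import List, Tuple

Frame = List[List[float]]          # one H x W feature map (toy stand-in)
Box = Tuple[int, int, int, int]    # (x1, y1, x2, y2), with x2 and y2 exclusive


def pool_clip_with_center_box(clip: List[Frame], center_box: Box) -> float:
    """Average the features inside `center_box` on every frame of `clip`.

    The same box (annotated on the center frame) is applied unchanged to
    all frames, so a fast-moving person may drift partly out of it.
    """
    x1, y1, x2, y2 = center_box
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box must have positive width and height")
    per_frame = []
    for frame in clip:
        # Integer crop of the box region (assumption: replaces bilinear RoIAlign).
        values = [v for row in frame[y1:y2] for v in row[x1:x2]]
        per_frame.append(sum(values) / len(values))
    # Temporal average over the per-frame pooled values.
    return sum(per_frame) / len(per_frame)


# Three 4x4 frames; frame t is filled with the constant value t.
clip = [[[float(t)] * 4 for _ in range(4)] for t in range(3)]
pooled = pool_clip_with_center_box(clip, (1, 1, 3, 3))
```

With per-frame tracked boxes, the only change in this sketch would be passing a different box for each frame instead of the single center-frame box.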

from alphaction.

pxssw commented on May 24, 2024

Copy that. I wish you even greater success as the field progresses!
At the same time, I have run into a few small problems in the project. (They may just be my own misunderstandings rather than bugs; if so, please ignore them.)

  1. The update_action_dictionary part of visualizer.py: the final result, self.action_dictionary, contains the results for all IDs (starting from the first person). If the project runs for a long time, or on crowded scenes, could this demand a lot of computing resources? It may need a cleanup step for the results of IDs that were last seen long ago.
  2. The line cur_millis = stream.get(cv2.CAP_PROP_POS_MSEC) in video_detection_loader.py: in my webcam mode, the initial value of cur_millis is very large (around 4×10^8). I really don't know why it is not 0 ms, and the value keeps going up with each new run of the project (4.x×10^8, 5.x×10^8, ...). Is this a common problem? I really don't know.
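
One possible shape for the cleanup suggested in item 1 is sketched below. This is an assumption-laden illustration, not the code in visualizer.py: `ActionStore`, `max_age_s`, and the `last_seen` bookkeeping are hypothetical names, and the idea is simply to drop any ID that has not been updated within a time window.

```python
import time
from typing import Dict, List, Optional


class ActionStore:
    """Keeps per-ID action results and forgets IDs not seen recently."""

    def __init__(self, max_age_s: float = 30.0) -> None:
        self.max_age_s = max_age_s
        self.actions: Dict[int, List[str]] = {}
        self.last_seen: Dict[int, float] = {}

    def update(self, person_id: int, labels: List[str],
               now: Optional[float] = None) -> None:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        self.actions[person_id] = labels
        self.last_seen[person_id] = now
        self.prune(now)

    def prune(self, now: float) -> None:
        # Collect stale IDs first so the dicts are not mutated while iterating.
        stale = [pid for pid, t in self.last_seen.items()
                 if now - t > self.max_age_s]
        for pid in stale:
            del self.actions[pid]
            del self.last_seen[pid]


store = ActionStore(max_age_s=10.0)
store.update(1, ["stand"], now=0.0)
store.update(2, ["walk"], now=5.0)
store.update(2, ["run"], now=12.0)   # ID 1 is now 12 s old and gets dropped
remaining_ids = sorted(store.actions)
```

Pruning on every update keeps memory bounded by the number of recently seen people rather than by everyone ever detected.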

yelantf commented on May 24, 2024

Thanks for pointing out these problems! First, I have to admit that our current demo program is not well designed. It may contain some small bugs and is also hard to read. As for the two problems you mentioned above:

  1. Yes, you are right. This is indeed a problem for long-running sessions. We will try to improve it following your suggestions when we have time. Of course, pull requests are also welcome.

  2. We did not notice this issue before, and in fact we did not fully test the demo script in webcam mode, because that requires a server with a graphical interface and a camera, which is not always available to us. According to the OpenCV documentation, this flag should give the current position of the video file in milliseconds, or the video capture timestamp. I'm inclined to think this is a valid video timestamp format whose origin depends on the specific camera.
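
If the raw value is indeed a camera-specific capture timestamp, one workaround is to measure time relative to the first reading. The sketch below assumes exactly that; `RelativeClock` is a hypothetical helper that is not part of the demo, and the readings are made-up numbers rather than real camera output.

```python
from typing import Optional


class RelativeClock:
    """Converts raw capture timestamps (ms) into ms elapsed since the first frame."""

    def __init__(self) -> None:
        self._origin_ms: Optional[float] = None

    def elapsed(self, raw_ms: float) -> float:
        # The first reading becomes the origin, so the first frame maps to 0 ms.
        if self._origin_ms is None:
            self._origin_ms = raw_ms
        return raw_ms - self._origin_ms


clock = RelativeClock()
# Made-up raw readings resembling the large webcam values reported above.
readings = [410_000_000.0, 410_000_040.0, 410_000_080.0]
relative = [clock.elapsed(r) for r in readings]
```

In the demo, the raw value would come from stream.get(cv2.CAP_PROP_POS_MSEC), and the relative value would be used wherever a time since the start of the stream is expected.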

pxssw commented on May 24, 2024

Good job! The flaws do not obscure the merits.

jun0wanan commented on May 24, 2024

Thank you for your attention! As shown in the paper, our model takes the bounding box on the center frame and performs RoIAlign with it on all frames of the input video clip. This mainly follows previous works, but it is also because the AVA dataset only provides boxes annotated on the center frame. Of course, we could use a tracking algorithm to generate more accurate bounding boxes on every single frame, and then use them to get more robust results. Actually, some previous works [link] have tried that. However, we did not find a very robust tracker (especially for fast-motion scenes), so we chose the current design for our method.

Hi,
Sorry to disturb you. I want to ask how a person's bbox in the 1st clip is linked to the same person's bbox in the 2nd clip?

Best,
Jun
