
visionosc's Introduction

Vision OSC

= PoseOSC + FaceOSC + HandOSC + OcrOSC + CatOSC + DogOSC

Download | Example

Sends (almost) all of the Apple Vision framework's detection results via OSC. (You can pick which one(s) to detect & send.) Written in openFrameworks using Objective-C++. macOS 11+ only.

Inspired by PoseOSC, but faster, with no more Electron bloat or horse-neighing hacks. Compatible with the ARR format of PoseOSC.

How to re-build in Xcode

Do not attempt to re-build (with projectGenerator) unless absolutely necessary; if you must, follow these steps:

  • File > Project Settings, set Build System to "New Build System".
  • In the left sidebar, click the project name, then under General > Frameworks, Libraries, and Embedded Content, add Vision, AVKit, Foundation, AVFoundation, and CoreML.
  • Under Build Phases > Link Binary with Libraries, change "mac catalyst" to "always" for each framework.
  • Change the deployment target to 11.3.
  • For each file in the src folder, in the right sidebar change the file "Type" to Objective-C++ (only in the dropdown menu, not the file extension).

If you encounter Undefined symbol: __objc_msgSend$identifier, you might need to add arm64 to Excluded Architectures. See this issue for details.

If the compiler complains about ARC, you might need to remove all mentions of autorelease in the src folder.

How to Use

Settings in settings.xml will be loaded upon start.

In the packaged app, the settings.xml can be found in Contents/Resources.

See demos/VisionOSCProcessingReceiver for a Processing demo receiving all the detection types.

Receiving Poses from OSC

This is the same as the ARR format of PoseOSC, copied below:

ARR data is sent to the poses/arr OSC address as an array of values (the OSC spec allows multiple values of different types per message); a minimal reading sketch follows the list below.

  • The first value (int) is width of the frame.
  • The second value (int) is height of the frame.
  • The third value (int) is the number of poses. (Once you have read this value, you know how many more values to read, i.e. nPoses*(1+17*3); if it is 0, no pose was detected and you can stop reading.)
  • The next 52 values are data for the first pose, and the 52 values after that are data for the second pose (if any), and so on...
  • For each pose, the first value (float) is the score for that pose; the remaining 51 values (floats) can be divided into 17 groups of 3, each group being the (x,y,score) of a keypoint. For the ordering of keypoints, see the PoseNet spec.
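
For reference, a minimal reading loop is sketched below using openFrameworks' ofxOsc addon (the same framework VisionOSC is built on). This is only an illustrative sketch, not part of VisionOSC: the ofxOscReceiver is assumed to be set up elsewhere on whatever port VisionOSC is configured to send to, the address is matched loosely by its poses/arr suffix, and error handling is omitted. The same loop also reads faces/arr and hands/arr by changing NUM_KEYPOINTS (76 and 21 respectively; see below).

  #include "ofxOsc.h"

  // Reads all pending poses/arr messages from an already set up ofxOscReceiver.
  // NUM_KEYPOINTS is 17 for poses/arr, 76 for faces/arr, 21 for hands/arr.
  static const int NUM_KEYPOINTS = 17;

  void readPoseMessages(ofxOscReceiver & receiver){
      while(receiver.hasWaitingMessages()){
          ofxOscMessage m;
          receiver.getNextMessage(m);
          if(m.getAddress().find("poses/arr") == std::string::npos) continue;

          int frameW = m.getArgAsInt32(0); // width of the frame
          int frameH = m.getArgAsInt32(1); // height of the frame
          int nPoses = m.getArgAsInt32(2); // number of poses; 0 means nothing detected

          int i = 3;                       // index of the first value of the first pose
          for(int p = 0; p < nPoses; p++){
              float poseScore = m.getArgAsFloat(i++);     // score for this pose
              for(int k = 0; k < NUM_KEYPOINTS; k++){
                  float x     = m.getArgAsFloat(i++);
                  float y     = m.getArgAsFloat(i++);
                  float score = m.getArgAsFloat(i++);
                  // ... use (x, y, score) of keypoint k for pose p ...
              }
          }
      }
  }

Setting up the receiver once with receiver.setup(port) and calling a function like this from update() is the usual openFrameworks pattern.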

Receiving Faces from OSC

Similar to the pose format (see above); sent to the faces/arr OSC address:

  • The first value (int) is width of the frame.
  • The second value (int) is height of the frame.
  • The third value (int) is the number of faces.
  • The next 229 values are data for the first face, and the 229 values after that are data for the second face (if any), and so on...
  • For each face, the first value (float) is the score for that face; the remaining 228 values (floats) can be divided into 76 groups of 3, each group being the (x,y,score) of a keypoint.
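
(That is 1 + 76*3 = 229 values per face; the pose-reading sketch above applies unchanged if NUM_KEYPOINTS is set to 76.)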

Receiving Hands from OSC

Similar to the pose format (see above); sent to the hands/arr OSC address:

  • The first value (int) is width of the frame.
  • The second value (int) is height of the frame.
  • The third value (int) is the number of hands.
  • The next 64 values are data for the first hand, and the 64 values after that are data for the second hand (if any), and so on...
  • For each hand, the first value (float) is the score for that hand; the remaining 63 values (floats) can be divided into 21 groups of 3, each group being the (x,y,score) of a keypoint. For the ordering of the keypoints, see the handpose spec.
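
(That is 1 + 21*3 = 64 values per hand; the same sketch applies with NUM_KEYPOINTS set to 21.)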

Receiving Texts (OCR) from OSC

Sent to the texts/arr OSC address:

  • The first value (int) is width of the frame.
  • The second value (int) is height of the frame.
  • The third value (int) is the number of text regions.
  • The next 6 values are data for the first text, and the 6 values after that are data for the second text (if any), and so on...
  • For each text, the first value (float) is the score for that text, the next four values (floats) are the (left, top, width, height) of the bounding box, and the last value (string) is the recognized text.
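
A companion sketch for this flat bounding-box-plus-string layout, under the same assumptions as the pose sketch above; swapping the texts/arr suffix for animals/arr reads the animal messages described in the next section, which share the same 6-value-per-detection layout.

  #include "ofxOsc.h"

  // Reads all pending texts/arr messages; animals/arr uses the same layout.
  void readTextMessages(ofxOscReceiver & receiver){
      while(receiver.hasWaitingMessages()){
          ofxOscMessage m;
          receiver.getNextMessage(m);
          if(m.getAddress().find("texts/arr") == std::string::npos) continue;

          int frameW = m.getArgAsInt32(0);
          int frameH = m.getArgAsInt32(1);
          int nTexts = m.getArgAsInt32(2);

          int i = 3;
          for(int t = 0; t < nTexts; t++){
              float score  = m.getArgAsFloat(i++);      // confidence for this region
              float left   = m.getArgAsFloat(i++);
              float top    = m.getArgAsFloat(i++);
              float width  = m.getArgAsFloat(i++);
              float height = m.getArgAsFloat(i++);
              std::string text = m.getArgAsString(i++); // the recognized string
              // ... use the bounding box and text ...
          }
      }
  }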

Receiving Animal detections from OSC

Currently only cats and dogs are supported, per Apple's documentation.

Similar to the text format (see above); sent to the animals/arr OSC address:

  • The first value (int) is width of the frame.
  • The second value (int) is height of the frame.
  • The third value (int) is the number of animals.
  • The next 6 values are data for the first animal, and the 6 values after that are data for the second animal (if any), and so on...
  • For each animal, the first value (float) is the score for that animal, the next four values (floats) are the (left, top, width, height) of the bounding box, and the last value (string) identifies the animal: "Cat" or "Dog".
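
As a worked example of this layout, a frame containing one cat and one dog arrives as 3 + 2*6 = 15 values: the frame width and height, the count 2, then score, left, top, width, height, "Cat" for the first animal, followed by the same six values ending in "Dog" for the second.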

The JSON and XML formats supported by PoseOSC are now excluded because I've since realized it's a silly idea to add this sort of parsing overhead. Let me know if you have a case against this decision.

I recommend Protokol for testing and inspecting OSC.

Framerates

Tested on a MacBook Pro (13-inch, M1, 2020) with 16 GB of memory.

  • Body: 60 FPS
  • Hand: 60 FPS
  • Face: 45 FPS
  • Text: 10 FPS
  • Animal: 60 FPS
  • Face, body, & hand: 25 FPS
  • Everything all on: 5 FPS

visionosc's People

Contributors

lingdong-, nsauzede


visionosc's Issues

IOSurface creation failed:

First of all, thank you for this great app.
I've updated to the latest OF in a fork here: https://github.com/dimitre/VisionOSC/tree/latest

I'm using body detection at 60 FPS, but after a while I get an error:

IOSurface creation failed: e00002be parentID: 00000000 properties: <private> (likely per client IOSurface limit of 16384 reached)

Maybe some resource is not being released properly.

Build error

How do I build and run this project after opening it in Xcode? It's not clear what to do with this error in Project.xcconfig: /Users/sj/Code/C++/VisionOSC_M1/Project.xcconfig:6:1 could not find included file '../../../libs/openFrameworksCompiled/project/osx/CoreOF.xcconfig' in search paths

Offline mode?

Not an issue but a feature request: what about being able to work from video files as well? Would that be very hard to implement?

Select camera

Is there any way to select a different camera source in OSX?

memory build up

Hey Lingdong.

I am noticing that the memory is steadily increasing, to the point that my computer runs out of memory.
I commented out the OSC routines without solving the leak.
When none of the models is selected, the leaking stops.
Would you know how to fix that?
Thanks a bunch.

face landmark order

Do you have a reference for the order of the landmarks and which part of the face each one corresponds to?
For example, which one is the left eye?

thx

needs "Excluded Architectures arm64"

This is awesome. Thanks so much for making this available.

I am on macOS 12.6 on a MacBook Pro 2021 with an Apple M1 Max, and it only compiled without errors when setting "Excluded Architectures" to arm64.
I tested it on of_v0.11.2_osx_release and of_v20220530_osx_release.


face detection at 13 fps

I was wondering what your experience is with the face detection frame rate.
For me it is 13 fps when two people are detected.

Thx.


objc[40983]: Attempt to use unknown class

I have integrated the body and face detection into my own OF app.
I notice that when running the app through Xcode it often crashes with 'objc[40983]: Attempt to use unknown class'.
The app does not crash when running it via its app icon.

Is it possible that adding CGColorSpaceRelease(colorSpaceRef); here would fix this issue? It seems more stable now; will do more testing.

non-example app gets slow when face is found

I ran your example and all is well and fast.
But when I add your files and code to my already existing app, the detection takes about 219 ms.

 ofPixels & pixels = cam_image.getPixels();
 multiTracker.detect(pixels);

[notice ] ofGetElapsedTimeMillis 65956
[notice ] ofGetElapsedTimeMillis 66175

Do you know what I need to look out for?
My camera image has the same dimensions as in your example.

When using the same code with ofxFaceTracker2 I do not see this FPS drop.

thanks a bunch

VNGeneratePersonSegmentationRequest

This is not really an issue but more a feature request ;)

What do you think it would take to also get the people segmentation information out of Core ML into OF?
https://developer.apple.com/documentation/vision/applying_matte_effects_to_people_in_images_and_video

I know that in the iOS Photos app you need to press on the person you want to segment, so it might need some initial coordinates...? If so, maybe it could be combined with landmark detection first and the segmentation after?

Would be interesting to hear your thoughts.

Cheers, Stephan.

trying to use projectGenerator

I am on an M1 MBP with macOS 12.6.3, Xcode 13.4.1, and OF of_v20220530_osx_release.

I am hoping to include ofxOpenCv and ofxCv and followed the "How to re-build in Xcode" steps.
But I do not see any option matching "Build Phases, Link Binary with Libraries, change "mac catalyst" to "always""?

So far I am using the same code that ran fine in your unmodified project.

I get the ARC error (screenshots attached).

Do you have any idea what else I could try?

Thank you.
