sign / translate
Effortless Real-Time Sign Language Translation
Home Page: https://sign.mt
License: Other
It would be interesting to add this GitHub action, to automatically generate reports:
https://github.com/MyActionWay/lighthouse-badger-action#lighthouse-badger-easyyml
However, this requires an overhaul of the current actions into a single workflow, which is too much work for now.
Type "testing 1 2 3": there is now a space between "1" and "2", but not between "testing" and "1".
For the sentence "long word and sequence" for example, we get
M518x544S20500483x545S15a01490x528S30a00482x483 M519x527S10018485x494S1f010490x494S20600481x476 M521x515S2ff00482x482S10612495x457S22f04488x477 S38700463x496 M518x515S20500487x504S15a01491x486 M518x529S10018483x511S1f010490x494S20600481x476 S38800464x496
Which should be displayed like: (expected rendering image)
But is actually showing: (actual rendering image)
The human pose viewer should not block the main thread. It currently does, making the UI choppy on mobile. Use a WebWorker with an OffscreenCanvas; this still allows tfjs to use the webgl backend.
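A minimal sketch of that approach, assuming a hypothetical pose-viewer.worker.ts file; the canvas is transferred once, and all drawing happens off the main thread:

```ts
// Main thread: transfer the canvas to a worker (file name is hypothetical)
const canvas = document.querySelector<HTMLCanvasElement>('canvas.pose-viewer')!;
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker(new URL('./pose-viewer.worker', import.meta.url), {type: 'module'});
worker.postMessage({canvas: offscreen}, [offscreen]);

// pose-viewer.worker.ts: render frames without blocking the UI
self.onmessage = (event: MessageEvent<{canvas: OffscreenCanvas}>) => {
  const ctx = event.data.canvas.getContext('2d')!;
  // ...draw pose frames here; tfjs can register its webgl backend
  // against an OffscreenCanvas inside the worker
};
```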
Allow installing apps through the app store. Models should work fast on mobile as well.
- capacitor to create native binaries for iOS and Android
- a MediaPipe plugin to run Holistic natively
- a tflite plugin to allow running other models, like pix2pix
Detect Language for spoken-to-signed language translation should identify the language of the input spoken language text.
If selected, when translating the text to SignWriting, the text should first go through the identification model (which should run client-side).
Additionally, it should indicate the assumed language.
Unfortunately, as far as I know, the browser doesn't natively support language identification.
Someone did convert cld3 to WebAssembly: https://github.com/kwonoj/cld3-asm
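A sketch of client-side identification with cld3-asm, following its README (the exact API may differ between versions):

```ts
import {loadModule} from 'cld3-asm';

const cldFactory = await loadModule();
const identifier = cldFactory.create(/* minBytes */ 0, /* maxBytes */ 1000);

const result = identifier.findLanguage('Wie geht es dir?');
console.log(result.language, result.is_reliable); // e.g. "de", true

identifier.dispose(); // free the wasm-side resources
```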
Material Icons should only load used icons.
The Material Icons font is 100 KB(!) and loads all icons.
Use https://fonttools.readthedocs.io/en/latest/subset/index.html to subset the font to only the used glyphs.
The loading sign is showing, but the text-to-sign translation is not working.
The text should be converted to a skeleton animation.
Hand prediction should be awesome.
It's fidgety...
Currently, all requests are unauthenticated. Ideally, we want to prevent unauthorized requests to our storage, database, and APIs.
We should use Firebase App Check for storage and functions.
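A minimal sketch using the Firebase modular SDK; the reCAPTCHA site key is a placeholder:

```ts
import {initializeApp} from 'firebase/app';
import {initializeAppCheck, ReCaptchaV3Provider} from 'firebase/app-check';

const app = initializeApp({/* existing firebase config */});

// Requests to Storage and Functions are then attested by App Check
initializeAppCheck(app, {
  provider: new ReCaptchaV3Provider('RECAPTCHA_V3_SITE_KEY'), // placeholder key
  isTokenAutoRefreshEnabled: true,
});
```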
Run the server locally for inference of the models.
Run the server locally for hosting the Angular website.
I can't see how to set this up and run it.
My main problem is that I want to test the code out, but I can't figure out how to run it! I'm mostly a Python AI developer and not so much an app developer. I have identified where I need to place my own models, but can't get the app running! ;)
Currently, we have no way to promote the app, and every screenshot/video needs to be manually made after every update.
I'd like to generate screenshots and videos like https://www.youtube.com/watch?v=Y0SNPeTz09w
using Remotion or something similar, so they are created on every update.
The alternative is manual labor.
The model viewer should load its library (model-viewer.min.js) only when the model viewer starts to become visible. It is a large library (1 MB), and mostly redundant.
Bonus: model-viewer should be tree-shaken rather than loaded as a min.js.
Currently, the model viewer is loaded on page load.
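One way to do this, sketched with an IntersectionObserver and a dynamic import (the selector is illustrative):

```ts
// Load the library only once the viewer element approaches the viewport
const el = document.querySelector('#model-viewer-container')!; // illustrative selector
const observer = new IntersectionObserver(async entries => {
  if (entries.some(entry => entry.isIntersecting)) {
    observer.disconnect();
    await import('@google/model-viewer'); // registers the <model-viewer> custom element
  }
}, {rootMargin: '100px'});
observer.observe(el);
```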
At the moment, models are included in the build distribution.
This makes the build quite heavy, and native files too large and unappealing.
Make an "artifacts management" system that can download artifacts like necessary models, and store them in the app's documents folder.
When a new artifact is available, it should prompt to download.
When a user tries to use a feature requiring a missing artifact, it should prompt the user to download it, and wait until the download finishes.
For the web, and maybe Android, we can store .tflite models on Firebase ML.
For iOS, we can use Core ML Model Deployment: https://developer.apple.com/icloud/core-ml/
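For the web, a minimal sketch of such an artifact manager using the Cache Storage API (the names are illustrative, not existing code):

```ts
const ARTIFACT_CACHE = 'artifacts-v1'; // illustrative, versioned cache name

async function getArtifact(url: string): Promise<Blob> {
  const cache = await caches.open(ARTIFACT_CACHE);
  const cached = await cache.match(url);
  if (cached) {
    return cached.blob(); // already downloaded, serve from disk
  }
  // Missing artifact: download once, then store for future use
  const response = await fetch(url);
  await cache.put(url, response.clone());
  return response.blob();
}
```

Bumping the cache name when a new artifact version is published gives a natural point to prompt for re-download.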
Hand should have 4 views:
Currently, the system doesn't support "other side"
Left hand versus right hand isn't the same as top versus bottom of the symbol palette. Consider the graphic below, as the right hand can rotate. The last hand has the palm pointing to the side, away from the body. It is a right hand that uses a symbol from the bottom of the symbol palette. This phenomenon occurs in other situations, but we haven't fully mapped out when and why. I first ran into this problem myself when I tried to automatically colorize signs to have blue right hands and red left hands. It didn't work. Other programmers and researchers have run into this issue as well.
(Demonstrating these shapes is the ASL sign for only-child)
While I can't programmatically tell the difference between a left hand and a right hand, I still wanted to be able to color left and right hands. It's possible, but a manual process.
Another example of left versus right. These are all right hands. The left column has the right hand rotate to the side, ending with the finger pointing down. The right column starts the same, points away from the signer and ends with the finger pointing down.
(Reported by Steve Slevinski https://www.facebook.com/groups/SuttonSignWriting/permalink/2879117945741468/?comment_id=2879138809072715)
The pose sequence should be converted using a GAN to a human-looking video.
This has been implemented, but is currently being blocked by: tensorflow/tfjs#5374
Translating from video to text doesn't work.
Hello, I'm happy that I found this site yesterday. The conception and realization are great, really insane. But I can't understand why live translation doesn't work. I hope you will make a demo video on how to use this web app! I appreciate your work! DM me please, my Telegram is @bkbgnbtv
Similar to #11
When SignWriting is generated, it should be translated into spoken language text in the relevant target spoken language.
When opening small .pose files with the site, it works just fine; however, when trying to open very large files (half an hour of pose data), it breaks.
The site should be able to present arbitrarily long .pose files.
Console error:
core.mjs:6494 ERROR Error: Uncaught (in promise): RangeError: Invalid array length
RangeError: Invalid array length
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Invalid_array_length
The human pose viewer is quite slow, especially on lower-end devices.
This is because it relies on the CPU, or on a high-level API (WebGL).
WebGPU is not currently supported by default in any major browser.
It is a low-level API, a few times faster, that should allow scaling the model from 256x256 to higher resolutions.
Human pose viewer inference on a server, with multiple high-end GPUs, can be faster, but I would like to have client-side support.
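A sketch of opting into the (experimental) tfjs WebGPU backend with a WebGL fallback; the package and backend names follow the tfjs repository:

```ts
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-webgpu'; // experimental backend package

// Prefer WebGPU when the browser exposes it, otherwise stay on WebGL
const backend = 'gpu' in navigator ? 'webgpu' : 'webgl';
await tf.setBackend(backend);
await tf.ready();
```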
In spoken-to-signed translation, add a "download" button to generate a video from the canvas, and download it as a video to one's computer.
Use the MediaRecorder API. Example:
https://stackoverflow.com/questions/19235286/convert-html5-canvas-sequence-to-a-video-file
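A sketch of that approach: record the canvas stream and trigger a download (the frame rate and file name are assumptions):

```ts
function downloadCanvasVideo(canvas: HTMLCanvasElement, durationMs: number) {
  const stream = canvas.captureStream(30); // 30 fps is an assumption
  const recorder = new MediaRecorder(stream, {mimeType: 'video/webm'});
  const chunks: Blob[] = [];
  recorder.ondataavailable = event => chunks.push(event.data);
  recorder.onstop = () => {
    const url = URL.createObjectURL(new Blob(chunks, {type: 'video/webm'}));
    const a = document.createElement('a');
    a.href = url;
    a.download = 'translation.webm'; // placeholder file name
    a.click();
    URL.revokeObjectURL(url);
  };
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
}
```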
This can also be used when performing a language-direction swap. It will default to "upload" and have that file playing on the left side.
Every user entering the app sees English -> United States for translation, and needs to select a language, even if their browser reports a different locale.
Use navigator.language as the initial language and country: he-IL should be Hebrew to Israel.
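A sketch of deriving the defaults from the browser locale; maximize() fills in a likely region when the locale string omits it:

```ts
// e.g. navigator.language === "he-IL"
const locale = new Intl.Locale(navigator.language).maximize();
const spokenLanguage = locale.language; // "he" -> Hebrew
const country = locale.region;          // "IL" -> Israel
```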
Add URL parameters:
- spl to control the spoken language
- sil to control the signed language
- text to control the spoken language text input

Holistic doesn't support iOS Safari - google-ai-edge/mediapipe#1427
Once support exists, reconsider this change, as it has performance issues:
f43dce1#diff-234795a41fd0cddbe6708c29494ba2caf76de81e3fc82c0294394dce2ec968dcR52-R59
When inputting spoken-language text, and selecting a spoken language, the text should be translated into SignWriting of the signs in the relevant target signed-language.
The tab order is not great, and I assume some other things are as well.
Guide:
https://web.dev/accessible/
Hello Amit,
I would like to know if there is a way to get a list of poses including not only the X and Y coordinates but also the Z coordinate.
PS: Do you have any info about using the backend for a commercial project?
One should be able to choose who the generated person looks like
The current pix2pix is limited to two characters, without any selection allowed.
Add an "end" side menu, called "appearance".
In addition to the SkeletonPoseViewer and HumanPoseViewer, support an AvatarPoseViewer.
The current avatar does not move, as the only implementation I wrote does not perform amazingly: https://www.youtube.com/watch?v=TyJuU9_GOaw
Since then, the field has progressed a lot.
The app remains loading, not providing the translation of the text.
The skeleton should show the translation of the text.
Holistic should not block the main thread
It does block the main thread.
I set up a branch that isn't completely working yet:
Should pass an ImageBitmap to fromPixels in the following way:
https://github.com/tensorflow/tfjs/pull/5752/files#diff-298f5d426c793812872676c441b9a3be2977afa1d204acbdee32b1ee105f3b50R329
This should be tested and fixed.
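A sketch of that flow, assuming fromPixels accepts an ImageBitmap as in the linked PR (the worker wiring is illustrative):

```ts
// Main thread: decode the current frame, then transfer it zero-copy
async function sendFrame(video: HTMLVideoElement, worker: Worker) {
  const bitmap = await createImageBitmap(video);
  worker.postMessage({bitmap}, [bitmap]);
}

// Worker (with tf imported from '@tensorflow/tfjs'): no DOM needed
self.onmessage = (event: MessageEvent<{bitmap: ImageBitmap}>) => {
  const tensor = tf.browser.fromPixels(event.data.bitmap);
  // ...feed the tensor to Holistic or other models
  tensor.dispose();
};
```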
Add a "Detect Language" to signed-to-spoken language translation, which identifies the signed language based on a sequence of poses.
Currently, we generate screenshots based on the device's screen
https://github.com/sign/translate/blob/master/tools/mobile/metadata/devices.ts#L7
Then, using imagemagick, we pad the screenshot to fit the viewport:
https://github.com/sign/translate/blob/master/tools/mobile/metadata/metadata.ts#L25-L26
This causes two issues:
We ideally need some way to know the values of CSS variables like safe-area-inset-top for every device, and pad the image from the top, not equally top and bottom.
This might help? https://github.com/fastlane/fastlane/blob/master/frameit/frames_generator/offsets.json
But it is needed for more devices, and for Android devices as well.
Currently, the HumanPoseViewer renders the images at a non-constant speed, caches them, then redraws them at a constant speed and records a video.
Instead, as of Chrome 94 (launching Sep 21st), we will be able to use WebCodecs: create a VideoEncoder and, instead of storing the frames in the cache, push them to the encoder directly.
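A sketch of that WebCodecs path (the codec string, resolution, and bitrate are assumptions; muxing into a container is left out):

```ts
const encoder = new VideoEncoder({
  output: chunk => { /* mux the EncodedVideoChunk into a webm/mp4 container */ },
  error: e => console.error(e),
});
encoder.configure({codec: 'vp8', width: 256, height: 256, bitrate: 1_000_000});

// Called once per rendered frame, instead of caching the frame
function pushFrame(canvas: HTMLCanvasElement, timestampMicros: number) {
  const frame = new VideoFrame(canvas, {timestamp: timestampMicros});
  encoder.encode(frame);
  frame.close(); // release the frame's memory immediately
}
```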
When trying to translate unsupported text, the pose request returns a 404, and the pose-viewer can't handle it.
Then, the next translation is also broken, even if it exists.
This should not break the current or next translations.
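A minimal sketch of guarding the request so a 404 fails loudly instead of poisoning the viewer state (the function name is illustrative):

```ts
async function fetchPose(url: string): Promise<ArrayBuffer> {
  const response = await fetch(url);
  if (!response.ok) {
    // Surface the error to the UI; don't hand a broken payload to the pose-viewer
    throw new Error(`Pose request failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}
```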
After text is translated to SignWriting, the SignWriting should be translated to a sequence of poses.
This sequence should then be turned into a Blob, and created as a resource:
const blob = new Blob(recordedChunks, {type: 'application/pose'});
const url = URL.createObjectURL(blob);
Right now, the 3d avatar, and human realistic avatars have a specific appearance.
We would like to support custom appearances.
This feature will have an interface where the user can take a video of themselves (or a single picture?) and generate a SMPL texture map.
Here's an example of a texture map from MediaPipe face - https://github.com/apple2373/mediapipe-facemesh
We could do something similar for hands and body using the "enable_segmentation" option.
MediaPipe also supports hair segmentation, which may also be useful.
This can be done using a specific model working on a single image, but it is probably best to compose over a video, prompting the user about what we might be missing - hands, the other side of the face, etc.
Can use DensePose to generate SMPL maps, but that requires sending an image of the user to a server, and I don't think that's very good.
Or perhaps generate the full avatar, not only the texture map (https://github.com/sergeyprokudin/smplpix), but that would even more likely require a server.
If we generate a SMPL/UV texture, it would be cross compatible. If we generate our own, whatever we can, it might be easier to create in the browser.
It will support #33
For easy debugging, as well as taking advantage of features of sign.mt such as the different pose visualizers, it would be cool to be able to either drag-and-drop a .pose file on the website, or have some keyboard shortcut to do that (see the sketch below).
Loading a .pose file should change the pose state, to include the actual Pose class.
Use fileAssociations for default opening:
https://www.electron.build/configuration/configuration#overridable-per-platform-options
https://docs.microsoft.com/en-us/microsoft-edge/progressive-web-apps-chromium/how-to/handle-files
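A sketch of the drag-and-drop part (loadPoseFile is a hypothetical handler that updates the pose state):

```ts
declare function loadPoseFile(buffer: ArrayBuffer): void; // hypothetical handler

document.addEventListener('dragover', event => event.preventDefault());
document.addEventListener('drop', async (event: DragEvent) => {
  event.preventDefault();
  const file = event.dataTransfer?.files[0];
  if (file?.name.endsWith('.pose')) {
    loadPoseFile(await file.arrayBuffer());
  }
});
```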
The human pose viewer should be able to generate any face based on an encoding of a single image.
The current human pose viewer is limited to generating two people.
Use something like https://arxiv.org/pdf/2103.06902.pdf
Or something like this: https://vcai.mpi-inf.mpg.de/projects/Styleposegan/
Whenever a new person is selected by uploading an image, create and encode a texture map into a vector. This process can be slow, as it happens once per person. (If a person is selected based on the existing catalog, we already have their vectors)
Given the latent vector, generate a video based on that vector and a pose sequence.
In signed-to-spoken translation, add:
The https://sign.mt/ website is currently down. It looks like the API at https://spoken-to-signed-sxie2r74ua-uc.a.run.app is no longer working. Please check.
Some small features and fixes to be addressed:
- Use new Intl.DisplayNames(['he'], {type: 'language'}).of('he'); if that doesn't exist, use what we currently do.
- countries.isl is a special case, for international sign.

Generated videos (in this case, webm, but also mp4) have no duration. Maybe that's why they also don't show on iOS Safari.
Videos should have a duration. For webm, the fix could be https://github.com/yusitnikov/fix-webm-duration.
For mp4, we need to figure out a way:
https://stackoverflow.com/questions/72693091/mediarecorder-ignoring-videoframe-timestamp
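A sketch of the webm fix, following the callback style in the fix-webm-duration README (the exact signature may vary by version):

```ts
import fixWebmDuration from 'fix-webm-duration';

async function withDuration(blob: Blob, durationMs: number): Promise<Blob> {
  return new Promise(resolve => {
    // durationMs is measured between recorder.start() and recorder.stop()
    fixWebmDuration(blob, durationMs, (fixedBlob: Blob) => resolve(fixedBlob));
  });
}
```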
When detecting the input language, and no input language has been detected, this is the behavior: (screenshot)
Instead, we need to show a disabled one-way arrow, like: (screenshot)
If a language has been detected, and the user chooses to change direction, the detected language should be selected.
It would be nice to include some additional content (hero images, slogans, and features), both for users to see and for web search engines to crawl. Something like https://translate.google.com/about/