sign / translate
Effortless Real-Time Sign Language Translation
Home Page: https://sign.mt
License: Other
It would be interesting to add this GitHub action, to automatically generate reports:
https://github.com/MyActionWay/lighthouse-badger-action#lighthouse-badger-easyyml
However, this requires an overhaul of the current actions into a single workflow, which is too much work for now.
Type "testing 1 2 3": there is now a space between "1" and "2", but not between "testing" and "1".
For the sentence "long word and sequence" for example, we get
M518x544S20500483x545S15a01490x528S30a00482x483 M519x527S10018485x494S1f010490x494S20600481x476 M521x515S2ff00482x482S10612495x457S22f04488x477 S38700463x496 M518x515S20500487x504S15a01491x486 M518x529S10018483x511S1f010490x494S20600481x476 S38800464x496
Which should be displayed like: (expected rendering image)
But is actually showing: (actual rendering image)
The human pose viewer should not block the main thread. It currently does, making the UI choppy on mobile. Use a WebWorker with an OffscreenCanvas; this still allows tfjs to use the webgl backend.
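A minimal sketch of that approach, assuming a hypothetical pose-viewer.worker.ts file; the canvas is transferred once, and all drawing happens off the main thread:

```ts
// Main thread: transfer the canvas to a worker (file name is hypothetical)
const canvas = document.querySelector<HTMLCanvasElement>('canvas.pose-viewer')!;
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker(new URL('./pose-viewer.worker', import.meta.url), {type: 'module'});
worker.postMessage({canvas: offscreen}, [offscreen]);

// pose-viewer.worker.ts: render frames without blocking the UI
self.onmessage = (event: MessageEvent<{canvas: OffscreenCanvas}>) => {
  const ctx = event.data.canvas.getContext('2d')!;
  // ...draw pose frames here; tfjs can register its webgl backend
  // against an OffscreenCanvas inside the worker
};
```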
Allow installing apps through the app store. Models should work fast on mobile as well.
- capacitor to create native binaries for iOS and Android
- a MediaPipe plugin to run Holistic natively
- a tflite plugin to allow running other models, like pix2pix
Detect Language for spoken-to-signed language translation should identify the language of the input spoken language text.
If selected, when translating the text to SignWriting, the text should first go through the identification model (which should run client-side).
Additionally, it should indicate the assumed language.
Unfortunately, as far as I know, the browser doesn't natively support language identification.
Someone did convert cld3 to WebAssembly: https://github.com/kwonoj/cld3-asm
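A sketch of client-side identification with cld3-asm, following its README (the exact API may differ between versions):

```ts
import {loadModule} from 'cld3-asm';

const cldFactory = await loadModule();
const identifier = cldFactory.create(/* minBytes */ 0, /* maxBytes */ 1000);

const result = identifier.findLanguage('Wie geht es dir?');
console.log(result.language, result.is_reliable); // e.g. "de", true

identifier.dispose(); // free the wasm-side resources
```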
Material Icons should only load used icons.
The Material Icons font is 100 KB(!) and loads all icons.
Use https://fonttools.readthedocs.io/en/latest/subset/index.html to subset the font to only the used glyphs.
The loading sign is showing, but the text-to-sign translation is not working.
The text should be converted to a skeleton animation.
Hand prediction should be awesome.
It's fidgety...
Currently, all requests are unauthenticated. Ideally, we want to prevent unauthorized requests to our storage, database, and APIs.
We should use Firebase App Check for storage and functions.
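A minimal sketch using the Firebase modular SDK; the reCAPTCHA site key is a placeholder:

```ts
import {initializeApp} from 'firebase/app';
import {initializeAppCheck, ReCaptchaV3Provider} from 'firebase/app-check';

const app = initializeApp({/* existing firebase config */});

// Requests to Storage and Functions are then attested by App Check
initializeAppCheck(app, {
  provider: new ReCaptchaV3Provider('RECAPTCHA_V3_SITE_KEY'), // placeholder key
  isTokenAutoRefreshEnabled: true,
});
```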
Run the server locally for inference of the models.
Run the server locally for hosting the Angular website.
I can't see how to set this up and run it.
My main problem is that I want to test the code out, but I can't figure out how to run it! I'm mostly a Python AI developer and not so much an app developer. I have identified where I need to place my own models, but can't get the app running! ;)
Currently, we have no way to promote the app, and every screenshot/video needs to be manually made after every update.
I'd like to generate screenshots and videos like https://www.youtube.com/watch?v=Y0SNPeTz09w
using Remotion or something similar, so they are created on every update.
The alternative is manual labor.
The model viewer should load its library (model-viewer.min.js) only when the model viewer starts to become visible. It is a large library (1 MB), and mostly redundant.
Bonus: model-viewer should be tree-shaken rather than loaded as a min.js.
Currently, the model viewer is loaded on page load.
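One way to do this, sketched with an IntersectionObserver and a dynamic import (the selector is illustrative):

```ts
// Load the library only once the viewer element approaches the viewport
const el = document.querySelector('#model-viewer-container')!; // illustrative selector
const observer = new IntersectionObserver(async entries => {
  if (entries.some(entry => entry.isIntersecting)) {
    observer.disconnect();
    await import('@google/model-viewer'); // registers the <model-viewer> custom element
  }
}, {rootMargin: '100px'});
observer.observe(el);
```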
At the moment, models are included in the build distribution.
This makes the build quite heavy, and native files too large and unappealing.
Make an "artifacts management" system that can download artifacts like necessary models, and store them in the app's documents folder.
When a new artifact is available, it should prompt to download.
When a user tries to use a feature requiring a missing artifact, it should prompt the user to download it, and wait until the download finishes.
For the web, and maybe Android, we can store .tflite models on Firebase ML.
For iOS, we can use Core ML Model Deployment: https://developer.apple.com/icloud/core-ml/
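For the web, a minimal sketch of such an artifact manager using the Cache Storage API (the names are illustrative, not existing code):

```ts
const ARTIFACT_CACHE = 'artifacts-v1'; // illustrative, versioned cache name

async function getArtifact(url: string): Promise<Blob> {
  const cache = await caches.open(ARTIFACT_CACHE);
  const cached = await cache.match(url);
  if (cached) {
    return cached.blob(); // already downloaded, serve from disk
  }
  // Missing artifact: download once, then store for future use
  const response = await fetch(url);
  await cache.put(url, response.clone());
  return response.blob();
}
```

Bumping the cache name when a new artifact version is published gives a natural point to prompt for re-download.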
Hand should have 4 views:
Currently, the system doesn't support "other side"
Left hand versus right hand isn't the same as top versus bottom of the symbol palette. Consider the graphic below, as the right hand can rotate. The last hand has the palm pointing to the side, away from the body. It is a right hand that uses a symbol from the bottom of the symbol palette. This phenomenon occurs in other situations, but we haven't fully mapped out when and why. I first ran into this problem myself when I tried to automatically colorize signs to have blue right hands and red left hands. It didn't work. Other programmers and researchers have run into this issue as well.
(Demonstrating these shapes is the ASL sign for only-child)
While I can't programmatically tell the difference between a left hand and a right hand, I still wanted to be able to color left and right hands. It's possible, but a manual process.
Another example of left versus right. These are all right hands. The left column has the right hand rotate to the side, ending with the finger pointing down. The right column starts the same, points away from the signer and ends with the finger pointing down.
(Reported by Steve Slevinski https://www.facebook.com/groups/SuttonSignWriting/permalink/2879117945741468/?comment_id=2879138809072715)
The pose sequence should be converted using a GAN to a human-looking video.
This has been implemented, but is currently being blocked by: tensorflow/tfjs#5374
Translating from video to text doesn't work.
Hello, I'm happy that I found this site yesterday. The conception and realization are great, really insane. But I can't understand why live translation doesn't work. I hope you will make a demo video on how to use this web app! I appreciate your work! DM me please, my Telegram is @bkbgnbtv
Similar to #11
When SignWriting is generated, it should be translated into spoken language text in the relevant target spoken language.
When opening small .pose files with the site, it works just fine; however, when trying to open very large files (half an hour of pose data), it breaks.
The site should be able to present arbitrarily long .pose files.
Console error:
core.mjs:6494 ERROR Error: Uncaught (in promise): RangeError: Invalid array length
RangeError: Invalid array length
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Invalid_array_length
The human pose viewer is quite slow, especially on lower-end devices.
This is because it relies on the CPU, or on a high-level API (WebGL).
WebGPU is not currently supported by default in any major browser.
It is a low-level API, a few times faster, that should allow scaling the model from 256x256 to higher resolutions.
Human pose viewer inference on a server, with multiple high-end GPUs, can be faster, but I would like to have client-side support.
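A sketch of opting into the (experimental) tfjs WebGPU backend with a WebGL fallback; the package and backend names follow the tfjs repository:

```ts
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-webgpu'; // experimental backend package

// Prefer WebGPU when the browser exposes it, otherwise stay on WebGL
const backend = 'gpu' in navigator ? 'webgpu' : 'webgl';
await tf.setBackend(backend);
await tf.ready();
```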
In spoken-to-signed translation, add a "download" button to generate a video from the canvas, and download it as a video to one's computer.
Use the MediaRecorder API. Example:
https://stackoverflow.com/questions/19235286/convert-html5-canvas-sequence-to-a-video-file
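A sketch of that approach: record the canvas stream and trigger a download (the frame rate and file name are assumptions):

```ts
function downloadCanvasVideo(canvas: HTMLCanvasElement, durationMs: number) {
  const stream = canvas.captureStream(30); // 30 fps is an assumption
  const recorder = new MediaRecorder(stream, {mimeType: 'video/webm'});
  const chunks: Blob[] = [];
  recorder.ondataavailable = event => chunks.push(event.data);
  recorder.onstop = () => {
    const url = URL.createObjectURL(new Blob(chunks, {type: 'video/webm'}));
    const a = document.createElement('a');
    a.href = url;
    a.download = 'translation.webm'; // placeholder file name
    a.click();
    URL.revokeObjectURL(url);
  };
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
}
```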
This can also be used when performing a language-direction swap. It will default to "upload" and have that file playing on the left side.
Every user entering the app sees English -> United States for translation, and needs to select a language, even if their browser reports a different locale.
Use navigator.language as the initial language and country: he-IL should be Hebrew to Israel.
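A sketch of deriving the defaults from the browser locale; maximize() fills in a likely region when the locale string omits it:

```ts
// e.g. navigator.language === "he-IL"
const locale = new Intl.Locale(navigator.language).maximize();
const spokenLanguage = locale.language; // "he" -> Hebrew
const country = locale.region;          // "IL" -> Israel
```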
Add URL parameters:
- spl to control the spoken language
- sil to control the signed language
- text to control the spoken language text input

Holistic doesn't support iOS Safari - google-ai-edge/mediapipe#1427
Once support exists, reconsider this change, as it has performance issues:
f43dce1#diff-234795a41fd0cddbe6708c29494ba2caf76de81e3fc82c0294394dce2ec968dcR52-R59
When inputting spoken-language text, and selecting a spoken language, the text should be translated into SignWriting of the signs in the relevant target signed-language.
The tab order is not great, and I assume some other things are as well.
Guide:
https://web.dev/accessible/
Hello Amit,
I would like to know if there is a way to get a list of poses including not only the X and Y coordinates but also the Z coordinate.
PS: Do you have any info about using the backend for a commercial project?
One should be able to choose who the generated person looks like
The current pix2pix is limited to two characters, without any selection allowed.
Add an "end" side menu, called "appearance".
In addition to the SkeletonPoseViewer and HumanPoseViewer, support an AvatarPoseViewer.
The current avatar does not move, as the only implementation I wrote does not perform amazingly: https://www.youtube.com/watch?v=TyJuU9_GOaw
Since then, the field has progressed a lot.
The app remains loading, not providing the translation of the text.
The skeleton should show the translation of the text.
Holistic should not block the main thread
It does block the main thread.
I set up a branch that isn't completely working yet:
Should pass an ImageBitmap to fromPixels in the following way:
https://github.com/tensorflow/tfjs/pull/5752/files#diff-298f5d426c793812872676c441b9a3be2977afa1d204acbdee32b1ee105f3b50R329
This should be tested and fixed.
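A sketch of that flow, assuming fromPixels accepts an ImageBitmap as in the linked PR (the worker wiring is illustrative):

```ts
// Main thread: decode the current frame, then transfer it zero-copy
async function sendFrame(video: HTMLVideoElement, worker: Worker) {
  const bitmap = await createImageBitmap(video);
  worker.postMessage({bitmap}, [bitmap]);
}

// Worker (with tf imported from '@tensorflow/tfjs'): no DOM needed
self.onmessage = (event: MessageEvent<{bitmap: ImageBitmap}>) => {
  const tensor = tf.browser.fromPixels(event.data.bitmap);
  // ...feed the tensor to Holistic or other models
  tensor.dispose();
};
```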
Add a "Detect Language" to signed-to-spoken language translation, which identifies the signed language based on a sequence of poses.
Currently, we generate screenshots based on the device's screen
https://github.com/sign/translate/blob/master/tools/mobile/metadata/devices.ts#L7
Then, using imagemagick, we pad the screenshot to fit the viewport:
https://github.com/sign/translate/blob/master/tools/mobile/metadata/metadata.ts#L25-L26
This causes two issues:
We ideally need some way to know the values of CSS variables like safe-area-inset-top for every device, and pad the image from the top, not equally top and bottom.
This might help? https://github.com/fastlane/fastlane/blob/master/frameit/frames_generator/offsets.json
But it is needed for more devices, and for Android devices as well.
Currently, the HumanPoseViewer renders the images at a non-constant speed, caches them, then redraws them at a constant speed and records a video.
Instead, as of Chrome 94 (launching Sep 21st), we will be able to use WebCodecs: create a VideoEncoder and, instead of storing the frames in the cache, push them to the encoder directly.
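A sketch of that WebCodecs path (the codec string, resolution, and bitrate are assumptions; muxing into a container is left out):

```ts
const encoder = new VideoEncoder({
  output: chunk => { /* mux the EncodedVideoChunk into a webm/mp4 container */ },
  error: e => console.error(e),
});
encoder.configure({codec: 'vp8', width: 256, height: 256, bitrate: 1_000_000});

// Called once per rendered frame, instead of caching the frame
function pushFrame(canvas: HTMLCanvasElement, timestampMicros: number) {
  const frame = new VideoFrame(canvas, {timestamp: timestampMicros});
  encoder.encode(frame);
  frame.close(); // release the frame's memory immediately
}
```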
When trying to translate unsupported text, the pose request returns a 404, and the pose-viewer can't handle it.
Then, the next translation is also broken, even if it exists.
This should not break the current or next translations.
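A minimal sketch of guarding the request so a 404 fails loudly instead of poisoning the viewer state (the function name is illustrative):

```ts
async function fetchPose(url: string): Promise<ArrayBuffer> {
  const response = await fetch(url);
  if (!response.ok) {
    // Surface the error to the UI; don't hand a broken payload to the pose-viewer
    throw new Error(`Pose request failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}
```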
After text is translated to SignWriting, the SignWriting should be translated to a sequence of poses.
This sequence should then be turned into a Blob, and created as a resource:
const blob = new Blob(recordedChunks, {type: 'application/pose'});
const url = URL.createObjectURL(blob);
Right now, the 3d avatar, and human realistic avatars have a specific appearance.
We would like to support custom appearances.
This feature will have an interface where the user can take a video of themselves (or a single picture?) and generate a SMPL texture map.
Here's an example of a texture map from MediaPipe face - https://github.com/apple2373/mediapipe-facemesh
We could do something similar for hands and body using the "enable_segmentation" option.
MediaPipe also supports hair segmentation, which may also be useful.
This can be done using a specific model working on a single image, but it is probably best to compose over a video, prompting the user about what we might be missing - hands, the other side of the face, etc.
Can use DensePose to generate SMPL maps, but that requires sending an image of the user to a server, and I don't think that's very good.
Or perhaps generate the full avatar, not only the texture map (https://github.com/sergeyprokudin/smplpix), but that would even more likely require a server.
If we generate a SMPL/UV texture, it would be cross compatible. If we generate our own, whatever we can, it might be easier to create in the browser.
It will support #33
For easy debugging, as well as taking advantage of features of sign.mt such as the different pose visualizers, it would be cool to be able to either drag-and-drop a .pose file on the website, or have some keyboard shortcut to do that (see the sketch below).
Loading a .pose file should change the pose state, to include the actual Pose class.
Use fileAssociations for default opening:
https://www.electron.build/configuration/configuration#overridable-per-platform-options
https://docs.microsoft.com/en-us/microsoft-edge/progressive-web-apps-chromium/how-to/handle-files
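A sketch of the drag-and-drop part (loadPoseFile is a hypothetical handler that updates the pose state):

```ts
declare function loadPoseFile(buffer: ArrayBuffer): void; // hypothetical handler

document.addEventListener('dragover', event => event.preventDefault());
document.addEventListener('drop', async (event: DragEvent) => {
  event.preventDefault();
  const file = event.dataTransfer?.files[0];
  if (file?.name.endsWith('.pose')) {
    loadPoseFile(await file.arrayBuffer());
  }
});
```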
The human pose viewer should be able to generate any face based on an encoding of a single image.
The current human pose viewer is limited to generating two people.
Use something like https://arxiv.org/pdf/2103.06902.pdf
Or something like this: https://vcai.mpi-inf.mpg.de/projects/Styleposegan/
Whenever a new person is selected by uploading an image, create and encode a texture map into a vector. This process can be slow, as it happens once per person. (If a person is selected based on the existing catalog, we already have their vectors)
Given the latent vector, generate a video based on that vector and a pose sequence.
In signed-to-spoken translation, add:
The https://sign.mt/ website is currently down. It looks like the API at https://spoken-to-signed-sxie2r74ua-uc.a.run.app is no longer working. Please check.
Some small features and fixes to be addressed:
- Use new Intl.DisplayNames(['he'], {type: 'language'}).of('he'); if that doesn't exist, use what we currently do.
- countries.isl is a special case, for international sign.

Generated videos (in this case, webm, but also mp4) have no duration. Maybe that's why they also don't show on iOS Safari.
Videos should have a duration. For webm, the fix could be https://github.com/yusitnikov/fix-webm-duration.
For mp4, we need to figure out a way:
https://stackoverflow.com/questions/72693091/mediarecorder-ignoring-videoframe-timestamp
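A sketch of the webm fix, following the callback style in the fix-webm-duration README (the exact signature may vary by version):

```ts
import fixWebmDuration from 'fix-webm-duration';

async function withDuration(blob: Blob, durationMs: number): Promise<Blob> {
  return new Promise(resolve => {
    // durationMs is measured between recorder.start() and recorder.stop()
    fixWebmDuration(blob, durationMs, (fixedBlob: Blob) => resolve(fixedBlob));
  });
}
```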
When detecting the input language, and no input language has been detected, this is the behavior: (screenshot)
Instead, we need to show a disabled one-way arrow, like: (screenshot)
If a language has been detected, and the user chooses to change direction, the detected language should be selected.
It would be nice to include some additional content (hero images, slogans, and features), both for users to see and for web search engines to crawl. Something like https://translate.google.com/about/