
translate's People

Contributors

amitmy, github-actions[bot]


translate's Issues

Sign-by-Word Translation Improvements

URL parameters:

  • slang (spoken language: en, fr, he, etc.)
  • dlang (signed language: us, gb, fr, il, etc.)
    • special case: when given cn, use fl
  • sentence
  • fps (optional; when given, interpolate the final pose to this fps)

Translation functionality:

  • Allow translation from any slang to any dlang. Search terms by slang, find videos by dlang.
  • Translation of a single word should not crash.
  • Translation of an unknown Latin-script word when dlang=us should be fingerspelled.
  • Translation of a sentence containing an unknown word when dlang!=us should skip the word (see the sketch below).
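A minimal sketch of that fallback logic; all names here are illustrative, not the app's actual API:

```typescript
// Hypothetical helpers: a dictionary maps terms (searched by slang) to sign ids (by dlang).
function fingerspell(word: string): string[] {
  // One sign per Latin letter, chained together.
  return [...word.toLowerCase()].map((letter) => `letter:${letter}`);
}

function translateWord(word: string, dlang: string, dictionary: Map<string, string>): string[] {
  const sign = dictionary.get(word);
  if (sign) {
    return [sign];
  }
  if (dlang === 'us' && /^[a-z]+$/i.test(word)) {
    return fingerspell(word); // unknown Latin-script word in ASL: fingerspell it
  }
  return []; // unknown word in any other signed language: skip it
}
```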

Bugs

  • Dictionary words are broken in half: "cannot" becomes "can" + "not" (which matches the noun "can", i.e. a tin can)
  • The first and second signs never interpolate.
    Type "1 2 3" and there is no interpolation between "1" and "2".
123.mp4

Type "testing 1 2 3" and now there is between "1" and "2" but not between "testing" and "1"

testing123.mp4

SignWriting - Correctly display unnormalized signs

Expected Behavior

  1. Signs should not be cut
  2. Punctuation should be visible

Actual Behavior

For the sentence "long word and sequence", for example, we get:

M518x544S20500483x545S15a01490x528S30a00482x483 M519x527S10018485x494S1f010490x494S20600481x476 M521x515S2ff00482x482S10612495x457S22f04488x477 S38700463x496 M518x515S20500487x504S15a01491x486 M518x529S10018483x511S1f010490x494S20600481x476 S38800464x496

Which should be displayed like
(image: expected rendering)

But is actually showing

(image: actual rendering)

Steps to Reproduce the Problem

  1. Use spoken-to-signed translation
  2. Translate "long word and sequence"

Native Mobile Applications - Wrap PWA

Expected Behavior

Allow installing the app through the app stores.
Models should run fast on mobile as well.

Actual Behavior

  • The app is a PWA; installation is only available via the browser
  • Models (holistic, pix2pix) are slow

Suggested steps

  • Use Capacitor to create native binaries for iOS and Android (see the sketch below)
  • Write a MediaPipe plugin to run Holistic natively
  • Write a TFLite plugin to allow running other models, like pix2pix
  • For iOS, use Core ML models
    • Convert models to MLPackage
    • Perform inference using Core ML
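For the first step, a minimal Capacitor configuration might look like this; the app id and web directory are assumptions, not the project's actual values:

```typescript
import {CapacitorConfig} from '@capacitor/cli';

const config: CapacitorConfig = {
  appId: 'mt.sign.translate', // hypothetical bundle id
  appName: 'Sign Translate',
  webDir: 'dist', // assumed Angular build output directory
};

export default config;
```

Running `npx cap add ios` and `npx cap add android` would then generate the native projects.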

Spoken Language Identification

A "Detect Language" option for spoken-to-signed language translation should identify the language of the input spoken-language text.

(image)

If selected, when translating the text to SignWriting, it should first go through the identification model (which should run client-side).
Additionally, the UI should indicate the assumed language.

Unfortunately, as far as I know, browsers do not support language identification natively.

However, cld3 has been compiled to WebAssembly (https://github.com/kwonoj/cld3-asm); a usage sketch follows.
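A rough sketch based on the cld3-asm README; treat the exact API shape as an assumption to verify against the library:

```typescript
import {loadModule} from 'cld3-asm';

// Identify the language of spoken-language text, fully client-side.
async function detectLanguage(text: string): Promise<string | null> {
  const factory = await loadModule(); // instantiates the cld3 WebAssembly module
  const identifier = factory.create(0, 1000); // min/max bytes of input to consider
  const result = identifier.findLanguage(text);
  return result.is_reliable ? result.language : null;
}
```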

Content Page - About

Main Hero

  • MP4: signed-to-spoken "what is your name?"
  • Translate between 40+ signed and spoken languages in real-time
  • Android install link, iOS install link (being able to add more links like Windows and MacOS could be nice).

Main Feature - directionality

  • Two videos: an Android phone on the left with spoken-to-signed, an iPhone on the right with signed-to-spoken. "How are you?"
  • <- Translate from spoken to signed languages
  • Or from signed to spoken languages ->

All Offline!

  • Video of a phone showing Remy, the avatar, in AR, signing on an airplane
  • Sign Translate works everywhere*, even offline on an airplane.
  • *When offline, Sign Translate uses locally stored machine learning models, which might be slower, less accurate, and consume more battery life.

Personalize

  • Control the appearance of the generated video
  • Grid, 3 by 3, head shots of different people.
  • Maayan and Amit enabled; 6 others (thishumandoesnotexist) disabled; upload disabled
  • More appearance personalization coming soon (link to GitHub issue)
  • When selecting an image, it shows a video signed by that figure

Download / Share

  • Image of the download / share buttons
  • Share your translation with anyone as a video
  • Download your translation, and use it anywhere, for free(* link to license)

Supports 40+ Languages

  • Maybe a world map? Possibly with GeoJSON, to either add pins or cover the area of countries
  • Carousel of spoken languages
  • Carousel of countries + flags
  • Link to languages page


[BUG] Text-to-sign translation is not working as a skeleton animation

Current Behavior

The loading indicator is shown, but the text-to-sign translation never completes.

(image)

Expected Behavior

The text should be converted to a skeleton animation.

(image)

Steps To Reproduce

No response


Additional context

No response

Add AppCheck support across all services

Problem

Currently, all requests are unauthenticated. Ideally, we want to prevent unauthorized requests to our storage, database, and APIs.

Description

We should use Firebase App Check for Storage and Functions; a web-side initialization sketch follows.
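For the web client, enabling App Check with the modular Firebase SDK could look roughly like this; the provider choice and site key are assumptions:

```typescript
import {initializeApp} from 'firebase/app';
import {initializeAppCheck, ReCaptchaV3Provider} from 'firebase/app-check';

const app = initializeApp({/* the project's Firebase config */});

// Every Storage/Functions request from this app instance now carries an App Check token.
initializeAppCheck(app, {
  provider: new ReCaptchaV3Provider('recaptcha-v3-site-key'), // hypothetical site key
  isTokenAutoRefreshEnabled: true,
});
```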

How would I go about running the program?

Expected Behavior

Run the server locally for inference of the models.
Run the server locally for hosting the Angular website.

Actual Behavior

I can't see how to set this up and run it.

Steps to Reproduce the Problem

  1. I don't have any, sorry, as I don't understand how to run the app.

Specifications

  • Version: 18.04
  • Platform: Ubuntu

My main problem is that I want to test the code out, but I can't figure out how to run it! I'm mostly a Python AI developer and not so much an app developer. I have identified where I need to place my own models, but can't get the app running! ;)

Publication Materials

Problem

Currently, we have no way to promote the app, and every screenshot/video needs to be remade manually after every update.

Description

I'd like to generate screenshots and videos like https://www.youtube.com/watch?v=Y0SNPeTz09w
using Remotion or something similar, so they are created on every update (see the sketch below).
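A rough Remotion sketch; the composition name, dimensions, and content are illustrative:

```tsx
import React from 'react';
import {Composition, registerRoot, useCurrentFrame, interpolate} from 'remotion';

// A trivial promo clip: fade the app name in over the first second.
const Promo: React.FC = () => {
  const frame = useCurrentFrame();
  const opacity = interpolate(frame, [0, 30], [0, 1]);
  return <h1 style={{opacity, fontSize: 120}}>sign.mt</h1>;
};

const Root: React.FC = () => (
  <Composition id="Promo" component={Promo} durationInFrames={300} fps={30} width={1920} height={1080} />
);

registerRoot(Root);
```

Rendering this in CI on every update would keep the promotional material in sync with the app.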

Alternatives

Manual labor.

Additional context

No response

Lazy Load `model-viewer` only when needed

Expected Behavior

The model viewer should load its library (model-viewer.min.js) only when it starts to become visible.
It is a large library (1 MB) and mostly redundant.

Bonus: model-viewer should be tree-shaken rather than loaded as a min.js bundle.

Actual Behavior

The model viewer is loaded on page load.
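A lazy-loading sketch using IntersectionObserver; the script URL is illustrative:

```typescript
// Inject the model-viewer script only once its container scrolls into view.
function lazyLoadModelViewer(container: HTMLElement): void {
  const observer = new IntersectionObserver((entries) => {
    if (entries.some((entry) => entry.isIntersecting)) {
      const script = document.createElement('script');
      script.type = 'module';
      script.src = 'https://unpkg.com/@google/model-viewer/dist/model-viewer.min.js';
      document.head.appendChild(script);
      observer.disconnect(); // load at most once
    }
  });
  observer.observe(container);
}
```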

[Feature] Deliver Models and Large Files as Artifacts

Problem

At the moment, models are included in the build distribution.
This makes the build quite heavy, and the native app files too large and unappealing.

Description

Make an "artifacts management" system that can download artifacts like necessary models, and store them in the app's documents folder.

When a new artifact is available, it should prompt to download.

When a user tries to use a feature requiring a missing artifact, it should prompt the user to download and wait until finishes.


For the web, and maybe Android, we can store .tflite models on Firebase ML.

For iOS, we can use Core ML Model Deployment: https://developer.apple.com/icloud/core-ml/
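On the web, a minimal sketch of the download-and-cache step could use the Cache Storage API; the cache name and function are illustrative:

```typescript
// Fetch an artifact (e.g. a model file), serving it from a persistent cache when possible.
async function getArtifact(url: string): Promise<ArrayBuffer> {
  const cache = await caches.open('artifacts'); // hypothetical cache name
  let response = await cache.match(url);
  if (!response) {
    await cache.add(url); // downloads the artifact and stores it
    response = (await cache.match(url))!;
  }
  return response.arrayBuffer();
}
```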

SignWriting - Left hand vs Right hand edge cases

TL;DR

Hand should have 4 views:

  • front
  • side
  • back
  • other side

Currently, the system doesn't support the "other side" view.

Expected Behavior

Left hand versus right hand isn't the same as top versus bottom of the symbol palette. Consider the graphic below, showing how the right hand can rotate. The last hand has the palm pointing to the side, away from the body; it is a right hand that uses a symbol from the bottom of the symbol palette. This phenomenon occurs in other situations as well, but we haven't fully mapped out when and why. I first ran into this problem myself when I tried to automatically colorize signs to have blue right hands and red left hands; it didn't work. Other programmers and researchers have run into this issue as well.

(image)

(These shapes demonstrate the ASL sign for "only child".)

While I can't programmatically tell the difference between a left hand and a right hand, I still wanted to be able to color left and right hands. It's possible, but it's a manual process.

(image)

Another example of left versus right: these are all right hands. The left column has the right hand rotating to the side, ending with the finger pointing down. The right column starts the same, points away from the signer, and ends with the finger pointing down.

(image)

(Reported by Steve Slevinski https://www.facebook.com/groups/SuttonSignWriting/permalink/2879117945741468/?comment_id=2879138809072715)

Doesn't work!

Problem

Translating from video to text doesn't work.

Description

Hello, I'm happy that I found this site yesterday. The concept and execution are great, really impressive. But I can't understand why live translation doesn't work. I hope you will make a demo video showing how to use this web app! I appreciate your work! Please DM me; my Telegram is @bkbgnbtv.

Alternatives

No response

Additional context

No response

Can't open HUGE `.pose` files

Current Behavior

Opening small .pose files with the site works just fine; however, trying to open very large files (half an hour of pose data) breaks it.

Expected Behavior

Site should be able to present arbitrarily long .pose files.

Steps To Reproduce

  1. Generate a HUGE pose file (100,000 frames, 500 keypoints, 3 dimensions)
  2. Drag and drop on sign.mt

Environment

  • OS: macOS
  • Browser: Chrome 102

Additional context

Console error:

core.mjs:6494 ERROR Error: Uncaught (in promise): RangeError: Invalid array length
RangeError: Invalid array length

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Invalid_array_length

Human Pose Viewer - Support WebGPU For Faster Inference

Problem

The human pose viewer is quite slow, especially on lower-end devices.
This is because it relies on the CPU or a high-level API (WebGL).

Description

WebGPU is not currently supported by default in any major browser.
It is a low-level API, a few times faster, that should allow scaling the model from 256x256 to higher resolutions (see the sketch below).
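A minimal feature-detection sketch; the API shape follows the WebGPU draft spec, and the backend names are illustrative:

```typescript
// Prefer WebGPU when the browser exposes it; otherwise fall back to WebGL, then CPU.
async function preferredBackend(): Promise<'webgpu' | 'webgl' | 'cpu'> {
  const gpu = (navigator as Navigator & {gpu?: {requestAdapter(): Promise<unknown | null>}}).gpu;
  if (gpu && (await gpu.requestAdapter()) !== null) {
    return 'webgpu';
  }
  if (document.createElement('canvas').getContext('webgl2')) {
    return 'webgl';
  }
  return 'cpu';
}
```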

Alternatives

Human pose viewer inference on a server, with multiple high-end GPUs, can be faster, but I would like to have client-side support.

Additional context

https://caniuse.com/webgpu

UI - Multi Language Support - Contributions Welcomed!

Ideally, the UI should have multi-language support.

To add a language:

  1. Identify the language code. For example, Spanish is es.
  2. Duplicate en.json in i18n to a file es.json, and translate all entries in the file.
  3. Go here and add the language code to the array (see the illustration after this list).
  4. Create a Pull Request :)
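Purely as an illustration of step 3; the actual array name and location live in the repository, linked above:

```typescript
// Hypothetical shape of the available-UI-languages array from step 3.
export const UI_LANGUAGES: string[] = [
  'en',
  'es', // newly added language code
];
```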

[Feature] User preference for initial translation language pair

Problem

Every user entering the app sees English->United States for translation, and needs to select a language, even if they:

  • Are not using an English language browser
  • Have previously modified the translation language

Description

  1. We should use navigator.language as the initial language and country; he-IL should yield Hebrew to Israel.
  2. We should store the last selected language/country pair, and use it initially when available.
  3. We should modify the current browser URL after a selection, to reflect the selected language and country. This will make translations more shareable. (A sketch of steps 1-2 follows.)
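A rough sketch of steps 1-2; the storage key and return shape are illustrative:

```typescript
// Derive the initial language/country pair from storage, falling back to the browser locale.
function initialPair(): {language: string; country: string} {
  const stored = localStorage.getItem('translation-pair'); // hypothetical storage key
  if (stored) {
    return JSON.parse(stored); // reuse the last selected pair
  }
  // Fall back to the browser locale, e.g. "he-IL" -> Hebrew to Israel.
  const [language, region] = navigator.language.split('-');
  return {language, country: (region ?? 'us').toLowerCase()};
}
```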

Spoken Language to SignWriting Translation

When inputting spoken-language text and selecting a spoken language, the text should be translated into SignWriting of the signs in the relevant target signed language.

[Feature] 3D Pose

Problem

Hello Amit,

I would like to know whether it is possible to get a list of poses that includes not only the X and Y coordinates but also the Z coordinate.

PS: do you have any info about using the backend for a commercial project?

Description

No response

Alternatives

No response

Additional context

No response

Human Pose Viewer - Allow appearance selection

Expected Behavior

One should be able to choose who the generated person looks like.

Actual Behavior

The current pix2pix is limited to two characters, without any selection allowed.

Ideas

  • Include an end-side menu called "appearance".
  • Show a grid of existing characters (Maayan, Amit).
  • Show more characters, disabled (like https://www.synthesia.io/features/avatars).
  • The final item of the grid should be a customize icon, opening a menu:
    • (upload icon) Upload a picture
    • (camera icon) Take a picture
    • (search icon) Search - allows a textual search of faces from StyleGAN outputs

[BUG] Skeleton translation is not working

Current Behavior

The viewer remains in a loading state and never provides the translation of the text.

Expected Behavior

The skeleton should show the translation of the text.

Steps To Reproduce

No response


Additional context

No response

Sign Language Identification

Add a "Detect Language" to signed-to-spoken language translation, which identifies the signed language based on a sequence of poses.

Fastlane - screenshots do not fit correctly in "frameit" viewport

Current Behavior

Currently, we generate screenshots based on the device's screen
https://github.com/sign/translate/blob/master/tools/mobile/metadata/devices.ts#L7

Then, using ImageMagick, we pad the screenshot to fit the viewport:
https://github.com/sign/translate/blob/master/tools/mobile/metadata/metadata.ts#L25-L26

This causes two issues:

  1. On some devices, the viewport and the screen are equal, so we fill the entire screen without allocating space for the status bar (example)
  2. On other devices, we may pad too much, centering the app but pushing it well below the "notch" (example)

Expected Behavior

We ideally need some way to know the values of the CSS variables like safe-area-inset-top for every device, and to pad the image from the top only, not equally top and bottom (see the sketch below).

This might help: https://github.com/fastlane/fastlane/blob/master/frameit/frames_generator/offsets.json
But it would be needed for more devices, including Android devices.
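Given a per-device top inset, the padding step could anchor the screenshot to the bottom of the canvas so all the extra space lands above it. A rough Node/TypeScript sketch; the function and inset source are assumptions, only the ImageMagick -gravity/-extent behavior is standard:

```typescript
import {execSync} from 'node:child_process';

// Pad a screenshot only at the top by `insetPx`, anchoring the content to the bottom.
function padTop(path: string, width: number, height: number, insetPx: number): void {
  execSync(
    `convert "${path}" -background black -gravity south -extent ${width}x${height + insetPx} "${path}"`
  );
}
```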

Use WebCodecs to Capture Human Video Faster

Currently, the HumanPoseViewer renders the images at a non-constant speed, caches them, then redraws them at a constant speed and records a video.

Instead, as of Chrome 94 (launched September 21st, 2021), we will be able to use WebCodecs: create a VideoEncoder and, instead of storing the frames in the cache, push them to the encoder directly (see the sketch below).

  • This will save time until the "download" option is ready
  • This will produce a more consistent video output (controlling the MBps)
  • This will reduce memory, as we don't need to store the raw pixel grids. This is especially important when using larger GANs in the future
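A rough sketch of that encoder path; the codec, resolution, and bitrate values are assumptions:

```typescript
// Collect encoded chunks instead of raw frames; they can later be muxed into a video file.
const chunks: EncodedVideoChunk[] = [];

const encoder = new VideoEncoder({
  output: (chunk) => chunks.push(chunk),
  error: (e) => console.error(e),
});
encoder.configure({codec: 'vp8', width: 256, height: 256, bitrate: 1_000_000, framerate: 30});

// Push each rendered canvas frame directly to the encoder instead of caching raw pixels.
function pushFrame(canvas: HTMLCanvasElement, timestampMicros: number): void {
  const frame = new VideoFrame(canvas, {timestamp: timestampMicros});
  encoder.encode(frame);
  frame.close(); // frees the raw pixel memory immediately
}
```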

SignWriting to Pose Sequence Translation

After text is translated to SignWriting, the SignWriting should be translated to a sequence of poses.
This sequence should then be turned into a Blob (with recordedChunks below holding the binary pose data) and exposed as a resource:

const blob = new Blob(recordedChunks, {type: 'application/pose'});
const url = URL.createObjectURL(blob);

[Feature] SMPL Texture Map Generator

Problem

Right now, the 3D avatar and the realistic human avatars have a fixed appearance.
We would like to support custom appearances.

Description

This feature will provide an interface where the user can take a video of themselves (or a single picture?) and generate an SMPL texture map.

(image)

Here's an example of a texture map from MediaPipe Face Mesh: https://github.com/apple2373/mediapipe-facemesh
We could do something similar for hands and body using the "enable_segmentation" option.

MediaPipe also supports hair segmentation, which may also be useful.

(image)

This can be done using a model that works on a single image, but it is probably best to compose the map from a video, prompting the user for what we might be missing: hands, the other side of the face, etc.

Alternatives

We could use DensePose to generate SMPL maps, but that requires sending an image of the user to a server, and I don't think that's very good.

Or perhaps generate the full avatar, not only the texture map (https://github.com/sergeyprokudin/smplpix),
but that would be even more likely to require a server.

If we generate an SMPL/UV texture, it would be cross-compatible. If we generate our own format, whatever we can, it might be easier to create in the browser.

Additional context

It will support #33

Open `.pose` files in sign.mt

For easy debugging, as well as to take advantage of sign.mt features such as the different pose visualizers, it would be cool to be able to either drag-and-drop a .pose file onto the website or use a keyboard shortcut to open one (see the sketch below).

Loading a .pose file should change the pose state to include the actual Pose class.

https://docs.microsoft.com/en-us/microsoft-edge/progressive-web-apps-chromium/how-to/handle-files
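A rough sketch of the drag-and-drop entry point; parsing the buffer into the Pose class is app-specific:

```typescript
// Accept .pose files dropped anywhere on the page.
window.addEventListener('dragover', (event) => event.preventDefault());

window.addEventListener('drop', async (event: DragEvent) => {
  event.preventDefault();
  const file = event.dataTransfer?.files[0];
  if (!file || !file.name.endsWith('.pose')) {
    return;
  }
  const buffer = await file.arrayBuffer();
  // TODO: parse `buffer` into a Pose instance and update the pose state.
});
```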

Human Pose Viewer - Generate any person based on an image

Expected Behavior

The human pose viewer should be able to generate any face based on an encoding of a single image.

Actual Behavior

The current human pose viewer is limited to generating two people.

Ideas

Use something like https://arxiv.org/pdf/2103.06902.pdf

(image)

Or something like https://vcai.mpi-inf.mpg.de/projects/Styleposegan/

Whenever a new person is selected by uploading an image, create and encode a texture map into a vector. This process can be slow, as it happens once per person. (If a person is selected from the existing catalog, we already have their vector.)

Given the latent vector, generate a video based on that vector and a pose sequence.

Some small features

Some small features and fixes to be addressed:

  • For languages, we can use the browser to get the display name via new Intl.DisplayNames(['he'], {type: 'language'}).of('he'); if that is not supported, use what we currently do (see the sketch after this list)
  • Pose viewer selector "fab"s:
    • are not well aligned
    • their shadow is cut off
      (screenshot)
  • On small screens, clicking a language/country feels bad; the menu is off. Should it maybe be attached to the click origin? Maybe a different type of menu here (a full-width one, like Google Translate)?
    (image)
    (image)
  • countries.isl is a special case, for International Sign

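A sketch of the Intl.DisplayNames fallback described in the first bullet above; the function name is illustrative:

```typescript
// Resolve a language code to a human-readable name in the given UI locale.
function languageName(code: string, uiLocale: string): string {
  try {
    const names = new Intl.DisplayNames([uiLocale], {type: 'language'});
    return names.of(code) ?? code;
  } catch {
    return code; // Intl.DisplayNames unsupported: fall back to the current hardcoded names
  }
}

// e.g. languageName('he', 'en') === 'Hebrew'
```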

[BUG] `Detect Language` should not allow switching translation direction

Current Behavior

When "Detect Language" is selected and no input language has been detected yet, this is the behavior:

(image)

Expected Behavior

Instead, we need to show a disabled one-way arrow, like:

(image)

If a language has been detected and the user chooses to change direction, the detected language should be selected.

Add Content Pages

It would be nice to include some additional content (hero images, slogans, and features), both for users to see and for web search engines to crawl. Something like https://translate.google.com/about/

  • About page - links to mobile app installs, slogans, and features #34

  • Languages - what features and pairings do we offer for each language (and maybe dialects?)
    • e.g. English, French, German, etc. can pair with multiple signed languages
    • Features like SignWriting dictionary (and its size), video dictionary (STS, and its size), text input, speech input (optional) for some spoken languages, live camera transcription, taking videos for signed languages, offline deep learning models, language detection

  • Contribute - direct people to the SignWriting annotator. Make it into a general sign language collection tool.

  • Tools - direct people to this repository. Make it clear that it is all free and everything can be used. Expose some translation APIs for server-side usage.
