
gecko's Introduction

Gecko - A Tool for Effective Annotation of Human Conversations


Gecko allows efficient and effective segmentation of the voice signal by speaker, as well as annotation of the linguistic content of the conversation. A key feature of Gecko is the presentation of the output of automatic segmentation and transcription systems in an intuitive user interface for editing. Gecko allows annotation of Voice Activity Detection (VAD), diarization, speaker identification and ASR outputs on a large scale, leading to faster creation of more accurately annotated datasets.

We introduced Gecko in this Medium post.
For an overview of the main features, see this video and the corresponding paper.
You can also play with the online working platform.

Features

  • Supports annotation at the different stages of processing a conversation: voice detection, diarization, speaker identification and transcription.
  • Provides an efficient and convenient tool for annotating audio files.
  • Visualizes annotations from several different sources at once.
  • Refines existing annotation files.
  • Compares annotation files to find discrepancies between different systems or annotators.
  • No server side needed - easy installation.
  • Supports different formats such as RTTM, CTM, JSON and CSV.
  • Increases productivity with keyboard shortcuts.


Technological Stack

Gecko is written in JavaScript and is based on AngularJS 1.x. The audio player uses the popular wavesurfer.js library.
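For readers new to wavesurfer.js, the snippet below sketches the general pattern of rendering a waveform and drawing segments as draggable regions. It is a minimal illustration of the library's API with the regions plugin, not Gecko's actual initialization code; the container id, file name and region values are placeholders.

    import WaveSurfer from 'wavesurfer.js';
    import RegionsPlugin from 'wavesurfer.js/dist/plugin/wavesurfer.regions.js';

    // Render the waveform into a container element and enable region dragging.
    const wavesurfer = WaveSurfer.create({
        container: '#waveform',                                    // placeholder element id
        plugins: [RegionsPlugin.create({ dragSelection: true })]
    });

    wavesurfer.load('conversation.wav');                           // placeholder audio file

    // Each annotated segment becomes a draggable, resizable region on the waveform.
    wavesurfer.on('ready', () => {
        wavesurfer.addRegion({ start: 1.2, end: 4.8, color: 'rgba(0, 128, 255, 0.2)' });
    });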

Deployment and Installation

See this page.

Publications

Gecko was presented at Interspeech 2019, the world's leading speech technology conference. See this video for an overview and the accepted paper.

Citation

If you use Gecko, please cite the following:

    @inproceedings{Gecko2019,
      Author = {Golan Levy and Raquel Sitman and Ido Amir and Eduard Golshtein and Ran Mochary and Eilon Reshef and Reichart and Omri Allouche},
      Title = {GECKO - A Tool for Effective Annotation of Human Conversations},
      Booktitle = {20th Annual Conference of the International Speech Communication Association, Interspeech 2019},
      Year = {2019},
      Month = {September},
      Address = {Herzliya, Israel},
      Url = {https://github.com/gong-io/gecko/blob/master/docs/gecko_interspeech_2019_paper.pdf}
    }

Contribution

See this page.

Contact

For help and feedback, please feel free to contact the team at Gong.io.

gecko's People

Contributors

actions-user, amirgalor, dependabot[bot], golanlevy, jimnycricket, judyfong, omriallouche, rotem-1996, sosuke-k, strelok2012, xbraininc


gecko's Issues

Fix: video player

Hi,

I already opened issue #60 which is related but I think it's better to have a clean issue here (I'll close #60).
I noticed several bugs:

  • Clicking on the waveform is ineffective
  • The player turns all white (see the attached screenshot).
  • Sometimes the regions don't display at all, although this happens only on one of my two machines (both run Ubuntu 16.04 and Gecko in Firefox):
    • doesn't happen on my laptop, which has:
      • npm 6.12.0
      • node v12.13.0
    • happens on my work computer, which has:
      • npm 6.11.3
      • node v12.11.1
  • Can't jump between regions in video mode; I fixed it here: https://github.com/PaulLerner/gecko/tree/fix/video (should I open a PR?).
  • Can't play a region in video mode; I wasn't able to solve it with the same trick as in my fix/video branch. The player gets stuck on the same frame and the audio and video go out of sync.

Have a good weekend,

AWS integration

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04.6 LTS
  • Node version: node v10.19.0

Describe the problem

I'm trying to connect Gecko to AWS services, as defined in the docker-compose file.
I filled a .env file with this information:

GECKO_SERVER_PORT=4000
GECKO_SERVER_CONTAINER_PORT=4000
AWS_BUCKET=gecko-annotation
AWS_REGION=eu-west-3
AWS_ACCESS_KEY_ID=<<<access_key_id>>>
AWS_SECRET_ACCESS_KEY=<<<secret_key>>>
#AWS_COGNITO_POOL=<<<pool_id>>>
AWS_FOLDER=audio
GECKO_APP_HOST=localhost

After building with npm run build and running as a server with npm run server, I don't see any interaction with S3 storage. It seems that it only supports client mode.

I tried to set up an AWS Cognito pool, but I didn't succeed in getting it to work since the AWS SDK fails with this error:

/home/myuser/gecko/node_modules/aws-sdk/lib/request.js:31
            throw err;
            ^

CredentialsError: Missing credentials in config
    at Request.ENOTFOUND_ERROR (/home/myuser/gecko/node_modules/aws-sdk/lib/event_listeners.js:495:46)
    at Request.callListeners (/home/myuser/gecko/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/home/myuser/gecko/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/home/myuser/gecko/node_modules/aws-sdk/lib/request.js:683:14)
    at ClientRequest.error (/home/myuser/gecko/node_modules/aws-sdk/lib/event_listeners.js:333:22)
    at ClientRequest.<anonymous> (/home/myuser/gecko/node_modules/aws-sdk/lib/http/node.js:96:19)
    at ClientRequest.emit (events.js:198:13)
    at ClientRequest.EventEmitter.emit (domain.js:448:20)
    at TLSSocket.socketErrorListener (_http_client.js:401:9)
    at TLSSocket.emit (events.js:198:13)

I had a look at the Cognito configuration; it seems to only use the region and pool id from the configuration.

Could you please provide some documentation on the current state of the AWS services integration?

Kind regards

question about the segment dragging

Hi, thanks for this awesome project!

I have a question about segment dragging. It seems that in the online demo a segment can only be dragged as a whole, while in the YouTube video the top and bottom parts of a segment can be dragged to different lengths.

This is the YouTube video version: (screenshot)

This is the online demo version: (screenshot)

Annotation formats

Describe the problem

Hey, this tool seems really nice, but is there a specification for the annotation formats (e.g. JSON) it uses? Which fields are expected and how are they supposed to be arranged?

I'm using a custom vosk script to get a transcription and have no idea how to get that into a compliant format for gecko :/
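Until an official spec exists, here is a rough sketch of the JSON layout as it can be reconstructed from demo.json and from the other issues on this page (a top-level monologues array, a speaker object, and per-word terms). Treat it as a guess rather than an authoritative specification, and check the field names against demo.json in the repository:

    {
      "monologues": [
        {
          "speaker": { "id": "A" },
          "start": 0.5,
          "end": 3.2,
          "terms": [
            { "start": 0.5, "end": 0.9, "text": "Hi," },
            { "start": 1.0, "end": 1.4, "text": "Virgil" }
          ]
        }
      ]
    }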

Feature/doc request: display video and audio waveform at the same level

Hi,
I already discussed this with @GolanLevy:
We noticed that the video player in Gecko isn't very ergonomic: since it sits above the waveform/timeline, on a typical 16:9 screen it's hard to see either the segment labelling or the transcription section.

We thought it would be better to have the video and the audio waveform at the same level, side by side like this (mockup attached).

I did it on my fork, but it broke clicking on the waveform. Can you explain where that happens, or give any hints on how to fix it?

Best,

Doc request: where do you convert the monologues object to wavesurfer regions?

Hi,
I'd like to add some fields to the current json format, e.g. the model's confidence in the identity of a speaker.
After writing a toy file with a speaker object looking like this:

"speaker" : {
      "id" : "Virgil",
      "confidence":0.8
    }

It gets correctly parsed into a monologue by the parse function in app/textFormats/json.js, but I'm not sure where it gets converted into a wavesurfer region?
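Not an answer about the exact spot in Gecko's source, but for context: the wavesurfer.js regions plugin can carry arbitrary fields on a region through its data parameter, so wherever the conversion happens, a confidence value could in principle ride along like this (a sketch of the plugin API only, not Gecko code; monologue stands for the object produced by the json.js parser):

    // Sketch: attach extra per-monologue fields (e.g. confidence) to the region.
    const region = wavesurfer.addRegion({
        start: monologue.start,
        end: monologue.end,
        data: {
            speaker: monologue.speaker.id,
            confidence: monologue.speaker.confidence   // custom field from the toy file
        }
    });

    // Anything with access to the region can later read the extra field back.
    console.log(region.data.confidence);               // 0.8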

Running Gecko locally without internet connection (data privacy)

Hello Gecko team,

I'm interested in using your platform to annotate my audio data, but due to data privacy concerns, I need to run the application without access to the internet. Is this something that is possible? Whenever I try to run the app locally without internet, I have trouble accessing the GUI.

More broadly, I want to make sure that none of the data that I annotate is saved to any external server off of the local machine that runs the program. Some insight into the privacy of annotated data here when the S3 bucket isn't configured would be extremely helpful.

Thanks!

Fix: Proofreading-view

Hi,

Describe the problem

The proofreading view gets buggy when one deletes a region. See the attached screenshot:

Source code / logs

No error messages in the console

How to reproduce

Merry Christmas :)

Single click on word to select for deleting.

I would like a single click on a word to select it, so that I can then delete it with Backspace. Is there a way to trigger word selection with a single click for each editable-words -> span element?
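Not a Gecko-specific answer, but as a sketch of a generic DOM approach (standard Selection/Range APIs; the CSS selector is only illustrative and would need to match Gecko's actual markup), a click handler on each word span could select its contents so that a following Backspace removes the word:

    // Select the clicked word so that pressing Backspace afterwards deletes it.
    document.querySelectorAll('.editable-words span').forEach(span => {
        span.addEventListener('click', () => {
            const range = document.createRange();
            range.selectNodeContents(span);
            const selection = window.getSelection();
            selection.removeAllRanges();
            selection.addRange(range);
        });
    });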

Segment overlapping in the annotation tool

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): -
  • Python version: -

Describe the problem

After trying the Gecko tool, we found a feature missing: annotating overlapping segments. When several sound events occur simultaneously, this feature would be really useful. The current version only allows a segment to start once the previous one has finished.

I suppose this feature has been contemplated, but it has not been possible to develop it yet. Is it on your roadmap?
Thanks in advance

Loading JSON file with segment-level annotation

I am trying to use Gecko to post-edit automatically generated transcripts for some recordings. I noticed that the JSON file format in the demo.json requires word-level alignments, but I only have segment-level transcripts (segments can be one or more words). Does the JSON format also support multi-word segments?
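One workaround that may be worth trying (not confirmed here as officially supported): represent each segment as a single term whose text holds the whole multi-word transcript, so the word-level fields still exist but carry segment-level timing, e.g.:

    {
      "speaker": { "id": "A" },
      "start": 12.3,
      "end": 15.8,
      "terms": [
        { "start": 12.3, "end": 15.8, "text": "good to finally connect after all this talking" }
      ]
    }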

How to clear the editable-words field

Hi,
Can you please tell me how to clear the editable-words field?
I want to clear this:
    <editable-words
        id="editableId"
        region="ctrl.currentRegions[$index]"
        file-index="$index"
        class="editable-words"
        control="ctrl"
    >
    </editable-words>

Waiting for your response, please let me know.

Feature request: shift word end when shifting segment OR shift segment when deleting words

Hi again,

I think it's currently quite tricky to correct ASR or forced-alignment timing errors (see also #68).

I propose two alternative solutions:

  • shift word end when shifting segment : currently, only the beginning of the word (i.e. term.start) is shifted when one shifts a segment (i.e. drags a region starting point from left to right). I assume the behaviour is symmetrical if shifting a region from right to left. I propose that you shift the end of the word (i.e. term.end) to keep the same word duration as before the shifting (idem if you shift the segment from right to left).
  • shift segment when deleting words. I mean setting speaker.start at terms[0].start and speaker.end at terms[-1].end. Of course this could be a post-processing step if only the words were shifted when shifting a segment (my first proposal).

I can take care of it if you agree on one solution and tell me where that happens :)

Best,

Concatenation of consecutive words may not be saved

Hi

System information

Describe the problem

Concatenation of consecutive words may not be saved when clicking on "play", "play region" or somewhere else on the waveform.
You can reproduce this issue by trying to remove the first white-space in the first region of the demo file (between "Hi," and "Virgil").
You may have to try a few times to reproduce this issue, but on my side it happens most of the time, whatever method is used to remove this white-space.

Source code / logs

NA

Access Denied on Save with Private S3 Bucket

Describe the problem

When using a private S3 bucket with read/write restrictions, the save functionality throws an "Access Denied" error when using save-to-server (saving to the S3 bucket). This is due to the ACL permission given in line 28:

    s3.upload({
        Key,
        Body: file,
        Bucket: process.env.AWS_BUCKET,
        ACL: 'public-read'
    }, function(err, data) {
        if (err) {
            failCallback(err.message)
        } else {
            successCallback()
        }
    });

If this line is removed, i.e.

s3.upload({
        Key,
        Body: file,
        Bucket: process.env.AWS_BUCKET //, 
        // ACL: 'public-read'
        }, function(err, data) {
        if (err) {
            failCallback(err.message)
        } else {
            successCallback()
        }
    });

then the upload to a private bucket works.

I'm happy to work on a PR for this, but I wanted to see how you preferred to handle the different cases (public and private S3 buckets). Obviously, a try-catch around the different cases would work, but it's a bit sloppy imo.
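One possible shape for supporting both cases without a try-catch (a sketch only; the AWS_ACL variable name is made up here and would need to be agreed on and documented):

    // Build the upload params, adding an ACL only when one is explicitly configured.
    const params = {
        Key,
        Body: file,
        Bucket: process.env.AWS_BUCKET
    };
    if (process.env.AWS_ACL) {              // e.g. 'public-read'; leave unset for private buckets
        params.ACL = process.env.AWS_ACL;
    }

    s3.upload(params, function (err, data) {
        if (err) {
            failCallback(err.message)
        } else {
            successCallback()
        }
    });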

System information

OSX - Using docker-compose

Duplication of modified transcription

Hi,

First, thanks for this amazing tool!

System information

Describe the problem

A modified part of a transcription can be duplicated when clicking on the "play" or "play region" button, or elsewhere on the waveform.
Two different examples follow, based on the first region of the demo file:

1- On the whole region transcription:

  • Select the whole text with your mouse ("Hi, Virgil good to finally connect after all this talking back and forth a little bit we've been.") and replace it with the words "one two"
  • click on "play" or "play region" button or elsewhere on the waveform
  • result: The transcription of this region is now duplicated, i.e. equal to "one two one two" (screenshot attached)

2- on a subpart of a given transcription

  • Select the words "good to" with your mouse in the same region, without the white-spaces before or after, and replace them with the words "one two" (note: for an unknown reason, when starting this edit the white-spaces before or after "good to" can get included in the modified part of the sentence - the bug described here will only appear if none of the white-spaces before or after is included when you start editing the text - you may have to try a few times to do so)
  • click on "play" or "play region" button or elsewhere on the waveform
  • result: The modified part of this transcription is now duplicated, i.e. equal to "one two one two" (screenshot attached)

Source code / logs

NA

Bug: Check regions error Overlapping in file:

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian
  • Python version: 3

Describe the problem

I use Gecko to create segments and then save them as json, srt, and rttm files. However, when I reload the rttm or json file, I sometimes get overlap errors, despite not modifying the file at all after saving it from Gecko. The errors do not occur with srt files. The only timing difference between json/rttm and srt files is that srt has a precision of 3 decimal places while json/rttm have a precision of 2 decimal places.

For example, with the following two srt segments:

 7
 00:01:35,635 --> 00:01:58,955
 <NA>
  
 8
 00:01:58,955 --> 00:02:00,436
 <NA>

and the corresponding rttm segments:

SPEAKER <NA> <NA> 95.64 23.32 <NA> <NA> 1 <NA> <NA>
SPEAKER <NA> <NA> 118.95 1.49 <NA> <NA> 2 <NA> <NA>

There is no overlap issue with the srt segments, but there is a 0.01 second overlap between the first and second rttm segments.

Source code / logs

The problem is likely a rounding error. I believe the problem is created from changing the precision of the segment timing from 3 decimal places to 2 decimal places via toFixed(2).

Here is a toy solution with the segments given above:

  var start = 95.635;
  var s = start.toFixed(2); //95.64
  var end = 118.955;
  var e = end.toFixed(2); //118.95
  var gecko_diff = (end - start).toFixed(2); // 23.32
  var correct_diff = (end.toFixed(2) - start.toFixed(2)).toFixed(2); //23.31

Fixing the region boundaries to the final rttm decimal precision before subtracting guarantees there will not be an overlap issue when reloading the file into Gecko.
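A minimal sketch of the idea as it would apply to the rttm export (not the actual Gecko export code; the function name is illustrative): round each boundary once at rttm precision and derive the duration from the rounded values, so the written start plus duration always lands exactly on the rounded end of the segment.

    // Round each boundary once at rttm precision, then derive the duration from
    // the rounded values so that start + duration equals the rounded end exactly.
    function toRttmFields(start, end) {
        const s = Number(start.toFixed(2));
        const e = Number(end.toFixed(2));
        return { start: s.toFixed(2), duration: (e - s).toFixed(2) };
    }

    // With the boundaries above, the duration written for segment 7 now matches the
    // rounded start of segment 8, so reloading the file cannot report an overlap.
    toRttmFields(95.635, 118.955);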

How do I make words clickable?

Hi,
First of all, your library is awesome. I appreciate your great effort, keep it up.

I have one requirement: I want to click on a complete sentence and perform some functionality on it.

Also, is it possible to split the text into separate sentences on new lines,
like below?

Hello how are you
Hi I am fine

I am waiting for your response, kindly reply.

Delete audio file segment

Hi,
Thank you for providing a great library. I have one question: suppose I have an audio file that I want to crop, say from 20 seconds to 45 seconds. Is this possible with this library? If we can delete a segment, can we also crop the audio by selecting a segment?
Please let me know, I am waiting for your response.

Bug: srt files with CRLF line endings don't load properly

System information

  • Debian Stretch
  • The srt files are produced in a Windows environment

Describe the problem

I load an srt, rttm, and wav file.
When I load an srt file with CRLF line endings, no segmentation information is shown and the segment labelling section becomes non-interactive.
However, when I change the line endings to LF, all the information loads as usual.

The first image shows what happens if I also upload an rttm file.


This second image shows what happens if I only upload the crlf srt file and the wav file.

I don't think it's related to my operating system but I do believe it is related to the operating system that created the srt files.

Source code / logs

You can use this vim command to convert your srt files to have crlf and recreate the issue:
vim file.txt -c "set ff=dos" -c ":wq"
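As a possible workaround on the parsing side (a sketch only, not a pointer to the actual Gecko srt parser), normalizing line endings before splitting the file would make both variants load:

    // Strip carriage returns so CRLF and LF srt files parse identically.
    const normalized = srtText.replace(/\r\n/g, '\n').replace(/\r/g, '\n');
    const cues = normalized.split('\n\n');   // one block per subtitle cue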

Multiple Records Annotations

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): All
  • Python version: All

Describe the problem

We have separate recordings of each side of a conversation. We want to annotate them all together and, if possible, relate the annotations to the batch or to one stream (this part is not very important).

  1. Annotating each side separately loses a lot of annotation and context, so we want to annotate all of them together.
  2. Merging the audio into one stream is easy, but with video it starts to get messy.
  3. Separate streams (not merging them) can give better context, clearer background noise from one side, or help when they talk to each other.

Source code / logs

It is a feature request so I don't have any logs/source code to show.

Feature request/doc request : Play all segments of a given speaker

Hi,
I'd like the "play" button of a given speaker in the "Segment Labeling" section to play all the segments of that speaker in a given order.
I should be able to implement it myself but I'm having problems diving into the code given the lack of technical documentation.

Best,

Bug: Check regions error

System information

  • Latest version from the master branch, compiled via included Dockerfile, using Chrome as the local client.

Describe the problem

During transcription, after spending some time editing the transcript, some files begin to exhibit the "Check regions" error when attempting to save the file.

To reproduce:

  • Load the attached files (in the .zip archive)
  • Create a new segment around the 0:33 ~ 0:44 time frame
  • Save to .JSON
  • Observe the error popping up
Screen.Recording.2023-04-13.at.14.17.33.mov

repro.zip

Feature Request - support .stm output format

As Kaldi training data requires .stm files, a good addition would be for the tool to output that format (and maybe also accept it as an input format?).

The .stm file has lines of the form
waveform-name channel speakerID start-time end-time [] transcription
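For illustration, a line following that pattern might look like this (values invented; the bracketed label field is optional):

    call_0001 1 spk_A 12.34 15.80 <o,f0,unknown> good to finally connect after all this talking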
