
hark's Introduction

Hark

Hark is a tiny browser/CommonJS module that listens to an audio stream and emits events indicating whether or not the user is speaking.

With browserify:

npm install hark

If you aren't using browserify, download and use hark.bundle.js.

Example:

  var hark = require('hark')

  var getUserMedia = require('getusermedia')

  getUserMedia(function(err, stream) {
    if (err) throw err

    var options = {};
    var speechEvents = hark(stream, options);

    speechEvents.on('speaking', function() {
      console.log('speaking');
    });

    speechEvents.on('stopped_speaking', function() {
      console.log('stopped_speaking');
    });
  });

How does hark work?

Hark uses the Web Audio API to run an FFT over the audio stream and measure its power. If the power is above a threshold, it is treated as speech.
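The detection step can be sketched as a pure function (a simplified sketch, not hark's actual source; in the browser the bins come from `analyser.getFloatFrequencyData()` on a Web Audio `AnalyserNode`):

```javascript
// Simplified sketch of hark's detection step. `fftBins` is an array of
// per-frequency power values in decibels, as produced in the browser by
// analyser.getFloatFrequencyData(fftBins).
function maxVolume(fftBins) {
  var max = -Infinity;
  for (var i = 0; i < fftBins.length; i++) {
    // values >= 0 indicate clipped/invalid data, so skip them
    if (fftBins[i] > max && fftBins[i] < 0) max = fftBins[i];
  }
  return max;
}

function isSpeaking(fftBins, threshold) {
  return maxVolume(fftBins) > threshold;
}

console.log(isSpeaking([-90, -40, -70], -50)); // true: the loudest bin (-40 dB) exceeds -50 dB
```

hark repeats this check on every poll and emits `speaking`/`stopped_speaking` when the result flips.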

Usage

var speech = hark(stream, options);
speech.on('speaking', function() {
  console.log('Speaking!');
});
  • Pass hark either a webrtc stream which has audio enabled, or an audio element, plus an optional options hash (see below for options).
  • hark returns an event emitter with the following events:
    • speaking emitted when the stream appears to contain speech
    • stopped_speaking emitted when the stream no longer appears to contain speech
    • volume_change emitted on every poll with the current volume (in decibels) and the current threshold for speech
  • The hark object also has the following methods to update its config. Both of these options can be passed in on instantiation, but you may wish to alter them for debugging or fine tuning while your app runs.
    • setInterval(interval_in_ms) change how frequently the audio is polled
    • setThreshold(threshold_in_db) change the minimum volume at which the audio will emit a speaking event
  • hark can be stopped by calling this method:
    • stop() stops the polling; no further events will be emitted.
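The polling that these methods control can be sketched as a simple setTimeout loop (a minimal sketch, not hark's actual source; `checkFn` stands in for the volume check):

```javascript
// Minimal sketch of a stoppable poller like the one behind
// setInterval()/stop(). `checkFn` stands in for hark's volume check.
function makePoller(checkFn, intervalMs) {
  var running = true;
  (function loop() {
    if (!running) return;
    checkFn();                    // run one volume check
    setTimeout(loop, intervalMs); // schedule the next poll
  })();
  return {
    stop: function () { running = false; } // no further polls or events
  };
}
```

After stop() the pending timeout still fires once, but the loop exits without polling, so nothing further is emitted.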

Options

  • interval (optional, default 100ms) how frequently the analyser polls the audio stream to check if speaking has started or stopped. This will also be the frequency of the volume_change events.
  • threshold (optional, default -50 dB) the volume at which speaking/stopped_speaking events will be fired
  • play (optional, default true for audio tags, false for webrtc streams) whether the audio stream should also be piped to the speakers, or just swallowed by the analyser. Typically for audio tags you would want to hear them, but for microphone-based webrtc streams you may not, to avoid feedback.
  • audioContext (optional, default is to create a single context) If you have already created an AudioContext, you can pass it to hark to use it instead of an internally generated one.
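How the documented defaults combine with a user-supplied options hash can be sketched like this (a sketch based on the list above, not hark's actual source; `isAudioTag` is our stand-in for "the input was an audio element", which drives the play default):

```javascript
// Sketch of resolving the documented option defaults. `isAudioTag`
// reflects whether the input was an audio element (play defaults to
// true) rather than a webrtc stream (play defaults to false).
function resolveOptions(options, isAudioTag) {
  options = options || {};
  return {
    interval: options.interval || 100,   // ms between polls
    threshold: options.threshold || -50, // dB above which speech is assumed
    play: typeof options.play !== 'undefined' ? options.play : !!isAudioTag,
    audioContext: options.audioContext || null // null => create/reuse one internally
  };
}
```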

Understanding dB/volume threshold

Fine tuning the volume threshold is the main way to configure how this module behaves. The default of -50 dB was chosen based on some basic experimentation with my setup, but you may wish to change it (and should, if it improves your app).

What is dB? Decibels are a logarithmic measure of sound level. The loudest sounds on your system will be at 0 dB, and silence in Web Audio is -100 dB. Speech generally sits above -50 dB, depending on the volume and type of source. If speaking events are being fired too frequently, raise the threshold (i.e. toward 0). If they are not firing frequently enough (you are speaking loudly but no events are firing), move it closer to -100 dB.
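One way to pick a threshold programmatically (a hypothetical helper, not part of hark's API): sample volume_change values while the user is silent, then set the threshold a fixed margin above the observed noise floor:

```javascript
// Hypothetical calibration helper (not part of hark's API): given dB
// samples collected from volume_change events during silence, return a
// threshold `marginDb` above the noise floor, capped at 0 dB.
function thresholdFromAmbient(samples, marginDb) {
  var noiseFloor = Math.max.apply(null, samples);
  return Math.min(0, noiseFloor + marginDb);
}

console.log(thresholdFromAmbient([-72, -68, -75], 10)); // -58
```

The result could then be applied at runtime with speech.setThreshold().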

Demo:

Clone and open example/index.html or view it online

Requirements:

  • Chrome 27+, remote streams require Chrome 49+
  • Firefox
  • Microsoft Edge, support for remote streams is under consideration

License

MIT

hark's People

Contributors

devvmh, fippo, henrikjoreteg, ibc, latentflip, mikeg0, ordinaryworld, snorp, thehunmonkgroup, vivaldi-va, xdumaine


hark's Issues

Chrome 71 will require user gesture for AudioContext to work

We've been using AudioContext object to analyze media stream. Chrome plans to change audio policies December 2018 and they'll require user gesture for AudioContext to work:

https://developers.google.com/web/updates/2017/09/autoplay-policy-changes#webaudio

Key Point: If an AudioContext is created prior to the document receiving a user gesture, it will be created in the "suspended" state, and you will need to call resume() after a user gesture is received.

This means that even if the user has granted mic access for a given website, audio analysis won't work if, for example, the page starts capturing the mic before any user gesture.

Do you have an alternative way to detect current mic level indication?
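A common workaround is to create your own AudioContext, pass it to hark via the audioContext option, and resume it on the first user gesture. A minimal sketch (the helper name is ours):

```javascript
// Hypothetical helper: resume an AudioContext that the autoplay policy
// created in the "suspended" state. Returns true if a resume was issued.
function resumeIfSuspended(ctx) {
  if (ctx.state === 'suspended' && typeof ctx.resume === 'function') {
    ctx.resume();
    return true;
  }
  return false;
}

// In the browser, wire it to the first user gesture:
// document.addEventListener('click', function once() {
//   resumeIfSuspended(audioContext);
//   document.removeEventListener('click', once);
// });
```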

Add bower support

Hi!

For projects that don't use browserify, it is very inconvenient to have to download the bundle and check it into git rather than treat it as an external dependency like other libraries. Have you considered adding bower support? Would you be interested in a pull request with a bower.json to support bower install?

Cheers!

In ham radio, this is called "squelch"

A squelch circuit turns off the sound when no one is talking. You'll hear it in police TV shows, when the cops talk to each other over walkie-talkies.

recordrtc.js + hark.js = corrupted stream audio records (robotic voice + echo + noise) in safari

Well, using

var options = {};
var speechEvents = hark(localStream, options);
speechEvents.on('speaking', function() {
  stream_part();
});
speechEvents.on('stopped_speaking', function() {
  stream_recorder.stopRecording(function() {
    io_chunk(stream_recorder.getBlob());
  });
});
/* */
function stream_part() {
  stream_recorder = new RecordRTC(localStream, {
    type: 'audio', mimeType: 'audio/wav',
    recorderType: StereoAudioRecorder, numberOfAudioChannels: 1,
    desiredSampRate: 16000
  });
  stream_recorder.startRecording();
}
function io_chunk(part) {
  socket.emit('stream', part);
}

All is OK in Chrome, but in Safari the recordings get corrupted somehow. How can I prevent this? Any ideas? Thanks!

Strange values in Safari when mic volume is 0

  • Safari Technology Preview in OSX High Sierra.
  • Mute the mic volume.
  • Check the dBs value in "volume_change" event:

It decreases to around -500 and, after that, becomes a constant -100. It never reaches -Infinity.
Chrome and Firefox, on the other hand, behave differently: the value decreases gradually to -Infinity.

Not sure whether hark could do something to "normalize" this. Currently it's very hard to detect a "muted microphone".
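A hypothetical client-side normalization (not something hark does today) would clamp the reported values into Web Audio's nominal [-100, 0] range, so that a muted mic reads consistently across browsers:

```javascript
// Hypothetical normalization for volume_change values: clamp into
// [-100, 0], mapping Safari's ~-500 readings and Chrome/Firefox's
// -Infinity both to -100 (silence).
function clampDb(volume) {
  if (!isFinite(volume) || volume < -100) return -100;
  if (volume > 0) return 0;
  return volume;
}
```

With this in place, "muted microphone" could be detected as the clamped value sitting at -100 for some duration.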

Webpage stop working

I built an app with hark that worked great, but suddenly the webpage stops working and I have to reopen it (in less than a minute).
I don't know what is happening. Is this normal in recent versions of Chrome?

Publish 1.2.0 to npm

Is there a reason why the latest npm version is on 1.1.6 while the latest release here is 1.2.0?

Can you publish an update to npm?

Support for using video element

Hey, I noticed that you only had support for audio elements as inputs, would it be possible to also add support for <video> tags? createMediaElementSource supposedly also works with <video> tags

Still maintained?

I've noticed a relative lack of activity on issues and PRs; just checking whether otalk will be active to help shepherd PRs if I start contributing.

Random error in Safari: TypeError: null is not an object (evaluating 'audioContext.createAnalyser')

Sometimes (I'm not sure why, and it's hard to reproduce, but it happens often) I get this error in Safari on OSX:

TypeError: null is not an object (evaluating 'audioContext.createAnalyser')

Obviously it happens here, although I cannot understand why that happens. Looking at the code it's clear that audioContext is never null...

My usage is very simple, I just do this:

this._hark = hark(stream, { play: false });

where stream is always a proper MediaStream instance which always has an audio track.

Add a getVolume() API to support on-demand volume level

A good addition to the library would be an API like getVolume(), which would return the current volume level. Currently, the library polls for both the speaking events and the volume. Polling for speaking events makes sense, but for volume, a user might want to run the volume meter poll on the client side and just query the volume when needed.
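Until such an API exists, a small wrapper can cache the last volume_change value and expose it on demand (a hypothetical wrapper over hark's existing events, not part of its API):

```javascript
// Hypothetical wrapper (not part of hark): record the most recent
// volume_change value so callers can query it synchronously.
function withGetVolume(speechEvents) {
  var current = -Infinity; // nothing observed yet
  speechEvents.on('volume_change', function (volume) {
    current = volume;
  });
  speechEvents.getVolume = function () { return current; };
  return speechEvents;
}
```

Usage would be `var speech = withGetVolume(hark(stream)); speech.getVolume();`.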

hark.bundle.js is broken

The bundle wrapper is doing something wrong; hark.bundle.js is broken. Take a look at the source and you'll immediately see that getMaxVolume() is never called anywhere else in the source. I went back to the unbundled version and modernized the variable scoping while I was at it, and it works now. Note: my version is formatted as a module, and I import wildemitter.js from a sibling events directory, also as a module.

Here is my "unbundled" version. Tested and it works fine:

// Import dependencies (ensure WildEmitter is available as an ES6 module)

import WildEmitter from '../events/wildemitter.mjs';

export function getMaxVolume(analyser, fftBins) {
  let maxVolume = -Infinity;
  analyser.getFloatFrequencyData(fftBins);

  for (let i = 4, ii = fftBins.length; i < ii; i++) {
    if (fftBins[i] > maxVolume && fftBins[i] < 0) {
      maxVolume = fftBins[i];
    }
  }
  return maxVolume;
}

let audioContextType;
if (typeof window !== 'undefined') {
  audioContextType = window.AudioContext || window.webkitAudioContext;
}

let audioContext = null;

export function hark(stream, options) {
  let harker = new WildEmitter(); // Ensure WildEmitter is correctly imported or implemented

  if (!audioContextType) return harker;

  options = options || {};
  let smoothing = options.smoothing || 0.1,
    interval = options.interval || 50,
    threshold = options.threshold,
    play = options.play,
    history = options.history || 10;

  audioContext = options.audioContext || audioContext || new audioContextType();

  let analyser = audioContext.createAnalyser();
  analyser.fftSize = 512;
  analyser.smoothingTimeConstant = smoothing;
  let fftBins = new Float32Array(analyser.frequencyBinCount);

  let sourceNode;
  if (stream.jquery) stream = stream[0];
  if (stream instanceof HTMLAudioElement || stream instanceof HTMLVideoElement) {
    sourceNode = audioContext.createMediaElementSource(stream);
    if (typeof play === 'undefined') play = true;
    threshold = threshold || -50;
  } else {
    sourceNode = audioContext.createMediaStreamSource(stream);
    threshold = threshold || -50;
  }

  sourceNode.connect(analyser);
  if (play) analyser.connect(audioContext.destination);

  // Implement harker logic...

  return harker;
}

Support for updating stream and teardown of previous resources

Currently there is no way to start() hark monitoring explicitly: when the exposed function hark(stream, options) is called, it automatically starts the poll.
A better way would be to:

  1. support an api to start the poll: hark.start()
  2. on hark.stop(), clear the setTimeout, disconnect the analyser and free associated resources
  3. Also, a way to update the media stream at runtime: hark.setStream() - this would first do hark.stop() to teardown the current existing pipeline, and then re-create it using the new stream.
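The proposed setStream() behavior can already be sketched as a wrapper today (a hypothetical helper; `harkFactory` is the hark function itself, and only the existing stop() is relied upon):

```javascript
// Hypothetical helper sketching the proposed setStream(): stop the old
// harker, then build a fresh one against the new stream with the same
// options.
function makeSwappableHark(harkFactory, stream, options) {
  var current = harkFactory(stream, options);
  return {
    get: function () { return current; },
    setStream: function (newStream) {
      current.stop(); // tear down the existing polling pipeline
      current = harkFactory(newStream, options);
      return current;
    }
  };
}
```

Note that this does not free the Web Audio nodes hark created, which is exactly what item 2 above asks stop() to do.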

add note regarding remote mediastreams

It might be useful to add a clear note to the readme that this lib currently only works with local MediaStreams, not remote media streams received through a WebRTC peer connection.

is it possible to use this package with react native?

Hello,

I am working on a media app built with React Native, and based on my research I came across hark as a way to listen to speech and ideally render it as audio waves, like the waves on WhatsApp audio messages.

Is it possible to use this library with React Native to convert speech to audio waves?

best wishes,
