
hark's Introduction

Hark

Hark is a tiny browser/CommonJS module that listens to an audio stream and emits events indicating whether or not the user is speaking.

With browserify:

npm install hark

If you aren't using browserify, download and use hark.bundle.js.

Example:

  var hark = require('hark')

  var getUserMedia = require('getusermedia')

  getUserMedia(function(err, stream) {
    if (err) throw err

    var options = {};
    var speechEvents = hark(stream, options);

    speechEvents.on('speaking', function() {
      console.log('speaking');
    });

    speechEvents.on('stopped_speaking', function() {
      console.log('stopped_speaking');
    });
  });

How does hark work?

Hark uses the Web Audio API to run an FFT over the audio stream and measure its power. If the power is above a threshold, it is treated as speech.
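The detection step can be sketched as a pure function (a simplified sketch, not hark's actual source; in the browser the bins come from `analyser.getFloatFrequencyData()` on a Web Audio `AnalyserNode`):

```javascript
// Simplified sketch of hark's detection step. `fftBins` is an array of
// per-frequency power values in decibels, as produced in the browser by
// analyser.getFloatFrequencyData(fftBins).
function maxVolume(fftBins) {
  var max = -Infinity;
  for (var i = 0; i < fftBins.length; i++) {
    // values >= 0 indicate clipped/invalid data, so skip them
    if (fftBins[i] > max && fftBins[i] < 0) max = fftBins[i];
  }
  return max;
}

function isSpeaking(fftBins, threshold) {
  return maxVolume(fftBins) > threshold;
}

console.log(isSpeaking([-90, -40, -70], -50)); // true: the loudest bin (-40 dB) exceeds -50 dB
```

hark repeats this check on every poll and emits `speaking`/`stopped_speaking` when the result flips.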

Usage

var speech = hark(stream, options);
speech.on('speaking', function() {
  console.log('Speaking!');
});
  • Pass hark either a webrtc stream which has audio enabled, or an audio element, plus an optional options hash (see below for options).
  • hark returns an event emitter with the following events:
    • speaking emitted when the stream appears to contain speech
    • stopped_speaking emitted when the stream no longer appears to contain speech
    • volume_change emitted on every poll with the current volume (in decibels) and the current threshold for speech
  • The hark object also has the following methods to update its config. Both of these options can be passed in on instantiation, but you may wish to alter them for debugging or fine tuning while your app runs.
    • setInterval(interval_in_ms) change how frequently the audio is polled
    • setThreshold(threshold_in_db) change the minimum volume at which the audio will emit a speaking event
  • hark can be stopped by calling this method:
    • stop() stops the polling; no further events will be emitted.
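The polling that these methods control can be sketched as a simple setTimeout loop (a minimal sketch, not hark's actual source; `checkFn` stands in for the volume check):

```javascript
// Minimal sketch of a stoppable poller like the one behind
// setInterval()/stop(). `checkFn` stands in for hark's volume check.
function makePoller(checkFn, intervalMs) {
  var running = true;
  (function loop() {
    if (!running) return;
    checkFn();                    // run one volume check
    setTimeout(loop, intervalMs); // schedule the next poll
  })();
  return {
    stop: function () { running = false; } // no further polls or events
  };
}
```

After stop() the pending timeout still fires once, but the loop exits without polling, so nothing further is emitted.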

Options

  • interval (optional, default 100ms) how frequently the analyser polls the audio stream to check if speaking has started or stopped. This will also be the frequency of the volume_change events.
  • threshold (optional, default -50 dB) the volume at which speaking/stopped_speaking events will be fired
  • play (optional, default true for audio tags, false for webrtc streams) whether the audio stream should also be piped to the speakers, or just swallowed by the analyser. Typically for audio tags you would want to hear them, but for microphone-based webrtc streams you may not, to avoid feedback.
  • audioContext (optional, default is to create a single context) If you have already created an AudioContext, you can pass it to hark to use it instead of an internally generated one.
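How the documented defaults combine with a user-supplied options hash can be sketched like this (a sketch based on the list above, not hark's actual source; `isAudioTag` is our stand-in for "the input was an audio element", which drives the play default):

```javascript
// Sketch of resolving the documented option defaults. `isAudioTag`
// reflects whether the input was an audio element (play defaults to
// true) rather than a webrtc stream (play defaults to false).
function resolveOptions(options, isAudioTag) {
  options = options || {};
  return {
    interval: options.interval || 100,   // ms between polls
    threshold: options.threshold || -50, // dB above which speech is assumed
    play: typeof options.play !== 'undefined' ? options.play : !!isAudioTag,
    audioContext: options.audioContext || null // null => create/reuse one internally
  };
}
```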

Understanding dB/volume threshold

Fine tuning the volume threshold is the main way to configure how this module behaves. The default of -50 dB was chosen based on some basic experimentation with my setup, but you may wish to change it (and should, if it improves your app).

What is dB? Decibels are a logarithmic measure of sound level. The loudest sounds on your system will be at 0 dB, and silence in Web Audio is -100 dB. Speech generally sits above -50 dB, depending on the volume and type of source. If speaking events are being fired too frequently, raise the threshold (i.e. toward 0). If they are not firing frequently enough (you are speaking loudly but no events are firing), move it closer to -100 dB.
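One way to pick a threshold programmatically (a hypothetical helper, not part of hark's API): sample volume_change values while the user is silent, then set the threshold a fixed margin above the observed noise floor:

```javascript
// Hypothetical calibration helper (not part of hark's API): given dB
// samples collected from volume_change events during silence, return a
// threshold `marginDb` above the noise floor, capped at 0 dB.
function thresholdFromAmbient(samples, marginDb) {
  var noiseFloor = Math.max.apply(null, samples);
  return Math.min(0, noiseFloor + marginDb);
}

console.log(thresholdFromAmbient([-72, -68, -75], 10)); // -58
```

The result could then be applied at runtime with speech.setThreshold().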

Demo:

Clone and open example/index.html or view it online

Requirements:

  • Chrome 27+, remote streams require Chrome 49+
  • Firefox
  • Microsoft Edge, support for remote streams is under consideration

License

MIT

hark's People

Contributors

devvmh, fippo, henrikjoreteg, ibc, latentflip, mikeg0, ordinaryworld, snorp, thehunmonkgroup, vivaldi-va, xdumaine


hark's Issues

Chrome 71 will require user gesture for AudioContext to work

We've been using AudioContext object to analyze media stream. Chrome plans to change audio policies December 2018 and they'll require user gesture for AudioContext to work:

https://developers.google.com/web/updates/2017/09/autoplay-policy-changes#webaudio

Key Point: If an AudioContext is created prior to the document receiving a user gesture, it will be created in the "suspended" state, and you will need to call resume() after a user gesture is received.

This means that even if the user has granted mic access for a given website, audio analysis won't work if, for example, the page starts capturing the mic before any user gesture.

Do you have an alternative way to detect current mic level indication?
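A common workaround is to create your own AudioContext, pass it to hark via the audioContext option, and resume it on the first user gesture. A minimal sketch (the helper name is ours):

```javascript
// Hypothetical helper: resume an AudioContext that the autoplay policy
// created in the "suspended" state. Returns true if a resume was issued.
function resumeIfSuspended(ctx) {
  if (ctx.state === 'suspended' && typeof ctx.resume === 'function') {
    ctx.resume();
    return true;
  }
  return false;
}

// In the browser, wire it to the first user gesture:
// document.addEventListener('click', function once() {
//   resumeIfSuspended(audioContext);
//   document.removeEventListener('click', once);
// });
```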

Add bower support

Hi!

For projects that don't use browserify, it is very inconvenient to have to download the bundle and check it into git rather than treat it as an external dependency like other libraries. Have you considered adding bower support? Would you be interested in a pull request with a bower.json to support bower install?

Cheers!

In ham radio, this is called "squelch"

A squelch circuit turns off the sound when no one is talking. You'll hear it in police TV shows, when the cops talk to each other over walkie-talkies.

recordrtc.js + hark.js = corrupted stream audio records (robotic voice + echo + noise) in safari

Well, using

var options = {};
var speechEvents = hark(localStream, options);
speechEvents.on('speaking', function() {
  stream_part();
});
speechEvents.on('stopped_speaking', function() {
  stream_recorder.stopRecording(function() {
    io_chunk(stream_recorder.getBlob());
  });
});
/* */
function stream_part() {
  stream_recorder = new RecordRTC(localStream, {
    type: 'audio', mimeType: 'audio/wav',
    recorderType: StereoAudioRecorder, numberOfAudioChannels: 1,
    desiredSampRate: 16000
  });
  stream_recorder.startRecording();
}
function io_chunk(part) {
  socket.emit('stream', part);
}

All is OK in Chrome, but in Safari the recordings get corrupted somehow. How can I prevent this? Any ideas? Thanks!

Strange values in Safari when mic volume is 0

  • Safari Technology Preview in OSX High Sierra.
  • Mute the mic volume.
  • Check the dBs value in "volume_change" event:

It decreases to around -500 and, after that, becomes a constant -100. It never reaches -Infinity.
Chrome and Firefox, on the other hand, behave differently: the value decreases gradually to -Infinity.

Not sure whether hark could do something to "normalize" this. Currently it's very hard to detect a "muted microphone".
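A hypothetical client-side normalization (not something hark does today) would clamp the reported values into Web Audio's nominal [-100, 0] range, so that a muted mic reads consistently across browsers:

```javascript
// Hypothetical normalization for volume_change values: clamp into
// [-100, 0], mapping Safari's ~-500 readings and Chrome/Firefox's
// -Infinity both to -100 (silence).
function clampDb(volume) {
  if (!isFinite(volume) || volume < -100) return -100;
  if (volume > 0) return 0;
  return volume;
}
```

With this in place, "muted microphone" could be detected as the clamped value sitting at -100 for some duration.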

Webpage stop working

I built an app with hark that worked great, but suddenly the webpage stops working and I have to reopen it (in less than a minute).
I don't know what is happening. Is this normal in recent versions of Chrome?

Publish 1.2.0 to npm

Is there a reason why the latest npm version is on 1.1.6 while the latest release here is 1.2.0?

Can you publish an update to npm?

Support for using video element

Hey, I noticed that you only had support for audio elements as inputs, would it be possible to also add support for <video> tags? createMediaElementSource supposedly also works with <video> tags

Still maintained?

I've noticed a relative lack of activity on issues and PRs; just checking whether otalk will be active to help shepherd PRs if I start contributing.

Random error in Safari: TypeError: null is not an object (evaluating 'audioContext.createAnalyser')

Sometimes (I'm not sure why, and it's hard to reproduce, but it happens often) I get this error in Safari on OSX:

TypeError: null is not an object (evaluating 'audioContext.createAnalyser')

Obviously it happens here, although I cannot understand why that happens. Looking at the code it's clear that audioContext is never null...

My usage is very simple, I just do this:

this._hark = hark(stream, { play: false });

where stream is always a proper MediaStream instance which always has an audio track.

Add a getVolume() API to support on-demand volume level

A good addition to the library would be an API like getVolume(), which would return the current volume level. Currently, the library polls for both the speaking events and the volume. Polling for speaking events makes sense, but for volume, a user might want to run the volume meter poll on the client side and just query the volume when needed.
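Until such an API exists, a small wrapper can cache the last volume_change value and expose it on demand (a hypothetical wrapper over hark's existing events, not part of its API):

```javascript
// Hypothetical wrapper (not part of hark): record the most recent
// volume_change value so callers can query it synchronously.
function withGetVolume(speechEvents) {
  var current = -Infinity; // nothing observed yet
  speechEvents.on('volume_change', function (volume) {
    current = volume;
  });
  speechEvents.getVolume = function () { return current; };
  return speechEvents;
}
```

Usage would be `var speech = withGetVolume(hark(stream)); speech.getVolume();`.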

hark.bundle.js is broken

The bundle wrapper is doing something wrong; hark.bundle.js is broken. Take a look at the source and you'll immediately see that getMaxVolume() is never called anywhere else in the source. I went back to the unbundled version and modernized the variable scoping while I was at it, and it works now. Note: my version is formatted as a module, and I import wildemitter.js from a sibling events directory, also as a module.

Here is my "unbundled" version. Tested and it works fine:

// Import dependencies (ensure WildEmitter is available as an ES6 module)

import WildEmitter from '../events/wildemitter.mjs';

export function getMaxVolume(analyser, fftBins) {
  let maxVolume = -Infinity;
  analyser.getFloatFrequencyData(fftBins);

  for (let i = 4, ii = fftBins.length; i < ii; i++) {
    if (fftBins[i] > maxVolume && fftBins[i] < 0) {
      maxVolume = fftBins[i];
    }
  }
  return maxVolume;
}

let audioContextType;
if (typeof window !== 'undefined') {
  audioContextType = window.AudioContext || window.webkitAudioContext;
}

let audioContext = null;

export function hark(stream, options) {
  let harker = new WildEmitter(); // Ensure WildEmitter is correctly imported or implemented

  if (!audioContextType) return harker;

  options = options || {};
  let smoothing = options.smoothing || 0.1,
    interval = options.interval || 50,
    threshold = options.threshold,
    play = options.play,
    history = options.history || 10;

  audioContext = options.audioContext || audioContext || new audioContextType();

  let analyser = audioContext.createAnalyser();
  analyser.fftSize = 512;
  analyser.smoothingTimeConstant = smoothing;
  let fftBins = new Float32Array(analyser.frequencyBinCount);

  let sourceNode;
  if (stream.jquery) stream = stream[0];
  if (stream instanceof HTMLAudioElement || stream instanceof HTMLVideoElement) {
    sourceNode = audioContext.createMediaElementSource(stream);
    if (typeof play === 'undefined') play = true;
    threshold = threshold || -50;
  } else {
    sourceNode = audioContext.createMediaStreamSource(stream);
    threshold = threshold || -50;
  }

  sourceNode.connect(analyser);
  if (play) analyser.connect(audioContext.destination);

  // Implement harker logic...

  return harker;
}

Support for updating stream and teardown of previous resources

Currently there is no way to start() hark monitoring explicitly: when the exposed function hark(stream, options) is called, it automatically starts the poll.
A better way would be to:

  1. support an api to start the poll: hark.start()
  2. on hark.stop(), clear the setTimeout, disconnect the analyser and free associated resources
  3. Also, a way to update the media stream at runtime: hark.setStream() - this would first do hark.stop() to teardown the current existing pipeline, and then re-create it using the new stream.
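The proposed setStream() behavior can already be sketched as a wrapper today (a hypothetical helper; `harkFactory` is the hark function itself, and only the existing stop() is relied upon):

```javascript
// Hypothetical helper sketching the proposed setStream(): stop the old
// harker, then build a fresh one against the new stream with the same
// options.
function makeSwappableHark(harkFactory, stream, options) {
  var current = harkFactory(stream, options);
  return {
    get: function () { return current; },
    setStream: function (newStream) {
      current.stop(); // tear down the existing polling pipeline
      current = harkFactory(newStream, options);
      return current;
    }
  };
}
```

Note that this does not free the Web Audio nodes hark created, which is exactly what item 2 above asks stop() to do.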

add note regarding remote mediastreams

It might be useful to add a clear note to the readme that this lib currently only works with local MediaStreams, not remote media streams received through a WebRTC peer connection.

is it possible to use this package with react native?

Hello,

I am working on a media app built with React Native, and based on my research I came across hark as a way to listen to speech and ideally render it as audio waves, like the waves on WhatsApp audio messages.

Is it possible to use this library with React Native to convert speech to audio waves?

best wishes,
