LectureNotes

A Chrome extension that transcribes and summarizes audio from a Chrome tab into text, with support for multiple languages.
Get it on Chrome Web Store

This application was developed after my friends complained about how difficult it is to understand a lecture and take notes at the same time. It is primarily intended for taking notes during a live session or lecture.

System-Design

Screenshots


Working

  • Upon initialization
    • The client registers itself with the server - it uses short polling in case of a server error
    • The audio configuration is sent to the server
    • The server creates a session corresponding to the client
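
The registration step above could be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the helper name `registerClient`, the `fetchImpl`, `delayMs` and `maxAttempts` options, and the JSON body shape `{ config }` are all assumptions.

```javascript
// Hypothetical helper: POST the audio config to /register and poll again on failure.
// The request body shape and all option names are assumptions for illustration.
async function registerClient(baseUrl, config, options = {}) {
  const { fetchImpl = fetch, delayMs = 2000, maxAttempts = 5 } = options;
  let lastError;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetchImpl(`${baseUrl}/register`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ config }),
      });
      if (response.ok) {
        return await response.text(); // unique ID, returned as text/plain
      }
      lastError = new Error(`Server responded with status ${response.status}`);
    } catch (error) {
      lastError = error; // network failure: fall through and try again
    }
    if (attempt < maxAttempts) {
      // Short polling: wait a fixed delay before the next attempt
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

The returned ID would then be kept by the client and passed to the other endpoints.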

  • Upon start recording
    • A WebSocket connection is opened with the server
    • The audio stream of the tab is captured by the background script before it reaches the speaker.
    • An Audio object plays the audio from the background script
    • Every 14 seconds, a base64 string of the audio is sent to the server for transcription via Google Speech-to-Text.
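
One way to picture the periodic send is a small buffer that collects audio bytes and is flushed as a base64 string on a timer. The sketch below treats the audio as opaque bytes; the class `AudioChunkBuffer` and the helper `bytesToBase64` are hypothetical names, and how the extension actually produces its WAV data is not described here.

```javascript
// Hypothetical helper: encode bytes as base64 in slices to avoid huge argument lists.
function bytesToBase64(bytes) {
  let binary = "";
  const step = 0x8000;
  for (let i = 0; i < bytes.length; i += step) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + step));
  }
  return btoa(binary);
}

// Hypothetical buffer: collects audio bytes between two sends.
class AudioChunkBuffer {
  constructor() {
    this.parts = [];
    this.length = 0;
  }

  // Append a Uint8Array of captured audio
  push(bytes) {
    this.parts.push(bytes);
    this.length += bytes.length;
  }

  // Return everything buffered so far as one base64 string and reset the buffer
  flush() {
    const merged = new Uint8Array(this.length);
    let offset = 0;
    for (const part of this.parts) {
      merged.set(part, offset);
      offset += part.length;
    }
    this.parts = [];
    this.length = 0;
    return bytesToBase64(merged);
  }
}
```

Under these assumptions, a timer such as `setInterval(() => socket.send(buffer.flush()), 14000)` would cover the 14-second send, and one final `flush()` would cover the last buffered audio when recording stops.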

  • Upon stop recording
    • The last buffered audio is sent to the server as a base64 string
    • The WebSocket connection is terminated
    • All audio streams are disconnected
    • Other objects are destroyed

  • Upon get notes
    • The server performs text summarization using DeepAI, sends back the response, and resets the transcription
    • The client converts the response into a blob and downloads it.
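
The client-side download step could look something like the sketch below. The function names and the file name passed in are hypothetical; `downloadBlob` relies on `document` and therefore only runs in a browser page.

```javascript
// Hypothetical helper: wrap the server's text response in a plain-text Blob.
function buildNotesBlob(notesText) {
  return new Blob([notesText], { type: "text/plain" });
}

// Hypothetical helper (browser only): trigger a download through a temporary link.
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  link.click();
  URL.revokeObjectURL(url); // release the object URL once the download has started
}
```

For example, `downloadBlob(buildNotesBlob(responseText), "notes.txt")` would save the summary under an assumed file name.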

Optimizations

  • Improved performance by ~70 ms by implementing a dual communication channel of HTTP requests and WebSockets.
    • Initially, a single WebSocket connection was used for server-client communication, and the different event data were transmitted as JSON.
    • The audio is in uncompressed .wav format; a 14-second base64 audio string is ~1.5 MB in size.
    • To avoid JSON processing of a 1.5 MB string, WebSockets now transmit the raw base64 audio string, whereas HTTP requests are used for other types of events.
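
To put the size figure in context, here is a rough estimate of how large a base64-encoded WAV chunk gets. The recording parameters are not stated in this README, so the example assumes 44.1 kHz, mono, 16-bit PCM with a 44-byte header; `wavBase64Size` is a hypothetical helper.

```javascript
// Hypothetical helper: estimate raw and base64 sizes of an uncompressed PCM WAV chunk.
function wavBase64Size({ seconds, sampleRate, channels, bytesPerSample }) {
  const WAV_HEADER_BYTES = 44; // canonical PCM WAV header size
  const rawBytes =
    WAV_HEADER_BYTES + seconds * sampleRate * channels * bytesPerSample;
  // Base64 emits 4 characters for every 3 input bytes (with padding)
  const base64Chars = 4 * Math.ceil(rawBytes / 3);
  return { rawBytes, base64Chars };
}

// Assumed parameters: 14 s, 44.1 kHz, mono, 16-bit samples
const estimate = wavBase64Size({
  seconds: 14,
  sampleRate: 44100,
  channels: 1,
  bytesPerSample: 2,
});
console.log(estimate); // { rawBytes: 1234844, base64Chars: 1646460 }
```

Under these assumptions a chunk comes to roughly 1.6 million characters, the same order of magnitude as the ~1.5 MB quoted above, which is why keeping it out of a JSON envelope is worthwhile.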

Future Planning

  • Use Redis for storing session data
  • Implement a heartbeat mechanism for websockets to identify and terminate broken connections
  • Capture a tab close event to stop recording.
  • Make CORS opaque.
  • Use a compressed audio format like .mp3.
  • Add unit tests.
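
The heartbeat item in the list above is not implemented yet. A common ping/pong pattern could be sketched as follows, assuming the server-side connection objects expose `ping()` and `terminate()` (as the `ws` package does); `markAlive`, `sweepConnections` and the `isAlive` flag are hypothetical names.

```javascript
// Hypothetical handler: call this whenever a pong arrives from a client.
function markAlive(connection) {
  connection.isAlive = true;
}

// Hypothetical sweep: terminate connections that did not answer the previous ping,
// then ping the remaining ones. Returns how many connections were terminated.
function sweepConnections(connections) {
  let terminated = 0;
  for (const connection of connections) {
    if (connection.isAlive === false) {
      connection.terminate(); // no pong since the last sweep: treat as broken
      terminated++;
      continue;
    }
    connection.isAlive = false; // set back to true by markAlive on pong
    connection.ping();
  }
  return terminated;
}
```

With the `ws` package this would typically be wired up by setting `isAlive = true` on connect, registering `ws.on("pong", () => markAlive(ws))`, and running `sweepConnections(wss.clients)` from a `setInterval`.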

API Reference

Register a client

  POST /register
  Return: <text/plain> unique ID, used for further communication. 
Parameter | Type | Description
config    | json | Configuration of the Google Speech-to-Text API

Get notes

  POST /getNotes
  Return: <text/plain> Summarized text and transcription 
Parameter | Type    | Description
ID        | Integer | Required. ID returned from the /register API

Send audio

  Websocket ws://${HOST}?ID=${clientID}&languageCode=${languageCode}
  Example: ws://LectureNotes:8080?ID=12&languageCode=hi-IN
  Note: Use native WebSockets
  
  Return: <text/plain> errors, if any
Query string parameter | Type    | Description
ID                     | Integer | Required. ID returned from the /register API
languageCode           | String  | Language code; defaults to en-US

WebSocket message | Type   | Description
base64            | string | Base64-encoded string of audio; must not exceed 10 MB / 15 secs
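
As an illustration of the URL format above, a small helper could assemble the connection string. `buildAudioSocketUrl` is a hypothetical name, and the `en-US` default simply mirrors the documented server default.

```javascript
// Hypothetical helper: build the WebSocket URL from the documented query parameters.
function buildAudioSocketUrl(host, clientId, languageCode = "en-US") {
  const query = new URLSearchParams({ ID: String(clientId), languageCode });
  return `ws://${host}?${query.toString()}`;
}

console.log(buildAudioSocketUrl("LectureNotes:8080", 12, "hi-IN"));
// ws://LectureNotes:8080?ID=12&languageCode=hi-IN
```

The resulting string can be passed straight to the native `WebSocket` constructor, for example `new WebSocket(buildAudioSocketUrl(host, id))`.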

Run Locally

Install node and npm

https://nodejs.org/en/download/

Clone the project

  git clone https://github.com/smitdesai1010/LectureNotes.git

Add the following environment variables to a .env file located in the ./Server folder:

  DEEPAI_KEY

  GOOGLE_APPLICATION_CREDENTIALS

Go to the project directory

   cd Server
   npm install      # install dependencies
   npm start        # start the server

To start the client (Chrome extension)

    Open Google Chrome
    Click on "Extensions" > "Manage extensions"
    Click on "Load unpacked" > select the ./client/ folder in the project directory
    Click on "Extensions" > "LectureNotes"

Acknowledgements

Feedback

If you have any feedback, please reach out to me at [email protected]
