whisper-tool's Introduction

Whisper API (LAURE tool compatible)

Description

This is a simple transcription API that uses Whisper to transcribe audio files.

Prerequisites

Download the model: get the model files from here and put them in the models folder.

For example: data/models/ggml-base.bin, data/models/ggml-small.bin, and data/models/ggml-medium.bin.
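Before building the container, it can help to confirm that the model files are in place. The following is a minimal sketch, assuming the data/models layout and the ggml-<name>.bin naming shown above; missing_models is a hypothetical helper, not part of the project.

```python
from pathlib import Path

# Model names taken from the README; the file naming follows its examples.
MODEL_NAMES = ("base", "small", "medium")


def missing_models(models_dir="data/models", names=MODEL_NAMES):
    """Return the expected ggml model files that are not present in models_dir."""
    root = Path(models_dir)
    expected = [root / f"ggml-{name}.bin" for name in names]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Print one line per model file that still needs to be downloaded.
    for path in missing_models():
        print(f"missing model file: {path}")
```

An empty list means every expected model file was found.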

Quickstart

Use L.A.U.R.E

After you have modified the .laure file, run:

laure create && laure push && laure run

Or, for testing and development purposes only, use Docker directly.

Prerequisites

In each Dockerfile, add the following lines:

ENV LAURE_HOST="0.0.0.0"
ENV LAURE_PORT="5000"
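The README does not show how the service consumes these two variables. A minimal sketch, assuming the application reads them from the environment at startup and falls back to the values above; bind_address is a hypothetical helper.

```python
import os


def bind_address(env=os.environ):
    """Read the bind host and port, falling back to the values from the README."""
    host = env.get("LAURE_HOST", "0.0.0.0")
    # Environment variables are strings, so the port is converted explicitly.
    port = int(env.get("LAURE_PORT", "5000"))
    return host, port


if __name__ == "__main__":
    print(bind_address())
```

Binding to 0.0.0.0 inside the container is what lets the published port (-p 5000:5000) reach the service from the host.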

Build the container

docker build -t testing-stt .

Run the container

docker run -d -p 5000:5000 --name stt testing-stt

Delete the container

docker rm -f stt

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | Welcome message |
| POST | /transcribe | Transcribe an audio file |
| GET | /transcribe | Get the transcription of the audio file |

/ GET

Welcome message

Return

{
  "success": "Welcome to the Whisper API"
}

| Parameter | Type | Description |
| --- | --- | --- |
| success | string | The welcome message |
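Once the container is running, the welcome endpoint makes a convenient smoke test. The sketch below assumes the service is reachable at http://localhost:5000 (the port published by the docker run command); parse_welcome and fetch_welcome are illustrative names, not part of the API.

```python
import json
from urllib.request import urlopen


def parse_welcome(body):
    """Extract the welcome message from the JSON body returned by GET /."""
    payload = json.loads(body)
    return payload["success"]


def fetch_welcome(base_url="http://localhost:5000", timeout=5):
    """Call GET / and return the welcome message (base_url is an assumption)."""
    with urlopen(f"{base_url}/", timeout=timeout) as response:
        return parse_welcome(response.read())
```

Calling fetch_welcome() against a running instance should return the message shown in the JSON above.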

/transcribe POST

Transcribe an audio file to text and return the id of the transcription

application/octet-stream is the only accepted content type for the file. If you use requests in Python, pass stream=True to requests.post when sending the file.

Parameters

| Parameter | Type | Description | Additional Info |
| --- | --- | --- | --- |
| model | string | The model to use for the transcription | Optional; one of base, small, medium (default: base) |
| file | file | The audio file to transcribe | |
| secret | string | Secret key to use the API | Currently not used |

Return

| Parameter | Type | Description | Additional Info |
| --- | --- | --- | --- |
| id | string | The id of the transcription | |
| success | boolean | True if the transcription was successful | Only present on success |
| error | string | The error message if the transcription failed | Only present on failure |
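Because success and error are mutually exclusive, a client can branch on which key is present. A minimal sketch of that check, working on the already decoded JSON; TranscriptionError and transcription_id are hypothetical names introduced for illustration.

```python
class TranscriptionError(Exception):
    """Raised when the API reports a failure (illustrative, not part of the API)."""


def transcription_id(payload):
    """Return the id from a POST /transcribe response, or raise on failure."""
    if "error" in payload:
        # The API only includes "error" when the request failed.
        raise TranscriptionError(payload["error"])
    if not payload.get("success"):
        raise TranscriptionError("response has neither 'success' nor 'error'")
    return payload["id"]
```

The returned id is the value to pass to GET /transcribe later.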

/transcribe GET

Get the transcription of the audio file

Parameters

| Parameter | Type | Description | Additional Info |
| --- | --- | --- | --- |
| id | string | The id of the transcription | |

Return

| Parameter | Type | Description | Additional Info |
| --- | --- | --- | --- |
| text | string | The transcription of the audio file | |
| status | string | The status of the transcription | One of success, running, failed |
| success | boolean | Present if the transcription succeeded or is still running | Only present if not failed |
| error | string | The error message if the transcription failed | Only present if failed |
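Since status can be running, a client usually has to poll until the transcription finishes. The sketch below shows one way to do that under stated assumptions: fetch is any zero-argument callable that returns the decoded JSON of GET /transcribe for one id (you supply the HTTP call), and the interval and attempt limit are arbitrary illustrative defaults.

```python
import time


def wait_for_transcription(fetch, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until the transcription leaves the 'running' state and return its text."""
    for _ in range(max_attempts):
        result = fetch()
        if "error" in result or result.get("status") == "failed":
            # Failed responses carry the reason in "error".
            raise RuntimeError(result.get("error", "transcription failed"))
        if result.get("status") == "success":
            return result["text"]
        # Still running: wait before asking again.
        sleep(interval)
    raise TimeoutError("transcription still running after max_attempts polls")
```

Passing sleep as a parameter keeps the loop easy to test without real delays.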

Examples

Transcribe an audio file

Request

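import requests

# url (the /transcribe endpoint), file_path and model are supplied by the caller.
with open(file_path, "rb") as audio_file:
    files = {"file": audio_file}
    # verify=False skips TLS certificate checks; use it for local testing only.
    response = requests.post(url=url, data={"model": model}, files=files,
                             verify=False, stream=True)
result = response.json()
print(result)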

Response

{
  "id": "5f7e3b3e-3b3e-4e3b-7e3f-3b3e4e3b7e3f",
  "success": true
}

Get the transcription of the audio file

Request

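# get_text is a helper (not shown) that sends GET /transcribe for the given id
# and returns the decoded JSON response.
result = get_text(url, id)
if "error" in result:
    # Failed: surface the error message in place of the text.
    result = {"status": "failed", "text": result["error"]}
elif result["status"] == "success":
    result = {"status": "success", "text": result["text"]}
else:
    # Neither failed nor finished, so the transcription is still running.
    result = {"status": "running"}
print(result)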

Response

{
  "text": "The transcription of the audio file",
  "status": "success",
  "success": true
}

TODO before it is ready for general use

  • Add a way to change the model
  • Add a way to change the language
  • Maybe add an ffmpeg step to convert the audio file to the right format
  • Add a way to get JSON output
  • Add options such as temperature, initial_prompt, ...
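The ffmpeg item above could look roughly like the following. The target format is an assumption: 16 kHz mono 16-bit PCM WAV, which is what the whisper.cpp command-line example expects; the README does not say which format this service needs. ffmpeg_command and convert are hypothetical helpers.

```python
import subprocess


def ffmpeg_command(source, target, sample_rate=16000):
    """Build an ffmpeg command that writes mono 16-bit PCM WAV at sample_rate."""
    return [
        "ffmpeg", "-y",           # overwrite the target if it already exists
        "-i", str(source),        # input file in any format ffmpeg can read
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # one audio channel (mono)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(target),
    ]


def convert(source, target):
    """Run the conversion; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(ffmpeg_command(source, target), check=True)
```

Building the argument list separately from running it avoids shell quoting issues and makes the command easy to inspect.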

License

MIT
