rapidai / rapidasr

A commercial-grade, open-source automatic speech recognition library: ready to use out of the box, cross-platform, with mixed Chinese/English recognition. A cross-platform implementation of ASR inference based on ONNXRuntime and FunASR, providing a set of easy-to-use APIs for calling ASR models.

License: MIT License

Python 8.36% CMake 0.45% C++ 75.21% C 15.97%
asr paraformer paddlespeech wenet

rapidasr's Introduction

Rapid ASR

  • 🎉 We have launched the RapidAI private group on Zhishi Xingqiu (Knowledge Planet). Questions asked there are answered and supported with priority, and members receive the RapidAI organization's ongoing premium services. Everyone is welcome to join.
  • The Paraformer model comes from Alibaba DAMO Academy: Paraformer speech recognition, Chinese, general-purpose, 16 kHz, offline, large, PyTorch.
  • This repository only converts the model and runs it with the ONNXRuntime inference engine. The core code of this project has been merged into FunASR.
  • The project will continue to be updated; feel free to follow it.
  • QQ group: 645751008

📖 Documentation Navigation

📆 TODO and Task Claiming

  • See here: link

🎨 Overall Architecture

flowchart LR

A([wav]) --RapidVad--> B([short audio segments]) --RapidASR--> C([recognized text]) --RapidPunc--> D([final recognized content])
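To make the diagram concrete, here is a purely illustrative Python sketch of the pipeline. The module names rapid_vad and rapid_punc and all call signatures are assumptions for illustration; only RapidParaformer's no-argument constructor appears in this repository's demo code.

# Hypothetical pipeline sketch; module names and call signatures are assumptions.
from rapid_vad import RapidVad                  # assumed VAD package: long audio -> segments
from rapid_paraformer import RapidParaformer    # ASR: audio segment -> text
from rapid_punc import RapidPunc                # assumed punctuation restoration package

vad, asr, punc = RapidVad(), RapidParaformer(), RapidPunc()

segments = vad("long_recording.wav")            # list of short audio chunks
texts = [asr(seg) for seg in segments]          # recognized text for each chunk
print(punc(" ".join(str(t) for t in texts)))    # final text with punctuation restored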

📣 Changelog

Details

- 2023-08-21 v2.0.4 update:
  - Added whl package support
  - Updated the documentation
- 2023-02-25
  - Added a C++ inference version using the onnxruntime engine; the pre-/post-processing code comes from [FastASR](https://github.com/chenkui164/FastASR)
- 2023-02-14 v2.0.3 update:
  - Fixed an error when reading wav files with librosa
  - Fixed a bug where the extracted fbank features did not match those produced under torch
- 2023-02-11 v2.0.2 update:
  - Decoupled the model from the inference code (`rapid_paraformer` and `resources`)
  - Added batch inference (set via `batch_size` in `resources/config.yaml`)
  - Added multiple input types (`Union[str, np.ndarray, List[str]]`)
- 2023-02-10 v2.0.1 update:
  - Added handling of inference results when the input audio is noise or silence
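As a companion to the v2.0.2 notes above, here is a minimal usage sketch of the Python package. It assumes, as in the repository's demo.py, that RapidParaformer() picks up resources/config.yaml (where batch_size lives); the file names and the exact return format are illustrative assumptions.

from rapid_paraformer import RapidParaformer

# Assumes resources/config.yaml is in place, as in demo.py; batch_size there
# controls how many inputs go through ONNXRuntime per call.
paraformer = RapidParaformer()

# Input may be a path, a numpy array, or a list of paths:
# Union[str, np.ndarray, List[str]].
wav_list = ["test_data/a.wav", "test_data/b.wav", "test_data/c.wav"]  # illustrative paths
results = paraformer(wav_list)
print(results)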

rapidasr's People

Contributors

lauragpt, peach-water, swhl, znsoftm


rapidasr's Issues

👀 Tasks we want to do but have not started yet; anyone who wants to contribute can claim them here!

⚠️ Note

  • Each of the items below is an independent repository; if you want to contribute, please go to the corresponding repository.
  • RapidASR | RapidPunc | RapidVAD: all three already have onnxruntime inference implementations under FunASR (see link); the task is to organize and extract them into decoupled code.

RapidASR

This part mainly involves organizing the corresponding functionality from FunASR.

  • Verify again that RapidASR's code and results are aligned with those in FunASR, including on mixed Chinese/English audio
  • Add streaming inference to RapidASR
  • Organize the hotword model

RapidPunc

RapidVAD


RapidTTS

RapidTP-Aligns (timestamp prediction)

😉😉😉

If you'd like to claim a task and give it a try, just reply below with which one you want.
No pressure at all; our community is relaxed and friendly.

Model requires 6 inputs. Input Feed contains 2

I exported the ONNX model with `wenet/bin/export_onnx.py` and ran `python test_demo.py` with the exported model, and got the following errors:

Traceback (most recent call last):
  File "test_demo.py", line 4, in <module>
    from wenet import WenetInfer
  File "/data/disk1/ybZhang/.opendir/RapidASR/python/base_wenet/wenet/__init__.py", line 2, in <module>
    from .wenet_infer import WenetInfer
  File "/data/disk1/ybZhang/.opendir/RapidASR/python/base_wenet/wenet/wenet_infer.py", line 37, in <module>
    from swig_decoders import (PathTrie, TrieVector,
ModuleNotFoundError: No module named 'swig_decoders'
source activate wenet
(wenet) [ybZhang@master /data/disk1/ybZhang/.opendir/RapidASR/python/base_wenet]$python test_demo.py pretrain_model2/onnx_cpu/ test_data/test.wav
Traceback (most recent call last):
  File "test_demo.py", line 22, in <module>
    key, content, elapse = wenet_infer(wav_path)
  File "/data/disk1/ybZhang/.opendir/RapidASR/python/base_wenet/wenet/wenet_infer.py", line 148, in __call__
    ort_outs = self.encoder_ort_session.run(None, ort_inputs)
  File "/home/ybZhang/miniconda3/envs/wenet/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 184, in run
    raise ValueError("Model requires {} inputs. Input Feed contains {}".format(num_required_inputs, num_inputs))
ValueError: Model requires 6 inputs. Input Feed contains 2
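A quick way to see where the mismatch comes from (a general ONNXRuntime diagnostic, not code from this repository) is to list the inputs the exported encoder actually declares and compare them with the keys placed in ort_inputs; the model path below is illustrative.

import onnxruntime as ort

# The exported encoder declares 6 inputs while wenet_infer.py feeds only 2;
# listing them shows exactly which names and shapes the feed dict must contain.
sess = ort.InferenceSession("pretrain_model2/onnx_cpu/encoder.onnx")  # illustrative path
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)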

Feature request

Please consider integrating this into OBS for offline use. The real-time caption solutions on the market rely on online services, and between recognition accuracy and network issues the results are often unsatisfactory.

Garbled recognition results on the Windows build

Hello. On Windows, exporting model.onnx kept failing, so the model.onnx I used was copied over from Linux. The recognized text comes back as:
Result:汉字 "Result: "鐢氳嚦鍑虹幇浜ゆ槗鍑犱箮鍋滄粸鐨勬儏鍐".
The two characters 汉字 are hard-coded in the program, which rules out an encoding problem of that kind in my code.

Solution: converting the output with a UTF8ToGBK function fixes it.
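The symptom matches UTF-8 text being rendered on a GBK console. A small Python illustration of the mechanism (not the repository's C++ fix, which converts the string with UTF8ToGBK):

# UTF-8 bytes shown on a GBK console come out as mojibake; re-encoding the text
# to GBK before printing is the equivalent of the UTF8ToGBK fix described above.
utf8_bytes = "汉字".encode("utf-8")
print(utf8_bytes.decode("gbk", errors="replace"))   # garbled, like the reported output
print(utf8_bytes.decode("utf-8").encode("gbk"))     # bytes a GBK console can display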

ValueError: negative dimensions are not allowed

Running test_demo.py raises an error in feature.py:

def sliding_window(x, window_size, window_shift):
    shape = x.shape[:-1] + (x.shape[-1] - window_size + 1, window_size)
    strides = x.strides + (x.strides[-1],)
    return np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides)[::window_shift]

When x.shape[-1] - window_size + 1 < 0, it raises:
ValueError: negative dimensions are not allowed
What does x.shape[-1] - window_size + 1 mean?
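The expression x.shape[-1] - window_size + 1 is the number of windows of length window_size that fit into the signal; it goes negative whenever the audio (or its trailing chunk) is shorter than one window, which is exactly what triggers the error. Below is one possible guard, a sketch assuming 1-D input, not the repository's official fix.

import numpy as np

def sliding_window(x, window_size, window_shift):
    # Number of full windows; negative when len(x) < window_size.
    num_windows = x.shape[-1] - window_size + 1
    if num_windows <= 0:
        # Assumption: right-pad a short 1-D signal so at least one window exists.
        x = np.pad(x, (0, window_size - x.shape[-1]))
        num_windows = 1
    shape = x.shape[:-1] + (num_windows, window_size)
    strides = x.strides + (x.strides[-1],)
    return np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides)[::window_shift]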

Error: failed to parse the model

I downloaded the model from Baidu Cloud, set up a venv, and installed everything from requirements.txt, but running demo.py fails with a parsing error.

I know it is an environment problem, but I don't know which library version is at fault. Yesterday it ran fine on my Mac with no errors, but today it fails on my desktop machine.

2023-04-16 18:33:24.2771110 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1641 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.
Traceback (most recent call last):
  File "F:/语音识别/RapidASR-2.0.0/demo.py", line 7, in <module>
    paraformer = RapidParaformer()
  File "F:\语音识别\RapidASR-2.0.0\rapid_paraformer\rapid_paraformer.py", line 28, in __init__
    self.ort_infer = OrtInferSession(config['Model'])
  File "F:\语音识别\RapidASR-2.0.0\rapid_paraformer\utils.py", line 304, in __init__
    self.session = InferenceSession(config['model_path'],
  File "F:\语音识别\RapidASR-2.0.0\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "F:\语音识别\RapidASR-2.0.0\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 384, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from F:\语音识别\RapidASR-2.0.0\rapid_paraformer\models\asr_paraformerv2.onnx failed:Protobuf parsing failed.
The above is the complete error output.
Package Version

audioread 3.0.0
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.1.0
coloredlogs 15.0.1
decorator 5.1.1
onnx 1.13.1
onnxruntime 1.13.1
packaging 23.1
pip 21.3.1
platformdirs 3.2.0
pooch 1.7.0
protobuf 3.20.3
pycparser 2.21
pyreadline3 3.4.1
PyYAML 6.0
requests 2.28.2
resampy 0.4.2
scikit-learn 1.2.2
scipy 1.7.3
setuptools 60.2.0
soundfile 0.12.1
sympy 1.11.1
threadpoolctl 3.1.0
typeguard 2.13.3
typing_extensions 4.5.0
urllib3 1.26.15
wheel 0.37.1
zipp 3.15.0
Since I can't upload images, I've listed all the library versions in this environment as a code block above. I'd appreciate any pointers.
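INVALID_PROTOBUF usually means the .onnx file itself is incomplete or corrupted (for example a partial download from Baidu Cloud) rather than a library-version problem. A quick check with the onnx package already present in this environment (a general diagnostic, not code from this repository):

import os
import onnx

model_path = r"rapid_paraformer/models/asr_paraformerv2.onnx"

# A truncated download is the usual culprit: compare this size against the
# published size of the model before digging into library versions.
print(os.path.getsize(model_path), "bytes")

model = onnx.load(model_path)        # fails with a similar protobuf error if the file is damaged
onnx.checker.check_model(model)      # validates the graph structure
print("model loads and passes the ONNX checker")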

How should the config file be written so that ORT uses CUDA?

I'm trying to run the model on the GPU. I changed use_cuda from false to true, but it still doesn't work. What should I do? My config is below; see the sketch after it.

TokenIDConverter:
  token_path: resources/models/token_list.pkl
  unk_symbol: <unk>

CharTokenizer:
  symbol_value:
  space_symbol: <space>
  remove_non_linguistic_symbols: false

WavFrontend:
  cmvn_file: resources/models/am.mvn
  frontend_conf:
    fs: 16000
    window: hamming
    n_mels: 80
    frame_length: 25
    frame_shift: 10
    lfr_m: 7
    lfr_n: 6
    filter_length_max: -.inf
    dither: 0.0

Model:
  model_path: resources/models/asr_paraformerv2.onnx
  use_cuda: true
  CUDAExecutionProvider: 
      device_id: 0
      arena_extend_strategy: kNextPowerOfTwo
      cudnn_conv_algo_search: EXHAUSTIVE
      do_copy_in_default_stream: true
  batch_size: 3
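Whether the use_cuda / CUDAExecutionProvider keys above are honored depends on how this repository's OrtInferSession builds the session. As a cross-check, this is how ONNXRuntime itself is pointed at CUDA (a general sketch, not this project's code); it also requires the onnxruntime-gpu package instead of plain onnxruntime.

import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {
        "device_id": 0,
        "arena_extend_strategy": "kNextPowerOfTwo",
        "cudnn_conv_algo_search": "EXHAUSTIVE",
        "do_copy_in_default_stream": True,
    }),
    "CPUExecutionProvider",  # fallback
]
sess = ort.InferenceSession("resources/models/asr_paraformerv2.onnx", providers=providers)
print(sess.get_providers())  # CUDAExecutionProvider should appear first when the GPU build is active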

Will this still be updated?

The announcement says "the core code of this project has been merged into FunASR". Will this project continue to be maintained? And if it keeps being updated, how will it differ from FunASR?

Very poor recognition results when calling the Python demo

The returned result is ['百必苦之说苦是苦苦之之苦是苦之之之此苦之苦之之之此此之之苦苦之谷谷此之谷此苦之之谷此谷苦谷之之谷苦之谷此是苦之谷此是苦之之苦三之谷此谷苦之之谷此之果苦之之谷故苦之谷此苦], but that is not what the audio actually says.

Thanks!
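Output like this is often caused by an input mismatch rather than the model itself; the WavFrontend configuration in this repository expects 16 kHz audio (fs: 16000). A quick check with soundfile, which is already in the dependency list, offered as a general diagnostic rather than a confirmed cause:

import soundfile as sf

# The frontend config uses fs: 16000; audio at another rate or with multiple
# channels should be converted to 16 kHz mono before inference.
data, sr = sf.read("your_audio.wav")   # illustrative path
print("sample rate:", sr, "shape:", data.shape)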

Can inference run in parallel across multiple threads?

Hello, and thanks very much for sharing this project.
I'd like to ask whether RapidAsrRecogFile can run in parallel across multiple threads.
That is, with a handle obtained from a single call to RapidAsrInit, can RapidAsrRecogFile be called concurrently from multiple threads?

Thanks.

ImportError: cannot import name 'check_argument_types' from 'typeguard'

Error message:
Traceback (most recent call last):
  File "/Users/edy/PycharmProjects/pythonProject2/RapidASR/python/demo.py", line 4, in <module>
    from rapid_paraformer import RapidParaformer
  File "/Users/edy/PycharmProjects/pythonProject2/RapidASR/python/rapid_paraformer/__init__.py", line 4, in <module>
    from .rapid_paraformer import RapidParaformer
  File "/Users/edy/PycharmProjects/pythonProject2/RapidASR/python/rapid_paraformer/rapid_paraformer.py", line 11, in <module>
    from .utils import (CharTokenizer, Hypothesis, ONNXRuntimeError,
  File "/Users/edy/PycharmProjects/pythonProject2/RapidASR/python/rapid_paraformer/utils.py", line 14, in <module>
    from typeguard import check_argument_types
ImportError: cannot import name 'check_argument_types' from 'typeguard' (/opt/homebrew/Caskroom/miniconda/base/envs/modelscope/lib/python3.8/site-packages/typeguard/__init__.py)

Solved: run in the terminal
pip uninstall typeguard
pip install typeguard==2.13.1 -i https://pypi.org/simple/

Cause
typeguard removed check_argument_types in version 3.0, so downgrading to typeguard < 3.0 is enough.
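Pinning typeguard < 3.0 as above works; an alternative that tolerates either version is a small import shim (an assumption sketched here, not code from the repository):

# Compatibility shim: typeguard >= 3.0 removed check_argument_types.
try:
    from typeguard import check_argument_types
except ImportError:
    def check_argument_types() -> bool:
        # In typeguard < 3.0 this validated the caller's annotated arguments and
        # returned True; here it simply becomes a no-op.
        return True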

Multi-core performance test of the C++ version

Test environment: Rocky Linux 8. Only the C++ version was tested (the Python version was not).

Summary:

The code was compiled and tested on three machines with different hardware. With identical fftw and onnxruntime versions, each machine recognized the same 30-minute audio file while varying the number of onnx threads.

Rough patterns observed so far:

  • More onnx threads is not always better
  • 2 threads gives a clear speedup over 1; additional threads bring only small gains
  • Efficiency is best when the thread count equals the number of physical CPU cores

Practical recommendations:

  • For most scenarios, 3-4 threads gives the best cost/performance ratio
  • On low-end machines, 2 threads is appropriate
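For reference, the onnx thread count benchmarked above presumably maps to ONNXRuntime's intra-op thread setting; in the Python API the equivalent knob looks like this (a general ONNXRuntime sketch, using the conclusions above for the default):

import onnxruntime as ort

so = ort.SessionOptions()
so.intra_op_num_threads = 4   # 3-4 threads was the sweet spot in the tests above
so.inter_op_num_threads = 1
sess = ort.InferenceSession("resources/models/asr_paraformerv2.onnx", sess_options=so)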
