
capswriter-offline's Issues

Error when running from source

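Hello,
I've run into a problem and would appreciate help tracking it down. The packaged build runs fine on this machine. When I run from source, the server works (the packaged Client EXE can use it normally), but the Client run from source does nothing when I press Caps Lock: the terminal shows no output, and "开始录音" ("recording started") never appears.
How can I troubleshoot this? Thank you very much.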

Missing library after packaging

绑定的服务地址:0.0.0.0:6016

Process Process-2:
Traceback (most recent call last):
File "multiprocessing\process.py", line 314, in _bootstrap
File "multiprocessing\process.py", line 108, in run
File "D:\软件工具\python小工具\CapsWriter-Offline\dist\CapsWriter-Offline\util\server_init_recognizer.py", line 29, in init_recognizer
from funasr_onnx import CT_Transformer
File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
File "funasr_onnx\__init__.py", line 2, in <module>
File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
File "funasr_onnx\paraformer_bin.py", line 10, in <module>
ModuleNotFoundError: No module named 'torch'

It runs normally before packaging.

Stops working after running for a while

The error is as follows:

Task exception was never retrieved
future: <Task finished name='Task-14' coro=<do_recognize() done, defined at F:\TEMP\CapsWriter\core_client.py:290>
exception=ValueError('need at least one array to concatenate')>
Traceback (most recent call last):
  File "F:\TEMP\CapsWriter\core_client.py", line 314, in do_recognize
    samples0 = np.concatenate(samples0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
Exception in thread Thread-2 (record):
Traceback (most recent call last):
  File "F:\TEMP\CapsWriter\libs\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "F:\TEMP\CapsWriter\libs\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\TEMP\CapsWriter\core_client.py", line 437, in record
    data = stream.read(int(0.05 * 48000))[0]
  File "F:\TEMP\CapsWriter\libs\sounddevice.py", line 1456, in read
    data, overflowed = RawInputStream.read(self, frames)
  File "F:\TEMP\CapsWriter\libs\sounddevice.py", line 1236, in read
    _check(err)
  File "F:\TEMP\CapsWriter\libs\sounddevice.py", line 2745, in _check
    raise PortAudioError(errormsg, err, hosterror_info)
sounddevice.PortAudioError: Unanticipated host error [PaErrorCode -9999]: 'There is no driver installed on your system.' [MME error 6]
检测到配置文件更新,已载入     4 条中文热词

Task exception was never retrieved
future: <Task finished name='Task-16' coro=<do_recognize() done, defined at F:\TEMP\CapsWriter\core_client.py:290>
exception=ValueError('need at least one array to concatenate')>
Traceback (most recent call last):
  File "F:\TEMP\CapsWriter\core_client.py", line 314, in do_recognize
    samples0 = np.concatenate(samples0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
Task exception was never retrieved
future: <Task finished name='Task-18' coro=<do_recognize() done, defined at F:\TEMP\CapsWriter\core_client.py:290>
exception=ValueError('need at least one array to concatenate')>
Traceback (most recent call last):
  File "F:\TEMP\CapsWriter\core_client.py", line 314, in do_recognize
    samples0 = np.concatenate(samples0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
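
The `ValueError` above is what `np.concatenate` raises when it is handed an empty list, which would presumably happen if no audio chunks were collected (for instance once the recording thread has died with the `PortAudioError` shown in the same log). A minimal sketch of a guard, assuming the chunks are NumPy arrays gathered in a list; `concat_samples` is a hypothetical helper, not a function from core_client.py:

```python
import numpy as np

def concat_samples(chunks):
    """Join recorded chunks; return an empty array instead of raising on no input."""
    if not chunks:
        # Nothing was recorded, so give the caller an empty buffer to check for.
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(chunks)

samples = concat_samples([])
if samples.size == 0:
    print("no audio captured, skipping recognition")
```

This only hides the symptom; the recording thread would still need to be restarted for dictation to work again.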

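[Feature request] Input status indicator

I have set CapsWriter up with nssm to start at boot with the cmd window hidden. Without any visible feedback, dictating feels uncertain, so I'd like an input status indicator.

When the hotkey is pressed and speech recognition starts, get the current cursor position and display a GIF at that position. This gives clearer confirmation of two things:

  • that CapsWriter is running and voice input is currently in progress;
  • where the voice input will be inserted.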

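Does the offline version copy every recognition result to the clipboard?

That's what happens on my machine: every recognized sentence takes up an entry in the clipboard, so my third-party clipboard manager gets flooded and is effectively unusable.

The author's online version does not have this problem.

Is this by design, or is something specific to my setup causing it?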


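"服务端未连接,无法发送" (server not connected, unable to send)

This is my first run. After launching start-client and starting to dictate, it shows "服务端未连接,无法发送" ("server not connected, unable to send"). What should I do? Also, how many seconds does model loading take for everyone else?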

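Can it output Traditional Chinese?

As the title says: is there a way to make it output Traditional Chinese, or could a future update add this feature?
It would be more convenient for users in Taiwan.
Thanks to the author for the hard work.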

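Could the recording trigger be changed?

Currently you must hold Caps Lock for the entire recording and release it to stop. For a longer passage, holding the key the whole time is inconvenient.
Could it be changed so that double-tapping Caps Lock starts recording and double-tapping again stops it?
Thanks 😙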
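
The double-tap behaviour requested here could be sketched roughly as follows. This is only an illustration of the timing logic; `DoubleTapToggle`, its `window` parameter, and the idea of calling `on_tap()` from the key-release handler are all assumptions, not part of CapsWriter.

```python
import time

class DoubleTapToggle:
    """Flip a recording flag when a key is tapped twice within `window` seconds."""

    def __init__(self, window=0.4, clock=time.monotonic):
        self.window = window      # maximum gap between the two taps, in seconds
        self.clock = clock        # injectable clock, which makes testing easy
        self.last_tap = None
        self.recording = False

    def on_tap(self):
        """Call once per key release; returns True if this tap toggled recording."""
        now = self.clock()
        if self.last_tap is not None and now - self.last_tap <= self.window:
            self.recording = not self.recording
            self.last_tap = None  # a third quick tap starts a new pair
            return True
        self.last_tap = now
        return False
```

A single tap is remembered but has no effect on its own, so ordinary Caps Lock presses would not start a recording by accident.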

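Consecutive numbers read aloud are merged into a single value

When reading a sequence such as 11 12 13 14 15 aloud, the result is recognized as a single number like 112345.
convert_value_num treats "十一十二十三十四十五" as one number.
The symbol library is enabled, yet the text is not converted to something like "十一、十二、十三、十四、十五".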
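
One way to get the expected output would be to split a run of Chinese numerals into well-formed numbers before any numeric conversion. The sketch below is a heuristic that only covers 1 to 99; `separate_numbers` is a hypothetical helper and does not reflect how convert_value_num actually works.

```python
import re

DIGITS = "一二三四五六七八九"
# One well-formed number from 1 to 99: optional tens digit, 十, optional units digit,
# or a lone digit.
NUMBER = re.compile(rf"[{DIGITS}]?十[{DIGITS}]?|[{DIGITS}]")
# A run of two or more numeral characters that may hide several numbers.
RUN = re.compile(rf"[{DIGITS}十]{{2,}}")

def separate_numbers(text):
    """Insert '、' between the numbers found in each numeral run."""
    def split_run(match):
        return "、".join(NUMBER.findall(match.group(0)))
    return RUN.sub(split_run, text)

print(separate_numbers("十一十二十三十四十五"))  # 十一、十二、十三、十四、十五
```

The heuristic also splits digit-by-digit readings such as "一二三" into "一、二、三", so it would have to be limited to contexts where an enumeration is intended.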

No matching sherpa_onnx version under Python 3.10

ERROR: Could not find a version that satisfies the requirement sherpa_onnx==1.5.4 (from versions: 1.8.9, 1.8.10, 1.8.11, 1.8.14, 1.9.0, 1.9.1, 1.9.3, 1.9.4)
ERROR: No matching distribution found for sherpa_onnx==1.5.4

My environment: Windows 10, Python 3.10.0.
I tried both my own Tsinghua mirror and https://mirror.sjtu.edu.cn/pypi/web/simple, and both give the same error.
The corresponding requirements file:

rich
websockets
numpy==1.23.3
typeguard==2.13.3
sherpa_onnx==1.5.4
funasr_onnx==0.0.6
kaldi-native-fbank==1.17

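However, after removing the version pins, the dependencies install and the Python program runs.
The offline exe also works, and it is very smooth. Thanks to the author!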

Server fails to start; same result on two computers

Traceback (most recent call last):
File "start_server.py", line 10, in <module>
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\core_server.py", line 207, in init
asyncio.run(main())
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\asyncio\runners.py", line 44, in run
return loop.run_until_complete(main)
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\asyncio\base_events.py", line 649, in run_until_complete
return future.result()
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\core_server.py", line 170, in main
recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\sherpa_onnx\offline_recognizer.py", line 158, in from_paraformer
self.recognizer = _Recognizer(recognizer_config)
RuntimeError: Failed to load model with error: D:\a_work\1\s\onnxruntime\core/graph/model_load_utils.h:56 onnxruntime::model_load_utils::ValidateOpsetForDomain ONNX Runtime only guarantees support for models stamped with official released onnx opset versions. Opset 19 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain com.ms.internal.nhwc is till opset 18.

[8260] Failed to execute script 'start_server' due to unhandled exception!

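Using the client on Windows 7

Under Windows 7 (Python 3.8) the client cannot record: pressing Caps Lock does nothing. Windows 10 works fine. Could you package a Windows 7 64-bit client?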

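With the punctuation engine off, add commas, spaces, or even periods based on pause length; also package the two models separately

Currently, with the punctuation engine turned off, the output is one unbroken run of text, whether you speak in long continuous stretches or in stops and starts.

In some scenarios, such as instant messaging or quick entry of text you will edit yourself (for example, subtitles), full punctuation is not needed.
It would help to add a comma (or a space) based on pause length, and after a longer pause even a period or a line break.

With line breaks, the output could be used directly in chat apps or for video subtitles, with no further editing.

Also, please consider packaging the two models separately so the punctuation engine does not have to be downloaded. That would help people who just want to try it out or use it temporarily, and cases where bandwidth is tight or portability matters.
Finally, thank you for your work. I tried it and the results are very good; this is the software I have always wanted.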
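
The pause-based punctuation described above could look something like the sketch below. It assumes the recognizer can supply per-segment start and end times, which is an assumption rather than a known feature; the function name and both thresholds are made up for illustration.

```python
def join_by_pauses(segments, comma_gap=0.4, period_gap=1.0,
                   comma=",", period="。\n"):
    """Join (text, start_sec, end_sec) segments, choosing separators by pause length."""
    pieces = []
    prev_end = None
    for text, start, end in segments:
        if prev_end is not None:
            gap = start - prev_end
            if gap >= period_gap:
                pieces.append(period)   # long pause: end the sentence and break the line
            elif gap >= comma_gap:
                pieces.append(comma)    # short pause: comma (or pass " " for a space)
        pieces.append(text)
        prev_end = end
    return "".join(pieces)

segments = [("今天天气不错", 0.0, 1.2), ("出去走走", 1.7, 2.5), ("明天再说", 4.0, 5.0)]
print(join_by_pauses(segments))
```

Passing `comma=" "` or `period="\n"` would give the space-only or newline-only variants mentioned in the request.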

Alibaba speech recognition is a pain

https://translate.alibaba.com/#core-translation
The recognized text always comes with a timestamp. Example:
2023-08-11 19:57:27
体验行业领先的智能翻译,支持全球214种语言生根,多个垂直领域。

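Problem with Alibaba speech recognition when used online in the browser: Firefox keeps picking the wrong input device and selects the voice mouse's driver, so every time I have to dismiss the permission prompt and refresh once before it works.

Recently I have also been using https://tingwu.aliyun.com/home, and several recordings captured no audio at all.
The 咪鼠 voice mouse client has not been updated for a long time, emails and other messages get no reply, and it keeps crashing, so it is basically unusable.
The desktop voice input in the iFlytek and Sogou input methods works poorly and has not been updated in a long time.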

Error running core_server.py on Ubuntu 22.04

`Process Process-2:
Traceback (most recent call last):
File "/home/programmer/.conda/envs/capswriter/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()

File "/home/programmer/.conda/envs/capswriter/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)

File "/home/programmer/Projects/PycharmProjects/CapsWriter-Offline/util/server_init_recognizer.py", line 35, in init_recognizer
recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(

File "/home/programmer/.conda/envs/capswriter/lib/python3.8/site-packages/sherpa_onnx/offline_recognizer.py", line 184, in from_paraformer
self.recognizer = _Recognizer(recognizer_config)

RuntimeError: Failed to load model because protobuf parsing failed.
`
Python 3.8. All dependencies are confirmed installed, and both model files are present under the models directory. No NVIDIA GPU, no CUDA.

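Blocked by 360

Type: Trojan - HEUR/QVM202.0.998A.Malware.Gen
Description: A Trojan is malware disguised as a normal file that steals private data such as your accounts and passwords.
Scan engine: cloud signature engine
File path: D:\插件\CapsWriter-Offline\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\start_client.exe
File size: 338.4K (346,540 bytes)
File fingerprint (MD5): 0079a4538495fc9b1903cfc81670f558
Recommended action: quarantine the file

Is this normal, or was the file infected during download?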
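
One way to rule out corruption during download is to compare the file's hash with a reference value from a trusted copy. The sketch below only computes the MD5; whether a reference hash for start_client.exe is published anywhere is not stated in this thread, so the comparison value would have to come from the author or from a second, independent download.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Two downloads from different sources that produce the same hash were at least not altered in transit; that alone does not say whether the antivirus detection is a false positive.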

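Installing CMake on the Linux server is extremely slow

I plan to deploy the server on CentOS. While installing dependencies I was prompted to install the cmake package, but installing it is extremely slow. Is there a workaround?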

Punctuation model installed at the path given in the code, but it reports "failed:Protobuf parsing failed."

(keyaudio) zhangyiqing@inin:~/CapsWriter-Offline$ python core_server.py

──────────────────────────────────────────────────────────── CapsWriter Offline Server ────────────────────────────────────────────────────────────

项目地址:https://github.com/HaujetZhao/CapsWriter-Offline

当前基文件夹:/home/zhangyiqing/CapsWriter-Offline

绑定的服务地址:0.0.0.0:6016

模块加载完成

语音模型载入完成

Process Process-1:
Traceback (most recent call last):
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/zhangyiqing/CapsWriter-Offline/util/server_init_recognizer.py", line 42, in init_recognizer
punc_model = CT_Transformer(ModelPaths.punc_model_dir, quantize=True)
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/site-packages/funasr_onnx/punc_bin.py", line 69, in __init__
self.ort_infer = OrtInferSession(model_file, device_id, intra_op_num_threads=intra_op_num_threads)
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/site-packages/funasr_onnx/utils/utils.py", line 209, in __init__
self.session = InferenceSession(model_file,
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from models/punc_ct-transformer_cn-en/model_quant.onnx failed:Protobuf parsing failed.
^C
再见!
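
"Protobuf parsing failed" usually means the .onnx file on disk is not a complete model, for example a Git LFS pointer file left by a clone made without LFS, or a truncated download. Whether that is the cause here is only a guess. A small check, with the function name and the size threshold chosen purely for illustration:

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"

def diagnose_model_file(path, min_bytes=1_000_000):
    """Classify a model file as 'missing', 'git-lfs pointer', 'suspiciously small' or 'ok'."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as f:
        head = f.read(len(LFS_MAGIC))
    if head == LFS_MAGIC:
        return "git-lfs pointer"
    if p.stat().st_size < min_bytes:   # threshold is an arbitrary assumption
        return "suspiciously small"
    return "ok"

print(diagnose_model_file("models/punc_ct-transformer_cn-en/model_quant.onnx"))
```

If the result is anything other than "ok", downloading the model file again would be the first thing to try.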

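Segmentation and de-duplication of long audio

I tried generating subtitles for a song and found two problems:
1. Many lines were not recognized; only about a quarter were. I suspect the music is the cause.
2. The song opens with a long "la la la" vocalise, but de-duplication compressed it to two characters and it did not become its own line. The de-duplication algorithm may be a bit too aggressive.

Here is my generated result: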
赵雷 - 南方姑娘.srt.txt
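
A gentler alternative to collapsing repeats entirely would be to cap them at a fixed count. The sketch below is not the project's de-duplication code, which is not shown in this thread; `cap_repeats` and `max_repeat` are hypothetical names.

```python
import re

def cap_repeats(text, max_repeat=3):
    """Shorten any unit repeated more than `max_repeat` times in a row to `max_repeat` copies."""
    # (.+?) is the shortest repeating unit; \1{n,} requires at least n further copies.
    pattern = re.compile(r"(.+?)\1{%d,}" % max_repeat)
    return pattern.sub(lambda m: m.group(1) * max_repeat, text)

print(cap_repeats("啦啦啦啦啦啦啦啦南方"))  # 啦啦啦南方
```

Backreference matching can be slow on very long inputs, so it would be safer to apply this per subtitle line than to a whole transcript.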

Error on Windows 11

C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\funasr_onnx\punc_bin.py:
279: FutureWarning: In the future np.bool will be defined as the corresponding NumPy scalar.
def vad_mask(self, size, vad_pos, dtype=np.bool):
Process Process-2:
Traceback (most recent call last):
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 314, in _bootstrap
self.run()
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "d:\软件工具\python小工具\CapsWriter-Offline\util\server_init_recognizer.py", line 29, in init_recognizer
from funasr_onnx import CT_Transformer
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\funasr_onnx\__init__.py", line 5, in <module>
from .punc_bin import CT_Transformer
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\funasr_onnx\punc_bin.py", line 166, in <module>
class CT_Transformer_VadRealtime(CT_Transformer):
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\funasr_onnx\punc_bin.py", line 279, in CT_Transformer_VadRealtime
def vad_mask(self, size, vad_pos, dtype=np.bool):
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\__init__.py", line 338, in __getattr__
raise AttributeError(former_attrs[attr])
AttributeError: module 'numpy' has no attribute 'bool'.
np.bool was a deprecated alias for the builtin bool. To avoid this error in existing code, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
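
The `np.bool` alias was removed in NumPy 1.24, so this traceback suggests a newer NumPy than the `numpy==1.23.3` pinned in the requirements file quoted earlier on this page; installing the pinned version should avoid the error. If downgrading is not an option, a compatibility shim placed before the `funasr_onnx` import is a possible workaround. It is a sketch, not an official fix:

```python
import numpy as np

# Restore the removed alias so that `dtype=np.bool` in funasr_onnx resolves again.
# Checking vars(np) avoids triggering NumPy's own warning for the missing name.
if "bool" not in vars(np):
    np.bool = bool

mask = np.zeros(3, dtype=np.bool)   # works with or without the shim applied
print(mask.dtype)
```

The shim has to run in the same process that imports `funasr_onnx`, before that import happens.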

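Could an installer or script be added?

My idea is to provide a program or script that registers the server and client as a service, so they start at boot.
Having to launch them manually every time, with extra icons in the taskbar, is annoying.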

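Non-default hotkeys change the Caps Lock state

I normally type capitals with Shift, and my Caps Lock key has been remapped to something else. I changed the hotkey in core_client.py to ctrl, space, or esc, and found that each of them turns the Caps Lock light on; I then have to press Caps Lock manually to restore it. Could the code be improved so that non-default hotkeys leave the Caps Lock state unchanged?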

Error opening InputStream: Unanticipated host error [PaErrorCode -9999]:

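I have two identical machines. After downloading the packaged files, one ran right away.
On the other, the client keeps crashing on startup with PaErrorCode -9999.

I checked that microphone access is enabled in the privacy settings and that Python is working. The dependencies are reported as installed.
I tried different microphone input ports, with no luck.
I found reports of similar cases suggesting pip install pyaudio, but that did not help either.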


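On Windows 11 Pro, speech is often recognized but not typed out; it takes several repeated attempts before the text appears

On another machine running Windows 11 Home, I used it for a whole day after installing and never saw this problem.

On this desktop running Windows 11 Pro, it is mostly fine right after boot, but after about two hours speech gets recognized without being typed out. I often have to say it a second time, occasionally a third, before the text appears.

I checked the clipboard history when it fails. At first, when nothing is typed out, nothing is recorded in the clipboard either. After a while some entries may show up: for example, if I said a sentence 4 times, there are initially no clipboard entries at all, but later two or three may appear, with their timestamps out of order.

Apart from the Windows edition, the software environment on the two computers is essentially the same, and the main input window on both is OneNote. I really can't figure it out.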


Mac M2: punctuation engine error

Machine: MacBook Air M2
macOS: 14.1
Every time the client is closed, this error appears:

接客了:<websockets.legacy.server.WebSocketServerProtocol object at 0x1537d9390>
标点引擎出错:cannot access local variable 'punctuations' where it is not associated with a value
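
This message is Python's `UnboundLocalError`: a local variable is read on a code path where it was never assigned. The actual server code is not shown here, so the following is only an illustration of the usual fix, which is to bind the name before the branch that may be skipped. `punctuate` and `punc_model` are hypothetical names.

```python
def punctuate(text, punc_model):
    """Return punctuated text, falling back to the raw text when there is nothing to process."""
    punctuations = None              # bind the name up front so every path can read it
    if text:                         # empty input (e.g. a closing connection) skips the model
        punctuations = punc_model(text)
    return punctuations if punctuations is not None else text

print(punctuate("", lambda t: t + "。") == "")   # True: no error on empty input
```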

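It would be even better if recording stopped and transcribed automatically after a period of silence, then restarted when the next speech begins

Like iFlytek or Sogou, offer two coexisting modes, long text and short dialogue: in short-dialogue mode, transcribe directly without considering silence; in long-text mode, stop recording automatically on silence or when no voice is detected. I noticed the author has previously implemented speeding up the voiceless parts of a video, so the voice detection technique there should be reusable.

If voice could be detected without having to release a key, you could finish the first draft of a speech "in your sleep" (meaning lying in bed without touching or looking at anything; just kidding). Before I found your tool, I did exactly this with the Sogou input method. With a mode that only stops when I stop it manually, I worry that a very long recording will take too long to transcribe and fail at the end, or that after an error I won't remember the earlier text well enough to fix it.

I will also try to implement this myself, but my skills are limited: even if I get the logic working, I can't do true voice-activity-based stopping, only stopping after the volume stays below a threshold for a while.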
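
The volume-threshold approach mentioned in the last paragraph could be sketched as below. It measures the RMS level of each audio chunk and reports the point where enough consecutive quiet chunks have passed. The threshold, the chunk count, and the function name are assumptions, and this is plain energy detection rather than real voice-activity detection.

```python
import numpy as np

def find_auto_stop(chunks, threshold=0.01, max_silent_chunks=20):
    """Return the index of the chunk where recording should stop, or None if it never should."""
    silent_run = 0
    for i, chunk in enumerate(chunks):
        samples = np.asarray(chunk, dtype=np.float64)
        rms = float(np.sqrt(np.mean(samples ** 2)))   # loudness of this chunk
        silent_run = silent_run + 1 if rms < threshold else 0
        if silent_run >= max_silent_chunks:
            return i
    return None
```

With 50 ms chunks, the size read in the tracebacks elsewhere on this page (`int(0.05 * 48000)` frames), the default of 20 silent chunks corresponds to about one second of silence.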

[Bug] Keywords do not trigger writing to the markdown file on Linux

Steps to reproduce:

  1. git clone ssh://[email protected]/HaujetZhao/CapsWriter-Offline
  2. cd CapsWriter-Offline
  3. sudo pip install --breaking-system-packages -r requirements-client.txt;sudo pip install --breaking-system-packages -r requirements-server.txt
  4. sudo python ./start_server.py
  5. In another terminal, run sudo python ./start_client.py
  6. Hold Caps Lock and say a sentence containing a keyword
$ sudo python3 ./start_server.py


───────────────────────────────────────────────────── CapsWriter Offline Server ──────────────────────────────────────────────────────

项目地址:https://github.com/HaujetZhao/CapsWriter-Offline

当前基文件夹:/home/ayaka/CapsWriter-Offline

绑定的服务地址:0.0.0.0:6016

模块加载完成

语音模型载入完成

标点模型载入完成

模型加载耗时 32.75s

────────────────────────────────────────────────────────────── 开始服务 ──────────────────────────────────────────────────────────────

接客了:<websockets.legacy.server.WebSocketServerProtocol object at 0x7fb9be319190>

识别结果:
    日记123健康。
识别结果:
    重要123。
识别结果:
    健康,今天中午吃的大米炒饭。
识别结果:
    重要,什么问题可以用什么方法解决?
识别结果:
    你好。
ConnectionClosed...
接客了:<websockets.legacy.server.WebSocketServerProtocol object at 0x7fb9bfc963d0>

识别结果:
    学习健康重要。
ConnectionClosed...
接客了:<websockets.legacy.server.WebSocketServerProtocol object at 0x7fb9a92b9cd0>

识别结果:
    你好。
识别结果:
    健康。
识别结果:
    你好。
识别结果:
    重要健康学习检测文件,防止文键管理词每章一个,绘本省略。
识别结果:
    当识频结果以关键字开头时会被记录到相应的 md 文件中。
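
For reference while debugging, the behaviour described in the last recognized sentence (results that start with a keyword are logged to a matching md file) could be sketched as follows. This is not the project's implementation; the function name, the one-file-per-keyword layout, and the separator stripping are all assumptions.

```python
from pathlib import Path

def log_if_keyword(result, keywords, folder):
    """Append `result` to <folder>/<keyword>.md when it starts with a keyword; return the path."""
    text = result.strip()
    for keyword in keywords:
        if text.startswith(keyword):
            body = text[len(keyword):].lstrip(",,。. ")   # drop the separator after the keyword
            path = Path(folder) / f"{keyword}.md"
            with path.open("a", encoding="utf-8") as f:
                f.write(body + "\n")
            return path
    return None
```

Printing the returned path would show where a file was written, which may help when the client runs under sudo and its working directory or home folder differs from the user's.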

Error after a few hours of normal use; must restart to recover. The microphone is fine, and this has been happening frequently lately

Exception in thread Thread-2 (record):
Traceback (most recent call last):
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\threading.py", line 1016, in _bootstrap_inner
self.run()
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\core_client.py", line 437, in record
data = stream.read(int(0.05 * 48000))[0]
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\sounddevice.py", line 1456, in read
data, overflowed = RawInputStream.read(self, frames)
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\sounddevice.py", line 1236, in read
_check(err)
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\sounddevice.py", line 2745, in _check
raise PortAudioError(errormsg, err, hosterror_info)
sounddevice.PortAudioError: Unanticipated host error [PaErrorCode -9999]: 'There is no driver installed on your system.' [MME error 6]
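
Since the recording thread exits on the first `PortAudioError`, one possible mitigation is to reopen the input stream whenever a read fails instead of letting the thread die. The sketch below shows only the retry loop; `open_stream` is a hypothetical factory that would return an already started stream (for example one wrapping `sounddevice.InputStream`), and none of these names come from core_client.py.

```python
import time

def record_forever(open_stream, handle_chunk, should_stop,
                   frames=2400, retry_delay=1.0, sleep=time.sleep):
    """Keep reading audio and reopen the stream after any failure; return the reopen count."""
    reopen_count = 0
    while not should_stop():
        stream = None
        try:
            stream = open_stream()                 # may itself fail if the device is gone
            while not should_stop():
                handle_chunk(stream.read(frames))
        except Exception as exc:                   # e.g. sounddevice.PortAudioError
            reopen_count += 1
            print(f"audio stream failed ({exc!r}); retrying in {retry_delay}s")
            sleep(retry_delay)
        finally:
            if stream is not None:
                stream.close()
    return reopen_count
```

Whether reopening actually recovers from "There is no driver installed on your system" depends on why the device disappeared, so this is worth testing on an affected machine before relying on it.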
