HaujetZhao / CapsWriter-Offline
The offline version of CapsWriter, an easy-to-use voice input tool for PC
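Hello,
I have run into a problem and would appreciate help resolving it. The packaged build works fine on my machine. When I run from source, the server works (the client EXE can use it normally), but when the client is also run from source, pressing Caps Lock does nothing in the terminal and "开始录音" (start recording) never appears.
How can I troubleshoot this? Thank you very much.
Please support Android, since I am an Android user. If you have the capacity, support for Apple systems and Linux would also be welcome.
Could you add speech recognition for audio files?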
绑定的服务地址:0.0.0.0:6016
Process Process-2:
Traceback (most recent call last):
File "multiprocessing\process.py", line 314, in _bootstrap
File "multiprocessing\process.py", line 108, in run
File "D:\软件工具\python小工具\CapsWriter-Offline\dist\CapsWriter-Offline\util\server_init_recognizer.py", line 29, in init_recognizer
from funasr_onnx import CT_Transformer
File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
File "funasr_onnx\__init__.py", line 2, in <module>
File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
File "funasr_onnx\paraformer_bin.py", line 10, in <module>
ModuleNotFoundError: No module named 'torch'
It runs fine before packaging.
The error is as follows:
Task exception was never retrieved
future: <Task finished name='Task-14' coro=<do_recognize() done, defined at F:\TEMP\CapsWriter\core_client.py:290>
exception=ValueError('need at least one array to concatenate')>
Traceback (most recent call last):
File "F:\TEMP\CapsWriter\core_client.py", line 314, in do_recognize
samples0 = np.concatenate(samples0)
File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
Exception in thread Thread-2 (record):
Traceback (most recent call last):
File "F:\TEMP\CapsWriter\libs\threading.py", line 1016, in _bootstrap_inner
self.run()
File "F:\TEMP\CapsWriter\libs\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "F:\TEMP\CapsWriter\core_client.py", line 437, in record
data = stream.read(int(0.05 * 48000))[0]
File "F:\TEMP\CapsWriter\libs\sounddevice.py", line 1456, in read
data, overflowed = RawInputStream.read(self, frames)
File "F:\TEMP\CapsWriter\libs\sounddevice.py", line 1236, in read
_check(err)
File "F:\TEMP\CapsWriter\libs\sounddevice.py", line 2745, in _check
raise PortAudioError(errormsg, err, hosterror_info)
sounddevice.PortAudioError: Unanticipated host error [PaErrorCode -9999]: 'There is no driver installed on your system.' [MME error 6]
检测到配置文件更新,已载入 4 条中文热词
Task exception was never retrieved
future: <Task finished name='Task-16' coro=<do_recognize() done, defined at F:\TEMP\CapsWriter\core_client.py:290>
exception=ValueError('need at least one array to concatenate')>
Traceback (most recent call last):
File "F:\TEMP\CapsWriter\core_client.py", line 314, in do_recognize
samples0 = np.concatenate(samples0)
File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
Task exception was never retrieved
future: <Task finished name='Task-18' coro=<do_recognize() done, defined at F:\TEMP\CapsWriter\core_client.py:290>
exception=ValueError('need at least one array to concatenate')>
Traceback (most recent call last):
File "F:\TEMP\CapsWriter\core_client.py", line 314, in do_recognize
samples0 = np.concatenate(samples0)
File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
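The repeated `ValueError` above is what `np.concatenate` raises when it receives an empty list, which is consistent with the recording thread having died on the `PortAudioError` before it produced any audio. A minimal sketch of a guard, assuming the recorded chunks are collected in a list; `concat_samples` is a hypothetical helper, not a function from the project.

```python
import numpy as np

def concat_samples(chunks):
    """Join recorded chunks; return an empty float32 array when nothing was recorded."""
    if not chunks:
        # np.concatenate([]) would raise "need at least one array to concatenate"
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(chunks)
```

A caller could then check `samples.size == 0` and skip recognition instead of letting the task fail.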
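I have set up auto-start at boot with a hidden cmd window via nssm, but dictating without any feedback feels uncertain, so I would like an input status indicator:
when the hotkey is pressed and recognition starts, get the current cursor position
and display a GIF at that position, which gives clearer confirmation.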
标点引擎出错:local variable 'punctuations' referenced before assignment
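Could a separate hotkey be added that recognizes English only while it is pressed, to improve English recognition accuracy?
The upstream package has been fixed; the original issue is closed.
So I can install it on my NAS and every device on the local network could easily access it, saving tons of space and power.
On first run, after launching start-client and starting to dictate, it shows "服务端未连接,无法发送" (server not connected, unable to send). What should I do? Also, how many seconds does model loading take for everyone?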
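One quick way to narrow down a "server not connected" message is to test whether anything is listening on the server port at all. The sketch below assumes the server runs on the same machine and uses port 6016, the port shown in the server logs in this thread; `server_reachable` is a hypothetical helper.

```python
import socket

def server_reachable(host="127.0.0.1", port=6016, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, or unreachable
        return False
```

If this returns False, the server has probably not finished loading its models or is not running; if it returns True, the client's configured address is the next thing to check.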
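As the title says: is there a way to make it output Traditional Chinese, or could an update add that feature?
It would be more convenient for users in Taiwan.
Thanks to the author for the hard work.
Currently you have to hold Caps Lock the whole time to record and release it to stop. For a longer passage, holding the key down is inconvenient.
Could it be changed so that double-tapping Caps Lock starts recording and double-tapping again stops it?
Thanks 😙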
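The double-tap behavior requested above can be sketched as a small state machine that is fed the timestamp of each key tap. This is only an illustration: `DoubleTapToggle` and the 0.4 s window are assumptions, and wiring it to a real keyboard hook is left out.

```python
class DoubleTapToggle:
    """Flip a recording flag when two taps arrive within `max_gap` seconds."""

    def __init__(self, max_gap=0.4):
        self.max_gap = max_gap
        self.last_tap = None
        self.recording = False

    def tap(self, timestamp):
        """Register a key tap; return True if this tap toggled the recording state."""
        if self.last_tap is not None and timestamp - self.last_tap <= self.max_gap:
            self.recording = not self.recording
            # forget the pair so a third quick tap does not count as another double tap
            self.last_tap = None
            return True
        self.last_tap = timestamp
        return False
```

A single tap leaves the state unchanged, so ordinary Caps Lock presses would not start a recording by accident.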
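When a run of numbers such as 11 12 13 14 15 is read out, the result is recognized as a single number like 112345.
convert_value_num treats "十一十二十三十四十五" as one number.
With the symbol library enabled, the text is not correctly converted to something like "十一、十二、十三、十四、十五".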
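To illustrate the behavior being asked for, here is a sketch that splits a run of Chinese numerals into separate values from 0 to 99 instead of fusing them. It is not the project's `convert_value_num`; `split_numbers` and `token_value` are hypothetical names, and the greedy match means an ambiguous input such as "二十一" is read as 21 rather than 2 and 11.

```python
import re

DIGITS = {ch: i for i, ch in enumerate("零一二三四五六七八九")}
# One token is a number from 0 to 99: optional tens digit, "十", optional units digit,
# or a lone digit.
TOKEN = re.compile(r"[二三四五六七八九]?十[一二三四五六七八九]?|[零一二三四五六七八九]")

def token_value(token):
    """Convert one token such as "十五" or "二十一" to an int."""
    if "十" not in token:
        return DIGITS[token]
    tens, _, units = token.partition("十")
    return (DIGITS[tens] if tens else 1) * 10 + (DIGITS[units] if units else 0)

def split_numbers(text):
    """Return the list of numbers found in a run of Chinese numerals."""
    return [token_value(t) for t in TOKEN.findall(text)]
```

Joining the values with "、" would give the list form mentioned in the report.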
ERROR: Could not find a version that satisfies the requirement sherpa_onnx==1.5.4 (from versions: 1.8.9, 1.8.10, 1.8.11, 1.8.14, 1.9.0, 1.9.1, 1.9.3, 1.9.4)
ERROR: No matching distribution found for sherpa_onnx==1.5.4
My environment: Windows 10, Python 3.10.0.
I tried both my own Tsinghua mirror and https://mirror.sjtu.edu.cn/pypi/web/simple; both give the same error.
The corresponding requirements file:
rich
websockets
numpy==1.23.3
typeguard==2.13.3
sherpa_onnx==1.5.4
funasr_onnx==0.0.6
kaldi-native-fbank==1.17
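However, after removing the version pins, the dependencies install and the Python program runs.
The offline exe also works and is very smooth. Thanks to the author!
When suppress is true, restore_key has no effect whether it is true or false.
My taskbar is already crowded, and two more windows take up too much room. Please allow minimizing to the system tray to save space.
While the League of Legends client is running, the Caps Lock key stops working.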
Traceback (most recent call last):
File "start_server.py", line 10, in <module>
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\core_server.py", line 207, in init
asyncio.run(main())
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\asyncio\runners.py", line 44, in run
return loop.run_until_complete(main)
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\asyncio\base_events.py", line 649, in run_until_complete
return future.result()
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\core_server.py", line 170, in main
recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(
File "D:\BaiduNetdiskDownload\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\sherpa_onnx\offline_recognizer.py", line 158, in from_paraformer
self.recognizer = _Recognizer(recognizer_config)
RuntimeError: Failed to load model with error: D:\a_work\1\s\onnxruntime\core/graph/model_load_utils.h:56 onnxruntime::model_load_utils::ValidateOpsetForDomain ONNX Runtime only guarantees support for models stamped with official released onnx opset versions. Opset 19 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain com.ms.internal.nhwc is till opset 18.
[8260] Failed to execute script 'start_server' due to unhandled exception!
Could the model be run on the GPU?
Installing the client on Linux Mint fails with:
pip install -r requirements-client.txt -i https://mirror.sjtu.edu.cn/pypi/web/simple
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not open requirements file: [Errno 2] 没有那个文件或目录: 'requirements-client.txt'
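On Windows 7 (Python 3.8) the client cannot record: pressing Caps Lock does nothing, while Windows 10 works fine. Could you package a 64-bit Windows 7 client?
Currently, with the punctuation engine turned off, the output runs together with no breaks, whether I speak in one long stretch or in stops and starts.
In some scenarios, such as instant messaging or quick dictation that I will edit myself (subtitles, for example), full punctuation is not needed.
I would like commas (or spaces) inserted based on pause length, and for longer pauses a full stop or a line break.
Line breaks could be used directly in chat apps and when entering video subtitles, saving later steps.
Also, please package the two models separately so that the punctuation engine need not be downloaded. This helps people who want to try the tool or use it temporarily, or who have limited bandwidth or need portability.
Finally, thank you for your work. It performed very well in my trial and is the software I have always wanted.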
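The pause-based punctuation requested above could look roughly like the following. It assumes the recognizer can supply each segment with start and end times in seconds, which the thread does not confirm; `join_with_pauses` and all three thresholds are hypothetical.

```python
def join_with_pauses(segments, comma_gap=0.5, period_gap=1.5, newline_gap=3.0):
    """Join (text, start, end) segments, choosing a separator from the pause length."""
    out = []
    for i, (text, start, end) in enumerate(segments):
        out.append(text)
        if i + 1 == len(segments):
            break
        gap = segments[i + 1][1] - end  # silence before the next segment
        if gap >= newline_gap:
            out.append("\n")
        elif gap >= period_gap:
            out.append("。")
        elif gap >= comma_gap:
            out.append(",")
    return "".join(out)
```

Gaps shorter than `comma_gap` add nothing, so continuous speech stays joined as it is today.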
https://translate.alibaba.com/#core-translation
The recognized text always comes with a timestamp attached, for example:
2023-08-11 19:57:27
体验行业领先的智能翻译,支持全球214种语言生根,多个垂直领域。
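A problem with Alibaba speech recognition when used online in the browser: Firefox keeps picking the wrong driver, selecting the voice mouse's driver, so every time I have to dismiss the permission prompt and refresh once before it works.
Recently I have also been using https://tingwu.aliyun.com/home, and several recordings captured no audio at all.
The 咪鼠语音鼠 (voice mouse) client has never been updated, emails and the like get no reply, and it crashes constantly, so it is effectively unusable.
Voice input in the iFlytek and Sogou input method desktop clients works poorly and has not been updated in a long time.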
`Process Process-2:
Traceback (most recent call last):
File "/home/programmer/.conda/envs/capswriter/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/programmer/.conda/envs/capswriter/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/programmer/Projects/PycharmProjects/CapsWriter-Offline/util/server_init_recognizer.py", line 35, in init_recognizer
recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(
File "/home/programmer/.conda/envs/capswriter/lib/python3.8/site-packages/sherpa_onnx/offline_recognizer.py", line 184, in from_paraformer
self.recognizer = _Recognizer(recognizer_config)
RuntimeError: Failed to load model because protobuf parsing failed.
`
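Python 3.8; all dependencies are confirmed installed; both model files are present under the models directory; no NVIDIA GPU, no CUDA.
Type: Trojan - HEUR/QVM202.0.998A.Malware.Gen
Description: A Trojan is malware disguised as a normal file that steals private data such as accounts and passwords.
Scan engine: cloud signature engine
File path: D:\插件\CapsWriter-Offline\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\start_client.exe
File size: 338.4K (346,540 bytes)
File fingerprint (MD5): 0079a4538495fc9b1903cfc81670f558
Recommended action: quarantine the file
Is this normal, or was the file infected during download?
I plan to deploy the server on CentOS. While installing the dependencies it says the cmake package must be installed, but installing it is extremely slow. Is there a workaround?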
(keyaudio) zhangyiqing@inin:~/CapsWriter-Offline$ python core_server.py
──────────────────────────────────────────────────────────── CapsWriter Offline Server ────────────────────────────────────────────────────────────
项目地址:https://github.com/HaujetZhao/CapsWriter-Offline
当前基文件夹:/home/zhangyiqing/CapsWriter-Offline
绑定的服务地址:0.0.0.0:6016
模块加载完成
语音模型载入完成
Process Process-1:
Traceback (most recent call last):
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/zhangyiqing/CapsWriter-Offline/util/server_init_recognizer.py", line 42, in init_recognizer
punc_model = CT_Transformer(ModelPaths.punc_model_dir, quantize=True)
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/site-packages/funasr_onnx/punc_bin.py", line 69, in init
self.ort_infer = OrtInferSession(model_file, device_id, intra_op_num_threads=intra_op_num_threads)
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/site-packages/funasr_onnx/utils/utils.py", line 209, in init
self.session = InferenceSession(model_file,
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in init
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/home/zhangyiqing/miniconda3/envs/keyaudio/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from models/punc_ct-transformer_cn-en/model_quant.onnx failed:Protobuf parsing failed.
^C
再见!
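"Protobuf parsing failed" means the file could not be read as an ONNX model. One possible cause, which is an assumption here since the reports do not confirm it, is that the file is a truncated download or a Git LFS pointer left in place of the real model. The sketch below checks for the pointer case using the fixed first line of the LFS pointer format; `describe_model_file` is a hypothetical helper.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"

def looks_like_lfs_pointer(path):
    """True when the file is a small Git LFS pointer rather than real model data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER

def describe_model_file(path):
    """Return a short status string for a model file."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if looks_like_lfs_pointer(p):
        return "git-lfs pointer"
    return f"{p.stat().st_size} bytes"
```

For example, `describe_model_file("models/punc_ct-transformer_cn-en/model_quant.onnx")` uses the path from the error message above; a size far smaller than the published model would point to an incomplete download.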
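Does the author plan to provide a Docker image for the server?
When I tried to generate subtitles for a song, I found two problems:
1. Many lines were not recognized; only about a quarter were, which I suspect is caused by the music.
2. The song opens with a long "la la la" chant, but deduplication compressed it to two characters and it was not made a line of its own. The deduplication algorithm may be too aggressive.
My generated result is below:
赵雷 - 南方姑娘.srt.txt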
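A gentler alternative to collapsing repeats entirely is to cap the length of a run. The sketch below is not the project's deduplication code, which is not shown in this thread; `cap_repeats` and the default limit of 3 are assumptions for illustration.

```python
import re

def cap_repeats(text, max_run=3):
    """Shorten any run of one repeated character to at most `max_run` copies."""
    # (.)\1{max_run,} matches a character followed by max_run or more copies of itself
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, text)
```

With this approach a long chant keeps a visible trace in the subtitle, while alternating phrases such as "好的好的" are left untouched because no single character repeats consecutively.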
C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\funasr_onnx\punc_bin.py:279: FutureWarning: In the future `np.bool` will be defined as the corresponding NumPy scalar.
def vad_mask(self, size, vad_pos, dtype=np.bool):
Process Process-2:
Traceback (most recent call last):
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 314, in _bootstrap
self.run()
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "d:\软件工具\python小工具\CapsWriter-Offline\util\server_init_recognizer.py", line 29, in init_recognizer
from funasr_onnx import CT_Transformer
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\funasr_onnx\__init__.py", line 5, in <module>
from .punc_bin import CT_Transformer
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\funasr_onnx\punc_bin.py", line 166, in <module>
class CT_Transformer_VadRealtime(CT_Transformer):
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\funasr_onnx\punc_bin.py", line 279, in CT_Transformer_VadRealtime
def vad_mask(self, size, vad_pos, dtype=np.bool):
File "C:\Users\wang\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\__init__.py", line 338, in __getattr__
raise AttributeError(former_attrs[attr])
AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
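The error above appears when a package that still writes `np.bool` is imported under a NumPy release that removed the alias (1.24 and the later 1.x releases). The requirements file quoted earlier in this thread pins numpy==1.23.3, which predates the removal, so installing that version is the straightforward fix. As a stopgap, the alias can be restored before the import; this is a workaround sketch, not something the project or funasr_onnx documents.

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # probing np.bool can itself emit a warning on some NumPy versions
    warnings.simplefilter("ignore")
    if not hasattr(np, "bool"):
        # restore the removed alias so older packages can still be imported
        np.bool = bool

# The affected import must come after the shim, e.g.:
# from funasr_onnx import CT_Transformer

mask = np.ones((2, 2), dtype=np.bool)  # works whether the alias is native or restored
```

Patching a module attribute like this affects the whole process, so pinning the NumPy version is usually the cleaner choice.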
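My idea is to provide a program or script that registers the server and client as a service so they start at boot.
Having to start them by hand every time, with extra icons in the taskbar, is annoying.
I normally use Shift for capitals and have remapped the Caps Lock key. I changed the hotkey in core_client.py to ctrl, space, or esc, and found that each of them turns the Caps Lock light on, so I then have to press Caps Lock once by hand to restore it. Could the code be adjusted so that non-default keys also preserve the Caps Lock state?
Computer: MacBook Air M2
macOS: 14.1
Every time the client is closed, an error is reported: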
接客了:<websockets.legacy.server.WebSocketServerProtocol object at 0x1537d9390>
标点引擎出错:cannot access local variable 'punctuations' where it is not associated with a value
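I would like two coexisting modes, as in iFlytek or Sogou: long text and short dialogue. In short-dialogue mode, speech is transcribed directly without regard to silence; in long-text mode, recording stops automatically on silence or when no voice is detected. I noticed the author has previously implemented speeding up the voiceless parts of a video, so the voice detection used there could serve as a reference.
If voice could be detected without having to release a key, I could finish the first draft of a speech half asleep (lying in bed, touching nothing and looking at nothing; just kidding), which is how I used the Sogou input method before I found this tool. With a mode that only stops when I say so, I worry that a long recording will take too long to transcribe and fail, or that after an error I will not remember the preceding text and cannot fix it.
I will also try to implement this myself, but with my limited skills, even if I get the logic working I cannot do automatic stopping based on voice detection; I can only stop automatically once the volume has stayed below a threshold for some time.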
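The volume-threshold variant described in the last paragraph can be sketched with an RMS check over the most recent chunks. The tracebacks in this thread show the client reading 0.05 s chunks, so 20 quiet chunks correspond to about one second; `should_stop`, the threshold of 0.01, and the assumption that samples are floats in the range -1 to 1 are all illustrative.

```python
import numpy as np

def should_stop(chunks, threshold=0.01, quiet_chunks=20):
    """True when the last `quiet_chunks` chunks all have RMS volume below `threshold`."""
    if len(chunks) < quiet_chunks:
        return False  # not enough audio yet to judge
    recent = chunks[-quiet_chunks:]
    return all(np.sqrt(np.mean(np.square(c))) < threshold for c in recent)
```

This reacts to loudness only, so steady background noise above the threshold would keep the recording open; real voice activity detection would be needed to handle that case.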
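Could a GUI be added that launches the client and server with one click and runs in the background (no taskbar entry; a tray icon is fine)?
Whether as a standalone app or as an input scheme for Rime or similar, is this technically feasible? Demand on mobile is clearly higher than on PC.
I am on Windows 10 and have updated to the latest version and models.
I noticed that the new punctuation model takes 2.5 times as long to load: the old one took 20 seconds, the new one takes 50.
Is this normal?
While the program was running, I edited the hot-word txt files, both the Chinese and the English one.
After that, voice input stopped working: everything I say is recognized as something like "hello,好的好的好的好的".
I have rebooted and reinstalled the software with no effect, and I do not know what went wrong.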
Steps to reproduce:
git clone ssh://git@github.com/HaujetZhao/CapsWriter-Offline
cd CapsWriter-Offline
sudo pip install --breaking-system-packages -r requirements-client.txt;sudo pip install --breaking-system-packages -r requirements-server.txt
sudo python ./start_server.py
sudo python ./start_client.py
Caps Lock
Say a sentence that contains a keyword
$ sudo python3 ./start_server.py
───────────────────────────────────────────────────── CapsWriter Offline Server ──────────────────────────────────────────────────────
项目地址:https://github.com/HaujetZhao/CapsWriter-Offline
当前基文件夹:/home/ayaka/CapsWriter-Offline
绑定的服务地址:0.0.0.0:6016
模块加载完成
语音模型载入完成
标点模型载入完成
模型加载耗时 32.75s
────────────────────────────────────────────────────────────── 开始服务 ──────────────────────────────────────────────────────────────
接客了:<websockets.legacy.server.WebSocketServerProtocol object at 0x7fb9be319190>
识别结果:
日记123健康。
识别结果:
重要123。
识别结果:
健康,今天中午吃的大米炒饭。
识别结果:
重要,什么问题可以用什么方法解决?
识别结果:
你好。
ConnectionClosed...
接客了:<websockets.legacy.server.WebSocketServerProtocol object at 0x7fb9bfc963d0>
识别结果:
学习健康重要。
ConnectionClosed...
接客了:<websockets.legacy.server.WebSocketServerProtocol object at 0x7fb9a92b9cd0>
识别结果:
你好。
识别结果:
健康。
识别结果:
你好。
识别结果:
重要健康学习检测文件,防止文键管理词每章一个,绘本省略。
识别结果:
当识频结果以关键字开头时会被记录到相应的 md 文件中。
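The last recognized sentence in the log describes the expected behavior: a result that begins with a keyword should be recorded to the matching md file. The prefix check can be sketched as follows; `route_by_keyword` is a hypothetical helper and not the project's implementation, and the keywords are taken from the log above.

```python
def route_by_keyword(text, keywords):
    """Return (keyword, rest) if text starts with a keyword, else (None, text)."""
    # try longer keywords first so that a short keyword cannot shadow a longer one
    for kw in sorted(keywords, key=len, reverse=True):
        if text.startswith(kw):
            # drop the punctuation that the recognizer places after the keyword
            return kw, text[len(kw):].lstrip(",,。 ")
    return None, text
```

Running the logged results through such a check shows which of them should have reached a file: "健康,今天中午吃的大米炒饭。" matches, while "你好。" does not.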
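German is my second language from studying abroad. How would I go about adding a German language model?
How does the recognition quality compare with the online Alibaba Cloud version? Which is better?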
Exception in thread Thread-2 (record):
Traceback (most recent call last):
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\threading.py", line 1016, in _bootstrap_inner
self.run()
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\core_client.py", line 437, in record
data = stream.read(int(0.05 * 48000))[0]
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\sounddevice.py", line 1456, in read
data, overflowed = RawInputStream.read(self, frames)
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\sounddevice.py", line 1236, in read
_check(err)
File "D:\YuYinShuRu\CapsWriter-Offline-Win10-64-Pyinstaller-Without-Models\libs\sounddevice.py", line 2745, in _check
raise PortAudioError(errormsg, err, hosterror_info)
sounddevice.PortAudioError: Unanticipated host error [PaErrorCode -9999]: 'There is no driver installed on your system.' [MME error 6]
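The "no driver installed" message suggests that no usable recording device was available when the stream was read. A sketch of a pre-flight check over a device list is shown below. It assumes entries shaped like those returned by `sounddevice.query_devices()`, which are dicts with `name` and `max_input_channels` keys; `first_input_device` is a hypothetical helper.

```python
def first_input_device(devices):
    """Return (index, name) of the first device that can record, or None."""
    for index, dev in enumerate(devices):
        if dev.get("max_input_channels", 0) > 0:
            return index, dev["name"]
    return None

# Possible use with the real library (not executed here):
#   import sounddevice as sd
#   print(first_input_device(sd.query_devices()))
```

A result of `None` would mean no microphone is visible to PortAudio, which points to a system audio setting or driver problem rather than to the client itself.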