Giter VIP home page Giter VIP logo

audionotes's Introduction

AudioNotes

基于 FunASR 和 Qwen2 构建的音视频转结构化笔记系统

能够快速提取音视频的内容,并且调用大模型进行整理,成为一份结构化的markdown笔记,方便快速阅读

FunASR: https://github.com/modelscope/FunASR

Qwen2: https://ollama.com/library/qwen2

效果展示

音视频识别和整理

image

与音视频内容对话

image

使用方法

① 安装 Ollama

下载对应系统的 Ollama 安装包进行安装

https://ollama.com/download

② 拉取模型

我以 阿里的千问2 7b 为例 https://ollama.com/library/qwen2

ollama pull qwen2:7b

③ 部署服务

有两种部署方式,一种是使用 Docker 部署,另一种是本地部署

Docker部署(推荐)🐳

curl -fsSL https://github.com/harry0703/AudioNotes/raw/main/docker-compose.yml -o docker-compose.yml
docker-compose up

docker 启动后,访问 http://localhost:15433/

登录账号为 admin,密码为 admin (可以在 docker-compose.yml 文件里面修改)

本地部署 📦

需要有可访问的 postgresql 数据库

conda create -n AudioNotes python=3.10 -y
conda activate AudioNotes
git clone https://github.com/harry0703/AudioNotes.git
cd AudioNotes
pip install -r requirements.txt

.env.example 重命名为 .env,修改相关配置信息

chainlit run main.py

服务启动后,访问 http://localhost:8000/

登录账号为 admin,密码为 admin (可以在 .env 文件里面修改)

audionotes's People

Contributors

harry0703 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

audionotes's Issues

大模型对于录音的总结能力

大佬好,我发现多提问2次,大模型就不能很好的总结录音了,开始自己编造答案了,有什么办法解决吗?谢谢

识别视频时异常

环境:
本地搭建
python 3.10

操作方式,选择一个视频文件识别时报错。这个视频只有背景音乐,视频内容都是文字。

2024-07-24 11:46:31,913 - modelscope - INFO - Use user-specified model revision: master
  File "G:\Anaconda-EVN\AudioNotes\lib\site-packages\chainlit\utils.py", line 40, in wrapper
    return await user_function(**params_values)
  File "F:\CodeProject\\AudioNotes\main.py", line 72, in on_chat_start
    asr_result = await transcribe_file(file)
  File "F:\CodeProject\\AudioNotes\main.py", line 56, in transcribe_file
    result = await loop.run_in_executor(None, funasr.transcribe, uploaded_file.path)
  File "G:\Anaconda-EVN\AudioNotes\lib\asyncio\futures.py", line 285, in __await__
    yield self  # This tells Task to wait for completion.
  File "G:\Anaconda-EVN\AudioNotes\lib\asyncio\tasks.py", line 304, in __wakeup
    future.result()
  File "G:\Anaconda-EVN\AudioNotes\lib\asyncio\futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "G:\Anaconda-EVN\AudioNotes\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "F:\CodeProject\\AudioNotes\app\services\asr_funasr.py", line 56, in transcribe
    text = res[0]['text']
IndexError: list index out of range

本地部署后, 访问不了http://localhost:8000/,拒绝连接,是什么原因啊

操作步骤:
1.ollama pull qwen2:7b 成功,http://localhost:11434/能输出Ollama is running.
2.PG数据库安装成功,自己测试数据库名,用户名,密码没有问题.
3.执行main.py输出是:
2024-07-30 15:05:08 - Loaded .env file
2024-07-30 15:05:14 - new registry table has been added: preprocessor_classes
2024-07-30 15:05:15 - new registry table has been added: adaptor_classes
2024-07-30 15:05:15 - new registry table has been added: lid_predictor_classes
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1722323117.217673 22812 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_client, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
请问这个输出是正确的吗,为啥打不开http://localhost:8000/这个网址,打印端口号占用情况8000端口没有任何东西,

exec format error

When running docker-compose, I get:
audio_notes_webui | exec /usr/local/bin/chainlit: exec format error

本地搭建出现问题

conda create -n AudioNotes python=3.10 -y conda activate AudioNotes git clone https://github.com/harry0703/AudioNotes.git cd AudioNotes pip install -r requirements.txt

requirements.txt 这个文件代码里没提供啊?我卡在这步了

exec /usr/local/bin/chainlit: exec format error

docker compose up
...
[+] Running 4/3
 ✔ Network audionotes_audio_notes                                                                                                                       Created                0.2s
 ✔ Container audio_notes_pg                                                                                                                             Created                1.0s
 ✔ Container audio_notes_webui                                                                                                                          Created                0.0s
 ! webui The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested                        0.0s
Attaching to audio_notes_pg, audio_notes_webui
audio_notes_pg     | The files belonging to this database system will be owned by user "postgres".
audio_notes_pg     | This user must also own the server process.
audio_notes_pg     |
audio_notes_pg     | The database cluster will be initialized with locale "en_US.utf8".
audio_notes_pg     | The default database encoding has accordingly been set to "UTF8".
audio_notes_pg     | The default text search configuration will be set to "english".
audio_notes_pg     |
audio_notes_pg     | Data page checksums are disabled.
audio_notes_pg     |
audio_notes_pg     | fixing permissions on existing directory /var/lib/postgresql/data ... ok
audio_notes_pg     | creating subdirectories ... ok
audio_notes_pg     | selecting dynamic shared memory implementation ... posix
audio_notes_pg     | selecting default max_connections ... 100
audio_notes_pg     | selecting default shared_buffers ... 128MB
audio_notes_pg     | selecting default time zone ... Etc/UTC
audio_notes_pg     | creating configuration files ... ok
audio_notes_pg     | running bootstrap script ... ok
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format  @error
audio_notes_pg     | performing post-bootstrap initialization ... ok
audio_notes_pg     | initdb: warning: enabling "trust" authentication for local connections
audio_notes_pg     | initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
audio_notes_pg     | syncing data to disk ... ok
audio_notes_pg     |
audio_notes_pg     |
audio_notes_pg     | Success. You can now start the database server using:
audio_notes_pg     |
audio_notes_pg     |     pg_ctl -D /var/lib/postgresql/data -l logfile start
audio_notes_pg     |
audio_notes_pg     | waiting for server to start....2024-07-20 16:43:24.007 UTC [49] LOG:  starting PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
audio_notes_pg     | 2024-07-20 16:43:24.008 UTC [49] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
audio_notes_pg     | 2024-07-20 16:43:24.012 UTC [52] LOG:  database system was shut down at 2024-07-20 16:43:23 UTC
audio_notes_pg     | 2024-07-20 16:43:24.015 UTC [49] LOG:  database system is ready to accept connections
audio_notes_pg     |  done
audio_notes_pg     | server started
audio_notes_webui exited with code 0
audio_notes_pg     | CREATE DATABASE
audio_notes_pg     |
audio_notes_pg     |
audio_notes_pg     | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
audio_notes_pg     |
audio_notes_pg     | waiting for server to shut down...2024-07-20 16:43:24.222 UTC [49] LOG:  received fast shutdown request
audio_notes_pg     | .2024-07-20 16:43:24.225 UTC [49] LOG:  aborting any active transactions
audio_notes_pg     | 2024-07-20 16:43:24.231 UTC [49] LOG:  background worker "logical replication launcher" (PID 55) exited with exit code 1
audio_notes_pg     | 2024-07-20 16:43:24.232 UTC [50] LOG:  shutting down
audio_notes_pg     | 2024-07-20 16:43:24.234 UTC [50] LOG:  checkpoint starting: shutdown immediate
audio_notes_pg     | 2024-07-20 16:43:24.284 UTC [50] LOG:  checkpoint complete: wrote 922 buffers (5.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.033 s, sync=0.012 s, total=0.053 s; sync files=301, longest=0.003 s, average=0.001 s; distance=4255 kB, estimate=4255 kB; lsn=0/1912048, redo lsn=0/1912048
audio_notes_pg     | 2024-07-20 16:43:24.297 UTC [49] LOG:  database system is shut down
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format error
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format error
audio_notes_pg     |  done
audio_notes_pg     | server stopped
audio_notes_pg     |
audio_notes_pg     | PostgreSQL init process complete; ready for start up.
audio_notes_pg     |
audio_notes_pg     | 2024-07-20 16:43:24.372 UTC [1] LOG:  starting PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
audio_notes_pg     | 2024-07-20 16:43:24.372 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
audio_notes_pg     | 2024-07-20 16:43:24.372 UTC [1] LOG:  listening on IPv6 address "::", port 5432
audio_notes_pg     | 2024-07-20 16:43:24.375 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
audio_notes_pg     | 2024-07-20 16:43:24.381 UTC [65] LOG:  database system was shut down at 2024-07-20 16:43:24 UTC
audio_notes_pg     | 2024-07-20 16:43:24.391 UTC [1] LOG:  database system is ready to accept connections
audio_notes_webui exited with code 0
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format error
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format error
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format error
audio_notes_webui exited with code 1
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format error
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format error
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format error
...
audio_notes_webui  | exec /usr/local/bin/chainlit: exec format error

decoding, empty speech

2024-08-01 11:12:51 - Your app is available at http://localhost:8000
2024-08-01 11:12:53 - Translated markdown file for zh-CN not found. Defaulting to chainlit.md.
You are using the latest version of funasr-1.1.4
2024-08-01 11:12:58 - download models from model hub: ms
2024-08-01 11:12:58,680 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2024-08-01 11:12:58,680 - modelscope - INFO - Use user-specified model revision: master
2024-08-01 11:13:00 - Loading pretrained params from C:\Users\YUMEI.cache\modelscope\hub\iic\speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch\model.pt
2024-08-01 11:13:00 - ckpt: C:\Users\YUMEI.cache\modelscope\hub\iic\speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch\model.pt
2024-08-01 11:13:01 - scope_map: ['module.', 'None']
2024-08-01 11:13:01 - excludes: None
2024-08-01 11:13:01 - Loading ckpt: C:\Users\YUMEI.cache\modelscope\hub\iic\speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch\model.pt, status:
2024-08-01 11:13:03 - Building VAD model.
2024-08-01 11:13:03 - download models from model hub: ms
2024-08-01 11:13:04,067 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2024-08-01 11:13:04,067 - modelscope - INFO - Use user-specified model revision: master
2024-08-01 11:13:04 - Loading pretrained params from C:\Users\YUMEI.cache\modelscope\hub\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch\model.pt
2024-08-01 11:13:04 - ckpt: C:\Users\YUMEI.cache\modelscope\hub\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch\model.pt
2024-08-01 11:13:04 - scope_map: ['module.', 'None']
2024-08-01 11:13:04 - excludes: None
2024-08-01 11:13:04 - Loading ckpt: C:\Users\YUMEI.cache\modelscope\hub\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch\model.pt, status:
2024-08-01 11:13:04 - Building punc model.
2024-08-01 11:13:04 - download models from model hub: ms
2024-08-01 11:13:04,669 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2024-08-01 11:13:04,670 - modelscope - INFO - Use user-specified model revision: master
Building prefix dict from the default dictionary ...
2024-08-01 11:13:06 - Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\YUMEI\AppData\Local\Temp\jieba.cache
2024-08-01 11:13:06 - Loading model from cache C:\Users\YUMEI\AppData\Local\Temp\jieba.cache
Loading model cost 0.403 seconds.
2024-08-01 11:13:07 - Loading model cost 0.403 seconds.
Prefix dict has been built successfully.
2024-08-01 11:13:07 - Prefix dict has been built successfully.
2024-08-01 11:13:23 - Loading pretrained params from C:\Users\YUMEI.cache\modelscope\hub\iic\punc_ct-transformer_cn-en-common-vocab471067-large\model.pt
2024-08-01 11:13:23 - ckpt: C:\Users\YUMEI.cache\modelscope\hub\iic\punc_ct-transformer_cn-en-common-vocab471067-large\model.pt
2024-08-01 11:13:23 - scope_map: ['module.', 'None']
2024-08-01 11:13:23 - excludes: None
2024-08-01 11:13:25 - Loading ckpt: C:\Users\YUMEI.cache\modelscope\hub\iic\punc_ct-transformer_cn-en-common-vocab471067-large\model.pt, status:
2024-08-01 11:13:25 - Building SPK model.
2024-08-01 11:13:25 - download models from model hub: ms
2024-08-01 11:13:26,765 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2024-08-01 11:13:26,765 - modelscope - INFO - Use user-specified model revision: master
Detect model requirements, begin to install it: C:\Users\YUMEI.cache\modelscope\hub\iic\speech_campplus_sv_zh-cn_16k-common\requirements.txt
install model requirements successfully
2024-08-01 11:13:28 - Loading pretrained params from C:\Users\YUMEI.cache\modelscope\hub\iic\speech_campplus_sv_zh-cn_16k-common\campplus_cn_common.bin
2024-08-01 11:13:28 - ckpt: C:\Users\YUMEI.cache\modelscope\hub\iic\speech_campplus_sv_zh-cn_16k-common\campplus_cn_common.bin
2024-08-01 11:13:28 - scope_map: ['module.', 'None']
2024-08-01 11:13:28 - excludes: None
2024-08-01 11:13:28 - Loading ckpt: C:\Users\YUMEI.cache\modelscope\hub\iic\speech_campplus_sv_zh-cn_16k-common\campplus_cn_common.bin, status:
rtf_avg: 0.148: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.56s/it]
0%| | 0/1 [00:00<?, ?it/s]2024-08-01 11:13:30 - decoding, utt: be11b39b-1f43-48d0-95ab-ca3ad43f3b79, empty speech

GPU加速

您好,非常感谢昨天的问题解答。想再请教一下语音识别阶段怎么换成GPU的,感觉CPU还是偏慢,实用有些受局限。

新建对话会删除当前的对话信息

想开多个窗口,每个窗口一个录音那种。点击右上角「新建对话」,弹窗显示「这将清除当前消息并开始新的对话。」,是哪里操作不对吗?

8381722388839_ pic

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.