同步github官方更新截止Commits on Apr 18, 2025
a9c36c76e569107b5a39b3de8afd6e016b24d662
Showing
23 changed files
with
1003 additions
and
99 deletions
README-EN.md
0 → 100644
1 | +Real-time interactive streaming digital human enables synchronous audio and video dialogue. It can basically achieve commercial effects. | ||
2 | + | ||
3 | +[Effect of wav2lip](https://www.bilibili.com/video/BV1scwBeyELA/) | [Effect of ernerf](https://www.bilibili.com/video/BV1G1421z73r/) | [Effect of musetalk](https://www.bilibili.com/video/BV1gm421N7vQ/) | ||
4 | + | ||
5 | +## News | ||
6 | +- December 8, 2024: Improved multi-concurrency, and the video memory does not increase with the number of concurrent connections. | ||
7 | +- December 21, 2024: Added model warm-up for wav2lip and musetalk to solve the problem of stuttering during the first inference. Thanks to [@heimaojinzhangyz](https://github.com/heimaojinzhangyz) | ||
8 | +- December 28, 2024: Added the digital human model Ultralight-Digital-Human. Thanks to [@lijihua2017](https://github.com/lijihua2017) | ||
9 | +- February 7, 2025: Added fish-speech tts | ||
10 | +- February 21, 2025: Added the open-source model wav2lip256. Thanks to @不蠢不蠢 | ||
11 | +- March 2, 2025: Added Tencent's speech synthesis service | ||
12 | +- March 16, 2025: Supports mac gpu inference. Thanks to [@GcsSloop](https://github.com/GcsSloop) | ||
13 | + | ||
14 | +## Features | ||
15 | +1. Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human | ||
16 | +2. Supports voice cloning | ||
17 | +3. Supports interrupting the digital human while it is speaking | ||
18 | +4. Supports full-body video stitching | ||
19 | +5. Supports rtmp and webrtc | ||
20 | +6. Supports video arrangement: Play custom videos when not speaking | ||
21 | +7. Supports multi-concurrency | ||
22 | + | ||
23 | +## 1. Installation | ||
24 | + | ||
25 | +Tested on Ubuntu 20.04, Python 3.10, Pytorch 1.12 and CUDA 11.3 | ||
26 | + | ||
27 | +### 1.1 Install dependency | ||
28 | + | ||
29 | +```bash | ||
30 | +conda create -n nerfstream python=3.10 | ||
31 | +conda activate nerfstream | ||
32 | +# If the cuda version is not 11.3 (confirm the version by running nvidia-smi), install the corresponding version of pytorch according to <https://pytorch.org/get-started/previous-versions/> | ||
33 | +conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch | ||
34 | +pip install -r requirements.txt | ||
35 | +# If you need to train the ernerf model, install the following libraries | ||
36 | +# pip install "git+https://github.com/facebookresearch/pytorch3d.git" | ||
37 | +# pip install tensorflow-gpu==2.8.0 | ||
38 | +# pip install --upgrade "protobuf<=3.20.1" | ||
39 | +``` | ||
40 | +Common installation issues [FAQ](https://livetalking-doc.readthedocs.io/en/latest/faq.html) | ||
41 | +For setting up the linux cuda environment, you can refer to this article https://zhuanlan.zhihu.com/p/674972886 | ||
42 | + | ||
43 | + | ||
44 | +## 2. Quick Start | ||
45 | +- Download the models | ||
46 | +Quark Cloud Disk <https://pan.quark.cn/s/83a750323ef0> | ||
47 | +Google Drive <https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing> | ||
48 | +Copy wav2lip256.pth to the models folder of this project and rename it to wav2lip.pth; | ||
49 | +Extract wav2lip256_avatar1.tar.gz and copy the entire folder to the data/avatars folder of this project. | ||
50 | +- Run | ||
51 | +python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1 | ||
52 | +Open http://serverip:8010/webrtcapi.html in a browser. First click'start' to play the digital human video; then enter any text in the text box and submit it. The digital human will broadcast this text. | ||
53 | +<font color=red>The server side needs to open ports tcp:8010; udp:1-65536</font> | ||
54 | +If you need to purchase a high-definition wav2lip model for commercial use, [Link](https://livetalking-doc.readthedocs.io/zh-cn/latest/service.html#wav2lip). | ||
55 | + | ||
56 | +- Quick experience | ||
57 | +<https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3> Create an instance with this image to run it. | ||
58 | + | ||
59 | +If you can't access huggingface, before running | ||
60 | +``` | ||
61 | +export HF_ENDPOINT=https://hf-mirror.com | ||
62 | +``` | ||
63 | + | ||
64 | + | ||
65 | +## 3. More Usage | ||
66 | +Usage instructions: <https://livetalking-doc.readthedocs.io/en/latest> | ||
67 | + | ||
68 | +## 4. Docker Run | ||
69 | +No need for the previous installation, just run directly. | ||
70 | +``` | ||
71 | +docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v | ||
72 | +``` | ||
73 | +The code is in /root/metahuman-stream. First, git pull to get the latest code, and then execute the commands as in steps 2 and 3. | ||
74 | + | ||
75 | +The following images are provided: | ||
76 | +- autodl image: <https://www.codewithgpu.com/i/lipku/metahuman-stream/base> | ||
77 | +[autodl Tutorial](https://livetalking-doc.readthedocs.io/en/latest/autodl/README.html) | ||
78 | +- ucloud image: <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3> | ||
79 | +Any port can be opened, and there is no need to deploy an srs service additionally. | ||
80 | +[ucloud Tutorial](https://livetalking-doc.readthedocs.io/en/latest/ucloud/ucloud.html) | ||
81 | + | ||
82 | + | ||
83 | +## 5. TODO | ||
84 | +- [x] Added chatgpt to enable digital human dialogue | ||
85 | +- [x] Voice cloning | ||
86 | +- [x] Replace the digital human with a video when it is silent | ||
87 | +- [x] MuseTalk | ||
88 | +- [x] Wav2Lip | ||
89 | +- [x] Ultralight-Digital-Human | ||
90 | + | ||
91 | +--- | ||
92 | +If this project is helpful to you, please give it a star. Friends who are interested are also welcome to join in and improve this project together. | ||
93 | +* Knowledge Planet: https://t.zsxq.com/7NMyO, where high-quality common problems, best practice experiences, and problem solutions are accumulated. | ||
94 | +* WeChat Official Account: Digital Human Technology | ||
95 | + |
1 | -Real time interactive streaming digital human, realize audio video synchronous dialogue. It can basically achieve commercial effects. | 1 | +[English](./README-EN.md) | 中文版 |
2 | 实时交互流式数字人,实现音视频同步对话。基本可以达到商用效果 | 2 | 实时交互流式数字人,实现音视频同步对话。基本可以达到商用效果 |
3 | +[wav2lip效果](https://www.bilibili.com/video/BV1scwBeyELA/) | [ernerf效果](https://www.bilibili.com/video/BV1G1421z73r/) | [musetalk效果](https://www.bilibili.com/video/BV1gm421N7vQ/) | ||
3 | 4 | ||
4 | -[ernerf 效果](https://www.bilibili.com/video/BV1PM4m1y7Q2/) [musetalk 效果](https://www.bilibili.com/video/BV1gm421N7vQ/) [wav2lip 效果](https://www.bilibili.com/video/BV1Bw4m1e74P/) | ||
5 | - | ||
6 | -## 为避免与 3d 数字人混淆,原项目 metahuman-stream 改名为 livetalking,原有链接地址继续可用 | 5 | +## 为避免与3d数字人混淆,原项目metahuman-stream改名为livetalking,原有链接地址继续可用 |
7 | 6 | ||
8 | ## News | 7 | ## News |
9 | - | ||
10 | - 2024.12.8 完善多并发,显存不随并发数增加 | 8 | - 2024.12.8 完善多并发,显存不随并发数增加 |
11 | -- 2024.12.21 添加 wav2lip、musetalk 模型预热,解决第一次推理卡顿问题。感谢@heimaojinzhangyz | ||
12 | -- 2024.12.28 添加数字人模型 Ultralight-Digital-Human。 感谢@lijihua2017 | ||
13 | -- 2025.2.7 添加 fish-speech tts | ||
14 | -- 2025.2.21 添加 wav2lip256 开源模型 感谢@不蠢不蠢 | 9 | +- 2024.12.21 添加wav2lip、musetalk模型预热,解决第一次推理卡顿问题。感谢[@heimaojinzhangyz](https://github.com/heimaojinzhangyz) |
10 | +- 2024.12.28 添加数字人模型Ultralight-Digital-Human。 感谢[@lijihua2017](https://github.com/lijihua2017) | ||
11 | +- 2025.2.7 添加fish-speech tts | ||
12 | +- 2025.2.21 添加wav2lip256开源模型 感谢@不蠢不蠢 | ||
15 | - 2025.3.2 添加腾讯语音合成服务 | 13 | - 2025.3.2 添加腾讯语音合成服务 |
14 | +- 2025.3.16 支持mac gpu推理,感谢[@GcsSloop](https://github.com/GcsSloop) | ||
16 | 15 | ||
17 | ## Features | 16 | ## Features |
18 | - | ||
19 | 1. 支持多种数字人模型: ernerf、musetalk、wav2lip、Ultralight-Digital-Human | 17 | 1. 支持多种数字人模型: ernerf、musetalk、wav2lip、Ultralight-Digital-Human |
20 | 2. 支持声音克隆 | 18 | 2. 支持声音克隆 |
21 | 3. 支持数字人说话被打断 | 19 | 3. 支持数字人说话被打断 |
22 | 4. 支持全身视频拼接 | 20 | 4. 支持全身视频拼接 |
23 | -5. 支持 rtmp 和 webrtc | 21 | +5. 支持rtmp和webrtc |
24 | 6. 支持视频编排:不说话时播放自定义视频 | 22 | 6. 支持视频编排:不说话时播放自定义视频 |
25 | 7. 支持多并发 | 23 | 7. 支持多并发 |
26 | 24 | ||
@@ -41,59 +39,53 @@ pip install -r requirements.txt | @@ -41,59 +39,53 @@ pip install -r requirements.txt | ||
41 | # pip install tensorflow-gpu==2.8.0 | 39 | # pip install tensorflow-gpu==2.8.0 |
42 | # pip install --upgrade "protobuf<=3.20.1" | 40 | # pip install --upgrade "protobuf<=3.20.1" |
43 | ``` | 41 | ``` |
44 | - | ||
45 | 安装常见问题[FAQ](https://livetalking-doc.readthedocs.io/en/latest/faq.html) | 42 | 安装常见问题[FAQ](https://livetalking-doc.readthedocs.io/en/latest/faq.html) |
46 | -linux cuda 环境搭建可以参考这篇文章 https://zhuanlan.zhihu.com/p/674972886 | 43 | +linux cuda环境搭建可以参考这篇文章 https://zhuanlan.zhihu.com/p/674972886 |
47 | 44 | ||
48 | -## 2. Quick Start | ||
49 | 45 | ||
46 | +## 2. Quick Start | ||
50 | - 下载模型 | 47 | - 下载模型 |
51 | - 百度云盘<https://pan.baidu.com/s/1yOsQ06-RIDTJd3HFCw4wtA> 密码: ltua | 48 | + 夸克云盘<https://pan.quark.cn/s/83a750323ef0> |
52 | GoogleDriver <https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing> | 49 | GoogleDriver <https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing> |
53 | - 将 wav2lip256.pth 拷到本项目的 models 下, 重命名为 wav2lip.pth; | ||
54 | - 将 wav2lip256_avatar1.tar.gz 解压后整个文件夹拷到本项目的 data/avatars 下 | 50 | + 将wav2lip256.pth拷到本项目的models下, 重命名为wav2lip.pth; |
51 | + 将wav2lip256_avatar1.tar.gz解压后整个文件夹拷到本项目的data/avatars下 | ||
55 | - 运行 | 52 | - 运行 |
56 | python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1 --preload 2 | 53 | python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1 --preload 2 |
57 | - | ||
58 | 使用 GPU 启动模特 3 号:python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar3 --preload 2 | 54 | 使用 GPU 启动模特 3 号:python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar3 --preload 2 |
59 | - 用浏览器打开 http://serverip:8010/webrtcapi.html , 先点‘start',播放数字人视频;然后在文本框输入任意文字,提交。数字人播报该段文字 | 55 | + |
56 | +用浏览器打开http://serverip:8010/webrtcapi.html , 先点‘start',播放数字人视频;然后在文本框输入任意文字,提交。数字人播报该段文字 | ||
60 | <font color=red>服务端需要开放端口 tcp:8010; udp:1-65536 </font> | 57 | <font color=red>服务端需要开放端口 tcp:8010; udp:1-65536 </font> |
61 | - 如果需要商用高清 wav2lip 模型,可以与我联系购买 | 58 | + 如果需要商用高清wav2lip模型,[链接](https://livetalking-doc.readthedocs.io/zh-cn/latest/service.html#wav2lip) |
62 | 59 | ||
63 | - 快速体验 | 60 | - 快速体验 |
64 | <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3> 用该镜像创建实例即可运行成功 | 61 | <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3> 用该镜像创建实例即可运行成功 |
65 | 62 | ||
66 | -如果访问不了 huggingface,在运行前 | ||
67 | - | 63 | +如果访问不了huggingface,在运行前 |
68 | ``` | 64 | ``` |
69 | export HF_ENDPOINT=https://hf-mirror.com | 65 | export HF_ENDPOINT=https://hf-mirror.com |
70 | ``` | 66 | ``` |
71 | 67 | ||
72 | -## 3. More Usage | ||
73 | 68 | ||
69 | +## 3. More Usage | ||
74 | 使用说明: <https://livetalking-doc.readthedocs.io/> | 70 | 使用说明: <https://livetalking-doc.readthedocs.io/> |
75 | 71 | ||
76 | ## 4. Docker Run | 72 | ## 4. Docker Run |
77 | - | ||
78 | 不需要前面的安装,直接运行。 | 73 | 不需要前面的安装,直接运行。 |
79 | - | ||
80 | ``` | 74 | ``` |
81 | docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v | 75 | docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v |
82 | ``` | 76 | ``` |
83 | - | ||
84 | -代码在/root/metahuman-stream,先 git pull 拉一下最新代码,然后执行命令同第 2、3 步 | 77 | +代码在/root/metahuman-stream,先git pull拉一下最新代码,然后执行命令同第2、3步 |
85 | 78 | ||
86 | 提供如下镜像 | 79 | 提供如下镜像 |
80 | +- autodl镜像: <https://www.codewithgpu.com/i/lipku/metahuman-stream/base> | ||
81 | + [autodl教程](https://livetalking-doc.readthedocs.io/en/latest/autodl/README.html) | ||
82 | +- ucloud镜像: <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3> | ||
83 | + 可以开放任意端口,不需要另外部署srs服务. | ||
84 | + [ucloud教程](https://livetalking-doc.readthedocs.io/en/latest/ucloud/ucloud.html) | ||
87 | 85 | ||
88 | -- autodl 镜像: <https://www.codewithgpu.com/i/lipku/metahuman-stream/base> | ||
89 | - [autodl 教程](https://livetalking-doc.readthedocs.io/en/latest/autodl/README.html) | ||
90 | -- ucloud 镜像: <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3> | ||
91 | - 可以开放任意端口,不需要另外部署 srs 服务. | ||
92 | - [ucloud 教程](https://livetalking-doc.readthedocs.io/en/latest/ucloud/ucloud.html) | ||
93 | 86 | ||
94 | ## 5. TODO | 87 | ## 5. TODO |
95 | - | ||
96 | -- [x] 添加 chatgpt 实现数字人对话 | 88 | +- [x] 添加chatgpt实现数字人对话 |
97 | - [x] 声音克隆 | 89 | - [x] 声音克隆 |
98 | - [x] 数字人静音时用一段视频代替 | 90 | - [x] 数字人静音时用一段视频代替 |
99 | - [x] MuseTalk | 91 | - [x] MuseTalk |
@@ -101,9 +93,8 @@ docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/c | @@ -101,9 +93,8 @@ docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/c | ||
101 | - [x] Ultralight-Digital-Human | 93 | - [x] Ultralight-Digital-Human |
102 | 94 | ||
103 | --- | 95 | --- |
96 | +如果本项目对你有帮助,帮忙点个star。也欢迎感兴趣的朋友一起来完善该项目. | ||
97 | +* 知识星球: https://t.zsxq.com/7NMyO 沉淀高质量常见问题、最佳实践经验、问题解答 | ||
98 | +* 微信公众号:数字人技术 | ||
99 | +  | ||
104 | 100 | ||
105 | -如果本项目对你有帮助,帮忙点个 star。也欢迎感兴趣的朋友一起来完善该项目. | ||
106 | - | ||
107 | -- 知识星球: https://t.zsxq.com/7NMyO 沉淀高质量常见问题、最佳实践经验、问题解答 | ||
108 | -- 微信公众号:数字人技术 | ||
109 | -  |
@@ -201,7 +201,7 @@ async def set_audiotype(request): | @@ -201,7 +201,7 @@ async def set_audiotype(request): | ||
201 | params = await request.json() | 201 | params = await request.json() |
202 | 202 | ||
203 | sessionid = params.get('sessionid',0) | 203 | sessionid = params.get('sessionid',0) |
204 | - nerfreals[sessionid].set_curr_state(params['audiotype'],params['reinit']) | 204 | + nerfreals[sessionid].set_custom_state(params['audiotype'],params['reinit']) |
205 | 205 | ||
206 | return web.Response( | 206 | return web.Response( |
207 | content_type="application/json", | 207 | content_type="application/json", |
@@ -495,6 +495,8 @@ if __name__ == '__main__': | @@ -495,6 +495,8 @@ if __name__ == '__main__': | ||
495 | elif opt.transport=='rtcpush': | 495 | elif opt.transport=='rtcpush': |
496 | pagename='rtcpushapi.html' | 496 | pagename='rtcpushapi.html' |
497 | logger.info('start http server; http://<serverip>:'+str(opt.listenport)+'/'+pagename) | 497 | logger.info('start http server; http://<serverip>:'+str(opt.listenport)+'/'+pagename) |
498 | + logger.info('如果使用webrtc,推荐访问webrtc集成前端: http://<serverip>:'+str(opt.listenport)+'/dashboard.html') | ||
499 | + | ||
498 | def run_server(runner): | 500 | def run_server(runner): |
499 | loop = asyncio.new_event_loop() | 501 | loop = asyncio.new_event_loop() |
500 | asyncio.set_event_loop(loop) | 502 | asyncio.set_event_loop(loop) |
@@ -35,7 +35,7 @@ import soundfile as sf | @@ -35,7 +35,7 @@ import soundfile as sf | ||
35 | import av | 35 | import av |
36 | from fractions import Fraction | 36 | from fractions import Fraction |
37 | 37 | ||
38 | -from ttsreal import EdgeTTS,VoitsTTS,XTTS,CosyVoiceTTS,FishTTS,TencentTTS | 38 | +from ttsreal import EdgeTTS,SovitsTTS,XTTS,CosyVoiceTTS,FishTTS,TencentTTS |
39 | from logger import logger | 39 | from logger import logger |
40 | 40 | ||
41 | from tqdm import tqdm | 41 | from tqdm import tqdm |
@@ -57,7 +57,7 @@ class BaseReal: | @@ -57,7 +57,7 @@ class BaseReal: | ||
57 | if opt.tts == "edgetts": | 57 | if opt.tts == "edgetts": |
58 | self.tts = EdgeTTS(opt,self) | 58 | self.tts = EdgeTTS(opt,self) |
59 | elif opt.tts == "gpt-sovits": | 59 | elif opt.tts == "gpt-sovits": |
60 | - self.tts = VoitsTTS(opt,self) | 60 | + self.tts = SovitsTTS(opt,self) |
61 | elif opt.tts == "xtts": | 61 | elif opt.tts == "xtts": |
62 | self.tts = XTTS(opt,self) | 62 | self.tts = XTTS(opt,self) |
63 | elif opt.tts == "cosyvoice": | 63 | elif opt.tts == "cosyvoice": |
@@ -262,8 +262,8 @@ class BaseReal: | @@ -262,8 +262,8 @@ class BaseReal: | ||
262 | self.curr_state = 1 #当前视频不循环播放,切换到静音状态 | 262 | self.curr_state = 1 #当前视频不循环播放,切换到静音状态 |
263 | return stream | 263 | return stream |
264 | 264 | ||
265 | - def set_curr_state(self,audiotype, reinit): | ||
266 | - print('set_curr_state:',audiotype) | 265 | + def set_custom_state(self,audiotype, reinit=True): |
266 | + print('set_custom_state:',audiotype) | ||
267 | self.curr_state = audiotype | 267 | self.curr_state = audiotype |
268 | if reinit: | 268 | if reinit: |
269 | self.custom_audio_index[audiotype] = 0 | 269 | self.custom_audio_index[audiotype] = 0 |
@@ -179,8 +179,11 @@ print(f'[INFO] fitting light...') | @@ -179,8 +179,11 @@ print(f'[INFO] fitting light...') | ||
179 | 179 | ||
180 | batch_size = 32 | 180 | batch_size = 32 |
181 | 181 | ||
182 | -device_default = torch.device("cuda:0") | ||
183 | -device_render = torch.device("cuda:0") | 182 | +device_default = torch.device("cuda:0" if torch.cuda.is_available() else ( |
183 | + "mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) | ||
184 | +device_render = torch.device("cuda:0" if torch.cuda.is_available() else ( | ||
185 | + "mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) | ||
186 | + | ||
184 | renderer = Render_3DMM(arg_focal, h, w, batch_size, device_render) | 187 | renderer = Render_3DMM(arg_focal, h, w, batch_size, device_render) |
185 | 188 | ||
186 | sel_ids = np.arange(0, num_frames, int(num_frames / batch_size))[:batch_size] | 189 | sel_ids = np.arange(0, num_frames, int(num_frames / batch_size))[:batch_size] |
@@ -83,7 +83,7 @@ class Render_3DMM(nn.Module): | @@ -83,7 +83,7 @@ class Render_3DMM(nn.Module): | ||
83 | img_h=500, | 83 | img_h=500, |
84 | img_w=500, | 84 | img_w=500, |
85 | batch_size=1, | 85 | batch_size=1, |
86 | - device=torch.device("cuda:0"), | 86 | + device=torch.device("cuda:0" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")), |
87 | ): | 87 | ): |
88 | super(Render_3DMM, self).__init__() | 88 | super(Render_3DMM, self).__init__() |
89 | 89 |
@@ -147,7 +147,7 @@ if __name__ == '__main__': | @@ -147,7 +147,7 @@ if __name__ == '__main__': | ||
147 | 147 | ||
148 | seed_everything(opt.seed) | 148 | seed_everything(opt.seed) |
149 | 149 | ||
150 | - device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') | 150 | + device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
151 | 151 | ||
152 | model = NeRFNetwork(opt) | 152 | model = NeRFNetwork(opt) |
153 | 153 |
@@ -442,7 +442,7 @@ class LPIPSMeter: | @@ -442,7 +442,7 @@ class LPIPSMeter: | ||
442 | self.N = 0 | 442 | self.N = 0 |
443 | self.net = net | 443 | self.net = net |
444 | 444 | ||
445 | - self.device = device if device is not None else torch.device('cuda' if torch.cuda.is_available() else 'cpu') | 445 | + self.device = device if device is not None else torch.device('cuda' if torch.cuda.is_available() else ('mps' if hasattr(torch.backends, "mps") and torch.backends.mps.is_available() else 'cpu')) |
446 | self.fn = lpips.LPIPS(net=net).eval().to(self.device) | 446 | self.fn = lpips.LPIPS(net=net).eval().to(self.device) |
447 | 447 | ||
448 | def clear(self): | 448 | def clear(self): |
@@ -618,7 +618,11 @@ class Trainer(object): | @@ -618,7 +618,11 @@ class Trainer(object): | ||
618 | self.flip_init_lips = self.opt.init_lips | 618 | self.flip_init_lips = self.opt.init_lips |
619 | self.time_stamp = time.strftime("%Y-%m-%d_%H-%M-%S") | 619 | self.time_stamp = time.strftime("%Y-%m-%d_%H-%M-%S") |
620 | self.scheduler_update_every_step = scheduler_update_every_step | 620 | self.scheduler_update_every_step = scheduler_update_every_step |
621 | - self.device = device if device is not None else torch.device(f'cuda:{local_rank}' if torch.cuda.is_available() else 'cpu') | 621 | + self.device = device if device is not None else torch.device( |
622 | + f'cuda:{local_rank}' if torch.cuda.is_available() else ( | ||
623 | + 'mps' if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else 'cpu' | ||
624 | + ) | ||
625 | + ) | ||
622 | self.console = Console() | 626 | self.console = Console() |
623 | 627 | ||
624 | model.to(self.device) | 628 | model.to(self.device) |
@@ -56,10 +56,8 @@ from ultralight.unet import Model | @@ -56,10 +56,8 @@ from ultralight.unet import Model | ||
56 | from ultralight.audio2feature import Audio2Feature | 56 | from ultralight.audio2feature import Audio2Feature |
57 | from logger import logger | 57 | from logger import logger |
58 | 58 | ||
59 | - | ||
60 | -device = 'cuda' if torch.cuda.is_available() else 'cpu' | ||
61 | -logger.info('Using {} for inference.'.format(device)) | ||
62 | - | 59 | +device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
60 | +print('Using {} for inference.'.format(device)) | ||
63 | 61 | ||
64 | def load_model(opt): | 62 | def load_model(opt): |
65 | audio_processor = Audio2Feature() | 63 | audio_processor = Audio2Feature() |
@@ -44,8 +44,8 @@ from basereal import BaseReal | @@ -44,8 +44,8 @@ from basereal import BaseReal | ||
44 | from tqdm import tqdm | 44 | from tqdm import tqdm |
45 | from logger import logger | 45 | from logger import logger |
46 | 46 | ||
47 | -device = 'cuda' if torch.cuda.is_available() else 'cpu' | ||
48 | -logger.info('Using {} for inference.'.format(device)) | 47 | +device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
48 | +print('Using {} for inference.'.format(device)) | ||
49 | 49 | ||
50 | def _load(checkpoint_path): | 50 | def _load(checkpoint_path): |
51 | if device == 'cuda': | 51 | if device == 'cuda': |
@@ -51,7 +51,7 @@ from logger import logger | @@ -51,7 +51,7 @@ from logger import logger | ||
51 | def load_model(): | 51 | def load_model(): |
52 | # load model weights | 52 | # load model weights |
53 | audio_processor,vae, unet, pe = load_all_model() | 53 | audio_processor,vae, unet, pe = load_all_model() |
54 | - device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | 54 | + device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
55 | timesteps = torch.tensor([0], device=device) | 55 | timesteps = torch.tensor([0], device=device) |
56 | pe = pe.half() | 56 | pe = pe.half() |
57 | vae.vae = vae.vae.half() | 57 | vae.vae = vae.vae.half() |
@@ -267,23 +267,50 @@ class MuseReal(BaseReal): | @@ -267,23 +267,50 @@ class MuseReal(BaseReal): | ||
267 | 267 | ||
268 | 268 | ||
269 | def process_frames(self,quit_event,loop=None,audio_track=None,video_track=None): | 269 | def process_frames(self,quit_event,loop=None,audio_track=None,video_track=None): |
270 | + enable_transition = True # 设置为False禁用过渡效果,True启用 | ||
271 | + | ||
272 | + if enable_transition: | ||
273 | + self.last_speaking = False | ||
274 | + self.transition_start = time.time() | ||
275 | + self.transition_duration = 0.1 # 过渡时间 | ||
276 | + self.last_silent_frame = None # 静音帧缓存 | ||
277 | + self.last_speaking_frame = None # 说话帧缓存 | ||
270 | 278 | ||
271 | while not quit_event.is_set(): | 279 | while not quit_event.is_set(): |
272 | try: | 280 | try: |
273 | res_frame,idx,audio_frames = self.res_frame_queue.get(block=True, timeout=1) | 281 | res_frame,idx,audio_frames = self.res_frame_queue.get(block=True, timeout=1) |
274 | except queue.Empty: | 282 | except queue.Empty: |
275 | continue | 283 | continue |
276 | - if audio_frames[0][1]!=0 and audio_frames[1][1]!=0: #全为静音数据,只需要取fullimg | 284 | + |
285 | + if enable_transition: | ||
286 | + # 检测状态变化 | ||
287 | + current_speaking = not (audio_frames[0][1]!=0 and audio_frames[1][1]!=0) | ||
288 | + if current_speaking != self.last_speaking: | ||
289 | + logger.info(f"状态切换:{'说话' if self.last_speaking else '静音'} → {'说话' if current_speaking else '静音'}") | ||
290 | + self.transition_start = time.time() | ||
291 | + self.last_speaking = current_speaking | ||
292 | + | ||
293 | + if audio_frames[0][1]!=0 and audio_frames[1][1]!=0: | ||
277 | self.speaking = False | 294 | self.speaking = False |
278 | audiotype = audio_frames[0][1] | 295 | audiotype = audio_frames[0][1] |
279 | - if self.custom_index.get(audiotype) is not None: #有自定义视频 | 296 | + if self.custom_index.get(audiotype) is not None: |
280 | mirindex = self.mirror_index(len(self.custom_img_cycle[audiotype]),self.custom_index[audiotype]) | 297 | mirindex = self.mirror_index(len(self.custom_img_cycle[audiotype]),self.custom_index[audiotype]) |
281 | - combine_frame = self.custom_img_cycle[audiotype][mirindex] | 298 | + target_frame = self.custom_img_cycle[audiotype][mirindex] |
282 | self.custom_index[audiotype] += 1 | 299 | self.custom_index[audiotype] += 1 |
283 | - # if not self.custom_opt[audiotype].loop and self.custom_index[audiotype]>=len(self.custom_img_cycle[audiotype]): | ||
284 | - # self.curr_state = 1 #当前视频不循环播放,切换到静音状态 | ||
285 | else: | 300 | else: |
286 | - combine_frame = self.frame_list_cycle[idx] | 301 | + target_frame = self.frame_list_cycle[idx] |
302 | + | ||
303 | + if enable_transition: | ||
304 | + # 说话→静音过渡 | ||
305 | + if time.time() - self.transition_start < self.transition_duration and self.last_speaking_frame is not None: | ||
306 | + alpha = min(1.0, (time.time() - self.transition_start) / self.transition_duration) | ||
307 | + combine_frame = cv2.addWeighted(self.last_speaking_frame, 1-alpha, target_frame, alpha, 0) | ||
308 | + else: | ||
309 | + combine_frame = target_frame | ||
310 | + # 缓存静音帧 | ||
311 | + self.last_silent_frame = combine_frame.copy() | ||
312 | + else: | ||
313 | + combine_frame = target_frame | ||
287 | else: | 314 | else: |
288 | self.speaking = True | 315 | self.speaking = True |
289 | bbox = self.coord_list_cycle[idx] | 316 | bbox = self.coord_list_cycle[idx] |
@@ -291,20 +318,29 @@ class MuseReal(BaseReal): | @@ -291,20 +318,29 @@ class MuseReal(BaseReal): | ||
291 | x1, y1, x2, y2 = bbox | 318 | x1, y1, x2, y2 = bbox |
292 | try: | 319 | try: |
293 | res_frame = cv2.resize(res_frame.astype(np.uint8),(x2-x1,y2-y1)) | 320 | res_frame = cv2.resize(res_frame.astype(np.uint8),(x2-x1,y2-y1)) |
294 | - except: | 321 | + except Exception as e: |
322 | + logger.warning(f"resize error: {e}") | ||
295 | continue | 323 | continue |
296 | mask = self.mask_list_cycle[idx] | 324 | mask = self.mask_list_cycle[idx] |
297 | mask_crop_box = self.mask_coords_list_cycle[idx] | 325 | mask_crop_box = self.mask_coords_list_cycle[idx] |
298 | - #combine_frame = get_image(ori_frame,res_frame,bbox) | ||
299 | - #t=time.perf_counter() | ||
300 | - combine_frame = get_image_blending(ori_frame,res_frame,bbox,mask,mask_crop_box) | ||
301 | - #print('blending time:',time.perf_counter()-t) | ||
302 | 326 | ||
303 | - image = combine_frame #(outputs['image'] * 255).astype(np.uint8) | 327 | + current_frame = get_image_blending(ori_frame,res_frame,bbox,mask,mask_crop_box) |
328 | + if enable_transition: | ||
329 | + # 静音→说话过渡 | ||
330 | + if time.time() - self.transition_start < self.transition_duration and self.last_silent_frame is not None: | ||
331 | + alpha = min(1.0, (time.time() - self.transition_start) / self.transition_duration) | ||
332 | + combine_frame = cv2.addWeighted(self.last_silent_frame, 1-alpha, current_frame, alpha, 0) | ||
333 | + else: | ||
334 | + combine_frame = current_frame | ||
335 | + # 缓存说话帧 | ||
336 | + self.last_speaking_frame = combine_frame.copy() | ||
337 | + else: | ||
338 | + combine_frame = current_frame | ||
339 | + | ||
340 | + image = combine_frame | ||
304 | new_frame = VideoFrame.from_ndarray(image, format="bgr24") | 341 | new_frame = VideoFrame.from_ndarray(image, format="bgr24") |
305 | asyncio.run_coroutine_threadsafe(video_track._queue.put((new_frame,None)), loop) | 342 | asyncio.run_coroutine_threadsafe(video_track._queue.put((new_frame,None)), loop) |
306 | self.record_video_data(image) | 343 | self.record_video_data(image) |
307 | - #self.recordq_video.put(new_frame) | ||
308 | 344 | ||
309 | for audio_frame in audio_frames: | 345 | for audio_frame in audio_frames: |
310 | frame,type,eventpoint = audio_frame | 346 | frame,type,eventpoint = audio_frame |
@@ -312,12 +348,8 @@ class MuseReal(BaseReal): | @@ -312,12 +348,8 @@ class MuseReal(BaseReal): | ||
312 | new_frame = AudioFrame(format='s16', layout='mono', samples=frame.shape[0]) | 348 | new_frame = AudioFrame(format='s16', layout='mono', samples=frame.shape[0]) |
313 | new_frame.planes[0].update(frame.tobytes()) | 349 | new_frame.planes[0].update(frame.tobytes()) |
314 | new_frame.sample_rate=16000 | 350 | new_frame.sample_rate=16000 |
315 | - # if audio_track._queue.qsize()>10: | ||
316 | - # time.sleep(0.1) | ||
317 | asyncio.run_coroutine_threadsafe(audio_track._queue.put((new_frame,eventpoint)), loop) | 351 | asyncio.run_coroutine_threadsafe(audio_track._queue.put((new_frame,eventpoint)), loop) |
318 | self.record_audio_data(frame) | 352 | self.record_audio_data(frame) |
319 | - #self.notify(eventpoint) | ||
320 | - #self.recordq_audio.put(new_frame) | ||
321 | logger.info('musereal process_frames thread stop') | 353 | logger.info('musereal process_frames thread stop') |
322 | 354 | ||
323 | def render(self,quit_event,loop=None,audio_track=None,video_track=None): | 355 | def render(self,quit_event,loop=None,audio_track=None,video_track=None): |
@@ -36,7 +36,7 @@ class UNet(): | @@ -36,7 +36,7 @@ class UNet(): | ||
36 | unet_config = json.load(f) | 36 | unet_config = json.load(f) |
37 | self.model = UNet2DConditionModel(**unet_config) | 37 | self.model = UNet2DConditionModel(**unet_config) |
38 | self.pe = PositionalEncoding(d_model=384) | 38 | self.pe = PositionalEncoding(d_model=384) |
39 | - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | 39 | + self.device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
40 | weights = torch.load(model_path) if torch.cuda.is_available() else torch.load(model_path, map_location=self.device) | 40 | weights = torch.load(model_path) if torch.cuda.is_available() else torch.load(model_path, map_location=self.device) |
41 | self.model.load_state_dict(weights) | 41 | self.model.load_state_dict(weights) |
42 | if use_float16: | 42 | if use_float16: |
@@ -23,7 +23,7 @@ class VAE(): | @@ -23,7 +23,7 @@ class VAE(): | ||
23 | self.model_path = model_path | 23 | self.model_path = model_path |
24 | self.vae = AutoencoderKL.from_pretrained(self.model_path) | 24 | self.vae = AutoencoderKL.from_pretrained(self.model_path) |
25 | 25 | ||
26 | - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | 26 | + self.device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
27 | self.vae.to(self.device) | 27 | self.vae.to(self.device) |
28 | 28 | ||
29 | if use_float16: | 29 | if use_float16: |
@@ -325,7 +325,7 @@ def create_musetalk_human(file, avatar_id): | @@ -325,7 +325,7 @@ def create_musetalk_human(file, avatar_id): | ||
325 | 325 | ||
326 | 326 | ||
327 | # initialize the mmpose model | 327 | # initialize the mmpose model |
328 | -device = "cuda" if torch.cuda.is_available() else "cpu" | 328 | +device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
329 | fa = FaceAlignment(1, flip_input=False, device=device) | 329 | fa = FaceAlignment(1, flip_input=False, device=device) |
330 | config_file = os.path.join(current_dir, 'utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py') | 330 | config_file = os.path.join(current_dir, 'utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py') |
331 | checkpoint_file = os.path.abspath(os.path.join(current_dir, '../models/dwpose/dw-ll_ucoco_384.pth')) | 331 | checkpoint_file = os.path.abspath(os.path.join(current_dir, '../models/dwpose/dw-ll_ucoco_384.pth')) |
@@ -13,14 +13,14 @@ import torch | @@ -13,14 +13,14 @@ import torch | ||
13 | from tqdm import tqdm | 13 | from tqdm import tqdm |
14 | 14 | ||
15 | # initialize the mmpose model | 15 | # initialize the mmpose model |
16 | -device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | 16 | +device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
17 | config_file = './musetalk/utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py' | 17 | config_file = './musetalk/utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py' |
18 | checkpoint_file = './models/dwpose/dw-ll_ucoco_384.pth' | 18 | checkpoint_file = './models/dwpose/dw-ll_ucoco_384.pth' |
19 | model = init_model(config_file, checkpoint_file, device=device) | 19 | model = init_model(config_file, checkpoint_file, device=device) |
20 | 20 | ||
21 | # initialize the face detection model | 21 | # initialize the face detection model |
22 | -device = "cuda" if torch.cuda.is_available() else "cpu" | ||
23 | -fa = FaceAlignment(LandmarksType._2D, flip_input=False,device=device) | 22 | +device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
23 | +fa = FaceAlignment(LandmarksType._2D, flip_input=False, device=device) | ||
24 | 24 | ||
25 | # maker if the bbox is not sufficient | 25 | # maker if the bbox is not sufficient |
26 | coord_placeholder = (0.0,0.0,0.0,0.0) | 26 | coord_placeholder = (0.0,0.0,0.0,0.0) |
@@ -91,7 +91,7 @@ def load_model(name: str, device: Optional[Union[str, torch.device]] = None, dow | @@ -91,7 +91,7 @@ def load_model(name: str, device: Optional[Union[str, torch.device]] = None, dow | ||
91 | """ | 91 | """ |
92 | 92 | ||
93 | if device is None: | 93 | if device is None: |
94 | - device = "cuda" if torch.cuda.is_available() else "cpu" | 94 | + device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
95 | if download_root is None: | 95 | if download_root is None: |
96 | download_root = os.getenv( | 96 | download_root = os.getenv( |
97 | "XDG_CACHE_HOME", | 97 | "XDG_CACHE_HOME", |
@@ -78,6 +78,8 @@ def transcribe( | @@ -78,6 +78,8 @@ def transcribe( | ||
78 | if dtype == torch.float16: | 78 | if dtype == torch.float16: |
79 | warnings.warn("FP16 is not supported on CPU; using FP32 instead") | 79 | warnings.warn("FP16 is not supported on CPU; using FP32 instead") |
80 | dtype = torch.float32 | 80 | dtype = torch.float32 |
81 | + if hasattr(torch.backends, "mps") and torch.backends.mps.is_available(): | ||
82 | + warnings.warn("Performing inference on CPU when MPS is available") | ||
81 | 83 | ||
82 | if dtype == torch.float32: | 84 | if dtype == torch.float32: |
83 | decode_options["fp16"] = False | 85 | decode_options["fp16"] = False |
@@ -135,7 +137,7 @@ def cli(): | @@ -135,7 +137,7 @@ def cli(): | ||
135 | parser.add_argument("audio", nargs="+", type=str, help="audio file(s) to transcribe") | 137 | parser.add_argument("audio", nargs="+", type=str, help="audio file(s) to transcribe") |
136 | parser.add_argument("--model", default="small", choices=available_models(), help="name of the Whisper model to use") | 138 | parser.add_argument("--model", default="small", choices=available_models(), help="name of the Whisper model to use") |
137 | parser.add_argument("--model_dir", type=str, default=None, help="the path to save model files; uses ~/.cache/whisper by default") | 139 | parser.add_argument("--model_dir", type=str, default=None, help="the path to save model files; uses ~/.cache/whisper by default") |
138 | - parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu", help="device to use for PyTorch inference") | 140 | + parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "mps", help="device to use for PyTorch inference") |
139 | parser.add_argument("--output_dir", "-o", type=str, default=".", help="directory to save the outputs") | 141 | parser.add_argument("--output_dir", "-o", type=str, default=".", help="directory to save the outputs") |
140 | parser.add_argument("--verbose", type=str2bool, default=True, help="whether to print out the progress and debug messages") | 142 | parser.add_argument("--verbose", type=str2bool, default=True, help="whether to print out the progress and debug messages") |
141 | 143 |
@@ -30,7 +30,7 @@ class NerfASR(BaseASR): | @@ -30,7 +30,7 @@ class NerfASR(BaseASR): | ||
30 | def __init__(self, opt, parent, audio_processor,audio_model): | 30 | def __init__(self, opt, parent, audio_processor,audio_model): |
31 | super().__init__(opt,parent) | 31 | super().__init__(opt,parent) |
32 | 32 | ||
33 | - self.device = 'cuda' if torch.cuda.is_available() else 'cpu' | 33 | + self.device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
34 | if 'esperanto' in self.opt.asr_model: | 34 | if 'esperanto' in self.opt.asr_model: |
35 | self.audio_dim = 44 | 35 | self.audio_dim = 44 |
36 | elif 'deepspeech' in self.opt.asr_model: | 36 | elif 'deepspeech' in self.opt.asr_model: |
@@ -77,7 +77,7 @@ def load_model(opt): | @@ -77,7 +77,7 @@ def load_model(opt): | ||
77 | seed_everything(opt.seed) | 77 | seed_everything(opt.seed) |
78 | logger.info(opt) | 78 | logger.info(opt) |
79 | 79 | ||
80 | - device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') | 80 | + device = torch.device('cuda' if torch.cuda.is_available() else ('mps' if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else 'cpu')) |
81 | model = NeRFNetwork(opt) | 81 | model = NeRFNetwork(opt) |
82 | 82 | ||
83 | criterion = torch.nn.MSELoss(reduction='none') | 83 | criterion = torch.nn.MSELoss(reduction='none') |
@@ -90,7 +90,7 @@ class BaseTTS: | @@ -90,7 +90,7 @@ class BaseTTS: | ||
90 | ########################################################################################### | 90 | ########################################################################################### |
91 | class EdgeTTS(BaseTTS): | 91 | class EdgeTTS(BaseTTS): |
92 | def txt_to_audio(self,msg): | 92 | def txt_to_audio(self,msg): |
93 | - voicename = "zh-CN-XiaoxiaoNeural" | 93 | + voicename = "zh-CN-YunxiaNeural" |
94 | text,textevent = msg | 94 | text,textevent = msg |
95 | t = time.time() | 95 | t = time.time() |
96 | asyncio.new_event_loop().run_until_complete(self.__main(voicename,text)) | 96 | asyncio.new_event_loop().run_until_complete(self.__main(voicename,text)) |
@@ -107,9 +107,9 @@ class EdgeTTS(BaseTTS): | @@ -107,9 +107,9 @@ class EdgeTTS(BaseTTS): | ||
107 | eventpoint=None | 107 | eventpoint=None |
108 | streamlen -= self.chunk | 108 | streamlen -= self.chunk |
109 | if idx==0: | 109 | if idx==0: |
110 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 110 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
111 | elif streamlen<self.chunk: | 111 | elif streamlen<self.chunk: |
112 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 112 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
113 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 113 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
114 | idx += self.chunk | 114 | idx += self.chunk |
115 | #if streamlen>0: #skip last frame(not 20ms) | 115 | #if streamlen>0: #skip last frame(not 20ms) |
@@ -219,16 +219,16 @@ class FishTTS(BaseTTS): | @@ -219,16 +219,16 @@ class FishTTS(BaseTTS): | ||
219 | while streamlen >= self.chunk: | 219 | while streamlen >= self.chunk: |
220 | eventpoint=None | 220 | eventpoint=None |
221 | if first: | 221 | if first: |
222 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 222 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
223 | first = False | 223 | first = False |
224 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 224 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
225 | streamlen -= self.chunk | 225 | streamlen -= self.chunk |
226 | idx += self.chunk | 226 | idx += self.chunk |
227 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 227 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
228 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 228 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
229 | 229 | ||
230 | ########################################################################################### | 230 | ########################################################################################### |
231 | -class VoitsTTS(BaseTTS): | 231 | +class SovitsTTS(BaseTTS): |
232 | def txt_to_audio(self,msg): | 232 | def txt_to_audio(self,msg): |
233 | text,textevent = msg | 233 | text,textevent = msg |
234 | self.stream_tts( | 234 | self.stream_tts( |
@@ -316,12 +316,12 @@ class VoitsTTS(BaseTTS): | @@ -316,12 +316,12 @@ class VoitsTTS(BaseTTS): | ||
316 | while streamlen >= self.chunk: | 316 | while streamlen >= self.chunk: |
317 | eventpoint=None | 317 | eventpoint=None |
318 | if first: | 318 | if first: |
319 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 319 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
320 | first = False | 320 | first = False |
321 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 321 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
322 | streamlen -= self.chunk | 322 | streamlen -= self.chunk |
323 | idx += self.chunk | 323 | idx += self.chunk |
324 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 324 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
325 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 325 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
326 | 326 | ||
327 | ########################################################################################### | 327 | ########################################################################################### |
@@ -382,12 +382,12 @@ class CosyVoiceTTS(BaseTTS): | @@ -382,12 +382,12 @@ class CosyVoiceTTS(BaseTTS): | ||
382 | while streamlen >= self.chunk: | 382 | while streamlen >= self.chunk: |
383 | eventpoint=None | 383 | eventpoint=None |
384 | if first: | 384 | if first: |
385 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 385 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
386 | first = False | 386 | first = False |
387 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 387 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
388 | streamlen -= self.chunk | 388 | streamlen -= self.chunk |
389 | idx += self.chunk | 389 | idx += self.chunk |
390 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 390 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
391 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 391 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
392 | 392 | ||
393 | ########################################################################################### | 393 | ########################################################################################### |
@@ -505,13 +505,13 @@ class TencentTTS(BaseTTS): | @@ -505,13 +505,13 @@ class TencentTTS(BaseTTS): | ||
505 | while streamlen >= self.chunk: | 505 | while streamlen >= self.chunk: |
506 | eventpoint=None | 506 | eventpoint=None |
507 | if first: | 507 | if first: |
508 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 508 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
509 | first = False | 509 | first = False |
510 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 510 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
511 | streamlen -= self.chunk | 511 | streamlen -= self.chunk |
512 | idx += self.chunk | 512 | idx += self.chunk |
513 | last_stream = stream[idx:] #get the remain stream | 513 | last_stream = stream[idx:] #get the remain stream |
514 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 514 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
515 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 515 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
516 | 516 | ||
517 | ########################################################################################### | 517 | ########################################################################################### |
@@ -583,10 +583,10 @@ class XTTS(BaseTTS): | @@ -583,10 +583,10 @@ class XTTS(BaseTTS): | ||
583 | while streamlen >= self.chunk: | 583 | while streamlen >= self.chunk: |
584 | eventpoint=None | 584 | eventpoint=None |
585 | if first: | 585 | if first: |
586 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 586 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
587 | first = False | 587 | first = False |
588 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 588 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
589 | streamlen -= self.chunk | 589 | streamlen -= self.chunk |
590 | idx += self.chunk | 590 | idx += self.chunk |
591 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 591 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
592 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 592 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
@@ -236,7 +236,7 @@ if __name__ == '__main__': | @@ -236,7 +236,7 @@ if __name__ == '__main__': | ||
236 | if hasattr(module, 'reparameterize'): | 236 | if hasattr(module, 'reparameterize'): |
237 | module.reparameterize() | 237 | module.reparameterize() |
238 | return model | 238 | return model |
239 | - device = torch.device("cuda") | 239 | + device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
240 | def check_onnx(torch_out, torch_in, audio): | 240 | def check_onnx(torch_out, torch_in, audio): |
241 | onnx_model = onnx.load(onnx_path) | 241 | onnx_model = onnx.load(onnx_path) |
242 | onnx.checker.check_model(onnx_model) | 242 | onnx.checker.check_model(onnx_model) |
web/dashboard.html
0 → 100644
1 | +<!DOCTYPE html> | ||
2 | +<html lang="zh-CN"> | ||
3 | +<head> | ||
4 | + <meta charset="UTF-8"> | ||
5 | + <meta name="viewport" content="width=device-width, initial-scale=1.0"> | ||
6 | + <title>livetalking数字人交互平台</title> | ||
7 | + <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet"> | ||
8 | + <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.10.0/font/bootstrap-icons.css"> | ||
9 | + <style> | ||
10 | + :root { | ||
11 | + --primary-color: #4361ee; | ||
12 | + --secondary-color: #3f37c9; | ||
13 | + --accent-color: #4895ef; | ||
14 | + --background-color: #f8f9fa; | ||
15 | + --card-bg: #ffffff; | ||
16 | + --text-color: #212529; | ||
17 | + --border-radius: 10px; | ||
18 | + --box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1); | ||
19 | + } | ||
20 | + | ||
21 | + body { | ||
22 | + font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; | ||
23 | + background-color: var(--background-color); | ||
24 | + color: var(--text-color); | ||
25 | + min-height: 100vh; | ||
26 | + padding-top: 20px; | ||
27 | + } | ||
28 | + | ||
29 | + .dashboard-container { | ||
30 | + max-width: 1400px; | ||
31 | + margin: 0 auto; | ||
32 | + padding: 20px; | ||
33 | + } | ||
34 | + | ||
35 | + .card { | ||
36 | + background-color: var(--card-bg); | ||
37 | + border-radius: var(--border-radius); | ||
38 | + box-shadow: var(--box-shadow); | ||
39 | + border: none; | ||
40 | + margin-bottom: 20px; | ||
41 | + overflow: hidden; | ||
42 | + } | ||
43 | + | ||
44 | + .card-header { | ||
45 | + background-color: var(--primary-color); | ||
46 | + color: white; | ||
47 | + font-weight: 600; | ||
48 | + padding: 15px 20px; | ||
49 | + border-bottom: none; | ||
50 | + } | ||
51 | + | ||
52 | + .video-container { | ||
53 | + position: relative; | ||
54 | + width: 100%; | ||
55 | + background-color: #000; | ||
56 | + border-radius: var(--border-radius); | ||
57 | + overflow: hidden; | ||
58 | + display: flex; | ||
59 | + justify-content: center; | ||
60 | + align-items: center; | ||
61 | + } | ||
62 | + | ||
63 | + video { | ||
64 | + max-width: 100%; | ||
65 | + max-height: 100%; | ||
66 | + display: block; | ||
67 | + border-radius: var(--border-radius); | ||
68 | + } | ||
69 | + | ||
70 | + .controls-container { | ||
71 | + padding: 20px; | ||
72 | + } | ||
73 | + | ||
74 | + .btn-primary { | ||
75 | + background-color: var(--primary-color); | ||
76 | + border-color: var(--primary-color); | ||
77 | + } | ||
78 | + | ||
79 | + .btn-primary:hover { | ||
80 | + background-color: var(--secondary-color); | ||
81 | + border-color: var(--secondary-color); | ||
82 | + } | ||
83 | + | ||
84 | + .btn-outline-primary { | ||
85 | + color: var(--primary-color); | ||
86 | + border-color: var(--primary-color); | ||
87 | + } | ||
88 | + | ||
89 | + .btn-outline-primary:hover { | ||
90 | + background-color: var(--primary-color); | ||
91 | + color: white; | ||
92 | + } | ||
93 | + | ||
94 | + .form-control { | ||
95 | + border-radius: var(--border-radius); | ||
96 | + padding: 10px 15px; | ||
97 | + border: 1px solid #ced4da; | ||
98 | + } | ||
99 | + | ||
100 | + .form-control:focus { | ||
101 | + border-color: var(--accent-color); | ||
102 | + box-shadow: 0 0 0 0.25rem rgba(67, 97, 238, 0.25); | ||
103 | + } | ||
104 | + | ||
105 | + .status-indicator { | ||
106 | + width: 10px; | ||
107 | + height: 10px; | ||
108 | + border-radius: 50%; | ||
109 | + display: inline-block; | ||
110 | + margin-right: 5px; | ||
111 | + } | ||
112 | + | ||
113 | + .status-connected { | ||
114 | + background-color: #28a745; | ||
115 | + } | ||
116 | + | ||
117 | + .status-disconnected { | ||
118 | + background-color: #dc3545; | ||
119 | + } | ||
120 | + | ||
121 | + .status-connecting { | ||
122 | + background-color: #ffc107; | ||
123 | + } | ||
124 | + | ||
125 | + .asr-container { | ||
126 | + height: 300px; | ||
127 | + overflow-y: auto; | ||
128 | + padding: 15px; | ||
129 | + background-color: #f8f9fa; | ||
130 | + border-radius: var(--border-radius); | ||
131 | + border: 1px solid #ced4da; | ||
132 | + } | ||
133 | + | ||
134 | + .asr-text { | ||
135 | + margin-bottom: 10px; | ||
136 | + padding: 10px; | ||
137 | + background-color: white; | ||
138 | + border-radius: var(--border-radius); | ||
139 | + box-shadow: 0 1px 3px rgba(0, 0, 0, 0.1); | ||
140 | + } | ||
141 | + | ||
142 | + .user-message { | ||
143 | + background-color: #e3f2fd; | ||
144 | + border-left: 4px solid var(--primary-color); | ||
145 | + } | ||
146 | + | ||
147 | + .system-message { | ||
148 | + background-color: #f1f8e9; | ||
149 | + border-left: 4px solid #8bc34a; | ||
150 | + } | ||
151 | + | ||
152 | + .recording-indicator { | ||
153 | + position: absolute; | ||
154 | + top: 15px; | ||
155 | + right: 15px; | ||
156 | + background-color: rgba(220, 53, 69, 0.8); | ||
157 | + color: white; | ||
158 | + padding: 5px 10px; | ||
159 | + border-radius: 20px; | ||
160 | + font-size: 0.8rem; | ||
161 | + display: none; | ||
162 | + } | ||
163 | + | ||
164 | + .recording-indicator.active { | ||
165 | + display: flex; | ||
166 | + align-items: center; | ||
167 | + } | ||
168 | + | ||
169 | + .recording-indicator .blink { | ||
170 | + width: 10px; | ||
171 | + height: 10px; | ||
172 | + background-color: #fff; | ||
173 | + border-radius: 50%; | ||
174 | + margin-right: 5px; | ||
175 | + animation: blink 1s infinite; | ||
176 | + } | ||
177 | + | ||
178 | + @keyframes blink { | ||
179 | + 0% { opacity: 1; } | ||
180 | + 50% { opacity: 0.3; } | ||
181 | + 100% { opacity: 1; } | ||
182 | + } | ||
183 | + | ||
184 | + .mode-switch { | ||
185 | + margin-bottom: 20px; | ||
186 | + } | ||
187 | + | ||
188 | + .nav-tabs .nav-link { | ||
189 | + color: var(--text-color); | ||
190 | + border: none; | ||
191 | + padding: 10px 20px; | ||
192 | + border-radius: var(--border-radius) var(--border-radius) 0 0; | ||
193 | + } | ||
194 | + | ||
195 | + .nav-tabs .nav-link.active { | ||
196 | + color: var(--primary-color); | ||
197 | + background-color: var(--card-bg); | ||
198 | + border-bottom: 3px solid var(--primary-color); | ||
199 | + font-weight: 600; | ||
200 | + } | ||
201 | + | ||
202 | + .tab-content { | ||
203 | + padding: 20px; | ||
204 | + background-color: var(--card-bg); | ||
205 | + border-radius: 0 0 var(--border-radius) var(--border-radius); | ||
206 | + } | ||
207 | + | ||
208 | + .settings-panel { | ||
209 | + padding: 15px; | ||
210 | + background-color: #f8f9fa; | ||
211 | + border-radius: var(--border-radius); | ||
212 | + margin-top: 15px; | ||
213 | + } | ||
214 | + | ||
215 | + .footer { | ||
216 | + text-align: center; | ||
217 | + margin-top: 30px; | ||
218 | + padding: 20px 0; | ||
219 | + color: #6c757d; | ||
220 | + font-size: 0.9rem; | ||
221 | + } | ||
222 | + | ||
223 | + .voice-record-btn { | ||
224 | + width: 60px; | ||
225 | + height: 60px; | ||
226 | + border-radius: 50%; | ||
227 | + background-color: var(--primary-color); | ||
228 | + color: white; | ||
229 | + display: flex; | ||
230 | + justify-content: center; | ||
231 | + align-items: center; | ||
232 | + cursor: pointer; | ||
233 | + transition: all 0.2s ease; | ||
234 | + box-shadow: 0 2px 5px rgba(0,0,0,0.2); | ||
235 | + margin: 0 auto; | ||
236 | + } | ||
237 | + | ||
238 | + .voice-record-btn:hover { | ||
239 | + background-color: var(--secondary-color); | ||
240 | + transform: scale(1.05); | ||
241 | + } | ||
242 | + | ||
243 | + .voice-record-btn:active { | ||
244 | + background-color: #dc3545; | ||
245 | + transform: scale(0.95); | ||
246 | + } | ||
247 | + | ||
248 | + .voice-record-btn i { | ||
249 | + font-size: 24px; | ||
250 | + } | ||
251 | + | ||
252 | + .voice-record-label { | ||
253 | + text-align: center; | ||
254 | + margin-top: 10px; | ||
255 | + font-size: 14px; | ||
256 | + color: #6c757d; | ||
257 | + } | ||
258 | + | ||
259 | + .video-size-control { | ||
260 | + margin-top: 15px; | ||
261 | + } | ||
262 | + | ||
263 | + .recording-pulse { | ||
264 | + animation: pulse 1.5s infinite; | ||
265 | + } | ||
266 | + | ||
267 | + @keyframes pulse { | ||
268 | + 0% { | ||
269 | + box-shadow: 0 0 0 0 rgba(220, 53, 69, 0.7); | ||
270 | + } | ||
271 | + 70% { | ||
272 | + box-shadow: 0 0 0 15px rgba(220, 53, 69, 0); | ||
273 | + } | ||
274 | + 100% { | ||
275 | + box-shadow: 0 0 0 0 rgba(220, 53, 69, 0); | ||
276 | + } | ||
277 | + } | ||
278 | + </style> | ||
279 | +</head> | ||
280 | +<body> | ||
281 | + <div class="dashboard-container"> | ||
282 | + <div class="row"> | ||
283 | + <div class="col-12"> | ||
284 | + <h1 class="text-center mb-4">livetalking数字人交互平台</h1> | ||
285 | + </div> | ||
286 | + </div> | ||
287 | + | ||
288 | + <div class="row"> | ||
289 | + <!-- 视频区域 --> | ||
290 | + <div class="col-lg-8"> | ||
291 | + <div class="card"> | ||
292 | + <div class="card-header d-flex justify-content-between align-items-center"> | ||
293 | + <div> | ||
294 | + <span class="status-indicator status-disconnected" id="connection-status"></span> | ||
295 | + <span id="status-text">未连接</span> | ||
296 | + </div> | ||
297 | + </div> | ||
298 | + <div class="card-body p-0"> | ||
299 | + <div class="video-container"> | ||
300 | + <video id="video" autoplay playsinline></video> | ||
301 | + <div class="recording-indicator" id="recording-indicator"> | ||
302 | + <div class="blink"></div> | ||
303 | + <span>录制中</span> | ||
304 | + </div> | ||
305 | + </div> | ||
306 | + | ||
307 | + <div class="controls-container"> | ||
308 | + <div class="row"> | ||
309 | + <div class="col-md-6 mb-3"> | ||
310 | + <button class="btn btn-primary w-100" id="start"> | ||
311 | + <i class="bi bi-play-fill"></i> 开始连接 | ||
312 | + </button> | ||
313 | + <button class="btn btn-danger w-100" id="stop" style="display: none;"> | ||
314 | + <i class="bi bi-stop-fill"></i> 停止连接 | ||
315 | + </button> | ||
316 | + </div> | ||
317 | + <div class="col-md-6 mb-3"> | ||
318 | + <div class="d-flex"> | ||
319 | + <button class="btn btn-outline-primary flex-grow-1 me-2" id="btn_start_record"> | ||
320 | + <i class="bi bi-record-fill"></i> 开始录制 | ||
321 | + </button> | ||
322 | + <button class="btn btn-outline-danger flex-grow-1" id="btn_stop_record" disabled> | ||
323 | + <i class="bi bi-stop-fill"></i> 停止录制 | ||
324 | + </button> | ||
325 | + </div> | ||
326 | + </div> | ||
327 | + </div> | ||
328 | + | ||
329 | + <div class="row"> | ||
330 | + <div class="col-12"> | ||
331 | + <div class="video-size-control"> | ||
332 | + <label for="video-size-slider" class="form-label">视频大小调节: <span id="video-size-value">100%</span></label> | ||
333 | + <input type="range" class="form-range" id="video-size-slider" min="50" max="150" value="100"> | ||
334 | + </div> | ||
335 | + </div> | ||
336 | + </div> | ||
337 | + | ||
338 | + <div class="settings-panel mt-3"> | ||
339 | + <div class="row"> | ||
340 | + <div class="col-md-12"> | ||
341 | + <div class="form-check form-switch mb-3"> | ||
342 | + <input class="form-check-input" type="checkbox" id="use-stun"> | ||
343 | + <label class="form-check-label" for="use-stun">使用STUN服务器</label> | ||
344 | + </div> | ||
345 | + </div> | ||
346 | + </div> | ||
347 | + </div> | ||
348 | + </div> | ||
349 | + </div> | ||
350 | + </div> | ||
351 | + </div> | ||
352 | + | ||
353 | + <!-- 右侧交互 --> | ||
354 | + <div class="col-lg-4"> | ||
355 | + <div class="card"> | ||
356 | + <div class="card-header"> | ||
357 | + <ul class="nav nav-tabs card-header-tabs" id="interaction-tabs" role="tablist"> | ||
358 | + <li class="nav-item" role="presentation"> | ||
359 | + <button class="nav-link active" id="chat-tab" data-bs-toggle="tab" data-bs-target="#chat" type="button" role="tab" aria-controls="chat" aria-selected="true">对话模式</button> | ||
360 | + </li> | ||
361 | + <li class="nav-item" role="presentation"> | ||
362 | + <button class="nav-link" id="tts-tab" data-bs-toggle="tab" data-bs-target="#tts" type="button" role="tab" aria-controls="tts" aria-selected="false">朗读模式</button> | ||
363 | + </li> | ||
364 | + </ul> | ||
365 | + </div> | ||
366 | + <div class="card-body"> | ||
367 | + <div class="tab-content" id="interaction-tabs-content"> | ||
368 | + <!-- 对话模式 --> | ||
369 | + <div class="tab-pane fade show active" id="chat" role="tabpanel" aria-labelledby="chat-tab"> | ||
370 | + <div class="asr-container mb-3" id="chat-messages"> | ||
371 | + <div class="asr-text system-message"> | ||
372 | + 系统: 欢迎使用livetalking,请点击"开始连接"按钮开始对话。 | ||
373 | + </div> | ||
374 | + </div> | ||
375 | + | ||
376 | + <form id="chat-form"> | ||
377 | + <div class="input-group mb-3"> | ||
378 | + <textarea class="form-control" id="chat-message" rows="3" placeholder="输入您想对数字人说的话..."></textarea> | ||
379 | + <button class="btn btn-primary" type="submit"> | ||
380 | + <i class="bi bi-send"></i> 发送 | ||
381 | + </button> | ||
382 | + </div> | ||
383 | + </form> | ||
384 | + | ||
385 | + <!-- 按住说话按钮 --> | ||
386 | + <div class="voice-record-btn" id="voice-record-btn"> | ||
387 | + <i class="bi bi-mic-fill"></i> | ||
388 | + </div> | ||
389 | + <div class="voice-record-label">按住说话,松开发送</div> | ||
390 | + </div> | ||
391 | + | ||
392 | + <!-- 朗读模式 --> | ||
393 | + <div class="tab-pane fade" id="tts" role="tabpanel" aria-labelledby="tts-tab"> | ||
394 | + <form id="echo-form"> | ||
395 | + <div class="mb-3"> | ||
396 | + <label for="message" class="form-label">输入要朗读的文本</label> | ||
397 | + <textarea class="form-control" id="message" rows="6" placeholder="输入您想让数字人朗读的文字..."></textarea> | ||
398 | + </div> | ||
399 | + <button type="submit" class="btn btn-primary w-100"> | ||
400 | + <i class="bi bi-volume-up"></i> 朗读文本 | ||
401 | + </button> | ||
402 | + </form> | ||
403 | + </div> | ||
404 | + </div> | ||
405 | + </div> | ||
406 | + </div> | ||
407 | + </div> | ||
408 | + </div> | ||
409 | + | ||
410 | + <div class="footer"> | ||
411 | + <p>Made with ❤️ by Marstaos | Frontend & Performance Optimization</p> | ||
412 | + </div> | ||
413 | + </div> | ||
414 | + | ||
415 | + <!-- 隐藏的会话ID --> | ||
416 | + <input type="hidden" id="sessionid" value="0"> | ||
417 | + | ||
418 | + | ||
419 | + <script src="client.js"></script> | ||
420 | + <script src="srs.sdk.js"></script> | ||
421 | + <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> | ||
422 | + <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script> | ||
423 | + <script> | ||
424 | + $(document).ready(function() { | ||
425 | + $('#video-size-slider').on('input', function() { | ||
426 | + const value = $(this).val(); | ||
427 | + $('#video-size-value').text(value + '%'); | ||
428 | + $('#video').css('width', value + '%'); | ||
429 | + }); | ||
430 | + function updateConnectionStatus(status) { | ||
431 | + const statusIndicator = $('#connection-status'); | ||
432 | + const statusText = $('#status-text'); | ||
433 | + | ||
434 | + statusIndicator.removeClass('status-connected status-disconnected status-connecting'); | ||
435 | + | ||
436 | + switch(status) { | ||
437 | + case 'connected': | ||
438 | + statusIndicator.addClass('status-connected'); | ||
439 | + statusText.text('已连接'); | ||
440 | + break; | ||
441 | + case 'connecting': | ||
442 | + statusIndicator.addClass('status-connecting'); | ||
443 | + statusText.text('连接中...'); | ||
444 | + break; | ||
445 | + case 'disconnected': | ||
446 | + default: | ||
447 | + statusIndicator.addClass('status-disconnected'); | ||
448 | + statusText.text('未连接'); | ||
449 | + break; | ||
450 | + } | ||
451 | + } | ||
452 | + | ||
453 | + // 添加聊天消息 | ||
454 | + function addChatMessage(message, type = 'user') { | ||
455 | + const messagesContainer = $('#chat-messages'); | ||
456 | + const messageClass = type === 'user' ? 'user-message' : 'system-message'; | ||
457 | + const sender = type === 'user' ? '您' : '数字人'; | ||
458 | + | ||
459 | + const messageElement = $(` | ||
460 | + <div class="asr-text ${messageClass}"> | ||
461 | + ${sender}: ${message} | ||
462 | + </div> | ||
463 | + `); | ||
464 | + | ||
465 | + messagesContainer.append(messageElement); | ||
466 | + messagesContainer.scrollTop(messagesContainer[0].scrollHeight); | ||
467 | + } | ||
468 | + | ||
469 | + // 开始/停止按钮 | ||
470 | + $('#start').click(function() { | ||
471 | + updateConnectionStatus('connecting'); | ||
472 | + start(); | ||
473 | + $(this).hide(); | ||
474 | + $('#stop').show(); | ||
475 | + | ||
476 | + // 添加定时器检查视频流是否已加载 | ||
477 | + let connectionCheckTimer = setInterval(function() { | ||
478 | + const video = document.getElementById('video'); | ||
479 | + // 检查视频是否有数据 | ||
480 | + if (video.readyState >= 3 && video.videoWidth > 0) { | ||
481 | + updateConnectionStatus('connected'); | ||
482 | + clearInterval(connectionCheckTimer); | ||
483 | + } | ||
484 | + }, 2000); // 每2秒检查一次 | ||
485 | + | ||
486 | + // 60秒后如果还是连接中状态,就停止检查 | ||
487 | + setTimeout(function() { | ||
488 | + if (connectionCheckTimer) { | ||
489 | + clearInterval(connectionCheckTimer); | ||
490 | + } | ||
491 | + }, 60000); | ||
492 | + }); | ||
493 | + | ||
494 | + $('#stop').click(function() { | ||
495 | + stop(); | ||
496 | + $(this).hide(); | ||
497 | + $('#start').show(); | ||
498 | + updateConnectionStatus('disconnected'); | ||
499 | + }); | ||
500 | + | ||
501 | + // 录制功能 | ||
502 | + $('#btn_start_record').click(function() { | ||
503 | + console.log('Starting recording...'); | ||
504 | + fetch('/record', { | ||
505 | + body: JSON.stringify({ | ||
506 | + type: 'start_record', | ||
507 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
508 | + }), | ||
509 | + headers: { | ||
510 | + 'Content-Type': 'application/json' | ||
511 | + }, | ||
512 | + method: 'POST' | ||
513 | + }).then(function(response) { | ||
514 | + if (response.ok) { | ||
515 | + console.log('Recording started.'); | ||
516 | + $('#btn_start_record').prop('disabled', true); | ||
517 | + $('#btn_stop_record').prop('disabled', false); | ||
518 | + $('#recording-indicator').addClass('active'); | ||
519 | + } else { | ||
520 | + console.error('Failed to start recording.'); | ||
521 | + } | ||
522 | + }).catch(function(error) { | ||
523 | + console.error('Error:', error); | ||
524 | + }); | ||
525 | + }); | ||
526 | + | ||
527 | + $('#btn_stop_record').click(function() { | ||
528 | + console.log('Stopping recording...'); | ||
529 | + fetch('/record', { | ||
530 | + body: JSON.stringify({ | ||
531 | + type: 'end_record', | ||
532 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
533 | + }), | ||
534 | + headers: { | ||
535 | + 'Content-Type': 'application/json' | ||
536 | + }, | ||
537 | + method: 'POST' | ||
538 | + }).then(function(response) { | ||
539 | + if (response.ok) { | ||
540 | + console.log('Recording stopped.'); | ||
541 | + $('#btn_start_record').prop('disabled', false); | ||
542 | + $('#btn_stop_record').prop('disabled', true); | ||
543 | + $('#recording-indicator').removeClass('active'); | ||
544 | + } else { | ||
545 | + console.error('Failed to stop recording.'); | ||
546 | + } | ||
547 | + }).catch(function(error) { | ||
548 | + console.error('Error:', error); | ||
549 | + }); | ||
550 | + }); | ||
551 | + | ||
552 | + $('#echo-form').on('submit', function(e) { | ||
553 | + e.preventDefault(); | ||
554 | + var message = $('#message').val(); | ||
555 | + if (!message.trim()) return; | ||
556 | + | ||
557 | + console.log('Sending echo message:', message); | ||
558 | + | ||
559 | + fetch('/human', { | ||
560 | + body: JSON.stringify({ | ||
561 | + text: message, | ||
562 | + type: 'echo', | ||
563 | + interrupt: true, | ||
564 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
565 | + }), | ||
566 | + headers: { | ||
567 | + 'Content-Type': 'application/json' | ||
568 | + }, | ||
569 | + method: 'POST' | ||
570 | + }); | ||
571 | + | ||
572 | + $('#message').val(''); | ||
573 | + addChatMessage(`已发送朗读请求: "${message}"`, 'system'); | ||
574 | + }); | ||
575 | + | ||
576 | + // 聊天模式表单提交 | ||
577 | + $('#chat-form').on('submit', function(e) { | ||
578 | + e.preventDefault(); | ||
579 | + var message = $('#chat-message').val(); | ||
580 | + if (!message.trim()) return; | ||
581 | + | ||
582 | + console.log('Sending chat message:', message); | ||
583 | + | ||
584 | + fetch('/human', { | ||
585 | + body: JSON.stringify({ | ||
586 | + text: message, | ||
587 | + type: 'chat', | ||
588 | + interrupt: true, | ||
589 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
590 | + }), | ||
591 | + headers: { | ||
592 | + 'Content-Type': 'application/json' | ||
593 | + }, | ||
594 | + method: 'POST' | ||
595 | + }); | ||
596 | + | ||
597 | + addChatMessage(message, 'user'); | ||
598 | + $('#chat-message').val(''); | ||
599 | + }); | ||
600 | + | ||
601 | + // 按住说话功能 | ||
602 | + let mediaRecorder; | ||
603 | + let audioChunks = []; | ||
604 | + let isRecording = false; | ||
605 | + let recognition; | ||
606 | + | ||
607 | + // 检查浏览器是否支持语音识别 | ||
608 | + const isSpeechRecognitionSupported = 'webkitSpeechRecognition' in window || 'SpeechRecognition' in window; | ||
609 | + | ||
610 | + if (isSpeechRecognitionSupported) { | ||
611 | + recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)(); | ||
612 | + recognition.continuous = true; | ||
613 | + recognition.interimResults = true; | ||
614 | + recognition.lang = 'zh-CN'; | ||
615 | + | ||
616 | + recognition.onresult = function(event) { | ||
617 | + let interimTranscript = ''; | ||
618 | + let finalTranscript = ''; | ||
619 | + | ||
620 | + for (let i = event.resultIndex; i < event.results.length; ++i) { | ||
621 | + if (event.results[i].isFinal) { | ||
622 | + finalTranscript += event.results[i][0].transcript; | ||
623 | + } else { | ||
624 | + interimTranscript += event.results[i][0].transcript; | ||
625 | + $('#chat-message').val(interimTranscript); | ||
626 | + } | ||
627 | + } | ||
628 | + | ||
629 | + if (finalTranscript) { | ||
630 | + $('#chat-message').val(finalTranscript); | ||
631 | + } | ||
632 | + }; | ||
633 | + | ||
634 | + recognition.onerror = function(event) { | ||
635 | + console.error('语音识别错误:', event.error); | ||
636 | + }; | ||
637 | + } | ||
638 | + | ||
639 | + // 按住说话按钮事件 | ||
640 | + $('#voice-record-btn').on('mousedown touchstart', function(e) { | ||
641 | + e.preventDefault(); | ||
642 | + startRecording(); | ||
643 | + }).on('mouseup mouseleave touchend', function() { | ||
644 | + if (isRecording) { | ||
645 | + stopRecording(); | ||
646 | + } | ||
647 | + }); | ||
648 | + | ||
649 | + // 开始录音 | ||
650 | + function startRecording() { | ||
651 | + if (isRecording) return; | ||
652 | + | ||
653 | + navigator.mediaDevices.getUserMedia({ audio: true }) | ||
654 | + .then(function(stream) { | ||
655 | + audioChunks = []; | ||
656 | + mediaRecorder = new MediaRecorder(stream); | ||
657 | + | ||
658 | + mediaRecorder.ondataavailable = function(e) { | ||
659 | + if (e.data.size > 0) { | ||
660 | + audioChunks.push(e.data); | ||
661 | + } | ||
662 | + }; | ||
663 | + | ||
664 | + mediaRecorder.start(); | ||
665 | + isRecording = true; | ||
666 | + | ||
667 | + $('#voice-record-btn').addClass('recording-pulse'); | ||
668 | + $('#voice-record-btn').css('background-color', '#dc3545'); | ||
669 | + | ||
670 | + if (recognition) { | ||
671 | + recognition.start(); | ||
672 | + } | ||
673 | + }) | ||
674 | + .catch(function(error) { | ||
675 | + console.error('无法访问麦克风:', error); | ||
676 | + alert('无法访问麦克风,请检查浏览器权限设置。'); | ||
677 | + }); | ||
678 | + } | ||
679 | + | ||
680 | + function stopRecording() { | ||
681 | + if (!isRecording) return; | ||
682 | + | ||
683 | + mediaRecorder.stop(); | ||
684 | + isRecording = false; | ||
685 | + | ||
686 | + // 停止所有音轨 | ||
687 | + mediaRecorder.stream.getTracks().forEach(track => track.stop()); | ||
688 | + | ||
689 | + // 视觉反馈恢复 | ||
690 | + $('#voice-record-btn').removeClass('recording-pulse'); | ||
691 | + $('#voice-record-btn').css('background-color', ''); | ||
692 | + | ||
693 | + // 停止语音识别 | ||
694 | + if (recognition) { | ||
695 | + recognition.stop(); | ||
696 | + } | ||
697 | + | ||
698 | + // 获取识别的文本并发送 | ||
699 | + setTimeout(function() { | ||
700 | + const recognizedText = $('#chat-message').val().trim(); | ||
701 | + if (recognizedText) { | ||
702 | + // 发送识别的文本 | ||
703 | + fetch('/human', { | ||
704 | + body: JSON.stringify({ | ||
705 | + text: recognizedText, | ||
706 | + type: 'chat', | ||
707 | + interrupt: true, | ||
708 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
709 | + }), | ||
710 | + headers: { | ||
711 | + 'Content-Type': 'application/json' | ||
712 | + }, | ||
713 | + method: 'POST' | ||
714 | + }); | ||
715 | + | ||
716 | + addChatMessage(recognizedText, 'user'); | ||
717 | + $('#chat-message').val(''); | ||
718 | + } | ||
719 | + }, 500); | ||
720 | + } | ||
721 | + | ||
722 | + // WebRTC 相关功能 | ||
723 | + if (typeof window.onWebRTCConnected === 'function') { | ||
724 | + const originalOnConnected = window.onWebRTCConnected; | ||
725 | + window.onWebRTCConnected = function() { | ||
726 | + updateConnectionStatus('connected'); | ||
727 | + if (originalOnConnected) originalOnConnected(); | ||
728 | + }; | ||
729 | + } else { | ||
730 | + window.onWebRTCConnected = function() { | ||
731 | + updateConnectionStatus('connected'); | ||
732 | + }; | ||
733 | + } | ||
734 | + | ||
735 | + // 当连接断开时更新状态 | ||
736 | + if (typeof window.onWebRTCDisconnected === 'function') { | ||
737 | + const originalOnDisconnected = window.onWebRTCDisconnected; | ||
738 | + window.onWebRTCDisconnected = function() { | ||
739 | + updateConnectionStatus('disconnected'); | ||
740 | + if (originalOnDisconnected) originalOnDisconnected(); | ||
741 | + }; | ||
742 | + } else { | ||
743 | + window.onWebRTCDisconnected = function() { | ||
744 | + updateConnectionStatus('disconnected'); | ||
745 | + }; | ||
746 | + } | ||
747 | + | ||
748 | + // SRS WebRTC播放功能 | ||
749 | + var sdk = null; // 全局处理器,用于在重新发布时进行清理 | ||
750 | + | ||
751 | + function startPlay() { | ||
752 | + // 关闭之前的连接 | ||
753 | + if (sdk) { | ||
754 | + sdk.close(); | ||
755 | + } | ||
756 | + | ||
757 | + sdk = new SrsRtcWhipWhepAsync(); | ||
758 | + $('#video').prop('srcObject', sdk.stream); | ||
759 | + | ||
760 | + var host = window.location.hostname; | ||
761 | + var url = "http://" + host + ":1985/rtc/v1/whep/?app=live&stream=livestream"; | ||
762 | + | ||
763 | + sdk.play(url).then(function(session) { | ||
764 | + console.log('WebRTC播放已启动,会话ID:', session.sessionid); | ||
765 | + }).catch(function(reason) { | ||
766 | + sdk.close(); | ||
767 | + console.error('WebRTC播放失败:', reason); | ||
768 | + }); | ||
769 | + } | ||
770 | + }); | ||
771 | + </script> | ||
772 | +</body> | ||
773 | +</html> |
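The inline script in this page drives the digital human entirely through two JSON endpoints: `/human` (with `type: 'echo'` for the read-aloud tab and `type: 'chat'` for the dialogue tab, both sending `interrupt: true`) and `/record` (`start_record` / `end_record`), each carrying the hidden `sessionid` value. Below is a minimal sketch of the same requests issued from Python; the base URL, the default port 8010, and session id 0 are assumptions taken from the page's hidden `#sessionid` field and the project's default setup, not part of this commit.

```python
# Minimal sketch (assumptions: the app.py server is reachable at
# http://localhost:8010 and session id 0 is the active session,
# matching the hidden #sessionid input in this page).
import requests

BASE_URL = "http://localhost:8010"  # assumed host/port, adjust to your deployment
SESSION_ID = 0                      # default value of the hidden #sessionid input

def speak(text: str) -> None:
    """Same payload the read-aloud (朗读模式) form posts to /human."""
    requests.post(f"{BASE_URL}/human", json={
        "text": text,
        "type": "echo",        # 'echo' = read the text verbatim, 'chat' = answer via dialogue
        "interrupt": True,     # stop whatever the avatar is currently saying
        "sessionid": SESSION_ID,
    }, timeout=10)

def chat(text: str) -> None:
    """Same payload the chat (对话模式) form and the push-to-talk button post."""
    requests.post(f"{BASE_URL}/human", json={
        "text": text,
        "type": "chat",
        "interrupt": True,
        "sessionid": SESSION_ID,
    }, timeout=10)

def record(start: bool) -> None:
    """Mirror of the start/stop recording buttons, posting to /record."""
    requests.post(f"{BASE_URL}/record", json={
        "type": "start_record" if start else "end_record",
        "sessionid": SESSION_ID,
    }, timeout=10)

if __name__ == "__main__":
    speak("你好,这是一条朗读测试。")
    record(True)
    chat("请介绍一下你自己。")
    record(False)
```

This mirrors exactly what the page's `fetch('/human', ...)` and `fetch('/record', ...)` calls send, so it can be used to script or load-test a session without the browser UI.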