Synced with the upstream (official) GitHub repository, up to the following commit on Apr 18, 2025:
a9c36c76e569107b5a39b3de8afd6e016b24d662
Showing 23 changed files with 1003 additions and 99 deletions
README-EN.md
0 → 100644
| 1 | +A real-time interactive streaming digital human that enables synchronized audio and video dialogue, achieving near commercial-grade results. | ||
| 2 | + | ||
| 3 | +[Effect of wav2lip](https://www.bilibili.com/video/BV1scwBeyELA/) | [Effect of ernerf](https://www.bilibili.com/video/BV1G1421z73r/) | [Effect of musetalk](https://www.bilibili.com/video/BV1gm421N7vQ/) | ||
| 4 | + | ||
| 5 | +## News | ||
| 6 | +- December 8, 2024: Improved multi-concurrency support; GPU memory no longer grows with the number of concurrent connections. | ||
| 7 | +- December 21, 2024: Added model warm-up for wav2lip and musetalk to eliminate stuttering on the first inference. Thanks to [@heimaojinzhangyz](https://github.com/heimaojinzhangyz) | ||
| 8 | +- December 28, 2024: Added the digital human model Ultralight-Digital-Human. Thanks to [@lijihua2017](https://github.com/lijihua2017) | ||
| 9 | +- February 7, 2025: Added fish-speech tts | ||
| 10 | +- February 21, 2025: Added the open-source model wav2lip256. Thanks to @不蠢不蠢 | ||
| 11 | +- March 2, 2025: Added Tencent's speech synthesis service | ||
| 12 | +- March 16, 2025: Supports Mac GPU (MPS) inference. Thanks to [@GcsSloop](https://github.com/GcsSloop) | ||
| 13 | + | ||
| 14 | +## Features | ||
| 15 | +1. Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human | ||
| 16 | +2. Supports voice cloning | ||
| 17 | +3. Supports interrupting the digital human while it is speaking | ||
| 18 | +4. Supports full-body video stitching | ||
| 19 | +5. Supports rtmp and webrtc | ||
| 20 | +6. Supports video orchestration: plays custom videos when the avatar is not speaking | ||
| 21 | +7. Supports multi-concurrency | ||
| 22 | + | ||
| 23 | +## 1. Installation | ||
| 24 | + | ||
| 25 | +Tested on Ubuntu 20.04, Python 3.10, PyTorch 1.12 and CUDA 11.3 | ||
| 26 | + | ||
| 27 | +### 1.1 Install dependency | ||
| 28 | + | ||
| 29 | +```bash | ||
| 30 | +conda create -n nerfstream python=3.10 | ||
| 31 | +conda activate nerfstream | ||
| 32 | +# If the cuda version is not 11.3 (confirm the version by running nvidia-smi), install the corresponding version of pytorch according to <https://pytorch.org/get-started/previous-versions/> | ||
| 33 | +conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch | ||
| 34 | +pip install -r requirements.txt | ||
| 35 | +# If you need to train the ernerf model, install the following libraries | ||
| 36 | +# pip install "git+https://github.com/facebookresearch/pytorch3d.git" | ||
| 37 | +# pip install tensorflow-gpu==2.8.0 | ||
| 38 | +# pip install --upgrade "protobuf<=3.20.1" | ||
| 39 | +``` | ||
| 40 | +For common installation issues, see the [FAQ](https://livetalking-doc.readthedocs.io/en/latest/faq.html) | ||
| 41 | +For setting up the Linux CUDA environment, you can refer to this article: https://zhuanlan.zhihu.com/p/674972886 | ||
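As a quick post-install sanity check, the short Python snippet below (an illustrative sketch, not part of the project) confirms that PyTorch sees the GPU and reports the CUDA build it was compiled against:

```python
# sanity check for the environment described above (illustrative only)
import torch

print("PyTorch version:", torch.__version__)        # 1.12.1 in the tested setup
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA build:", torch.version.cuda)         # e.g. '11.3'
    print("GPU:", torch.cuda.get_device_name(0))
```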
| 42 | + | ||
| 43 | + | ||
| 44 | +## 2. Quick Start | ||
| 45 | +- Download the models | ||
| 46 | +Quark Cloud Disk <https://pan.quark.cn/s/83a750323ef0> | ||
| 47 | +Google Drive <https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing> | ||
| 48 | +Copy wav2lip256.pth to the models folder of this project and rename it to wav2lip.pth; | ||
| 49 | +Extract wav2lip256_avatar1.tar.gz and copy the entire folder to the data/avatars folder of this project. | ||
| 50 | +- Run | ||
| 51 | +python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1 | ||
| 52 | +Open http://serverip:8010/webrtcapi.html in a browser. First click 'start' to play the digital human video; then enter any text in the text box and submit it. The digital human will then speak that text. | ||
| 53 | +<font color=red>The server side needs to open ports tcp:8010; udp:1-65536</font> | ||
| 54 | +If you need to purchase a high-definition wav2lip model for commercial use, see this [link](https://livetalking-doc.readthedocs.io/zh-cn/latest/service.html#wav2lip). | ||
| 55 | + | ||
| 56 | +- Quick experience | ||
| 57 | +<https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3> Create an instance with this image to run it. | ||
| 58 | + | ||
| 59 | +If you cannot access huggingface, set the following before running: | ||
| 60 | +``` | ||
| 61 | +export HF_ENDPOINT=https://hf-mirror.com | ||
| 62 | +``` | ||
| 63 | + | ||
| 64 | + | ||
| 65 | +## 3. More Usage | ||
| 66 | +Usage instructions: <https://livetalking-doc.readthedocs.io/en/latest> | ||
| 67 | + | ||
| 68 | +## 4. Docker Run | ||
| 69 | +The preceding installation steps are not needed; just run the container directly. | ||
| 70 | +``` | ||
| 71 | +docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v | ||
| 72 | +``` | ||
| 73 | +The code is in /root/metahuman-stream. First, git pull to get the latest code, and then execute the commands as in steps 2 and 3. | ||
| 74 | + | ||
| 75 | +The following images are provided: | ||
| 76 | +- autodl image: <https://www.codewithgpu.com/i/lipku/metahuman-stream/base> | ||
| 77 | +[autodl Tutorial](https://livetalking-doc.readthedocs.io/en/latest/autodl/README.html) | ||
| 78 | +- ucloud image: <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3> | ||
| 79 | +Any port can be opened; there is no need to deploy an additional srs service. | ||
| 80 | +[ucloud Tutorial](https://livetalking-doc.readthedocs.io/en/latest/ucloud/ucloud.html) | ||
| 81 | + | ||
| 82 | + | ||
| 83 | +## 5. TODO | ||
| 84 | +- [x] Added chatgpt to enable digital human dialogue | ||
| 85 | +- [x] Voice cloning | ||
| 86 | +- [x] Replace the digital human with a video when it is silent | ||
| 87 | +- [x] MuseTalk | ||
| 88 | +- [x] Wav2Lip | ||
| 89 | +- [x] Ultralight-Digital-Human | ||
| 90 | + | ||
| 91 | +--- | ||
| 92 | +If this project is helpful to you, please give it a star. Anyone interested is welcome to join in and help improve the project. | ||
| 93 | +* Knowledge Planet: https://t.zsxq.com/7NMyO, a collection of frequently asked questions, best practices, and troubleshooting notes. | ||
| 94 | +* WeChat Official Account: Digital Human Technology | ||
| 95 | + |
| 1 | -Real time interactive streaming digital human, realize audio video synchronous dialogue. It can basically achieve commercial effects. | 1 | +[English](./README-EN.md) | 中文版 |
| 2 | 实时交互流式数字人,实现音视频同步对话。基本可以达到商用效果 | 2 | 实时交互流式数字人,实现音视频同步对话。基本可以达到商用效果 |
| 3 | +[wav2lip效果](https://www.bilibili.com/video/BV1scwBeyELA/) | [ernerf效果](https://www.bilibili.com/video/BV1G1421z73r/) | [musetalk效果](https://www.bilibili.com/video/BV1gm421N7vQ/) | ||
| 3 | 4 | ||
| 4 | -[ernerf 效果](https://www.bilibili.com/video/BV1PM4m1y7Q2/) [musetalk 效果](https://www.bilibili.com/video/BV1gm421N7vQ/) [wav2lip 效果](https://www.bilibili.com/video/BV1Bw4m1e74P/) | ||
| 5 | - | ||
| 6 | -## 为避免与 3d 数字人混淆,原项目 metahuman-stream 改名为 livetalking,原有链接地址继续可用 | 5 | +## 为避免与3d数字人混淆,原项目metahuman-stream改名为livetalking,原有链接地址继续可用 |
| 7 | 6 | ||
| 8 | ## News | 7 | ## News |
| 9 | - | ||
| 10 | - 2024.12.8 完善多并发,显存不随并发数增加 | 8 | - 2024.12.8 完善多并发,显存不随并发数增加 |
| 11 | -- 2024.12.21 添加 wav2lip、musetalk 模型预热,解决第一次推理卡顿问题。感谢@heimaojinzhangyz | ||
| 12 | -- 2024.12.28 添加数字人模型 Ultralight-Digital-Human。 感谢@lijihua2017 | ||
| 13 | -- 2025.2.7 添加 fish-speech tts | ||
| 14 | -- 2025.2.21 添加 wav2lip256 开源模型 感谢@不蠢不蠢 | 9 | +- 2024.12.21 添加wav2lip、musetalk模型预热,解决第一次推理卡顿问题。感谢[@heimaojinzhangyz](https://github.com/heimaojinzhangyz) |
| 10 | +- 2024.12.28 添加数字人模型Ultralight-Digital-Human。 感谢[@lijihua2017](https://github.com/lijihua2017) | ||
| 11 | +- 2025.2.7 添加fish-speech tts | ||
| 12 | +- 2025.2.21 添加wav2lip256开源模型 感谢@不蠢不蠢 | ||
| 15 | - 2025.3.2 添加腾讯语音合成服务 | 13 | - 2025.3.2 添加腾讯语音合成服务 |
| 14 | +- 2025.3.16 支持mac gpu推理,感谢[@GcsSloop](https://github.com/GcsSloop) | ||
| 16 | 15 | ||
| 17 | ## Features | 16 | ## Features |
| 18 | - | ||
| 19 | 1. 支持多种数字人模型: ernerf、musetalk、wav2lip、Ultralight-Digital-Human | 17 | 1. 支持多种数字人模型: ernerf、musetalk、wav2lip、Ultralight-Digital-Human |
| 20 | 2. 支持声音克隆 | 18 | 2. 支持声音克隆 |
| 21 | 3. 支持数字人说话被打断 | 19 | 3. 支持数字人说话被打断 |
| 22 | 4. 支持全身视频拼接 | 20 | 4. 支持全身视频拼接 |
| 23 | -5. 支持 rtmp 和 webrtc | 21 | +5. 支持rtmp和webrtc |
| 24 | 6. 支持视频编排:不说话时播放自定义视频 | 22 | 6. 支持视频编排:不说话时播放自定义视频 |
| 25 | 7. 支持多并发 | 23 | 7. 支持多并发 |
| 26 | 24 | ||
| @@ -41,59 +39,53 @@ pip install -r requirements.txt | @@ -41,59 +39,53 @@ pip install -r requirements.txt | ||
| 41 | # pip install tensorflow-gpu==2.8.0 | 39 | # pip install tensorflow-gpu==2.8.0 |
| 42 | # pip install --upgrade "protobuf<=3.20.1" | 40 | # pip install --upgrade "protobuf<=3.20.1" |
| 43 | ``` | 41 | ``` |
| 44 | - | ||
| 45 | 安装常见问题[FAQ](https://livetalking-doc.readthedocs.io/en/latest/faq.html) | 42 | 安装常见问题[FAQ](https://livetalking-doc.readthedocs.io/en/latest/faq.html) |
| 46 | -linux cuda 环境搭建可以参考这篇文章 https://zhuanlan.zhihu.com/p/674972886 | 43 | +linux cuda环境搭建可以参考这篇文章 https://zhuanlan.zhihu.com/p/674972886 |
| 47 | 44 | ||
| 48 | -## 2. Quick Start | ||
| 49 | 45 | ||
| 46 | +## 2. Quick Start | ||
| 50 | - 下载模型 | 47 | - 下载模型 |
| 51 | - 百度云盘<https://pan.baidu.com/s/1yOsQ06-RIDTJd3HFCw4wtA> 密码: ltua | 48 | + 夸克云盘<https://pan.quark.cn/s/83a750323ef0> |
| 52 | GoogleDriver <https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing> | 49 | GoogleDriver <https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing> |
| 53 | - 将 wav2lip256.pth 拷到本项目的 models 下, 重命名为 wav2lip.pth; | ||
| 54 | - 将 wav2lip256_avatar1.tar.gz 解压后整个文件夹拷到本项目的 data/avatars 下 | 50 | + 将wav2lip256.pth拷到本项目的models下, 重命名为wav2lip.pth; |
| 51 | + 将wav2lip256_avatar1.tar.gz解压后整个文件夹拷到本项目的data/avatars下 | ||
| 55 | - 运行 | 52 | - 运行 |
| 56 | python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1 --preload 2 | 53 | python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1 --preload 2 |
| 57 | - | ||
| 58 | 使用 GPU 启动模特 3 号:python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar3 --preload 2 | 54 | 使用 GPU 启动模特 3 号:python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar3 --preload 2 |
| 59 | - 用浏览器打开 http://serverip:8010/webrtcapi.html , 先点‘start',播放数字人视频;然后在文本框输入任意文字,提交。数字人播报该段文字 | 55 | + |
| 56 | +用浏览器打开http://serverip:8010/webrtcapi.html , 先点‘start',播放数字人视频;然后在文本框输入任意文字,提交。数字人播报该段文字 | ||
| 60 | <font color=red>服务端需要开放端口 tcp:8010; udp:1-65536 </font> | 57 | <font color=red>服务端需要开放端口 tcp:8010; udp:1-65536 </font> |
| 61 | - 如果需要商用高清 wav2lip 模型,可以与我联系购买 | 58 | + 如果需要商用高清wav2lip模型,[链接](https://livetalking-doc.readthedocs.io/zh-cn/latest/service.html#wav2lip) |
| 62 | 59 | ||
| 63 | - 快速体验 | 60 | - 快速体验 |
| 64 | <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3> 用该镜像创建实例即可运行成功 | 61 | <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3> 用该镜像创建实例即可运行成功 |
| 65 | 62 | ||
| 66 | -如果访问不了 huggingface,在运行前 | ||
| 67 | - | 63 | +如果访问不了huggingface,在运行前 |
| 68 | ``` | 64 | ``` |
| 69 | export HF_ENDPOINT=https://hf-mirror.com | 65 | export HF_ENDPOINT=https://hf-mirror.com |
| 70 | ``` | 66 | ``` |
| 71 | 67 | ||
| 72 | -## 3. More Usage | ||
| 73 | 68 | ||
| 69 | +## 3. More Usage | ||
| 74 | 使用说明: <https://livetalking-doc.readthedocs.io/> | 70 | 使用说明: <https://livetalking-doc.readthedocs.io/> |
| 75 | 71 | ||
| 76 | ## 4. Docker Run | 72 | ## 4. Docker Run |
| 77 | - | ||
| 78 | 不需要前面的安装,直接运行。 | 73 | 不需要前面的安装,直接运行。 |
| 79 | - | ||
| 80 | ``` | 74 | ``` |
| 81 | docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v | 75 | docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v |
| 82 | ``` | 76 | ``` |
| 83 | - | ||
| 84 | -代码在/root/metahuman-stream,先 git pull 拉一下最新代码,然后执行命令同第 2、3 步 | 77 | +代码在/root/metahuman-stream,先git pull拉一下最新代码,然后执行命令同第2、3步 |
| 85 | 78 | ||
| 86 | 提供如下镜像 | 79 | 提供如下镜像 |
| 80 | +- autodl镜像: <https://www.codewithgpu.com/i/lipku/metahuman-stream/base> | ||
| 81 | + [autodl教程](https://livetalking-doc.readthedocs.io/en/latest/autodl/README.html) | ||
| 82 | +- ucloud镜像: <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3> | ||
| 83 | + 可以开放任意端口,不需要另外部署srs服务. | ||
| 84 | + [ucloud教程](https://livetalking-doc.readthedocs.io/en/latest/ucloud/ucloud.html) | ||
| 87 | 85 | ||
| 88 | -- autodl 镜像: <https://www.codewithgpu.com/i/lipku/metahuman-stream/base> | ||
| 89 | - [autodl 教程](https://livetalking-doc.readthedocs.io/en/latest/autodl/README.html) | ||
| 90 | -- ucloud 镜像: <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3> | ||
| 91 | - 可以开放任意端口,不需要另外部署 srs 服务. | ||
| 92 | - [ucloud 教程](https://livetalking-doc.readthedocs.io/en/latest/ucloud/ucloud.html) | ||
| 93 | 86 | ||
| 94 | ## 5. TODO | 87 | ## 5. TODO |
| 95 | - | ||
| 96 | -- [x] 添加 chatgpt 实现数字人对话 | 88 | +- [x] 添加chatgpt实现数字人对话 |
| 97 | - [x] 声音克隆 | 89 | - [x] 声音克隆 |
| 98 | - [x] 数字人静音时用一段视频代替 | 90 | - [x] 数字人静音时用一段视频代替 |
| 99 | - [x] MuseTalk | 91 | - [x] MuseTalk |
| @@ -101,9 +93,8 @@ docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/c | @@ -101,9 +93,8 @@ docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/c | ||
| 101 | - [x] Ultralight-Digital-Human | 93 | - [x] Ultralight-Digital-Human |
| 102 | 94 | ||
| 103 | --- | 95 | --- |
| 96 | +如果本项目对你有帮助,帮忙点个star。也欢迎感兴趣的朋友一起来完善该项目. | ||
| 97 | +* 知识星球: https://t.zsxq.com/7NMyO 沉淀高质量常见问题、最佳实践经验、问题解答 | ||
| 98 | +* 微信公众号:数字人技术 | ||
| 99 | +  | ||
| 104 | 100 | ||
| 105 | -如果本项目对你有帮助,帮忙点个 star。也欢迎感兴趣的朋友一起来完善该项目. | ||
| 106 | - | ||
| 107 | -- 知识星球: https://t.zsxq.com/7NMyO 沉淀高质量常见问题、最佳实践经验、问题解答 | ||
| 108 | -- 微信公众号:数字人技术 | ||
| 109 | -  |
| @@ -201,7 +201,7 @@ async def set_audiotype(request): | @@ -201,7 +201,7 @@ async def set_audiotype(request): | ||
| 201 | params = await request.json() | 201 | params = await request.json() |
| 202 | 202 | ||
| 203 | sessionid = params.get('sessionid',0) | 203 | sessionid = params.get('sessionid',0) |
| 204 | - nerfreals[sessionid].set_curr_state(params['audiotype'],params['reinit']) | 204 | + nerfreals[sessionid].set_custom_state(params['audiotype'],params['reinit']) |
| 205 | 205 | ||
| 206 | return web.Response( | 206 | return web.Response( |
| 207 | content_type="application/json", | 207 | content_type="application/json", |
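For reference, a minimal client call to the handler above could look like the following sketch. It assumes the route is registered as /set_audiotype on the default port 8010 and that session 0 exists; only the sessionid, audiotype and reinit fields read by the handler are sent.

```python
# Illustrative client for the set_audiotype handler (assumed route /set_audiotype, port 8010).
import requests

resp = requests.post(
    "http://serverip:8010/set_audiotype",
    json={
        "sessionid": 0,   # matches params.get('sessionid', 0) in the handler
        "audiotype": 2,   # custom state to switch the avatar to
        "reinit": True,   # reset that state's playback index via set_custom_state
    },
)
print(resp.json())
```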
| @@ -495,6 +495,8 @@ if __name__ == '__main__': | @@ -495,6 +495,8 @@ if __name__ == '__main__': | ||
| 495 | elif opt.transport=='rtcpush': | 495 | elif opt.transport=='rtcpush': |
| 496 | pagename='rtcpushapi.html' | 496 | pagename='rtcpushapi.html' |
| 497 | logger.info('start http server; http://<serverip>:'+str(opt.listenport)+'/'+pagename) | 497 | logger.info('start http server; http://<serverip>:'+str(opt.listenport)+'/'+pagename) |
| 498 | + logger.info('如果使用webrtc,推荐访问webrtc集成前端: http://<serverip>:'+str(opt.listenport)+'/dashboard.html') | ||
| 499 | + | ||
| 498 | def run_server(runner): | 500 | def run_server(runner): |
| 499 | loop = asyncio.new_event_loop() | 501 | loop = asyncio.new_event_loop() |
| 500 | asyncio.set_event_loop(loop) | 502 | asyncio.set_event_loop(loop) |
| @@ -35,7 +35,7 @@ import soundfile as sf | @@ -35,7 +35,7 @@ import soundfile as sf | ||
| 35 | import av | 35 | import av |
| 36 | from fractions import Fraction | 36 | from fractions import Fraction |
| 37 | 37 | ||
| 38 | -from ttsreal import EdgeTTS,VoitsTTS,XTTS,CosyVoiceTTS,FishTTS,TencentTTS | 38 | +from ttsreal import EdgeTTS,SovitsTTS,XTTS,CosyVoiceTTS,FishTTS,TencentTTS |
| 39 | from logger import logger | 39 | from logger import logger |
| 40 | 40 | ||
| 41 | from tqdm import tqdm | 41 | from tqdm import tqdm |
| @@ -57,7 +57,7 @@ class BaseReal: | @@ -57,7 +57,7 @@ class BaseReal: | ||
| 57 | if opt.tts == "edgetts": | 57 | if opt.tts == "edgetts": |
| 58 | self.tts = EdgeTTS(opt,self) | 58 | self.tts = EdgeTTS(opt,self) |
| 59 | elif opt.tts == "gpt-sovits": | 59 | elif opt.tts == "gpt-sovits": |
| 60 | - self.tts = VoitsTTS(opt,self) | 60 | + self.tts = SovitsTTS(opt,self) |
| 61 | elif opt.tts == "xtts": | 61 | elif opt.tts == "xtts": |
| 62 | self.tts = XTTS(opt,self) | 62 | self.tts = XTTS(opt,self) |
| 63 | elif opt.tts == "cosyvoice": | 63 | elif opt.tts == "cosyvoice": |
| @@ -262,8 +262,8 @@ class BaseReal: | @@ -262,8 +262,8 @@ class BaseReal: | ||
| 262 | self.curr_state = 1 #当前视频不循环播放,切换到静音状态 | 262 | self.curr_state = 1 #当前视频不循环播放,切换到静音状态 |
| 263 | return stream | 263 | return stream |
| 264 | 264 | ||
| 265 | - def set_curr_state(self,audiotype, reinit): | ||
| 266 | - print('set_curr_state:',audiotype) | 265 | + def set_custom_state(self,audiotype, reinit=True): |
| 266 | + print('set_custom_state:',audiotype) | ||
| 267 | self.curr_state = audiotype | 267 | self.curr_state = audiotype |
| 268 | if reinit: | 268 | if reinit: |
| 269 | self.custom_audio_index[audiotype] = 0 | 269 | self.custom_audio_index[audiotype] = 0 |
| @@ -179,8 +179,11 @@ print(f'[INFO] fitting light...') | @@ -179,8 +179,11 @@ print(f'[INFO] fitting light...') | ||
| 179 | 179 | ||
| 180 | batch_size = 32 | 180 | batch_size = 32 |
| 181 | 181 | ||
| 182 | -device_default = torch.device("cuda:0") | ||
| 183 | -device_render = torch.device("cuda:0") | 182 | +device_default = torch.device("cuda:0" if torch.cuda.is_available() else ( |
| 183 | + "mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) | ||
| 184 | +device_render = torch.device("cuda:0" if torch.cuda.is_available() else ( | ||
| 185 | + "mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) | ||
| 186 | + | ||
| 184 | renderer = Render_3DMM(arg_focal, h, w, batch_size, device_render) | 187 | renderer = Render_3DMM(arg_focal, h, w, batch_size, device_render) |
| 185 | 188 | ||
| 186 | sel_ids = np.arange(0, num_frames, int(num_frames / batch_size))[:batch_size] | 189 | sel_ids = np.arange(0, num_frames, int(num_frames / batch_size))[:batch_size] |
| @@ -83,7 +83,7 @@ class Render_3DMM(nn.Module): | @@ -83,7 +83,7 @@ class Render_3DMM(nn.Module): | ||
| 83 | img_h=500, | 83 | img_h=500, |
| 84 | img_w=500, | 84 | img_w=500, |
| 85 | batch_size=1, | 85 | batch_size=1, |
| 86 | - device=torch.device("cuda:0"), | 86 | + device=torch.device("cuda:0" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")), |
| 87 | ): | 87 | ): |
| 88 | super(Render_3DMM, self).__init__() | 88 | super(Render_3DMM, self).__init__() |
| 89 | 89 |
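The same CUDA-then-MPS-then-CPU fallback expression recurs throughout this commit; a small helper of the following shape (a sketch, not code from the diff) captures the selection logic once:

```python
# Illustrative helper mirroring the repeated device-selection expression in this commit.
import torch

def get_device(prefer: str = "cuda:0") -> torch.device:
    """Prefer CUDA, fall back to Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device(prefer)
    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device_default = get_device()
device_render = get_device()
```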
| @@ -147,7 +147,7 @@ if __name__ == '__main__': | @@ -147,7 +147,7 @@ if __name__ == '__main__': | ||
| 147 | 147 | ||
| 148 | seed_everything(opt.seed) | 148 | seed_everything(opt.seed) |
| 149 | 149 | ||
| 150 | - device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') | 150 | + device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
| 151 | 151 | ||
| 152 | model = NeRFNetwork(opt) | 152 | model = NeRFNetwork(opt) |
| 153 | 153 |
| @@ -442,7 +442,7 @@ class LPIPSMeter: | @@ -442,7 +442,7 @@ class LPIPSMeter: | ||
| 442 | self.N = 0 | 442 | self.N = 0 |
| 443 | self.net = net | 443 | self.net = net |
| 444 | 444 | ||
| 445 | - self.device = device if device is not None else torch.device('cuda' if torch.cuda.is_available() else 'cpu') | 445 | + self.device = device if device is not None else torch.device('cuda' if torch.cuda.is_available() else ('mps' if hasattr(torch.backends, "mps") and torch.backends.mps.is_available() else 'cpu')) |
| 446 | self.fn = lpips.LPIPS(net=net).eval().to(self.device) | 446 | self.fn = lpips.LPIPS(net=net).eval().to(self.device) |
| 447 | 447 | ||
| 448 | def clear(self): | 448 | def clear(self): |
| @@ -618,7 +618,11 @@ class Trainer(object): | @@ -618,7 +618,11 @@ class Trainer(object): | ||
| 618 | self.flip_init_lips = self.opt.init_lips | 618 | self.flip_init_lips = self.opt.init_lips |
| 619 | self.time_stamp = time.strftime("%Y-%m-%d_%H-%M-%S") | 619 | self.time_stamp = time.strftime("%Y-%m-%d_%H-%M-%S") |
| 620 | self.scheduler_update_every_step = scheduler_update_every_step | 620 | self.scheduler_update_every_step = scheduler_update_every_step |
| 621 | - self.device = device if device is not None else torch.device(f'cuda:{local_rank}' if torch.cuda.is_available() else 'cpu') | 621 | + self.device = device if device is not None else torch.device( |
| 622 | + f'cuda:{local_rank}' if torch.cuda.is_available() else ( | ||
| 623 | + 'mps' if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else 'cpu' | ||
| 624 | + ) | ||
| 625 | + ) | ||
| 622 | self.console = Console() | 626 | self.console = Console() |
| 623 | 627 | ||
| 624 | model.to(self.device) | 628 | model.to(self.device) |
| @@ -56,10 +56,8 @@ from ultralight.unet import Model | @@ -56,10 +56,8 @@ from ultralight.unet import Model | ||
| 56 | from ultralight.audio2feature import Audio2Feature | 56 | from ultralight.audio2feature import Audio2Feature |
| 57 | from logger import logger | 57 | from logger import logger |
| 58 | 58 | ||
| 59 | - | ||
| 60 | -device = 'cuda' if torch.cuda.is_available() else 'cpu' | ||
| 61 | -logger.info('Using {} for inference.'.format(device)) | ||
| 62 | - | 59 | +device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
| 60 | +print('Using {} for inference.'.format(device)) | ||
| 63 | 61 | ||
| 64 | def load_model(opt): | 62 | def load_model(opt): |
| 65 | audio_processor = Audio2Feature() | 63 | audio_processor = Audio2Feature() |
| @@ -44,8 +44,8 @@ from basereal import BaseReal | @@ -44,8 +44,8 @@ from basereal import BaseReal | ||
| 44 | from tqdm import tqdm | 44 | from tqdm import tqdm |
| 45 | from logger import logger | 45 | from logger import logger |
| 46 | 46 | ||
| 47 | -device = 'cuda' if torch.cuda.is_available() else 'cpu' | ||
| 48 | -logger.info('Using {} for inference.'.format(device)) | 47 | +device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
| 48 | +print('Using {} for inference.'.format(device)) | ||
| 49 | 49 | ||
| 50 | def _load(checkpoint_path): | 50 | def _load(checkpoint_path): |
| 51 | if device == 'cuda': | 51 | if device == 'cuda': |
| @@ -51,7 +51,7 @@ from logger import logger | @@ -51,7 +51,7 @@ from logger import logger | ||
| 51 | def load_model(): | 51 | def load_model(): |
| 52 | # load model weights | 52 | # load model weights |
| 53 | audio_processor,vae, unet, pe = load_all_model() | 53 | audio_processor,vae, unet, pe = load_all_model() |
| 54 | - device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | 54 | + device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
| 55 | timesteps = torch.tensor([0], device=device) | 55 | timesteps = torch.tensor([0], device=device) |
| 56 | pe = pe.half() | 56 | pe = pe.half() |
| 57 | vae.vae = vae.vae.half() | 57 | vae.vae = vae.vae.half() |
| @@ -267,23 +267,50 @@ class MuseReal(BaseReal): | @@ -267,23 +267,50 @@ class MuseReal(BaseReal): | ||
| 267 | 267 | ||
| 268 | 268 | ||
| 269 | def process_frames(self,quit_event,loop=None,audio_track=None,video_track=None): | 269 | def process_frames(self,quit_event,loop=None,audio_track=None,video_track=None): |
| 270 | + enable_transition = True # 设置为False禁用过渡效果,True启用 | ||
| 271 | + | ||
| 272 | + if enable_transition: | ||
| 273 | + self.last_speaking = False | ||
| 274 | + self.transition_start = time.time() | ||
| 275 | + self.transition_duration = 0.1 # 过渡时间 | ||
| 276 | + self.last_silent_frame = None # 静音帧缓存 | ||
| 277 | + self.last_speaking_frame = None # 说话帧缓存 | ||
| 270 | 278 | ||
| 271 | while not quit_event.is_set(): | 279 | while not quit_event.is_set(): |
| 272 | try: | 280 | try: |
| 273 | res_frame,idx,audio_frames = self.res_frame_queue.get(block=True, timeout=1) | 281 | res_frame,idx,audio_frames = self.res_frame_queue.get(block=True, timeout=1) |
| 274 | except queue.Empty: | 282 | except queue.Empty: |
| 275 | continue | 283 | continue |
| 276 | - if audio_frames[0][1]!=0 and audio_frames[1][1]!=0: #全为静音数据,只需要取fullimg | 284 | + |
| 285 | + if enable_transition: | ||
| 286 | + # 检测状态变化 | ||
| 287 | + current_speaking = not (audio_frames[0][1]!=0 and audio_frames[1][1]!=0) | ||
| 288 | + if current_speaking != self.last_speaking: | ||
| 289 | + logger.info(f"状态切换:{'说话' if self.last_speaking else '静音'} → {'说话' if current_speaking else '静音'}") | ||
| 290 | + self.transition_start = time.time() | ||
| 291 | + self.last_speaking = current_speaking | ||
| 292 | + | ||
| 293 | + if audio_frames[0][1]!=0 and audio_frames[1][1]!=0: | ||
| 277 | self.speaking = False | 294 | self.speaking = False |
| 278 | audiotype = audio_frames[0][1] | 295 | audiotype = audio_frames[0][1] |
| 279 | - if self.custom_index.get(audiotype) is not None: #有自定义视频 | 296 | + if self.custom_index.get(audiotype) is not None: |
| 280 | mirindex = self.mirror_index(len(self.custom_img_cycle[audiotype]),self.custom_index[audiotype]) | 297 | mirindex = self.mirror_index(len(self.custom_img_cycle[audiotype]),self.custom_index[audiotype]) |
| 281 | - combine_frame = self.custom_img_cycle[audiotype][mirindex] | 298 | + target_frame = self.custom_img_cycle[audiotype][mirindex] |
| 282 | self.custom_index[audiotype] += 1 | 299 | self.custom_index[audiotype] += 1 |
| 283 | - # if not self.custom_opt[audiotype].loop and self.custom_index[audiotype]>=len(self.custom_img_cycle[audiotype]): | ||
| 284 | - # self.curr_state = 1 #当前视频不循环播放,切换到静音状态 | ||
| 285 | else: | 300 | else: |
| 286 | - combine_frame = self.frame_list_cycle[idx] | 301 | + target_frame = self.frame_list_cycle[idx] |
| 302 | + | ||
| 303 | + if enable_transition: | ||
| 304 | + # 说话→静音过渡 | ||
| 305 | + if time.time() - self.transition_start < self.transition_duration and self.last_speaking_frame is not None: | ||
| 306 | + alpha = min(1.0, (time.time() - self.transition_start) / self.transition_duration) | ||
| 307 | + combine_frame = cv2.addWeighted(self.last_speaking_frame, 1-alpha, target_frame, alpha, 0) | ||
| 308 | + else: | ||
| 309 | + combine_frame = target_frame | ||
| 310 | + # 缓存静音帧 | ||
| 311 | + self.last_silent_frame = combine_frame.copy() | ||
| 312 | + else: | ||
| 313 | + combine_frame = target_frame | ||
| 287 | else: | 314 | else: |
| 288 | self.speaking = True | 315 | self.speaking = True |
| 289 | bbox = self.coord_list_cycle[idx] | 316 | bbox = self.coord_list_cycle[idx] |
| @@ -291,20 +318,29 @@ class MuseReal(BaseReal): | @@ -291,20 +318,29 @@ class MuseReal(BaseReal): | ||
| 291 | x1, y1, x2, y2 = bbox | 318 | x1, y1, x2, y2 = bbox |
| 292 | try: | 319 | try: |
| 293 | res_frame = cv2.resize(res_frame.astype(np.uint8),(x2-x1,y2-y1)) | 320 | res_frame = cv2.resize(res_frame.astype(np.uint8),(x2-x1,y2-y1)) |
| 294 | - except: | 321 | + except Exception as e: |
| 322 | + logger.warning(f"resize error: {e}") | ||
| 295 | continue | 323 | continue |
| 296 | mask = self.mask_list_cycle[idx] | 324 | mask = self.mask_list_cycle[idx] |
| 297 | mask_crop_box = self.mask_coords_list_cycle[idx] | 325 | mask_crop_box = self.mask_coords_list_cycle[idx] |
| 298 | - #combine_frame = get_image(ori_frame,res_frame,bbox) | ||
| 299 | - #t=time.perf_counter() | ||
| 300 | - combine_frame = get_image_blending(ori_frame,res_frame,bbox,mask,mask_crop_box) | ||
| 301 | - #print('blending time:',time.perf_counter()-t) | ||
| 302 | 326 | ||
| 303 | - image = combine_frame #(outputs['image'] * 255).astype(np.uint8) | 327 | + current_frame = get_image_blending(ori_frame,res_frame,bbox,mask,mask_crop_box) |
| 328 | + if enable_transition: | ||
| 329 | + # 静音→说话过渡 | ||
| 330 | + if time.time() - self.transition_start < self.transition_duration and self.last_silent_frame is not None: | ||
| 331 | + alpha = min(1.0, (time.time() - self.transition_start) / self.transition_duration) | ||
| 332 | + combine_frame = cv2.addWeighted(self.last_silent_frame, 1-alpha, current_frame, alpha, 0) | ||
| 333 | + else: | ||
| 334 | + combine_frame = current_frame | ||
| 335 | + # 缓存说话帧 | ||
| 336 | + self.last_speaking_frame = combine_frame.copy() | ||
| 337 | + else: | ||
| 338 | + combine_frame = current_frame | ||
| 339 | + | ||
| 340 | + image = combine_frame | ||
| 304 | new_frame = VideoFrame.from_ndarray(image, format="bgr24") | 341 | new_frame = VideoFrame.from_ndarray(image, format="bgr24") |
| 305 | asyncio.run_coroutine_threadsafe(video_track._queue.put((new_frame,None)), loop) | 342 | asyncio.run_coroutine_threadsafe(video_track._queue.put((new_frame,None)), loop) |
| 306 | self.record_video_data(image) | 343 | self.record_video_data(image) |
| 307 | - #self.recordq_video.put(new_frame) | ||
| 308 | 344 | ||
| 309 | for audio_frame in audio_frames: | 345 | for audio_frame in audio_frames: |
| 310 | frame,type,eventpoint = audio_frame | 346 | frame,type,eventpoint = audio_frame |
| @@ -312,12 +348,8 @@ class MuseReal(BaseReal): | @@ -312,12 +348,8 @@ class MuseReal(BaseReal): | ||
| 312 | new_frame = AudioFrame(format='s16', layout='mono', samples=frame.shape[0]) | 348 | new_frame = AudioFrame(format='s16', layout='mono', samples=frame.shape[0]) |
| 313 | new_frame.planes[0].update(frame.tobytes()) | 349 | new_frame.planes[0].update(frame.tobytes()) |
| 314 | new_frame.sample_rate=16000 | 350 | new_frame.sample_rate=16000 |
| 315 | - # if audio_track._queue.qsize()>10: | ||
| 316 | - # time.sleep(0.1) | ||
| 317 | asyncio.run_coroutine_threadsafe(audio_track._queue.put((new_frame,eventpoint)), loop) | 351 | asyncio.run_coroutine_threadsafe(audio_track._queue.put((new_frame,eventpoint)), loop) |
| 318 | self.record_audio_data(frame) | 352 | self.record_audio_data(frame) |
| 319 | - #self.notify(eventpoint) | ||
| 320 | - #self.recordq_audio.put(new_frame) | ||
| 321 | logger.info('musereal process_frames thread stop') | 353 | logger.info('musereal process_frames thread stop') |
| 322 | 354 | ||
| 323 | def render(self,quit_event,loop=None,audio_track=None,video_track=None): | 355 | def render(self,quit_event,loop=None,audio_track=None,video_track=None): |
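The transition logic added to process_frames amounts to a short linear cross-fade between the cached frame of the previous state and the current target frame. A standalone sketch of that blend (illustrative only, using the same cv2.addWeighted call as the diff) is:

```python
# Illustrative cross-fade between two equally sized BGR frames, as used in process_frames above.
import time
import cv2

def blend_transition(prev_frame, target_frame, transition_start, transition_duration=0.1):
    """Linearly fade from prev_frame to target_frame over transition_duration seconds."""
    if prev_frame is None:
        return target_frame
    elapsed = time.time() - transition_start
    if elapsed >= transition_duration:
        return target_frame
    alpha = min(1.0, elapsed / transition_duration)
    return cv2.addWeighted(prev_frame, 1 - alpha, target_frame, alpha, 0)
```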
| @@ -36,7 +36,7 @@ class UNet(): | @@ -36,7 +36,7 @@ class UNet(): | ||
| 36 | unet_config = json.load(f) | 36 | unet_config = json.load(f) |
| 37 | self.model = UNet2DConditionModel(**unet_config) | 37 | self.model = UNet2DConditionModel(**unet_config) |
| 38 | self.pe = PositionalEncoding(d_model=384) | 38 | self.pe = PositionalEncoding(d_model=384) |
| 39 | - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | 39 | + self.device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
| 40 | weights = torch.load(model_path) if torch.cuda.is_available() else torch.load(model_path, map_location=self.device) | 40 | weights = torch.load(model_path) if torch.cuda.is_available() else torch.load(model_path, map_location=self.device) |
| 41 | self.model.load_state_dict(weights) | 41 | self.model.load_state_dict(weights) |
| 42 | if use_float16: | 42 | if use_float16: |
| @@ -23,7 +23,7 @@ class VAE(): | @@ -23,7 +23,7 @@ class VAE(): | ||
| 23 | self.model_path = model_path | 23 | self.model_path = model_path |
| 24 | self.vae = AutoencoderKL.from_pretrained(self.model_path) | 24 | self.vae = AutoencoderKL.from_pretrained(self.model_path) |
| 25 | 25 | ||
| 26 | - self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | 26 | + self.device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
| 27 | self.vae.to(self.device) | 27 | self.vae.to(self.device) |
| 28 | 28 | ||
| 29 | if use_float16: | 29 | if use_float16: |
| @@ -325,7 +325,7 @@ def create_musetalk_human(file, avatar_id): | @@ -325,7 +325,7 @@ def create_musetalk_human(file, avatar_id): | ||
| 325 | 325 | ||
| 326 | 326 | ||
| 327 | # initialize the mmpose model | 327 | # initialize the mmpose model |
| 328 | -device = "cuda" if torch.cuda.is_available() else "cpu" | 328 | +device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
| 329 | fa = FaceAlignment(1, flip_input=False, device=device) | 329 | fa = FaceAlignment(1, flip_input=False, device=device) |
| 330 | config_file = os.path.join(current_dir, 'utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py') | 330 | config_file = os.path.join(current_dir, 'utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py') |
| 331 | checkpoint_file = os.path.abspath(os.path.join(current_dir, '../models/dwpose/dw-ll_ucoco_384.pth')) | 331 | checkpoint_file = os.path.abspath(os.path.join(current_dir, '../models/dwpose/dw-ll_ucoco_384.pth')) |
| @@ -13,14 +13,14 @@ import torch | @@ -13,14 +13,14 @@ import torch | ||
| 13 | from tqdm import tqdm | 13 | from tqdm import tqdm |
| 14 | 14 | ||
| 15 | # initialize the mmpose model | 15 | # initialize the mmpose model |
| 16 | -device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | 16 | +device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
| 17 | config_file = './musetalk/utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py' | 17 | config_file = './musetalk/utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py' |
| 18 | checkpoint_file = './models/dwpose/dw-ll_ucoco_384.pth' | 18 | checkpoint_file = './models/dwpose/dw-ll_ucoco_384.pth' |
| 19 | model = init_model(config_file, checkpoint_file, device=device) | 19 | model = init_model(config_file, checkpoint_file, device=device) |
| 20 | 20 | ||
| 21 | # initialize the face detection model | 21 | # initialize the face detection model |
| 22 | -device = "cuda" if torch.cuda.is_available() else "cpu" | ||
| 23 | -fa = FaceAlignment(LandmarksType._2D, flip_input=False,device=device) | 22 | +device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
| 23 | +fa = FaceAlignment(LandmarksType._2D, flip_input=False, device=device) | ||
| 24 | 24 | ||
| 25 | # maker if the bbox is not sufficient | 25 | # maker if the bbox is not sufficient |
| 26 | coord_placeholder = (0.0,0.0,0.0,0.0) | 26 | coord_placeholder = (0.0,0.0,0.0,0.0) |
| @@ -91,7 +91,7 @@ def load_model(name: str, device: Optional[Union[str, torch.device]] = None, dow | @@ -91,7 +91,7 @@ def load_model(name: str, device: Optional[Union[str, torch.device]] = None, dow | ||
| 91 | """ | 91 | """ |
| 92 | 92 | ||
| 93 | if device is None: | 93 | if device is None: |
| 94 | - device = "cuda" if torch.cuda.is_available() else "cpu" | 94 | + device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
| 95 | if download_root is None: | 95 | if download_root is None: |
| 96 | download_root = os.getenv( | 96 | download_root = os.getenv( |
| 97 | "XDG_CACHE_HOME", | 97 | "XDG_CACHE_HOME", |
| @@ -78,6 +78,8 @@ def transcribe( | @@ -78,6 +78,8 @@ def transcribe( | ||
| 78 | if dtype == torch.float16: | 78 | if dtype == torch.float16: |
| 79 | warnings.warn("FP16 is not supported on CPU; using FP32 instead") | 79 | warnings.warn("FP16 is not supported on CPU; using FP32 instead") |
| 80 | dtype = torch.float32 | 80 | dtype = torch.float32 |
| 81 | + if hasattr(torch.backends, "mps") and torch.backends.mps.is_available(): | ||
| 82 | + warnings.warn("Performing inference on CPU when MPS is available") | ||
| 81 | 83 | ||
| 82 | if dtype == torch.float32: | 84 | if dtype == torch.float32: |
| 83 | decode_options["fp16"] = False | 85 | decode_options["fp16"] = False |
| @@ -135,7 +137,7 @@ def cli(): | @@ -135,7 +137,7 @@ def cli(): | ||
| 135 | parser.add_argument("audio", nargs="+", type=str, help="audio file(s) to transcribe") | 137 | parser.add_argument("audio", nargs="+", type=str, help="audio file(s) to transcribe") |
| 136 | parser.add_argument("--model", default="small", choices=available_models(), help="name of the Whisper model to use") | 138 | parser.add_argument("--model", default="small", choices=available_models(), help="name of the Whisper model to use") |
| 137 | parser.add_argument("--model_dir", type=str, default=None, help="the path to save model files; uses ~/.cache/whisper by default") | 139 | parser.add_argument("--model_dir", type=str, default=None, help="the path to save model files; uses ~/.cache/whisper by default") |
| 138 | - parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu", help="device to use for PyTorch inference") | 140 | + parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "mps", help="device to use for PyTorch inference") |
| 139 | parser.add_argument("--output_dir", "-o", type=str, default=".", help="directory to save the outputs") | 141 | parser.add_argument("--output_dir", "-o", type=str, default=".", help="directory to save the outputs") |
| 140 | parser.add_argument("--verbose", type=str2bool, default=True, help="whether to print out the progress and debug messages") | 142 | parser.add_argument("--verbose", type=str2bool, default=True, help="whether to print out the progress and debug messages") |
| 141 | 143 |
| @@ -30,7 +30,7 @@ class NerfASR(BaseASR): | @@ -30,7 +30,7 @@ class NerfASR(BaseASR): | ||
| 30 | def __init__(self, opt, parent, audio_processor,audio_model): | 30 | def __init__(self, opt, parent, audio_processor,audio_model): |
| 31 | super().__init__(opt,parent) | 31 | super().__init__(opt,parent) |
| 32 | 32 | ||
| 33 | - self.device = 'cuda' if torch.cuda.is_available() else 'cpu' | 33 | + self.device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu") |
| 34 | if 'esperanto' in self.opt.asr_model: | 34 | if 'esperanto' in self.opt.asr_model: |
| 35 | self.audio_dim = 44 | 35 | self.audio_dim = 44 |
| 36 | elif 'deepspeech' in self.opt.asr_model: | 36 | elif 'deepspeech' in self.opt.asr_model: |
| @@ -77,7 +77,7 @@ def load_model(opt): | @@ -77,7 +77,7 @@ def load_model(opt): | ||
| 77 | seed_everything(opt.seed) | 77 | seed_everything(opt.seed) |
| 78 | logger.info(opt) | 78 | logger.info(opt) |
| 79 | 79 | ||
| 80 | - device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') | 80 | + device = torch.device('cuda' if torch.cuda.is_available() else ('mps' if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else 'cpu')) |
| 81 | model = NeRFNetwork(opt) | 81 | model = NeRFNetwork(opt) |
| 82 | 82 | ||
| 83 | criterion = torch.nn.MSELoss(reduction='none') | 83 | criterion = torch.nn.MSELoss(reduction='none') |
| @@ -90,7 +90,7 @@ class BaseTTS: | @@ -90,7 +90,7 @@ class BaseTTS: | ||
| 90 | ########################################################################################### | 90 | ########################################################################################### |
| 91 | class EdgeTTS(BaseTTS): | 91 | class EdgeTTS(BaseTTS): |
| 92 | def txt_to_audio(self,msg): | 92 | def txt_to_audio(self,msg): |
| 93 | - voicename = "zh-CN-XiaoxiaoNeural" | 93 | + voicename = "zh-CN-YunxiaNeural" |
| 94 | text,textevent = msg | 94 | text,textevent = msg |
| 95 | t = time.time() | 95 | t = time.time() |
| 96 | asyncio.new_event_loop().run_until_complete(self.__main(voicename,text)) | 96 | asyncio.new_event_loop().run_until_complete(self.__main(voicename,text)) |
| @@ -107,9 +107,9 @@ class EdgeTTS(BaseTTS): | @@ -107,9 +107,9 @@ class EdgeTTS(BaseTTS): | ||
| 107 | eventpoint=None | 107 | eventpoint=None |
| 108 | streamlen -= self.chunk | 108 | streamlen -= self.chunk |
| 109 | if idx==0: | 109 | if idx==0: |
| 110 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 110 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
| 111 | elif streamlen<self.chunk: | 111 | elif streamlen<self.chunk: |
| 112 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 112 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
| 113 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 113 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
| 114 | idx += self.chunk | 114 | idx += self.chunk |
| 115 | #if streamlen>0: #skip last frame(not 20ms) | 115 | #if streamlen>0: #skip last frame(not 20ms) |
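The chunked streaming loop touched here (and the msgenvent → msgevent key fix) repeats in each TTS class in this file. Stripped of class context it follows the pattern sketched below, with put_audio_frame standing in for the parent callback; this is illustrative, not code from the diff:

```python
# Illustrative per-chunk event tagging, mirroring the stream_tts loops in ttsreal.py.
import numpy as np

def stream_with_events(stream, chunk, text, textevent, put_audio_frame):
    """Emit fixed-size chunks; tag the first with a 'start' event and close with an 'end' event."""
    idx, first = 0, True
    while len(stream) - idx >= chunk:
        eventpoint = None
        if first:
            eventpoint = {'status': 'start', 'text': text, 'msgevent': textevent}
            first = False
        put_audio_frame(stream[idx:idx + chunk], eventpoint)
        idx += chunk
    # a final silent frame carries the 'end' event
    put_audio_frame(np.zeros(chunk, np.float32),
                    {'status': 'end', 'text': text, 'msgevent': textevent})
```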
| @@ -219,16 +219,16 @@ class FishTTS(BaseTTS): | @@ -219,16 +219,16 @@ class FishTTS(BaseTTS): | ||
| 219 | while streamlen >= self.chunk: | 219 | while streamlen >= self.chunk: |
| 220 | eventpoint=None | 220 | eventpoint=None |
| 221 | if first: | 221 | if first: |
| 222 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 222 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
| 223 | first = False | 223 | first = False |
| 224 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 224 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
| 225 | streamlen -= self.chunk | 225 | streamlen -= self.chunk |
| 226 | idx += self.chunk | 226 | idx += self.chunk |
| 227 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 227 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
| 228 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 228 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
| 229 | 229 | ||
| 230 | ########################################################################################### | 230 | ########################################################################################### |
| 231 | -class VoitsTTS(BaseTTS): | 231 | +class SovitsTTS(BaseTTS): |
| 232 | def txt_to_audio(self,msg): | 232 | def txt_to_audio(self,msg): |
| 233 | text,textevent = msg | 233 | text,textevent = msg |
| 234 | self.stream_tts( | 234 | self.stream_tts( |
| @@ -316,12 +316,12 @@ class VoitsTTS(BaseTTS): | @@ -316,12 +316,12 @@ class VoitsTTS(BaseTTS): | ||
| 316 | while streamlen >= self.chunk: | 316 | while streamlen >= self.chunk: |
| 317 | eventpoint=None | 317 | eventpoint=None |
| 318 | if first: | 318 | if first: |
| 319 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 319 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
| 320 | first = False | 320 | first = False |
| 321 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 321 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
| 322 | streamlen -= self.chunk | 322 | streamlen -= self.chunk |
| 323 | idx += self.chunk | 323 | idx += self.chunk |
| 324 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 324 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
| 325 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 325 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
| 326 | 326 | ||
| 327 | ########################################################################################### | 327 | ########################################################################################### |
| @@ -382,12 +382,12 @@ class CosyVoiceTTS(BaseTTS): | @@ -382,12 +382,12 @@ class CosyVoiceTTS(BaseTTS): | ||
| 382 | while streamlen >= self.chunk: | 382 | while streamlen >= self.chunk: |
| 383 | eventpoint=None | 383 | eventpoint=None |
| 384 | if first: | 384 | if first: |
| 385 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 385 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
| 386 | first = False | 386 | first = False |
| 387 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 387 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
| 388 | streamlen -= self.chunk | 388 | streamlen -= self.chunk |
| 389 | idx += self.chunk | 389 | idx += self.chunk |
| 390 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 390 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
| 391 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 391 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
| 392 | 392 | ||
| 393 | ########################################################################################### | 393 | ########################################################################################### |
| @@ -505,13 +505,13 @@ class TencentTTS(BaseTTS): | @@ -505,13 +505,13 @@ class TencentTTS(BaseTTS): | ||
| 505 | while streamlen >= self.chunk: | 505 | while streamlen >= self.chunk: |
| 506 | eventpoint=None | 506 | eventpoint=None |
| 507 | if first: | 507 | if first: |
| 508 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 508 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
| 509 | first = False | 509 | first = False |
| 510 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 510 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
| 511 | streamlen -= self.chunk | 511 | streamlen -= self.chunk |
| 512 | idx += self.chunk | 512 | idx += self.chunk |
| 513 | last_stream = stream[idx:] #get the remain stream | 513 | last_stream = stream[idx:] #get the remain stream |
| 514 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 514 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
| 515 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 515 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
| 516 | 516 | ||
| 517 | ########################################################################################### | 517 | ########################################################################################### |
| @@ -583,10 +583,10 @@ class XTTS(BaseTTS): | @@ -583,10 +583,10 @@ class XTTS(BaseTTS): | ||
| 583 | while streamlen >= self.chunk: | 583 | while streamlen >= self.chunk: |
| 584 | eventpoint=None | 584 | eventpoint=None |
| 585 | if first: | 585 | if first: |
| 586 | - eventpoint={'status':'start','text':text,'msgenvent':textevent} | 586 | + eventpoint={'status':'start','text':text,'msgevent':textevent} |
| 587 | first = False | 587 | first = False |
| 588 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) | 588 | self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint) |
| 589 | streamlen -= self.chunk | 589 | streamlen -= self.chunk |
| 590 | idx += self.chunk | 590 | idx += self.chunk |
| 591 | - eventpoint={'status':'end','text':text,'msgenvent':textevent} | 591 | + eventpoint={'status':'end','text':text,'msgevent':textevent} |
| 592 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) | 592 | self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint) |
| @@ -236,7 +236,7 @@ if __name__ == '__main__': | @@ -236,7 +236,7 @@ if __name__ == '__main__': | ||
| 236 | if hasattr(module, 'reparameterize'): | 236 | if hasattr(module, 'reparameterize'): |
| 237 | module.reparameterize() | 237 | module.reparameterize() |
| 238 | return model | 238 | return model |
| 239 | - device = torch.device("cuda") | 239 | + device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")) |
| 240 | def check_onnx(torch_out, torch_in, audio): | 240 | def check_onnx(torch_out, torch_in, audio): |
| 241 | onnx_model = onnx.load(onnx_path) | 241 | onnx_model = onnx.load(onnx_path) |
| 242 | onnx.checker.check_model(onnx_model) | 242 | onnx.checker.check_model(onnx_model) |
web/dashboard.html
0 → 100644
| 1 | +<!DOCTYPE html> | ||
| 2 | +<html lang="zh-CN"> | ||
| 3 | +<head> | ||
| 4 | + <meta charset="UTF-8"> | ||
| 5 | + <meta name="viewport" content="width=device-width, initial-scale=1.0"> | ||
| 6 | + <title>livetalking数字人交互平台</title> | ||
| 7 | + <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet"> | ||
| 8 | + <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.10.0/font/bootstrap-icons.css"> | ||
| 9 | + <style> | ||
| 10 | + :root { | ||
| 11 | + --primary-color: #4361ee; | ||
| 12 | + --secondary-color: #3f37c9; | ||
| 13 | + --accent-color: #4895ef; | ||
| 14 | + --background-color: #f8f9fa; | ||
| 15 | + --card-bg: #ffffff; | ||
| 16 | + --text-color: #212529; | ||
| 17 | + --border-radius: 10px; | ||
| 18 | + --box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1); | ||
| 19 | + } | ||
| 20 | + | ||
| 21 | + body { | ||
| 22 | + font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; | ||
| 23 | + background-color: var(--background-color); | ||
| 24 | + color: var(--text-color); | ||
| 25 | + min-height: 100vh; | ||
| 26 | + padding-top: 20px; | ||
| 27 | + } | ||
| 28 | + | ||
| 29 | + .dashboard-container { | ||
| 30 | + max-width: 1400px; | ||
| 31 | + margin: 0 auto; | ||
| 32 | + padding: 20px; | ||
| 33 | + } | ||
| 34 | + | ||
| 35 | + .card { | ||
| 36 | + background-color: var(--card-bg); | ||
| 37 | + border-radius: var(--border-radius); | ||
| 38 | + box-shadow: var(--box-shadow); | ||
| 39 | + border: none; | ||
| 40 | + margin-bottom: 20px; | ||
| 41 | + overflow: hidden; | ||
| 42 | + } | ||
| 43 | + | ||
| 44 | + .card-header { | ||
| 45 | + background-color: var(--primary-color); | ||
| 46 | + color: white; | ||
| 47 | + font-weight: 600; | ||
| 48 | + padding: 15px 20px; | ||
| 49 | + border-bottom: none; | ||
| 50 | + } | ||
| 51 | + | ||
| 52 | + .video-container { | ||
| 53 | + position: relative; | ||
| 54 | + width: 100%; | ||
| 55 | + background-color: #000; | ||
| 56 | + border-radius: var(--border-radius); | ||
| 57 | + overflow: hidden; | ||
| 58 | + display: flex; | ||
| 59 | + justify-content: center; | ||
| 60 | + align-items: center; | ||
| 61 | + } | ||
| 62 | + | ||
| 63 | + video { | ||
| 64 | + max-width: 100%; | ||
| 65 | + max-height: 100%; | ||
| 66 | + display: block; | ||
| 67 | + border-radius: var(--border-radius); | ||
| 68 | + } | ||
| 69 | + | ||
| 70 | + .controls-container { | ||
| 71 | + padding: 20px; | ||
| 72 | + } | ||
| 73 | + | ||
| 74 | + .btn-primary { | ||
| 75 | + background-color: var(--primary-color); | ||
| 76 | + border-color: var(--primary-color); | ||
| 77 | + } | ||
| 78 | + | ||
| 79 | + .btn-primary:hover { | ||
| 80 | + background-color: var(--secondary-color); | ||
| 81 | + border-color: var(--secondary-color); | ||
| 82 | + } | ||
| 83 | + | ||
| 84 | + .btn-outline-primary { | ||
| 85 | + color: var(--primary-color); | ||
| 86 | + border-color: var(--primary-color); | ||
| 87 | + } | ||
| 88 | + | ||
| 89 | + .btn-outline-primary:hover { | ||
| 90 | + background-color: var(--primary-color); | ||
| 91 | + color: white; | ||
| 92 | + } | ||
| 93 | + | ||
| 94 | + .form-control { | ||
| 95 | + border-radius: var(--border-radius); | ||
| 96 | + padding: 10px 15px; | ||
| 97 | + border: 1px solid #ced4da; | ||
| 98 | + } | ||
| 99 | + | ||
| 100 | + .form-control:focus { | ||
| 101 | + border-color: var(--accent-color); | ||
| 102 | + box-shadow: 0 0 0 0.25rem rgba(67, 97, 238, 0.25); | ||
| 103 | + } | ||
| 104 | + | ||
| 105 | + .status-indicator { | ||
| 106 | + width: 10px; | ||
| 107 | + height: 10px; | ||
| 108 | + border-radius: 50%; | ||
| 109 | + display: inline-block; | ||
| 110 | + margin-right: 5px; | ||
| 111 | + } | ||
| 112 | + | ||
| 113 | + .status-connected { | ||
| 114 | + background-color: #28a745; | ||
| 115 | + } | ||
| 116 | + | ||
| 117 | + .status-disconnected { | ||
| 118 | + background-color: #dc3545; | ||
| 119 | + } | ||
| 120 | + | ||
| 121 | + .status-connecting { | ||
| 122 | + background-color: #ffc107; | ||
| 123 | + } | ||
| 124 | + | ||
| 125 | + .asr-container { | ||
| 126 | + height: 300px; | ||
| 127 | + overflow-y: auto; | ||
| 128 | + padding: 15px; | ||
| 129 | + background-color: #f8f9fa; | ||
| 130 | + border-radius: var(--border-radius); | ||
| 131 | + border: 1px solid #ced4da; | ||
| 132 | + } | ||
| 133 | + | ||
| 134 | + .asr-text { | ||
| 135 | + margin-bottom: 10px; | ||
| 136 | + padding: 10px; | ||
| 137 | + background-color: white; | ||
| 138 | + border-radius: var(--border-radius); | ||
| 139 | + box-shadow: 0 1px 3px rgba(0, 0, 0, 0.1); | ||
| 140 | + } | ||
| 141 | + | ||
| 142 | + .user-message { | ||
| 143 | + background-color: #e3f2fd; | ||
| 144 | + border-left: 4px solid var(--primary-color); | ||
| 145 | + } | ||
| 146 | + | ||
| 147 | + .system-message { | ||
| 148 | + background-color: #f1f8e9; | ||
| 149 | + border-left: 4px solid #8bc34a; | ||
| 150 | + } | ||
| 151 | + | ||
| 152 | + .recording-indicator { | ||
| 153 | + position: absolute; | ||
| 154 | + top: 15px; | ||
| 155 | + right: 15px; | ||
| 156 | + background-color: rgba(220, 53, 69, 0.8); | ||
| 157 | + color: white; | ||
| 158 | + padding: 5px 10px; | ||
| 159 | + border-radius: 20px; | ||
| 160 | + font-size: 0.8rem; | ||
| 161 | + display: none; | ||
| 162 | + } | ||
| 163 | + | ||
| 164 | + .recording-indicator.active { | ||
| 165 | + display: flex; | ||
| 166 | + align-items: center; | ||
| 167 | + } | ||
| 168 | + | ||
| 169 | + .recording-indicator .blink { | ||
| 170 | + width: 10px; | ||
| 171 | + height: 10px; | ||
| 172 | + background-color: #fff; | ||
| 173 | + border-radius: 50%; | ||
| 174 | + margin-right: 5px; | ||
| 175 | + animation: blink 1s infinite; | ||
| 176 | + } | ||
| 177 | + | ||
| 178 | + @keyframes blink { | ||
| 179 | + 0% { opacity: 1; } | ||
| 180 | + 50% { opacity: 0.3; } | ||
| 181 | + 100% { opacity: 1; } | ||
| 182 | + } | ||
| 183 | + | ||
| 184 | + .mode-switch { | ||
| 185 | + margin-bottom: 20px; | ||
| 186 | + } | ||
| 187 | + | ||
| 188 | + .nav-tabs .nav-link { | ||
| 189 | + color: var(--text-color); | ||
| 190 | + border: none; | ||
| 191 | + padding: 10px 20px; | ||
| 192 | + border-radius: var(--border-radius) var(--border-radius) 0 0; | ||
| 193 | + } | ||
| 194 | + | ||
| 195 | + .nav-tabs .nav-link.active { | ||
| 196 | + color: var(--primary-color); | ||
| 197 | + background-color: var(--card-bg); | ||
| 198 | + border-bottom: 3px solid var(--primary-color); | ||
| 199 | + font-weight: 600; | ||
| 200 | + } | ||
| 201 | + | ||
| 202 | + .tab-content { | ||
| 203 | + padding: 20px; | ||
| 204 | + background-color: var(--card-bg); | ||
| 205 | + border-radius: 0 0 var(--border-radius) var(--border-radius); | ||
| 206 | + } | ||
| 207 | + | ||
| 208 | + .settings-panel { | ||
| 209 | + padding: 15px; | ||
| 210 | + background-color: #f8f9fa; | ||
| 211 | + border-radius: var(--border-radius); | ||
| 212 | + margin-top: 15px; | ||
| 213 | + } | ||
| 214 | + | ||
| 215 | + .footer { | ||
| 216 | + text-align: center; | ||
| 217 | + margin-top: 30px; | ||
| 218 | + padding: 20px 0; | ||
| 219 | + color: #6c757d; | ||
| 220 | + font-size: 0.9rem; | ||
| 221 | + } | ||
| 222 | + | ||
| 223 | + .voice-record-btn { | ||
| 224 | + width: 60px; | ||
| 225 | + height: 60px; | ||
| 226 | + border-radius: 50%; | ||
| 227 | + background-color: var(--primary-color); | ||
| 228 | + color: white; | ||
| 229 | + display: flex; | ||
| 230 | + justify-content: center; | ||
| 231 | + align-items: center; | ||
| 232 | + cursor: pointer; | ||
| 233 | + transition: all 0.2s ease; | ||
| 234 | + box-shadow: 0 2px 5px rgba(0,0,0,0.2); | ||
| 235 | + margin: 0 auto; | ||
| 236 | + } | ||
| 237 | + | ||
| 238 | + .voice-record-btn:hover { | ||
| 239 | + background-color: var(--secondary-color); | ||
| 240 | + transform: scale(1.05); | ||
| 241 | + } | ||
| 242 | + | ||
| 243 | + .voice-record-btn:active { | ||
| 244 | + background-color: #dc3545; | ||
| 245 | + transform: scale(0.95); | ||
| 246 | + } | ||
| 247 | + | ||
| 248 | + .voice-record-btn i { | ||
| 249 | + font-size: 24px; | ||
| 250 | + } | ||
| 251 | + | ||
| 252 | + .voice-record-label { | ||
| 253 | + text-align: center; | ||
| 254 | + margin-top: 10px; | ||
| 255 | + font-size: 14px; | ||
| 256 | + color: #6c757d; | ||
| 257 | + } | ||
| 258 | + | ||
| 259 | + .video-size-control { | ||
| 260 | + margin-top: 15px; | ||
| 261 | + } | ||
| 262 | + | ||
| 263 | + .recording-pulse { | ||
| 264 | + animation: pulse 1.5s infinite; | ||
| 265 | + } | ||
| 266 | + | ||
| 267 | + @keyframes pulse { | ||
| 268 | + 0% { | ||
| 269 | + box-shadow: 0 0 0 0 rgba(220, 53, 69, 0.7); | ||
| 270 | + } | ||
| 271 | + 70% { | ||
| 272 | + box-shadow: 0 0 0 15px rgba(220, 53, 69, 0); | ||
| 273 | + } | ||
| 274 | + 100% { | ||
| 275 | + box-shadow: 0 0 0 0 rgba(220, 53, 69, 0); | ||
| 276 | + } | ||
| 277 | + } | ||
| 278 | + </style> | ||
| 279 | +</head> | ||
| 280 | +<body> | ||
| 281 | + <div class="dashboard-container"> | ||
| 282 | + <div class="row"> | ||
| 283 | + <div class="col-12"> | ||
| 284 | + <h1 class="text-center mb-4">livetalking数字人交互平台</h1> | ||
| 285 | + </div> | ||
| 286 | + </div> | ||
| 287 | + | ||
| 288 | + <div class="row"> | ||
| 289 | + <!-- 视频区域 --> | ||
| 290 | + <div class="col-lg-8"> | ||
| 291 | + <div class="card"> | ||
| 292 | + <div class="card-header d-flex justify-content-between align-items-center"> | ||
| 293 | + <div> | ||
| 294 | + <span class="status-indicator status-disconnected" id="connection-status"></span> | ||
| 295 | + <span id="status-text">未连接</span> | ||
| 296 | + </div> | ||
| 297 | + </div> | ||
| 298 | + <div class="card-body p-0"> | ||
| 299 | + <div class="video-container"> | ||
| 300 | + <video id="video" autoplay playsinline></video> | ||
| 301 | + <div class="recording-indicator" id="recording-indicator"> | ||
| 302 | + <div class="blink"></div> | ||
| 303 | + <span>录制中</span> | ||
| 304 | + </div> | ||
| 305 | + </div> | ||
| 306 | + | ||
| 307 | + <div class="controls-container"> | ||
| 308 | + <div class="row"> | ||
| 309 | + <div class="col-md-6 mb-3"> | ||
| 310 | + <button class="btn btn-primary w-100" id="start"> | ||
| 311 | + <i class="bi bi-play-fill"></i> 开始连接 | ||
| 312 | + </button> | ||
| 313 | + <button class="btn btn-danger w-100" id="stop" style="display: none;"> | ||
| 314 | + <i class="bi bi-stop-fill"></i> 停止连接 | ||
| 315 | + </button> | ||
| 316 | + </div> | ||
| 317 | + <div class="col-md-6 mb-3"> | ||
| 318 | + <div class="d-flex"> | ||
| 319 | + <button class="btn btn-outline-primary flex-grow-1 me-2" id="btn_start_record"> | ||
| 320 | + <i class="bi bi-record-fill"></i> 开始录制 | ||
| 321 | + </button> | ||
| 322 | + <button class="btn btn-outline-danger flex-grow-1" id="btn_stop_record" disabled> | ||
| 323 | + <i class="bi bi-stop-fill"></i> 停止录制 | ||
| 324 | + </button> | ||
| 325 | + </div> | ||
| 326 | + </div> | ||
| 327 | + </div> | ||
| 328 | + | ||
| 329 | + <div class="row"> | ||
| 330 | + <div class="col-12"> | ||
| 331 | + <div class="video-size-control"> | ||
| 332 | + <label for="video-size-slider" class="form-label">视频大小调节: <span id="video-size-value">100%</span></label> | ||
| 333 | + <input type="range" class="form-range" id="video-size-slider" min="50" max="150" value="100"> | ||
| 334 | + </div> | ||
| 335 | + </div> | ||
| 336 | + </div> | ||
| 337 | + | ||
| 338 | + <div class="settings-panel mt-3"> | ||
| 339 | + <div class="row"> | ||
| 340 | + <div class="col-md-12"> | ||
| 341 | + <div class="form-check form-switch mb-3"> | ||
| 342 | + <input class="form-check-input" type="checkbox" id="use-stun"> | ||
| 343 | + <label class="form-check-label" for="use-stun">使用STUN服务器</label> | ||
| 344 | + </div> | ||
| 345 | + </div> | ||
| 346 | + </div> | ||
| 347 | + </div> | ||
| 348 | + </div> | ||
| 349 | + </div> | ||
| 350 | + </div> | ||
| 351 | + </div> | ||
| 352 | + | ||
| 353 | + <!-- 右侧交互 --> | ||
| 354 | + <div class="col-lg-4"> | ||
| 355 | + <div class="card"> | ||
| 356 | + <div class="card-header"> | ||
| 357 | + <ul class="nav nav-tabs card-header-tabs" id="interaction-tabs" role="tablist"> | ||
| 358 | + <li class="nav-item" role="presentation"> | ||
| 359 | + <button class="nav-link active" id="chat-tab" data-bs-toggle="tab" data-bs-target="#chat" type="button" role="tab" aria-controls="chat" aria-selected="true">对话模式</button> | ||
| 360 | + </li> | ||
| 361 | + <li class="nav-item" role="presentation"> | ||
| 362 | + <button class="nav-link" id="tts-tab" data-bs-toggle="tab" data-bs-target="#tts" type="button" role="tab" aria-controls="tts" aria-selected="false">朗读模式</button> | ||
| 363 | + </li> | ||
| 364 | + </ul> | ||
| 365 | + </div> | ||
| 366 | + <div class="card-body"> | ||
| 367 | + <div class="tab-content" id="interaction-tabs-content"> | ||
| 368 | + <!-- 对话模式 --> | ||
| 369 | + <div class="tab-pane fade show active" id="chat" role="tabpanel" aria-labelledby="chat-tab"> | ||
| 370 | + <div class="asr-container mb-3" id="chat-messages"> | ||
| 371 | + <div class="asr-text system-message"> | ||
| 372 | + 系统: 欢迎使用livetalking,请点击"开始连接"按钮开始对话。 | ||
| 373 | + </div> | ||
| 374 | + </div> | ||
| 375 | + | ||
| 376 | + <form id="chat-form"> | ||
| 377 | + <div class="input-group mb-3"> | ||
| 378 | + <textarea class="form-control" id="chat-message" rows="3" placeholder="输入您想对数字人说的话..."></textarea> | ||
| 379 | + <button class="btn btn-primary" type="submit"> | ||
| 380 | + <i class="bi bi-send"></i> 发送 | ||
| 381 | + </button> | ||
| 382 | + </div> | ||
| 383 | + </form> | ||
| 384 | + | ||
| 385 | + <!-- 按住说话按钮 --> | ||
| 386 | + <div class="voice-record-btn" id="voice-record-btn"> | ||
| 387 | + <i class="bi bi-mic-fill"></i> | ||
| 388 | + </div> | ||
| 389 | + <div class="voice-record-label">按住说话,松开发送</div> | ||
| 390 | + </div> | ||
| 391 | + | ||
| 392 | + <!-- 朗读模式 --> | ||
| 393 | + <div class="tab-pane fade" id="tts" role="tabpanel" aria-labelledby="tts-tab"> | ||
| 394 | + <form id="echo-form"> | ||
| 395 | + <div class="mb-3"> | ||
| 396 | + <label for="message" class="form-label">输入要朗读的文本</label> | ||
| 397 | + <textarea class="form-control" id="message" rows="6" placeholder="输入您想让数字人朗读的文字..."></textarea> | ||
| 398 | + </div> | ||
| 399 | + <button type="submit" class="btn btn-primary w-100"> | ||
| 400 | + <i class="bi bi-volume-up"></i> 朗读文本 | ||
| 401 | + </button> | ||
| 402 | + </form> | ||
| 403 | + </div> | ||
| 404 | + </div> | ||
| 405 | + </div> | ||
| 406 | + </div> | ||
| 407 | + </div> | ||
| 408 | + </div> | ||
| 409 | + | ||
| 410 | + <div class="footer"> | ||
| 411 | + <p>Made with ❤️ by Marstaos | Frontend & Performance Optimization</p> | ||
| 412 | + </div> | ||
| 413 | + </div> | ||
| 414 | + | ||
| 415 | + <!-- 隐藏的会话ID --> | ||
| 416 | + <input type="hidden" id="sessionid" value="0"> | ||
| 417 | + | ||
| 418 | + | ||
| 419 | + <script src="client.js"></script> | ||
| 420 | + <script src="srs.sdk.js"></script> | ||
| 421 | + <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> | ||
| 422 | + <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script> | ||
| 423 | + <script> | ||
| 424 | + $(document).ready(function() { | ||
| 425 | + $('#video-size-slider').on('input', function() { | ||
| 426 | + const value = $(this).val(); | ||
| 427 | + $('#video-size-value').text(value + '%'); | ||
| 428 | + $('#video').css('width', value + '%'); | ||
| 429 | + }); | ||
| 430 | + function updateConnectionStatus(status) { | ||
| 431 | + const statusIndicator = $('#connection-status'); | ||
| 432 | + const statusText = $('#status-text'); | ||
| 433 | + | ||
| 434 | + statusIndicator.removeClass('status-connected status-disconnected status-connecting'); | ||
| 435 | + | ||
| 436 | + switch(status) { | ||
| 437 | + case 'connected': | ||
| 438 | + statusIndicator.addClass('status-connected'); | ||
| 439 | + statusText.text('已连接'); | ||
| 440 | + break; | ||
| 441 | + case 'connecting': | ||
| 442 | + statusIndicator.addClass('status-connecting'); | ||
| 443 | + statusText.text('连接中...'); | ||
| 444 | + break; | ||
| 445 | + case 'disconnected': | ||
| 446 | + default: | ||
| 447 | + statusIndicator.addClass('status-disconnected'); | ||
| 448 | + statusText.text('未连接'); | ||
| 449 | + break; | ||
| 450 | + } | ||
| 451 | + } | ||
| 452 | + | ||
| 453 | + // 添加聊天消息 | ||
| 454 | + function addChatMessage(message, type = 'user') { | ||
| 455 | + const messagesContainer = $('#chat-messages'); | ||
| 456 | + const messageClass = type === 'user' ? 'user-message' : 'system-message'; | ||
| 457 | + const sender = type === 'user' ? '您' : '数字人'; | ||
| 458 | + | ||
| 459 | + const messageElement = $(` | ||
| 460 | + <div class="asr-text ${messageClass}"> | ||
| 461 | + ${sender}: ${message} | ||
| 462 | + </div> | ||
| 463 | + `); | ||
| 464 | + | ||
| 465 | + messagesContainer.append(messageElement); | ||
| 466 | + messagesContainer.scrollTop(messagesContainer[0].scrollHeight); | ||
| 467 | + } | ||
| 468 | + | ||
| 469 | + // 开始/停止按钮 | ||
| 470 | + $('#start').click(function() { | ||
| 471 | + updateConnectionStatus('connecting'); | ||
| 472 | + start(); | ||
| 473 | + $(this).hide(); | ||
| 474 | + $('#stop').show(); | ||
| 475 | + | ||
| 476 | + // 添加定时器检查视频流是否已加载 | ||
| 477 | + let connectionCheckTimer = setInterval(function() { | ||
| 478 | + const video = document.getElementById('video'); | ||
| 479 | + // 检查视频是否有数据 | ||
| 480 | + if (video.readyState >= 3 && video.videoWidth > 0) { | ||
| 481 | + updateConnectionStatus('connected'); | ||
| 482 | + clearInterval(connectionCheckTimer); | ||
| 483 | + } | ||
| 484 | + }, 2000); // 每2秒检查一次 | ||
| 485 | + | ||
| 486 | + // 60秒后如果还是连接中状态,就停止检查 | ||
| 487 | + setTimeout(function() { | ||
| 488 | + if (connectionCheckTimer) { | ||
| 489 | + clearInterval(connectionCheckTimer); | ||
| 490 | + } | ||
| 491 | + }, 60000); | ||
| 492 | + }); | ||
| 493 | + | ||
| 494 | + $('#stop').click(function() { | ||
| 495 | + stop(); | ||
| 496 | + $(this).hide(); | ||
| 497 | + $('#start').show(); | ||
| 498 | + updateConnectionStatus('disconnected'); | ||
| 499 | + }); | ||
| 500 | + | ||
| 501 | + // 录制功能 | ||
| 502 | + $('#btn_start_record').click(function() { | ||
| 503 | + console.log('Starting recording...'); | ||
| 504 | + fetch('/record', { | ||
| 505 | + body: JSON.stringify({ | ||
| 506 | + type: 'start_record', | ||
| 507 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
| 508 | + }), | ||
| 509 | + headers: { | ||
| 510 | + 'Content-Type': 'application/json' | ||
| 511 | + }, | ||
| 512 | + method: 'POST' | ||
| 513 | + }).then(function(response) { | ||
| 514 | + if (response.ok) { | ||
| 515 | + console.log('Recording started.'); | ||
| 516 | + $('#btn_start_record').prop('disabled', true); | ||
| 517 | + $('#btn_stop_record').prop('disabled', false); | ||
| 518 | + $('#recording-indicator').addClass('active'); | ||
| 519 | + } else { | ||
| 520 | + console.error('Failed to start recording.'); | ||
| 521 | + } | ||
| 522 | + }).catch(function(error) { | ||
| 523 | + console.error('Error:', error); | ||
| 524 | + }); | ||
| 525 | + }); | ||
| 526 | + | ||
| 527 | + $('#btn_stop_record').click(function() { | ||
| 528 | + console.log('Stopping recording...'); | ||
| 529 | + fetch('/record', { | ||
| 530 | + body: JSON.stringify({ | ||
| 531 | + type: 'end_record', | ||
| 532 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
| 533 | + }), | ||
| 534 | + headers: { | ||
| 535 | + 'Content-Type': 'application/json' | ||
| 536 | + }, | ||
| 537 | + method: 'POST' | ||
| 538 | + }).then(function(response) { | ||
| 539 | + if (response.ok) { | ||
| 540 | + console.log('Recording stopped.'); | ||
| 541 | + $('#btn_start_record').prop('disabled', false); | ||
| 542 | + $('#btn_stop_record').prop('disabled', true); | ||
| 543 | + $('#recording-indicator').removeClass('active'); | ||
| 544 | + } else { | ||
| 545 | + console.error('Failed to stop recording.'); | ||
| 546 | + } | ||
| 547 | + }).catch(function(error) { | ||
| 548 | + console.error('Error:', error); | ||
| 549 | + }); | ||
| 550 | + }); | ||
| 551 | + | ||
| 552 | + $('#echo-form').on('submit', function(e) { | ||
| 553 | + e.preventDefault(); | ||
| 554 | + var message = $('#message').val(); | ||
| 555 | + if (!message.trim()) return; | ||
| 556 | + | ||
| 557 | + console.log('Sending echo message:', message); | ||
| 558 | + | ||
| 559 | + fetch('/human', { | ||
| 560 | + body: JSON.stringify({ | ||
| 561 | + text: message, | ||
| 562 | + type: 'echo', | ||
| 563 | + interrupt: true, | ||
| 564 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
| 565 | + }), | ||
| 566 | + headers: { | ||
| 567 | + 'Content-Type': 'application/json' | ||
| 568 | + }, | ||
| 569 | + method: 'POST' | ||
| 570 | + }); | ||
| 571 | + | ||
| 572 | + $('#message').val(''); | ||
| 573 | + addChatMessage(`已发送朗读请求: "${message}"`, 'system'); | ||
| 574 | + }); | ||
| 575 | + | ||
| 576 | + // 聊天模式表单提交 | ||
| 577 | + $('#chat-form').on('submit', function(e) { | ||
| 578 | + e.preventDefault(); | ||
| 579 | + var message = $('#chat-message').val(); | ||
| 580 | + if (!message.trim()) return; | ||
| 581 | + | ||
| 582 | + console.log('Sending chat message:', message); | ||
| 583 | + | ||
| 584 | + fetch('/human', { | ||
| 585 | + body: JSON.stringify({ | ||
| 586 | + text: message, | ||
| 587 | + type: 'chat', | ||
| 588 | + interrupt: true, | ||
| 589 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
| 590 | + }), | ||
| 591 | + headers: { | ||
| 592 | + 'Content-Type': 'application/json' | ||
| 593 | + }, | ||
| 594 | + method: 'POST' | ||
| 595 | + }); | ||
| 596 | + | ||
| 597 | + addChatMessage(message, 'user'); | ||
| 598 | + $('#chat-message').val(''); | ||
| 599 | + }); | ||
| 600 | + | ||
| 601 | + // 按住说话功能 | ||
| 602 | + let mediaRecorder; | ||
| 603 | + let audioChunks = []; | ||
| 604 | + let isRecording = false; | ||
| 605 | + let recognition; | ||
| 606 | + | ||
| 607 | + // 检查浏览器是否支持语音识别 | ||
| 608 | + const isSpeechRecognitionSupported = 'webkitSpeechRecognition' in window || 'SpeechRecognition' in window; | ||
| 609 | + | ||
| 610 | + if (isSpeechRecognitionSupported) { | ||
| 611 | + recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)(); | ||
| 612 | + recognition.continuous = true; | ||
| 613 | + recognition.interimResults = true; | ||
| 614 | + recognition.lang = 'zh-CN'; | ||
| 615 | + | ||
| 616 | + recognition.onresult = function(event) { | ||
| 617 | + let interimTranscript = ''; | ||
| 618 | + let finalTranscript = ''; | ||
| 619 | + | ||
| 620 | + for (let i = event.resultIndex; i < event.results.length; ++i) { | ||
| 621 | + if (event.results[i].isFinal) { | ||
| 622 | + finalTranscript += event.results[i][0].transcript; | ||
| 623 | + } else { | ||
| 624 | + interimTranscript += event.results[i][0].transcript; | ||
| 625 | + $('#chat-message').val(interimTranscript); | ||
| 626 | + } | ||
| 627 | + } | ||
| 628 | + | ||
| 629 | + if (finalTranscript) { | ||
| 630 | + $('#chat-message').val(finalTranscript); | ||
| 631 | + } | ||
| 632 | + }; | ||
| 633 | + | ||
| 634 | + recognition.onerror = function(event) { | ||
| 635 | + console.error('语音识别错误:', event.error); | ||
| 636 | + }; | ||
| 637 | + } | ||
| 638 | + | ||
| 639 | + // 按住说话按钮事件 | ||
| 640 | + $('#voice-record-btn').on('mousedown touchstart', function(e) { | ||
| 641 | + e.preventDefault(); | ||
| 642 | + startRecording(); | ||
| 643 | + }).on('mouseup mouseleave touchend', function() { | ||
| 644 | + if (isRecording) { | ||
| 645 | + stopRecording(); | ||
| 646 | + } | ||
| 647 | + }); | ||
| 648 | + | ||
| 649 | + // 开始录音 | ||
| 650 | + function startRecording() { | ||
| 651 | + if (isRecording) return; | ||
| 652 | + | ||
| 653 | + navigator.mediaDevices.getUserMedia({ audio: true }) | ||
| 654 | + .then(function(stream) { | ||
| 655 | + audioChunks = []; | ||
| 656 | + mediaRecorder = new MediaRecorder(stream); | ||
| 657 | + | ||
| 658 | + mediaRecorder.ondataavailable = function(e) { | ||
| 659 | + if (e.data.size > 0) { | ||
| 660 | + audioChunks.push(e.data); | ||
| 661 | + } | ||
| 662 | + }; | ||
| 663 | + | ||
| 664 | + mediaRecorder.start(); | ||
| 665 | + isRecording = true; | ||
| 666 | + | ||
| 667 | + $('#voice-record-btn').addClass('recording-pulse'); | ||
| 668 | + $('#voice-record-btn').css('background-color', '#dc3545'); | ||
| 669 | + | ||
| 670 | + if (recognition) { | ||
| 671 | + recognition.start(); | ||
| 672 | + } | ||
| 673 | + }) | ||
| 674 | + .catch(function(error) { | ||
| 675 | + console.error('无法访问麦克风:', error); | ||
| 676 | + alert('无法访问麦克风,请检查浏览器权限设置。'); | ||
| 677 | + }); | ||
| 678 | + } | ||
| 679 | + | ||
| 680 | + function stopRecording() { | ||
| 681 | + if (!isRecording) return; | ||
| 682 | + | ||
| 683 | + mediaRecorder.stop(); | ||
| 684 | + isRecording = false; | ||
| 685 | + | ||
| 686 | + // 停止所有音轨 | ||
| 687 | + mediaRecorder.stream.getTracks().forEach(track => track.stop()); | ||
| 688 | + | ||
| 689 | + // 视觉反馈恢复 | ||
| 690 | + $('#voice-record-btn').removeClass('recording-pulse'); | ||
| 691 | + $('#voice-record-btn').css('background-color', ''); | ||
| 692 | + | ||
| 693 | + // 停止语音识别 | ||
| 694 | + if (recognition) { | ||
| 695 | + recognition.stop(); | ||
| 696 | + } | ||
| 697 | + | ||
| 698 | + // 获取识别的文本并发送 | ||
| 699 | + setTimeout(function() { | ||
| 700 | + const recognizedText = $('#chat-message').val().trim(); | ||
| 701 | + if (recognizedText) { | ||
| 702 | + // 发送识别的文本 | ||
| 703 | + fetch('/human', { | ||
| 704 | + body: JSON.stringify({ | ||
| 705 | + text: recognizedText, | ||
| 706 | + type: 'chat', | ||
| 707 | + interrupt: true, | ||
| 708 | + sessionid: parseInt(document.getElementById('sessionid').value), | ||
| 709 | + }), | ||
| 710 | + headers: { | ||
| 711 | + 'Content-Type': 'application/json' | ||
| 712 | + }, | ||
| 713 | + method: 'POST' | ||
| 714 | + }); | ||
| 715 | + | ||
| 716 | + addChatMessage(recognizedText, 'user'); | ||
| 717 | + $('#chat-message').val(''); | ||
| 718 | + } | ||
| 719 | + }, 500); | ||
| 720 | + } | ||
| 721 | + | ||
| 722 | + // WebRTC 相关功能 | ||
| 723 | + if (typeof window.onWebRTCConnected === 'function') { | ||
| 724 | + const originalOnConnected = window.onWebRTCConnected; | ||
| 725 | + window.onWebRTCConnected = function() { | ||
| 726 | + updateConnectionStatus('connected'); | ||
| 727 | + if (originalOnConnected) originalOnConnected(); | ||
| 728 | + }; | ||
| 729 | + } else { | ||
| 730 | + window.onWebRTCConnected = function() { | ||
| 731 | + updateConnectionStatus('connected'); | ||
| 732 | + }; | ||
| 733 | + } | ||
| 734 | + | ||
| 735 | + // 当连接断开时更新状态 | ||
| 736 | + if (typeof window.onWebRTCDisconnected === 'function') { | ||
| 737 | + const originalOnDisconnected = window.onWebRTCDisconnected; | ||
| 738 | + window.onWebRTCDisconnected = function() { | ||
| 739 | + updateConnectionStatus('disconnected'); | ||
| 740 | + if (originalOnDisconnected) originalOnDisconnected(); | ||
| 741 | + }; | ||
| 742 | + } else { | ||
| 743 | + window.onWebRTCDisconnected = function() { | ||
| 744 | + updateConnectionStatus('disconnected'); | ||
| 745 | + }; | ||
| 746 | + } | ||
| 747 | + | ||
| 748 | + // SRS WebRTC播放功能 | ||
| 749 | + var sdk = null; // 全局处理器,用于在重新发布时进行清理 | ||
| 750 | + | ||
| 751 | + function startPlay() { | ||
| 752 | + // 关闭之前的连接 | ||
| 753 | + if (sdk) { | ||
| 754 | + sdk.close(); | ||
| 755 | + } | ||
| 756 | + | ||
| 757 | + sdk = new SrsRtcWhipWhepAsync(); | ||
| 758 | + $('#video').prop('srcObject', sdk.stream); | ||
| 759 | + | ||
| 760 | + var host = window.location.hostname; | ||
| 761 | + var url = "http://" + host + ":1985/rtc/v1/whep/?app=live&stream=livestream"; | ||
| 762 | + | ||
| 763 | + sdk.play(url).then(function(session) { | ||
| 764 | + console.log('WebRTC播放已启动,会话ID:', session.sessionid); | ||
| 765 | + }).catch(function(reason) { | ||
| 766 | + sdk.close(); | ||
| 767 | + console.error('WebRTC播放失败:', reason); | ||
| 768 | + }); | ||
| 769 | + } | ||
| 770 | + }); | ||
| 771 | + </script> | ||
| 772 | +</body> | ||
| 773 | +</html> |
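
Note on the endpoints used in the page above: its inline script drives the digital human through two JSON endpoints, POST `/human` (type `echo` for read-aloud, type `chat` for dialogue) and POST `/record` (type `start_record` / `end_record`), and pulls video over WHEP on port 1985 via srs.sdk.js. The sketch below is a minimal standalone client issuing the same requests; the server address and the default `sessionid` of 0 are assumptions taken from the page, not an authoritative API reference.

```javascript
// Minimal sketch of the HTTP calls made by the dashboard page above.
// Assumes the livetalking web server is reachable at SERVER (adjust as needed)
// and that sessionid 0 is valid, mirroring the hidden #sessionid field.
const SERVER = 'http://127.0.0.1:8010'; // assumed host/port

// Ask the avatar to read a text verbatim (the page's "朗读模式" / echo form).
async function echoText(text, sessionid = 0) {
  const res = await fetch(`${SERVER}/human`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, type: 'echo', interrupt: true, sessionid }),
  });
  return res.ok;
}

// Send a chat message for a dialogue reply (the page's "对话模式" form).
async function chatText(text, sessionid = 0) {
  const res = await fetch(`${SERVER}/human`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, type: 'chat', interrupt: true, sessionid }),
  });
  return res.ok;
}

// Start or stop server-side recording, mirroring the record buttons.
async function setRecording(start, sessionid = 0) {
  const res = await fetch(`${SERVER}/record`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: start ? 'start_record' : 'end_record', sessionid }),
  });
  return res.ok;
}
```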