冯杨

Synced with the official GitHub repository, up to commits on Apr 18, 2025

a9c36c76e569107b5a39b3de8afd6e016b24d662
... ... @@ -16,3 +16,7 @@ pretrained
.DS_Store
workspace/log_ngp.txt
.idea
models/
*.log
dist
\ No newline at end of file
... ...
A real-time interactive streaming digital human that enables synchronized audio and video dialogue. It can essentially achieve commercial-grade results.
[Effect of wav2lip](https://www.bilibili.com/video/BV1scwBeyELA/) | [Effect of ernerf](https://www.bilibili.com/video/BV1G1421z73r/) | [Effect of musetalk](https://www.bilibili.com/video/BV1gm421N7vQ/)
## News
- December 8, 2024: Improved multi-concurrency support; GPU memory no longer grows with the number of concurrent connections.
- December 21, 2024: Added model warm-up for wav2lip and musetalk to solve the problem of stuttering during the first inference. Thanks to [@heimaojinzhangyz](https://github.com/heimaojinzhangyz)
- December 28, 2024: Added the digital human model Ultralight-Digital-Human. Thanks to [@lijihua2017](https://github.com/lijihua2017)
- February 7, 2025: Added fish-speech tts
- February 21, 2025: Added the open-source model wav2lip256. Thanks to @不蠢不蠢
- March 2, 2025: Added Tencent's speech synthesis service
- March 16, 2025: Supports mac gpu inference. Thanks to [@GcsSloop](https://github.com/GcsSloop)
## Features
1. Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human
2. Supports voice cloning
3. Supports interrupting the digital human while it is speaking
4. Supports full-body video stitching
5. Supports rtmp and webrtc
6. Supports video orchestration: plays custom videos while the avatar is not speaking
7. Supports multi-concurrency
## 1. Installation
Tested on Ubuntu 20.04 with Python 3.10, PyTorch 1.12, and CUDA 11.3
### 1.1 Install dependency
```bash
conda create -n nerfstream python=3.10
conda activate nerfstream
# If the cuda version is not 11.3 (confirm the version by running nvidia-smi), install the corresponding version of pytorch according to <https://pytorch.org/get-started/previous-versions/>
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
# If you need to train the ernerf model, install the following libraries
# pip install "git+https://github.com/facebookresearch/pytorch3d.git"
# pip install tensorflow-gpu==2.8.0
# pip install --upgrade "protobuf<=3.20.1"
```
For common installation issues, see the [FAQ](https://livetalking-doc.readthedocs.io/en/latest/faq.html).
For setting up the Linux CUDA environment, refer to this article: https://zhuanlan.zhihu.com/p/674972886
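A quick sanity check for the environment (a minimal sketch; the expected version reflects the install command above):
```python
import torch

print(torch.__version__)                  # expect 1.12.1 for the install above
print(torch.cuda.is_available())          # True if the CUDA build matches the driver
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the GPU PyTorch will use
```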
## 2. Quick Start
- Download the models
Quark Cloud Disk <https://pan.quark.cn/s/83a750323ef0>
Google Drive <https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing>
Copy wav2lip256.pth to the models folder of this project and rename it to wav2lip.pth;
Extract wav2lip256_avatar1.tar.gz and copy the entire folder to the data/avatars folder of this project.
- Run
python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1
Open http://serverip:8010/webrtcapi.html in a browser. First click 'start' to play the digital human video; then enter any text in the text box and submit it. The digital human will read the text aloud. You can also drive it programmatically; see the example request at the end of this section.
<font color=red>The server side needs to open ports tcp:8010; udp:1-65536</font>
A high-definition wav2lip model is available for purchase for commercial use; see [this link](https://livetalking-doc.readthedocs.io/zh-cn/latest/service.html#wav2lip).
- Quick experience
<https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3> Create an instance from this image to run it immediately.
If you cannot access huggingface, set the mirror endpoint before running:
```
export HF_ENDPOINT=https://hf-mirror.com
```
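Beyond the web page, the avatar can be driven programmatically. The sketch below is a hypothetical client call: it assumes the server above is reachable on port 8010 and mirrors the payload that the bundled web pages send to the /human endpoint ('echo' reads the text aloud, 'chat' routes it through the dialogue model):
```python
import requests

resp = requests.post(
    "http://serverip:8010/human",
    json={
        "text": "Hello, this is a test broadcast.",
        "type": "echo",      # or "chat" to answer the text instead of reading it
        "interrupt": True,   # interrupt any speech currently in progress
        "sessionid": 0,
    },
)
print(resp.status_code)
```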
## 3. More Usage
Usage instructions: <https://livetalking-doc.readthedocs.io/en/latest>
## 4. Docker Run
The previous installation steps are not needed; just run directly:
```
docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v
```
The code is located in /root/metahuman-stream. First run git pull to fetch the latest code, then execute the commands from steps 2 and 3.
The following images are provided:
- autodl image: <https://www.codewithgpu.com/i/lipku/metahuman-stream/base>
[autodl Tutorial](https://livetalking-doc.readthedocs.io/en/latest/autodl/README.html)
- ucloud image: <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3>
Any port can be opened, and there is no need to deploy a separate srs service.
[ucloud Tutorial](https://livetalking-doc.readthedocs.io/en/latest/ucloud/ucloud.html)
## 5. TODO
- [x] Added chatgpt to enable digital human dialogue
- [x] Voice cloning
- [x] Replace the digital human with a video when it is silent
- [x] MuseTalk
- [x] Wav2Lip
- [x] Ultralight-Digital-Human
---
If this project is helpful to you, please give it a star. Friends who are interested are also welcome to join in and improve this project together.
* Knowledge Planet (知识星球): https://t.zsxq.com/7NMyO, a collection of high-quality FAQs, best practices, and problem solutions.
* WeChat Official Account: Digital Human Technology
![](https://mmbiz.qpic.cn/sz_mmbiz_jpg/l3ZibgueFiaeyfaiaLZGuMGQXnhLWxibpJUS2gfs8Dje6JuMY8zu2tVyU9n8Zx1yaNncvKHBMibX0ocehoITy5qQEZg/640?wxfrom=12&tp=wxpic&usePicPrefetch=1&wx_fmt=jpeg&amp;from=appmsg)
\ No newline at end of file
... ...
Real-time interactive streaming digital human with synchronized audio and video dialogue. It can essentially achieve commercial-grade results.
[English](./README-EN.md) | Chinese version
[wav2lip demo](https://www.bilibili.com/video/BV1scwBeyELA/) | [ernerf demo](https://www.bilibili.com/video/BV1G1421z73r/) | [musetalk demo](https://www.bilibili.com/video/BV1gm421N7vQ/)
## To avoid confusion with 3D digital humans, the original project metahuman-stream has been renamed to livetalking; the original links remain valid
## News
- 2024.12.8 Improved multi-concurrency; GPU memory no longer grows with the number of concurrent connections
- 2024.12.21 Added model warm-up for wav2lip and musetalk to fix stuttering on the first inference. Thanks to [@heimaojinzhangyz](https://github.com/heimaojinzhangyz)
- 2024.12.28 Added the digital human model Ultralight-Digital-Human. Thanks to [@lijihua2017](https://github.com/lijihua2017)
- 2025.2.7 Added fish-speech TTS
- 2025.2.21 Added the open-source wav2lip256 model. Thanks to @不蠢不蠢
- 2025.3.2 Added Tencent's speech synthesis service
- 2025.3.16 Added support for Mac GPU inference. Thanks to [@GcsSloop](https://github.com/GcsSloop)
## Features
1. Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human
2. Supports voice cloning
3. Supports interrupting the digital human while it is speaking
4. Supports full-body video stitching
5. Supports rtmp and webrtc
6. Supports video orchestration: plays custom videos while the avatar is not speaking
7. Supports multi-concurrency
... ... @@ -41,59 +39,53 @@ pip install -r requirements.txt
# pip install tensorflow-gpu==2.8.0
# pip install --upgrade "protobuf<=3.20.1"
```
For common installation issues, see the [FAQ](https://livetalking-doc.readthedocs.io/en/latest/faq.html)
For setting up the Linux CUDA environment, refer to this article: https://zhuanlan.zhihu.com/p/674972886
## 2. Quick Start
- Download the models
Baidu Cloud Disk <https://pan.baidu.com/s/1yOsQ06-RIDTJd3HFCw4wtA> Password: ltua
Quark Cloud Disk <https://pan.quark.cn/s/83a750323ef0>
Google Drive <https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing>
Copy wav2lip256.pth to the models folder of this project and rename it to wav2lip.pth;
Extract wav2lip256_avatar1.tar.gz and copy the entire folder to the data/avatars folder of this project
- Run
python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1 --preload 2
To launch model No. 3 on the GPU: python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar3 --preload 2
Open http://serverip:8010/webrtcapi.html in a browser. First click 'start' to play the digital human video; then enter any text in the text box and submit it. The digital human will read the text aloud.
<font color=red>The server side needs to open ports tcp:8010; udp:1-65536</font>
A high-definition wav2lip model is available for purchase for commercial use; see [this link](https://livetalking-doc.readthedocs.io/zh-cn/latest/service.html#wav2lip)
- Quick experience
<https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3> Create an instance from this image to run it immediately
If you cannot access huggingface, set the following before running
```
export HF_ENDPOINT=https://hf-mirror.com
```
## 3. More Usage
Usage instructions: <https://livetalking-doc.readthedocs.io/>
## 4. Docker Run
The previous installation steps are not needed; just run directly.
```
docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v
```
The code is located in /root/metahuman-stream. First run git pull to fetch the latest code, then execute the commands from steps 2 and 3
The following images are provided:
- autodl image: <https://www.codewithgpu.com/i/lipku/metahuman-stream/base>
[autodl tutorial](https://livetalking-doc.readthedocs.io/en/latest/autodl/README.html)
- ucloud image: <https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3>
Any port can be opened, and there is no need to deploy a separate srs service.
[ucloud tutorial](https://livetalking-doc.readthedocs.io/en/latest/ucloud/ucloud.html)
## 5. TODO
- [x] Added chatgpt to enable digital human dialogue
- [x] Voice cloning
- [x] Replace the digital human with a video when it is silent
- [x] MuseTalk
... ... @@ -101,9 +93,8 @@ docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/c
- [x] Ultralight-Digital-Human
---
If this project is helpful to you, please give it a star. Friends who are interested are also welcome to join in and improve this project together.
- Knowledge Planet (知识星球): https://t.zsxq.com/7NMyO, a collection of high-quality FAQs, best practices, and problem solutions
- WeChat Official Account: 数字人技术 (Digital Human Technology)
![](https://mmbiz.qpic.cn/sz_mmbiz_jpg/l3ZibgueFiaeyfaiaLZGuMGQXnhLWxibpJUS2gfs8Dje6JuMY8zu2tVyU9n8Zx1yaNncvKHBMibX0ocehoITy5qQEZg/640?wxfrom=12&tp=wxpic&usePicPrefetch=1&wx_fmt=jpeg&from=appmsg)
... ...
... ... @@ -201,7 +201,7 @@ async def set_audiotype(request):
params = await request.json()
sessionid = params.get('sessionid',0)
nerfreals[sessionid].set_curr_state(params['audiotype'],params['reinit'])
nerfreals[sessionid].set_custom_state(params['audiotype'],params['reinit'])
return web.Response(
content_type="application/json",
... ... @@ -495,6 +495,8 @@ if __name__ == '__main__':
elif opt.transport=='rtcpush':
pagename='rtcpushapi.html'
logger.info('start http server; http://<serverip>:'+str(opt.listenport)+'/'+pagename)
logger.info('如果使用webrtc,推荐访问webrtc集成前端: http://<serverip>:'+str(opt.listenport)+'/dashboard.html')
def run_server(runner):
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
... ...
... ... @@ -35,7 +35,7 @@ import soundfile as sf
import av
from fractions import Fraction
from ttsreal import EdgeTTS,VoitsTTS,XTTS,CosyVoiceTTS,FishTTS,TencentTTS
from ttsreal import EdgeTTS,SovitsTTS,XTTS,CosyVoiceTTS,FishTTS,TencentTTS
from logger import logger
from tqdm import tqdm
... ... @@ -57,7 +57,7 @@ class BaseReal:
if opt.tts == "edgetts":
self.tts = EdgeTTS(opt,self)
elif opt.tts == "gpt-sovits":
self.tts = VoitsTTS(opt,self)
self.tts = SovitsTTS(opt,self)
elif opt.tts == "xtts":
self.tts = XTTS(opt,self)
elif opt.tts == "cosyvoice":
... ... @@ -262,8 +262,8 @@ class BaseReal:
self.curr_state = 1 #the current video does not loop; switch to the silent state
return stream
def set_curr_state(self,audiotype, reinit):
print('set_curr_state:',audiotype)
def set_custom_state(self,audiotype, reinit=True):
print('set_custom_state:',audiotype)
self.curr_state = audiotype
if reinit:
self.custom_audio_index[audiotype] = 0
... ...
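For context, the renamed set_custom_state is reached through the /set_audiotype handler shown in the app.py hunk above. A hypothetical client call might look like the sketch below (field names taken from that handler; the port and the audiotype value are assumptions):
```python
import requests

resp = requests.post(
    "http://serverip:8010/set_audiotype",
    json={
        "sessionid": 0,   # session created by the web page; 0 is the default
        "audiotype": 2,   # hypothetical slot of a pre-configured custom video
        "reinit": True,   # restart that custom video from its first frame
    },
)
print(resp.status_code)
```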
... ... @@ -179,8 +179,11 @@ print(f'[INFO] fitting light...')
batch_size = 32
device_default = torch.device("cuda:0")
device_render = torch.device("cuda:0")
device_default = torch.device("cuda:0" if torch.cuda.is_available() else (
"mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu"))
device_render = torch.device("cuda:0" if torch.cuda.is_available() else (
"mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu"))
renderer = Render_3DMM(arg_focal, h, w, batch_size, device_render)
sel_ids = np.arange(0, num_frames, int(num_frames / batch_size))[:batch_size]
... ...
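The device-selection expression introduced here recurs in most of the following hunks. Written once as a helper, the intent reads as below (a sketch only; the upstream code inlines the expression rather than factoring it out):
```python
import torch

def pick_device(prefer: str = "cuda:0") -> torch.device:
    """Prefer CUDA, then Apple-silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device(prefer)
    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device_default = pick_device()
device_render = pick_device()
```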
... ... @@ -83,7 +83,7 @@ class Render_3DMM(nn.Module):
img_h=500,
img_w=500,
batch_size=1,
device=torch.device("cuda:0"),
device=torch.device("cuda:0" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")),
):
super(Render_3DMM, self).__init__()
... ...
... ... @@ -147,7 +147,7 @@ if __name__ == '__main__':
seed_everything(opt.seed)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu"))
model = NeRFNetwork(opt)
... ...
... ... @@ -442,7 +442,7 @@ class LPIPSMeter:
self.N = 0
self.net = net
self.device = device if device is not None else torch.device('cuda' if torch.cuda.is_available() else 'cpu')
self.device = device if device is not None else torch.device('cuda' if torch.cuda.is_available() else ('mps' if hasattr(torch.backends, "mps") and torch.backends.mps.is_available() else 'cpu'))
self.fn = lpips.LPIPS(net=net).eval().to(self.device)
def clear(self):
... ... @@ -618,7 +618,11 @@ class Trainer(object):
self.flip_init_lips = self.opt.init_lips
self.time_stamp = time.strftime("%Y-%m-%d_%H-%M-%S")
self.scheduler_update_every_step = scheduler_update_every_step
self.device = device if device is not None else torch.device(f'cuda:{local_rank}' if torch.cuda.is_available() else 'cpu')
self.device = device if device is not None else torch.device(
f'cuda:{local_rank}' if torch.cuda.is_available() else (
'mps' if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else 'cpu'
)
)
self.console = Console()
model.to(self.device)
... ...
... ... @@ -56,10 +56,8 @@ from ultralight.unet import Model
from ultralight.audio2feature import Audio2Feature
from logger import logger
device = 'cuda' if torch.cuda.is_available() else 'cpu'
logger.info('Using {} for inference.'.format(device))
device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")
print('Using {} for inference.'.format(device))
def load_model(opt):
audio_processor = Audio2Feature()
... ...
... ... @@ -44,8 +44,8 @@ from basereal import BaseReal
from tqdm import tqdm
from logger import logger
device = 'cuda' if torch.cuda.is_available() else 'cpu'
logger.info('Using {} for inference.'.format(device))
device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")
print('Using {} for inference.'.format(device))
def _load(checkpoint_path):
if device == 'cuda':
... ...
... ... @@ -51,7 +51,7 @@ from logger import logger
def load_model():
# load model weights
audio_processor,vae, unet, pe = load_all_model()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu"))
timesteps = torch.tensor([0], device=device)
pe = pe.half()
vae.vae = vae.vae.half()
... ... @@ -267,23 +267,50 @@ class MuseReal(BaseReal):
def process_frames(self,quit_event,loop=None,audio_track=None,video_track=None):
enable_transition = True  # set to False to disable the transition effect, True to enable it
if enable_transition:
self.last_speaking = False
self.transition_start = time.time()
self.transition_duration = 0.1  # transition duration in seconds
self.last_silent_frame = None  # cached last silent frame
self.last_speaking_frame = None  # cached last speaking frame
while not quit_event.is_set():
try:
res_frame,idx,audio_frames = self.res_frame_queue.get(block=True, timeout=1)
except queue.Empty:
continue
if audio_frames[0][1]!=0 and audio_frames[1][1]!=0: #all-silent data, only the full image is needed
if enable_transition:
# detect a speaking/silent state change
current_speaking = not (audio_frames[0][1]!=0 and audio_frames[1][1]!=0)
if current_speaking != self.last_speaking:
logger.info(f"状态切换:{'说话' if self.last_speaking else '静音'} → {'说话' if current_speaking else '静音'}")
self.transition_start = time.time()
self.last_speaking = current_speaking
if audio_frames[0][1]!=0 and audio_frames[1][1]!=0:
self.speaking = False
audiotype = audio_frames[0][1]
if self.custom_index.get(audiotype) is not None: #there is a custom video
if self.custom_index.get(audiotype) is not None:
mirindex = self.mirror_index(len(self.custom_img_cycle[audiotype]),self.custom_index[audiotype])
combine_frame = self.custom_img_cycle[audiotype][mirindex]
target_frame = self.custom_img_cycle[audiotype][mirindex]
self.custom_index[audiotype] += 1
# if not self.custom_opt[audiotype].loop and self.custom_index[audiotype]>=len(self.custom_img_cycle[audiotype]):
# self.curr_state = 1 #the current video does not loop; switch to the silent state
else:
combine_frame = self.frame_list_cycle[idx]
target_frame = self.frame_list_cycle[idx]
if enable_transition:
# speaking → silent transition
if time.time() - self.transition_start < self.transition_duration and self.last_speaking_frame is not None:
alpha = min(1.0, (time.time() - self.transition_start) / self.transition_duration)
combine_frame = cv2.addWeighted(self.last_speaking_frame, 1-alpha, target_frame, alpha, 0)
else:
combine_frame = target_frame
# cache the silent frame
self.last_silent_frame = combine_frame.copy()
else:
combine_frame = target_frame
else:
self.speaking = True
bbox = self.coord_list_cycle[idx]
... ... @@ -291,20 +318,29 @@ class MuseReal(BaseReal):
x1, y1, x2, y2 = bbox
try:
res_frame = cv2.resize(res_frame.astype(np.uint8),(x2-x1,y2-y1))
except:
except Exception as e:
logger.warning(f"resize error: {e}")
continue
mask = self.mask_list_cycle[idx]
mask_crop_box = self.mask_coords_list_cycle[idx]
#combine_frame = get_image(ori_frame,res_frame,bbox)
#t=time.perf_counter()
combine_frame = get_image_blending(ori_frame,res_frame,bbox,mask,mask_crop_box)
#print('blending time:',time.perf_counter()-t)
image = combine_frame #(outputs['image'] * 255).astype(np.uint8)
current_frame = get_image_blending(ori_frame,res_frame,bbox,mask,mask_crop_box)
if enable_transition:
# silent → speaking transition
if time.time() - self.transition_start < self.transition_duration and self.last_silent_frame is not None:
alpha = min(1.0, (time.time() - self.transition_start) / self.transition_duration)
combine_frame = cv2.addWeighted(self.last_silent_frame, 1-alpha, current_frame, alpha, 0)
else:
combine_frame = current_frame
# cache the speaking frame
self.last_speaking_frame = combine_frame.copy()
else:
combine_frame = current_frame
image = combine_frame
new_frame = VideoFrame.from_ndarray(image, format="bgr24")
asyncio.run_coroutine_threadsafe(video_track._queue.put((new_frame,None)), loop)
self.record_video_data(image)
#self.recordq_video.put(new_frame)
for audio_frame in audio_frames:
frame,type,eventpoint = audio_frame
... ... @@ -312,12 +348,8 @@ class MuseReal(BaseReal):
new_frame = AudioFrame(format='s16', layout='mono', samples=frame.shape[0])
new_frame.planes[0].update(frame.tobytes())
new_frame.sample_rate=16000
# if audio_track._queue.qsize()>10:
# time.sleep(0.1)
asyncio.run_coroutine_threadsafe(audio_track._queue.put((new_frame,eventpoint)), loop)
self.record_audio_data(frame)
#self.notify(eventpoint)
#self.recordq_audio.put(new_frame)
logger.info('musereal process_frames thread stop')
def render(self,quit_event,loop=None,audio_track=None,video_track=None):
... ...
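The transition logic added in this hunk boils down to a linear cross-fade between the cached frame of the previous state and the newly rendered frame. A condensed, standalone sketch of that blending step:
```python
import time
import cv2

TRANSITION_DURATION = 0.1  # seconds, matching transition_duration in the hunk above

def blend_transition(prev_frame, target_frame, transition_start):
    """Cross-fade from the cached previous-state frame to the new target frame."""
    elapsed = time.time() - transition_start
    if prev_frame is None or elapsed >= TRANSITION_DURATION:
        return target_frame
    alpha = min(1.0, elapsed / TRANSITION_DURATION)
    # per-pixel weighted sum: fade the old frame out while the new frame fades in
    return cv2.addWeighted(prev_frame, 1 - alpha, target_frame, alpha, 0)
```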
... ... @@ -36,7 +36,7 @@ class UNet():
unet_config = json.load(f)
self.model = UNet2DConditionModel(**unet_config)
self.pe = PositionalEncoding(d_model=384)
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu"))
weights = torch.load(model_path) if torch.cuda.is_available() else torch.load(model_path, map_location=self.device)
self.model.load_state_dict(weights)
if use_float16:
... ...
... ... @@ -23,7 +23,7 @@ class VAE():
self.model_path = model_path
self.vae = AutoencoderKL.from_pretrained(self.model_path)
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu"))
self.vae.to(self.device)
if use_float16:
... ...
... ... @@ -325,7 +325,7 @@ def create_musetalk_human(file, avatar_id):
# initialize the mmpose model
device = "cuda" if torch.cuda.is_available() else "cpu"
device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")
fa = FaceAlignment(1, flip_input=False, device=device)
config_file = os.path.join(current_dir, 'utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py')
checkpoint_file = os.path.abspath(os.path.join(current_dir, '../models/dwpose/dw-ll_ucoco_384.pth'))
... ...
... ... @@ -13,14 +13,14 @@ import torch
from tqdm import tqdm
# initialize the mmpose model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu"))
config_file = './musetalk/utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py'
checkpoint_file = './models/dwpose/dw-ll_ucoco_384.pth'
model = init_model(config_file, checkpoint_file, device=device)
# initialize the face detection model
device = "cuda" if torch.cuda.is_available() else "cpu"
fa = FaceAlignment(LandmarksType._2D, flip_input=False,device=device)
device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")
fa = FaceAlignment(LandmarksType._2D, flip_input=False, device=device)
# maker if the bbox is not sufficient
coord_placeholder = (0.0,0.0,0.0,0.0)
... ...
... ... @@ -91,7 +91,7 @@ def load_model(name: str, device: Optional[Union[str, torch.device]] = None, dow
"""
if device is None:
device = "cuda" if torch.cuda.is_available() else "cpu"
device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")
if download_root is None:
download_root = os.getenv(
"XDG_CACHE_HOME",
... ...
... ... @@ -78,6 +78,8 @@ def transcribe(
if dtype == torch.float16:
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
dtype = torch.float32
if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
warnings.warn("Performing inference on CPU when MPS is available")
if dtype == torch.float32:
decode_options["fp16"] = False
... ... @@ -135,7 +137,7 @@ def cli():
parser.add_argument("audio", nargs="+", type=str, help="audio file(s) to transcribe")
parser.add_argument("--model", default="small", choices=available_models(), help="name of the Whisper model to use")
parser.add_argument("--model_dir", type=str, default=None, help="the path to save model files; uses ~/.cache/whisper by default")
parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu", help="device to use for PyTorch inference")
parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "mps", help="device to use for PyTorch inference")
parser.add_argument("--output_dir", "-o", type=str, default=".", help="directory to save the outputs")
parser.add_argument("--verbose", type=str2bool, default=True, help="whether to print out the progress and debug messages")
... ...
... ... @@ -30,7 +30,7 @@ class NerfASR(BaseASR):
def __init__(self, opt, parent, audio_processor,audio_model):
super().__init__(opt,parent)
self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
self.device = "cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu")
if 'esperanto' in self.opt.asr_model:
self.audio_dim = 44
elif 'deepspeech' in self.opt.asr_model:
... ...
... ... @@ -77,7 +77,7 @@ def load_model(opt):
seed_everything(opt.seed)
logger.info(opt)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = torch.device('cuda' if torch.cuda.is_available() else ('mps' if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else 'cpu'))
model = NeRFNetwork(opt)
criterion = torch.nn.MSELoss(reduction='none')
... ...
... ... @@ -90,7 +90,7 @@ class BaseTTS:
###########################################################################################
class EdgeTTS(BaseTTS):
def txt_to_audio(self,msg):
voicename = "zh-CN-XiaoxiaoNeural"
voicename = "zh-CN-YunxiaNeural"
text,textevent = msg
t = time.time()
asyncio.new_event_loop().run_until_complete(self.__main(voicename,text))
... ... @@ -107,9 +107,9 @@ class EdgeTTS(BaseTTS):
eventpoint=None
streamlen -= self.chunk
if idx==0:
eventpoint={'status':'start','text':text,'msgenvent':textevent}
eventpoint={'status':'start','text':text,'msgevent':textevent}
elif streamlen<self.chunk:
eventpoint={'status':'end','text':text,'msgenvent':textevent}
eventpoint={'status':'end','text':text,'msgevent':textevent}
self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint)
idx += self.chunk
#if streamlen>0: #skip last frame(not 20ms)
... ... @@ -219,16 +219,16 @@ class FishTTS(BaseTTS):
while streamlen >= self.chunk:
eventpoint=None
if first:
eventpoint={'status':'start','text':text,'msgenvent':textevent}
eventpoint={'status':'start','text':text,'msgevent':textevent}
first = False
self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint)
streamlen -= self.chunk
idx += self.chunk
eventpoint={'status':'end','text':text,'msgenvent':textevent}
eventpoint={'status':'end','text':text,'msgevent':textevent}
self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint)
###########################################################################################
class VoitsTTS(BaseTTS):
class SovitsTTS(BaseTTS):
def txt_to_audio(self,msg):
text,textevent = msg
self.stream_tts(
... ... @@ -316,12 +316,12 @@ class VoitsTTS(BaseTTS):
while streamlen >= self.chunk:
eventpoint=None
if first:
eventpoint={'status':'start','text':text,'msgenvent':textevent}
eventpoint={'status':'start','text':text,'msgevent':textevent}
first = False
self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint)
streamlen -= self.chunk
idx += self.chunk
eventpoint={'status':'end','text':text,'msgenvent':textevent}
eventpoint={'status':'end','text':text,'msgevent':textevent}
self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint)
###########################################################################################
... ... @@ -382,12 +382,12 @@ class CosyVoiceTTS(BaseTTS):
while streamlen >= self.chunk:
eventpoint=None
if first:
eventpoint={'status':'start','text':text,'msgenvent':textevent}
eventpoint={'status':'start','text':text,'msgevent':textevent}
first = False
self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint)
streamlen -= self.chunk
idx += self.chunk
eventpoint={'status':'end','text':text,'msgenvent':textevent}
eventpoint={'status':'end','text':text,'msgevent':textevent}
self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint)
###########################################################################################
... ... @@ -505,13 +505,13 @@ class TencentTTS(BaseTTS):
while streamlen >= self.chunk:
eventpoint=None
if first:
eventpoint={'status':'start','text':text,'msgenvent':textevent}
eventpoint={'status':'start','text':text,'msgevent':textevent}
first = False
self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint)
streamlen -= self.chunk
idx += self.chunk
last_stream = stream[idx:] #get the remain stream
eventpoint={'status':'end','text':text,'msgenvent':textevent}
eventpoint={'status':'end','text':text,'msgevent':textevent}
self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint)
###########################################################################################
... ... @@ -583,10 +583,10 @@ class XTTS(BaseTTS):
while streamlen >= self.chunk:
eventpoint=None
if first:
eventpoint={'status':'start','text':text,'msgenvent':textevent}
eventpoint={'status':'start','text':text,'msgevent':textevent}
first = False
self.parent.put_audio_frame(stream[idx:idx+self.chunk],eventpoint)
streamlen -= self.chunk
idx += self.chunk
eventpoint={'status':'end','text':text,'msgenvent':textevent}
eventpoint={'status':'end','text':text,'msgevent':textevent}
self.parent.put_audio_frame(np.zeros(self.chunk,np.float32),eventpoint)
\ No newline at end of file
... ...
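All of the TTS classes touched above share the same chunked streaming loop; the change only fixes the eventpoint key from 'msgenvent' to 'msgevent' (and renames VoitsTTS to SovitsTTS). A condensed sketch of that shared loop, assuming parent exposes put_audio_frame as in BaseTTS:
```python
import numpy as np

def stream_chunks(parent, stream: np.ndarray, chunk: int, text: str, textevent):
    """Push fixed-size audio chunks; mark the first with 'start' and the tail with 'end'."""
    idx, first = 0, True
    streamlen = stream.shape[0]
    while streamlen >= chunk:
        eventpoint = {'status': 'start', 'text': text, 'msgevent': textevent} if first else None
        first = False
        parent.put_audio_frame(stream[idx:idx + chunk], eventpoint)
        streamlen -= chunk
        idx += chunk
    # flush a silent chunk carrying the 'end' event
    eventpoint = {'status': 'end', 'text': text, 'msgevent': textevent}
    parent.put_audio_frame(np.zeros(chunk, np.float32), eventpoint)
```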
... ... @@ -236,7 +236,7 @@ if __name__ == '__main__':
if hasattr(module, 'reparameterize'):
module.reparameterize()
return model
device = torch.device("cuda")
device = torch.device("cuda" if torch.cuda.is_available() else ("mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else "cpu"))
def check_onnx(torch_out, torch_in, audio):
onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)
... ...
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>livetalking数字人交互平台</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.10.0/font/bootstrap-icons.css">
<style>
:root {
--primary-color: #4361ee;
--secondary-color: #3f37c9;
--accent-color: #4895ef;
--background-color: #f8f9fa;
--card-bg: #ffffff;
--text-color: #212529;
--border-radius: 10px;
--box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
}
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
background-color: var(--background-color);
color: var(--text-color);
min-height: 100vh;
padding-top: 20px;
}
.dashboard-container {
max-width: 1400px;
margin: 0 auto;
padding: 20px;
}
.card {
background-color: var(--card-bg);
border-radius: var(--border-radius);
box-shadow: var(--box-shadow);
border: none;
margin-bottom: 20px;
overflow: hidden;
}
.card-header {
background-color: var(--primary-color);
color: white;
font-weight: 600;
padding: 15px 20px;
border-bottom: none;
}
.video-container {
position: relative;
width: 100%;
background-color: #000;
border-radius: var(--border-radius);
overflow: hidden;
display: flex;
justify-content: center;
align-items: center;
}
video {
max-width: 100%;
max-height: 100%;
display: block;
border-radius: var(--border-radius);
}
.controls-container {
padding: 20px;
}
.btn-primary {
background-color: var(--primary-color);
border-color: var(--primary-color);
}
.btn-primary:hover {
background-color: var(--secondary-color);
border-color: var(--secondary-color);
}
.btn-outline-primary {
color: var(--primary-color);
border-color: var(--primary-color);
}
.btn-outline-primary:hover {
background-color: var(--primary-color);
color: white;
}
.form-control {
border-radius: var(--border-radius);
padding: 10px 15px;
border: 1px solid #ced4da;
}
.form-control:focus {
border-color: var(--accent-color);
box-shadow: 0 0 0 0.25rem rgba(67, 97, 238, 0.25);
}
.status-indicator {
width: 10px;
height: 10px;
border-radius: 50%;
display: inline-block;
margin-right: 5px;
}
.status-connected {
background-color: #28a745;
}
.status-disconnected {
background-color: #dc3545;
}
.status-connecting {
background-color: #ffc107;
}
.asr-container {
height: 300px;
overflow-y: auto;
padding: 15px;
background-color: #f8f9fa;
border-radius: var(--border-radius);
border: 1px solid #ced4da;
}
.asr-text {
margin-bottom: 10px;
padding: 10px;
background-color: white;
border-radius: var(--border-radius);
box-shadow: 0 1px 3px rgba(0, 0, 0, 0.1);
}
.user-message {
background-color: #e3f2fd;
border-left: 4px solid var(--primary-color);
}
.system-message {
background-color: #f1f8e9;
border-left: 4px solid #8bc34a;
}
.recording-indicator {
position: absolute;
top: 15px;
right: 15px;
background-color: rgba(220, 53, 69, 0.8);
color: white;
padding: 5px 10px;
border-radius: 20px;
font-size: 0.8rem;
display: none;
}
.recording-indicator.active {
display: flex;
align-items: center;
}
.recording-indicator .blink {
width: 10px;
height: 10px;
background-color: #fff;
border-radius: 50%;
margin-right: 5px;
animation: blink 1s infinite;
}
@keyframes blink {
0% { opacity: 1; }
50% { opacity: 0.3; }
100% { opacity: 1; }
}
.mode-switch {
margin-bottom: 20px;
}
.nav-tabs .nav-link {
color: var(--text-color);
border: none;
padding: 10px 20px;
border-radius: var(--border-radius) var(--border-radius) 0 0;
}
.nav-tabs .nav-link.active {
color: var(--primary-color);
background-color: var(--card-bg);
border-bottom: 3px solid var(--primary-color);
font-weight: 600;
}
.tab-content {
padding: 20px;
background-color: var(--card-bg);
border-radius: 0 0 var(--border-radius) var(--border-radius);
}
.settings-panel {
padding: 15px;
background-color: #f8f9fa;
border-radius: var(--border-radius);
margin-top: 15px;
}
.footer {
text-align: center;
margin-top: 30px;
padding: 20px 0;
color: #6c757d;
font-size: 0.9rem;
}
.voice-record-btn {
width: 60px;
height: 60px;
border-radius: 50%;
background-color: var(--primary-color);
color: white;
display: flex;
justify-content: center;
align-items: center;
cursor: pointer;
transition: all 0.2s ease;
box-shadow: 0 2px 5px rgba(0,0,0,0.2);
margin: 0 auto;
}
.voice-record-btn:hover {
background-color: var(--secondary-color);
transform: scale(1.05);
}
.voice-record-btn:active {
background-color: #dc3545;
transform: scale(0.95);
}
.voice-record-btn i {
font-size: 24px;
}
.voice-record-label {
text-align: center;
margin-top: 10px;
font-size: 14px;
color: #6c757d;
}
.video-size-control {
margin-top: 15px;
}
.recording-pulse {
animation: pulse 1.5s infinite;
}
@keyframes pulse {
0% {
box-shadow: 0 0 0 0 rgba(220, 53, 69, 0.7);
}
70% {
box-shadow: 0 0 0 15px rgba(220, 53, 69, 0);
}
100% {
box-shadow: 0 0 0 0 rgba(220, 53, 69, 0);
}
}
</style>
</head>
<body>
<div class="dashboard-container">
<div class="row">
<div class="col-12">
<h1 class="text-center mb-4">livetalking数字人交互平台</h1>
</div>
</div>
<div class="row">
<!-- 视频区域 -->
<div class="col-lg-8">
<div class="card">
<div class="card-header d-flex justify-content-between align-items-center">
<div>
<span class="status-indicator status-disconnected" id="connection-status"></span>
<span id="status-text">未连接</span>
</div>
</div>
<div class="card-body p-0">
<div class="video-container">
<video id="video" autoplay playsinline></video>
<div class="recording-indicator" id="recording-indicator">
<div class="blink"></div>
<span>录制中</span>
</div>
</div>
<div class="controls-container">
<div class="row">
<div class="col-md-6 mb-3">
<button class="btn btn-primary w-100" id="start">
<i class="bi bi-play-fill"></i> 开始连接
</button>
<button class="btn btn-danger w-100" id="stop" style="display: none;">
<i class="bi bi-stop-fill"></i> 停止连接
</button>
</div>
<div class="col-md-6 mb-3">
<div class="d-flex">
<button class="btn btn-outline-primary flex-grow-1 me-2" id="btn_start_record">
<i class="bi bi-record-fill"></i> 开始录制
</button>
<button class="btn btn-outline-danger flex-grow-1" id="btn_stop_record" disabled>
<i class="bi bi-stop-fill"></i> 停止录制
</button>
</div>
</div>
</div>
<div class="row">
<div class="col-12">
<div class="video-size-control">
<label for="video-size-slider" class="form-label">视频大小调节: <span id="video-size-value">100%</span></label>
<input type="range" class="form-range" id="video-size-slider" min="50" max="150" value="100">
</div>
</div>
</div>
<div class="settings-panel mt-3">
<div class="row">
<div class="col-md-12">
<div class="form-check form-switch mb-3">
<input class="form-check-input" type="checkbox" id="use-stun">
<label class="form-check-label" for="use-stun">使用STUN服务器</label>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<!-- 右侧交互 -->
<div class="col-lg-4">
<div class="card">
<div class="card-header">
<ul class="nav nav-tabs card-header-tabs" id="interaction-tabs" role="tablist">
<li class="nav-item" role="presentation">
<button class="nav-link active" id="chat-tab" data-bs-toggle="tab" data-bs-target="#chat" type="button" role="tab" aria-controls="chat" aria-selected="true">对话模式</button>
</li>
<li class="nav-item" role="presentation">
<button class="nav-link" id="tts-tab" data-bs-toggle="tab" data-bs-target="#tts" type="button" role="tab" aria-controls="tts" aria-selected="false">朗读模式</button>
</li>
</ul>
</div>
<div class="card-body">
<div class="tab-content" id="interaction-tabs-content">
<!-- 对话模式 -->
<div class="tab-pane fade show active" id="chat" role="tabpanel" aria-labelledby="chat-tab">
<div class="asr-container mb-3" id="chat-messages">
<div class="asr-text system-message">
系统: 欢迎使用livetalking,请点击"开始连接"按钮开始对话。
</div>
</div>
<form id="chat-form">
<div class="input-group mb-3">
<textarea class="form-control" id="chat-message" rows="3" placeholder="输入您想对数字人说的话..."></textarea>
<button class="btn btn-primary" type="submit">
<i class="bi bi-send"></i> 发送
</button>
</div>
</form>
<!-- 按住说话按钮 -->
<div class="voice-record-btn" id="voice-record-btn">
<i class="bi bi-mic-fill"></i>
</div>
<div class="voice-record-label">按住说话,松开发送</div>
</div>
<!-- 朗读模式 -->
<div class="tab-pane fade" id="tts" role="tabpanel" aria-labelledby="tts-tab">
<form id="echo-form">
<div class="mb-3">
<label for="message" class="form-label">输入要朗读的文本</label>
<textarea class="form-control" id="message" rows="6" placeholder="输入您想让数字人朗读的文字..."></textarea>
</div>
<button type="submit" class="btn btn-primary w-100">
<i class="bi bi-volume-up"></i> 朗读文本
</button>
</form>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="footer">
<p>Made with ❤️ by Marstaos | Frontend & Performance Optimization</p>
</div>
</div>
<!-- 隐藏的会话ID -->
<input type="hidden" id="sessionid" value="0">
<script src="client.js"></script>
<script src="srs.sdk.js"></script>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script>
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
<script>
$(document).ready(function() {
$('#video-size-slider').on('input', function() {
const value = $(this).val();
$('#video-size-value').text(value + '%');
$('#video').css('width', value + '%');
});
function updateConnectionStatus(status) {
const statusIndicator = $('#connection-status');
const statusText = $('#status-text');
statusIndicator.removeClass('status-connected status-disconnected status-connecting');
switch(status) {
case 'connected':
statusIndicator.addClass('status-connected');
statusText.text('已连接');
break;
case 'connecting':
statusIndicator.addClass('status-connecting');
statusText.text('连接中...');
break;
case 'disconnected':
default:
statusIndicator.addClass('status-disconnected');
statusText.text('未连接');
break;
}
}
// 添加聊天消息
function addChatMessage(message, type = 'user') {
const messagesContainer = $('#chat-messages');
const messageClass = type === 'user' ? 'user-message' : 'system-message';
const sender = type === 'user' ? '您' : '数字人';
const messageElement = $(`
<div class="asr-text ${messageClass}">
${sender}: ${message}
</div>
`);
messagesContainer.append(messageElement);
messagesContainer.scrollTop(messagesContainer[0].scrollHeight);
}
// 开始/停止按钮
$('#start').click(function() {
updateConnectionStatus('connecting');
start();
$(this).hide();
$('#stop').show();
// 添加定时器检查视频流是否已加载
let connectionCheckTimer = setInterval(function() {
const video = document.getElementById('video');
// 检查视频是否有数据
if (video.readyState >= 3 && video.videoWidth > 0) {
updateConnectionStatus('connected');
clearInterval(connectionCheckTimer);
}
}, 2000); // 每2秒检查一次
// 60秒后如果还是连接中状态,就停止检查
setTimeout(function() {
if (connectionCheckTimer) {
clearInterval(connectionCheckTimer);
}
}, 60000);
});
$('#stop').click(function() {
stop();
$(this).hide();
$('#start').show();
updateConnectionStatus('disconnected');
});
// 录制功能
$('#btn_start_record').click(function() {
console.log('Starting recording...');
fetch('/record', {
body: JSON.stringify({
type: 'start_record',
sessionid: parseInt(document.getElementById('sessionid').value),
}),
headers: {
'Content-Type': 'application/json'
},
method: 'POST'
}).then(function(response) {
if (response.ok) {
console.log('Recording started.');
$('#btn_start_record').prop('disabled', true);
$('#btn_stop_record').prop('disabled', false);
$('#recording-indicator').addClass('active');
} else {
console.error('Failed to start recording.');
}
}).catch(function(error) {
console.error('Error:', error);
});
});
$('#btn_stop_record').click(function() {
console.log('Stopping recording...');
fetch('/record', {
body: JSON.stringify({
type: 'end_record',
sessionid: parseInt(document.getElementById('sessionid').value),
}),
headers: {
'Content-Type': 'application/json'
},
method: 'POST'
}).then(function(response) {
if (response.ok) {
console.log('Recording stopped.');
$('#btn_start_record').prop('disabled', false);
$('#btn_stop_record').prop('disabled', true);
$('#recording-indicator').removeClass('active');
} else {
console.error('Failed to stop recording.');
}
}).catch(function(error) {
console.error('Error:', error);
});
});
$('#echo-form').on('submit', function(e) {
e.preventDefault();
var message = $('#message').val();
if (!message.trim()) return;
console.log('Sending echo message:', message);
fetch('/human', {
body: JSON.stringify({
text: message,
type: 'echo',
interrupt: true,
sessionid: parseInt(document.getElementById('sessionid').value),
}),
headers: {
'Content-Type': 'application/json'
},
method: 'POST'
});
$('#message').val('');
addChatMessage(`已发送朗读请求: "${message}"`, 'system');
});
// 聊天模式表单提交
$('#chat-form').on('submit', function(e) {
e.preventDefault();
var message = $('#chat-message').val();
if (!message.trim()) return;
console.log('Sending chat message:', message);
fetch('/human', {
body: JSON.stringify({
text: message,
type: 'chat',
interrupt: true,
sessionid: parseInt(document.getElementById('sessionid').value),
}),
headers: {
'Content-Type': 'application/json'
},
method: 'POST'
});
addChatMessage(message, 'user');
$('#chat-message').val('');
});
// 按住说话功能
let mediaRecorder;
let audioChunks = [];
let isRecording = false;
let recognition;
// 检查浏览器是否支持语音识别
const isSpeechRecognitionSupported = 'webkitSpeechRecognition' in window || 'SpeechRecognition' in window;
if (isSpeechRecognitionSupported) {
recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.continuous = true;
recognition.interimResults = true;
recognition.lang = 'zh-CN';
recognition.onresult = function(event) {
let interimTranscript = '';
let finalTranscript = '';
for (let i = event.resultIndex; i < event.results.length; ++i) {
if (event.results[i].isFinal) {
finalTranscript += event.results[i][0].transcript;
} else {
interimTranscript += event.results[i][0].transcript;
$('#chat-message').val(interimTranscript);
}
}
if (finalTranscript) {
$('#chat-message').val(finalTranscript);
}
};
recognition.onerror = function(event) {
console.error('语音识别错误:', event.error);
};
}
// 按住说话按钮事件
$('#voice-record-btn').on('mousedown touchstart', function(e) {
e.preventDefault();
startRecording();
}).on('mouseup mouseleave touchend', function() {
if (isRecording) {
stopRecording();
}
});
// 开始录音
function startRecording() {
if (isRecording) return;
navigator.mediaDevices.getUserMedia({ audio: true })
.then(function(stream) {
audioChunks = [];
mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = function(e) {
if (e.data.size > 0) {
audioChunks.push(e.data);
}
};
mediaRecorder.start();
isRecording = true;
$('#voice-record-btn').addClass('recording-pulse');
$('#voice-record-btn').css('background-color', '#dc3545');
if (recognition) {
recognition.start();
}
})
.catch(function(error) {
console.error('无法访问麦克风:', error);
alert('无法访问麦克风,请检查浏览器权限设置。');
});
}
function stopRecording() {
if (!isRecording) return;
mediaRecorder.stop();
isRecording = false;
// 停止所有音轨
mediaRecorder.stream.getTracks().forEach(track => track.stop());
// 视觉反馈恢复
$('#voice-record-btn').removeClass('recording-pulse');
$('#voice-record-btn').css('background-color', '');
// 停止语音识别
if (recognition) {
recognition.stop();
}
// 获取识别的文本并发送
setTimeout(function() {
const recognizedText = $('#chat-message').val().trim();
if (recognizedText) {
// 发送识别的文本
fetch('/human', {
body: JSON.stringify({
text: recognizedText,
type: 'chat',
interrupt: true,
sessionid: parseInt(document.getElementById('sessionid').value),
}),
headers: {
'Content-Type': 'application/json'
},
method: 'POST'
});
addChatMessage(recognizedText, 'user');
$('#chat-message').val('');
}
}, 500);
}
// WebRTC 相关功能
if (typeof window.onWebRTCConnected === 'function') {
const originalOnConnected = window.onWebRTCConnected;
window.onWebRTCConnected = function() {
updateConnectionStatus('connected');
if (originalOnConnected) originalOnConnected();
};
} else {
window.onWebRTCConnected = function() {
updateConnectionStatus('connected');
};
}
// 当连接断开时更新状态
if (typeof window.onWebRTCDisconnected === 'function') {
const originalOnDisconnected = window.onWebRTCDisconnected;
window.onWebRTCDisconnected = function() {
updateConnectionStatus('disconnected');
if (originalOnDisconnected) originalOnDisconnected();
};
} else {
window.onWebRTCDisconnected = function() {
updateConnectionStatus('disconnected');
};
}
// SRS WebRTC播放功能
var sdk = null; // 全局处理器,用于在重新发布时进行清理
function startPlay() {
// 关闭之前的连接
if (sdk) {
sdk.close();
}
sdk = new SrsRtcWhipWhepAsync();
$('#video').prop('srcObject', sdk.stream);
var host = window.location.hostname;
var url = "http://" + host + ":1985/rtc/v1/whep/?app=live&stream=livestream";
sdk.play(url).then(function(session) {
console.log('WebRTC播放已启动,会话ID:', session.sessionid);
}).catch(function(reason) {
sdk.close();
console.error('WebRTC播放失败:', reason);
});
}
});
</script>
</body>
</html>
\ No newline at end of file
... ...