Real-time interactive streaming digital human with synchronized audio and video dialogue. The results are broadly good enough for commercial use.

Demos: ernerf, musetalk, wav2lip


To avoid confusion with 3D digital humans, the original project metahuman-stream has been renamed livetalking. The original links remain valid.


News


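2024.12.8 Improved multi-session concurrency; GPU memory no longer grows with the number of concurrent sessions
2024.12.21 Added model warm-up for wav2lip and musetalk to fix stuttering on the first inference. Thanks to @heimaojinzhangyz
2024.12.28 Added the Ultralight-Digital-Human digital human model. Thanks to @lijihua2017
2025.2.7 Added fish-speech tts
2025.2.21 Added the open-source wav2lip256 model. Thanks to @不蠢不蠢
2025.3.2 Added the Tencent speech synthesis service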


Features


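Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human
Supports voice cloning
Supports interrupting the digital human while it is speaking
Supports full-body video stitching
Supports rtmp and webrtc
Supports video orchestration: plays a custom video when the digital human is not speaking
Supports multiple concurrent sessions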


1. Installation

Tested on Ubuntu 20.04, Python 3.10, PyTorch 1.12 and CUDA 11.3.


1.1 Install dependencies

conda create -n nerfstream python=3.10
conda activate nerfstream
# If your CUDA version is not 11.3 (run nvidia-smi to check), install the matching PyTorch version from <https://pytorch.org/get-started/previous-versions/>
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
# If you need to train ernerf models, install the libraries below
# pip install "git+https://github.com/facebookresearch/pytorch3d.git"
# pip install tensorflow-gpu==2.8.0
# pip install --upgrade "protobuf<=3.20.1"
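After installing, it can help to confirm which Python and PyTorch builds the environment actually picked up. The following is a minimal sketch, not part of the project: the function name `describe_environment` is made up for illustration, and it only reports versions without judging whether they match the tested configuration above.

```python
import importlib
import sys


def describe_environment():
    """Return a dict describing the Python and (if installed) PyTorch setup."""
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": False,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info  # PyTorch is not installed in this environment
    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built against
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch_cuda` differs from the CUDA version on the machine, reinstalling PyTorch from the previous-versions page linked above is the likely fix.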


Installation FAQ

For setting up a CUDA environment on Linux, see this article: https://zhuanlan.zhihu.com/p/674972886


2. Quick Start


Download the models

Baidu Netdisk: https://pan.baidu.com/s/1yOsQ06-RIDTJd3HFCw4wtA password: ltua

Google Drive: https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing

Copy wav2lip256.pth into this project's models directory and rename it to wav2lip.pth.

Extract wav2lip256_avatar1.tar.gz and copy the whole extracted folder into this project's data/avatars directory.
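The two copy steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `install_wav2lip_assets` and the `download_dir` argument are hypothetical, and the archive is assumed to contain a single top-level avatar folder. Only extract archives you trust.

```python
import shutil
import tarfile
from pathlib import Path


def install_wav2lip_assets(download_dir, project_dir):
    """Place the downloaded wav2lip files where the README says they belong."""
    download_dir, project_dir = Path(download_dir), Path(project_dir)
    models_dir = project_dir / "models"
    avatars_dir = project_dir / "data" / "avatars"
    models_dir.mkdir(parents=True, exist_ok=True)
    avatars_dir.mkdir(parents=True, exist_ok=True)

    # Copy the checkpoint and rename it to wav2lip.pth in one step.
    model_path = models_dir / "wav2lip.pth"
    shutil.copyfile(download_dir / "wav2lip256.pth", model_path)

    # Extract the avatar archive into data/avatars (assumed: one top-level folder).
    with tarfile.open(download_dir / "wav2lip256_avatar1.tar.gz", "r:gz") as archive:
        archive.extractall(avatars_dir)
    return model_path, avatars_dir
```

For example, `install_wav2lip_assets("~/Downloads", ".")` would need `Path.expanduser()` applied first; passing absolute paths avoids that.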
Run:

python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1 --preload 2
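When switching between avatars, the same command can be assembled in Python. This sketch uses only the flags shown in this README (`--transport`, `--model`, `--avatar_id`, `--preload`); the helper name `build_launch_command` is hypothetical and the defaults simply mirror the command above.

```python
import sys


def build_launch_command(avatar_id, transport="webrtc", model="wav2lip", preload=2):
    """Build the argument list for launching app.py with the given avatar."""
    return [
        sys.executable, "app.py",
        "--transport", transport,
        "--model", model,
        "--avatar_id", avatar_id,
        "--preload", str(preload),
    ]
```

From the project root, `subprocess.run(build_launch_command("wav2lip256_avatar1"), check=True)` would then start the server with the current interpreter.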


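To start avatar 3 on the GPU: python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar3 --preload 2
  Open http://serverip:8010/webrtcapi.html in a browser. First click 'start' to play the digital human video, then enter any text in the text box and submit it. The digital human will speak that text.

  The server needs the following ports open: tcp:8010; udp:1-65535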
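If the page does not load, a quick check from the client machine can tell whether TCP port 8010 is reachable at all. The following is a minimal sketch with a hypothetical helper name, `tcp_port_open`; it only tests the TCP port, since the UDP range used for webrtc media cannot be verified with a simple connect.

```python
import socket


def tcp_port_open(host, port=8010, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `tcp_port_open("serverip")` with the real server address returns `False` when a firewall or security group still blocks the port, or when the server is not running.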

  If you need a commercial-grade high-definition wav2lip model, contact me to purchase it.


Quick trial

https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3 Create an instance from this image and it will run as-is.


If you cannot access huggingface, run this before starting:

export HF_ENDPOINT=https://hf-mirror.com
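When the server is started from a Python wrapper rather than a shell, the same variable can be passed through the child process environment. This is a sketch only; the helper name `env_with_hf_mirror` is hypothetical, and it assumes the variable must be in place before any huggingface library is imported by the launched process.

```python
import os


def env_with_hf_mirror(endpoint="https://hf-mirror.com", base_env=None):
    """Return a copy of the environment with HF_ENDPOINT set to the mirror."""
    env = dict(os.environ if base_env is None else base_env)
    env["HF_ENDPOINT"] = endpoint
    return env
```

The result can be handed to `subprocess.run([...], env=env_with_hf_mirror())` so the setting applies only to that process and not to the whole shell session.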


3. More Usage

Documentation: https://livetalking-doc.readthedocs.io/


4. Docker Run

The installation steps above are not needed; just run:

docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v


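The code is in /root/metahuman-stream. Run git pull first to fetch the latest code, then run the same commands as in sections 2 and 3.

The following images are provided: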


autodl image: https://www.codewithgpu.com/i/lipku/metahuman-stream/base

autodl tutorial

ucloud image: https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3

Any port can be opened, and there is no need to deploy a separate srs service.

ucloud tutorial


5. TODO


 Add chatgpt for digital human dialogue

 Voice cloning

 Replace the digital human with a video clip while it is silent

 MuseTalk

 Wav2Lip

 Ultralight-Digital-Human


If this project helps you, please give it a star. Anyone interested is also welcome to help improve the project.


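Knowledge Planet (知识星球): https://t.zsxq.com/7NMyO A curated collection of high-quality FAQs, best practices, and answers to questions
WeChat official account: 数字人技术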