纯CPU运行的Kokoro-82M本地部署
Kokoro TTS是由hexgrad团队开发的轻量级开源文本转语音模型,特别适合CPU环境部署。本文介绍了其安装配置过程:首先安装CPU版PyTorch,然后手动安装espeak-ng并配置环境变量,再安装其他依赖项。模型可通过huggingface-cli下载,测试代码展示了如何在CPU上加载模型、配置语音参数并合成语音。测试结果显示系统能高效生成语音文件并计算实时因子,最终输出保存为WAV格
·
Kokoro简介
Kokoro TTS 简介
Kokoro TTS是由 hexgrad 团队开发的一款轻量级、高性能的开源文本转语音(TTS)模型,以其卓越的效率和出色的语音合成质量在业界脱颖而出。特别适合在仅有CPU的环境下部署
安装CPU版本的PyTorch
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cpu
手动安装espeak-ng依赖
下载地址: https://github.com/espeak-ng/espeak-ng/releases
安装完成之后需要配置一下环境变量, 否则启动的时候会报找不到espeak
$env:PHONEMIZER_ESPEAK_LIBRARY = "c:\Program Files\eSpeak NG\libespeak-ng.dll"
$env:PHONEMIZER_ESPEAK_PATH = "c:\Program Files\eSpeak NG"
setx PHONEMIZER_ESPEAK_LIBRARY "c:\Program Files\eSpeak NG\libespeak-ng.dll"
setx PHONEMIZER_ESPEAK_PATH "c:\Program Files\eSpeak NG"
其他依赖安装
pip install kokoro
pip install ordered-set
pip install cn2an
pip install pypinyin_dict
模型下载
# 设置镜像地址(可选,加速下载)
export HF_ENDPOINT=https://hf-mirror.com
# 下载模型(只需要v1.1版本)
huggingface-cli download --resume-download hexgrad/Kokoro-82M-v1.1-zh --local-dir ./ckpts/kokoro-v1.1
测试代码
import torch
import time
from kokoro import KPipeline, KModel
import soundfile as sf
# 设置设备为CPU
device = 'cpu'
# 加载声音文件
voice_zf = "zf_001"
voice_zf_tensor = torch.load(f'ckpts/kokoro-v1.1/voices/{voice_zf}.pt', weights_only=True)
# 模型路径配置
repo_id = 'hexgrad/Kokoro-82M-v1.1-zh'
model_path = 'ckpts/kokoro-v1.1/kokoro-v1_1-zh.pth'
config_path = 'ckpts/kokoro-v1.1/config.json'
# 加载模型到CPU
model = KModel(model=model_path, config=config_path, repo_id=repo_id).to(device).eval()
def speed_callable(len_ps):
speed = 0.8
if len_ps <= 83:
speed = 1
elif len_ps < 183:
speed = 1 - (len_ps - 83) / 500
return speed * 1.1
# 创建管道
zh_pipeline = KPipeline(lang_code='z', repo_id=repo_id, model=model)
# 测试文本
sentence = '你好,这是一个语音合成测试。'
# 语音合成
start_time = time.time()
generator = zh_pipeline(sentence, voice=voice_zf_tensor, speed=speed_callable)
result = next(generator)
wav = result.audio
# 计算处理时间
speech_len = len(wav) / 24000
print('生成语音长度: {}秒, 实时因子: {}'.format(speech_len, (time.time() - start_time) / speech_len))
# 保存音频文件
sf.write('output_cpu.wav', wav, 24000)
print('语音文件已保存为: output_cpu.wav')
更多推荐



所有评论(0)