## Introduction
Uses FunASR for speech recognition and converts the result into plain text or subtitle files.
## Features

- Speech recognition
- Speech timestamp prediction
- Speech-to-subtitle conversion
- Direct generation of SRT subtitle files
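As a rough illustration of how these features map onto FunASR's `AutoModel` interface, here is a minimal sketch (not the node's actual code); the model aliases and the audio file name `story.wav` are placeholders for illustration.

```python
from funasr import AutoModel

# Load the ASR model together with a VAD model so that long audio is split automatically.
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad")

res = model.generate(input="story.wav")   # "story.wav" is a placeholder path
print(res[0]["text"])                     # recognized text
print(res[0].get("timestamp"))            # per-token [start_ms, end_ms] pairs, if the model returns them
```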
## Test content

A sample SRT file produced by the node (a short Chinese children's story, kept verbatim as recognized):

```
1
00:00:00,130 --> 00:00:02,970
今天给小朋友们讲个超有趣的
2
00:00:02,970 --> 00:00:07,030
故事小熊笨笨特别爱画画可他
3
00:00:07,110 --> 00:00:09,190
画啥都不像有一回画
4
00:00:09,270 --> 00:00:13,250
苹果画的像土豆画大树树干
5
00:00:13,350 --> 00:00:16,730
歪歪扭扭伙伴们看了画笑笨笨
6
00:00:16,730 --> 00:00:19,950
可难过了晚上笨笨对着画
7
00:00:19,950 --> 00:00:23,430
哭眼泪掉进画里神奇的事儿
8
00:00:23,430 --> 00:00:27,200
发生了画里的苹果变得又大
9
00:00:27,220 --> 00:00:31,380
又红大树也变得笔直茂盛笨笨
10
00:00:31,380 --> 00:00:35,260
明白了只要用心不放弃就能
11
00:00:35,280 --> 00:00:38,660
画出漂亮画小朋友们你们画画的
12
00:00:38,720 --> 00:00:40,395
时候也要这样哦
```
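For reference, millisecond timestamps of the kind FunASR returns can be turned into SRT timecodes like the ones above with a few lines of Python. This is a sketch only, assuming the cues have already been grouped into `(start_ms, end_ms, text)` tuples; it is not the node's actual implementation.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timecode: HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, msec = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{msec:03d}"


def cues_to_srt(cues) -> str:
    """cues: iterable of (start_ms, end_ms, text) -> SRT-formatted string."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{text}\n")
    return "\n".join(blocks)


# Example with one cue taken from the output above.
print(cues_to_srt([(130, 2970, "今天给小朋友们讲个超有趣的")]))
```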
## Official repository
## Model structure

```
iic
├── speech_fsmn_vad_zh-cn-16k-common-pytorch
│   ├── README.md
│   ├── am.mvn
│   ├── config.yaml
│   ├── configuration.json
│   ├── example
│   ├── fig
│   └── model.pt
├── speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
│   ├── README.md
│   ├── am.mvn
│   ├── asr_example_hotword.wav
│   ├── config.yaml
│   ├── configuration.json
│   ├── example
│   ├── fig
│   ├── model.pt
│   ├── seg_dict
│   └── tokens.json
└── speech_timestamp_prediction-v1-16k-offline
    ├── README.md
    ├── am.mvn
    ├── config.yaml
    ├── configuration.json
    ├── example
    ├── model.pt
    ├── seg_dict
    └── tokens.json
```

Model files: https://modelscope.cn/models/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files
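If these model directories are already on disk, they can be passed to `AutoModel` by path instead of by name. A rough sketch, assuming FunASR accepts local model directories and assuming a hypothetical `models/iic` base path:

```python
import os

from funasr import AutoModel

# Hypothetical base directory; point this at wherever the "iic" folder above lives.
MODEL_DIR = "models/iic"

model = AutoModel(
    # SeACo-Paraformer ASR model (supports hotwords).
    model=os.path.join(MODEL_DIR, "speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"),
    # FSMN voice activity detection, used to segment long audio.
    vad_model=os.path.join(MODEL_DIR, "speech_fsmn_vad_zh-cn-16k-common-pytorch"),
)

res = model.generate(input="test.wav")  # placeholder audio path
print(res[0]["text"])
```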
## Errors

By default, this node branch fails to install. Install FunASR manually:

```
pip install funasr
```

Install the version of aliyun-python-sdk-core-v3 that does not require compilation:

```
pip install aliyun-python-sdk-core-v3==2.13.10
```
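An optional sanity check after installation (a generic sketch, not part of the node):

```python
# Print the installed versions of the two packages mentioned above.
from importlib.metadata import version

for pkg in ("funasr", "aliyun-python-sdk-core-v3"):
    print(pkg, version(pkg))
```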