First steps with Kaldi on macOS

Background

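Kaldi is one of the most widely used open-source speech recognition toolkits, and it is still being actively updated [2].

See [2] for more background. This post builds Kaldi from source and runs a few small examples end to end.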

Building from source

Download

git clone https://github.com/kaldi-asr/kaldi

Building tools

See the tools/INSTALL file.

Install the dependencies

brew install automake autoconf python3

Check that the dependencies are satisfied

[note@abeffect tools]$ sh extras/check_dependencies.sh
extras/check_dependencies.sh: all OK.

Build

make

Output

Warning: IRSTLM is not installed by default anymore. If you need IRSTLM
Warning: use the script extras/install_irstlm.sh
All done OK.

Installing extras

The extras directory contains several optional add-ons, which can be installed selectively.

Building src

See the src/INSTALL file.

  cd src
  ./configure --shared
  make depend -j 8
  make -j 8

Output

echo Done
Done

The compiled executables sit next to their sources; for example, in the latbin directory:

[note@abeffect src]$ ls latbin/ | grep -v cc$ | grep -v o$
Makefile
lattice-1best
lattice-add-penalty
lattice-add-trans-probs
lattice-align-phones
lattice-align-words
lattice-align-words-lexicon
lattice-arc-post
lattice-best-path
lattice-boost-ali
lattice-combine
lattice-compose
lattice-confidence
lattice-copy
lattice-copy-backoff
lattice-depth
lattice-depth-per-frame
lattice-determinize
lattice-determinize-non-compact
lattice-determinize-phone-pruned
lattice-determinize-phone-pruned-parallel
lattice-determinize-pruned
lattice-determinize-pruned-parallel
lattice-difference
lattice-equivalent
lattice-expand-ngram
lattice-interp
lattice-limit-depth
lattice-lmrescore
lattice-lmrescore-const-arpa
lattice-lmrescore-kaldi-rnnlm
lattice-lmrescore-kaldi-rnnlm-pruned
lattice-lmrescore-pruned
lattice-lmrescore-rnnlm
lattice-mbr-decode
lattice-minimize
lattice-oracle
lattice-project
lattice-prune
lattice-push
lattice-rescore-mapped
lattice-rmali
lattice-scale
lattice-to-ctm-conf
lattice-to-fst
lattice-to-mpe-post
lattice-to-nbest
lattice-to-phone-lattice
lattice-to-post
lattice-to-smbr-post
lattice-union
linear-to-nbest
nbest-to-ctm
nbest-to-lattice
nbest-to-linear
nbest-to-prons
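
The grep filter above selects by file name, so non-binaries such as Makefile still appear in the list. A minimal sketch of an alternative that selects by the execute permission bit instead is shown here. The function name list_executables and the demo file names are hypothetical, and the sketch assumes a POSIX file system.

```python
import os
import tempfile

def list_executables(directory):
    """Return sorted names of regular files that have an execute bit set."""
    names = []
    for entry in os.scandir(directory):
        if entry.is_file() and os.access(entry.path, os.X_OK):
            names.append(entry.name)
    return sorted(names)

# Demo on a throwaway directory that mimics a small build output folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_files = [
        ("lattice-copy", 0o755),     # compiled binary: executable
        ("lattice-copy.cc", 0o644),  # source file: not executable
        ("Makefile", 0o644),         # build file: not executable
    ]
    for name, mode in demo_files:
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write("")
        os.chmod(path, mode)
    found = list_executables(tmp)

print(found)
```

Pointing the same function at src/latbin should give the tool names without the Makefile entry.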

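Usage

The speech corpus for the voxforge example takes 12.6 GB of disk space, and about 20 GB should be set aside to run the experiment [6].

So let's start with yesno instead.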

yesno

Run

[note@abeffect kaldi]$ cd egs/yesno/s5/
[note@abeffect s5]$ ./run.sh

Output

......
steps/diagnostic/analyze_lats.sh: see stats in exp/mono0a/decode_test_yesno/log/analyze_lattice_depth_stats.log
local/score.sh --cmd utils/run.pl data/test_yesno exp/mono0a/graph_tgpr exp/mono0a/decode_test_yesno
local/score.sh: scoring with word insertion penalty=0.0,0.5,1.0
%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] exp/mono0a/decode_test_yesno/wer_10_1.0
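
The %WER summary line packs several numbers into one string: the word error rate in percent, the total number of errors, the number of reference words, and the counts of insertions, deletions and substitutions. As a rough sketch of how to pull those fields out for further processing, the snippet below parses one such line. The regular expression and the function name parse_wer_line are my own and only assume the layout seen in this run.

```python
import re

# Layout assumed from the run above:
# %WER <pct> [ <errors> / <words>, <n> ins, <n> del, <n> sub ] <path>
WER_LINE = re.compile(
    r"%WER\s+(?P<wer>[\d.]+)\s+\[\s*(?P<errors>\d+)\s*/\s*(?P<words>\d+),"
    r"\s*(?P<ins>\d+)\s+ins,\s*(?P<dels>\d+)\s+del,\s*(?P<sub>\d+)\s+sub\s*\]"
)

def parse_wer_line(line):
    """Return the WER fields as a dict, or None if the line does not match."""
    match = WER_LINE.search(line)
    if match is None:
        return None
    fields = {key: int(value) for key, value in match.groupdict().items() if key != "wer"}
    fields["wer"] = float(match.group("wer"))
    return fields

sample = "%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] exp/mono0a/decode_test_yesno/wer_10_1.0"
result = parse_wer_line(sample)

# The error count is the sum of insertions, deletions and substitutions,
# and WER is that count divided by the number of reference words.
assert result["errors"] == result["ins"] + result["dels"] + result["sub"]
print(result["wer"], result["errors"], result["words"])
```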

Visualization

Make sure graphviz is installed

brew install graphviz

Visualizing the language model

../../../tools/openfst-1.6.5/bin/fstprint ./data/lang_test_tg/G.fst
0	0	2	2	2.30258512
0	0	3	3	2.30258512
0	2.30258512
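
The last column of the fstprint output is the arc weight. Assuming G.fst uses the tropical semiring, where a weight is a cost equal to the negative natural logarithm of a probability, the value 2.30258512 corresponds to a probability of about 0.1. The helper names below are hypothetical; this is only a sketch of the conversion.

```python
import math

def cost_to_prob(cost):
    """Convert a negative-natural-log cost back to a probability."""
    return math.exp(-cost)

def prob_to_cost(prob):
    """Convert a probability to a negative-natural-log cost."""
    return -math.log(prob)

g_cost = 2.30258512  # weight printed for both arcs of G.fst above
g_prob = cost_to_prob(g_cost)

# Round-trip check: converting back should give the original cost.
assert math.isclose(prob_to_cost(g_prob), g_cost, rel_tol=1e-9)
print(round(g_prob, 4))
```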

../../../tools/openfst-1.6.5/bin/fstdraw ./data/lang_test_tg/G.fst | dot -T ps > g.ps

Visualizing the lexicon

The L.fst file

$ ../../../tools/openfst-1.6.5/bin/fstprint ./data/lang_test_tg/L.fst
0	1	0	0	0.693147182
0	1	1	0	0.693147182
1	1	1	1
1	1	3	2	0.693147182
1	2	3	2	0.693147182
1	1	2	3	0.693147182
1	2	2	3	0.693147182
1
2	1	1	0
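
The output above follows the OpenFst text format: a line with four or five fields is an arc (source state, destination state, input label, output label, optional weight), and a line with one or two fields marks a final state (state, optional weight). The following sketch reads that format into Python structures. The names Arc and parse_fstprint are hypothetical, labels are assumed to be integers because no symbol tables were passed to fstprint, and a missing weight is read as 0.0 on the assumption of the tropical semiring.

```python
from typing import NamedTuple

class Arc(NamedTuple):
    src: int
    dst: int
    ilabel: int
    olabel: int
    weight: float

def parse_fstprint(text):
    """Split fstprint text output into a list of arcs and a dict of final states."""
    arcs, finals = [], {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) >= 4:
            # Arc line: src dst ilabel olabel [weight]
            weight = float(fields[4]) if len(fields) > 4 else 0.0
            arcs.append(Arc(int(fields[0]), int(fields[1]),
                            int(fields[2]), int(fields[3]), weight))
        else:
            # Final-state line: state [weight]
            finals[int(fields[0])] = float(fields[1]) if len(fields) > 1 else 0.0
    return arcs, finals

# The L.fst listing from above, with tabs replaced by spaces.
L_FST_TEXT = """\
0 1 0 0 0.693147182
0 1 1 0 0.693147182
1 1 1 1
1 1 3 2 0.693147182
1 2 3 2 0.693147182
1 1 2 3 0.693147182
1 2 2 3 0.693147182
1
2 1 1 0
"""

arcs, finals = parse_fstprint(L_FST_TEXT)
print(len(arcs), "arcs; final states:", sorted(finals))
```

The same function should also read the G.fst and L_disambig.fst listings, since they use the same layout.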

../../../tools/openfst-1.6.5/bin/fstdraw ./data/lang_test_tg/L.fst | dot -T ps > 1.ps

The L_disambig.fst file

$ ../../../tools/openfst-1.6.5/bin/fstprint ./data/lang_test_tg/L_disambig.fst
0	1	0	0	0.693147182
0	2	1	0	0.693147182
1	1	1	1
1	1	3	2	0.693147182
1	3	3	2	0.693147182
1	1	2	3	0.693147182
1	3	2	3	0.693147182
1	1	4	4
1
2	1	5	0
3	2	1	0

timit

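I could not get hold of the full TIMIT corpus, so I gave up on this one.

How it works

A brief note on how the yesno example works.

Raw audio

There are 60 wav files in total, for example waves_yesno/0_0_0_0_1_1_1_1.wav. Each recording is a sequence of the words yes and no.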
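
The file name doubles as the label for the recording. Assuming the convention described in the corpus README, where each digit stands for one spoken word with 1 for yes and 0 for no, a transcript can be recovered from the name alone. The sketch below illustrates that mapping. The function name and the upper-case word forms are my own choices, not taken from the recipe scripts.

```python
import os

# Assumed convention: 1 means "yes", 0 means "no".
WORDS = {"0": "NO", "1": "YES"}

def transcript_from_filename(path):
    """Turn a name like 0_0_0_0_1_1_1_1.wav into a space-separated transcript."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return " ".join(WORDS[digit] for digit in stem.split("_"))

transcript = transcript_from_filename("waves_yesno/0_0_0_0_1_1_1_1.wav")
print(transcript)
# prints: NO NO NO NO YES YES YES YES
```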
