OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
AI & ML interests
voice-conversion, speech-separation, speech-enhancement, speech-translation, speech-synthesis, speech-recognition, spoken-language-understanding
Organization Card
ESPnet: end-to-end speech processing toolkit
ESPnet is an end-to-end speech processing toolkit covering end-to-end speech recognition, text-to-speech, speech translation, speech enhancement, speaker diarization, spoken language understanding, and other tasks. ESPnet uses PyTorch as its deep learning engine and follows Kaldi-style data processing, feature extraction and formats, and recipes, providing a complete setup for a wide range of speech processing experiments.
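Kaldi-style data preparation, mentioned above, conventionally stores each corpus as a directory of plain-text tables keyed by utterance ID, for example `wav.scp` (utterance ID to audio path), `text` (utterance ID to transcript), and `utt2spk` (utterance ID to speaker). The following minimal Python sketch assumes exactly those three files with one whitespace-separated `key value` entry per line and joins them into per-utterance records. It only illustrates the file format; it is not ESPnet's own data loader, and `read_kaldi_table` and `load_kaldi_data_dir` are hypothetical helper names.

```python
import tempfile
from pathlib import Path
from typing import Dict


def read_kaldi_table(path: Path) -> Dict[str, str]:
    """Read a Kaldi-style 'key value' table (hypothetical helper).

    The first whitespace-separated token is the key; the rest of the
    line, with internal spacing preserved, is the value.
    """
    table: Dict[str, str] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split(maxsplit=1)
            table[parts[0]] = parts[1] if len(parts) > 1 else ""
    return table


def load_kaldi_data_dir(data_dir: str) -> Dict[str, Dict[str, str]]:
    """Join wav.scp, text, and utt2spk on the utterance ID (hypothetical helper).

    Assumes wav.scp lists every utterance; missing transcripts or
    speakers are filled with an empty string.
    """
    root = Path(data_dir)
    wavs = read_kaldi_table(root / "wav.scp")
    texts = read_kaldi_table(root / "text")
    spks = read_kaldi_table(root / "utt2spk")
    records: Dict[str, Dict[str, str]] = {}
    for utt_id in sorted(wavs):  # Kaldi tables are kept sorted by key
        records[utt_id] = {
            "wav": wavs[utt_id],
            "text": texts.get(utt_id, ""),
            "speaker": spks.get(utt_id, ""),
        }
    return records


if __name__ == "__main__":
    # Build a tiny throwaway data directory and print the joined records.
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "wav.scp").write_text("utt1 /data/utt1.wav\nutt2 /data/utt2.wav\n", encoding="utf-8")
        (d / "text").write_text("utt1 HELLO WORLD\nutt2 GOOD MORNING\n", encoding="utf-8")
        (d / "utt2spk").write_text("utt1 spk1\nutt2 spk2\n", encoding="utf-8")
        for utt, rec in load_kaldi_data_dir(tmp).items():
            print(utt, rec)
```

Keeping each table as a flat `key value` text file is what makes Kaldi-style directories easy to inspect, sort, and subset with ordinary shell tools.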
Citing ESPnet
@inproceedings{watanabe2018espnet,
author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
title={{ESPnet}: End-to-End Speech Processing Toolkit},
year={2018},
booktitle={Proceedings of Interspeech},
pages={2207--2211},
doi={10.21437/Interspeech.2018-1456},
url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
title={{ESPnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={7654--7658},
year={2020},
organization={IEEE}
}
@inproceedings{inaguma-etal-2020-espnet,
title={{ESPnet-ST}: All-in-One Speech Translation Toolkit},
author={Inaguma, Hirofumi and Kiyono, Shun and Duh, Kevin and Karita, Shigeki and Yalta, Nelson and Hayashi, Tomoki and Watanabe, Shinji},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
month=jul,
year={2020},
address={Online},
publisher={Association for Computational Linguistics},
url={https://www.aclweb.org/anthology/2020.acl-demos.34},
pages={302--311}
}
@inproceedings{li2020espnet,
title={{ESPnet-SE}: End-to-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
author={Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph Boeddeker and Zhuo Chen and Shinji Watanabe},
booktitle={Proceedings of IEEE Spoken Language Technology Workshop (SLT)},
pages={785--792},
year={2021},
organization={IEEE},
}
@article{arora2021espnet,
title={ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet},
author={Arora, Siddhant and Dalmia, Siddharth and Denisov, Pavel and Chang, Xuankai and Ueda, Yushi and Peng, Yifan and Zhang, Yuekai and Kumar, Sujay and Ganesan, Karthik and Yan, Brian and others},
journal={arXiv preprint arXiv:2111.14706},
year={2021}
}
</details>
Spaces (8)
TheESPnetLeaderBoard 🥇: ESPnet Leaderboard
Voice Assistant Demo 📊
SingingSDS 🎶: Generate text with a customizable interface
OWSM V4 Demo 🌍: This is a demo for OWSM-V4 CTC and medium model.
Svs 💻: Generate singing voice from lyrics, duration, and pitch
TTS 🌖: Greet someone by name
Models (653)
espnet/powsm_ctc (Automatic Speech Recognition)
espnet/powsm (Automatic Speech Recognition)
espnet/xun_tal_zh_adult_teach_branchformer (Automatic Speech Recognition)
espnet/xeus_ckpts
espnet/mixdata_svs_visinger2_spkemb_lang_pretrained (Text-to-Audio)
espnet/aceopencpop_svs_visinger2_40singer_pretrain (Text-to-Audio)
espnet/visinger2-zh-jp-multisinger-svs (Text-to-Audio)
espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg (Text-to-Audio)
espnet/kosp2e-asr-ko (Automatic Speech Recognition)
espnet/OpenBEATS-Large-NsynthPitch
Datasets (24)
espnet/data_part2
espnet/v2_data_jc
espnet/v2_data
espnet/librispeech_arkive
espnet/yodas_owsmv4
espnet/yodas-granary
espnet/kising_score_segments
espnet/yodas2
espnet/DSUChallenge2024
espnet/mms_ulab_v2