	---
	language:
	- nl
	pipeline_tag: automatic-speech-recognition
	license: cc-by-nc-4.0
	---

	# Model

	This repository contains the third version of our Automatic Speech Recognition (ASR) and subtitle generation model for Flemish Dutch.
	Unlike the [second version](https://huggingface.co/nelfproject/ASR_subtitles_v2), this model is implemented entirely in PyTorch and does not depend on Kaldi features, which simplifies deployment and fine-tuning.

	The model was trained on 300 hours of verbatim-annotated Flemish data from CGN (with 3-fold noise augmentation), 700 hours of Netherlands Dutch data from CGN, and 14,000 hours of weakly supervised, subtitled Flemish broadcast media data.
	Additionally, we enriched the training data with contextualised verbatim pseudo-labels, generated conditioned on the subtitles, to improve rare-word recognition in verbatim transcripts.

	The model can generate both an exact verbatim transcription with annotation tags and a fully formatted, cleaned-up subtitle transcription. Each of the two outputs is produced by its own decoder.
	The model has 180M parameters and requires 2-6 GB of GPU memory for inference.
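
As a rough sanity check on the memory figure, the sketch below estimates how much of it is taken up by the weights alone. The numeric precision is an assumption (this card does not state whether inference runs in 32-bit or 16-bit floats), and `weight_memory_gb` is a hypothetical helper, not part of the released code.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to store the weights, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 180e6  # parameter count stated in this model card

# Assumption: 4 bytes per parameter (fp32) or 2 bytes per parameter (fp16).
fp32_gb = weight_memory_gb(NUM_PARAMS, 4)
fp16_gb = weight_memory_gb(NUM_PARAMS, 2)

print(f"fp32 weights: {fp32_gb:.2f} GB")
print(f"fp16 weights: {fp16_gb:.2f} GB")
```

Under these assumptions the weights account for well under 1 GB, so most of the 2-6 GB range presumably comes from activations, decoding state, and framework overhead, which vary with the length of the audio being processed.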

	Version: August 2025

	# Usage

	This repository hosts only the pre-trained model and its configuration files. To download it, follow the Hugging Face instructions for downloading model repositories.
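
One way to fetch the files is with the `huggingface_hub` Python package. The following is a minimal sketch, assuming the package is installed (`pip install huggingface_hub`) and that the repository id is `nelfproject/NeLF_S2T_Pytorch`. The helper name `download_model` and the default target directory are illustrative choices, not part of the released code.

```python
from pathlib import Path

# Assumed repository id on the Hugging Face Hub.
REPO_ID = "nelfproject/NeLF_S2T_Pytorch"


def download_model(local_dir: str = "NeLF_S2T_Pytorch") -> Path:
    """Download all files of the model repository and return the local path."""
    # Imported here so that this module can be loaded even when
    # huggingface_hub is not installed yet.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repository and
    # returns the directory that contains them.
    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

Calling `download_model()` places the model and configuration files in a local `NeLF_S2T_Pytorch` directory. Running inference on them still requires the code from the GitHub repository linked in this section.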

	Usage instructions and an example test file for running inference, fully integrated with a voice activity detection (VAD) pipeline, can be found on [GitHub](https://github.com/nelfproject/NeLF_Speech2Text_Pytorch).
	The pipeline uses [Silero VAD](https://github.com/snakers4/silero-vad), which is included in the model files.

	The model is released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license.

	# Citation

	If you use this model, please cite the research paper:
	```bibtex
	@article{poncelet2024,
	  author  = {Poncelet, Jakob and Van hamme, Hugo},
	  title   = {Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling},
	  year    = {2024},
	  journal = {arXiv preprint arXiv:2502.03212},
	  url     = {https://arxiv.org/abs/2502.03212}
	}
	```

	# Contact
	Jakob Poncelet: [email protected]