Shuting Ge1,2, Jin Ren2,3,*, Yihua Shi4, Yujun Zhang1, Shunzhi Yang2, Jinfeng Yang2
CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 3215-3245, 2024, DOI:10.32604/cmc.2023.046746
- 26 March 2024
Abstract In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances due to the longer speech sequences than text, especially for the extended ATCC data. To address these issues,… More >