Title:
METHOD FOR DIRECTLY SYNTHESIZING SPEECH FROM TONGUE ULTRASONIC IMAGES
Document Type and Number:
WIPO Patent Application WO/2024/087337
Kind Code:
A1
Abstract:
Disclosed is a method for directly synthesizing speech from tongue ultrasonic images. Using cross-modal deep learning, acoustic features are extracted from tongue ultrasonic video data, Mel-frequency spectra are obtained from those acoustic features, and the spectra are finally decoded into speech, implementing end-to-end speech synthesis. The cross-modal learning model is based on an encoder-decoder framework and establishes a mapping between the temporal image sequences derived from the tongue ultrasonic videos and the features of the Mel-frequency spectra. The speech synthesis method is less susceptible to environmental interference, offers better confidentiality, and yields higher-quality reconstructed speech.
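
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented model: the per-frame tanh encoder, the linear decoder, the random untrained weights, the frame size, the 80 Mel bins, and all function names are assumptions chosen only to show how a sequence of ultrasound frames maps to one Mel-frequency spectrum frame per image. The final step, decoding the spectra into a waveform, is omitted.

```python
import math
import random


def linear(vec, weights):
    """Multiply a vector (length n) by a weight matrix (n rows, m columns)."""
    cols = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(cols)]


def encode_frame(frame, w_enc):
    """Hypothetical encoder: flatten one ultrasound frame, project, apply tanh."""
    flat = [px for row in frame for px in row]
    return [math.tanh(x) for x in linear(flat, w_enc)]


def synthesize_mel(frames, w_enc, w_dec):
    """Hypothetical decoder: map each encoded frame to one Mel-spectrum frame."""
    features = [encode_frame(f, w_enc) for f in frames]
    return [linear(feat, w_dec) for feat in features]


def random_matrix(rng, rows, cols, scale=0.01):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, H, W = 4, 8, 8      # assumed: 4 frames of 8x8 pixels
D, N_MELS = 16, 80     # assumed: 16 latent features, 80 Mel bins

frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
w_enc = random_matrix(rng, H * W, D)
w_dec = random_matrix(rng, D, N_MELS)

mel = synthesize_mel(frames, w_enc, w_dec)
print(len(mel), len(mel[0]))  # prints "4 80"
```

In a trained system the weights would be learned from paired ultrasound video and speech recordings, and a separate decoding stage would turn the predicted spectra into audio.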

Inventors:
GUO SHIFENG (CN)
REN WEIMIN (CN)
FENG WEI (CN)
LI YEHAI (CN)
YI ZHENGKUN (CN)
GAO FEI (CN)
TIAN QIONG (CN)
Application Number:
PCT/CN2022/138196
Publication Date:
May 02, 2024
Filing Date:
December 09, 2022
Assignee:
SHENZHEN INST ADV TECH (CN)
International Classes:
G10L15/25
Foreign References:
CN114974206A    2022-08-30
CN110097610A    2019-08-06
CN113111812A    2021-07-13
CN110428812A    2019-11-08
CN111883107A    2020-11-03
CN112381040A    2021-02-19
CN114329036A    2022-04-12
JP2007018006A   2007-01-25
Attorney, Agent or Firm:
BEIJING ZHONG XUN TONG DA INTELLECTUAL PROPERTY AGENCY CO., LTD. (CN)