METHOD FOR DIRECTLY SYNTHESIZING SPEECH FROM TONGUE ULTRASONIC IMAGES

Document Type and Number:

WIPO Patent Application WO/2024/087337

Kind Code:

A1

Abstract:

A method for directly synthesizing speech from tongue ultrasonic images. Using cross-modal deep learning, acoustic features are obtained from tongue ultrasonic video data, Mel frequency spectra are derived from those acoustic features, and the spectra are then decoded into speech, so that end-to-end speech synthesis is implemented. The cross-modal learning model is based on an "encoder-decoder" framework and establishes a mapping between the temporal image sequences obtained from tongue ultrasonic videos and the features of the Mel frequency spectra. The speech synthesis method is less susceptible to environmental interference, offers better confidentiality, and yields higher-quality reconstructed speech.
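The data flow described in the abstract can be illustrated with a minimal shape-level sketch. It is not the patented model. The random weights stand in for trained parameters, and the names `encode`, `decode`, `synthesize`, `N_MELS`, and `FEATURE_DIM`, as well as the sizes chosen, are assumptions made for illustration. The final step of decoding the spectra into a waveform is omitted because the abstract does not say how it is performed.

```python
import math
import random

N_MELS = 80        # assumed number of Mel bands (not stated in the abstract)
FEATURE_DIM = 16   # assumed size of the per-frame acoustic feature vector


def make_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (illustrative only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(frames, weights):
    """Encoder stand-in: flatten each ultrasonic frame and project it to a feature vector."""
    features = []
    for frame in frames:
        flat = [pixel for row in frame for pixel in row]
        features.append([math.tanh(v) for v in matvec(weights, flat)])
    return features


def decode(features, weights):
    """Decoder stand-in: map each feature vector to one Mel-spectrum frame."""
    return [matvec(weights, f) for f in features]


def synthesize(frames, seed=0):
    """Run image sequence -> acoustic features -> Mel frames (vocoder step omitted)."""
    rng = random.Random(seed)
    height, width = len(frames[0]), len(frames[0][0])
    enc_w = make_matrix(FEATURE_DIM, height * width, rng)
    dec_w = make_matrix(N_MELS, FEATURE_DIM, rng)
    features = encode(frames, enc_w)
    mel = decode(features, dec_w)
    return features, mel


# Five dummy 8x8 frames standing in for a tongue ultrasonic video clip.
frames = [[[0.0] * 8 for _ in range(8)] for _ in range(5)]
features, mel = synthesize(frames)
print(len(mel), len(mel[0]))  # number of Mel frames, number of Mel bands
```

In this sketch each image frame yields exactly one Mel frame, which is the simplest possible temporal alignment. A real system would have to reconcile the video frame rate with the spectrogram frame rate, and the abstract does not describe how that is handled.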

Inventors:

GUO SHIFENG (CN)
REN WEIMIN (CN)
FENG WEI (CN)
LI YEHAI (CN)
YI ZHENGKUN (CN)
GAO FEI (CN)
TIAN QIONG (CN)

Application Number:

PCT/CN2022/138196

Publication Date:

May 02, 2024

Filing Date:

December 09, 2022

Assignee:

SHENZHEN INST ADV TECH (CN)

International Classes:

G10L15/25

Foreign References:

CN114974206A	2022-08-30
CN110097610A	2019-08-06
CN113111812A	2021-07-13
CN110428812A	2019-11-08
CN111883107A	2020-11-03
CN112381040A	2021-02-19
CN114329036A	2022-04-12
JP2007018006A	2007-01-25

Attorney, Agent or Firm:

BEIJING ZHONG XUN TONG DA INTELLECTUAL PROPERTY AGENCY CO., LTD. (CN)
