


Title:
SPEECH ENHANCEMENT MODEL TRAINING METHOD AND APPARATUS, ENHANCEMENT METHOD, ELECTRONIC DEVICE, STORAGE MEDIUM, AND PROGRAM PRODUCT
Document Type and Number:
WIPO Patent Application WO/2024/027295
Kind Code:
A1
Abstract:
A speech enhancement model training method and apparatus, an enhancement method, an electronic device, a storage medium, and a program product. The speech enhancement model training method comprises: invoking a speech enhancement model on the basis of a noisy speech feature of a noisy speech signal to obtain a plurality of first predicted mask values in an auditory domain (101); obtaining a first amplitude and a first phase corresponding to each frequency of the noisy speech signal, and a second amplitude and a second phase corresponding to each frequency of a pure speech signal (102); determining a phase difference at each frequency on the basis of the first phase and the second phase corresponding to that frequency, and correcting the corresponding second amplitude on the basis of that phase difference to obtain a corrected second amplitude for each frequency (103); determining a loss value on the basis of the plurality of first predicted mask values, the first amplitude corresponding to each frequency, and the corrected second amplitude (104); and updating parameters of the speech enhancement model on the basis of the loss value (105).
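The abstract does not state how the phase difference corrects the pure-speech amplitude, how the auditory-domain masks map to frequencies, or which loss is used. The following is a minimal sketch of steps 102 to 104 under three loudly stated assumptions: the correction is the phase-sensitive projection |S|·cos(θ_S − θ_Y) described in the Erdogan et al. (ICASSP 2015) paper listed under Other References; the loss is mean squared error; and `band_to_bin_matrix` is a hypothetical equal-width band layout standing in for a real Bark/ERB/Mel auditory scale. The function names are illustrative only and are not taken from the patent.

```python
import numpy as np


def band_to_bin_matrix(num_bands: int, num_bins: int) -> np.ndarray:
    """Hypothetical auditory-domain layout: split the frequency bins into
    contiguous, roughly equal-width bands. A real system would use Bark,
    ERB or Mel band edges instead."""
    mapping = np.zeros((num_bands, num_bins))
    for band, bins in enumerate(np.array_split(np.arange(num_bins), num_bands)):
        mapping[band, bins] = 1.0  # every bin inherits the mask of its band
    return mapping


def phase_corrected_mask_loss(band_masks: np.ndarray,
                              noisy_spec: np.ndarray,
                              clean_spec: np.ndarray,
                              mapping: np.ndarray) -> float:
    """band_masks: (T, B) predicted mask values in the auditory domain (step 101).
    noisy_spec, clean_spec: (T, F) complex STFTs of the noisy and pure speech.
    mapping: (B, F) band-to-bin matrix."""
    # Step 102: first amplitude/phase (noisy) and second amplitude/phase (pure).
    noisy_amp, noisy_phase = np.abs(noisy_spec), np.angle(noisy_spec)
    clean_amp, clean_phase = np.abs(clean_spec), np.angle(clean_spec)

    # Step 103 (assumed form): project the pure amplitude onto the noisy phase,
    # as in the phase-sensitive target. The result can be negative.
    phase_diff = clean_phase - noisy_phase
    corrected_clean_amp = clean_amp * np.cos(phase_diff)

    # Step 104 (assumed MSE): expand band masks to bins, apply them to the
    # noisy amplitude and compare with the corrected pure amplitude.
    bin_masks = band_masks @ mapping  # (T, B) @ (B, F) -> (T, F)
    estimate = bin_masks * noisy_amp
    return float(np.mean((estimate - corrected_clean_amp) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F, B = 4, 8, 3
    noisy = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
    mapping = band_to_bin_matrix(B, F)
    masks = np.full((T, B), 0.5)
    # Pure speech with the same phase: a 0.5 mask matches the target exactly.
    print(phase_corrected_mask_loss(masks, noisy, 0.5 * noisy, mapping))
    # Pure speech with the opposite phase: the corrected target flips sign.
    print(phase_corrected_mask_loss(masks, noisy, -0.5 * noisy, mapping))
```

With this correction, bins where the pure speech is out of phase with the noisy signal receive a smaller (possibly negative) target, so a magnitude-only mask is not rewarded for restoring energy that the noisy phase cannot carry. Step 105 would backpropagate such a loss through the model using an autograd framework and an optimizer; that part is not sketched here.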

Inventors:
FANG XUEFEI (CN)
YANG DONG (CN)
CAO MUYONG (CN)
Application Number:
PCT/CN2023/096246
Publication Date:
February 08, 2024
Filing Date:
May 25, 2023
Assignee:
TENCENT TECH SHENZHEN CO LTD (CN)
International Classes:
G10L25/30; G06N3/04; G06N3/08; G10L21/0208; G10L21/0316
Domestic Patent References:
WO2022012195A1 (2022-01-20)
Foreign References:
CN114974299A (2022-08-30)
CN102169694A (2011-08-31)
CN112700786A (2021-04-23)
CN113436643A (2021-09-24)
US20210201928A1 (2021-07-01)
CN202210917051A (2022-08-01)
Other References:
LI ZHENG, LI HONGYAN: "Two-stage speech enhancement algorithm based on time-frequency mask optimization", ELECTRONIC DESIGN ENGINEERING, SHIJIE ZHISHI CHUBANSHE, CN, vol. 30, no. 4, 20 February 2022 (2022-02-20), pages 17 - 21, XP093136271, ISSN: 1674-6236, DOI: 10.14022/j.issn1674-6236.2022.04.004
HAKAN ERDOGAN, JOHN R. HERSHEY, SHINJI WATANABE, JONATHAN LE ROUX: "Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks", 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 19 April 2015 (2015-04-19) - 24 April 2015 (2015-04-24), pages 708 - 712, XP055315025, ISBN: 978-1-4673-6997-8, DOI: 10.1109/ICASSP.2015.7178061
JAMAL NOREZMI, FUAD N., SHA’BANI MNAH, HELMY ABD WAHAB MOHD, ZULKARNAIN SYED IDRUS SYED: "Binary Time-Frequency Mask for Improved Malay Speech Intelligibility at Low SNR Condition", IOP CONFERENCE SERIES: MATERIALS SCIENCE AND ENGINEERING, INSTITUTE OF PHYSICS PUBLISHING LTD., GB, vol. 917, no. 1, 1 September 2020 (2020-09-01), pages 012049, XP093136268, ISSN: 1757-8981, DOI: 10.1088/1757-899X/917/1/012049
Attorney, Agent or Firm:
CHINA PAT INTELLECTUAL PROPERTY OFFICE (CN)