SPEECH ENHANCEMENT MODEL TRAINING METHOD AND APPARATUS, ENHANCEMENT METHOD, ELECTRONIC DEVICE, STORAGE MEDIUM, AND PROGRAM PRODUCT

Document Type and Number:

WIPO Patent Application WO/2024/027295

Kind Code:

A1

Abstract:

A speech enhancement model training method and apparatus, an enhancement method, an electronic device, a storage medium, and a program product. The speech enhancement model training method comprises: calling a speech enhancement model to process a noisy speech feature of a noisy speech signal, obtaining a plurality of first predicted mask values in an auditory domain (101); obtaining a first amplitude and a first phase corresponding to each frequency of the noisy speech signal, and a second amplitude and a second phase corresponding to each frequency of a pure speech signal (102); determining a phase difference at each frequency on the basis of the first phase and the second phase corresponding to that frequency, and correcting the corresponding second amplitude on the basis of the phase difference to obtain a corrected second amplitude corresponding to each frequency (103); determining a loss value on the basis of the plurality of first predicted mask values, the first amplitude corresponding to each frequency, and the corrected second amplitude corresponding to each frequency (104); and updating parameters of the speech enhancement model on the basis of the loss value (105).
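
Steps (102) to (104) can be illustrated with a minimal sketch. The abstract does not give the correction formula, the mapping from auditory-domain bands to frequencies, or the form of the loss, so the following are assumptions made only for illustration: the second amplitude is scaled by the cosine of the phase difference (in the spirit of phase-sensitive masks), each band mask is copied to the frequency bins the band covers, and the loss is a mean squared error. The function names and the `band_edges` parameter are hypothetical.

```python
import cmath
import math


def corrected_amplitudes(noisy_spec, clean_spec):
    """Step (103), assumed form: |S| * cos(phase(S) - phase(Y)) per frequency."""
    corrected = []
    for y, s in zip(noisy_spec, clean_spec):
        phase_diff = cmath.phase(s) - cmath.phase(y)  # second phase minus first phase
        corrected.append(abs(s) * math.cos(phase_diff))
    return corrected


def expand_band_masks(band_masks, band_edges, n_bins):
    """Assumed mapping: band b covers bins band_edges[b] .. band_edges[b + 1] - 1."""
    bin_masks = [0.0] * n_bins
    for b, mask in enumerate(band_masks):
        for k in range(band_edges[b], band_edges[b + 1]):
            bin_masks[k] = mask
    return bin_masks


def loss_value(band_masks, band_edges, noisy_spec, clean_spec):
    """Step (104), assumed form: MSE between mask * |Y| and the corrected |S|."""
    target = corrected_amplitudes(noisy_spec, clean_spec)
    masks = expand_band_masks(band_masks, band_edges, len(noisy_spec))
    errors = [
        (m * abs(y) - t) ** 2
        for m, y, t in zip(masks, noisy_spec, target)
    ]
    return sum(errors) / len(errors)


# One frame with two frequency bins, each bin forming its own band.
noisy = [2 + 0j, 0 + 2j]
clean = [1 + 0j, 1 + 0j]
print(loss_value([0.5, 0.0], [0, 1, 2], noisy, clean))
```

In this toy frame the second bin's pure-speech component is 90 degrees out of phase with the noisy signal, so under the cosine assumption its corrected amplitude is close to zero and a mask of 0.0 is the best prediction for that band. In an actual training loop, the loss would be computed in an automatic-differentiation framework so that step (105) can update the model parameters.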

Inventors:

FANG XUEFEI (CN)
YANG DONG (CN)
CAO MUYONG (CN)

Application Number:

PCT/CN2023/096246

Publication Date:

February 08, 2024

Filing Date:

May 25, 2023

Assignee:

TENCENT TECH SHENZHEN CO LTD (CN)

International Classes:

G10L25/30; G06N3/04; G06N3/08; G10L21/0208; G10L21/0316

Domestic Patent References:

WO2022012195A1	2022-01-20

Foreign References:

CN114974299A	2022-08-30
CN102169694A	2011-08-31
CN112700786A	2021-04-23
CN113436643A	2021-09-24
US20210201928A1	2021-07-01
CN202210917051A	2022-08-01

Other References:

LI ZHENG, LI HONGYAN: "Two-stage speech enhancement algorithm based on time-frequency mask optimization", ELECTRONIC DESIGN ENGINEERING, SHIJIE ZHISHI CHUBANSHE, CN, vol. 30, no. 4, 20 February 2022 (2022-02-20), pages 17 - 21, XP093136271, ISSN: 1674-6236, DOI: 10.14022/j.issn1674-6236.2022.04.004
HAKAN ERDOGAN, JOHN R. HERSHEY, SHINJI WATANABE, JONATHAN LE ROUX: "Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks", 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 1 April 2015 (2015-04-01) - 24 April 2015 (2015-04-24), pages 708 - 712, XP055315025, ISBN: 978-1-4673-6997-8, DOI: 10.1109/ICASSP.2015.7178061
JAMAL NOREZMI, FUAD N., SHA’BANI MNAH, HELMY ABD WAHAB MOHD, ZULKARNAIN SYED IDRUS SYED: "Binary Time-Frequency Mask for Improved Malay Speech Intelligibility at Low SNR Condition", IOP CONFERENCE SERIES: MATERIALS SCIENCE AND ENGINEERING, INSTITUTE OF PHYSICS PUBLISHING LTD., GB, vol. 917, no. 1, 1 September 2020 (2020-09-01), pages 012049, XP093136268, ISSN: 1757-8981, DOI: 10.1088/1757-899X/917/1/012049

Attorney, Agent or Firm:

CHINA PAT INTELLECTUAL PROPERTY OFFICE (CN)
