Title:
STREAMING LONG-FORM SPEECH RECOGNITION
Document Type and Number:
WIPO Patent Application WO/2024/082167
Kind Code:
A1
Abstract:
Systems and methods are provided for accessing a factorized neural transducer comprising a first set of layers for predicting blank tokens and a second set of layers for predicting vocabulary tokens. The first set of layers comprises a blank predictor, an encoder, and a joint network, and the second set of layers comprises a vocabulary predictor that is separate from the blank predictor. A context encoder is added to the factorized neural transducer to encode long-form transcription history and generate a long-form context embedding, such that the factorized neural transducer is further configured to perform long-form automatic speech recognition, at least in part, by using the long-form context embedding to augment the prediction of vocabulary tokens.
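The abstract describes the architecture only at block level. The following is a minimal, non-authoritative Python sketch of how a single scoring step of such a model might combine its parts. It rests on three assumptions that are not stated in the abstract: (a) a combination rule loosely modeled on the cited Factorized Neural Transducer paper, in which the blank logit comes from the joint network and the vocabulary scores add the vocabulary predictor's log-probabilities to an encoder projection; (b) additive fusion of the projected context embedding into the vocabulary predictor state; and (c) a mean-pooling context encoder used purely as a placeholder. The names `FactorizedTransducerStep`, `context_encoder`, `log_probs`, and all dimensions and random weights are hypothetical; a real system would use trained neural layers.

```python
import math
import random

def log_softmax(xs):
    # Numerically stable log-softmax over a list of floats.
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

def matvec(W, x):
    # W is a list of rows (out_dim x in_dim); returns W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    # Random stand-in weights; a real model would load trained parameters.
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def context_encoder(history_ids, emb_table):
    # Hypothetical placeholder: mean-pool embeddings of the transcription history.
    dim = len(emb_table[0])
    if not history_ids:
        return [0.0] * dim
    return [sum(emb_table[i][d] for i in history_ids) / len(history_ids) for d in range(dim)]

class FactorizedTransducerStep:
    """Toy single-(t, u) scoring step; every weight and size here is hypothetical."""

    def __init__(self, enc_dim, pred_dim, ctx_dim, vocab_size, seed=0):
        rng = random.Random(seed)
        # Blank branch: joint network mixes encoder frame and blank-predictor state.
        self.joint_enc = rand_matrix(pred_dim, enc_dim, rng)
        self.blank_out = rand_matrix(1, pred_dim, rng)
        # Vocabulary branch: encoder projection and vocabulary-predictor output layer.
        self.enc_vocab = rand_matrix(vocab_size, enc_dim, rng)
        self.vocab_out = rand_matrix(vocab_size, pred_dim, rng)
        # Projection of the long-form context embedding into predictor space (assumed).
        self.ctx_proj = rand_matrix(pred_dim, ctx_dim, rng)

    def log_probs(self, enc_t, blank_state, vocab_state, ctx_emb):
        # Blank logit: joint network over the encoder frame and blank-predictor state.
        joint_in = matvec(self.joint_enc, enc_t)
        hidden = [math.tanh(a + b) for a, b in zip(joint_in, blank_state)]
        blank_logit = matvec(self.blank_out, hidden)[0]
        # Augment the vocabulary predictor with the context embedding (additive fusion assumed).
        ctx = matvec(self.ctx_proj, ctx_emb)
        aug_state = [v + c for v, c in zip(vocab_state, ctx)]
        lm_logp = log_softmax(matvec(self.vocab_out, aug_state))
        # Vocabulary scores: encoder projection plus predictor log-probabilities.
        vocab_logits = [e + l for e, l in zip(matvec(self.enc_vocab, enc_t), lm_logp)]
        # Normalize jointly over [blank] + vocabulary tokens.
        return log_softmax([blank_logit] + vocab_logits)

# Usage: score one (t, u) position with and without transcription history.
rng = random.Random(1)
model = FactorizedTransducerStep(enc_dim=4, pred_dim=3, ctx_dim=3, vocab_size=5)
emb_table = rand_matrix(5, 3, rng)  # hypothetical token embeddings for the history
enc_t = [rng.gauss(0, 1) for _ in range(4)]
blank_state = [rng.gauss(0, 1) for _ in range(3)]
vocab_state = [rng.gauss(0, 1) for _ in range(3)]
no_ctx = model.log_probs(enc_t, blank_state, vocab_state, context_encoder([], emb_table))
with_ctx = model.log_probs(enc_t, blank_state, vocab_state, context_encoder([0, 2, 2, 4], emb_table))
print(len(with_ctx), round(sum(math.exp(x) for x in with_ctx), 6))  # prints "6 1.0"
```

Because the context embedding enters only the vocabulary branch in this sketch, the blank logit itself does not depend on the transcription history; the history shifts the vocabulary scores, and the balance between blank and vocabulary tokens changes only through the joint normalization.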
Inventors:
WU YU (US)
LI JINYU (US)
LIU SHUJIE (US)
GONG XUN (US)
Application Number:
PCT/CN2022/126111
Publication Date:
April 25, 2024
Filing Date:
October 19, 2022
Assignee:
MICROSOFT TECHNOLOGY LICENSING LLC (US)
WU YU (CN)
LI JINYU (US)
LIU SHUJIE (CN)
GONG XUN (CN)
International Classes:
G10L15/16
Foreign References:
US20220122586A1 | 2022-04-21
Other References:
CHEN XIE ET AL: "Factorized Neural Transducer for Efficient Language Model Adaptation", ICASSP 2022 - 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 23 May 2022 (2022-05-23), pages 8132 - 8136, XP034156754, DOI: 10.1109/ICASSP43922.2022.9746908
Attorney, Agent or Firm:
SHANGHAI PATENT & TRADEMARK LAW OFFICE, LLC (CN)