STREAMING LONG-FORM SPEECH RECOGNITION - MICROSOFT TECHNOLOGY LICENSING LLC

Title:

STREAMING LONG-FORM SPEECH RECOGNITION

Document Type and Number:

WIPO Patent Application WO/2024/082167

Kind Code:

A1

Abstract:

Systems and methods are provided for accessing a factorized neural transducer comprising a first set of layers for predicting blank tokens and a second set of layers for predicting vocabulary tokens. The first set of layers comprises a blank predictor, an encoder, and a joint network, and the second set of layers comprises a vocabulary predictor that is separate from the blank predictor. A context encoder added to the factorized neural transducer encodes long-form transcription history to generate a long-form context embedding, such that the factorized neural transducer is further configured to perform long-form automatic speech recognition, at least in part, by using the long-form context embedding to augment a prediction of vocabulary tokens.
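
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the layer sizes, the random weights, the mean-pooled embeddings, every name (step, mean_embed, W_CONTEXT, and so on), and in particular the choice to add the context embedding to the vocabulary predictor's hidden state are assumptions made only for illustration. The abstract states that the context embedding augments vocabulary prediction but does not say how.

```python
import math
import random

random.seed(0)
DIM, VOCAB = 4, 5  # toy sizes, chosen only for illustration


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def log_softmax(z):
    m = max(z)
    lse = m + math.log(sum(math.exp(x - m) for x in z))
    return [x - lse for x in z]


def mean_embed(token_ids, table):
    """Average the embeddings of token_ids; a zero vector if there are none."""
    if not token_ids:
        return [0.0] * DIM
    vecs = [table[t] for t in token_ids]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Hypothetical parameters; a real system would use trained neural layers.
EMBED = rand_matrix(VOCAB, DIM)
W_ENC = rand_matrix(DIM, DIM)          # stand-in for the acoustic encoder
W_BLANK_PRED = rand_matrix(DIM, DIM)   # stand-in for the blank predictor
W_VOCAB_PRED = rand_matrix(DIM, DIM)   # stand-in for the vocabulary predictor
W_CONTEXT = rand_matrix(DIM, DIM)      # stand-in for the context encoder
W_JOINT = rand_matrix(1, DIM)          # joint network producing one blank logit
W_ENC_VOCAB = rand_matrix(VOCAB, DIM)  # projects encoder output to vocabulary
W_LM_VOCAB = rand_matrix(VOCAB, DIM)   # projects predictor output to vocabulary


def step(frame, current_tokens, history_tokens):
    """Return log-probabilities over [blank, vocab_0, ..., vocab_{VOCAB-1}]."""
    enc = matvec(W_ENC, frame)
    label_state = mean_embed(current_tokens, EMBED)

    # First set of layers: blank predictor + encoder + joint network.
    blank_hidden = matvec(W_BLANK_PRED, label_state)
    joint_in = [math.tanh(x) for x in add(enc, blank_hidden)]
    blank_logit = matvec(W_JOINT, joint_in)[0]

    # Second set of layers: a separate vocabulary predictor.
    # Assumption: the long-form context embedding is simply added to it.
    context = matvec(W_CONTEXT, mean_embed(history_tokens, EMBED))
    vocab_hidden = add(matvec(W_VOCAB_PRED, label_state), context)
    lm_log_probs = log_softmax(matvec(W_LM_VOCAB, vocab_hidden))
    vocab_logits = add(matvec(W_ENC_VOCAB, enc), lm_log_probs)

    return log_softmax([blank_logit] + vocab_logits)
```

Calling step(frame, current_tokens, history_tokens) with an empty history reduces the sketch to a plain factorized transducer step, which makes it easy to compare the output distribution with and without the transcription history.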

Inventors:

WU YU (CN)
LI JINYU (US)
LIU SHUJIE (CN)
GONG XUN (CN)

Application Number:

PCT/CN2022/126111

Publication Date:

April 25, 2024

Filing Date:

October 19, 2022

Assignee:

MICROSOFT TECHNOLOGY LICENSING LLC (US)
WU YU (CN)
LI JINYU (US)
LIU SHUJIE (CN)
GONG XUN (CN)

International Classes:

G10L15/16

Foreign References:

US20220122586A1    2022-04-21

Other References:

CHEN XIE ET AL: "Factorized Neural Transducer for Efficient Language Model Adaptation", ICASSP 2022 - 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 23 May 2022 (2022-05-23), pages 8132 - 8136, XP034156754, DOI: 10.1109/ICASSP43922.2022.9746908

Attorney, Agent or Firm:

SHANGHAI PATENT & TRADEMARK LAW OFFICE, LLC (CN)
