SPEAKING OBJECT DETECTION IN MULTI-HUMAN-MACHINE INTERACTION SCENARIO

Document Type and Number:

WIPO Patent Application WO/2024/032159

Kind Code:

A1

Abstract:

Disclosed are an apparatus and a method for speaking object detection in a multi-human-machine interaction scenario. In one example of the method, timestamped video frame data and timestamped audio frame data are collected in real time. Speech recognition, text feature extraction, audio feature extraction, and facial feature extraction are then applied to obtain corresponding information such as a text semantic feature, a human voice audio feature, and a facial feature of each person. The speaker at the current moment in a crowd can be recognized on the basis of a first multi-modal feature, obtained by fusing the facial feature and the human voice audio feature. The speaking object of that speaker can be recognized on the basis of a second multi-modal feature, obtained by fusing a scenario feature, the text semantic feature, the facial feature, and the human voice audio feature. Whether the speaking object is a robot can then be determined, which improves the performance of the robot during human-machine interaction.
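
The two-stage flow in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the method claimed in the application: the abstract does not say how features are fused or scored, so concatenation as the fusion step, the linear scoring, and every name and number below (`fuse`, `score`, `detect_speaker`, `detect_speaking_object`, the toy vectors and weights) are assumptions made for the example. Here the speaking object is taken to be the party the speaker is addressing.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse(*features: Sequence[float]) -> Vector:
    """Fuse features by concatenation (assumed; the abstract names no fusion operator)."""
    fused: Vector = []
    for feature in features:
        fused.extend(feature)
    return fused


def score(feature: Sequence[float], weights: Sequence[float]) -> float:
    """Toy linear scorer standing in for a trained model (hypothetical)."""
    if len(feature) != len(weights):
        raise ValueError("feature and weights must have the same length")
    return sum(x * w for x, w in zip(feature, weights))


def detect_speaker(faces: Dict[str, Vector], voice: Vector,
                   weights: Sequence[float]) -> str:
    """Stage 1: pick the person whose fused face + voice feature scores highest."""
    scores = {pid: score(fuse(face, voice), weights) for pid, face in faces.items()}
    return max(scores, key=scores.get)


def detect_speaking_object(scene: Vector, text: Vector, speaker_face: Vector,
                           voice: Vector,
                           candidate_weights: Dict[str, Sequence[float]]) -> str:
    """Stage 2: score each candidate on the fused scene + text + face + voice feature."""
    second_feature = fuse(scene, text, speaker_face, voice)
    scores = {label: score(second_feature, w) for label, w in candidate_weights.items()}
    return max(scores, key=scores.get)


# Toy inputs for one timestamp (all values are made up for illustration).
faces = {"person_a": [0.9, 0.1], "person_b": [0.2, 0.3]}
voice = [0.5]
speaker = detect_speaker(faces, voice, weights=[1.0, 0.0, 0.5])

target = detect_speaking_object(
    scene=[0.1],
    text=[0.8],
    speaker_face=faces[speaker],
    voice=voice,
    candidate_weights={
        "robot": [0.0, 1.0, 0.0, 0.0, 0.0],
        "person_b": [1.0, 0.0, 0.0, 0.0, 0.0],
    },
)
is_robot = target == "robot"
```

In a real system the feature vectors would come from the extractors named in the abstract, with video and audio frames aligned by their timestamps, and trained classifiers would replace the toy scorers.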

Inventors:

WANG WEN (CN)
LIN ZHEYUAN (CN)
WAN MINHONG (CN)
ZHU SHIQIANG (CN)
ZHANG CHUNLONG (CN)
LI TE (CN)

Application Number:

PCT/CN2023/101635

Publication Date:

February 15, 2024

Filing Date:

June 21, 2023

Assignee:

ZHEJIANG LAB (CN)

International Classes:

G06V40/16; G06N3/08; G06V10/80; G06V10/82; G10L17/06; H04N5/92

Foreign References:

CN115376187A	2022-11-22
CN114819110A	2022-07-29
CN107230476A	2017-10-03
CN113408385A	2021-09-17
CN114519880A	2022-05-20
CN111078010A	2020-04-28

Attorney, Agent or Firm:

BEIJING BESTIPR INTELLECTUAL PROPERTY LAW CORPORATION (CN)
