Title:
SPEAKING OBJECT DETECTION IN MULTI-HUMAN-MACHINE INTERACTION SCENARIO
Document Type and Number:
WIPO Patent Application WO/2024/032159
Kind Code:
A1
Abstract:
Disclosed are an apparatus and a method for speaking object detection in a multi-human-machine interaction scenario. In one example of the method, video frame data with a timestamp and audio frame data with a timestamp are collected in real time. Speech recognition, text feature extraction, audio feature extraction and facial feature extraction are then used to obtain a text semantic feature, a human voice audio feature and a facial feature of a person. The speaker at the current moment in a crowd can be recognized on the basis of a first multi-modal feature, obtained by fusing the facial feature of the person and the human voice audio feature. The speaking object of that speaker, that is, the party the speaker is addressing, can be recognized on the basis of a second multi-modal feature, obtained by fusing a scenario feature, the text semantic feature, the facial feature of the person and the human voice audio feature. Whether the speaking object is a robot can then be determined, so as to improve the performance of the robot during human-machine interaction.
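
The two fused features described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the features are fused or how audio and video frames are aligned, so the following assumes nearest-timestamp alignment and plain concatenation; the names `nearest_frame`, `fuse` and `build_features` are hypothetical and are not taken from the application.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# A frame is (timestamp in seconds, feature vector). Hypothetical layout.
Frame = Tuple[float, List[float]]


def nearest_frame(frames: Sequence[Frame], t: float) -> Frame:
    """Return the frame whose timestamp is closest to t.

    Assumes `frames` is non-empty and sorted by timestamp.
    """
    times = [ts for ts, _ in frames]
    i = bisect_left(times, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = frames[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda f: abs(f[0] - t))


def fuse(*features: Sequence[float]) -> List[float]:
    """Fuse feature vectors by concatenation (an assumed fusion, not the patent's)."""
    fused: List[float] = []
    for feature in features:
        fused.extend(feature)
    return fused


def build_features(
    face_frames: Sequence[Frame],
    audio_frames: Sequence[Frame],
    text_feat: Sequence[float],
    scene_feat: Sequence[float],
    t: float,
) -> Tuple[List[float], List[float]]:
    """Build the first and second multi-modal features at time t."""
    _, face = nearest_frame(face_frames, t)
    _, audio = nearest_frame(audio_frames, t)
    # First feature: face + voice, used to decide who is speaking.
    first = fuse(face, audio)
    # Second feature: scene + text + face + voice, used to decide who is addressed.
    second = fuse(scene_feat, text_feat, face, audio)
    return first, second


face_frames = [(0.0, [1.0, 2.0]), (0.04, [3.0, 4.0])]
audio_frames = [(0.0, [0.5]), (0.02, [0.6]), (0.04, [0.7])]
first, second = build_features(face_frames, audio_frames, [8.0, 7.0], [9.0], t=0.035)
```

In a real system the two vectors would presumably feed separate classifiers, one for the speaker and one for the speaking object; those models are outside what the abstract specifies and are not sketched here.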

Inventors:
WANG WEN (CN)
LIN ZHEYUAN (CN)
WAN MINHONG (CN)
ZHU SHIQIANG (CN)
ZHANG CHUNLONG (CN)
LI TE (CN)
Application Number:
PCT/CN2023/101635
Publication Date:
February 15, 2024
Filing Date:
June 21, 2023
Assignee:
ZHEJIANG LAB (CN)
International Classes:
G06V40/16; G06N3/08; G06V10/80; G06V10/82; G10L17/06; H04N5/92
Foreign References:
CN115376187A   2022-11-22
CN114819110A   2022-07-29
CN107230476A   2017-10-03
CN113408385A   2021-09-17
CN114519880A   2022-05-20
CN111078010A   2020-04-28
Attorney, Agent or Firm:
BEIJING BESTIPR INTELLECTUAL PROPERTY LAW CORPORATION (CN)