24 Feb 2022

Data-driven Spoken Language Understanding and Visual Question Answering

ABOUT EVENT

Virtual Registration: http://go.hawaii.edu/VvQ

In-Person Registration: https://forms.gle/W8haqtSeP7yC4pjw9

Seminar Abstract:

Spoken language understanding (SLU) and question answering are among the most important capabilities of today's AI assistants, such as Google Assistant, Siri, Alexa, and Samsung's Bixby. To understand complex spoken human language, an SLU system needs to learn from large-scale user data and perform multiple tasks concurrently, including domain classification, intent detection, and slot tagging. Furthermore, to handle more complex scenarios such as visual question answering (VQA), a system needs to learn from large amounts of data across multiple modalities, such as images, spoken language, and even voice signals.

In the first part of this talk, I will introduce a universal, multi-model, deep learning-based SLU system trained on large-scale data, which can perform multiple SLU tasks concurrently, including domain classification, intent detection, and slot tagging.

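To make the multi-task setup concrete, the following is a minimal PyTorch-style sketch of a shared utterance encoder with separate heads for domain classification, intent detection, and slot tagging. All module names, dimensions, and label counts are illustrative assumptions, not details of the system presented in the talk.

# Minimal multi-task SLU sketch (illustrative only, not the speaker's system).
# A shared utterance encoder feeds three task-specific heads; dimensions and
# label counts below are assumed values.
import torch
import torch.nn as nn

class MultiTaskSLU(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256,
                 num_domains=10, num_intents=50, num_slot_labels=120):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Shared bidirectional encoder over the token sequence.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Utterance-level heads: domain classification and intent detection.
        self.domain_head = nn.Linear(2 * hidden_dim, num_domains)
        self.intent_head = nn.Linear(2 * hidden_dim, num_intents)
        # Token-level head: slot tagging (one label per token).
        self.slot_head = nn.Linear(2 * hidden_dim, num_slot_labels)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)           # (B, T, E)
        states, _ = self.encoder(embedded)             # (B, T, 2H)
        utterance = states.mean(dim=1)                 # simple mean pooling
        return (self.domain_head(utterance),           # (B, num_domains)
                self.intent_head(utterance),           # (B, num_intents)
                self.slot_head(states))                # (B, T, num_slot_labels)

# Example: a batch of 4 utterances, 12 tokens each.
domain_logits, intent_logits, slot_logits = MultiTaskSLU()(
    torch.randint(0, 10000, (4, 12)))

In such a setup, the three task losses are typically summed (possibly with weights) so that one backward pass updates the shared encoder for all tasks at once.
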
In the second part of this talk, I will present a multi-modal VQA system that answers users' questions about a given image by leveraging signals from multiple modalities, including the image and the spoken language of the question.

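As a rough illustration of how such a system can combine modalities, the sketch below fuses an encoded question with a precomputed image feature vector and classifies over a fixed answer vocabulary. The fusion-by-concatenation choice and all dimensions are assumptions for illustration, not the architecture presented in the talk.

# Minimal multi-modal VQA sketch (illustrative only, not the speaker's system).
# A question encoder and a projected image feature are fused and classified
# over a fixed answer vocabulary; all dimensions are assumed values.
import torch
import torch.nn as nn

class SimpleVQA(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256,
                 image_feat_dim=2048, num_answers=1000):
        super().__init__()
        # Question encoder: embeddings + LSTM over the question tokens.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.question_encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Project precomputed image features (e.g. from a CNN backbone).
        self.image_proj = nn.Linear(image_feat_dim, hidden_dim)
        # Fusion by concatenation, then answer classification.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, question_ids, image_features):
        embedded = self.embedding(question_ids)            # (B, T, E)
        _, (h_n, _) = self.question_encoder(embedded)      # h_n: (1, B, H)
        question_vec = h_n.squeeze(0)                      # (B, H)
        image_vec = self.image_proj(image_features)        # (B, H)
        fused = torch.cat([question_vec, image_vec], dim=-1)
        return self.classifier(fused)                      # (B, num_answers)

# Example: a batch of 2 questions (10 tokens each) with 2048-d image features.
answer_logits = SimpleVQA()(torch.randint(0, 10000, (2, 10)),
                            torch.randn(2, 2048))
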
Speaker Bio:

Dr. Yu Wang is a senior researcher at the AI Center at Samsung Research America. Before joining Samsung Research America, he obtained his Ph.D. from Yale University. The main contributions of his doctoral dissertation are the designs of multiple-model neural network structures that boost the performance of machine learning systems, especially deep learning systems, across different AI tasks.

Dr. Wang has a broad set of research interests spanning deep learning, spoken/natural language understanding, and multi-modal learning. His most recent work focuses on developing advanced multi-modal machine learning models for AI chatbots, including spoken language understanding (SLU) systems and multi-modal visual question answering systems, which can understand data signals from different modalities, such as images, natural language, and even voice signals. His work has achieved state-of-the-art performance on several benchmark AI datasets and is recognized through publications in top-tier AI conferences and journals.
