Tightly Connecting Vision and Language

Thursday, December 12, 2019, 11:00 am - 12:00 pm PDT
CR# 689
This event is open to the public.
NL Seminar
Soravit (Beer) Changpinyo - Google AI

Abstract: Remarkable progress has been made at the intersection of vision and language. Although they show great promise, current vision and language models do not function well in the wild. In this talk, I will present our recent efforts to bridge this gap for the tasks of image captioning and visual question answering. I will first describe several practical limitations of current benchmarks as yardsticks for grounded language understanding and visual reasoning. Then, I will describe our simple approach to transfer learning, where we leverage large-scale ultrafine-grained data as a means to address the long tail of language. Finally, given these results, I will outline future directions and survey a variety of ongoing work aimed at making vision and language research useful.

Bio: Soravit (Beer) Changpinyo is a Software Engineer at Google AI. His research interests are in machine learning with applications to computer vision and natural language processing. Prior to joining Google, he was a PhD candidate and an Annenberg Fellow at the University of Southern California, advised by Fei Sha.
