Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods
Authors: Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré
Abstract: Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches…
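For readers new to the setting, here is a minimal sketch of the core idea in the abstract. It is not the paper's estimation method (which learns source accuracies without ground truth); it only shows how, once accuracies are in hand, votes from noisy sources can be combined into probabilistic labels under a naive-Bayes-style independence assumption. The function name and toy data are illustrative.

```python
import numpy as np

# Minimal illustrative sketch: combine votes from noisy labeling sources
# into probabilistic labels, weighting each source by a known accuracy.
# Votes are in {-1, +1}; 0 means the source abstains on that example.

def probabilistic_labels(votes, accuracies):
    """votes: (n_examples, n_sources); accuracies: (n_sources,) in (0.5, 1)."""
    # Log-odds weight per source, as in a naive-Bayes combination with a
    # uniform class prior; abstentions (zero votes) contribute nothing.
    weights = np.log(accuracies / (1.0 - accuracies))
    scores = votes @ weights
    return 1.0 / (1.0 + np.exp(-scores))  # P(y = +1 | votes)

votes = np.array([[+1, +1, -1],
                  [-1,  0, -1]])
accuracies = np.array([0.9, 0.7, 0.6])
print(probabilistic_labels(votes, accuracies))  # one probability per example
```

The probabilistic labels can then be used as training targets for any downstream model; the hard part, and the subject of the paper, is estimating the accuracies without labeled data.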
Multi-Resolution Weak Supervision for Sequential Data
Authors: Frederic Sala, Paroma Varma, Jason Fries, Daniel Y. Fu, Shiori Sagawa, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James R. Priest, Christopher Ré
Abstract: Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision…
Scene Graph Prediction with Limited Labels
Authors: Vincent Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Ré, Li Fei-Fei
Image: Our semi-supervised method automatically generates probabilistic relationship labels to train any scene graph model.
Abstract: Visual knowledge bases such as Visual Genome power numerous applications in computer vision, like visual question answering and captioning, but suffer from sparse, incomplete relationships. All…
Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction
Authors: Apoorva Dornadula, Austin Narcomey, Ranjay Krishna, Michael Bernstein, Li Fei-Fei
Image: We introduce a scene graph approach that formulates predicates as learned functions, resulting in an embedding space for objects that is effective for few-shot prediction. Our formulation treats predicates as learned semantic and spatial functions, trained within a graph convolution network.
View-Dependent Video Textures for 360° Video
Authors: Sean J. Liu, Maneesh Agrawala, Stephen DiVerdi, Aaron Hertzmann
Image: In 360° video, viewers can look anywhere at any time. In the opening scene of Invasion!, a rabbit emerges from a cave (a). In sequential playback, a viewer looking at the cave (green box) will see the rabbit emerge, whereas a viewer not looking…
AI-based Request Augmentation to Increase Crowdsourcing Participation
Authors: Junwon Park, Ranjay Krishna, Pranav Khadpe, Li Fei-Fei, Michael Bernstein
Abstract: To support the massive data requirements of modern supervised machine learning (ML) algorithms, crowdsourcing systems match volunteer contributors to appropriate tasks. Such systems learn “what” types of tasks contributors are interested in completing. In this paper, instead of focusing on “what” to ask,…
Video Event Specification using Programmatic Composition
Authors: Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian
Abstract: Many real-world video analysis applications require the ability to identify domain-specific events, such as interviews and commercials in TV news broadcasts, or action sequences in film. Unfortunately, pre-trained models to detect all…
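To make "programmatic composition" concrete, here is a toy sketch in the spirit of the abstract, not the authors' actual API or data model: a domain-specific event is expressed as a composition over the time intervals emitted by off-the-shelf detectors. The detector names and the single intersection operation are assumptions for illustration.

```python
# Toy sketch of programmatic event composition (illustrative only): build
# an "interview" event from the overlap of two hypothetical detector outputs.

def overlaps(a, b):
    """True if intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def join_overlapping(xs, ys):
    """Pairwise-compose two interval sets, keeping the intersections."""
    return [(max(x[0], y[0]), min(x[1], y[1]))
            for x in xs for y in ys if overlaps(x, y)]

host_on_screen  = [(0, 10), (40, 55)]   # hypothetical detector output (seconds)
guest_on_screen = [(5, 20), (42, 50)]

interviews = join_overlapping(host_on_screen, guest_on_screen)
print(interviews)  # [(5, 10), (42, 50)]
```

The appeal of this style is that richer events (e.g., a commercial break, an action sequence) can be defined by layering a handful of such interval operations over existing model outputs rather than training a new end-to-end detector per event.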
Text-based Editing of Talking-head Video
Authors: O. Fried, A. Tewari, M. Zollhöfer, A. Finkelstein, E. Shechtman, D. B. Goldman, K. Genova, Z. Jin, C. Theobalt, M. Agrawala
Image: We propose a novel text-based editing approach for talking-head video. Given an edited transcript, our approach produces a realistic output video in which the dialogue of the speaker has been modified…
Puppet Dubbing
Authors: O. Fried, M. Agrawala
Image: Given an audio file and a puppet video, we produce a dubbed result in which the puppet is saying the new audio phrase with proper mouth articulation. Specifically, each syllable of the input audio matches a closed-open-closed mouth sequence in our dubbed result. We present two methods, one semi-automatic appearance-based and one fully automatic audio-based,…
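As a concrete illustration of the syllable-to-mouth mapping described above, the sketch below (our assumption, not either of the paper's two methods) turns detected syllable intervals into a per-frame mouth-openness schedule with one closed-open-closed cycle per syllable.

```python
# Illustrative sketch only: schedule one closed-open-closed mouth cycle per
# syllable. Syllable intervals would come from an audio analysis step; here
# they are hypothetical (start_sec, end_sec) pairs.

def mouth_openness(syllables, fps=24):
    """Return (frame_index, openness) pairs; openness rises from closed (0.0)
    to fully open (1.0) at the syllable midpoint, then back to closed."""
    frames = []
    for start, end in syllables:
        first = int(start * fps)
        n = max(int(end * fps) - first, 1)
        for i in range(n + 1):
            openness = 1.0 - abs(2.0 * i / n - 1.0)  # triangle wave: 0 -> 1 -> 0
            frames.append((first + i, openness))
    return frames

print(mouth_openness([(0.0, 0.25), (0.30, 0.55)]))
```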