Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction

Authors: Apoorva Dornadula, Austin Narcomey, Ranjay Krishna, Michael Bernstein, Li Fei-Fei
Image: We introduce a scene graph approach that formulates predicates as learned functions, resulting in an embedding space for objects that is effective for few-shot prediction. Our formulation treats predicates as learned semantic and spatial functions, which are trained within a graph convolution network.

Read More

Scene Graph Prediction with Limited Labels

Authors: Vincent Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei
Image: Our semi-supervised method automatically generates probabilistic relationship labels to train any scene graph model.
Abstract: Visual knowledge bases such as Visual Genome power numerous applications in computer vision, like visual question answering and captioning, but suffer from sparse, incomplete relationships. All

Read More

AI-based Request Augmentation to Increase Crowdsourcing Participation

Authors: Junwon Park, Ranjay Krishna, Pranav Khadpe, Li Fei-Fei, Michael Bernstein
Abstract: To support the massive data requirements of modern supervised machine learning (ML) algorithms, crowdsourcing systems match volunteer contributors to appropriate tasks. Such systems learn “what” types of tasks contributors are interested in completing. In this paper, instead of focusing on “what” to ask,

Read More

Puppet Dubbing

Given an audio file and a puppet video, we produce a dubbed result in which the puppet is saying the new audio phrase with proper mouth articulation. Specifically, each syllable of the input audio matches a closed-open-closed mouth sequence in our dubbed result. We present two methods, one semi-automatic appearance-based and one fully automatic audio-based,

Read More

Information Maximizing Visual Question Generation

Example questions generated for a set of images and answer categories. Incorrect questions are shown in grey and occur when no relevant question can be generated for a given image and answer category.
Authors: Ranjay Krishna, Michael Bernstein, Li Fei-Fei
Abstract: Though image-to-sequence generation models have become overwhelmingly popular in human-computer communications, they suffer from

Read More

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Accuracy

A selection of individual workers’ accuracy over time during the question answering task. Each worker remains relatively constant throughout his or her entire lifetime.
Authors: Kenji Hata, Ranjay Krishna, Li Fei-Fei, Michael Bernstein
Abstract: Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets. As a result, crowd workers spend weeks or months

Read More

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

An overview of the data needed to move from perceptual awareness to cognitive understanding of images. We present a dataset of images densely annotated with numerous region descriptions, objects, attributes, and relationships. Region descriptions (e.g. “girl feeding large elephant” and “a man taking a picture behind girl”) are shown (top). The objects (e.g. elephant), attributes

Read More