RoughCut (2016-17)

RoughCut seeks to develop a system that supports the process of making a rough cut. The primary challenges of generating a rough cut include planning, capturing, and annotating footage; finding relevant video clips; and pairing audio and visual content. We are taking initial steps toward building a system that addresses these challenges and makes producing a rough cut easier and more time efficient. Our project will support video teams by encouraging the camera crew to capture new perspectives: having multiple viewpoints gives editors more flexibility in how they construct the narrative. We also plan to explore new ways to interact with drones as professional journalistic tools.


Producing video content has become one of the primary responsibilities of news organizations. Many traditional news organizations, such as The New York Times, are expanding their video production teams to meet the world’s growing demand for audiovisual media. But video production is a complex, slow task. The quality of the final edited video depends on good footage, characterized by adequate coverage (i.e., multiple camera angles) and relevant audio recordings that the editor can use to craft the narrative. One way of extending the range of shots we produce is through drones, which enable the capture of additional shot types, such as aerial cutaways, that the editor can incorporate into the video sequence.

An editor must have access to good footage to build an initial video sequence, called a rough cut. This is quite labor intensive. Editing a single minute of a rough cut can take even a skilled editor around 90 minutes. If there are large amounts of footage, it can take an editor several days to organize the content. This leaves journalists little time to make the important artistic and stylistic decisions that contribute to expressive storytelling.

Once editors find the video segments of interest, they annotate the shots to make it easier to construct a video sequence. Labels editors add might include the speaker in the shot and the shot type, such as establishing, medium, or close-up shots. Our system will use computer-vision face detection techniques to identify the speaker and characterize the shot type automatically. We will also let the editor talk over the video to provide annotations. Having videos labeled will reduce the time it takes editors to find the raw footage they need to construct the narrative.
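As a rough illustration of how face detections could drive automatic shot-type labels, the sketch below classifies a frame by the height of the largest detected face relative to the frame. The function name, the thresholds, and the rule-based approach are all assumptions for illustration, not details of the RoughCut system; the face bounding boxes themselves would come from an upstream detector (e.g., OpenCV's Haar cascades or a DNN face detector).

```python
# Minimal sketch of rule-based shot-type labeling from face detections.
# The size thresholds below are illustrative guesses, not values from
# the RoughCut system.

def classify_shot(face_boxes, frame_height):
    """Label a frame as 'establishing', 'medium', or 'close-up' based on
    the largest detected face's height relative to the frame height.

    face_boxes: list of (x, y, w, h) bounding boxes from a face detector.
    """
    if not face_boxes:
        # No visible faces: likely a wide, scene-setting shot.
        return "establishing"
    largest_face_height = max(h for (x, y, w, h) in face_boxes)
    ratio = largest_face_height / frame_height
    if ratio > 0.4:
        return "close-up"
    if ratio > 0.15:
        return "medium"
    return "establishing"

# Example: one face occupying half of a 720-pixel-tall frame.
print(classify_shot([(100, 50, 200, 360)], 720))  # → close-up
```

In practice, these labels would be attached to each clip as metadata so that editors can filter footage by shot type rather than scrubbing through raw video.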

Through this project, we will make the process of capturing footage simpler and the process of editing videos easier and more efficient.

Established in 2012, the Institute is a collaboration between Columbia and Stanford Universities. Our mission is simple: Sponsor thinking, building and speculating on how stories are discovered and told in a networked, digitized world.
