With the advent of cheap consumer photography and the rise of ubiquitous imaging devices like street cameras and drones, large video collections are of growing interest and availability to journalists and academics. Video streaming websites like YouTube and LiveLeak offer rich datasets covering both high-activity events (protests, conflict zones) and more mundane affairs (traffic, C-SPAN). For example, given video streams of a protest, a journalist might ask: how many people attended? When did a particular speaker begin speaking? What messages appear on the protest signs? Just as existing tools like LexisNexis enable researchers to organize, index, and search large corpora of text, journalists need analogous tools to explore video data. Esper is a system that facilitates exploration of large video collections by enabling researchers to easily organize and annotate their videos at scale. Esper is led by Will Crichton at Stanford Computer Science.