Our collaborative project won the Google GNI Innovation Challenge, securing $300,000 to build and test machine learning-based tools that help newsrooms analyze equity and representation in their work at scale.
Editor’s note: This post was co-written and edited by Michael Krisch, Deputy Director of the Brown Institute; Sarah Schmalbach, Product Director of the Lenfest Local Lab; and Ana Graciela Méndez, Special Projects Editor for the Lab.
We are pleased to announce new support from Google’s GNI Innovation Challenge to expand our partnership with the Lenfest Local Lab and The Philadelphia Inquirer and create open-source content audit and analysis tools for local newsrooms. The tools are aimed at making content audits faster and less expensive, which could speed the advancement of diversity, equity and inclusion (DEI) in local coverage. Our cross-disciplinary partnership between a lab, a local newsroom and a university will draw on each organization’s strengths in product and UX innovation, journalism, and data and computation to address shared questions about the relationship between equity, quality, geography and service in local news.
What is a content audit?
Broadly speaking, a content audit is the process of systematically analyzing and assessing all of the content you have created. The goal is typically to reveal strengths and weaknesses in your content and discover opportunities to improve. Today many newsrooms are conducting content audits to assess the equity and representation of past news coverage in order to inform future work, and to gain insights that allow them to better serve, reflect and include communities.
Why are newsrooms undertaking content audits?
Journalism seeks to serve communities through fair reporting. To do that, local news organizations have to understand the needs, concerns and makeup of the communities in their coverage area. Part of that understanding requires actionable information about the communities they exclude, misrepresent or underreport in their coverage, now or in the past. Content audits can give organizations the context needed to know which communities have been covered, how community life has been characterized and what effect coverage may have had on residents.
Why do newsrooms need better tools for auditing content?
Traditional newsroom content audits have been done manually, meaning they can be costly, time-consuming, and potentially dated from the moment results are available. If not done internally, organizations can hire outside groups that assemble researchers to annotate content, which can include manually highlighting everything from the description of a community to the placement of a photo in a story.
The open-source tools we will develop use machine learning (ML) and natural language processing (NLP) technologies to help newsrooms automate the parts of the process that can be automated, allowing them to survey a far broader set of sources, content and practices. This will free up time for the more difficult task of analysis, including assessments of the culture, workflow and decision-making that result in published coverage.
Why speed and flexibility are key to this new wave of content audits
With these tools, news organizations can identify issues of equity and representation in their coverage more efficiently, and can start developing strategies sooner to address the problems that are revealed.
It’s also important that any approaches to improving and operationalizing DEI practices be dynamic, able to continually adapt to changes in language and topics of concern among newsrooms and audiences. Our project will be developed with this flexibility in mind, allowing news organizations’ practices to grow alongside their communities.
Background on how our cross-disciplinary partnership started
Over the past year, the Lenfest Institute and the Brown Institute have collaboratively developed an automated approach to identifying and mapping locations found in news stories using a mix of natural language processing (NLP), deep learning and geolocation techniques. On top of that technology we built a proof-of-concept analysis tool for helping newsrooms audit their content and better understand which local communities are reported on and how. The tool provides insights into the geographic equity of coverage and the knowledge to pursue opportunities to fill gaps, fix problems and serve audiences with new products.
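As a rough illustration of the idea of identifying and mapping locations in stories, here is a minimal sketch in Python. It is not the approach our teams built, which relies on NLP and deep learning models. Instead it assumes a small, hypothetical lookup table (`GAZETTEER`) that maps place names to counties, and the helper names and sample stories are invented for the example.

```python
import re
from collections import Counter

# Hypothetical gazetteer: place name -> county. A production system would use
# a trained entity-recognition model plus a geocoder; this table is a stand-in.
GAZETTEER = {
    "Camden": "Camden County",
    "Cherry Hill": "Camden County",
    "Doylestown": "Bucks County",
    "South Philly": "Philadelphia County",
}


def find_places(text, gazetteer=GAZETTEER):
    """Return the gazetteer place names that appear in the text as whole phrases."""
    found = []
    for place in gazetteer:
        # \b keeps "Camden" from matching inside a longer word.
        if re.search(r"\b" + re.escape(place) + r"\b", text):
            found.append(place)
    return found


def county_counts(stories, gazetteer=GAZETTEER):
    """Count how many stories mention at least one place in each county."""
    counts = Counter()
    for text in stories:
        # A set ensures a story counts once per county, however many places it names.
        counties = {gazetteer[place] for place in find_places(text, gazetteer)}
        counts.update(counties)
    return counts


# Invented sample stories, for illustration only.
stories = [
    "A new apartment tower is planned in Camden near the waterfront.",
    "Cherry Hill and Camden officials debated a transit proposal.",
    "Doylestown shop owners are preparing for the holiday season.",
]

for county, n in county_counts(stories).most_common():
    print(county, n)
```

A fixed lookup table like this misses ambiguous or unlisted place names, which is one reason a model-based approach is worth the extra effort.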
We have also partnered with The Philadelphia Inquirer to apply this location analysis prototype in two ways. First, we used the underlying location identification model to build and test a new product: a page that organizes COVID-19 coverage by the counties mentioned in the stories, allowing readers to quickly find the coronavirus-related stories about the places that matter to them.
Second, we worked with two Inquirer news desks to conduct early content audits, assessing the geographic representation of real estate and visuals coverage in 2020. More details on those initial audits are included later in this post.
What we plan to do with support from Google
Move beyond location analysis.
With this new support from the Google GNI Innovation Challenge, we will begin by fine-tuning our location analysis tool and exploring how to support its use in newsrooms of all sizes and localities, from bigger cities and regions to smaller towns. We will also move beyond location analysis, looking at many more facets of content auditing, including the diversity of sources, image analysis, and which images and stories are published depending on the topic, author, or location of a story.
With this data-informed approach, newsrooms that use these auditing tools can begin to understand how communities are reflected in their coverage by examining how editorial decisions manifest themselves in the language, location, sources and visuals of stories. The tools will help newsrooms assess fairness by uncovering gaps in coverage, whether in a town or neighborhood or for a specific community defined by gender, race, ethnicity or socio-economic background. They will highlight topical and other coverage disparities measured relative to population, income, geographic distribution and other demographic benchmarks we develop in collaboration with newsrooms and researchers with expertise in equity and representation. All of these insights should point to opportunities for the newsroom and the business to address.
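One simple way to picture a disparity measured relative to a population benchmark is to compare each area’s share of stories with its share of residents. The sketch below assumes that framing. The function name `coverage_gaps`, the area labels and every number in it are made up for illustration, and the benchmarks we actually develop may look quite different.

```python
def coverage_gaps(story_counts, population):
    """Return each area's share of stories divided by its share of population.

    A ratio near 1 means coverage is roughly proportional to population,
    below 1 suggests under-coverage, and above 1 suggests over-coverage.
    """
    total_stories = sum(story_counts.values())
    total_population = sum(population.values())
    if total_stories == 0 or total_population == 0:
        raise ValueError("story counts and population must not sum to zero")

    gaps = {}
    for area, residents in population.items():
        # Areas with no stories at all still appear, with a ratio of 0.
        coverage_share = story_counts.get(area, 0) / total_stories
        population_share = residents / total_population
        gaps[area] = coverage_share / population_share
    return gaps


# Invented figures: stories per area and residents per area.
story_counts = {"Area A": 60, "Area B": 30, "Area C": 10}
population = {"Area A": 200, "Area B": 300, "Area C": 500}

# Print the least-covered areas first.
for area, ratio in sorted(coverage_gaps(story_counts, population).items(),
                          key=lambda item: item[1]):
    print(f"{area}: {ratio:.2f}")
```

In this invented example, Area C holds half the population but receives a tenth of the stories, so its ratio falls well below 1. A ratio like this is only a starting point for discussion, since population is one of several possible benchmarks.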
Keep humans in the loop.
We will also use a human-in-the-loop approach, meaning that researchers with expertise in equity and representation, along with users within newsrooms, will be involved in the training, tuning and testing of the algorithms, and in the interpretation and application of the results.
While implementations of machine learning (ML) in newsrooms have been developed for optimizing subscriptions and automatically generating news stories, little has been done to automate tasks that support DEI efforts. Our project will develop an ML-based analysis suite that helps newsrooms reduce the time it takes to perform audits, identify problems faster and expand the scope of what they audit.
Support the transition from one-off audits to continuous accountability.
The primary outcome we hope to achieve is a transition from one-off equity audits to automated computational processes that assist newsrooms in shaping inclusive coverage and products to engage readers. A secondary outcome we hope for is a re-imagination of news products that builds on this more inclusive and representative coverage. We imagine new products launched in direct response to the insights and data the tools provide, as well as new ways to help reporters identify better story ideas, reader-facing products and business opportunities.
Our work in Philadelphia
Philadelphia is one of many cities in the US where large-scale inequities exist, and where local news coverage would benefit from this type of analysis. According to the Philadelphia City Council’s 2020 Poverty Action Plan, Philadelphia has the “highest overall poverty rate among the nation’s ten largest cities.” Census Bureau data from 2018 show that 24.5% of the city’s population lives below the poverty line, nearly double the 13.1% national average. Studies of the city’s inequality indicators place Philadelphia among the 10 most unequal cities in the country, and in the wake of a second wave of COVID-19, these issues will be further exacerbated.
Data points such as these are critical to analyzing the economic conditions, health and availability of information for residents, and can be used to develop local benchmarks for equity and representation. By comparing one county to neighboring counties, or disparate census tracts to one another, newsrooms gain new ways to visualize and understand their own coverage.
To date, the mapping prototype we built has already allowed our partners at The Philadelphia Inquirer to analyze coverage and start making observations about its geographic equity.
Two geographic equity assessments at The Inquirer
The Philadelphia Inquirer Built Environment Desk
The Built Environment desk at The Inquirer includes coverage of real estate, architecture and transportation in the Philadelphia region. These types of stories often center around the places mentioned in the text, ensuring that for this first pass, the locations mapped would directly relate to analysis of equity and representation.
Here is a map showing one year’s worth of Built Environment story locations:
After reviewing the map, Cynthia Henry, the Built Environment desk editor, shared thoughts about how the tool could inform coverage going forward:
We already have conversations like ‘Hey, we haven’t written about this neighborhood or area in a while. I wonder what’s going on there?’ … [The tool] could push us to question ourselves, expand our coverage, seek a broader range of sources, and lead us to stories that need to be told. — Cynthia Henry
The Philadelphia Inquirer Photo Desk
After hearing about the availability of the tool, Inquirer photographer Tim Tai asked our team to map photo assignment locations to provide insights into the geographic representation of the Inquirer’s visual coverage.
Here is a map showing most of the Inquirer’s 2020 photo assignments plotted out by Philadelphia neighborhood:
After reviewing the results, Tim explained what the map starts to show in terms of clusters and gaps in visual representation:
The graphical interface is really good and starts to show us the neighborhoods and counties we’re taking photos in and which ones we’re not. For example we’ve taken a lot of photos in Camden County, but not really in nearby Burlington County. We cover a lot of stuff along the Main Line, but not a lot in eastern Bucks County. We photograph in South Philly a lot, but much less often in Southwest Philly. The tool helps us start to answer basic questions we have such as: what geographic areas are photographers frequenting? What areas are being ignored? — Tim Tai
What’s next
In 2021 and with this new support, our teams will start seeking out additional newsroom partners, developing a board of technical and academic advisors and creating a research plan for assessing newsroom DEI needs and goals, including community panels that will play a role in goal-setting exercises. At the same time we’ll start building a test set of data and start identifying datasets we can use to develop benchmarks for equity and representation.
Get Connected
If you’re interested in the project and would like to be involved, please contact Michael Krisch at mkrisch@columbia.edu or Sarah Schmalbach at sarah@lenfestinstitute.org.
About our Partners
The Lenfest Local Lab is a multidisciplinary product and user experience innovation team located in Philadelphia supported by The Lenfest Institute for Journalism.
The Lenfest Institute for Journalism is a non-profit organization whose mission is to develop and support sustainable business models for great local journalism. The Institute was founded in 2016 by entrepreneur H.F. (Gerry) Lenfest with the goal of helping transform the news industry in the digital age to ensure high-quality local journalism remains a cornerstone of democracy.
The Philadelphia Inquirer has provided essential journalism to the region since 1829. The for-profit public benefit corporation is owned by the non-profit Lenfest Institute and produces Pulitzer Prize-winning journalism that changes lives and leads to lasting reforms. On multiple platforms — including newspapers, online and live events — The Inquirer reaches a growing audience of more than 10 million people a month.