UNLOCKING FILM LIBRARIES FOR DISCOVERY & SEARCH
Using cutting-edge tools for object, action, and speech recognition, we will unlock the treasure troves of film/video held by libraries.
Describe your project.
Where the library of the 20th century focused on texts, the 21st century library will be a rich mix of media, fully accessible to library patrons in digital form. Yet the tools that allow people to easily search film and video in the same way that they can search through the full text of a document are still beyond the reach of most libraries. How can we make the rich troves of film/video housed in thousands of libraries searchable and discoverable for the next generation? Dartmouth College’s Media Ecology Project and the Visual Learning Group propose to apply tools already being developed for object, action, and speech recognition to a rich collection of educational films held by Dartmouth Library and the Internet Archive. Using existing algorithms that recognize speech, audio, objects, locations, and actions, we will be able to explain what is happening in a collection of one thousand educational films. We will feed the resulting tags, transcripts and other enriched metadata into our Semantic Annotation Tool (SAT) which will generate annotations (built upon W3C open annotation standards) that can be attached to each film. What was once a roll of film, indexed only by its card catalog description, will now be searchable scene-by-scene, adding immense value for library patrons, scholars and the visually impaired.
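As a rough illustration of what a scene-level annotation might look like, the sketch below builds one time-coded tag as a JSON document. It assumes the W3C Web Annotation Data Model with a Media Fragments time selector; the function name, the example URL, and the label are hypothetical, and this is not the Semantic Annotation Tool's actual output format.

```python
import json


def make_tag_annotation(video_url, label, start_s, end_s):
    """Build one tag annotation for a time range of a video.

    Assumption: the W3C Web Annotation Data Model is the target format.
    The field layout follows that model; it is not taken from SAT itself.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "tagging",
        "body": {"type": "TextualBody", "value": label, "purpose": "tagging"},
        "target": {
            "source": video_url,
            "selector": {
                # Media Fragments syntax: t=<start>,<end> in seconds.
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"t={start_s},{end_s}",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical film URL and machine-generated label.
    annotation = make_tag_annotation(
        "https://example.org/films/farm-safety.mp4", "tractor", 30, 60
    )
    print(json.dumps(annotation, indent=2))
```

Because the time range lives in the selector rather than in the tag text, the same film can carry any number of overlapping annotations from different recognizers.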
Dartmouth College’s Visual Learning Group is already a leader in computer vision and machine learning, developing new tools for object and action recognition. This project brings together three cross-curricular groups at Dartmouth to collaborate on applying modern artificial intelligence and machine learning to historic film collections held by libraries. This project will pull together a variety of human- and machine-generated metadata on a selected set of digitized educational films and combine them into a single, searchable format.
Metadata sources explored by the project include:
-- existing metadata from catalogs
-- audio transcripts based on natural language processing
-- text recognition that reads and makes searchable titles, street signs and captions in films and videos
-- object recognition that applies searchable tags based on items appearing in a film and video
-- action recognition that identifies what is going on in a scene and turns that action into searchable text
By improving the cutting-edge algorithms used to create time-coded subject tags (e.g. http://vlg.cs.dartmouth.edu/c3d/), we aim to lay the foundation for a fully searchable visual encyclopedia and to share our methods and open source code with film archives everywhere. Our goal is to unlock the rich troves of film held by libraries and make them findable and more usable, scene by scene and frame by frame, so that future generations can discover new layers of meaning and impact.
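One way to picture the "single, searchable format" is as a keyword index over time-coded segments drawn from every metadata source. The sketch below is a minimal version under stated assumptions: the `Segment` record, the source names, and the sample data are hypothetical, and a production system would likely use a full-text search engine rather than an in-memory dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One time-coded piece of metadata (hypothetical record layout)."""
    film_id: str
    start_s: float
    end_s: float
    source: str  # e.g. "catalog", "transcript", "ocr", "object", "action"
    text: str


def build_index(segments):
    """Map each lowercase word to the segments whose text contains it."""
    index = defaultdict(list)
    for seg in segments:
        for word in set(seg.text.lower().split()):
            index[word].append(seg)
    return index


def search(index, query):
    """Return segments containing every query word, ordered by film and time."""
    words = query.lower().split()
    if not words:
        return []
    hits = [
        seg
        for seg in index.get(words[0], [])
        if all(w in seg.text.lower().split() for w in words[1:])
    ]
    return sorted(hits, key=lambda s: (s.film_id, s.start_s))


if __name__ == "__main__":
    # Invented sample data standing in for recognizer output.
    segments = [
        Segment("film-a", 0, 10, "transcript", "the tractor plows the field"),
        Segment("film-a", 30, 45, "object", "tractor"),
        Segment("film-b", 5, 8, "ocr", "Main Street"),
    ]
    for hit in search(build_index(segments), "tractor"):
        print(hit.film_id, hit.start_s, hit.end_s, hit.source)
```

Each result carries its own start and end time, which is what lets a patron jump to the matching scene instead of to the start of the film.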
How does this project advance the library field?
Today, most public, university, and special collection libraries have public domain documentary films that are often digitized. But try to explore them and you’ll find only the basics: title, subject, synopsis. Libraries have made great strides in unlocking texts through optical character recognition; they are opening up audio items with voice-to-text transcription; but so far, libraries have not found ways to unlock moving images and annotate them at scale. As noted by the American Library Association, “Digital technologies have made it possible for almost anyone to create and share visual media. Yet the pervasiveness of images and visual media does not necessarily mean that individuals are able to critically view, use, and produce visual content.” Harnessing machine learning and metadata can greatly aid in making this possible for the library users of today.
Based on Dartmouth’s research in machine learning for image recognition and the ongoing development of our Semantic Annotation Tool (SAT) for film and video, we propose to construct dynamic, scalable annotation formats that will greatly improve searchability, arbitrary scene access, and topic/location/action reference for the large corpus of educational films that the Internet Archive is currently digitizing and making widely available to the public. We will make this digital tool and workflow a shareable resource: the code will be open source and available for adoption by other libraries and universities.
This project also advances one of the core missions of libraries: enhancing patrons’ visual literacy. Our visual culture is one of the most important aspects of society, as foundational to knowledge as text. We believe this project has the potential to place libraries at the forefront of visual literacy, opening up a new era of visual studies. At a time when humans are bombarded with more and more digital imagery, it has never been more important to develop the tools and scholarship that provide context and references for what we are seeing.
Who is the audience and what are their information needs?
This project aims to directly improve search and browsing of educational films and video for the millions of users accessing digitized films and born-digital material on the Internet Archive and Dartmouth’s online audiovisual collections. The Internet Archive serves some 2-3 million unique visitors each day.
It will also provide open source tools for librarians and the public to apply to their own film content, making it possible to find and share well-curated, fully annotated digital videos.
Making video accessible in a more granular, data-rich fashion will also greatly benefit users with special needs. The Semantic Annotation Tool is being developed with accessibility in mind, but good accessibility requires deep and broad metadata that is rarely available for video. The algorithmic tools developed for this project are a natural source for automatically generating the metadata needed to support media accessibility at large scales. Film and videos treated in this way can become a vital resource for the print disabled community: those with dyslexia, impaired vision and blindness. According to statistics on the National Federation of the Blind website, in 2012 the number of Americans aged 16 to 75+ who reported a visual disability totaled 6,670,300; separately, some 659,000 Americans aged 4 to 20 were reported to have a visual disability.
Please list your team members and their qualifications.
Lorenzo Torresani is Associate Professor of Computer Science and Director of the Visual Learning Group at Dartmouth College. His research interests are in computer vision and machine learning. His current work is primarily focused on learning representations for image and video recognition. For his work he has received a National Science Foundation CAREER Award and a Google Faculty Research Award.
Mark Williams is Associate Professor of Film and Media at Dartmouth College. He heads the Media Ecology Project for which he received an award for Scholarly Innovation and Advancement at Dartmouth and an NEH Tier 1 Research and Development grant to build the Semantic Annotation Tool (SAT). With Michael Casey, he received an NEH Digital Humanities Start-Up Grant to build the ACTION toolset for film analysis.
John Bell is a Lead Application Developer in Dartmouth’s Information Technology Services department, where he is the architect for the Media Ecology Project and the College’s institutional repository, the Dartmouth Academic Commons. His previous work includes contributions to Scalar, a semantic web publishing platform from the Alliance for Networking Visual Culture, and The Variable Media Questionnaire, an online system used to preserve ephemeral art. In addition, he is Assistant Professor of Digital Curation at the University of Maine and Senior Researcher at the Still Water Network Art and Culture Lab.
Dimitrios Latsis is a film archivist and curator and currently the CLIR-Mellon Postdoctoral Fellow in Visual Data Curation at the Internet Archive in San Francisco, CA. He received his Ph.D. in Film Studies from the University of Iowa in 2015 and he has been a fellow at the Smithsonian Institution. Latsis’ work in visual culture, film and art history has appeared in numerous journals. He is currently working on digital archiving, metadata curation and educational partnerships for the Internet Archive's collection of educational films.
Nicholas Giudice is currently an Associate Professor of Spatial Informatics in the University of Maine’s School of Computing and Information Science, with joint appointments in UMaine’s National Center for Geographic Information and Analysis (NCGIA), the Department of Psychology, and the Intermedia program. He is the director of the Virtual Environments and Multimodal Interaction (VEMI) laboratory, where he uses a combination of behavioral experiments and usability studies to investigate human spatial cognition with and without vision and to determine the optimal information requirements for the design of multimodal interfaces. The laboratory also serves as a testbed for evaluation and usability research on navigational and information access technologies for blind, low-vision, and ‘situationally blind’ (e.g., texting while walking) users. Dr. Giudice is himself congenitally blind and has a long history of both designing and using assistive technology. He currently serves on the boards of three blindness-related organizations.
Organization name and location (City, State).
Dartmouth College in Hanover, New Hampshire.
What are the obstacles to implementing your idea, and how will you address them?
In many ways this is an integration project that brings together several recently developed technologies to have a major impact on the way audiovisual materials are organized and accessed online. The good news is that many of the biggest obstacles to completion (basic software for machine vision, data storage, and sharing) have already been addressed. Another significant obstacle, the potential for rights conflicts surrounding video, has also been addressed in the project design by drawing our source videos from non-theatrical and orphaned video collections.
The integration challenge falls in two areas: integration of methodology, and integration of technology. Methodologically, the development of a machine learning system that integrates object, scene, action, and speech recognition to perform robust automatic tagging of video would be breaking new ground. While each of these recognition areas has witnessed dramatic improvements over the last few years, these methods have been developed in a disjointed fashion by the research community rather than coherently, as part of a complete system. Furthermore, components have been tested in lab-controlled conditions on simplified datasets. Taking part in the Knight News Challenge allows us to take that lab-developed software and apply it to a public good: new knowledge and discovery tools for public film and video archives. Working with the broader set of “real world” materials available through the Internet Archive and combining human and machine annotation are the best ways to test iteratively and refine a unified annotation algorithm.
The largest remaining obstacle to implementation is interoperability: getting everyone on the same technological page. Though the software components needed already exist as stand-alone projects, they need to be modified to fit our specific goals and to communicate with each other. The solution to this challenge lies in the design of the overall system as a set of unique but connected technologies; each individual component is designed to be good at one specific thing, and only the data gets shared between them. If the machine vision component needs to be better at recognizing certain actions, then the team at Dartmouth can make those changes drawing on the interface work done at UMaine and the metadata work done at Internet Archive. It is only critical that the metadata model be made consistent, and we have high level metadata experts who will expand upon existing standards to ensure its quality.
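The "only the data gets shared" design can be sketched as a small contract that every component checks before handing a record to the next one. The example below is a minimal sketch under stated assumptions: the field names, the JSON exchange, and both function names are hypothetical and do not describe the project's actual metadata model.

```python
import json

# Hypothetical shared record layout agreed on by all components.
REQUIRED_FIELDS = {
    "film_id": str,
    "start_s": (int, float),
    "end_s": (int, float),
    "source": str,
    "label": str,
}


def validate_record(record):
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            # bool is rejected explicitly because it is a subclass of int.
            errors.append(f"wrong type for field: {name}")
    if not errors and record["start_s"] > record["end_s"]:
        errors.append("start_s is after end_s")
    return errors


def exchange(record):
    """Validate, then serialize to JSON and parse back, as a stand-in for
    one component passing data to another."""
    errors = validate_record(record)
    if errors:
        raise ValueError("; ".join(errors))
    return json.loads(json.dumps(record))


if __name__ == "__main__":
    record = {
        "film_id": "film-a",
        "start_s": 12.5,
        "end_s": 20,
        "source": "action",
        "label": "plowing",
    }
    print(exchange(record))
```

With a check like this at each boundary, a recognizer can be replaced or retrained without touching the interface or archive code, as long as the records it emits still validate.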
Finally, it will be critical to investigate how this new tool can “scale” to larger collections of library-curated and user-contributed content. During the last phase of this project, we plan to test our tools against a much larger set of Internet Archive videos and refine them into an open-access solution that can be readily applied by libraries to their own digital content to make it more searchable and accessible.
How will you spread the word about your project? Who are you trying to reach?
Information pertaining to this project can be quickly disseminated using social media and academic forums as applicable, as well as by enlisting the tremendous megaphone embodied in our project partner, the Internet Archive. Film Curator Dimitrios Latsis and Mark Williams of Dartmouth’s Media Ecology Project (MEP) will conduct outreach to their peers in the academic library world, where the largest repositories of digitally accessible film are well known and documented. We will work with larger associations such as the National Film Preservation Foundation to hold trainings.
The machine vision research developed for this project will be disseminated to the scientific community by means of journal publications and conference presentations and to the library community at venues such as AMIA, ALA, SAA, DLF and CNI and the Internet Archive’s Library Leaders Forum. All software and models will be made publicly available for non-commercial applications to video archives and scholars. Furthermore, a part of the film library from our project partner (Internet Archive) will be used to define a machine vision benchmark and a series of recognition grand challenges that will further promote research advances in automatic tagging of real-world video.
The media archive that will be produced via this project will not only make video accessible to thousands of researchers, but will also create a new avenue for blind and low-vision (BLV) people to access information that would otherwise be hidden to them. The target audiences are libraries, universities, corporate media, and accessibility champions. The Internet Archive coordinates with the National Library Service for the Blind and Physically Handicapped, which would help disseminate information about these new discovery aids.
Today, universities and colleges are required by law to make every educational tool fully accessible to any student. With the proliferation of online courses that are being taught through web-based video, a tool via which BLV students can access this information accurately and efficiently is not only necessary, but is also legally required. Widespread implementation of this annotation tool could, in the future, save universities, companies, and organizations millions in litigation from disability lawsuits.
How much do you think your project will cost, and what are the major expenses?
Our total budget for this project is $350,000. That cost, which is almost entirely dedicated to project-specific software or metadata development salaries, will be split among the three institutions and collaborators as follows:
• Dartmouth College: $195,000 for the Visual Learning Group and Media Ecology Project, which will fund graduate and undergraduate student researchers, summer salary for the two primary investigators, and specialized hardware required for video analysis.
• Internet Archive: $75,000 to support collections curation and metadata development, as well as the incorporation of annotations into the Internet Archive’s storage system.
• University of Maine: $80,000 for the Virtual Environments and Multimodal Interaction (VEMI) Lab, which will fund undergraduate and graduate student developer/researchers who will extend the Semantic Annotation Tool’s interface and data sharing capabilities.