This is an archived web page, collected at the request of Jillian using Archive-It. This page was captured on 17:26:58 Nov 08, 2016, and is part of the News Challenge collection. The information on this web page may be out of date.

Indigenous Digital Archive: Opensource tools for creating effective access to & collaboration with mass-digitized archival documents

OCR can only take you so far with some of the most important documents shaping the 19th & 20th century; we'll build collaborative tools.

Written by Anna Naruta-Moya

Describe your project.

To enable efficient access to and collaboration with mass-digitized archival documents and photos, the Indigenous Digital Archive project will create an opensource toolkit for the popular opensource online content system Omeka. Our toolkit for online digital engagement and collaboration builds on existing international application programming interface (API) standards for image interoperability (http://IIIF.io) and on the Open Annotation format. Our software developer, Digirati, has laid important groundwork in this area by building interoperable viewers for the British Library and others (the Universal Viewer) and through its work in computer-aided tagging (http://digirati.com/powertagging).

For our use case, our initial subject is 19th and 20th century public government documents that, due to typescript quality or the inclusion of handwriting or photos, are highly resistant to optical character recognition (OCR). The first focus is records of US government Indian boarding schools. We will draw on new work in user interface design, particularly the intriguing explorations in building user-friendly “generous interfaces” full of effective visualizations and informative leads for users.

In addition to other crowdsourcing volunteers, a cohort of members of New Mexico’s 23 tribes will inform interface design and conduct sustained user testing over a year of collaborative work online and in person.

We have identified open public records related to Native land claims and the government boarding schools of the late 19th to early 20th century as both a priority for Native communities in our region and otherwise unavailable to them. We will acquire digital images of at least 140,000 pages (140 reels) of previously microfilmed records held by the US National Archives, ingest them into a hosted server instance of Omeka, and apply the new opensource toolkit layer. The records will then be available through a new online environment with rich interaction and collaboration tools, and with an enhanced user experience that uses generous interface design principles to replace the standard search box.

We have also committed to working with the Digital Public Library of America (DPLA) to increase discoverability through http://dp.la and to improve the user experience by having DPLA work with our International Image Interoperability Framework (IIIF) endpoints, which will help develop this capacity at DPLA.
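As a rough illustration of what an IIIF endpoint can expose to an aggregator such as DPLA, the sketch below builds a minimal manifest for one microfilm reel, with one canvas per page. It assumes version 2 of the IIIF Presentation API and a level 1 Image API service; the function name `reel_manifest`, the URI layout, and the example.org addresses are hypothetical and are not taken from the project.

```python
def reel_manifest(base_uri, reel_id, label, pages):
    """Build a minimal IIIF Presentation 2.x manifest for one reel.

    pages is a list of (image_service_uri, width, height) tuples,
    one per digitized page, in reading order. All URIs are hypothetical.
    """
    canvases = []
    for n, (service_uri, width, height) in enumerate(pages, start=1):
        canvas_uri = f"{base_uri}/{reel_id}/canvas/{n}"
        canvases.append({
            "@id": canvas_uri,
            "@type": "sc:Canvas",
            "label": f"Page {n}",
            "width": width,
            "height": height,
            # The page image is attached to the canvas as a "painting" annotation.
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_uri,
                "resource": {
                    "@id": f"{service_uri}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": width,
                    "height": height,
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": service_uri,
                        "profile": "http://iiif.io/api/image/2/level1.json",
                    },
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{base_uri}/{reel_id}/manifest",
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


manifest = reel_manifest(
    "https://example.org/iiif",
    "reel-001",
    "Example reel (placeholder label)",
    [
        ("https://example.org/images/reel-001-0001", 2000, 3000),
        ("https://example.org/images/reel-001-0002", 2000, 3000),
    ],
)
```

Because the manifest only points at image services, a viewer or aggregator that understands IIIF can display the pages without copying the images.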

How does this project advance the library field?

There is a need to access large quantities of archival documents, such as government records, without waiting on the bottleneck of a staff member performing detailed tasks like indexing. Indeed, given scarce resources, such indexing will never happen for many important information-bearing records. Additionally, repositories are interested in sharing the authority to describe the records with researchers, and in drawing on researchers' expertise.

The bulk of records created in the 1800s and into the 1900s, a time when many government institutions were taking shape and having powerful impacts, are highly resistant to automated Optical Character Recognition (OCR). Because of irregular typescript, handwriting, or other characteristics, they can't be effectively OCR'd and keyword searched.

Crowdsourcing has already become one approach to improving access to data and to creating machine-readable versions of analog data. Current interfaces are designed to collect a limited range of structured data according to a particular research design, and are becoming increasingly sophisticated at directing the workflow (e.g., a “line-at-a-time, queue-oriented, multi-track transcription workflow”). However, these are not tools for exploring collections beyond a specific kind of encounter. The narrow task orientation and gamification of some interfaces can mean that a user is presented with a single image of text that is deeply interesting to them, only to see it whisked away once they've completed a transcription task, with no clue of where it came from and no way to see the whole item. In these applications, you're often creating data for someone else rather than for yourself and your peers.

Tools are still needed to enable work with mass-digitized documents. For mass-digitized archival documents, access needs are not always met by transcription. One reason is that full transcription or OCR correction is usually much more time consuming than selecting a tag (a name, event, concept, or place) that would be meaningful for someone looking for the content. Another is that what would be used as a keyword often does not actually appear in the text. (For example, a derivative, alternate, or misspelled form of a name is used, or what would receive a keyword tag of “boarding school deaths” appears in euphemistic language.)
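To make the point about tags versus transcription concrete, here is a minimal sketch of a tag index in which a keyword can lead to a page even when that keyword never appears in the page's text, and in which variant spellings of a name resolve to one canonical form. The class `TagIndex`, its methods, the page identifiers, and the personal names are all invented for illustration; this is not the project's design.

```python
from collections import defaultdict


class TagIndex:
    """Toy index from human-assigned tags to page identifiers."""

    def __init__(self):
        self._pages = defaultdict(set)  # canonical tag -> set of page ids
        self._aliases = {}              # variant spelling -> canonical tag

    @staticmethod
    def _norm(text):
        # Lowercase and collapse whitespace so lookups are forgiving.
        return " ".join(text.lower().split())

    def add_alias(self, variant, canonical):
        """Record that a variant or misspelled form means the canonical tag."""
        self._aliases[self._norm(variant)] = self._norm(canonical)

    def _resolve(self, term):
        key = self._norm(term)
        return self._aliases.get(key, key)

    def tag(self, page_id, term):
        """Attach a tag to a page, whether or not the term occurs in its text."""
        self._pages[self._resolve(term)].add(page_id)

    def find(self, term):
        """Return the sorted page ids carrying the tag or any of its aliases."""
        return sorted(self._pages.get(self._resolve(term), set()))


index = TagIndex()
index.add_alias("Jon Doe", "John Doe")  # invented name variants
index.tag("reel-001/page-0042", "Jon Doe")
index.tag("reel-001/page-0107", "boarding school deaths")
matches = index.find("john doe")
```

Here the page tagged under the variant spelling is found by a search on the canonical name, and the second page is findable under a keyword that a transcription of its euphemistic wording would never contain.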

The need to create online access to allow collections to reach a wider group of users means that repositories do continue to look to mass digitization as part of their strategies. This project would create a toolkit layer based on international interoperability standards that allows online collaboration to provide navigation points, keyword tagging, and annotation. IIIF and Open Annotation standards mean the results will be extendable to other and future systems, and best situated for long term digital preservation.
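For readers unfamiliar with what a standards-based annotation looks like, the sketch below builds a keyword tag attached to a rectangular region of one page image. It uses the W3C Web Annotation Data Model, the successor to the Open Annotation format named above, swapped in here because its JSON form is compact; the function name, the canvas URI, and the coordinates are hypothetical.

```python
import json


def region_tag_annotation(canvas_uri, xywh, tag):
    """Return a Web Annotation that tags a rectangular region of a page.

    canvas_uri identifies the page (for example an IIIF canvas);
    xywh is (x, y, width, height) in pixels of that page.
    """
    x, y, w, h = (int(v) for v in xywh)
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "tagging",
        "body": {"type": "TextualBody", "value": tag, "purpose": "tagging"},
        "target": {
            "source": canvas_uri,
            # A media fragment selects the region rather than the whole page.
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


annotation = region_tag_annotation(
    "https://example.org/iiif/reel-001/canvas/42",  # hypothetical page URI
    (100, 200, 800, 300),
    "boarding school deaths",
)
serialized = json.dumps(annotation, indent=2)
```

Because the tag is plain JSON that refers to the page by URI, it can be stored, exchanged, and displayed by any system that understands the standard, independent of the software that created it.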

Who is the audience and what are their information needs?

The use case of the IDA project will address the absence of access to open public government records relating to the buildup to and operations of the US government boarding and day schools for Indians, from the period of the Indian Wars up to the reforms of the “Indian New Deal” in the 1930s, and to records related to tribal land claims in the same period. These records are not currently available within New Mexico, where they were created. The information is sought by Indigenous peoples of New Mexico and others. To take a small example, the New Mexico State Coordinator of Tribal Libraries often receives reference requests related to information in the documents the project will make available. The Coordinator notes that having even just a pilot project of documents of student names online (http://native-docs.org) fills a need no one has been able to respond to before: connecting people affected by these government policies, generations onward, with the information they’re seeking. Online access is essential, as few can afford to take time off, travel, and support research during business hours at a repository. A recent court case shows the need for information from public documents of this era: a federal court ruled in favor of an Oklahoma tribe's claims for overdue payments from the federal government dating back to 1932, but the tribe's attorney noted that they had to spend six years getting the documents from the US National Archives. New tools are needed to prevent this kind of barrier to access to information.

The audience for our use case includes members of the 23 tribes of New Mexico plus Hopi (geographically separate but culturally and genealogically related), and other descendants of boarding school students separated from their home communities and sent to boarding schools in New Mexico.

Part of the design of the government Indian boarding schools was to widely disperse and mix students to achieve fuller separation from their communities. This means, for example, that the records from the boarding schools in New Mexico do not show all students from New Mexico. A longer-range goal is therefore to provide access in the Indigenous Digital Archive interface to the records of the government Indian boarding and day schools from across the US, as well as to other government records from this period, before government policy changed to Indian Self Determination. The audience for our use case will include descendants of Indian boarding school attendees, staff, and other community members throughout the nation.

Creating effective access locally is particularly important now. There is a window of opportunity in which tribal people in New Mexico can understand the records with the input of today's elders, who were young children when the later records were created, and of others who still have first-hand stories from their parents or grandparents from the 1920s-1930s and even earlier.

Please list your team members and their qualifications.

George Oates (Good, Form & Spectacle), User Interface Design. George designed Flickr! George is a world leader in developing generous interface design. She's developing innovative user experiences in exploring data for the British Museum and the Wellcome Library, and has consulted for numerous cultural institutions including the Smithsonian and Historypin.

Tom Crane, Adam Christie, Edward Silverton, Software Engineering (Digirati). Digirati Tech Lead Tom Crane has worked on large projects for Microsoft, Sony, Oxford University Press, English Heritage, the Wellcome Library and many others, focusing on web publishing and content management. He shows how creative systems integration can be used to connect cultural heritage collection data, digitization output and content management systems, using linked data and semantic web technologies. He is an editor of the international IIIF specification. Adam and Edward, Senior Consultants, have developed apps and systems for clients including the British Library, Wellcome Trust, and Sun Microsystems. Digirati has debuted a new tool assisting management of computer-aided keyword tagging (http://digirati.com/powertagging).

Dr. Anna Naruta-Moya, Project Director. Formerly an archivist for the US National Archives and for the Hoover Institution Archives of Stanford University, she has experience in the paper version of “big data.” She is a 2015 Getty Institute Summer Fellow, holds a PhD from UC Berkeley, is a member of the Society of American Archivists (SAA) Archival Standards Committee and of the Academy of Certified Archivists, is an SAA Digital Archives Specialist, and is a Research Associate Professor at the University of New Mexico. She is married to the Tewa artist Daniel Moya (P’o Suwae Ge Owingeh), who was raised on the reservation by his grandmother, a second-generation early government Indian boarding school student (of "the Starving Years").

Caren Gala (Nambe), Communications Coordinator. Caren has played major roles in planning, organizing, and executing major events such as the Southwestern Association for Indian Arts Santa Fe Indian Market and the International Folk Art Market.

Dr. Robert Sanderson, Technical Advisory Panel member, is Information Standards Architect for Stanford University Digital Library Systems and Services, and an editor of the IIIF and Open Annotation international standards.

Glen Robson, Technical Advisory Panel member, is Head of Systems Unit for the National Library of Wales, an adopter of IIIF and the Open Annotation W3C formats. This adoption has allowed marked advances in the usability of collections, for example better interaction with digitized newspapers (http://newspapers.library.wales) and the Cynefin: Mapping Wales' Sense of Place project (http://cynefin.archiveswales.org.uk), in which people volunteer to transcribe and geolocate entries in church tithing records to create a map and database that speaks to detailed land use and community histories.

Advisory Panel members and qualifications are detailed in the attachment “Advisory Panel”.

Organization name and location (City, State).

The Museum of Indian Arts and Culture (of the State of New Mexico), Santa Fe, New Mexico, leads the IDA project in collaboration with the Indian Pueblo Cultural Center (jointly operated by all 19 Pueblo tribes) and the State Library Tribal Libraries Program.

Attachments (4)

User Stories.pdf

User stories for our use case of the Indigenous Digital Archive toolkit. Our use case will create access to open public records related to the government Indian boarding schools and to land in the key period from the end of the Indian Wars in the 1800s into the brief period of reforms under the Indian New Deal in the 1930s. The toolkit we'll make will extend the popular opensource software Omeka to create more effective access to and collaboration with any kind of archival or print collections.

What's possible in User Interface.pdf

A list of what will be possible in the user interface helped scope our open software tool layer design. Also: Performance Goals and Intended Results.


Slides from a talk on the documents, the people affected, and the technology we’ll use. Quite a few images from the statement of problem and context of people affected. Our tech solutions rely on interoperability among systems and international standards like Open Annotation. No silos for us!

Advisory Panel.pdf

Thank you to the members of our highly skilled Advisory Panel! Read more about this great group here.


Join the conversation:

Photo of Olivier

Dear Anna,
I really like your project. I was wondering if you knew about this similar initiative from our friends down under: http://mukurtu.org/
Their project is an open source CMS and seems to stem from a traditional knowledge perspective while yours, historical archives... but both are open source.

Photo of Anna

Hi Olivier,

Thank you for your support and encouragement!

Good eye! I've long been a follower of Mukurtu, the opensource CMS project of Kimberly Christen Withey at Washington State University that stemmed originally from her dissertation fieldwork with Australian Aborigines. Mukurtu has built an effective system for gating and providing access to online material depending on one's tribal affiliation, clan, gender, age group, and other considerations. For example, the online system could host images of objects and photographs from throughout a tribal museum's collections, and someone from, say, that group's Southern moiety viewing items in the system would not run the risk of seeing something restricted to people of the Northern moiety. That is just one aspect of a potential viewer's identity; Mukurtu handles multiple aspects at once. Access can even be gated according to the season of the year, where that's relevant to the type of material. And now the project is also making available a great round of training for staff of tribal archives, museums, and libraries on stewardship of their digital heritage assets. https://www.imls.gov/node/60302/

I'm also really excited by a project the University of Massachusetts Amherst is planning with Mukurtu, spurred by the donation to Amherst of more than 1500 books by Native American authors from 1772 onward, to create a platform for culturally-appropriate gated access to Native-authored books and archival collections. (The Digital Atlas of Native American Intellectual Traditions (DANAIT), http://is.gd/DANAITplanning.) Like our Indigenous Digital Archive project, the DANAIT project will also work with the Digital Public Library of America http://dp.la. And the great news is that the DPLA provides a means to search content across the portals of different institutions and projects, so that once you've arrived at the DPLA, you don't necessarily already have to know about a particular project to find content of interest to you.

Unlike the current Mukurtu and DANAIT projects, the Indigenous Digital Archive focuses on open public records. (Although we will have an option for redaction, per our advisory board, in the event that there's content that warrants it.) We're working to address the problem of creating access to masses of government documents recording historical actions involving Native individuals, families, and communities. (No one should have to spend six years getting documents from the US National Archives!) Part of the strategy for this is to take advantage of the international standards that have evolved in the last 5 years for the International Image Interoperability Framework (IIIF) and Open Annotation. The data modeling behind these standards has set the stage for and spurred work towards a toolkit layer that allows tagging or otherwise annotating, for example, a particular section of a page in a long sequence of digitized images, in an internationally accepted format that transcends any particular software system. This is perfect for addressing the issue of how to meaningfully navigate and collaborate with mass-digitized documents, like the lengthy sets of bureaucratic documents that came to typify 19th and 20th century federal government operations. We haven't yet had a toolkit that lets you annotate or tag at any finer grain than the entire digital object (such as a whole pdf or sound file) or the catalog record.

We explored a bit the idea of using Mukurtu as the base CMS. Mukurtu has just undergone a major upgrade, but hasn't yet incorporated IIIF compatibility, which we totally need. (But who knows: given that the DPLA is interested in encouraging IIIF compatibility to improve the user experience, perhaps IIIF compatibility for Mukurtu will be one of the technical action items that emerges from the DANAIT planning process! And then the toolkit layer our project will develop would also be available to the Mukurtu system.)

The widely adopted opensource CMS Omeka, on the other hand, has already developed IIIF compatibility in its improved Omeka-S code, which makes it an efficient starting point for us. Like Omeka, we're using IIIF for a whole host of reasons beyond the tagging capabilities, such as “future-proofing” our digital images, enabling easy quoting and sharing of images, and, as Cogapp's Andy Cummins puts it, many “advanced features for free” http://mw2016.museumsandtheweb.com/paper/iiif-unshackle-your-images/.
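To show what "easy quoting and sharing of images" can mean in practice, here is a small sketch that builds an IIIF Image API URL for a cropped region of a page, so that a passage can be quoted by link alone. It assumes the Image API 2.x URL pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}; the helper name `iiif_image_url`, the server address, and the identifier are hypothetical.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region=None, width=None, fmt="jpg"):
    """Build an IIIF Image API 2.x URL for a whole image or a cropped region.

    region is (x, y, width, height) in pixels of the full image, or None
    for the whole image. width scales the result to that many pixels wide,
    or None for full size ("full" in 2.x; Image API 3.0 uses "max" instead).
    """
    region_part = "full" if region is None else ",".join(str(int(v)) for v in region)
    size_part = "full" if width is None else f"{int(width)},"
    # Identifiers must be percent-encoded, including any "/" they contain.
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region_part}/{size_part}/0/default.{fmt}"


url = iiif_image_url(
    "https://example.org/iiif",  # hypothetical image server
    "reel-001/page-0042",        # hypothetical image identifier
    region=(100, 200, 800, 300),
    width=600,
)
```

The resulting link returns just the selected strip of the page, scaled to 600 pixels wide, without anyone having to download and crop the image first.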

Possibly more detail than you wanted to know! :-) Thank you very much for taking a look at our project and for your thoughtful comments. It's good to get the feedback.

Best regards,

Photo of Olivier

Thank you Anna for these details!