
AM Impact: Digital Scholarship

A guide to the content and functionality in your AM Impact plan, with tips and tricks for searching and access to a range of support tools.

From Indigenous journalism to classified government files, each AM collection contains rich text and data, including transcriptions and metadata. AM datasets have contributed to large-scale, multi-source data mining projects as well as serving as concise single datasets for individual research projects. This data is available to all researchers and librarians at AM Impact institutions. This page provides guidance on how to make a data request, and information on AM's support of Digital Humanities projects.

What data is available?

Employing artificial intelligence, machine learning and neural networks, Handwritten Text Recognition (HTR) allows keyword searching across handwritten manuscript material. Across all primary source collections published on the AM Quartex platform, HTR transcript technology also allows uncorrected transcripts to be downloaded in .txt format.

Collections such as East India Company and Colonial Caribbean have millions of handwritten pages that could be used for text and data mining; other collections, such as Literary Print Culture and Life at Sea, have hundreds of thousands of pages. Mexico in History is a landmark collection: it is the first AM database almost entirely in Spanish to have HTR applied to it.
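As a starting point for working with downloaded transcripts, the following is a minimal sketch of counting word frequencies across a folder of .txt files. The folder layout, the UTF-8 encoding and the simple tokeniser are assumptions made for illustration, not a description of how AM delivers or structures its data. Because the transcripts are uncorrected, expect recognition errors to appear in the counts.

```python
from collections import Counter
from pathlib import Path
import re

# Assumed tokeniser: runs of letters (accented letters included),
# optionally followed by an apostrophe and more letters ("company's").
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens, dropping digits and punctuation."""
    return [match.group(0).lower() for match in TOKEN_RE.finditer(text)]


def word_frequencies(folder: str) -> Counter:
    """Count tokens across every .txt file directly inside `folder`.

    `folder` is a hypothetical local directory of downloaded transcripts.
    UTF-8 is assumed; undecodable bytes are replaced rather than raising.
    """
    counts: Counter = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.update(tokenize(text))
    return counts
```

Calling `word_frequencies("transcripts").most_common(20)` on such a folder would list the twenty most frequent tokens; the folder name here is a placeholder.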

A handwritten page next to an uncleaned text file transcript

A11's response to 1983 Summer directive. Digitised from University of Sussex Special Collections in Mass Observation Project.

Optical character recognition (OCR) is a technology that converts printed and typed documents into machine-readable files. Collections like the ones contained in AM Archives Direct - a unique digitisation of key Foreign Office and other British government file classes from The National Archives in London - contain millions of pages of typed and printed dispatches, memoranda, faxes, telegrams and newspaper articles from around the world. Other collections, such as Interwar Culture and Indigenous Newspapers in North America, contain newspapers and periodicals that also benefit from OCR technology, producing transcripts that can be used for text and data mining projects.

A typed telegram next to an uncleaned text file transcript

Case detailing alleged crimes by Nelson Mandela, African National Congress (ANC) leader, imprisoned in South Africa, Winnie Mandela, his wife, and Zindzi Mandela, his daughter, 1990. Digitised from The National Archives, UK, in Apartheid South Africa, 1948-1994.

The first audio descriptions published by AM were for silent film collections like Victorians on Film and British Newsreels, but since the publication of Hindi Cinema in 2024, audio descriptions have become possible for non-silent films too, with descriptive audio playing in the gaps of the original audio track.

Audio descriptions aim to provide a clear summary of on-screen activity, so that visually impaired users can interrogate the videos fully. These descriptions also provide additional transcripts and textual data that can be requested for use in text and data mining projects.

A still from an old film showing a young magician in a suit about to lift a cone to reveal something underneath it, next to a transcript in which the word "kitten" is highlighted

The Magic Extinguisher, 1901. Digitised from The British Film Institute in Victorians on Film.

Each AM collection contains metadata: the data that describes the source in the catalogue from the archive where the physical copy of the source is held. In addition to the archive's own metadata, the Editorial team at AM also enriches the metadata in consultation with an expert academic board to aid discovery of material.

Common metadata categories include author, title, date, document type, place, language, and archive collection and sub-collection names. Individuals and organisations are often captured in the metadata as well.
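To illustrate how metadata fields like these might be used to narrow a corpus before analysis, here is a minimal sketch that filters records by field value. It assumes the metadata has been exported as CSV with one column per category; the column names and the sample rows are invented for illustration and do not reflect an actual AM export format.

```python
import csv
import io


def filter_records(csv_text: str, **criteria: str) -> list[dict]:
    """Return rows whose fields match every criterion, ignoring case.

    `criteria` maps a column name to the value it must equal,
    for example language="Spanish". Missing columns never match.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    matches = []
    for row in reader:
        if all(
            (row.get(field) or "").strip().lower() == value.lower()
            for field, value in criteria.items()
        ):
            matches.append(row)
    return matches


# Invented sample rows, for illustration only (not real catalogue data).
SAMPLE = """title,date,document_type,language
Letter to the Governor,1801,Correspondence,English
Carta al virrey,1795,Correspondence,Spanish
Ship's log,1812,Logbook,English
"""

spanish_letters = filter_records(
    SAMPLE, document_type="Correspondence", language="Spanish"
)
```

Exact matching keeps the sketch short; real metadata often needs looser matching, for instance on date ranges or on fields that hold several values.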

An image of a crowd gathered around a tree in Africa, with an image of the metadata associated with the image next to it.

[A crowd listening], Photographs, n.d. Digitised from Bodleian Libraries in Africa and the New Imperialism: European Borders on the African Continent, 1870-1914.

All primary source collections published by AM include a number of research and teaching tools, from guides to archival collections to academic essays and digital exhibitions. In addition, a number of AM collections contain visualisations of archival data, from interactive maps of sea journeys and commodity trade to a visualisation of more than 5,000 manuscript items from The Florence Nightingale Papers, revealing developing themes in her correspondence over time in Medical Services and Warfare.

The Eighteenth Century Drama database includes an open-access feature called The London Stage Database, which extracts textual data from playbills, newspapers, theatrical diaries and more. It serves as a master directory of actors, plays, theatres and more in London between 1660 and 1800. The database is an analysis tool that illustrates trends via data associations and visualisations, and is cross-searchable, providing researchers with new pathways into digital materials.

All the data used to create these secondary features is also available to institutions for their own teaching and research projects.

Images of Florence Nightingale's letters and manuscripts and a still from a coxcomb data visualisation.

The Nightingale Papers - Interactive Browsing Tool in Medical Services and Warfare.

Text and Data Mining: policy, restrictions and licence agreement

AM recognises the benefits that Data Mining has for new research in the Humanities and Social Sciences and we are committed to enabling these research methods on the following principles: 

  1. We allow Data Mining/Text Analysis by "Authorised Users" for fair use/academic research.
  2. Secure transfer of data to a university server can be made via FTP on submission of the information form.
  3. Data can be extracted from the main collection website by automated software, provided we are informed in advance so that we can monitor server performance. We reserve the right to restrict this operation if it impacts standard online usage for our customers generally.
  4. We are committed, where possible, to applying text analysis and data visualisation functionality within our latest products.
  5. Data mining as an activity is no different from all other usage of our products. It has to conform to all the standard requirements in our licence agreement e.g. it is carried out by Authorised Users under Fair Use academic purposes.
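Where automated extraction has been notified and agreed as described in principle 3, one common way to limit the load on a live site is to space requests out. The sketch below shows a generic throttling helper. The `fetch` callable and the two-second interval are placeholders chosen for illustration; they are not an AM interface or an AM-specified rate limit, and any actual limits should be agreed with AM.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled(
    items: Iterable[T],
    fetch: Callable[[T], R],
    min_interval: float = 2.0,  # assumed value, not an AM-specified limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[R]:
    """Call `fetch` on each item, leaving at least `min_interval` seconds
    between the start of one call and the start of the next.

    `sleep` and `clock` are injectable so the helper can be tested
    without real waiting.
    """
    last_start = None
    for item in items:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield fetch(item)
```

In use, `fetch` would be whatever function retrieves a single document, and `items` the list of document identifiers or addresses to retrieve.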

Extract of Standard User Licence Agreement:

Subject to all other provisions of our User Licence Agreement and save for the circumstances (as set out in section III of this Agreement) in which the Licensor’s prior written consent is required, the Licensee and the Authorised Users may use the Licensed Materials to perform and engage in text mining /data mining activities in relation to the Licensed Materials for legitimate academic research and other non-commercial educational purposes, without obtaining the Licensor’s prior written consent.

Electronic analysis of data from our products is permitted as outlined above. However, two considerations mean that we need additional processes in place to ensure the following:

  1. Performance of live product websites for standard usage is not damaged by automated data mining software crawling online websites.
  2. Large volumes of data extracted, or full data sets provided from the products, are stored securely, so that the data is not exposed to unauthorised or open usage in breach of the User Licence Agreement.

As a result, any significant automated data extraction, or any provision of large volumes of data, is unauthorised unless we have received a written request and, in the case of offline data supply, have granted permission in writing. Provided that suitable assurances as to the purpose and security of the research are given on completion of a form, this permission will not be unreasonably withheld.

Extract of relevant section of standard user licence agreement:

Section III

In order to protect the integrity of server performance for the Licensee’s customers, automated extraction of data directly from the Licensed Materials online (for example only, by the use of data mining software) is only permitted after notification to the Licensor for performance monitoring purposes, and if such automatic extraction of data does not affect the performance of the Licensor’s servers. In the event that the Licensor’s servers are negatively impacted, the Licensor reserves the right to decline and prevent access to the Licensed Materials to stop any disruption to the Licensor’s business.

As standard with no further permissions: 

  1. Secure transfer of data to a university server can be made via FTP on submission of the information form.
  2. An offline copy of data can be provided on a hard drive for secure local storage and analysis. Under current agreements this is limited to a 3 year storage period, after which a renewal can be requested or, if the project is complete, the original data (not any research material) must be deleted.

Extract of relevant part of licence:

On submission to the Licensor of completed form outlined in Appendix A, an offline copy of data from the Licensed Materials for Data/Text Mining purposes can be made available to be securely hosted locally and accessed by Authorized Users. Local hosting for each Data/ Text Mining purpose must not exceed five years unless further written consent is provided by Licensor; after which agreed period the data must be returned or confirmed as destroyed within 15 days.
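For project planning, the hosting limit and the 15-day return-or-destroy window in the extract above can be turned into calendar dates. The following is a minimal sketch only: the function names are invented, the period is passed in as a parameter because the agreed term may differ between agreements, and the handling of 29 February is an assumption. The licence text itself, not this calculation, determines the actual obligations.

```python
from datetime import date, timedelta


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later.

    Assumption: a 29 February start falls back to 28 February
    when the target year is not a leap year.
    """
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def retention_deadlines(
    received: date, years: int, grace_days: int = 15
) -> tuple[date, date]:
    """Return (end of local hosting, last day to return or destroy data)."""
    hosting_ends = add_years(received, years)
    return hosting_ends, hosting_ends + timedelta(days=grace_days)


# Example with an arbitrary, made-up receipt date and a five-year term.
hosting_ends, destroy_by = retention_deadlines(date(2025, 1, 10), years=5)
```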

Licensor and copyright holder of Licensed Materials must be acknowledged in published text analysis research results derived from the Licensed Materials.

Request data

At AM, we recognise the benefits that Data Mining has for new research in the Humanities and Social Sciences. Our aim has been to make the process of requesting data as simple as possible, and to make that data available for free.

In order to make a text and data mining request, simply follow the link above to fill in a form. The form will go to our Customer Support team, who will then process the request and send the data to you via FTP.


Webinar: Digital Humanities with AM

Watch a recording of a webinar with Dr Ben Lacey, Head of Engagement at AM, for an overview of how you can use your AM collections for Digital Humanities scholarship. In the recording, Ben explains how to request data from the collections for use in text and data mining. He also presents a case study of a project that used a full-text data set, as well as examples of how instructors have applied the content of these collections in a more introductory way, focusing on digital presentation of student work.
