Gale Digital Scholar Lab
Gale Digital Scholar Lab is a cloud-based platform that enables students and researchers to access content and OCR data from Gale Primary Sources and analyse these archives with text and data mining tools.
Users at any level will be able to work easily and efficiently with large corpora of text data, organising custom data sets and digital tools that reflect the unique needs of individual researchers or entire classrooms.
In the Gale Digital Scholar Lab, you can:
- Access a broad range of manuscripts and material from The University of Manchester’s Gale Primary Sources holdings
- View the original Primary Source document and OCR text side by side. The Primary Source view highlights the keywords used to perform the search and indicates the source from which the document was derived.
- View the OCR confidence rating of a document and learn how the OCR text was generated for these collections.
- Construct custom content sets from the range of Gale Primary Sources available
- Analyse content sets with powerful text mining tools
- Organise and manage your research
- Export tabular data and visualisations in standard formats
Infrastructure designed for digital scholars
The complex nature of the research workflow is often a barrier to entry into digital projects. The Gale Digital Scholar Lab brings content, tools, and organisation to complex projects in an accessible way.
The workflow to create a project is illustrated on the landing page, where the main components of the Lab are displayed. The workflow begins with building a personalised archive or content set through the database-style search functionality. It continues with the analysis of your content set using a variety of text-mining methods, and ends in the desktop view, where you can manage, organise and share the research accomplished in the Lab.
Search interface and search results
In the search results, you can see a snapshot view of the high-level metadata about each document. The metadata includes many facets that you would expect to see about each document, such as the collection from which it came, the date of publication, and the author. The initial lines of OCR text for each document are also included, as well as the OCR confidence rating.
OCR Confidence is a metadata field surfacing the OCR algorithm’s confidence in its results. Note that OCR Confidence is not the same as OCR accuracy. The OCR process can be affected by several things, including the age or condition of the document, the age of the digital archive, or the quality of the scan. Surfacing the OCR Confidence score allows researchers to quickly identify documents where there might be an issue with the underlying OCR and tag them for further close reading to check the quality of the OCR.
From the search results page, you can further limit your search by selecting relevant databases or subjects for your specific content set.
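The idea of using OCR Confidence to pick out documents for close reading can be sketched in a few lines of Python. This is only an illustration and not part of the Lab: the record fields (`doc_id`, `title`, `ocr_confidence`), the 0-100 percentage scale, the threshold of 80 and the sample records are all assumptions made for the example, not the Lab's actual metadata format.

```python
from dataclasses import dataclass


@dataclass
class DocumentRecord:
    """Hypothetical record; field names are assumptions, not the Lab's schema."""
    doc_id: str
    title: str
    ocr_confidence: float  # assumed to be a percentage between 0 and 100


def flag_for_close_reading(records, threshold=80.0):
    """Return records whose OCR confidence is below the threshold, lowest first."""
    flagged = [r for r in records if r.ocr_confidence < threshold]
    return sorted(flagged, key=lambda r: r.ocr_confidence)


# Made-up sample records, for illustration only.
records = [
    DocumentRecord("doc-001", "Sample newspaper page", 96.5),
    DocumentRecord("doc-002", "Sample handwritten letter", 58.0),
    DocumentRecord("doc-003", "Sample pamphlet", 79.9),
]

flagged = flag_for_close_reading(records)
for record in flagged:
    print(f"{record.doc_id}: {record.ocr_confidence:.1f}% confidence, compare OCR text with the page image")
```

Because OCR Confidence is not the same as OCR accuracy, a low score is only a prompt to check the text against the original image, and any threshold is a judgement call for the individual project.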
Creation of custom sets
Having the ability to custom-curate a digital archive on the fly is one of the biggest timesaving aspects of the Lab. With potential access to more than 166 million pages* of unique Gale Primary Sources, users can quickly search, filter and sort the archives to which their institution has access and then create a custom content set to assist research. The speed and ease of creating personalised corpora are what really set the Gale Digital Scholar Lab apart.
The organisation and formatting of content sets can be labour-intensive and requires significant computational infrastructure for hosting and use. The Gale Digital Scholar Lab allows faculty, students, and staff to focus on their core research questions, instead of spending months, or even years, downloading, cleaning, preparing and formatting texts for analysis.
*The number of pages will change based on the University of Manchester’s Gale Primary Source holdings and what Gale Primary Sources are available in the Lab.
Supports a broad range of users
With a database-style interface, the Gale Digital Scholar Lab provides a familiar navigational style for students who are new to text data mining as well as seasoned scholars.
Adding value to Gale Primary Sources collections
For librarians, the Gale Digital Scholar Lab gives staff a clear path to advertise their Gale Primary Sources collections to a new user: the digital scholar. It also brings a broad message of support to scholarly research across the campus, and new awareness of the library as the centre of scholarly information.