
Text and Data Mining

A How Do I Guide that covers resources for Text and Data Mining through the University Library

Library resources

The University provides access to a range of scholarly resources that can be used for text and data mining (TDM). If you want to use a particular resource for TDM, you must follow responsible research conduct and first confirm that the publisher permits TDM. One way to check this is through the library search interface: on a resource's library search page, go to the "View Online" section and, under the "Full text availability" heading, select "show license" beside the publisher. If the license explicitly permits or prohibits text and data mining, it will say so there.

This document lists publishers accessible through the University Library that permit TDM of their materials. Publishers of academic journals, books, and other scholarly works each have their own agreements with the University Library, and these agreements determine whether students and researchers may use the publisher's resources for TDM. When viewing a resource in the library search interface, you can find its publisher in the "details" section. You can also search for a publisher's name in the "any field" option of advanced search.

Unless otherwise specified, any output produced by TDM using resources from these publishers may be used only for academic, research, or educational purposes, and never for commercial purposes. Some publishers not listed here may also allow TDM of their materials in specific circumstances or on request. These permissions depend on the licensing agreements between publishers and the University of Adelaide, and may change. This document also explains how to access each publisher's text and data for mining; note that in some cases you may need to request a key or token.

The following publishers allow text and data mining of their materials available through the University of Adelaide Library:

Open access resources

Numerous open access resources can be used for TDM. When using them, it is just as important to follow ethical guidelines and to comply with any licensing requirements.

Open access resources are most commonly released under Creative Commons licenses, usually one of the following:

  • CC BY license: commercial and non-commercial use permitted
  • CC BY-NC-SA license: non-commercial use only; adaptations must be shared under the same license

Both licenses require that appropriate credit is given, a link to the license is provided, and any changes made are indicated.
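Many open access records give their license as a Creative Commons URL, and the conditions above can be read off the license code in that URL (for example `by` or `by-nc-sa`). The following is a minimal sketch, assuming the license is supplied as a standard creativecommons.org `/licenses/<code>/<version>/` URL; `describe_cc_license` is a hypothetical helper written for illustration, and it is no substitute for reading the license itself and any publisher terms.

```python
from urllib.parse import urlparse

# Hosts that serve Creative Commons license deeds.
CC_HOSTS = {"creativecommons.org", "www.creativecommons.org"}


def describe_cc_license(url: str) -> dict:
    """Read the conditions encoded in a Creative Commons license URL.

    Hypothetical helper for illustration: it only parses the license code
    in the URL path (e.g. "by", "by-nc-sa") and does not read the license text.
    """
    parsed = urlparse(url)
    if parsed.netloc.lower() not in CC_HOSTS:
        raise ValueError(f"not a Creative Commons URL: {url}")
    # Assumed path shape: /licenses/<code>/<version>/
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] != "licenses":
        raise ValueError(f"unrecognised license path: {parsed.path}")
    elements = set(parts[1].lower().split("-"))
    if "by" not in elements:
        raise ValueError(f"not an attribution (BY) license: {parts[1]}")
    return {
        "attribution_required": True,                 # every BY license requires credit
        "commercial_use": "nc" not in elements,       # NC = NonCommercial
        "derivatives_allowed": "nd" not in elements,  # ND = NoDerivatives
        "share_alike": "sa" in elements,              # SA = adaptations under the same license
    }
```

For example, `describe_cc_license("https://creativecommons.org/licenses/by-nc-sa/4.0/")` reports `commercial_use` as `False` and `share_alike` as `True`. The sketch deliberately raises an error for anything it does not recognise, such as a CC0 public domain dedication (published under `/publicdomain/` rather than `/licenses/`), so that such resources are checked by hand.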

Most sources of open access materials, such as Trove and the Internet Archive, do not apply a single license to all of their resources. Whether you may use an individual resource for TDM depends on that resource's own copyright and licensing conditions. The following is a selection of sources of open access text and data:

  • Australian Data Archive (ADA): archive that collects and preserves Australian and international research data for educational and research use. Access requires a login and agreement to its terms and conditions.
  • CORE: UK-based service providing aggregated access to open access repositories and journals from around the world. Data can be downloaded directly or accessed through the CORE API.
  • Crossref: DOI registration agency whose REST API provides metadata for works from participating publishers, including license information and, where publishers supply them, links to the full text.
  • data.gov.au: source of Australian government data.
  • HathiTrust Digital Library: public domain works made available for download by researchers. Access is available, with approval, through a variety of methods, including the Bibliographic API.
  • Internet Archive: collects millions of resources, including texts and e-books.
  • Language Data Commons of Australia: searchable collections of spoken and written language relevant to Australia, including transcriptions of recorded spoken English, collections of Indigenous languages, talkback radio, and oral histories from various periods.
  • Trove: provides access to resources from Australian libraries, museums, and other institutions, in partnership with the National Library of Australia and other organisations. The Trove API can be used to download large amounts of data.
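As an illustration of the Crossref entry above, here is a minimal Python sketch that looks up one DOI through the Crossref REST API's `/works/{DOI}` route and extracts the license URLs and any full-text links that the publisher has marked for text mining (`"intended-application": "text-mining"`). The function names `fetch_work` and `tdm_info` are hypothetical. The `mailto` parameter identifies you to Crossref, which is the courtesy it asks of API users. Whether a link actually returns full text depends on the publisher and on your access rights, and you should still confirm that TDM is permitted before mining anything.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str, mailto: str) -> dict:
    """Fetch the Crossref metadata record for one DOI (needs network access)."""
    query = urllib.parse.urlencode({"mailto": mailto})
    url = CROSSREF_WORKS + urllib.parse.quote(doi) + "?" + query
    with urllib.request.urlopen(url, timeout=30) as resp:
        # The record itself sits under the "message" key of the response.
        return json.load(resp)["message"]


def tdm_info(work: dict) -> dict:
    """Collect license URLs and text-mining links from a Crossref work record."""
    licenses = [
        {"url": lic.get("URL"), "content_version": lic.get("content-version")}
        for lic in work.get("license", [])
    ]
    # Keep only links the publisher has flagged for text mining.
    tdm_links = [
        {"url": link.get("URL"), "content_type": link.get("content-type")}
        for link in work.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]
    return {"licenses": licenses, "tdm_links": tdm_links}
```

Usage, for a DOI of interest and your own email address: `tdm_info(fetch_work(doi, mailto="you@example.edu"))`. Keeping the network call separate from the parsing makes it easy to check the extraction logic against saved records, and to respect rate limits by caching responses rather than requesting the same DOI repeatedly.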