Text and Data Mining

A How Do I Guide that covers resources for Text and Data Mining through the University Library

Introduction

Text and data mining (TDM) describes a range of processes used to analyse sets of text or data to find patterns and relationships. These processes are performed by automated computer tools that can process large amounts of text or data without anyone needing to read or view the materials involved. For this reason, TDM is considered ‘non-consumptive’ research.

The findings of TDM can be useful for research and study across many disciplines, and can be combined with other research methods in many ways. TDM also has commercial applications for businesses and enterprises, but commercial use is outside the scope of this guide and is not covered in detail here.

The University of Adelaide Library provides access to a number of resources that can be used for TDM. However, TDM is a new and developing research method, and many publishers do not permit the use of their materials for TDM. It is important to be conscious of copyright and licensing when selecting resources to use for TDM, and to abide by ethical research practices.

See the glossary of terms for clarification on some of the specialised language used throughout this guide.


Data mining vs. text mining: What's the difference?

Text mining, often referred to as text analysis, involves the computational analysis of unstructured, natural-language texts, such as literature, to gain insight into patterns, relationships, and trends in the language. The unstructured texts used for text analysis are often supported by structured metadata, such as publication dates and authors, which gives context to the texts. Because it combines unstructured text with structured data, text mining is sometimes referred to as text and data mining or text data mining. The following video provides a brief overview of how text mining works:
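
As a rough illustration of the idea described above, the following Python sketch counts word frequencies in a tiny corpus and groups the results by publication year. It is a minimal example only: the corpus, the author names, and the function names `tokenize` and `term_counts_by_year` are all hypothetical and invented for illustration, and real text mining projects would normally use much larger collections and more careful language processing.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: each unstructured text is paired with
# structured metadata (author and publication year).
corpus = [
    {"author": "Author A", "year": 1850,
     "text": "The sea was calm. The sea was grey."},
    {"author": "Author B", "year": 1920,
     "text": "The city was loud, and the city never slept."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts_by_year(documents):
    """Return a mapping of publication year to a Counter of word frequencies."""
    counts = {}
    for doc in documents:
        counts.setdefault(doc["year"], Counter()).update(tokenize(doc["text"]))
    return counts


counts = term_counts_by_year(corpus)
for year in sorted(counts):
    # Show the three most frequent words for each publication year.
    print(year, counts[year].most_common(3))
```

The metadata is what makes the comparison possible here: the word counts come from the unstructured text, while the grouping by year comes from the structured fields attached to it.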

 

Data mining is the process of using computational analysis tools to discover patterns, relationships, and trends in structured sets of data. These datasets are typically very large and need to be organised into specific, defined formats before they can be mined. See the video below for further information about data mining and some of the techniques involved:
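
To make the contrast with text mining concrete, the sketch below looks for a simple pattern in structured data: which subject tags most often appear together in the same record. The records, field names, and the function name `pair_counts` are hypothetical and chosen only for illustration. Counting co-occurring pairs is one basic technique among many, and it is not presented here as the method used by any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical structured dataset: every record has the same defined fields.
records = [
    {"id": 1, "subjects": ["history", "law"]},
    {"id": 2, "subjects": ["history", "law", "politics"]},
    {"id": 3, "subjects": ["biology", "chemistry"]},
    {"id": 4, "subjects": ["law", "politics"]},
]


def pair_counts(rows):
    """Count how often each pair of subjects appears in the same record."""
    pairs = Counter()
    for row in rows:
        # Sort and de-duplicate so each pair is counted once per record
        # and always appears in the same order.
        for pair in combinations(sorted(set(row["subjects"])), 2):
            pairs[pair] += 1
    return pairs


pairs = pair_counts(records)
# Show the two subject pairs that co-occur most often.
print(pairs.most_common(2))
```

The example only works because every record follows the same format, which is why structured datasets usually need to be organised consistently before any mining takes place.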


Further reading

What can TDM be used for?

TDM processes can be used to investigate a particular research question or topic, to examine a particular text or dataset in more depth, or to research text and data mining methodologies themselves. Data mining is often relevant to STEM research, while text mining is often useful for research in the humanities, though the increasing overlap between disciplines means that either process can be used for many purposes across different fields of study. Beyond academia, businesses and other organisations also use TDM for both commercial and non-commercial purposes.

Examples of TDM in use