1st Edition

Data Analytics for Discourse Analysis with Python: The Case of Therapy Talk

By Dennis Tay
Copyright 2024
    190 Pages, 26 B/W Illustrations
    Published by Routledge

    This concise volume, using examples of psychotherapy talk, showcases the potential applications of data analytics for advancing discourse research and other related disciplines.

    The book provides a brief primer on data analytics, defined as the science of analyzing raw data to reveal new insights and support decision making. Data analytics is currently underutilized in discourse research, and Tay draws on the case of psychotherapy talk, in which clients’ concerns are worked through via verbal interaction with therapists, to demonstrate how it can address both practical and theoretical concerns. Each chapter follows a consistent structure, offering a streamlined walkthrough of a key technique, an example case study, and annotated Python code. The volume shows how techniques such as simulations, classification, clustering, and time series analysis can address issues such as incomplete transcripts, therapist–client (a)synchrony, and client prognosis, offering inspiration for research, training, and practitioner self-reflection in psychotherapy and other discourse contexts.

    This volume is a valuable resource for discourse and linguistics researchers, particularly those interested in approaches that complement qualitative methods, as well as for active practitioners.


    Defining data analytics

    Data analytics for discourse analysis

    The case of psychotherapy talk

    Outline of the book

    Quantifying language and implementing data analytics

    Quantification of language: word embedding

    Quantification of language: LIWC scores

    Introduction to Python and basic operations


    Chapter 2 Monte Carlo simulations

    Introduction to MCS: bombs, birthdays, and casinos

    The birthday problem

    Spinning the casino roulette

    Case study: Simulating missing or incomplete transcripts

                            Step 1: Data and LIWC scoring

                            Step 2: Simulation runs with a train-test approach

                            Step 3: Analysis and validation of aggregated outcomes

    Python code used in this chapter


    Chapter 3 Cluster analysis

    Introduction to cluster analysis: creating groups for objects

                            Agglomerative hierarchical clustering (AHC)

                            k-means clustering

    Case study: Measuring linguistic (a)synchrony between therapists and clients

                            Step 1: Data and LIWC scoring

                            Step 2: k-means clustering and model validation

                            Step 3: Qualitative analysis in context

    Python code used in this chapter


    Chapter 4 Classification

    Introduction to classification: predicting groups from objects

    Case study: Predicting therapy types from therapist–client language

                            Step 1: Data and LIWC scoring

                            Step 2: k-NN and model validation

    Python code used in this chapter


    Chapter 5 Time series analysis

    Introduction to time series analysis: squeezing juice from sugarcane

    Structure and components of time series data

    Time series models as structural signatures

    Case study: Modeling and forecasting psychotherapy language across sessions

                            Step 1: Inspect series

                            Step 2: Compute (P)ACF

                            Step 3: Identify candidate models

                            Step 4: Fit model and estimate parameters

                            Step 5: Evaluate predictive accuracy, model fit, and residual diagnostics

                            Step 6: Interpret models in context

    Python code used in this chapter



    Data analytics as a rifle and a spade

    Applications in other discourse contexts

    Combining data analytic techniques in a project

    Final words: invigorate, collaborate, and empower


    Dennis Tay is Professor at the Department of English and Communication, the Hong Kong Polytechnic University. He is Co-Editor-in-Chief of Metaphor and the Social World, Associate Editor of Metaphor and Symbol, Academic Editor of PLOS One, and Review Editor of Cognitive Linguistic Studies. His recent Routledge publication is Time Series Analysis of Discourse: Method and Case Studies (2020).