SLIDE 1

Latent Dimensions of Religion and Spirituality: A Longitudinal Correlated Topic Model

Seong-Hyeon (Sung) Kim¹, Nathaniel R. Strenger², & Narae Lee¹

¹Fuller Graduate School of Psychology, Pasadena, California, USA
²Pastoral Counseling Center, Dallas, Texas, USA

SLIDE 2

Overview

  • Religion & Spirituality (R/S)
  • Research Questions
  • Topic models
  • Automated text analysis
  • Topics: Latent dimensions of text
  • Topic proportions as compositional data
  • Ternary diagrams
  • Topic correlations
SLIDE 3

Religion & Spirituality (R/S)

  • Definitions
    • Religion: “the search for significance that occurs within the context of established institutions that are designed to facilitate spirituality” (Pargament et al., 2013, p. 15).
    • Spirituality: “the search for the sacred” (Pargament et al., 2013, p. 14).

Pargament, K. I., Mahoney, A., Exline, J. J., Jones, J. W., & Shafranske, E. P. (2013). Envisioning an integrative paradigm for the psychology of religion and spirituality. In K. I. Pargament, J. J. Exline, & J. W. Jones (Eds.), APA handbook of psychology, religion, and spirituality (Vol. 1): Context, theory, and research (pp. 3–19). Washington, DC: American Psychological Association. https://doi.org/10.1037/14045-001

SLIDE 4

Religion & Spirituality (R/S)

  • Gorsuch (1984) introduced factor analysis as a tool to investigate the dimensions of R/S.
  • He also criticized the oversupply of R/S measures.
  • Our research introduces topic modeling as a tool to identify the fundamental dimensions, or building blocks, of R/S that have been conceptualized in the R/S measures.

Gorsuch, R. L. (1984). Measurement: The boon and bane of investigating religion. American Psychologist, 39(3), 228–236. https://doi.org/10.1037/0003-066X.39.3.228

SLIDE 5

Automated Text Analysis

  • Quantitative (NOT qualitative) text analysis
  • Three different types:
    1. Dictionary method: predefined set of categories
    2. Supervised learning: outcome categories known (e.g., spam mail sorting)
    3. Unsupervised learning: outcome categories unknown (e.g., topic modeling)
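To make the first approach concrete, here is a minimal Python sketch of a dictionary method: every word is checked against predefined category word lists and tallied. The two categories and their word lists are hypothetical, chosen only for illustration, and are not drawn from any published R/S dictionary.

```python
import re
from collections import Counter

# Hypothetical dictionary: each category maps to a predefined set of words.
DICTIONARY = {
    "religion": {"church", "denomination", "congregation"},
    "spirituality": {"sacred", "transcendent", "spiritual"},
}


def dictionary_counts(text: str) -> Counter:
    """Count how many tokens in `text` belong to each predefined category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter({category: 0 for category in DICTIONARY})
    for token in tokens:
        for category, words in DICTIONARY.items():
            if token in words:
                counts[category] += 1
    return counts


print(dict(dictionary_counts("I attend church and feel a transcendent, spiritual connection.")))
# {'religion': 1, 'spirituality': 2}
```

Unlike topic modeling, nothing here is learned from the data: the categories must be fixed before any text is read.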

SLIDE 6

Topic Modeling

  • Identify topics, the latent dimensions, in the text data
  • Machine (statistical) learning + computer science + statistics
  • Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003): basic and popular, but does not allow topic correlations

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.

SLIDE 7

TASA Corpus: 37,000 Texts & 300 Topics

Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 424–440). Hillsdale, NJ: Erlbaum.

SLIDE 8

Example: Steyvers & Griffiths (2007)

  • 2 topics
  • Each topic gives approximately equal probability to three words:
    • Topic 1: “money,” “loan,” and “bank”
    • Topic 2: “river,” “stream,” and “bank”
  • 16 documents were created by arbitrarily mixing the two topics
  • Let’s analyze this collection of documents with LDA (Blei et al., 2003)

Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 424–440). Hillsdale, NJ: Erlbaum.
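The toy corpus can be imitated with the generative process that LDA assumes: draw each document's topic proportions from a Dirichlet distribution, draw a topic for every token from those proportions, then draw the word from that topic's term distribution. In this sketch each topic gives exactly 1/3 probability to its three words; the Dirichlet parameter `alpha`, the 16 tokens per document, and the random seed are assumptions for illustration, not the settings Steyvers and Griffiths used.

```python
import numpy as np

VOCAB = ["money", "loan", "bank", "river", "stream"]
# Topic-word probabilities: each topic spreads probability equally over 3 words.
PHI = np.array([
    [1/3, 1/3, 1/3, 0.0, 0.0],  # Topic 1: money, loan, bank
    [0.0, 0.0, 1/3, 1/3, 1/3],  # Topic 2: bank, river, stream
])


def generate_corpus(n_docs=16, n_tokens=16, alpha=1.0, seed=0):
    """Generate documents with LDA's generative process (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    n_topics = PHI.shape[0]
    docs, thetas = [], []
    for _ in range(n_docs):
        theta = rng.dirichlet([alpha] * n_topics)               # topic proportions
        topics = rng.choice(n_topics, size=n_tokens, p=theta)   # topic of each token
        words = [VOCAB[rng.choice(len(VOCAB), p=PHI[k])] for k in topics]
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas)


docs, thetas = generate_corpus()
print(docs[0])
print(thetas[0].round(2))
```

Fitting LDA to such a corpus reverses this process: only the words are observed, and the topic-word and document-topic probabilities are estimated.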

SLIDE 9

Steyvers & Griffiths (2007)

SLIDE 10

Example: 16 Documents

SLIDE 11

Term Distributions for Topics

Topic 1                     Topic 2
Word      Probability       Word      Probability
bank      .390              stream    .391
money     .314              bank      .345
loan      .287              river     .240
river     .009              money     .012
stream    .000              loan      .012
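These two distributions determine the word probabilities of any document once its topic proportions are known: p(w | d) = Σ_k p(w | topic k) p(topic k | d). The sketch below plugs in the estimates from the table; the 50/50 document is hypothetical and is not one of the 16 example documents.

```python
# Estimated term distributions (from the table above).
TOPIC_TERMS = {
    "Topic 1": {"bank": .390, "money": .314, "loan": .287, "river": .009, "stream": .000},
    "Topic 2": {"stream": .391, "bank": .345, "river": .240, "money": .012, "loan": .012},
}


def word_probabilities(topic_proportions):
    """Mix topic-term distributions: p(w|d) = sum_k p(w|k) * p(k|d)."""
    vocab = TOPIC_TERMS["Topic 1"].keys()
    return {
        word: sum(weight * TOPIC_TERMS[topic][word]
                  for topic, weight in topic_proportions.items())
        for word in vocab
    }


# Hypothetical document that draws equally on both topics.
mixed = word_probabilities({"Topic 1": 0.5, "Topic 2": 0.5})
print({word: round(p, 4) for word, p in mixed.items()})
```

Under an even mix, “bank” gets probability .5 × .390 + .5 × .345 = .3675, the highest of the five words, because both topics use it.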

SLIDE 12

Topic Distribution for Documents

SLIDE 13

Matrix Factorization

Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 424–440). Hillsdale, NJ: Erlbaum.
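Written out, the factorization says that each document's word distribution is a topic-weighted mixture of the topics' word distributions. With W words, D documents, and T topics (symbol names chosen here for illustration), one way to state it is:

```latex
P(w \mid d) = \sum_{t=1}^{T} P(w \mid z = t)\, P(z = t \mid d)
\quad\Longleftrightarrow\quad
\underbrace{C}_{W \times D} = \underbrace{\Phi}_{W \times T}\,\underbrace{\Theta}_{T \times D},
\qquad
\Phi_{wt} = P(w \mid z = t), \quad \Theta_{td} = P(z = t \mid d).
```

Each column of C, Φ, and Θ is a probability distribution, which is what distinguishes this factorization from an unconstrained one such as the singular value decomposition.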

SLIDE 14

LDA & Beyond

  • Limitations of LDA
  • Fails to model correlation between topics
  • This stems from the implicit independence assumption in the Dirichlet distribution on the topic proportions in documents

  • Topics are usually correlated in texts.
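The limitation can be made precise: for θ ~ Dirichlet(α) with α_0 = Σ_k α_k, every pair of proportions has covariance −α_i α_j / (α_0²(α_0 + 1)), which is never positive, so LDA cannot represent topics that tend to occur together. The simulation below, with an arbitrary α and seed, checks the sample correlations against the closed-form correlation −√(α_i α_j / ((α_0 − α_i)(α_0 − α_j))).

```python
import numpy as np


def dirichlet_correlations(alpha, n_draws=100_000, seed=1):
    """Compare simulated and theoretical correlations of Dirichlet proportions."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    draws = np.random.default_rng(seed).dirichlet(alpha, size=n_draws)
    simulated = np.corrcoef(draws, rowvar=False)
    # Closed form: corr(i, j) = -sqrt(a_i a_j / ((a0 - a_i)(a0 - a_j))) for i != j
    rest = a0 - alpha
    theoretical = -np.sqrt(np.outer(alpha, alpha) / np.outer(rest, rest))
    np.fill_diagonal(theoretical, 1.0)
    return simulated, theoretical


sim, theory = dirichlet_correlations([2.0, 1.0, 0.5])
print(np.round(sim, 3))
print(np.round(theory, 3))
```

Whatever α is chosen, every off-diagonal entry comes out negative.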
SLIDE 15

LDA & Beyond

  • Correlated Topic Model (CTM; Blei & Lafferty, 2007)
  • Replaces the Dirichlet in LDA with the “more flexible logistic normal distribution” (p. 19)
  • This paper cites Aitchison and Shen (1980), Aitchison (1982), and Aitchison (1985).

Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of science. The Annals of Applied Statistics, 1(1), 17–35. https://doi.org/10.1214/07-AOAS114
Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B (Methodological), 44(2), 139–177.
Aitchison, J. (1985). A general class of distributions on the simplex. Journal of the Royal Statistical Society. Series B (Methodological), 47(1), 136–146.
Aitchison, J., & Shen, S. M. (1980). Logistic-normal distributions: Some properties and uses. Biometrika, 67(2), 261–272.
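Under the logistic normal, an unconstrained vector η is drawn from a multivariate normal distribution and then mapped onto the simplex, so the covariance matrix Σ is free to encode positive as well as negative associations between topics. A minimal sketch of Aitchison's additive logistic normal, assuming three topics (the third coordinate fixed at 0) and an arbitrary Σ in which the first two coordinates covary strongly, shows the proportions of Topics 1 and 2 coming out positively correlated, which the Dirichlet cannot produce.

```python
import numpy as np


def logistic_normal(mu, sigma, n_draws, seed=2):
    """Draw topic proportions from an additive logistic normal distribution.

    eta ~ N(mu, sigma) has K - 1 free coordinates; the K-th coordinate is
    fixed at 0 so that theta = softmax([eta, 0]) lies on the simplex.
    """
    rng = np.random.default_rng(seed)
    eta = rng.multivariate_normal(mu, sigma, size=n_draws)
    eta = np.hstack([eta, np.zeros((n_draws, 1))])
    expeta = np.exp(eta - eta.max(axis=1, keepdims=True))  # numerically stable softmax
    return expeta / expeta.sum(axis=1, keepdims=True)


# Hypothetical parameters for three topics: eta_1 and eta_2 covary positively.
mu = np.zeros(2)
sigma = np.array([[1.0, 0.95],
                  [0.95, 1.0]])
theta = logistic_normal(mu, sigma, n_draws=50_000)
print(np.round(np.corrcoef(theta, rowvar=False), 3))
```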

SLIDE 16

Structural Topic Model (STM)

  • Our research used the STM, which builds on the CTM:
    • Allows topic correlations
    • Allows covariates (i.e., predictors of topic proportions)
  • We collected 255 R/S measures published between 1929 and 2016 to identify the latent dimensions of their text.

SLIDE 17

Atkins, D. C., Rubin, T. N., Steyvers, M., Doeden, M. A., Baucom, B. R., & Christensen, A. (2012). Topic models: A novel method for modeling couple and family text data. Journal of Family Psychology, 26, 816–827. https://doi.org/10.1037/a0029607

SLIDE 18

Preprocessing

  • R ‘tm’ package (Feinerer & Hornik, 2015)
  • Items of 255 R/S measures
  • Preprocessed texts
  • Removed stop words, numbers, and punctuation
  • e.g., a/an, the, to, for, at, she/he, I, “.”, or “?”
  • Lemmatized words
  • e.g., educate, educated, or educating → educate

Feinerer, I., & Hornik, K. (2015). tm: Text Mining Package (Version 0.6-2) [Computer software]. Retrieved from https://CRAN.R-project.org/package=tm
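The preprocessing steps can be sketched in Python as follows. This is not the tm pipeline: the stop-word list and the lemma table are tiny hypothetical stand-ins, and real lemmatization relies on a full dictionary or morphological analyzer rather than a lookup of a few words.

```python
import re

# Small illustrative stop-word list and lemma table (hypothetical, not tm's lists).
STOP_WORDS = {"a", "an", "the", "to", "for", "at", "she", "he", "i", "and",
              "or", "my", "in", "was"}
LEMMAS = {"educated": "educate", "educating": "educate", "beliefs": "belief"}


def preprocess(item: str) -> list[str]:
    """Lowercase, drop numbers/punctuation and stop words, then lemmatize."""
    text = item.lower()
    text = re.sub(r"[0-9]+", " ", text)   # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess("I was educated in 2 religious traditions, and I hold my beliefs."))
# ['educate', 'religious', 'traditions', 'hold', 'belief']
```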

SLIDE 19

Preprocessing

  • Created a document-term matrix
  • Dimensions: 255 documents × 5,617 terms
  • Included
    • unigrams
    • bigrams (e.g., Jesus Christ)
    • trigrams (e.g., religious (and/or) spiritual belief)
  • Deleted low-frequency terms (frequency < 3)
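A document-term matrix with unigrams, bigrams, and trigrams and a frequency cutoff might be built as sketched below. The sketch assumes the cutoff applies to a term's total count across all documents (the slide does not say whether document frequency was used instead), and the four token lists are hypothetical, already preprocessed items.

```python
from collections import Counter


def ngrams(tokens, n):
    """Join consecutive tokens into space-separated n-grams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_counts(tokens, max_n):
    """Count all unigrams up to max_n-grams in one document."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def document_term_matrix(docs, max_n=3, min_freq=3):
    """Build a DTM; drop terms whose total count across documents is < min_freq."""
    counts = [term_counts(doc, max_n) for doc in docs]
    totals = sum(counts, Counter())
    terms = sorted(t for t, c in totals.items() if c >= min_freq)
    matrix = [[doc_counts[t] for t in terms] for doc_counts in counts]
    return terms, matrix


# Hypothetical preprocessed items.
docs = [
    ["jesus", "christ", "lord"],
    ["believe", "jesus", "christ"],
    ["jesus", "christ", "savior"],
    ["religious", "spiritual", "belief"],
]
terms, matrix = document_term_matrix(docs)
print(terms)
print(matrix)
```

Here only “jesus,” “christ,” and the bigram “jesus christ” reach the cutoff; every other unigram, bigram, and trigram occurs once and is dropped.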
SLIDE 20

Model Estimation

  • R ‘stm’ package (Roberts, Stewart, & Tingley, 2016)
  • Topics
    • Latent dimensions of text data
    • Comparable to principal components or factors
    • Estimated based on word co-occurrences across documents
  • Structural topic modeling
    • Estimates covariates’ effects on topic proportions
    • Current analysis: decade of publication (1950s through 2010s) as a predictor

Roberts, M. E., Stewart, B. M., & Tingley, D. (2016). stm: R Package for Structural Topic Models (Version 1.1.3) [Computer software]. Retrieved from http://www.structuraltopicmodel.com
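In the STM, prevalence covariates shift the mean of the logistic normal: for a document with covariates x_d, η_d ~ N(x_dᵀΓ, Σ) on the first K − 1 topics, and θ_d = softmax(η_d) after appending a K-th coordinate fixed at 0 (as this sketch does for identifiability). The code turns that into expected topic proportions per decade by Monte Carlo averaging. Decade enters linearly, and Γ and Σ are invented values for illustration; they are not the estimates behind these results, and the stm package's estimation procedure is not reproduced.

```python
import numpy as np


def expected_proportions(decade, gamma, sigma, n_draws=20_000, seed=3):
    """Monte Carlo estimate of E[theta | decade] under a logistic-normal prevalence model.

    decade: publication decade, e.g. 1950 (coded as decades since 1950).
    gamma:  (2, K-1) array of hypothetical intercepts (row 0) and slopes (row 1).
    sigma:  (K-1, K-1) hypothetical covariance of eta.
    """
    x = np.array([1.0, (decade - 1950) / 10])        # intercept + linear decade term
    mean = x @ gamma                                # mean of eta for this decade
    rng = np.random.default_rng(seed)
    eta = rng.multivariate_normal(mean, sigma, size=n_draws)
    eta = np.hstack([eta, np.zeros((n_draws, 1))])  # reference topic fixed at 0
    theta = np.exp(eta - eta.max(axis=1, keepdims=True))
    theta /= theta.sum(axis=1, keepdims=True)
    return theta.mean(axis=0)


# Hypothetical coefficients for K = 3 topics (rows: intercept, decade slope).
gamma = np.array([[-0.5, 0.2],
                  [0.3, -0.1]])
sigma = np.array([[0.5, 0.1],
                  [0.1, 0.5]])
for decade in range(1950, 2020, 10):
    print(decade, np.round(expected_proportions(decade, gamma, sigma), 3))
```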

SLIDE 21

Top 50 Most Frequent Terms

SLIDE 22

Diagnostic Indexes

SLIDE 23

3 Topics Identified

  • Topic 1: Spirituality

spirituality, spiritual belief, religious spiritual, wilderness, never experience, spiritual experience, connect, illness, transcendent, transcendent spiritual

  • Topic 2: Religion

church member, loving, teaching church, dealing, dealing life, local religious, join, local religious group, question meaning life, religious denomination

  • Topic 3: Judeo-Christianity

christian, allah, miracle, god will, god god, punish, client, god feel, patient, writing

SLIDE 24

Longitudinal Change in Expected Topic Proportions from the 1950s to the 2010s

The estimated regression lines and their 95% confidence intervals are plotted.

SLIDE 25

Created using the R ‘compositions’ package (van den Boogaart, Tolosana-Delgado, & Bren, 2015)

Van den Boogaart, K. G., Tolosana-Delgado, R., & Bren, M. (2015). compositions: R Package for Compositional Data Analysis (Version 1.40-1) [Computer software]. Retrieved from https://cran.r-project.org/web/packages/compositions/index.html
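A ternary diagram plots each document's three topic proportions as one point in an equilateral triangle whose corners stand for the pure topics. The sketch shows one standard projection to plane coordinates; the corner placement is an arbitrary choice here and is not claimed to match the layout produced by the compositions package.

```python
import math


def ternary_xy(a, b, c):
    """Map a 3-part composition (a, b, c) to 2-D ternary plot coordinates.

    Corners: pure a at (0, 0), pure b at (1, 0), pure c at (0.5, sqrt(3)/2).
    """
    total = a + b + c
    a, b, c = a / total, b / total, c / total  # close the composition to sum to 1
    x = b + c / 2
    y = c * math.sqrt(3) / 2
    return x, y


print(ternary_xy(1/3, 1/3, 1/3))  # equal proportions land at the centre of the triangle
```

Because the input is closed first, raw topic counts and proportions give the same point.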

SLIDE 26

Normal Distribution on the Simplex
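In compositional data analysis, the normal distribution on the simplex is commonly defined through isometric log-ratio (ilr) coordinates: a composition follows it when its ilr coordinates follow an ordinary multivariate normal distribution. For three topic proportions, one valid choice of ilr coordinates (the basis is an assumption; any other orthonormal basis gives a rotation of these) is:

```latex
z_1 = \frac{1}{\sqrt{2}} \ln\frac{x_1}{x_2}, \qquad
z_2 = \sqrt{\frac{2}{3}}\, \ln\frac{\sqrt{x_1 x_2}}{x_3}, \qquad
\mathbf{x} \sim \mathcal{N}_{\mathcal{S}}(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\iff (z_1, z_2)^{\top} \sim \mathcal{N}_2(\boldsymbol{\mu}, \boldsymbol{\Sigma}).
```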

SLIDE 27

Topic Correlations

  • 1. exp(-var(z)): Buccianti & Pawlowsky-Glahn (2005)
    • z = ilr-transformed parts
    • 0 (1) → high (low) variability of the ratios between parts
    • e.g., .0016 for Topics 1 and 2
  • 2. exp(-τ²/2): van den Boogaart & Tolosana-Delgado (2013)
    • τ: variation
    • Interpret this as a correlation coefficient
    • Very small between topics

Buccianti, A., & Pawlowsky-Glahn, V. (2005). New perspectives on water chemistry and compositional data analysis. Mathematical Geology, 37(7), 703–727.
Van den Boogaart, K. G., & Tolosana-Delgado, R. (2013). Analyzing compositional data with R (Vol. 122). Heidelberg: Springer.
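Both measures can be computed from a matrix of document-topic proportions. The sketch reads the slide's notation in one particular way, which is an assumption: z is the ilr coordinate of the two-part subcomposition, z = ln(x_i / x_j) / √2, and τ is the (i, j) entry of Aitchison's variation matrix, var(ln(x_i / x_j)). The proportions are simulated from a Dirichlet distribution and stand in for the study's estimates.

```python
import numpy as np


def variation_matrix(x):
    """Aitchison's variation matrix: tau[i, j] = var(ln(x_i / x_j)) (sample variance)."""
    logx = np.log(x)
    k = x.shape[1]
    tau = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            tau[i, j] = np.var(logx[:, i] - logx[:, j], ddof=1)
    return tau


def association_measures(x, i, j):
    """Return (exp(-var(z)), exp(-tau**2 / 2)) for parts i and j.

    Assumed reading: z = ln(x_i / x_j) / sqrt(2), the ilr coordinate of the
    two-part subcomposition; tau is the (i, j) entry of the variation matrix.
    """
    z = (np.log(x[:, i]) - np.log(x[:, j])) / np.sqrt(2)
    tau = variation_matrix(x)[i, j]
    return np.exp(-np.var(z, ddof=1)), np.exp(-tau ** 2 / 2)


# Simulated topic proportions (hypothetical: 255 documents x 3 topics).
rng = np.random.default_rng(4)
props = rng.dirichlet([2.0, 2.0, 2.0], size=255)
print(np.round(variation_matrix(props), 3))
print([round(float(m), 4) for m in association_measures(props, 0, 1)])
```

As a check, two parts whose ratio never changes give τ = 0 and var(z) = 0, so both measures equal 1; the more the ratio varies, the closer both measures fall toward 0.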

SLIDE 28

THANK YOU