Detecting Depression with Audio/Text Sequence Modeling of Interviews

slide-1
SLIDE 1

Detecting Depression with Audio/Text Sequence Modeling of Interviews

Tuka Alhanai, Mohammad Ghassemi, and James Glass
Massachusetts Institute of Technology, Cambridge, MA, USA
2nd September, Interspeech 2018

Email: tuka@mit.edu
Website: talhanai.com
GitHub: github.com/talhanai

slide-2
SLIDE 2

How have you been feeling lately?

Healthy: “I’ve been feeling good lately”
Depressed: “*sigh* stressed [um] lately I’ve been really sad and I don’t know why”

2

slide-3
SLIDE 3

Hypothesis: Depression can be modeled using sequences of a spoken interaction.

3

slide-5
SLIDE 5

Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.

Hypothesis: Depression can be modeled using sequences of a spoken interaction.

5

slide-8
SLIDE 8

Current Solutions

  1. Clinical: interview/questionnaire.
  2. Automated: context-dependent, feature engineering.

8

slide-13
SLIDE 13

Current Solutions

  1. Clinical: interview/questionnaire.
  2. Automated: context-dependent, feature engineering.

Yang, Le, et al. "Decision tree based depression classification from audio video and language information." Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. ACM, 2016.

13

slide-21
SLIDE 21

Current Solutions

  1. Clinical: interview/questionnaire.
  2. Automated: context-dependent, feature engineering.

Keywords: “therapy”, “sad”, “down”

Williamson, James R., et al. "Detecting depression using vocal, facial and semantic communication cues." Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. ACM, 2016.

Topics: [Topic 1 features, Topic 2 features, Topic 3 features]

Gong, Yuan, and Christian Poellabauer. "Topic Modeling Based Multi-modal Depression Detection." Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge. ACM, 2017.

21

slide-23
SLIDE 23

Our Solution

  1. Clinical: interview/questionnaire.
  2. Automated: context-independent, feature-engineering free.

23

slide-24
SLIDE 24

Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
Q: What'd you study at school?
A: um I took up business and administration.
Q: Cool are you still doing that?
A: Yeah I am. Here and there, I’m on a break right now but I plan on going back in the uh next semester.
Q: What’s your dream job?
A: uh probably open up my own business.

24

slide-32
SLIDE 32

Method

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

[Diagram: text and audio inputs -> bi-LSTM layers -> concatenate -> 1/0 depression]

32
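
The diagram on this slide (text and audio branches of bi-LSTMs whose outputs are concatenated into a binary depression output) can be sketched roughly as below. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the class name `TwoBranchBiLSTM`, the hidden size of 32, the single layer per branch, and the use of final hidden states are all illustrative choices. The input sizes (100 text features, 279 audio features) and sequence lengths (7 text steps, 30 audio steps) are taken from the Features and Conclusion slides.

```python
import torch
import torch.nn as nn


class TwoBranchBiLSTM(nn.Module):
    """Hypothetical two-branch model: one bi-LSTM per modality, then concatenation."""

    def __init__(self, text_dim=100, audio_dim=279, hidden=32):
        super().__init__()
        self.text_lstm = nn.LSTM(text_dim, hidden, batch_first=True, bidirectional=True)
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True, bidirectional=True)
        # Each branch contributes 2 * hidden values (forward + backward final states).
        self.classifier = nn.Linear(4 * hidden, 1)

    @staticmethod
    def _final_states(lstm, x):
        # For a single-layer bidirectional LSTM, h_n has shape (2, batch, hidden).
        _, (h_n, _) = lstm(x)
        return torch.cat([h_n[0], h_n[1]], dim=1)

    def forward(self, text_seq, audio_seq):
        text_vec = self._final_states(self.text_lstm, text_seq)
        audio_vec = self._final_states(self.audio_lstm, audio_seq)
        fused = torch.cat([text_vec, audio_vec], dim=1)  # the "concatenate" step
        return torch.sigmoid(self.classifier(fused))     # one probability per interview


model = TwoBranchBiLSTM()
text_batch = torch.randn(4, 7, 100)    # 4 interviews, 7 text timesteps
audio_batch = torch.randn(4, 30, 279)  # 4 interviews, 30 audio timesteps
probs = model(text_batch, audio_batch)
```

Thresholding `probs` (for example at 0.5, an assumed cut-off) would give the 1/0 depression label shown at the bottom of the diagram.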

slide-34
SLIDE 34

Method (context-free)

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

[Diagram: text and audio inputs -> P(depressed) per response -> 1/0 depression]

Example questions: What do you like about LA? How have you been feeling lately? What’s your sleep been like?
P(depressed): 0.2 0.6 0.2
P(depressed): 0.4 0.4 0.2

34

slide-35
SLIDE 35

Method (weighted)

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

[Diagram: text and audio inputs -> per-response values multiplied (x) by per-question weights (Q) -> 1/0 depression]

Example questions: What do you like about LA? How have you been feeling lately? What’s your sleep been like?
0.2 0.6 0.2  x  Q: 0.4 0.5 0.3
0.4 0.4 0.2  x  Q: 0.4 0.7 0.2

35
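
One way to read the weighting on this slide is that each per-response value is multiplied by a weight for the question that was asked, and the products are then combined. The slide does not say how they are combined, so the sketch below assumes a normalised weighted average and a 0.5 decision threshold. Both are illustrative choices, the function name `weighted_score` is hypothetical, and the pairing of value rows with weight rows is also an assumption. The numbers are the example values from the slide.

```python
def weighted_score(probs, weights):
    """Weighted average of per-response probabilities (assumed combination rule)."""
    if len(probs) != len(weights):
        raise ValueError("need one weight per response")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(p * w for p, w in zip(probs, weights)) / total


# Example values copied from the slide; which row belongs to which
# modality is not stated there.
row_a = weighted_score([0.2, 0.6, 0.2], [0.4, 0.5, 0.3])
row_b = weighted_score([0.4, 0.4, 0.2], [0.4, 0.7, 0.2])

# Assumed decision rule: flag the interview when the score exceeds 0.5.
label_a = int(row_a > 0.5)
label_b = int(row_b > 0.5)
```

With uniform weights this reduces to a plain average, which is one plausible reading of the context-free variant.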

slide-36
SLIDE 36

Data

Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.

  • Corpus: Distress Analysis Interview Corpus (DAIC).
  • Recordings: 142 (train = 107, test = 35).
  • Structure: Wizard-of-Oz dialogue with 170 unique questions and 8,050 responses.
  • Outcome: binary (28 depressed).

36
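
The corpus figures above imply a small, imbalanced dataset. The short calculation below checks the split and derives two rough quantities. It assumes the 28 depressed recordings are counted over all 142, which the slide does not state explicitly, and the variable names are illustrative.

```python
recordings = {"train": 107, "test": 35}
total_recordings = sum(recordings.values())  # should match the 142 on the slide
responses = 8050
depressed = 28

# Average number of responses available per interview.
responses_per_recording = responses / total_recordings

# Assumption: the 28 depressed recordings are counted over all 142.
positive_rate = depressed / total_recordings

# Accuracy of always predicting "not depressed" under that assumption.
majority_baseline_accuracy = 1 - positive_rate

print(f"{total_recordings} recordings, {responses_per_recording:.1f} responses each")
print(f"positive rate {positive_rate:.1%}, majority baseline {majority_baseline_accuracy:.1%}")
```

Under that assumption, plain accuracy would be a weak yardstick for this data, because a classifier that never predicts depression already scores highly.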

slide-40
SLIDE 40

Features

  • Audio (279 features): spectral energy, prosody, and voice quality. Higher-order statistics of each response segment (mean, max, min, median, std, skew, kurtosis).
  • Text (100 features): Doc2Vec embeddings of each question-response pair. dim = 100, min-words = 3, context-win = 3, epochs = 50.

40
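
The audio features summarise each response segment with the seven statistics listed above. A minimal sketch of that summarisation for a single frame-level feature track is shown below, using only the Python standard library. The function name `segment_statistics`, the population (rather than sample) standard deviation, the use of excess kurtosis, and the example pitch values are assumptions made for illustration.

```python
import statistics


def segment_statistics(values):
    """Summarise one frame-level feature track over a response segment."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two frames")
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        skew, kurt = 0.0, 0.0  # convention chosen here for constant tracks
    else:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        # Excess kurtosis: a normal distribution gives 0.
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3.0
    return {
        "mean": mean,
        "max": max(values),
        "min": min(values),
        "med": statistics.median(values),
        "std": std,
        "skew": skew,
        "kurt": kurt,
    }


# Hypothetical pitch track (Hz) for one response.
pitch = [110.0, 112.0, 115.0, 111.0, 140.0, 113.0]
stats = segment_statistics(pitch)
```

Applying the same function to every frame-level feature and concatenating the results would give one fixed-length vector per response, regardless of how long the response lasted.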

slide-50
SLIDE 50

Results

50

slide-54
SLIDE 54

Results

Overfitting on text. Audio generalizes better.

54

slide-56
SLIDE 56

Results

Better generalization.

56

slide-58
SLIDE 58

Results

Depression cues exist at relatively longer audio intervals (text: 7 timesteps, audio: 30 timesteps).

58

slide-60
SLIDE 60

Results

Complementary information.

60

slide-62
SLIDE 62

Conclusion

  1. Sequence modeling: improved generalization; better than baseline(s).
  2. Modality inputs to model: audio = sequences of 30, text = sequences of 7. Depression cues exist at longer speech intervals.
  3. Weighted model: overfitting on text.

62
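
The sequence lengths in the conclusion (30 for audio, 7 for text) suggest that each interview is cut into fixed-length runs of consecutive responses before being passed to the sequence model. The slides do not say how those runs were built, so the sketch below assumes overlapping windows with a stride of 1. `make_windows` is a hypothetical helper, and the 57-response interview is a stand-in close to the corpus average of 8,050 responses over 142 recordings.

```python
def make_windows(items, length, stride=1):
    """Return every run of `length` consecutive items, stepping by `stride`."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    return [items[i:i + length] for i in range(0, len(items) - length + 1, stride)]


# Stand-in for one interview: 57 response indices.
responses = list(range(57))

text_windows = make_windows(responses, length=7)
audio_windows = make_windows(responses, length=30)
```

An interview shorter than the window length yields no windows in this sketch, so padding or a shorter window would be needed for such cases.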

slide-63
SLIDE 63

Future Work

  1. Apply technique to a larger number of subjects with different conditions (dementia).
  2. Infer most predictive segments.
  3. Infer patterns being captured (speaking rate? keywords?).

63

slide-64
SLIDE 64

Reference

  • T. Alhanai, M. M. Ghassemi, and J. Glass, "Detecting Depression with Audio/Text Sequence Modeling of Interviews," Proc. Interspeech, Hyderabad, India, September 2018.

Publication: https://groups.csail.mit.edu/sls/publications/2018/Alhanai_Interspeech-2018.pdf
GitHub: https://github.com/talhanai/redbud-tree-depression

64