SLIDE 1

Detecting Depression with Audio/Text Sequence Modeling of Interviews

Tuka Alhanai, Mohammad Ghassemi, and James Glass
Massachusetts Institute of Technology, Cambridge, MA, USA
Interspeech 2018, 2nd September 2018

Email: tuka@mit.edu
Website: talhanai.com
GitHub: github.com/talhanai

SLIDE 2

How have you been feeling lately?

Healthy: “I’ve been feeling good lately”
Depressed: “*sigh* stressed [um] lately I’ve been really sad and I don’t know why”

SLIDE 3

Hypothesis: Depression can be modeled using sequences of a spoken interaction.

SLIDE 5

Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.

Hypothesis: Depression can be modeled using sequences of a spoken interaction.

SLIDE 8

Current Solutions

1. Clinical:

Interview/questionnaire

2. Automated:

Context-dependent feature engineering

SLIDE 13

Current Solutions

1. Clinical:

Interview/questionnaire

2. Automated:

Context-dependent feature engineering

Yang, Le, et al. "Decision tree based depression classification from audio video and language information." Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. ACM, 2016.

SLIDE 20

Current Solutions

1. Clinical:

Interview/questionnaire

2. Automated:

Context-dependent feature engineering

Keywords: “therapy”, “sad”, “down”

Williamson, James R., et al. "Detecting depression using vocal, facial and semantic communication cues." Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. ACM, 2016.

SLIDE 21

Current Solutions

1. Clinical:

Interview/questionnaire

2. Automated:

Context-dependent feature engineering

Keywords: “therapy”, “sad”, “down”

Williamson, James R., et al. "Detecting depression using vocal, facial and semantic communication cues." Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. ACM, 2016.

[Topic 1 features, Topic 2 features, Topic 3 features]

Gong, Yuan, and Christian Poellabauer. "Topic Modeling Based Multi-modal Depression Detection." Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge. ACM, 2017.

SLIDE 22

Our Solution

1. Clinical:

Interview/questionnaire

2. Automated:

Context-dependent feature engineering

SLIDE 23

Our Solution

1. Clinical:

Interview/questionnaire

2. Automated:

Context-independent and feature-engineering free

SLIDE 24

Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
Q: What'd you study at school?
A: um I took up business and administration.
Q: Cool are you still doing that?
A: Yeah I am. Here and there, I’m on a break right now but I plan on going back in the uh next semester.
Q: What’s your dream job?
A: uh probably open up my own business.

SLIDE 28

Method

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

SLIDE 29

Method

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

1/0 depression

SLIDE 30

Method

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

[Diagram: text → bi-LSTM → bi-LSTM → 1/0 depression]
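As a rough illustration of what a bi-LSTM branch computes, the sketch below runs one LSTM forward over a sequence of per-response feature vectors and a second LSTM backward, then concatenates the two final hidden states into a fixed-length summary. It is a minimal pure-Python sketch, not the authors' implementation: it assumes a single layer, randomly initialized (untrained) weights, a hidden size of 16, and the hypothetical names `LSTMCell`, `run`, and `bilstm_encode`. The 7-response, 100-d example input only mirrors the text settings reported elsewhere in the slides; a real model would stack layers, be trained, and use a deep-learning framework.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W: List[Vector], v: Vector) -> Vector:
    # Plain matrix-vector product: one dot product per row of W.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """One LSTM cell with small random (untrained) weights, for illustration only."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim

        def init() -> List[Vector]:
            # Each gate acts on the concatenation [x_t; h_{t-1}].
            return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                    for _ in range(hidden_dim)]

        self.W_i, self.W_f, self.W_o, self.W_g = init(), init(), init(), init()
        self.b_f = [1.0] * hidden_dim  # positive forget-gate bias, a common default

    def step(self, x: Vector, h: Vector, c: Vector):
        z = x + h  # list concatenation: [x_t; h_{t-1}]
        i = [sigmoid(a) for a in matvec(self.W_i, z)]                        # input gate
        f = [sigmoid(a + b) for a, b in zip(matvec(self.W_f, z), self.b_f)]  # forget gate
        o = [sigmoid(a) for a in matvec(self.W_o, z)]                        # output gate
        g = [math.tanh(a) for a in matvec(self.W_g, z)]                      # candidate cell
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def run(cell: LSTMCell, seq: List[Vector]) -> Vector:
    # Unroll the cell over the sequence and return the last hidden state.
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    for x in seq:
        h, c = cell.step(x, h, c)
    return h


def bilstm_encode(seq: List[Vector], fwd: LSTMCell, bwd: LSTMCell) -> Vector:
    # Forward pass reads responses in order, backward pass reads them reversed.
    return run(fwd, seq) + run(bwd, seq[::-1])


# Hypothetical usage: 7 responses, each a 100-d text vector, hidden size 16.
rng = random.Random(0)
fwd, bwd = LSTMCell(100, 16, rng), LSTMCell(100, 16, rng)
interview = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(7)]
emb = bilstm_encode(interview, fwd, bwd)
print(len(emb))  # 32: 16 forward + 16 backward units
```

Reading the sequence in both directions lets the summary depend on responses both before and after any given point in the interview, which a single forward LSTM cannot do.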

SLIDE 31

Method

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

[Diagram: text and audio inputs, each feeding bi-LSTM layers → 1/0 depression]

SLIDE 32

Method

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

[Diagram: text and audio bi-LSTM branches → concatenate → 1/0 depression]
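The concatenate step joins the outputs of the text and audio branches before the final 1/0 decision. A minimal sketch, assuming each branch has already produced a fixed-length vector and that the fused vector feeds a single logistic output unit; the branch vectors, weights, bias, 0.5 threshold, and the name `fuse_and_classify` are hypothetical, not learned values from the paper.

```python
import math
from typing import List, Tuple


def fuse_and_classify(text_vec: List[float], audio_vec: List[float],
                      weights: List[float], bias: float,
                      threshold: float = 0.5) -> Tuple[float, int]:
    """Concatenate the text and audio branch outputs, then apply one logistic unit."""
    fused = text_vec + audio_vec  # concatenation: [text; audio]
    if len(weights) != len(fused):
        raise ValueError("need one weight per fused dimension")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    p_depressed = 1.0 / (1.0 + math.exp(-logit))
    return p_depressed, int(p_depressed >= threshold)


# Hypothetical branch outputs and output-layer parameters (not learned values).
text_vec = [0.1, -0.3, 0.5]
audio_vec = [0.2, 0.0]
p, label = fuse_and_classify(text_vec, audio_vec,
                             weights=[0.4, -0.2, 0.3, 0.6, -0.1], bias=-0.1)
print(round(p, 3), label)  # logit = 0.27, so p ≈ 0.567 and label = 1
```

Because the output layer sees both modalities at once, it can weigh audio evidence against text evidence rather than averaging two separate decisions.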

SLIDE 33

Method (context-free)

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

[Diagram: text and audio inputs → 1/0 depression]

SLIDE 34

Method (context-free)

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

[Figure: the text and audio models each assign P(depressed) to every question ("What do you like about LA?", "How have you been feeling lately?", "What’s your sleep been like?"); values shown: 0.2, 0.6, 0.2 and 0.4, 0.4, 0.2; output: 1/0 depression]
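One simple way to turn per-question probabilities into an interview-level 1/0 decision is to average them within each modality, then across modalities, and threshold at 0.5. The slides do not state the pooling rule, so mean pooling, the threshold, the name `context_free_decision`, and the example probabilities are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def context_free_decision(per_question: Dict[str, List[float]],
                          threshold: float = 0.5) -> Tuple[Dict[str, float], float, int]:
    """Pool per-question P(depressed) into one interview-level decision.

    Every question counts equally and question identity is ignored, which is
    what makes the pooling "context-free" in this sketch.
    """
    per_modality = {m: mean(ps) for m, ps in per_question.items()}
    overall = mean(list(per_modality.values()))
    return per_modality, overall, int(overall >= threshold)


# Illustrative probabilities for three questions (not values from the paper).
scores, overall, label = context_free_decision({
    "text": [0.1, 0.7, 0.3],
    "audio": [0.5, 0.3, 0.4],
})
print(scores, round(overall, 3), label)
```

Here a single high-probability answer ("How have you been feeling lately?" at 0.7 in the text column) is diluted by the other questions, since all questions carry the same weight.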

SLIDE 35

Method (weighted)

…
Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.
…

[Figure: per-question P(depressed) values (0.2, 0.6, 0.2 and 0.4, 0.4, 0.2) multiplied (x) by per-question weights Q (0.4, 0.5, 0.3 and 0.4, 0.7, 0.2) for "What do you like about LA?", "How have you been feeling lately?", and "What’s your sleep been like?"; output: 1/0 depression]
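The weighted variant multiplies each question's P(depressed) by a per-question weight Q. A minimal sketch, assuming the weighted products are normalized by the sum of the weights; the normalization, the name `weighted_score`, and the example numbers are hypothetical, and the slides do not say how the weights were obtained. Because such weights add parameters tied to individual questions, they are one place where overfitting can creep in; the conclusion slide reports overfitting on text for the weighted model.

```python
from statistics import mean
from typing import Sequence


def weighted_score(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-question P(depressed), one weight Q per question."""
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per question")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, probs)) / total


# Illustrative values: the second question ("How have you been feeling lately?")
# is given the largest weight.
probs = [0.1, 0.7, 0.3]
weights = [0.2, 0.6, 0.2]
print(round(mean(probs), 3), round(weighted_score(probs, weights), 3))  # 0.367 0.5
```

Compared with the unweighted mean, emphasizing the most informative question lifts the interview-level score from about 0.37 to 0.5.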

SLIDE 36

Data

Q: What are some things you really like about L.A.?
A: I like the weather, I like the opportunities.
Q: How easy was it for you to get used to living in L.A.?
A: It took a minute, somewhat easy.
Q: What are some things you don't really like about L.A.?
A: Congestion.

Corpus: Distress Analysis Interview Corpus (DAIC).
Recordings: 142 (train = 107, test = 35).
Structure: Wizard-of-Oz dialogue with 170 unique questions and 8,050 responses.
Outcome: binary (28 depressed).
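The interview structure above (interviewer questions followed by participant responses) is what yields the question-response pairs used for the text features. A minimal sketch of the pairing step, assuming a transcript given as (speaker, text) turns with the hypothetical speaker labels "interviewer" and "participant"; the actual DAIC transcript format and any preprocessing the authors applied may differ.

```python
from typing import List, Optional, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def pair_question_responses(turns: List[Turn],
                            interviewer: str = "interviewer",
                            participant: str = "participant") -> List[Tuple[str, str]]:
    """Pair each participant response with the most recent interviewer turn.

    Consecutive participant turns are merged into one response; interviewer
    turns that receive no answer, and participant speech before the first
    question, are dropped.
    """
    pairs: List[Tuple[str, str]] = []
    question: Optional[str] = None
    parts: List[str] = []
    for speaker, text in turns:
        if speaker == interviewer:
            if question is not None and parts:
                pairs.append((question, " ".join(parts)))
            question, parts = text, []
        elif speaker == participant and question is not None:
            parts.append(text)
    if question is not None and parts:
        pairs.append((question, " ".join(parts)))
    return pairs


turns = [
    ("interviewer", "What are some things you really like about L.A.?"),
    ("participant", "I like the weather,"),
    ("participant", "I like the opportunities."),
    ("interviewer", "What are some things you don't really like about L.A.?"),
    ("participant", "Congestion."),
]
pairs = pair_question_responses(turns)
for q, a in pairs:
    print(f"Q: {q}\nA: {a}")
```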

SLIDE 40

Features

Audio (279 features):

Spectral energy, prosody, and voice quality descriptors, summarized by higher-order statistics over each response segment (mean, max, min, median, standard deviation, skewness, kurtosis).

Text (100 features):

Doc2Vec embeddings of each question-response pair (dim = 100, min-words = 3, context window = 3, epochs = 50).
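The audio features reduce each frame-level descriptor to a fixed set of statistics over the response segment. A minimal sketch of that step, assuming each descriptor (for example an energy or F0 contour) is already available as a list of frame values; the descriptor names, the use of population moments, and excess kurtosis are assumptions, since the slides name neither the toolkit nor the conventions. This sketch does not reproduce the authors' 279-dimensional descriptor set; it only shows how stacking seven statistics per descriptor gives a fixed-length vector per response regardless of duration.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Sequence

STATS = ("mean", "max", "min", "med", "std", "skew", "kurt")


def functionals(contour: Sequence[float]) -> Dict[str, float]:
    """Summarize one frame-level descriptor over a response segment."""
    n = len(contour)
    if n < 2:
        raise ValueError("need at least two frames")
    mu = mean(contour)
    m2 = sum((x - mu) ** 2 for x in contour) / n  # population variance
    m3 = sum((x - mu) ** 3 for x in contour) / n
    m4 = sum((x - mu) ** 4 for x in contour) / n
    return {
        "mean": mu,
        "max": max(contour),
        "min": min(contour),
        "med": median(contour),
        "std": pstdev(contour, mu),
        "skew": m3 / m2 ** 1.5 if m2 > 0 else 0.0,      # population skewness
        "kurt": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,  # excess kurtosis
    }


def segment_vector(descriptors: Dict[str, Sequence[float]]) -> List[float]:
    """Stack the seven statistics of every descriptor into one fixed-length vector."""
    vec: List[float] = []
    for name in sorted(descriptors):  # fixed order so vectors are comparable
        stats = functionals(descriptors[name])
        vec.extend(stats[s] for s in STATS)
    return vec


# Hypothetical frame-level contours for one response (two descriptors only).
segment = {"energy": [1.0, 2.0, 3.0, 4.0], "f0": [110.0, 118.0, 125.0, 121.0, 115.0]}
print(functionals(segment["energy"]))
print(len(segment_vector(segment)))  # 2 descriptors x 7 statistics = 14
```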

SLIDE 50

Results

SLIDE 53

Results

Overfitting on text.

SLIDE 54

Results

Overfitting on text. Audio generalizes better.

SLIDE 56

Results

Better generalization.

SLIDE 58

Results

Depression cues exist at relatively longer audio intervals.

Sequence length: text = 7 timesteps, audio = 30 timesteps.
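The slides report sequence lengths of 7 timesteps for text and 30 for audio. A minimal sketch of how an interview could be cut into such fixed-length sequences, assuming one timestep per response; the sliding-window stride, zero left-padding for short interviews, and the name `make_windows` are hypothetical choices, not details given in the slides.

```python
from typing import List, Sequence

Vector = List[float]


def make_windows(responses: Sequence[Vector], length: int,
                 stride: int = 1) -> List[List[Vector]]:
    """Cut a per-response feature sequence into fixed-length windows.

    Interviews shorter than `length` are left-padded with zero vectors so they
    still yield one window.
    """
    if length < 1 or stride < 1:
        raise ValueError("length and stride must be positive")
    if not responses:
        return []
    dim = len(responses[0])
    seq = list(responses)
    if len(seq) < length:
        seq = [[0.0] * dim for _ in range(length - len(seq))] + seq
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, stride)]


# Hypothetical interview with 40 responses of 2-d features.
interview = [[float(t), float(t) / 10] for t in range(40)]
text_windows = make_windows(interview, length=7)    # 40 - 7 + 1 = 34 windows
audio_windows = make_windows(interview, length=30)  # 40 - 30 + 1 = 11 windows
print(len(text_windows), len(audio_windows))  # 34 11
```

Longer windows give each audio sequence a view of more of the interview at once, in line with the observation that audio depression cues appear over longer intervals.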

SLIDE 60

Results

Text and audio provide complementary information.

SLIDE 62

Conclusion

1. Sequence Modeling:

Improved generalization. Better than baseline(s).

2. Modality Inputs to Model:

Sequence length: audio = 30, text = 7. Depression cues exist at longer speech intervals.

3. Weighted Model:

Overfitting on text.

SLIDE 63

Future Work

1. Apply the technique to a larger number of subjects with different conditions (e.g., dementia).
2. Infer the most predictive segments.
3. Infer which patterns are being captured (speaking rate? keywords?).

SLIDE 64

Reference

T. Alhanai, M. M. Ghassemi, and J. Glass, "Detecting Depression with Audio/Text Sequence Modeling of Interviews," Proc. Interspeech, Hyderabad, India, September 2018.

Publication: https://groups.csail.mit.edu/sls/publications/2018/Alhanai_Interspeech-2018.pdf
GitHub: https://github.com/talhanai/redbud-tree-depression
