Robust adaptive discourse parsing for e-learning fora Nadine Lucas - - PowerPoint PPT Presentation

SLIDE 1

Robust adaptive discourse parsing for e-learning fora

Nadine Lucas & Emmanuel Giguet
CNRS, Caen University, France

http://www.info.unicaen.fr/~nadine

SLIDE 2

Outline

  • Context
  • “Agora” forum parsing principles
  • Results
  • Example: parsing on the fly
  • Conclusion
SLIDE 3

Main objectives

  • Follow-up of students’ fora (on-line discussions)

– Monitoring the students’ participation
– Detecting the cold start problem
– Detecting the build-up of momentum in collective discussion

  • Reflection on past experience

– Tutor’s intervention

  • Give access to content (the text itself)

Context

SLIDE 4

What is the problem?

  • Large amount of textual data

– Scrolling and reading takes time

  • Yet, sentence parsing is not efficient

Context

SLIDE 5

Words in sentences?

SLIDE 6

Scale related to expectations

  • 15 fora going on at the same time on a platform

– 53 threads in a forum and 166 posts

  • Have a look at how the forum is faring

– Assess collaboration

  • Discourse parsing?

– Meaning units?

SLIDE 7

Calico

  • Calico (French Ministry of Education)

– 2005-2008

  • Practitioners and researchers

– 10 teams

  • Exchange platform

– https://wims.crashdump.net/www/calico/

  • Agora forum parser is one among many tools

Context

SLIDE 8

Monitoring tools

SLIDE 9

E-learning

  • Students’ on-line discussions (BBs, fora)

– Distance learning
– Face-to-face learning
– Mixed

  • French, English, Spanish

Context

SLIDE 10

French forum

SLIDE 11

Agora

  • Input: whole forum file (HTML)
  • Conversion to XML
  • Segmentation
  • Chronological ordering
  • Parsing
  • Visualisation
  • Output: coloured hierarchy

SLIDE 12

Agora parsing principles

  • On-line discussion

– Collective discourse

  • Time line

– Rhythm

  • Projected interpretation grid

– Expository discourse + communication

  • Difference principles

Agora

SLIDE 13

Rhythm

  • Start versus discussion proper

– Coordination and subordination relations
– By default three levels

Agora

SLIDE 14

3 levels

[Diagram labels: tuning, discussion; global, moments, rounds]

SLIDE 15

Find the odd element in a series

  • Whole forum (at time T)

– Background pattern

  • Standard message length and structure
  • Standard exchange structure

– Salient features

  • Odd post(s) in a series
  • Border
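
The "odd post in a series" idea can be illustrated with a minimal Python sketch. It assumes that message length alone stands in for the background pattern, and that the median and the median absolute deviation (MAD) describe it; the statistic, the threshold and the sample lengths are illustrative assumptions, not the Agora implementation.

```python
from statistics import median

def odd_posts(lengths, threshold=3.0):
    """Return indices of posts whose length departs from the background.

    Background = median length; spread = median absolute deviation (MAD).
    Both choices and the threshold are assumptions made for this sketch.
    """
    if not lengths:
        return []
    background = median(lengths)
    mad = median(abs(n - background) for n in lengths)
    if mad == 0:
        # Most posts share one length: any post that differs is odd.
        return [i for i, n in enumerate(lengths) if n != background]
    return [i for i, n in enumerate(lengths)
            if abs(n - background) / mad > threshold]

# Made-up character counts for seven consecutive posts.
post_lengths = [40, 42, 38, 41, 400, 39, 43]
print(odd_posts(post_lengths))  # [4]
```

Only the fifth post stands out here; the other six define the background against which it is judged.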

Agora

SLIDE 16

Relative saliency

  • Detection of similarities or differences

– Over time

  • related features, same patterns --> coordinate

– According to distributional saliency

  • new patterns --> subordinate or superordinate
  • hierarchy in inverse frequency
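
A hedged sketch of "hierarchy in inverse frequency": each post is reduced to a pattern label, and patterns are ordered from rarest to most frequent. The labels are invented for illustration, and treating the rarest pattern as the most salient is an assumption about how the slide should be read.

```python
from collections import Counter

def rank_by_rarity(patterns):
    """Order distinct patterns from rarest to most frequent.

    Ties keep first-seen order: Counter preserves insertion order
    and sorted() is stable.
    """
    counts = Counter(patterns)
    return sorted(counts, key=lambda p: counts[p])

# Hypothetical pattern labels for six consecutive posts.
patterns = ["plain", "plain", "quote", "plain", "code", "plain"]
print(rank_by_rarity(patterns))  # ['quote', 'code', 'plain']
```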

Agora

SLIDE 17

SLIDE 18

Relative difference

  • No exhaustive description
  • Just check differences

– Message groups homogeneity

  • Message size
  • Message structure

– Distribution of rare contrastive salient features

  • HTML labels
  • Smilies, punctuation
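
As an illustration of checking only surface differences, the sketch below collects message size, HTML labels, smileys and punctuation from one post body. The regular expressions, the feature names and the sample post are assumptions made for this example; HTML entities are not decoded.

```python
import re
from collections import Counter

TAG = re.compile(r"<\s*([a-zA-Z][a-zA-Z0-9]*)")   # opening tags only
SMILEY = re.compile(r"[:;]-?[)(DP]")              # a few common text smileys
PUNCT = re.compile(r"[?!]")                       # contrastive punctuation

def surface_features(html_body):
    """Collect size, tag, smiley and punctuation counts for one post."""
    text = re.sub(r"<[^>]*>", "", html_body)      # strip markup for text features
    return {
        "size": len(text),
        "tags": Counter(t.lower() for t in TAG.findall(html_body)),
        "smilies": len(SMILEY.findall(text)),
        "punct": Counter(PUNCT.findall(text)),
    }

post = '<span class="postbody">Any idea? :-) <b>Thanks!</b></span>'
features = surface_features(post)
print(features["size"], features["smilies"])  # 21 1
```

No dictionary or language-specific resource is involved, so the same function applies unchanged to French, English or Spanish posts.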

Agora

SLIDE 19

Technical side

  • XMLForum exchange format
  • Segmentation
  • Chronological ordering
  • Parsing
  • Visualisation
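
The chronological ordering step can be sketched in Python as follows. Element names (forum, message, header, author, datetime) follow the XMLForum excerpt shown on slide 44; the shortened message bodies and the day-first date format are assumptions (the date 16/09/2007 in the excerpt suggests day/month/year).

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Shortened, well-formed sample modelled on the slide 44 excerpt.
SAMPLE = """<forum name="OS Projects">
  <message id="156"><header><msgref id="155"/><author>AndyMan1</author>
    <datetime>16/09/2007 23:15</datetime><subject></subject></header>
    <body>I found this cool list of sed one-liners.</body></message>
  <message id="155"><header><author>Mike Colagrosso</author>
    <datetime>11/09/2007 13:49</datetime>
    <subject>Code snippet from sed discussion</subject></header>
    <body>cat index.xml | grep enclosure</body></message>
</forum>"""

def messages_in_order(xml_text):
    """Return (timestamp, id, author) tuples sorted by posting time."""
    root = ET.fromstring(xml_text)
    posts = []
    for msg in root.iter("message"):
        header = msg.find("header")
        stamp = datetime.strptime(header.findtext("datetime"), "%d/%m/%Y %H:%M")
        posts.append((stamp, msg.get("id"), header.findtext("author")))
    return sorted(posts)

for stamp, msg_id, author in messages_in_order(SAMPLE):
    print(stamp.isoformat(), msg_id, author)
```

Sorting on the timestamp rather than on thread position puts the whole forum on a single time line, which is what the later steps work on.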

Agora

SLIDE 20

SLIDE 21

Wrappers and snippets

SLIDE 22

Shrunk vignette view

SLIDE 23

Visualisation

  • Show compact view

– Tuning versus discussion proper
– Discussion divided into “moments”

  • Not topics
  • Zooming in

– Moments subdivided into rounds

  • All units expandable

– Showing full content

Agora

SLIDE 24

Compact view

SLIDE 25

Results

  • Show only main hierarchy

– Provide a kind of signature for fora

  • Compare fora at a glance

– over the same period or on the same task
– for different classes or different groups

Results

SLIDE 26

OS Projects 07 vs 08

SLIDE 27

OS Concepts ≠ OS Projects 07

SLIDE 28

Results

Zooming on OS Projects 07

SLIDE 29

Zooming on OS Projects 08

Results

SLIDE 30

Zooming on OS Projects 08

Results

SLIDE 31

Expanding a cell

Results

SLIDE 32

Results

Agora

  • No need for a dictionary
  • No costly description and storage of all possible formats, labels, etc.
  • Exploits differences in layout, labels and punctuation distribution
  • Results reflect meaningful turns in collective discussion

SLIDE 33

Evolution over time

When does a collective discussion gain momentum?

SLIDE 34

Parsing on the fly

  • Forum in Computer Science
  • OS Projects 1st semester 08

– 53 threads in a forum and 166 posts

Example

SLIDE 35

After 1 week

  • Tuning not performed yet

Example

SLIDE 36

After 2 weeks

  • Tuning achieved

Example

SLIDE 37

After 6 weeks

  • Six moments in discussion proper

Example

SLIDE 38

After 14 weeks: end of term

  • 4 moments: re-arranged
SLIDE 39

Interpretation

  • Detected higher-level pattern: moment G1
  • Code exchange and collaboration between students

SLIDE 40

Summing up

  • Agora helps monitor students’ discussions

– Works on text

  • gives access to content

– On-line

  • Agora is robust

– Does not need external resources

  • Agora is adaptive

– Domain-free
– Multilingual
– Processes discussion lists as well

Conclusion

SLIDE 41

But

  • Visualisation is too coarse

– Give number of masked items

  • [8 posts…] instead of […]

– Give duration of main functional segments

  • Give access to more significant text

– It is difficult to get an idea of the current discussion through snippets

Conclusion

SLIDE 42

Further work

  • Tests on different formats
  • Test more languages
  • Large on-line discussions

– Monitoring virtual classes on many tasks

  • Visualisation

– Provide options

Discussion Conclusion

SLIDE 43

Thank you

SLIDE 44

<forum name="OS Projects">
  <message id="155">
    <header>
      <author>Mike Colagrosso</author>
      <datetime>11/09/2007 13:49</datetime>
      <subject>Code snippet from sed discussion</subject>
    </header>
    <body><span class="postbody"></span><table width="90%" cellspacing="1" cellpadding="3" class="code" align="center">
      <tr> <td class="row1"><span class="genmed"><b>Code:</b></span></td> </tr>
      <tr> <td class="row2"><span class="postbody"><font color="#006600">cat index.xml | grep enclosure | sed 's/^.*url=&quot;\&#40;&#91;^\&quot;&#93;*\&#41;&quot;.*$/ \1/'</font></span></td> </tr>
    </table><span class="postbody"></span></body>
  </message>
  <message id="156">
    <header>
      <msgref id="155"/>
      <author>AndyMan1</author>
      <datetime>16/09/2007 23:15</datetime>
      <subject></subject>
    </header>
    <body><span class="postbody">I found this cool list of sed one-liners ( *mimes a cigar a la Groucho*). <br /><br />It has examples of doing all sorts of short commands with sed like double spacing a file, deleting every 8th line, print only lines that don't match regexp, etc.<br /><br />Nothing in it seemed to be too revealing in terms of our project. It has a few examples that might be useful as a starting point.<br /><br /><a href="http://sed.sourceforge.net/sed1line.txt"

SLIDE 46

Algorithm

  • Detect breaks
  • Set wrappers
  • Divide
  • Detect background
  • Process unit
  • Group similar
  • Set borders
  • Calculate rank
  • Get wrapped sub-unit
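
Two of the steps named above, "Group similar" and "Set borders", might look like the following Python sketch. It assumes each post has already been reduced to a pattern label (the labels here are invented) and that a border falls wherever two consecutive posts differ; the slides do not specify how Agora actually performs these steps.

```python
from itertools import groupby

def group_similar(patterns):
    """Group consecutive posts sharing a pattern.

    Returns (pattern, first_index, last_index) triples; the borders
    are the positions between one triple and the next.
    """
    units = []
    start = 0
    for pattern, run in groupby(patterns):
        size = len(list(run))
        units.append((pattern, start, start + size - 1))
        start += size
    return units

# Hypothetical pattern labels for six posts in chronological order.
print(group_similar(["plain", "plain", "quote", "plain", "plain", "plain"]))
# [('plain', 0, 1), ('quote', 2, 2), ('plain', 3, 5)]
```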

SLIDE 47

Find a new set of features

  • Disappearance of common items

– Greetings
– Images
– …

  • Appearance of new items

– Quotes from other messages
– Images
– Code (in computer science)
– …
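
The appearance and disappearance of items between two stretches of discussion can be sketched with plain set differences. The feature names and the two sample groups below are invented for illustration; how posts are grouped and which features are tracked are assumptions.

```python
def feature_shift(earlier, later):
    """Compare the features seen in two consecutive groups of posts.

    Each group is a list of per-post feature sets.
    """
    before = set().union(*earlier)
    after = set().union(*later)
    return {
        "disappeared": sorted(before - after),
        "appeared": sorted(after - before),
    }

# Hypothetical per-post features for two consecutive groups.
earlier = [{"greeting"}, {"greeting", "image"}]
later = [{"quote"}, {"code", "quote"}]
print(feature_shift(earlier, later))
# {'disappeared': ['greeting', 'image'], 'appeared': ['code', 'quote']}
```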

Agora

SLIDE 48

Example French forum

SLIDE 49

SLIDE 50

Results

SLIDE 51

Original

SLIDE 52

Comparison with activity graph

Discussion

SLIDE 53

Start + 4 weeks

  • Three moments in discussion proper

Discussion