TRACER TUTORIAL: TEXT REUSE DETECTION SELECTION

SLIDE 1

TRACER TUTORIAL: TEXT REUSE DETECTION SELECTION

Marco Büchler, Emily Franzini and Greta Franzini

SLIDE 2

TABLE OF CONTENTS

  • 1. What is Selection?
  • 2. Selection techniques
  • 3. Hacking
  • 4. Conclusion and revision

SLIDE 3

REMINDER: CURRENT APPROACH

SLIDE 4

WHAT IS SELECTION?

SLIDE 5

QUESTION

What do you associate with Selection?

SLIDE 6

A VISUALISATION OF FEATURING

From biometry:

SLIDE 7

SOME VOCABULARY

SLIDE 8

SOME DEFINITIONS

  • Global "knowledge": Information derived from the entire corpus;
  • Local "knowledge": Information derived from the reuse unit (e.g. a sentence);
  • Global "usage": Selection is applied to e.g. the entire word list;
  • Local "usage": Selection is applied to the reuse unit (e.g. a sentence).
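
The four terms can be illustrated with a minimal Python sketch. The toy corpus, the frequency threshold and the helper `rarest` are hypothetical choices made for illustration only; they are not part of TRACER.

```python
from collections import Counter

# Toy corpus (hypothetical): each reuse unit, here a sentence, is a list of features.
corpus = {
    "s1": ["the", "king", "spoke", "to", "the", "people"],
    "s2": ["the", "people", "heard", "the", "king"],
}

# Global "knowledge": statistics derived from the entire corpus.
global_freq = Counter(f for unit in corpus.values() for f in unit)

# Local "knowledge": statistics derived from a single reuse unit.
local_freq = {uid: Counter(unit) for uid, unit in corpus.items()}

# Global "usage": one decision per feature, applied to the whole word list.
# Illustrative rule: drop every feature that occurs more than 3 times overall.
globally_kept = {f for f, n in global_freq.items() if n <= 3}

# Local "usage": a separate decision inside each reuse unit.
# Illustrative rule: keep the k globally rarest features of each unit.
def rarest(unit, k=2):
    return sorted(set(unit), key=lambda f: (global_freq[f], f))[:k]

locally_kept = {uid: rarest(unit) for uid, unit in corpus.items()}
```

In this sketch `locally_kept` combines global "knowledge" (corpus frequencies) with local "usage" (a decision per sentence), which is one of the four combinations named above.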

SLIDE 9

SELECTION TECHNIQUES

SLIDE 10

SELECTION: GLOBAL "KNOWLEDGE" & GLOBAL "USAGE"

SLIDE 11

SELECTION: GLOBAL "KNOWLEDGE" & LOCAL "USAGE"

SLIDE 12

SELECTION: LOCAL "KNOWLEDGE" & GLOBAL "USAGE"

SLIDE 13

SELECTION: LOCAL "KNOWLEDGE" & LOCAL "USAGE"

SLIDE 14

SELECTION: MATRIX STYLE

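[Matrix F: rows are the reuse units s1 to s5, columns are the features A, C, D, F, G, E, B, H, I, J, K; each row contains five entries of 1.]

[Matrix S: same rows, columns reduced to A, C, D, F, G, E; rows s1 to s5 contain 4, 5, 5, 5 and 1 entries of 1 respectively.]

[The column positions of the individual entries are not preserved.]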
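
The matrix view can be sketched in Python as follows. The reuse units, their features and the hand-picked list `kept` are hypothetical and do not reproduce the entries of the slide; the sketch only shows how a binary unit-by-feature matrix F shrinks to a matrix S when Selection keeps a subset of the columns.

```python
def to_matrix(units, columns):
    """Binary matrix: one row per reuse unit, one column per feature."""
    return [[1 if c in feats else 0 for c in columns] for feats in units.values()]

# Hypothetical reuse units with their features after Featuring.
units = {
    "s1": {"A", "C", "H"},
    "s2": {"A", "D", "I"},
    "s3": {"C", "D", "J"},
}

# F has one column for every feature seen in the corpus.
all_features = sorted({f for feats in units.values() for f in feats})
F = to_matrix(units, all_features)

# S keeps only the selected columns (picked by hand here, for illustration).
kept = ["A", "C", "D"]
S = to_matrix(units, kept)
```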

SLIDE 15

HOW TO MAKE THE DIFFERENT SELECTION STRATEGIES COMPARABLE?

  • Different Selection strategies would require different parameters.
  • This makes comparisons between Selection strategies difficult.
  • For this reason, we introduce the Feature Density:

\[
F = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m'} s_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{m} f_{ij}}
\]
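
A minimal Python sketch of the Feature Density, reading s_ij and f_ij as the entries of the binary matrices S (after Selection) and F (after Featuring) from the matrix slide; that reading is an assumption. The function name `feature_density` and the example matrices are hypothetical.

```python
def feature_density(S, F):
    """Sum of all entries of S divided by the sum of all entries of F."""
    kept = sum(sum(row) for row in S)    # numerator: sum over i, j of s_ij
    total = sum(sum(row) for row in F)   # denominator: sum over i, j of f_ij
    if total == 0:
        raise ValueError("F contains no features")
    return kept / total

# Hypothetical example: 6 feature occurrences after Featuring, 3 after Selection.
F = [[1, 1, 1, 0],
     [1, 0, 1, 1]]
S = [[1, 1],
     [1, 0]]
density = feature_density(S, F)
```

Because the measure only counts how many feature occurrences survive, it can be computed for any Selection strategy, whatever parameters that strategy uses internally.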

SLIDE 16

HACKING

SLIDE 17

HACKING: CONFIGURATION

SLIDE 18

HACKING

Tasks:

  • Run on your own texts with different Preprocessing and Featuring techniques ...
  • eu.etrap.tracer.selection.localglobal.LocalMaxFeatureFrequencySelectorImpl
  • With different feature densities of 0.4, 0.6, 0.75
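
To get a feeling for what a feature density of 0.4, 0.6 or 0.75 does, here is a toy Python sketch. It is not TRACER's implementation: the function `select_local_max_frequency` and its rule (in each reuse unit, keep the share of features given by the density, preferring features that occur in the most reuse units) are assumptions based only on the class and package name above.

```python
from collections import Counter

def select_local_max_frequency(units, density):
    """Keep, per reuse unit, roughly `density` of its features,
    preferring those that occur in the most reuse units (assumed rule)."""
    # Global "knowledge": in how many reuse units does each feature occur?
    freq = Counter(f for feats in units.values() for f in set(feats))
    selected = {}
    for uid, feats in units.items():
        # Local "usage": rank and cut inside each reuse unit separately.
        ranked = sorted(set(feats), key=lambda f: (-freq[f], f))
        k = max(1, round(density * len(ranked)))
        selected[uid] = ranked[:k]
    return selected

# Hypothetical reuse units with five features each.
units = {
    "s1": ["a", "b", "c", "d", "e"],
    "s2": ["a", "b", "c", "x", "y"],
    "s3": ["a", "b", "z", "q", "r"],
}
low = select_local_max_frequency(units, 0.4)
high = select_local_max_frequency(units, 0.75)
```

Since the cut-off is rounded per reuse unit, the density actually achieved can differ slightly from the requested value (here 12 of 15 features, i.e. 0.8, for a requested 0.75).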

SLIDE 19

HACKING

Questions:

  • Run the aforementioned tasks. Compare the resulting "tail distributions" (you will find all the information in the Selection folder, e.g. in *.meta).
  • Compare the tail distribution between Featuring and Selection. What influence does the Selection strategy have?
  • Compare the .sel files for the different Selection strategies (use Microsoft Excel or OpenOffice to open the Selection file; sort by columns B and C).

SLIDE 20

CONFIGURING THE SELECTION IMPL PARAMETER

Hint: The configuration file can be found in: $TRACER_HOME/conf/tracer_conf.xml

SLIDE 21

FEATURE CORRELATION

Stimulus   Response prob.     Number of prob's   Co-occurrence   Significance
Butter     Bread              60                 Bread           51
           Soft               40                 Cheese          49
           Milk               32                 Sugar           29
           Margarine          27                 Milk            23
           Cheese             20                 Margarine       22
           Fat                16                 Farina          18
           Yellow             14                 Eggs            16
           Bread and butter   8                  Pound           14
           Box/can            6                  Meat            13
           Eat                6

SLIDE 22

CONTRASTIVE SEMANTICS

SLIDE 23

GAP BETWEEN KNOWLEDGE AND EXPERIENCE

SLIDE 24

CONCLUSION AND REVISION

SLIDE 25

CHECK

How do Preprocessing and Featuring influence Selection?

SLIDE 26

IMPORTANCE OF SELECTION

  • Quality of the digital representation of a reuse unit;
  • Speed for linking reuse units.

SLIDE 27

FINITO!

SLIDE 28

CONTACT

Team: Marco Büchler, Greta Franzini and Emily Franzini.

Visit us: http://www.etrap.eu

Contact: contact@etrap.eu

SLIDE 29

LICENCE

The theme this presentation is based on is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Changes to the theme are the work of eTRAP.
