A plagiarism detection procedure in three steps: selection, matches and "squares"

Chiara Basile - basile@dm.unibo.it
Mathematics Department, University of Bologna, Italy
PAN'09 Workshop, San Sebastián - Donostia, 10/09/2009

Joint work with Dario Benedetto, Emanuele Caglioti, Giampaolo Cristadoro, Mirko Degli Esposti

Chiara Basile (University of Bologna), Plagiarism detection in three steps, San Sebastián, 10/09/2009

Introduction: Once upon a time...

03/05/09: A group of mathematicians from the Universities of Bologna and Rome La Sapienza learns of the Plagiarism Competition and decides to try some preliminary experiments on the external plagiarism corpus, using methods developed for different tasks such as authorship recognition and text categorization.

The competition deadline: 07/06/09, just one month away...
...and a few documents: "just" 14,428!

Therefore, two imperatives:
1. be fast (and not only computationally)
2. use heuristics

Introduction: Where do we come from?

Various problems of classification and clustering of symbolic sequences (authorship attribution, classification of biological or genetic sequences, ...), tackled using ideas from Information Theory, Dynamical Systems, Statistical Mechanics..., and usually by defining some similarity metric(s) to estimate the "distance" between pairs of sequences.

The Gramsci Project: C. Basile, D. Benedetto, E. Caglioti, M. Degli Esposti, "An example of mathematical authorship attribution", Journal of Mathematical Physics 49, 125211 (2008).

Given two texts x, y, their n-gram distance is:

$$
d_n(x, y) := \frac{1}{|D_n(x)| + |D_n(y)|} \sum_{\omega \in D_n(x) \cup D_n(y)} \left( \frac{f_x(\omega) - f_y(\omega)}{f_x(\omega) + f_y(\omega)} \right)^2
$$

where:
- f_x(ω) = frequency of the (character) n-gram ω in x;
- D_n(x) = set of all the n-grams with non-zero frequency in x.
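The n-gram distance defined on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `ngram_frequencies` and `ngram_distance` are hypothetical, and it assumes that "frequency" means relative frequency (count divided by the total number of n-grams in the text), which the slide does not specify.

```python
from collections import Counter


def ngram_frequencies(text: str, n: int) -> dict[str, float]:
    """Relative frequency of each overlapping character n-gram in `text`.

    Assumption: "frequency" is taken as count / total number of n-grams.
    """
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def ngram_distance(x: str, y: str, n: int) -> float:
    """n-gram distance d_n(x, y) as defined on the slide."""
    fx = ngram_frequencies(x, n)
    fy = ngram_frequencies(y, n)
    if not fx and not fy:
        raise ValueError("both texts are shorter than n characters")
    # Sum over the union D_n(x) ∪ D_n(y); an n-gram missing from one
    # text has frequency 0 there, so the denominator is always positive.
    total = 0.0
    for gram in fx.keys() | fy.keys():
        a = fx.get(gram, 0.0)
        b = fy.get(gram, 0.0)
        total += ((a - b) / (a + b)) ** 2
    # Normalise by |D_n(x)| + |D_n(y)|.
    return total / (len(fx) + len(fy))


if __name__ == "__main__":
    print(ngram_distance("abcabc", "abcabc", 3))  # identical texts: 0.0
    print(ngram_distance("aaaa", "bbbb", 2))      # no shared n-grams: 1.0
    print(ngram_distance("hello world", "help the world", 3))
```

Under these assumptions the distance is 0 for identical texts and 1 for texts with no n-gram in common, since every term of the sum is then equal to 1 and the sum has exactly |D_n(x)| + |D_n(y)| terms.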
