  1. Paragraph Clustering for Intrinsic Plagiarism Detection Using a Stylistic Vector Space Model with Extrinsic Features
     Julian Brooke and Graeme Hirst
     University of Toronto

  2. Model
     • Based on work on detecting stylistic inconsistency
        ◦ In particular, voice segmentation in poetry
     • Extrinsic features from lexicons and larger corpora
        ◦ Using latent semantic analysis (LSA) to derive a stylistic lexicon
     • Cluster by maximizing the distance between authors
        ◦ Correct for imbalance in span size using the expected difference between sums of random variables

  3. Results
     • Development evaluation
        ◦ Method works well overall
        ◦ Correcting for differences in span size is important
     • However, poor performance in the PAN multi-author evaluation on mixed novels
        ◦ Model is too conservative
        ◦ Few stylistic differences between the novels
        ◦ Major stylistic differences between dialogue and narration, which confuse the model
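Slide 2 names the span-size correction but gives no formula. The Python sketch below is one plausible reading of it, not the authors' implementation, and it rests on stated assumptions. Each paragraph is treated as a vector of stylistic feature values. Those values are assumed to be independent random variables, with a per-feature standard deviation σ_f estimated from the whole document. A span's representation is the sum of its paragraph vectors divided by its size. Under these assumptions, if two spans of n_a and n_b paragraphs share an author, the difference of their means has standard deviation σ_f·sqrt(1/n_a + 1/n_b). With a normal approximation, its expected absolute value is that quantity times sqrt(2/π). Dividing the observed difference by this expectation keeps small spans from looking spuriously distinct. Several pieces are hypothetical illustrations: the helper names, the hill-climbing two-way split (standing in for "maximizing distance between authors"), and the toy feature values. The real model's lexicon- and LSA-derived features are not reproduced here.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Hypothetical representation: a paragraph is a list of stylistic feature values.
Vector = Sequence[float]


def feature_sigmas(paragraphs: Sequence[Vector]) -> List[float]:
    """Population standard deviation of each feature across all paragraphs."""
    n_features = len(paragraphs[0])
    return [statistics.pstdev([p[f] for p in paragraphs]) for f in range(n_features)]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Span representation: sum of paragraph vectors divided by span size."""
    n = len(vectors)
    return [sum(v[f] for v in vectors) / n for f in range(len(vectors[0]))]


def corrected_distance(a: Sequence[Vector], b: Sequence[Vector],
                       sigmas: Sequence[float]) -> float:
    """Average over features of observed |mean difference| / expected |mean difference|.

    Assumption (not from the slides): feature values are i.i.d. per paragraph, so
    for two same-author spans the difference of means is roughly Normal(0, s^2)
    with s = sigma_f * sqrt(1/n_a + 1/n_b); its expected absolute value is
    s * sqrt(2/pi). Larger ratios suggest the spans differ more than chance allows.
    """
    mean_a, mean_b = mean_vector(a), mean_vector(b)
    scale = math.sqrt(2 / math.pi) * math.sqrt(1 / len(a) + 1 / len(b))
    ratios = []
    for f, sigma in enumerate(sigmas):
        if sigma == 0:
            continue  # a constant feature carries no stylistic signal
        ratios.append(abs(mean_a[f] - mean_b[f]) / (sigma * scale))
    return sum(ratios) / len(ratios) if ratios else 0.0


def two_author_split(paragraphs: Sequence[Vector],
                     max_passes: int = 20) -> Tuple[List[int], float]:
    """Hypothetical search: hill-climb a two-way split maximizing corrected_distance."""
    n = len(paragraphs)
    sigmas = feature_sigmas(paragraphs)
    labels = [0] * (n // 2) + [1] * (n - n // 2)  # start from a contiguous split

    def score(lbls: List[int]) -> float:
        a = [p for p, lab in zip(paragraphs, lbls) if lab == 0]
        b = [p for p, lab in zip(paragraphs, lbls) if lab == 1]
        if not a or not b:
            return float("-inf")  # both authors must receive at least one paragraph
        return corrected_distance(a, b, sigmas)

    best = score(labels)
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            labels[i] ^= 1  # tentatively move paragraph i to the other cluster
            s = score(labels)
            if s > best:
                best, improved = s, True  # keep the move
            else:
                labels[i] ^= 1  # undo the move
        if not improved:
            break  # local optimum: no single move helps
    return labels, best


# Toy feature values invented for illustration (two features per paragraph).
toy_paragraphs = [
    [0.10, 0.50], [0.12, 0.48], [0.11, 0.52],  # one style
    [0.30, 0.20], [0.31, 0.22], [0.29, 0.18],  # a different style
]
labels, best = two_author_split(toy_paragraphs)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

On the toy input, moving any single paragraph across shrinks the corrected distance, so the split that separates the two styles is kept. Hill climbing only guarantees a local optimum. A real system would need a stronger search and would also have to decide whether a document has more than one author at all.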
