Automated Detection of Plagiarism based on Whitespace and History

Markus Ongyerth, Chair of Network Architectures and Services, Department of Informatics, Technical University of Munich. December 4, 2017.


  1. Automated Detection of Plagiarism based on Whitespace and History. Markus Ongyerth, December 4, 2017. Chair of Network Architectures and Services, Department of Informatics, Technical University of Munich.

  2. Contents (M. Ongyerth, gitplag)
     • The Idea
     • Implementation
     • Evaluation
     • Further Work

  3. What we want to find
         struct icmp6_neighbor_solicit ␣ {
             struct ether_header ehdr;
             struct ipv6_hdr iphdr;
             struct neighbor_solicit_payload pay;
         } __attribute__ (( packed ));

         if(buffer[offset + 6] != (byte) 0b00111010 ){
             return false;
         }
         if(buffer[offset + 7] != (byte) 0b11111111 ){
             return false;
         }

  5. The GRNVS dataset
     Year                  2016   2016   2017   2017
     Assignment               3      4      2      3
     Submissions            236    199    355    223
     Avg. commits            29     15     27     42
     Cases of plagiarism      8      4      4      1
     Automatic tests triggered over Git

  7. In the past
     • Checking for plagiarism with MOSS
     • Checking the results by hand
     • Searching for “strong” evidence by hand

  8. Our two approaches
     Whitespace errors
     • Weird/broken indentation
     • Multiple
     • Trailing whitespace
     • ^ → ␣ →
     • → struct␣␣struct;
     • → struct␣struct;␣$
     Identifier
     • Unintuitive names
     • Copies of typos
     • 0b1000001
     • numericToTextFormat
     • java.sql.Time
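
The whitespace errors listed on this slide can be turned into comparable tokens roughly as follows. This is a minimal sketch, not gitplag's actual tokenizer: the function name `whitespace_tokens` is hypothetical, and it assumes that ␣ on the slide stands for a space and → for a tab. A line becomes a token if its indentation mixes tabs and spaces, if it has trailing whitespace, or if it contains a run of two or more whitespace characters between non-whitespace characters.

```python
import re


def whitespace_tokens(source: str) -> list[str]:
    """Return one token per line that contains a whitespace anomaly.

    Hypothetical helper for illustration. Each token is the offending line
    with spaces rendered as ␣ and tabs as → (assumed slide notation).
    """
    tokens = []
    for line in source.splitlines():
        indent = re.match(r"[ \t]*", line).group()
        body = line[len(indent):]
        # Indentation that mixes tabs and spaces.
        mixed_indent = " " in indent and "\t" in indent
        # Whitespace left at the end of the line.
        trailing = line != line.rstrip(" \t")
        # Two or more whitespace characters between non-whitespace characters.
        inner_run = re.search(r"\S[ \t]{2,}\S", body) is not None
        if mixed_indent or trailing or inner_run:
            tokens.append(line.replace(" ", "␣").replace("\t", "→"))
    return tokens


sample = "struct  foo;\nint x;\n\t \tint y; \nreturn 0;"
print(whitespace_tokens(sample))
```

Cleanly formatted lines produce no token, so only unusual formatting, which is unlikely to arise independently in two submissions, is compared later.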

  9. Version control history
     • Perpetrators try to hide
     • They “destroy” evidence
     • The (Git) history preserves evidence
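
One way to read "the history preserves evidence" is to collect tokens from every committed version of a file instead of only the final one. The sketch below assumes exactly that; whether gitplag combines versions this way is not stated on the slide. The names `trailing_ws_lines` and `tokens_over_history` are hypothetical, and the versions are passed in as plain strings (extracting them from a repository is left out).

```python
def trailing_ws_lines(text: str) -> set[str]:
    """Toy tokenizer (assumption): lines that end in whitespace."""
    return {line for line in text.splitlines() if line != line.rstrip()}


def tokens_over_history(versions: list[str]) -> set[str]:
    """Union of the tokens found in every committed version of a file."""
    seen: set[str] = set()
    for text in versions:
        seen.update(trailing_ws_lines(text))
    return seen


# Oldest version first; the trailing space was cleaned up in the last commit.
history = ["int x; \nreturn 0;\n", "int x;\nreturn 0;\n"]
print(trailing_ws_lines(history[-1]))  # final version only
print(tokens_over_history(history))    # all versions
```

In this example the final version yields no tokens, while the history still contains the line with the trailing space.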

  10. Implementation
      1. Read and tokenize submissions
      2. Filter to viable tokens
      3. Compare submissions pairwise
      4. Generate report / provide interactive interface
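
Steps 2 to 4 above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the slides do not define "viable", so the sketch assumes a token is viable when it occurs in at least two and at most `max_submissions` submissions (common tokens carry little evidence). The function names, the `min_shared` threshold and the report format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def viable_tokens(tokens_by_submission, max_submissions):
    """Keep tokens seen in 2..max_submissions submissions (assumed rule)."""
    counts = Counter(
        tok for toks in tokens_by_submission.values() for tok in set(toks)
    )
    return {
        name: {t for t in toks if 2 <= counts[t] <= max_submissions}
        for name, toks in tokens_by_submission.items()
    }


def compare_pairwise(tokens_by_submission, max_submissions=5, min_shared=2):
    """Report pairs sharing at least min_shared viable tokens, best first."""
    viable = viable_tokens(tokens_by_submission, max_submissions)
    report = []
    for a, b in combinations(sorted(viable), 2):
        shared = viable[a] & viable[b]
        if len(shared) >= min_shared:
            report.append((a, b, sorted(shared)))
    return sorted(report, key=lambda entry: -len(entry[2]))


submissions = {
    "alice": {"struct␣␣foo;", "x␣", "int␣i;"},
    "bob": {"struct␣␣foo;", "x␣", "int␣i;"},
    "carol": {"int␣i;"},
    "dave": {"int␣i;"},
}
for a, b, shared in compare_pairwise(submissions, max_submissions=2):
    print(a, b, shared)
```

Here `int␣i;` appears in all four submissions and is filtered out, so only the pair alice/bob is reported, with the two rare tokens as evidence.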

  11. Differences to other systems
      • Whitespace ⇐ usually ignored
      • Identifiers ⇐ usually ignored
      • History ⇐ usually not available

  14. ROC graphs [Figure: ROC plot of sensitivity (y axis, 0 to 1) against FPF (x axis, 0 to 1), with one region labelled "Better than guessing" and the other "Worse than guessing".]
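
A single point on such an ROC graph can be computed as in the sketch below. It assumes FPF means the false positive fraction, FP / (FP + TN), and that sensitivity is TP / (TP + FN); the function name `roc_point`, the score dictionary and the example pairs are hypothetical. A pair counts as flagged when its score reaches the threshold.

```python
def roc_point(scores, plagiarized, threshold):
    """Return (FPF, sensitivity) for one detection threshold.

    scores:      dict mapping a submission pair to its similarity score
    plagiarized: set of pairs known to be plagiarism (ground truth)
    """
    flagged = {pair for pair, score in scores.items() if score >= threshold}
    positives = sum(1 for pair in scores if pair in plagiarized)
    negatives = len(scores) - positives
    true_pos = len(flagged & plagiarized)
    false_pos = len(flagged - plagiarized)
    sensitivity = true_pos / positives if positives else 0.0
    fpf = false_pos / negatives if negatives else 0.0
    return fpf, sensitivity


scores = {("a", "b"): 5, ("a", "c"): 1, ("b", "c"): 0, ("c", "d"): 3}
truth = {("a", "b"), ("c", "d")}
for threshold in (1, 2, 4):
    print(threshold, roc_point(scores, truth, threshold))
```

Sweeping the threshold traces out the curve: a lower threshold finds more of the true cases but also flags more innocent pairs.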

  15. Detection rate (2016 whitespace with git) [Figure: two ROC panels, (a) Assignment 2 and (b) Assignment 3; sensitivity (0 to 1) against FPF (0 to 3 · 10^-2); curves for All, Viability=5, Viability=15 and Viability=30, with points labelled 5,2 to 30,2.]

  16. Detection rate (2017 identifier) [Figure: two ROC panels, (c) Assignment 2 and (d) Assignment 3; sensitivity (0 to 1) against FPF (0 to 3 · 10^-3); curves for Identifier and With Git, with points labelled 5 to 30.]

  17. It’s not perfect
      • Shared external file
      • Students worked together
      • Incomplete file filter

  18. Time requirements
      Assignment   Git   Whitespace   Identifier
      3            No    8 s          10 s
      3            Yes   18 s         24 s
      4            No    3 s          4 s
      4            Yes   6 s          9 s

  19. Further work
      • Improve usable file detection
      • Create and evaluate other tokenizing mechanisms
      • Some implementation details

  20. Related work
      • Moss
      • Gitplag
      • (Docoloc)
      • Measuring Whitespace Pattern Sequences as an Indication of Plagiarism (Baer et al.)
