determine the true author of anonymous documents



  1. Determine the true author of anonymous documents

  2. Team Worden
     • Marc Barrowclift
     • Travis Dutko
     • Corey Everitt
     • Jiakang Jin
     • Eric Nordstrom
     • Ivan "Frankie" Orrego
     Stakeholder - Rachel Greenstadt
     Advisor - Jeff Salvage

  3. What Is Worden?
     • Protect & identify authors of anonymous works
     • Real-world example - J.K. Rowling’s The Cuckoo’s Calling

  4. What Is Worden?
     • Our project - Dramatically restructure JStylo
     • Why care?

  5. Live Demo: IDENTIFY ANONYMOUS DOCUMENT

  6. UI Design & User Studies
     • Average
     • Domain Expert
     • Intermediate

  7. Original Client / Server Architecture
     • IKVM.NET
     • Transpile .jar dependencies into C# classes
     • Rapid prototyping due to familiarity

  8. Final Client / Server Architecture
     • Data prep and packaging done on client side to meet deadlines
     • AngularJS MVC client
     • Spring MVC server

  9. Client Architecture
     • Two-way data binding
     • Allows proper HTTP abstraction
     • Handles DOM manipulation
     • Control over information flow
     • Highly modularized

  10. Server Architecture
      • Only utilizing the “C” in MVC
        1. Picks up HTTP traffic
        2. Repackages it
        3. Pipes to JStylo
        4. Returns a JSON response to controller
      • SaaS (Stylometry as a Service) - Independent of web browser
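
The four controller steps on slide 10 can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the project's Spring controller: `handle_request`, the request fields `training` and `unknown`, the response key `predictedAuthor`, and the `toy_backend` function standing in for JStylo are all hypothetical names.

```python
import json


def handle_request(body, backend):
    """Controller-only flow: parse, repackage, delegate, respond with JSON."""
    # 1. Pick up the HTTP traffic (here: the raw JSON request body).
    payload = json.loads(body)

    # 2. Repackage it into the shape the backend expects:
    #    author name -> list of known documents, plus the unknown document.
    training = {a["name"]: a["documents"] for a in payload["training"]}
    unknown = payload["unknown"]

    # 3. Pipe to the stylometry backend (a stand-in for JStylo).
    predicted = backend(training, unknown)

    # 4. Return a JSON response.
    return json.dumps({"predictedAuthor": predicted})


def toy_backend(training, unknown):
    """Assumed placeholder: pick the author sharing the most distinct words."""
    words = set(unknown.lower().split())

    def overlap(author):
        known = set(" ".join(training[author]).lower().split())
        return len(words & known)

    return max(training, key=overlap)


request = json.dumps({
    "training": [
        {"name": "alice", "documents": ["the quick brown fox"]},
        {"name": "bob", "documents": ["lorem ipsum dolor"]},
    ],
    "unknown": "a quick brown dog",
})
response = handle_request(request, toy_backend)
```

Because the handler only consumes and produces JSON, any client that can send such a request could use it, which is the sense in which the slide calls the service independent of the web browser.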

  11. Backend System Architecture
      • Feature Extraction Engine - Reduces documents to data
      • Machine Learning Engine - Interprets data

  12. Backend System Architecture
      • Feature Extraction Engine
        - Convert raw words into numeric data
        - Tools: JGAAP, Stanford POS tagger

  13. Backend System Architecture
      • Feature Extraction Engine
        - Convert raw words into numeric data
        - Tools: JGAAP, Stanford POS tagger
      Example: “Sell as a great software engineering projct”
      • Word Bigrams
        - Sell as
        - as a
        - a great
        - great software
        - software engineering
        - engineering projct
      • Misspellings
        - project → projct: count = 1
      • Letter Bigrams
        - in: 2, ro: 1, ng: 2, ll: 1, …, se: 1, ea: 1
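
The counts in the slide 13 example can be reproduced with a short sketch. It uses plain Python rather than JGAAP or the Stanford POS tagger, and it assumes lowercase tokenization and letter bigrams counted within words only; the function names and the one-entry `KNOWN_MISSPELLINGS` table are hypothetical.

```python
import re
from collections import Counter

# Hypothetical misspelling table; a real system would use a much larger list.
KNOWN_MISSPELLINGS = {"projct": "project"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def word_bigrams(text):
    """Count adjacent word pairs."""
    words = tokenize(text)
    return Counter(zip(words, words[1:]))


def letter_bigrams(text):
    """Count adjacent letter pairs within each word (not across spaces)."""
    counts = Counter()
    for word in tokenize(text):
        counts.update(a + b for a, b in zip(word, word[1:]))
    return counts


def misspellings(text):
    """Count words that appear in the misspelling table."""
    return Counter(w for w in tokenize(text) if w in KNOWN_MISSPELLINGS)


sentence = "Sell as a great software engineering projct"
wb = word_bigrams(sentence)
lb = letter_bigrams(sentence)
ms = misspellings(sentence)
```

Each counter maps a feature to a number, which is the "raw words into numeric data" step: the resulting counts are what a classifier would consume.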

  14. Backend System Architecture
      • Machine Learning Engine
        - Interprets/Classifies data
        - Tools: Weka, Apache Spark

  15. Backend System Architecture
      • Machine Learning Engine
        - Interprets/Classifies data
        - Tools: Weka, Apache Spark
      Image Source: Wikipedia

  16. Design & Construction
      • Open source development
        - Builds upon work from dozens of research students
      • Apache Spark machine learning library added
      • Refactoring
        - Separate each component into its own independent module
        - Decouple the third-party library, Weka

  17. Before

  18. After

  19. Refactoring Progress

  20. Design & Construction Cont.
      • Design Patterns
        - Creational: Builder (API), Singleton
        - Structural: Adapter (Machine Learning integration), Decorator (Feature Extraction Engine), Facade (API)
      • Testing
        - 76% of code touched was covered

  21. Machine Learning Adapter
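
The slide's diagram is not part of this transcript, so the following is only a generic illustration of the Adapter pattern applied to machine learning integration. `MajorityLib` and `NearestNeighborLib` are toy stand-ins for third-party libraries with incompatible method names; they are not the Weka or Apache Spark APIs, and every class name here is hypothetical.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Classifier(ABC):
    """Common interface the rest of the system codes against."""

    @abstractmethod
    def train(self, vectors, labels): ...

    @abstractmethod
    def predict(self, vector): ...


class MajorityLib:
    """Stand-in library A: always answers with the most common label."""

    def build(self, labels):
        self.label = Counter(labels).most_common(1)[0][0]

    def classify(self):
        return self.label


class NearestNeighborLib:
    """Stand-in library B: different method names and input shape."""

    def fit(self, rows):
        self.rows = rows  # list of (vector, label) pairs

    def nearest(self, vector):
        def dist(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], vector))
        return min(self.rows, key=dist)[1]


class MajorityAdapter(Classifier):
    """Translates the common interface into library A's calls."""

    def __init__(self):
        self._lib = MajorityLib()

    def train(self, vectors, labels):
        self._lib.build(labels)

    def predict(self, vector):
        return self._lib.classify()


class NearestNeighborAdapter(Classifier):
    """Translates the common interface into library B's calls."""

    def __init__(self):
        self._lib = NearestNeighborLib()

    def train(self, vectors, labels):
        self._lib.fit(list(zip(vectors, labels)))

    def predict(self, vector):
        return self._lib.nearest(vector)


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labels = ["alice", "alice", "bob"]
results = {}
for clf in (MajorityAdapter(), NearestNeighborAdapter()):
    clf.train(vectors, labels)
    results[type(clf).__name__] = clf.predict([0.1, 0.9])
```

The calling loop never touches a library-specific method, so adding another library would mean writing one more adapter rather than changing the callers.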

  22. Feature Extraction Decorator
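
The slide's diagram is likewise missing from this transcript. As a generic illustration of the Decorator pattern in a feature extraction setting, the sketch below wraps a base extractor with optional extra features. All class names and feature-key prefixes are hypothetical and do not reflect JStylo's actual classes.

```python
import re
from collections import Counter


class WordCountExtractor:
    """Base extractor: one feature per lowercase word."""

    def extract(self, text):
        return Counter("word:" + w for w in re.findall(r"[a-z]+", text.lower()))


class ExtractorDecorator:
    """Wraps another extractor and forwards to it by default."""

    def __init__(self, inner):
        self._inner = inner

    def extract(self, text):
        return self._inner.extract(text)


class WithLetterBigrams(ExtractorDecorator):
    """Adds within-word letter bigram counts to the wrapped features."""

    def extract(self, text):
        features = super().extract(text)
        for word in re.findall(r"[a-z]+", text.lower()):
            features.update("bigram:" + a + b for a, b in zip(word, word[1:]))
        return features


class WithWordTotal(ExtractorDecorator):
    """Adds the total number of whitespace-separated words."""

    def extract(self, text):
        features = super().extract(text)
        features["total:words"] = len(text.split())
        return features


# Decorators stack in any order around the base extractor.
extractor = WithWordTotal(WithLetterBigrams(WordCountExtractor()))
features = extractor.extract("Sell as a great software engineering projct")
```

Each decorator exposes the same `extract` method as the object it wraps, so feature sets can be combined at run time without subclassing every combination.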

  23. Annotated Demo: TEST YOUR OWN DOCUMENT

  29. Impact
      • Enhance understanding of privacy vulnerabilities in a surveillance world
      • Education is enough to combat naïve attacks [1]
      • Evolving JStylo into the next stage of its lifecycle
      [1] https://www.princeton.edu/~aylinc/papers/Aylin_PETS12_anonymouth.pdf

  30. Software Evolution
      • Increased Ease of Extension
        - Decoupling / Modularization
        - Industry-standard design patterns
        - New machine learning libraries
        - Better dependency management / updatability
      • New methods of processing
        - Web front end
        - Cluster computing
        - JSON API

  31. Software Evolution Cont.
      • Future Work
        - More machine learning libraries
        - Feature Extraction overhaul
        - Verification: solving new problems

  32. A Special Thanks to Our Sponsor
      For sponsoring Worden’s server

  33. Questions?
