Bootstrapped Authorship Attribution in Compression Space


  1. Bootstrapped Authorship Attribution in Compression Space
     Ramon de Graaff, Leiden Institute of Advanced Computer Science
     Cor Veenman, Digital Technology and Biometrics Department
     de Graaff & Veenman - PAN 2012 Poster Preview

  2. PAN Authorship Attribution Problem
     • Multi-class statistical pattern recognition problem
       – Proper feature representation
     • Dataset properties
       – Very few training document samples
       – Low number of authors
       – Large documents
     • Performance measure
       – Average precision, recall, and F1 score over all authors
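The performance measure on this slide can be illustrated with a short sketch. It assumes the scores are macro-averaged, meaning each author contributes equally regardless of how many documents they have, and that F1 is computed per author before averaging; the slide does not spell out either detail. The function name `macro_scores` is hypothetical.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (precision, recall, F1), each averaged over the authors in y_true.

    Assumption: every author counts equally (macro averaging), and F1 is
    computed per author first, then averaged.
    """
    authors = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # the guessed author gains a false positive
            fn[truth] += 1   # the true author gains a false negative

    precisions, recalls, f1s = [], [], []
    for a in authors:
        p = tp[a] / (tp[a] + fp[a]) if tp[a] + fp[a] else 0.0
        r = tp[a] / (tp[a] + fn[a]) if tp[a] + fn[a] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(authors)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

For example, `macro_scores(["A", "A", "B", "B"], ["A", "B", "B", "B"])` scores author A and author B separately and then averages the two results.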

  3. Approach
     • Low-dimensional feature representation
       – Compression Distances to Prototypes (CDP)
         > Compression distance measure (CDM)
         > Compressor: Prediction by Partial Matching (PPM)
     • Prototypes required to compute distance to
       – Draw one from each training document without replacement
     • To learn a statistical model, more samples required
       – Bootstrapping from the large training document
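The approach on this slide can be sketched roughly as follows. This is an illustration under several assumptions, not the authors' implementation. CDM is taken to be the ratio C(xy) / (C(x) + C(y)), where C is compressed size. `zlib` stands in for PPM, which is not in the Python standard library. "Without replacement" is read as cutting the prototype chunk out of the document so that bootstrapped samples never reuse it. All function names and the chunk-size parameters are hypothetical.

```python
import random
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes (zlib used here in place of PPM)."""
    return len(zlib.compress(data, 9))


def cdm(x: bytes, y: bytes) -> float:
    """Assumed CDM: near 0.5 for very similar inputs, near 1 for unrelated ones."""
    return csize(x + y) / (csize(x) + csize(y))


def split_prototype(doc: bytes, size: int, rng: random.Random):
    """Cut one prototype chunk out of doc; return (prototype, remainder)."""
    start = rng.randrange(max(1, len(doc) - size + 1))
    return doc[start:start + size], doc[:start] + doc[start + size:]


def bootstrap_samples(doc: bytes, size: int, n: int, rng: random.Random):
    """Draw n chunks at random positions; chunks may overlap each other."""
    last_start = max(1, len(doc) - size + 1)
    starts = [rng.randrange(last_start) for _ in range(n)]
    return [doc[s:s + size] for s in starts]


def cdp_features(sample: bytes, prototypes):
    """One CDM value per prototype, giving a low-dimensional feature vector."""
    return [cdm(sample, p) for p in prototypes]


def build_training_set(docs, proto_size, sample_size, n_samples, seed=0):
    """docs is a list of (author, document_bytes) pairs.

    Returns (prototypes, X, y): one prototype per training document, plus
    n_samples bootstrapped feature vectors per document with author labels.
    """
    rng = random.Random(seed)
    prototypes, remainders = [], []
    for author, doc in docs:
        proto, rest = split_prototype(doc, proto_size, rng)
        prototypes.append(proto)
        remainders.append((author, rest))

    X, y = [], []
    for author, rest in remainders:
        for sample in bootstrap_samples(rest, sample_size, n_samples, rng):
            X.append(cdp_features(sample, prototypes))
            y.append(author)
    return prototypes, X, y
```

The resulting `X` and `y` could be passed to any standard multi-class classifier. One caveat for this stand-in compressor: deflate uses a 32 KiB window, so for chunks larger than that a compressor with a longer context (for example `lzma`) is likely a better substitute.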
