GLAD: Groningen Lightweight Authorship Detection
PAN, Authorship verification, 2015
Manuela Hürlimann, Benno Weck, Esther van den Berg, Simon Šuster, Malvina Nissim

The challenge
• given: a set of Known documents written by the same author A_K
• given: one Unknown document written by an unknown author A_U
• task: determine whether A_U = A_K

How can we recognise different authors?
• Unusual word choice?
• Shorter sentences?
• More complex grammar?
• Capture such properties as features: individual_vector(feat1, feat2, …)

How can we then differentiate between authors?
• Different word choice?
• Different sentence length?
• Different grammar?
• Capture such differences as features: similarity_vector(feat1, feat2, …)
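The step from individual vectors to a similarity vector can be sketched roughly as follows. This is a minimal illustration, not the GLAD implementation: the three features (`avg_word_len`, `avg_sent_len`, `type_token_ratio`) and the use of an absolute difference are assumptions chosen only to make the idea concrete.

```python
def individual_vector(text: str) -> dict[str, float]:
    """Describe one document with a few simple, hypothetical style features."""
    words = text.split()
    # Crude sentence split on ., ! and ? (an assumption; no tokenizer is used).
    normalised = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalised.split(".") if s.strip()]
    n_words = len(words)
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sent_len": n_words / len(sentences) if sentences else 0.0,
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }


def similarity_vector(known: dict[str, float], unknown: dict[str, float]) -> dict[str, float]:
    """Compare two individual vectors feature by feature (absolute difference)."""
    return {name: abs(known[name] - unknown[name]) for name in known}


known = individual_vector("The cat sat. The dog ran.")
unknown = individual_vector("An extraordinarily verbose sentence appears here without pause.")
print(similarity_vector(known, unknown))
```

Values near zero indicate that the two documents behave alike on that feature; a classifier can then learn which differences matter.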

Our approach
• machine learning approach, trained on PAN (2015) data
• an SVM for the two-class classification task
• a set of features
• feature ablation studies to tune the system to each language
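A feature ablation study of the kind listed above might look like the sketch below. The `evaluate` callable is a hypothetical stand-in: in a real setup it would train the SVM on the given feature subset for one language and return a validation score. The toy scorer, its weights, and the one-pass selection rule are all assumptions for illustration.

```python
from typing import Callable, Sequence


def ablation_study(
    feature_names: Sequence[str],
    evaluate: Callable[[list[str]], float],
) -> dict[str, float]:
    """Return, per feature, the score lost when that feature is left out.

    A positive value means the feature helped; a negative value means
    the system scored better without it.
    """
    baseline = evaluate(list(feature_names))
    deltas = {}
    for name in feature_names:
        reduced = [f for f in feature_names if f != name]
        deltas[name] = baseline - evaluate(reduced)
    return deltas


def select_features(
    feature_names: Sequence[str],
    evaluate: Callable[[list[str]], float],
) -> list[str]:
    """Keep only features whose removal does not improve the score."""
    deltas = ablation_study(feature_names, evaluate)
    return [f for f in feature_names if deltas[f] >= 0]


# Toy evaluator (hypothetical): each feature adds a fixed amount to the score.
TOY_WEIGHTS = {"punctuation": 0.10, "line_length": 0.05, "noise": -0.02}


def toy_evaluate(features: list[str]) -> float:
    return 0.5 + sum(TOY_WEIGHTS[f] for f in features)


print(select_features(list(TOY_WEIGHTS), toy_evaluate))
```

Running the selection once per language gives a language-specific feature set while the rest of the pipeline stays unchanged.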

The core aim - A lightweight system!

The aim
• Input in any language: training instances → model → prediction
• Features should be easy to extract
• Training & testing time should be fast

Our features
• similarity_vector(entropy_of_known, visual_features, …)
• To determine relevance: grouping
• Individual − Individual = Joint: Vector_K(feat1, feat2) − Vector_U(feat1, feat2) = Vector_Joint(feat1, feat2)
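As one possible reading of the individual/joint grouping, the sketch below computes an entropy value separately for the Known documents and for the Unknown document, then derives a joint value from the two. The character-level Shannon entropy and the absolute difference are assumptions; the slides name `entropy_of_known` but do not define how it is computed.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the character distribution of a text."""
    total = len(text)
    if total == 0:
        return 0.0
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_features(known_docs: list[str], unknown_doc: str) -> dict[str, float]:
    """Two individual features (K and U) and one joint feature derived from them."""
    h_known = char_entropy("".join(known_docs))
    h_unknown = char_entropy(unknown_doc)
    return {
        "entropy_of_known": h_known,                   # individual, Vector_K
        "entropy_of_unknown": h_unknown,               # individual, Vector_U
        "entropy_joint": abs(h_known - h_unknown),     # joint, Vector_Joint
    }


print(entropy_features(["aabb", "abab"], "aaaa"))
```

Keeping the individual and joint values in separate groups makes it possible to switch a whole group on or off when testing its relevance.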

Comparing features Results of ablation & single-feature experiments: Helpful features

Side note: Visual features
• Punctuation
• Line ending
• Letter case
• Line length
• Block size
Con
• Not a characteristic of the author
• Not a linguistic feature
Pro
• Can be author-specific for some genres
• If it works…
“Pa-pa, pa-pa, pa-pa!
Here, stop her. She’ll fall down.
Here, turn around. Walk this way.
Ma-ma, ma-ma, ma-ma;
Oh, I think you are a darling.
Mer-ry Christ-mas! Mer-ry Christmas.”
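The five visual properties listed above could be extracted along the following lines. This is only a sketch: the exact definitions (ratios, averages, blank-line-separated blocks, ASCII punctuation only) and the feature names are assumptions, since the slides list the properties without defining them.

```python
import string


def _ratio(count: float, total: int) -> float:
    """Safe division that returns 0.0 for empty input."""
    return count / total if total else 0.0


def visual_features(text: str) -> dict[str, float]:
    """Layout-level features: punctuation, line endings, case, line length, block size."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    letters = [c for c in text if c.isalpha()]
    # A "block" is taken to be a run of lines separated by a blank line.
    blocks = [b for b in text.split("\n\n") if b.strip()]
    block_sizes = [sum(1 for ln in b.splitlines() if ln.strip()) for b in blocks]
    return {
        "punctuation_ratio": _ratio(sum(c in string.punctuation for c in text), len(text)),
        "line_end_punct_ratio": _ratio(
            sum(ln.rstrip()[-1] in string.punctuation for ln in lines), len(lines)
        ),
        "uppercase_ratio": _ratio(sum(c.isupper() for c in letters), len(letters)),
        "avg_line_length": _ratio(sum(len(ln) for ln in lines), len(lines)),
        "avg_block_lines": _ratio(sum(block_sizes), len(blocks)),
    }


sample = (
    "Pa-pa, pa-pa, pa-pa!\n"
    "Here, stop her. She'll fall down.\n"
    "\n"
    "Ma-ma, ma-ma, ma-ma;\n"
    "Mer-ry Christ-mas!"
)
print(visual_features(sample))
```

None of these features needs a tokenizer or tagger, which is what keeps them cheap to extract in any language.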

Comparing features Results of ablation & single-feature experiments: Harmful features

Comparing features Results of ablation & single-feature experiments: Features that are harmful, helpful, or helpful-depending-on-the-language

Comparing features Results of ablation & single-feature experiments: Differences are subtle

Resulting groups

Results
• Simple similarity features work in unison, independent of language (except Greek)
• System works fast (average runtime: 1 minute)

Final conclusion
GLAD …
• is a light and fast language-independent system
• allows language adaptation via feature selection
• involves innovative visual features which appear useful (especially for English data) and could be investigated further
