A Mathematical Study of Authorship Attribution
Yang Wang
Department of Mathematics
Michigan State University
Who Wrote
8/11/13 Amazon.com: Karen's review of The Cuckoo's Calling [LegacyTitleID: 21153... www.amazon.com/review/R8GYN2HLDXVFB/ref=cm_cr_dp_title?ie=UTF8&ASIN=0316206849&nodeID=283155&store=books 1/3
By Karen
Customer Review
1,775 of 1,918 people found the following review helpful Great Read!, July 7, 2013
This review is from: The Cuckoo's Calling [LegacyTitleID: 21153809]
This book is so well written that I suspect that some years down the road we will hear the author's name is a pseudonym of some famous
atmosphere in the gatherings. The Audible version had great accents. It is a wonderful mystery with a surprise ending, and I look forward to more by the same author.
Comments
Initial post: Jul 13, 2013 4:00:08 PM PDT
Lena Ricken says: You're right. It was just revealed that J.K. Rowling is behind this book. "I had hoped to keep this secret a little longer," JKR said, "because being Robert Galbraith has been such a liberating experience. It has been wonderful to publish without hype or expectation and pure pleasure to get feedback under a different name."
Posted on Jul 13, 2013 4:00:44 PM PDT
heartJESS says: http://www.hypable.com/2013/07/13/jk-rowling-ghost-writer-the-cuckoos-calling/ Looks like you were right!
Posted on Jul 13, 2013 5:11:07 PM PDT
I have nothing useful to add but I'm really amused you mentioned this was probably a pseudonym just a few days before it was revealed this was indeed written by one of the best selling authors of the last decade :P
Posted on Jul 13, 2013 5:37:30 PM PDT
Kenneth Yates says: YOU WON
The Sunday Times sent two experts sample books by P.D. James, Val McDermid, Ruth Rendell and
comparison.
I wish you would do this: run your eye over any part of those
and without paying any attention to the meaning. Then do the same with the Epistle to the Hebrews, and try to balance in your own mind the question whether the latter does not deal in longer words than the former. It has always run in my head that a little expenditure of money would settle questions of authorship in this way. The best mode of explaining what I would try will be to put down the results I should expect as if I had tried them. Count a large number of words in Herodotus, say all the first book, and count all the letters; divide the second numbers by the first, giving the average number of letters to a word in that book. Do the same with the second book. I should expect a very close approximation. If Book I. gave 5.624 letters per word, it would not surprise me if Book II. gave 5.619. I judge by other things.
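The statistic proposed in the letter above, total letters divided by total words, can be sketched in a few lines. This is only an illustration: the function name `mean_word_length` and the two sample strings are assumptions made here, and the tokenization (split on whitespace, count alphabetic characters only) is one of several reasonable choices.

```python
def mean_word_length(text: str) -> float:
    """Average number of letters per word: total letters / total words."""
    # Count only alphabetic characters so punctuation does not inflate the total.
    letter_counts = [sum(ch.isalpha() for ch in token) for token in text.split()]
    # Drop tokens that contain no letters at all (for example a stray dash).
    letter_counts = [n for n in letter_counts if n > 0]
    if not letter_counts:
        raise ValueError("text contains no words")
    return sum(letter_counts) / len(letter_counts)


# Hypothetical samples standing in for two books by the same author.
sample_one = "Count a large number of words and count all the letters"
sample_two = "Divide the second number by the first"

difference = abs(mean_word_length(sample_one) - mean_word_length(sample_two))
```

On real texts the comparison would be made over whole books, where the letter suggests the averages of one author should agree closely.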
8/11/13 Google Pressure Cookers and Backpacks, Get a Visit from the Feds - Yahoo! News news.yahoo.com/google-pressure-cookers-backpacks-visit-feds-140900667.html
Google Pressure Cookers and Backpacks, Get a Visit from the Feds
Philip Bump, August 1, 2013
Michele Catalano was looking for information online about pressure cookers. Her husband, in the same time frame, was Googling backpacks. Wednesday morning, six men from a joint terrorism task force showed up at their house to see if they were terrorists. Which begs the question: How'd the government know what they were Googling?
Catalano (who is a professional writer) describes the tension of that visit. [T]hey were peppering my husband with questions. Where is he from? Where are his parents from? They asked about me, where was I, where do I work, where do my parents live. Do you have any bombs, they asked. Do you own a pressure cooker? My husband said no, but we have a rice
make quinoa. What the hell is quinoa, they asked. ... Have you ever looked up how to make a pressure cooker bomb? My husband, ever the oppositional kind, asked them if they themselves weren’t curious as to how a pressure cooker bomb works, if they ever looked it up. Two of them admitted they did. The men identified themselves as members of the "joint terrorism task force." The composition of such task forces depend on the region of the country, but, as we outlined after the Boston bombings, include a variety of federal agencies. Among them: the FBI and Homeland Security.
Ever since details of the NSA's surveillance infrastructure were leaked by Edward Snowden, the agency has been insistent on the boundaries of the information it collects. It is not, by law, allowed to spy on Americans — although there are exceptions of which it takes
information from Americans unless those Americans are connected to terror suspects by no more than two other people. It collects metadata on phone calls made by Americans, but reportedly stopped collecting metadata on Americans' internet use in 2011. So how, then, would the government know what Catalano and her husband were searching for?
It's possible that one of the two of them is tangentially linked to a foreign terror suspect, allowing the government to review their internet activity. After all, that "no more than two
These are just a few cases I'm familiar with. There are numerous other interesting cases.
8/11/13 Primary Colors (novel) - Wikipedia, the free encyclopedia en.wikipedia.org/wiki/Primary_Colors_(novel) 2/6
Unmasking of "Anonymous"
An early reviewer opined that the author wished to remain unknown because "Anonymity makes truthfulness much easier".[4] Later commentators called the publishing of the book under an anonymous identity an effective marketing strategy that produced more publicity for the book, and thus more sales, without calling into question the author's actual inside knowledge.[2] Several people, including former Clinton speechwriter David Kusnet and, later, Vassar professor Donald Foster, correctly identified Klein as the novel's author, based on a literary analysis of the book and Klein's previous writing. Klein denied writing the book and publicly condemned Foster.[5][6] Klein denied authorship again in Newsweek, speculating that another writer wrote it. Washington Post Style editor David von Drehle, in an interview, asked Klein if he was willing to stake his journalistic credibility on his denial, to which Klein agreed.[7] On July 17, 1996, after The Washington Post published the results of a handwriting analysis of notes made on an early manuscript of the book, Klein finally admitted that he was "Anonymous".[8]
Plot summary
The book begins as an idealistic former congressional worker, Henry Burton, joins the presidential campaign of Southern governor Jack Stanton, a thinly disguised stand-in for Bill Clinton.[4] The plot then follows the primary election calendar beginning in New Hampshire where Stanton's affair with Cashmere, his wife's hairdresser, and his participation in a Vietnam War era protest come to light and threaten to derail his presidential prospects.[4] In Florida, Stanton revives his campaign by disingenuously portraying his Democratic opponent as insufficiently pro-Israel and as a weak supporter
policy wonk who talks too long, eats too much and is overly flirtatious toward women.[4] Stanton is also revealed to be insincere in his beliefs, saying whatever will help him to win.[4] Matters finally come to a head, and Burton is forced to choose between idealism and realism.
8/11/13 The Wrong Man - David Freed - The Atlantic www.theatlantic.com/magazine/archive/2010/05/the-wrong-man/308019/ 1/16
PROFILE MAY 2010
DAVID FREED APR 13 2010, 9:00 AM ET
THE FIRST ANTHRAX attacks came days after the jetliner assaults of September 11, 2001. Postmarked Trenton, New Jersey, and believed to have been sent from a mailbox near Princeton University, the initial mailings went to NBC News, the New York Post, and the Florida-based publisher of several supermarket tabloids, including The Sun and The National Enquirer. Three weeks later, two more envelopes containing anthrax arrived at the Senate offices
return address of a nonexistent “Greendale School” in Franklin Park, New
The letters accompanying the anthrax read like the work of a jihadist, suggesting that their author was an Arab extremist—or someone masquerading as one—yet also advised recipients to take antibiotics, implying that whoever had mailed them never really intended to harm anyone. But at least 17 people would fall ill and five would die—a photo editor at The Sun; two postal employees at a Washington, D.C., mail-processing center; a hospital stockroom clerk in Manhattan whose exposure to anthrax could never be fully explained; and a 94- year-old Connecticut widow whose mail apparently crossed paths with an anthrax letter somewhere in the labyrinth of the postal system. The attacks spawned a spate of hoax letters nationwide. Police were swamped with calls from citizens suddenly suspicious of their own mail.
The Wrong Man
In the fall of 2001, a nation reeling from the horror of 9/11 was rocked by a series of deadly anthrax attacks. As the pressure to find a culprit mounted, the FBI, abetted by the media, found one. The wrong one. This is the story of how federal authorities blew the biggest anti-terror investigation of the past decade—and nearly destroyed an innocent
It is far less developed than for English texts.
Connoisseurship still dominates the investigation of authorship attribution, and it is at the center of some high profile authorship controversies.
Stylometry analysis is more challenging than for English texts.
English words form natural "atoms" for stylometry analysis. But Chinese characters are far less natural "atoms". Each character by itself has too many (often completely different) meanings.
A Mathematical Stylometric Study
Written by Cao Xueqin around the 1750s
One of China's Four Great Classical Novels
Widely acknowledged as the greatest literary piece ever written in the history of Chinese literature.
First hand-copied manuscript with 80 chapters began to circulate in 1759.
Printed version began to circulate in 1791. It was put together by Cheng Weiyuan and Gao E (Cheng-Gao version). But it had 120 chapters.
Cheng-Gao maintained that they obtained previously unknown manuscripts of Cao from various sources.
Many scholars were skeptical of the last 40 chapters, and speculated that they were written by Gao E. Some, such as renowned scholar Hu Shi, had called these chapters a "fraud" perpetrated by Gao.
Some scholars believe the last 40 chapters are inferior to the first 80 chapters, both in plot and in writing.
Some experts thought the fates of several characters in the end were inconsistent with what was foreshadowed.
The authorship question was the main focus of "Redology" for a long time.
Redology relied almost exclusively on connoisseurship. Stylometry analysis has been very rare, in fact pathetically sparse compared to the size of the Redology literature.
The few existing ones are a mixed bag in terms of quality, from reasonable to very flawed.
None uses modern techniques such as machine learning theory.
Cao (1985): analyzed the use of function characters in the book, and compared the first 40 chapters, middle 40 chapters and last 40 chapters.
Zhang and Liu (1986): examined the use of characters
first 80 chapters.
Yu (1998): focused on the statistics of 5 characters and sentences ending in a particular way.
Li (1987): statistical analysis of 47 function characters, and suggested that the last 40 chapters might have been edited by Gao based on unfinished manuscripts of Cao.
Chan (1981): perhaps the best known and most extensive study, and Li & Li (2006).
Both studies broke the book down into three equal parts. A Frequency Vector for a selected group of characters was built for each part. The correlations of these Frequency Vectors were computed.
In Li & Li (2006), 47 characters were selected, and the pairwise correlations among the three Frequency Vectors were computed. The authors concluded that they are all sufficiently correlated.
In Chan (1986), a fourth Frequency Vector from selected chapters of a different book was added. By showing that part III was more closely correlated to the first two parts than to the new book, the author drew the one-author conclusion.
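The Frequency Vector comparison described above can be sketched as follows. This is a minimal illustration, not a reconstruction of either study: the helper names, the use of relative frequencies, the choice of the Pearson coefficient as the "correlation", and the toy strings are all assumptions made here.

```python
from collections import Counter
from math import sqrt


def frequency_vector(text, selected):
    """Relative frequency of each selected character in `text`."""
    if not text:
        raise ValueError("empty text")
    counts = Counter(text)
    return [counts[ch] / len(text) for ch in selected]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(parts, selected):
    """Correlation between the Frequency Vectors of every pair of parts."""
    vectors = [frequency_vector(part, selected) for part in parts]
    return {
        (i, j): pearson(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    }


# Toy stand-ins for the three parts of a book and the selected characters.
selected = ["a", "b", "c"]
parts = ["aabbbc", "aaabbc", "abcccc"]
corr = pairwise_correlations(parts, selected)
```

A high correlation between two parts shows only that the selected characters are used at similar rates; whether that is enough to conclude common authorship is exactly what the slides go on to question.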
The two studies supporting the one-author hypothesis are both flawed. The study of Chan (1986) is severely flawed, considering the book for comparison was of a different genre.
The other studies are more reasonable, although almost all
We would like to develop a more rigorous mathematical framework for testing the two-author hypothesis in general and for analyzing Dream of the Red Chamber in particular.
Some books are written (or suspected to be written) by two authors, with the first X chapters written by author A and the last Y chapters written by author B. There is a shift in writing style somewhere in the middle.
The idea is to detect these chronological dividing points ("chrono-divide").
We develop a simple mathematical framework for detecting chrono-
two authors write in an interwoven fashion.
We use Dream of the Red Chamber as a case study.
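To make the chrono-divide idea concrete, here is a toy sketch. It is not the framework of the talk: the figure captions refer to an SVM classifier with cross validation, whereas this example substitutes a much simpler nearest-centroid classifier with leave-one-out error, using only the standard library. The function names, the `min_size` parameter, and the one-dimensional chapter features are all hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def loo_error_rate(chapters, split):
    """Leave-one-out error of a nearest-centroid classifier whose two
    classes are chapters[:split] and chapters[split:]."""
    errors = 0
    for i, vec in enumerate(chapters):
        # Rebuild both classes without the held-out chapter i.
        first = [w for j, w in enumerate(chapters[:split]) if j != i]
        second = [w for j, w in enumerate(chapters[split:], start=split) if j != i]
        label = 0 if i < split else 1
        closer_to_first = sq_dist(vec, centroid(first)) <= sq_dist(vec, centroid(second))
        predicted = 0 if closer_to_first else 1
        errors += predicted != label
    return errors / len(chapters)


def best_split(chapters, min_size=2):
    """Candidate dividing point with the lowest leave-one-out error."""
    if len(chapters) < 2 * min_size:
        raise ValueError("not enough chapters for the requested min_size")
    candidates = range(min_size, len(chapters) - min_size + 1)
    return min(candidates, key=lambda k: loo_error_rate(chapters, k))


# Hypothetical per-chapter feature vectors with a style shift after chapter 3.
chapters = [[0.0], [0.1], [0.05], [1.0], [1.1], [0.95]]
divide = best_split(chapters)
```

In this sketch a split with low held-out error suggests that the chapters before and after it are stylistically separable, while uniformly high error across all candidate splits suggests no dividing point. In practice the chapter features would be something like the character Frequency Vectors discussed earlier.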
Figure 2. Experiment 2: (a) Mean cross validation error rate; (b) Values of SVM classifier on chapters 31-50. Note there is no chrono-divide.
Figure 3. Experiment 3: (a) Mean cross validation error rate; (b) Values of SVM classifier on chapters 96-105, which correspond to the samples 31-50 in all 80 samples. Note two samples come from one chapter in this experiment.
Figure 4. Classification results from the test samples of the other three classical novels: (a) Romance of the Three Kingdoms; (b) Water Margin; (c) Journey to the West.
Figure 5. Distances between the first 80 chapters of the Cheng-Gao version, the last 40 chapters of the Cheng-Gao version, and 30 chapters of Continued Dream of the Red Chamber.