

slide-1
SLIDE 1

What companies’ unabridged keyword blacklists say about Chinese censorship of realtime chat

Jeffrey Knockel, Dissertation Defense, December 2017

Committee: Jedidiah Crandall (chair), Ronald Deibert, Stephanie Forrest, Jared Saia

slide-2
SLIDE 2

“1989年民运”

(1989 Democracy Movement)

slide-3
SLIDE 3

“习近平时代”

(Xi Jinping Era)

slide-4
SLIDE 4

“Baby Mama Drama”

slide-5
SLIDE 5

“Baby Mama Drama”

(A keyword appearing in a chat client)

slide-6
SLIDE 6

Who determines what’s censored in Chinese apps?

slide-7
SLIDE 7
Centralized and monolithic?

  • Implementations are uniform
  • What is censored necessarily reflects CPC strategies
  • e.g., collective action targeted, government criticism permitted (King, Pan, Roberts; 2013, 2014)

slide-8
SLIDE 8
Decentralized and fragmented?

  • Intermediary liability
  • Censorship laws and policy can be intentionally vague
  • Responsibility for implementing censorship pushed down to companies
  • “Anaconda in the Chandelier” (Perry Link)

slide-9
SLIDE 9

How can we understand which is right?

  • Analyzing censorship in apps used in China
  • Client-side censorship offers research opportunities
  • Extract entire keyword lists used to trigger censorship
  • Compare across apps and industries
slide-10
SLIDE 10

Industry segments

  • Instant messaging apps (FOCI 2011, First Monday 2013)
  • Live streaming apps (FOCI 2015)
  • → Mobile gaming apps ← (FOCI 2017)

Jeffrey Knockel, Jedidiah R. Crandall, and Jared Saia. Three Researchers, Five Conjectures: An Empirical Analysis of TOM-Skype Censorship and Surveillance. FOCI 2011. San Francisco, California. August 2011.

Jedidiah R. Crandall, Masashi Crete-Nishihata, Jeffrey Knockel, Sarah McKune, Adam Senft, Diana Tseng, and Greg Wiseman. Chat program censorship and surveillance in China: Tracking TOM-Skype and Sina UC. First Monday, Volume 18, Number 7, 1 July 2013.

Jeffrey Knockel, Masashi Crete-Nishihata, Jason Q. Ng, Adam Senft, and Jedidiah R. Crandall. Every Rose Has Its Thorn: Censorship and Surveillance on Social Video Platforms in China. FOCI 2015. Washington D.C., USA.

Jeffrey Knockel, Lotus Ruan, and Masashi Crete-Nishihata. Measuring Decentralization of Chinese Keyword Censorship via Mobile Games. FOCI 2017. Vancouver, Canada.

slide-11
SLIDE 11

Instant messaging (IM) clients

Do Chinese companies use the same lists?

  • TOM-Skype
  • Sina UC

3% overlap

No shared blacklist largely determining what is censored

slide-12
SLIDE 12

Instant messaging (IM) clients

Categorized into events

Little high-level overlap

2 companies, 1,000s of keywords

slide-13
SLIDE 13

Live streaming platforms

Reverse engineer apps across entire industry segment

  • YY
  • Sina Show
  • 9158
  • GuaGua

Keyword similarities explained by developer similarities

slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16

Live streaming platforms

Tracked updates to list over time

No large overlap in events that cannot be explained by shared ownership

4 companies (6 total), 10,000s of keywords

slide-17
SLIDE 17

China has the world’s largest and most lucrative mobile gaming market

Estimated value of over US$27.5 billion in 2017

Source: https://newzoo.com/insights/articles/the-global-games-market-will-reach-108-9-billion-in-2017-with-mobile-taking-42/, Apr 2017

slide-18
SLIDE 18

Registration Approval → Ministry of Culture (MoC)

Publication License → State Administration of Press, Publication, Radio, Film and Television (SAPPRFT)

slide-19
SLIDE 19

Prohibited Content in Online Games

1. violating basic principles set by the Constitution;
2. jeopardizing national unity, state sovereignty and territorial integrity;
3. leaking state secrets, endangering state security or damaging state honor and interests;
4. instigating ethnic hatred or discrimination, jeopardizing ethnic unity, and infringing ethnic rituals or customs;
5. promoting heretical or superstitious ideas;
6. spreading rumors, disrupting social order and stability;
7. disseminating obscenity, pornography, gambling, violence or abetting crime;
8. humiliating or slandering others, infringing the lawful rights of others;
9. transgressing social morality;
10. other contents forbidden by laws and administrative regulations.

slide-20
SLIDE 20

Mobile games in China

There are a lot more Chinese games than Chinese chat platforms!

  • More than 100 companies, 100,000s of keywords
  • Allows us to test new hypotheses
  • Games commonly censor in-game chat and usernames
  • Many games are international games adapted for the Chinese market

slide-21
SLIDE 21

Hypotheses

Censorship keyword lists are:

1) Determined at the city or provincial level
2) Determined for specific genres of games
3) Related to the date that games are released
4) Largely determined by the publisher or developer

slide-22
SLIDE 22

“Initiating banned keywords data~!”

slide-23
SLIDE 23

Please enter your user name: Xi Jinping

User name does not comply with regulations, please re-enter.
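A minimal sketch of how a client-side check like the one in this prompt could work. It assumes case-insensitive substring matching against a locally stored list; the talk does not say how any particular game matches keywords. The function names and the three-entry list are hypothetical and for illustration only.

```python
def load_blacklist(lines):
    """Normalize raw list entries: strip whitespace, lowercase, drop blanks."""
    return {line.strip().lower() for line in lines if line.strip()}


def find_banned(text, blacklist):
    """Return the blacklisted keywords that occur as substrings of text."""
    normalized = text.lower()
    return sorted(kw for kw in blacklist if kw in normalized)


def check_username(name, blacklist):
    """Reject a user name that contains any blacklisted keyword."""
    if find_banned(name, blacklist):
        return "User name does not comply with regulations, please re-enter."
    return "OK"


# Hypothetical list (assumption): not taken from any real game.
BLACKLIST = load_blacklist(["习近平", "Xi Jinping", "falun"])

print(check_username("Xi Jinping", BLACKLIST))
print(check_username("player_one", BLACKLIST))
```

Because the list ships inside the client, anyone who can unpack the app can read every keyword, which is what makes the extraction approach in this talk possible.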

slide-24
SLIDE 24

Sampling methodology

  • Collected first 500 results from Hi Market using a search query that only returned highly downloaded Chinese-developed games
  • Same for internationally developed games
  • Searched APKs for sensitive words: falun, 法轮 (falun), fuck, 肏 (fuck)
  • Searched for censorship-related strings: blacklist, censor, dirty, filter, forbid, illegal, keyword, profan, sensitiv

slide-25
SLIDE 25

Keyword lists

From 836 games, found 132 lists from 113 games (152,114 unique keywords)

  • XML, JSON, CSV
  • Compiled Lua, C++
  • Encrypted files
slide-26
SLIDE 26

Interesting keywords

Criticism of Censorship Policies

  • 敏感词屏蔽的社会 (a society where sensitive keywords are blocked)

Multilingual Keywords

  • 일진회 (Iljinhoe), a nationwide pro-Japan organization that operated in Korea in the 1900s

slide-27
SLIDE 27

Interesting keywords

Coded Language

  • 刁净瓶 (diāo jìng píng), referencing state leader 习近平 (xí jìnpíng)
  • 无法领奖的人 (a person who is unable to receive the award), referring to China’s Nobel Laureate and dissident Liu Xiaobo

Competitor Names

  • 侠客天下 (World of Knights)
  • 仙境传说 (Ragnarok Online)
slide-28
SLIDE 28

Content analysis

Sampled 7,000 keywords (1.1% margin of error at 95% confidence)

Theme           Examples
Event           Anniversaries, Current Events
Political       Communist Party of China, Religious Groups
People          Government officials, Dissidents
Social          Gambling, Prurient Interests
Technology      Online Games, URLs
Miscellaneous   No clear context
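The quoted margin can be roughly reproduced with the standard formula for a proportion. This is a sketch under assumptions the slide does not state: a worst-case proportion of 0.5, z = 1.96 for 95% confidence, and a finite population correction against the 152,114 unique keywords of the first sample.

```python
import math


def margin_of_error(sample_size, population_size=None, z=1.96, p=0.5):
    """Margin of error for an estimated proportion.

    Applies the finite population correction when population_size is given.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    if population_size is not None:
        standard_error *= math.sqrt(
            (population_size - sample_size) / (population_size - 1)
        )
    return z * standard_error


moe = margin_of_error(7000, 152114)
print(f"{moe:.1%}")  # about 1.1%
```

Without the correction the same inputs give a slightly larger margin, so the exact assumptions behind the slide's figure may differ from the ones used here.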

slide-29
SLIDE 29
slide-30
SLIDE 30

Testing the four hypotheses

Took the 132 lists from 113 games (152,114 unique keywords)

Turned each list into a vector of word counts
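A minimal sketch of the vector step, together with the cosine similarity used later in the talk to compare lists. It treats each keyword as one dimension and stores only non-zero counts; any normalization or tokenization beyond that is not described on the slides, so none is attempted here. The function names and the toy lists are hypothetical.

```python
import math
from collections import Counter


def to_vector(keywords):
    """Represent a keyword list as a sparse vector of word counts."""
    return Counter(keywords)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(keyword_lists):
    """Pairwise cosine similarities between all lists."""
    vectors = [to_vector(kws) for kws in keyword_lists]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy lists (assumption): stand-ins for lists extracted from three games.
lists = [["a", "b"], ["b", "c"], ["d"]]
matrix = similarity_matrix(lists)
print(matrix[0][1], matrix[0][2])
```

Lists that share half their keywords score 0.5 here, and lists with nothing in common score 0, which makes the matrix a convenient input for the correlation tests that follow.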

slide-31
SLIDE 31
slide-32
SLIDE 32

Statistical testing

Mantel test – a test for statistical correlation between similarity matrices X and Y

  • r statistic: a correlation statistic between -1 and 1
  • p value: probability that a correlation at least as extreme would arise by chance

slide-33
SLIDE 33

Hypotheses

Censorship keyword lists are:

1) Determined at the city or provincial level
2) Determined for specific genres of games
3) Related to the date that games are released
4) Largely determined by the publisher or developer

slide-34
SLIDE 34

Statistical testing

Mantel test – a test for statistical correlation between similarity matrices X and Y

Y is the matrix of cosine similarities

X is different depending on what we want to test

  • same genre
  • same publisher city
  • same developer city
  • similarity in approval dates
  • same publisher
  • same developer
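A minimal sketch of a permutation-based Mantel test in plain Python. The details are assumptions, since the slides do not give them: Pearson correlation over the upper triangles, a one-sided test for positive correlation, 999 random permutations, and a fixed seed. The toy matrices are hypothetical.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the entries above the diagonal, optionally relabeling items."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mantel(x, y, permutations=999, seed=0):
    """Return (r, p) for symmetric matrices x and y of the same size."""
    rng = random.Random(seed)
    y_flat = upper_triangle(y)
    r_obs = pearson(upper_triangle(x), y_flat)
    order = list(range(len(x)))
    at_least = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # relabel rows and columns of x together
        if pearson(upper_triangle(x, order), y_flat) >= r_obs:
            at_least += 1
    return r_obs, at_least / (permutations + 1)


# Toy data (assumption): x marks "same publisher", y holds cosine similarities.
x = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
y = [[1.0, 0.9, 0.1, 0.2], [0.9, 1.0, 0.15, 0.1],
     [0.1, 0.15, 1.0, 0.8], [0.2, 0.1, 0.8, 1.0]]
r, p = mantel(x, y)
print(round(r, 3), p)
```

Permuting rows and columns together, rather than shuffling individual cells, respects the fact that the entries of a similarity matrix are not independent of one another. With only four items the p value cannot get small however strong the correlation, which is one reason a test like this needs many lists.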
slide-35
SLIDE 35

Results

Variable                r statistic   p value
Same publisher city     −0.014        0.65
Same developer city     −0.0069       0.58
Same genre              −0.013        0.65
Similar approval date   0.16          0.0067
Same publisher          0.15          < 0.001
Same developer          0.17          < 0.001

slide-36
SLIDE 36

Repeated experiment

Different sampling methodology this time

  • Many games in the first sample didn’t share the same publisher (50%) or developer (62%) with any other game
  • Selected from five popular publishers: Giant, Happy Elements, iDreamSky, Netease, Tencent
  • And from eight popular developers: CatCap, Chukong, Joymeng, Ourpalm, Smile, Ultralisk, Xiao Ao

slide-37
SLIDE 37

Keyword lists

From 574 unique games, we found

  • 167 lists from 129 games
  • 171,150 unique keywords

We compared the lists in the same way as before.

slide-38
SLIDE 38
slide-39
SLIDE 39

Results

Variable                r statistic   p value
Similar approval date   0.056         0.83
Same publisher          0.21          < 0.001
Same developer          0.23          < 0.001

slide-40
SLIDE 40

Hypotheses

Censorship keyword lists are:

✗ Determined at the city or provincial level
✗ Determined for specific genres of games
? Related to the date that games are released
✔ Largely determined by the publisher or developer

This suggests that the responsibility of determining what to censor is pushed down as far as possible.

slide-41
SLIDE 41

Generalizing to other industry segments

No centralized blacklists or directives largely determining lists

Directives from provincial level playing a large role? More data needed to be confident… If lessons from mistaken assumptions about centralized blacklists are true, then NO.

Study motivations and incentives of private companies

slide-42
SLIDE 42

Some of my other work

  • Jeffrey Knockel, Adam Senft, and Ron Deibert. Privacy and Security Issues in BAT Web Browsers. In the Proceedings of the 6th USENIX Workshop on Free and Open Communications on the Internet (FOCI 2016). Austin, Texas. August 2016.
  • Jeffrey Knockel and Jedidiah R. Crandall. Counting Packets Sent Between Arbitrary Internet Hosts. In the Proceedings of the 4th USENIX Workshop on Free and Open Communications on the Internet (FOCI 2014). San Diego, California. August 2014.
  • Jeffrey Knockel, George Saad and Jared Saia. Self-Healing of Byzantine Faults. In the Proceedings of the 15th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS 2013). Osaka, Japan. November 2013.
  • Jeffrey Knockel and Jedidiah R. Crandall. Protecting Free and Open Communications on the Internet Against Man-in-the-Middle Attacks on Third-Party Software: We're FOCI'd. In the Proceedings of the 2nd USENIX Workshop on Free and Open Communications on the Internet (FOCI 2012). Bellevue, Washington. August 2012.

slide-43
SLIDE 43

Reports

  • Jeffrey Knockel, Adam Senft, and Ron Deibert. A Tough Nut to Crack: A Further Look at Privacy and Security Issues in UC Browser. Citizen Lab Report. August 2016.
  • Jeffrey Knockel, Adam Senft, and Ron Deibert. WUP! There It Is: Privacy and Security Issues in QQ Browser. Citizen Lab Report. March 2016.
  • Jeffrey Knockel, Sarah McKune, and Adam Senft. Baidu’s and Don’ts: Privacy and Security Issues in Baidu Browser. Citizen Lab Report. February 2016.
  • Andrew Hilts, Christopher Parsons, and Jeffrey Knockel. Every Step You Fake: A Comparative Analysis of Fitness Tracker Privacy and Security. Open Effect Report. February 2016.
  • Lotus Ruan, Jeffrey Knockel, Jason Q. Ng, and Masashi Crete-Nishihata. One App, Two Systems: How WeChat uses one censorship policy in China and another internationally. Citizen Lab Report. November 2016.
slide-44
SLIDE 44

Impact of my research

  • Reported by international media (Bloomberg, Guardian, Reuters, BBC, Fortune, Wall Street Journal, Washington Post, CNN Money, New York Times…)
  • Microsoft no longer performs client-side censorship or surveillance in the Chinese Skype client!

slide-45
SLIDE 45

Questions?