SLIDE 1 “Collecting User's Data in a Socially-Responsible Manner.”
Photograph: Daniel Beltra/Greenpeace
Konark Modi
@konarkmodi
Josep M. Pujol
@solso
SLIDE 2 About Cliqz
- 80+ - Team size
- 500,000 - DAU
- 3 Million+ - Downloads (Germany only)
- 1 billion+ - Indexed pages (We do not believe in
indexing the web.)
- 5 TB - Indexed in memory (based on open-source
and in-house-built NoSQL stores.)
- 10x more coverage for anti-phishing protection,
compared with other players such as Google's Safe Browsing.
- Upcoming products such as anti-tracking.
SLIDE 3
About Cliqz
SLIDE 4
We Love Data …
SLIDE 5
Let's step back a bit in time, to get the context.
SLIDE 6 Source : http://thehumanfaceofbigdata.com
“ Data is the new oil ”
SLIDE 7
Data is still being collected without enough controls & measures.
Is privacy the new Green ?
SLIDE 8
The biggest by-product of this is SESSIONS.
Is privacy the new Green ?
SLIDE 9 How ?
[Diagram. Labels: Client-Side, Server-Side, Uncharted water, MAP/REDUCE :D. Users shown: Alice, Alice, Bob.]
SLIDE 10 Instead …
[Diagram. Labels: Client-Side, Server-Side, Uncharted water. Users shown on each side: Alice, Alice, Bob. MAP/REDUCE :D appears three times.]
SLIDE 11
Who is responsible ?
Is there a conspiracy theory or an evil plan ?
SLIDE 12
Well, we have a simpler explanation:
It is a consequence of common development practices, which result in users' data being traded, knowingly or unknowingly!
SLIDE 13
Demo
SLIDE 14
This looks like a toy example ?
SLIDE 15
Which queries give results so bad that people are forced to redo the same query elsewhere?
Let’s take a more complex case
SLIDE 16
[Diagram, Client-Side: Alice issues the query "apache big data conf" on search engine 1 and on search engine 2.]
SLIDE 17
[Diagram. Client-Side: Alice issues the query "apache big data conf" on search engine 1 and on search engine 2. Server-Side: the same two records appear as input to Map-Reduce. Also labelled: Uncharted water.]
SLIDE 18
[Same diagram as slide 17.]
SLIDE 19
[Same diagram, with the two records and the Map-Reduce step shown twice on the server side.]
SLIDE 20
[Same diagram as slide 19.]
SLIDE 21 As we mentioned before, we believe in data and are not against its collection.
- Stopping data collection altogether would be foolish
and dangerous. It would also mean stopping the wheels
of innovation.
- Who would benefit the most from
supporting a ban on advertisements for tobacco products?
SLIDE 22
SLIDE 23
“Socially responsible manner” is an analogy: it means ensuring that the events being collected are free of pollutants such as explicit IDs and implicit IDs, and that they reach home securely.
SLIDE 24
Why does CLIQZ Care ?
SLIDE 25
- German data privacy laws
- Security breaches
- When government knocks
SLIDE 26
So what do we bring to the table?
SLIDE 27 HUMAN WEB
- We have developed HumanWeb to balance the right to privacy with the
need to build products that improve the web and allow for more
openness.
- It ensures that data from which sessions or links to navigation patterns
can be inferred is not collected.
- It does not create so much data that individuals could be identified.
- We do not want to know who "YOU" are, what "YOU" searched for and when
"YOU" searched.
- It is designed so that a malicious or untrustworthy actor, or for that
matter anyone at Cliqz, who gets access to the raw data flow cannot infer or identify individuals.
SLIDE 28 Sample events:
{
  "action": action of the message,
  "ver": version name,
  "type": "humanweb",
  "payload": { }, // the actual data
  "ts": UTC time capped to the day, e.g. 20150909
}
- Sample event for Page
- Sample event for Query
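A minimal sketch of how an event with this shape could be built so that the timestamp carries only the UTC day. The function name `make_event`, the `"query"` action, the payload key `"q"`, the version string and the choice of a string for `ts` are illustrative assumptions, not taken from the slides.

```python
from datetime import datetime, timezone

def make_event(action, payload, version="1.0", now=None):
    """Build a HumanWeb-style event whose timestamp is capped to the day."""
    now = now or datetime.now(timezone.utc)
    return {
        "action": action,
        "ver": version,
        "type": "humanweb",
        "payload": payload,
        # Only the UTC date survives; hour, minute and second are dropped.
        "ts": now.astimezone(timezone.utc).strftime("%Y%m%d"),
    }

# Two events from different times on the same day (hypothetical payloads).
morning = make_event("query", {"q": "apache big data conf"},
                     now=datetime(2015, 9, 9, 8, 15, tzinfo=timezone.utc))
evening = make_event("query", {"q": "apache big data conf"},
                     now=datetime(2015, 9, 9, 21, 40, tzinfo=timezone.utc))
```

With the time of day removed, the two events are identical, so the timestamp alone cannot be used to order or link events within a day.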
SLIDE 29 HumanWeb
[Diagram labels]
- Event Queue: [ {event1}, {event2}, {event3} ] | Scheduled so that events are not sent in a batch
- Final checks, Filtering, Sanitisation / Masking
- Secure Channel
- Client-side: Local storage | Structural data about webpages
- Map-Reduce: Aggregations, Heuristics, Filtering, Hashing
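The event queue on this slide is scheduled so that events are not sent in a batch. One way this could look, as a sketch only: give every queued event its own random delay, so events produced together do not arrive together. The function name `schedule_sends`, the uniform distribution and the delay bound are assumptions; the slides do not describe the actual scheduling policy.

```python
import random

def schedule_sends(events, max_delay_s=3600.0, rng=None):
    """Assign each event an independent random delay (in seconds).

    Returns (delay, event) pairs sorted by delay, i.e. in sending order.
    """
    rng = rng or random.Random()
    scheduled = [(rng.uniform(0, max_delay_s), event) for event in events]
    scheduled.sort(key=lambda pair: pair[0])
    return scheduled

# Hypothetical queue content; a seeded generator keeps the run repeatable.
plan = schedule_sends(["event1", "event2", "event3"],
                      max_delay_s=600.0, rng=random.Random(7))
```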
SLIDE 30
Privacy breaches on the way home
To achieve total privacy, we must rely on a network of proxies that removes any network-related data such as cookies, IP addresses and headers, so that fingerprinting is impossible.
SLIDE 31
SecureChannel : Protection from network fingerprinting
SLIDE 32 SecureChannel : What do we encrypt ?
- The queries from the user (initiated by them through activity in
Cliqz’s instrumented Firefox address bar).
- All telemetry signals (initiated by Cliqz’s instrumented Firefox).
- All messages regarding the HumanWeb data collection effort.
Also, before reaching our infrastructure the encrypted messages are routed through a mesh of proxies.
SLIDE 33 SecureChannel : How do we encrypt ?
Life-cycle of hashes / keys:
- AES: Hash keys used with AES are used only once, even if the user types the
same query.
- Public / Private KeyPair (Client):
  - The keys on the client side are all short-lived; we continuously generate keys on
  the client side.
  - The public/private key pair of the client (the Extension) is meant to be used
  only once and then thrown away. The key pairs are regenerated to fill a pool
  while the browser is idle.
- Public / Private KeyPair (Server):
  - Only the public part of this key is shared with the extension.
  - The client uses it while encrypting the request. This is a long-lived key, currently
  changed only if it is compromised.
Client side: 128-bit symmetric AES encryption, OpenSSL RSA 1024-bit encryption.
EventLogger: 128-bit symmetric AES encryption, OpenSSL RSA 4096-bit encryption.
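The client-side key life-cycle (use once, discard, refill the pool while idle) can be sketched as below. This is an illustration under assumptions: the class name `OneTimeKeyPool` is hypothetical, and `secrets.token_bytes` merely stands in for real RSA key-pair generation, which the slides attribute to OpenSSL.

```python
import secrets
from collections import deque

class OneTimeKeyPool:
    """Pool of client keys; each key is handed out once and never reused."""

    def __init__(self, size=4, keygen=None):
        self._size = size
        # Stand-in generator: random bytes instead of a real key pair.
        self._keygen = keygen or (lambda: secrets.token_bytes(16))
        self._pool = deque()
        self.refill()

    def __len__(self):
        return len(self._pool)

    def refill(self):
        """Top the pool up; meant to be called while the browser is idle."""
        while len(self._pool) < self._size:
            self._pool.append(self._keygen())

    def take(self):
        """Remove and return one key; it is gone from the pool for good."""
        if not self._pool:
            self.refill()  # fallback if the idle-time refill has not run yet
        return self._pool.popleft()

pool = OneTimeKeyPool(size=4)
first, second = pool.take(), pool.take()
left_after_use = len(pool)
pool.refill()
```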
SLIDE 34 SecureChannel : How do we encrypt ? (Extension)
encryptedRequest(iv:encryptedMsg:encryptedKey)
iv : Initialization Vector
msg = (originalRequest + ExtensionPublicKey)
key = md5(msg)
encryptedMsg = AES.encrypt(msg, key, {mode: CBC, padding: PKCS7, iv: iv})
encryptedKey = sign(EventLoggerPublicKey, key)
Each request to be encrypted has the following components:
- Message / Request to encrypt : Query or Data
- ExtensionPublicKey : Chosen from a pool of public keys for that user on
the machine (the key is used only once and then discarded).
- Initialisation Vector : Derived from a word array of 16 bytes (AES-CBC uses a 128-bit IV).
- EventLoggerPublicKey : Our public key, shared with the extension.
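The key derivation step, `key = md5(msg)` with `msg = originalRequest + ExtensionPublicKey`, explains why the AES key changes even when the same query is typed twice: the one-time public key is part of the hashed message. A small sketch of just that step follows; the AES and RSA steps are left out, and the byte strings used as public keys are placeholders, not real keys.

```python
import hashlib

def derive_key(original_request: bytes, extension_public_key: bytes) -> bytes:
    """Derive the per-request AES key as md5(originalRequest + ExtensionPublicKey)."""
    msg = original_request + extension_public_key
    return hashlib.md5(msg).digest()  # 16 bytes, i.e. a 128-bit key

query = b"apache big data conf"
# Placeholder one-time public keys (assumed values for illustration).
key_a = derive_key(query, b"one-time-public-key-A")
key_b = derive_key(query, b"one-time-public-key-B")
```

Because every request carries a fresh ExtensionPublicKey, `key_a` and `key_b` differ although the query is the same.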
SLIDE 35 SecureChannel : Routing ? (Extension)
- The extension maintains a list of proxies that are healthy at that point in
time.
- When sending a request / message, the extension picks the end-point in
round-robin fashion (for now).
- To avoid the risk of malicious proxies interfering with a message, we
scramble and split each message into a random ‘n’ parts just before sending it from the extension.
- The value of ‘n’ is determined by the extension; we expect ‘n’ to be 1, 2, 4 or 8
for the time being. The value of ‘n’ is not known to the proxies, so a proxy cannot tell whether it has all the parts.
- The only way to tamper with a message is to have all the parts needed to decrypt it, but
since messages are scrambled, split and sent through different proxies, they are safe from the proxies.
- The Event Logger waits for all the parts of a message; only by combining them at our
(secure) Event Logger can the message be decrypted.
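The routing steps on this slide can be sketched as follows: pick n from 1, 2, 4 or 8, cut the message into n parts, shuffle them, and hand each part to the next proxy in round-robin order. The function names, the proxy names and the equal-size chunking are assumptions made for illustration; the slides do not say how the parts are cut or scrambled.

```python
import itertools
import random

def split_and_scramble(message: bytes, n: int, rng: random.Random) -> list:
    """Cut the message into n roughly equal parts and shuffle their order."""
    size = -(-len(message) // n)  # ceiling division
    parts = [message[i * size:(i + 1) * size] for i in range(n)]
    rng.shuffle(parts)
    return parts

def route(parts, proxy_cycle):
    """Pair each part with the next proxy in round-robin order."""
    return [(next(proxy_cycle), part) for part in parts]

rng = random.Random()
# Hypothetical list of currently healthy proxies; the cycle persists across messages.
proxies = itertools.cycle(["proxy-a", "proxy-b", "proxy-c"])
message = bytes(range(32))        # stand-in for an already encrypted request
n = rng.choice([1, 2, 4, 8])      # chosen by the extension, never told to the proxies
routed = route(split_and_scramble(message, n, rng), proxies)
```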
SLIDE 36 SecureChannel : How do we decrypt ? (Server)
EncryptedRequest = iv:encryptedMsg:encryptedKey
key = unlock(EventLoggerPrivateKey, encryptedKey)
msg = AES.decrypt(encryptedMsg, key, {mode: CBC, padding: PKCS7, iv: iv})
request = msg.data
ExtensionPublicKey = msg.pk (we need it to sign the response)
Important:
- Because the server receives messages in parts, to get the key and message we rely on
combinations.
- The message itself is scrambled, so even if it is decrypted we need to stitch it together by trying
different combinations.
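Stitching by trying combinations can be sketched as a search over orderings of the received parts, stopping at the first ordering that passes a validity check. In the real system that check would presumably be a successful decryption; here a truncated SHA-256 tag appended to the message is used as a hypothetical stand-in, and all names are illustrative.

```python
import hashlib
import itertools

CHECK_LEN = 8  # length in bytes of the hypothetical integrity tag

def frame(body: bytes) -> bytes:
    """Append a truncated SHA-256 tag so a correct ordering can be recognised."""
    return body + hashlib.sha256(body).digest()[:CHECK_LEN]

def is_valid(candidate: bytes) -> bool:
    body, tag = candidate[:-CHECK_LEN], candidate[-CHECK_LEN:]
    return hashlib.sha256(body).digest()[:CHECK_LEN] == tag

def reassemble(parts):
    """Try every ordering of the parts; return the body of the first valid one."""
    for order in itertools.permutations(parts):
        candidate = b"".join(order)
        if is_valid(candidate):
            return candidate[:-CHECK_LEN]
    return None

original = b"iv:encryptedMsg:encryptedKey"   # placeholder for a real encrypted request
framed = frame(original)
parts = [framed[i:i + 9] for i in range(0, len(framed), 9)]
received = parts[::-1]                        # parts arrive out of order
recovered = reassemble(received)
```

With at most 8 parts the search covers at most 8! = 40320 orderings, which keeps the brute-force approach practical.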
SLIDE 37
All talk and no play makes Jack a dull boy!
Demo
SLIDE 38 Thank You
http://www.cliqz.com/en
We believe it’s possible, we are actually doing it
photo: projectsecretidentity.org