Support Vector Machine (streamlined)
Michael Biehl
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen
www.cs.rug.nl/biehl
extended version: Biehl-Part1.pdf
IAC Winter School, November 2018, La Laguna
Solving the perceptron storage problem
the storage problem revisited

re-write the problem: consider a given data set D = {ξ^µ, S^µ_R} and find a vector w with
S^µ_H = sign(w · ξ^µ) = S^µ_R  for all µ.

Note:  sign(w · ξ^µ) = S^µ_R   ⇔   sign(w · ξ^µ S^µ_R) = 1   ⇔   E^µ ≡ w · ξ^µ S^µ_R > 0
(local potentials E^µ)

equivalent problem: solve a set of linear inequalities (in w):
find a vector w with  E^µ = w · ξ^µ S^µ_R ≥ c > 0  for all µ.

Note that the actual value of c is irrelevant: if w satisfies E^µ ≥ c > 0, then w/c satisfies E^µ ≥ 1.
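Not on the original slides: a minimal NumPy check of the storage condition E^µ = S^µ_R (w · ξ^µ) > 0 on made-up data; all names and sizes are illustrative.

```python
import numpy as np

# Check whether a candidate weight vector w "stores" all examples,
# i.e. E^mu = S^mu * (w . xi^mu) > 0 for all mu.
rng = np.random.default_rng(0)
N, P = 10, 20
xi = rng.normal(size=(P, N))          # P examples xi^mu in R^N
S = np.sign(rng.normal(size=P))       # target labels S^mu_R = +/-1
w = rng.normal(size=N)                # some candidate weight vector

E = S * (xi @ w)                      # local potentials E^mu
print("all examples stored:", np.all(E > 0))
```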
solving equations?
(A) Instead of inequalities, try to solve P equations for N unknowns:

E^µ = Σ_{j=1}^{N} w_j ξ^µ_j S^µ = 1    for all µ = 1, 2, …, P

If no solution exists, find an approximation by least square deviations:

minimize   f = (1/2) Σ_{µ=1}^{P} (1 − E^µ)²

minimization, e.g. by means of gradient descent, with

∇_w f = − Σ_{µ=1}^{P} (1 − E^µ) ξ^µ S^µ
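A sketch of approach (A) as batch gradient descent on f (my illustration, not the lecture's code); eta and the number of steps are arbitrary choices.

```python
import numpy as np

# Batch gradient descent on f(w) = 1/2 * sum_mu (1 - E^mu)^2,
# with E^mu = S^mu * (w . xi^mu).
def least_squares_perceptron(xi, S, eta=0.01, steps=500):
    P, N = xi.shape
    w = np.zeros(N)
    for _ in range(steps):
        E = S * (xi @ w)                     # local potentials
        grad = -((1.0 - E) * S) @ xi         # gradient of f w.r.t. w
        w -= eta * grad                      # descent step
    return w
```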
solving equations?
(B) If the system is under-determined → find a unique solution:

minimize   (1/2) |w|²    under the constraints   { E^µ = 1 }_{µ=1}^{P}

Lagrange function:

L = (1/2) |w|² + Σ_{µ=1}^{P} λ^µ (1 − E^µ)

necessary conditions for an optimum:

∂L/∂λ^µ = (1 − E^µ) = 0,      ∇_w L = w − Σ_{µ=1}^{P} λ^µ ξ^µ S^µ = 0
⇒   w = Σ_{µ=1}^{P} λ^µ ξ^µ S^µ

Lagrange parameters ~ embedding strengths λ^µ (rescaled with N); the solution is a linear combination of the data.
solving equations?

eliminate the weights — simplified problem:

max_λ   L = −(1/2) Σ_{µ,ν} λ^ν C^{νµ} λ^µ + Σ_µ λ^µ

with   E^ν = Σ_{µ=1}^{P} C^{νµ} λ^µ,     C^{νµ} ≡ (1/N) Σ_{k=1}^{N} (ξ^µ_k S^µ)(ξ^ν_k S^ν)

∂L/∂λ^ρ = 1 − Σ_µ C^{ρµ} λ^µ = (1 − E^ρ)

gradient ascent with   Δw ∝ Σ_ρ (1 − E^ρ) ξ^ρ S^ρ   — in terms of weights: the same as in (A)!

note:   Σ_{j=1}^{N} w_j² ∝ Σ_{µ,ν} λ^ν C^{νµ} λ^µ
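A sketch of approach (B) expressed in the embedding strengths (my illustration, not from the slides): build C and solve C λ = 1 directly, assuming C is invertible; variable names and the 1/N rescaling follow the conventions above.

```python
import numpy as np

# Build C^{nu mu} = (1/N) sum_k (xi^mu_k S^mu)(xi^nu_k S^nu), solve C lambda = 1
# (all E^mu = 1), then reconstruct w = (1/N) sum_mu lambda^mu xi^mu S^mu.
def embedding_solution(xi, S):
    P, N = xi.shape
    Z = xi * S[:, None]                   # rows: xi^mu S^mu
    C = (Z @ Z.T) / N                     # correlation matrix C
    lam = np.linalg.solve(C, np.ones(P))  # C lambda = 1 (assumes C invertible)
    w = (lam @ Z) / N                     # linear combination of the data
    return w, lam
```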
solving equations?

rename the Lagrange parameters λ^µ → x^µ and re-write the problem (in terms of weights: the same as in (A)!):

max_x   L = −(1/2) Σ_{µ,ν} x^ν C^{νµ} x^µ + Σ_µ x^µ,      ∂L/∂x^ρ = 1 − Σ_µ C^{ρµ} x^µ = (1 − E^ρ)

gradient ascent with   Δw ∝ Σ_ρ (1 − E^ρ) ξ^ρ S^ρ,    and    Σ_{j=1}^{N} w_j² ∝ Σ_{µ,ν} x^ν C^{νµ} x^µ
classical algorithm: ADALINE

Adaline algorithm: Adaptive Linear Neuron (Widrow and Hoff, 1960)
- gradient based learning for linear regression (MSE)
- frequent strategy: regression as a proxy for classification
- more general: training of a linear unit with continuous output

minimize   f = (1/2) Σ_{µ=1}^{P} (h^µ − E^µ)²    with  h^µ ∈ ℝ,  µ = 1, 2, …, P
         = (1/2) Σ_{µ=1}^{P} (y^µ − w⊤ξ^µ)²     with  y^µ = h^µ S^µ

iteration of weights / embedding strengths over a sequence µ(t) of examples:

w(t) = w(t−1) + η ( 1 − E^{µ(t)} ) ξ^{µ(t)} S^{µ(t)},      i.e.     x^{µ(t)}(t) = x^{µ(t)}(t−1) + η ( 1 − E^{µ(t)} )
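A sketch of the sequential Adaline (LMS) updates shown above for the h^µ = 1 case; not the original code — the function name, eta and epoch count are illustrative, and x^µ is tracked only to show the equivalent embedding-strength update.

```python
import numpy as np

# Sequential Adaline / LMS rule: w <- w + eta * (1 - E^mu) xi^mu S^mu.
def adaline(xi, S, eta=0.05, epochs=20):
    P, N = xi.shape
    w = np.zeros(N)
    x = np.zeros(P)                           # embedding strengths (bookkeeping)
    for _ in range(epochs):
        for mu in np.random.permutation(P):   # a random sequence mu(t)
            E = S[mu] * (xi[mu] @ w)          # local potential of example mu(t)
            w += eta * (1.0 - E) * xi[mu] * S[mu]
            x[mu] += eta * (1.0 - E)          # equivalent update of x^mu
    return w, x
```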
hardware realization: “Science in Action”, ca. 1960
youtube video “Science in Action” with Bernard Widrow: http://www.youtube.com/watch?v=IEFRtz68m-8
Introduction:
- supervised learning, classification, regression
- machine learning “vs.” statistical modeling

Early (important!) approaches:
- linear threshold classifier, Rosenblatt’s Perceptron
- adaptive linear neuron, Widrow and Hoff’s Adaline

From Perceptron to Support Vector Machine:
- large margin classification
- beyond linear separability

Distance-based systems:
- prototypes: K-means and Vector Quantization
- from K-Nearest-Neighbors to Learning Vector Quantization
- adaptive distance measures and relevance learning
Optimal stability by quadratic optimization

minimize   (1/2) |w|²    subject to the inequality constraints   { E^µ = w⊤ξ^µ S^µ_R ≥ 1 }_{µ=1}^{P}

Note: the solution w_max of this problem yields the stability  κ_max = 1 / |w_max|

Notation: correlation matrix C (outputs incorporated) with elements  C^{µν} = (1/N) Σ_k (ξ^µ_k S^µ)(ξ^ν_k S^ν);
the P-vector x of embedding strengths; the inequalities written with the “one-vector” 1 = (1, …, 1)⊤
(C is positive semi-definite).

We can formulate optimal stability completely in terms of embedding strengths:
minimize a quadratic form in x subject to linear constraints,   minimize_x (1/2) x⊤C x  subject to  C x ≥ 1.
This is a special case of a standard problem in Quadratic Programming: minimize a nonlinear function under linear inequality constraints.
Optimization theory: Kuhn–Tucker theorem
see, e.g., R. Fletcher, Practical Methods of Optimization (Wiley, 1987);
http://wikipedia.org, “Karush–Kuhn–Tucker conditions”, for a quick start

necessary conditions for a local solution of a general non-linear optimization problem with equality and inequality constraints

Max. stability: the Kuhn–Tucker theorem for a special non-linear optimization problem

minimize_x   (1/2) x⊤C x    subject to   C x ≥ 1

Lagrange function:   L(x, Λ) = (1/2) x⊤C x − Λ⊤ (C x − 1)    (with a vector Λ of multipliers)

Any solution can be represented by a Kuhn–Tucker (KT) point with
- non-negative embedding strengths (← minover)
- linear separability
- complementarity

straightforward to show, this implies also:
→ all KT points yield the same unique perceptron weight vector
→ any local solution is globally optimal
Duality, theory of Lagrange multipliers → equivalent formulation (Wolfe dual):

maximize_x   f̃(x) = −(1/2) x⊤C x + x⊤1    subject to   x ≥ 0

(this non-negativity constraint is absent in the Adaline problem)
AdaTron algorithm (Adaptive PercepTron) [Anlauf and Biehl, 1989]:
– sequential presentation of examples D = { ξ^µ, S^µ }
– gradient ascent with respect to f̃, projected onto x ≥ 0:

x^µ → max { 0, x^µ + η ( 1 − [C x]^µ ) }      with  0 < η < 2,

where  η ( 1 − [C x]^µ ) = η [∇_x f̃ ]^µ  is the gradient step.
for the proof of convergence of the AdaTron one can show:
- for an arbitrary x ≥ 0 and a KT point x*:  f̃(x*) ≥ f̃(x)
- f̃(x) is bounded from above in the region x ≥ 0
- f̃(x) increases in every cycle through D, unless a KT point has been reached
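Not part of the original slides: a compact NumPy sketch of the AdaTron update above; the stopping tolerance, eta and epoch count are arbitrary illustrative choices.

```python
import numpy as np

# Projected gradient ascent on the Wolfe dual:
# x^mu -> max{0, x^mu + eta*(1 - [Cx]^mu)}, with 0 < eta < 2.
def adatron(xi, S, eta=1.0, epochs=200, tol=1e-8):
    P, N = xi.shape
    Z = xi * S[:, None]
    C = (Z @ Z.T) / N                    # correlation matrix
    x = np.zeros(P)                      # embedding strengths
    for _ in range(epochs):
        max_change = 0.0
        for mu in range(P):              # sequential presentation
            dx = eta * (1.0 - C[mu] @ x)
            new = max(0.0, x[mu] + dx)   # projection onto x >= 0
            max_change = max(max_change, abs(new - x[mu]))
            x[mu] = new
        if max_change < tol:             # numerically at a KT point
            break
    w = (x @ Z) / N                      # perceptron of maximal stability
    return w, x
```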
Support Vectors
complementarity condition:   x^µ (1 − E^µ) = 0   for all µ,   i.e. either

  E^µ = 1,  x^µ ≥ 0   — support vectors: have to be embedded explicitly
  E^µ > 1,  x^µ = 0   — all other examples: are stabilized “automatically”

the weights  w ∝ Σ_µ x^µ ξ^µ S^µ  depend (explicitly) only on a subset of D;
if these support vectors were known in advance, training could be restricted to that subset (unfortunately, they are not...)
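An illustrative check of the complementarity condition, continuing the adatron() sketch above on made-up, linearly separable data; thresholds and sizes are arbitrary.

```python
import numpy as np

# Labels from a random "teacher" vector -> linearly separable data.
rng = np.random.default_rng(1)
N, P = 20, 30
teacher = rng.normal(size=N)
xi = rng.normal(size=(P, N))
S = np.sign(xi @ teacher)

w, x = adatron(xi, S, epochs=2000)       # adatron() from the sketch above
E = S * (xi @ w)
support = x > 1e-8                       # embedded examples
print("support vectors:", support.sum(), "of", P)
print("E^mu on support vectors (~1 at a KT point):", np.round(E[support], 3))
if (~support).any():
    print("min E^mu on the remaining examples (> 1):", np.round(E[~support].min(), 3))
```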
learning in version space?

learning in version space (including max. stability) is only possible if
- the data set is linearly separable

... and even then, it only makes sense if
- the unknown rule is a linearly separable function
- the data set is reliable (noise-free)

[illustration: linearly separable data; a non-linear class boundary; noisy data (?)]
Classification beyond linear separability

assume D is not linearly separable — what can we do?
(potential reasons: noisy data, a more complex problem)

- accept an approximation by a linearly separable function
  → see “pocket algorithm” and large margin with errors
- construct more complex architectures from perceptron-like units
  → see “committee and parity machine”
large margins with errors

admit disagreements w.r.t. the training data, but keep the basic idea of optimal stability:

minimize_{w,β}   (1/2) |w|² + γ Σ_{µ=1}^{P} β^µ     subject to   E^µ ≥ 1 − β^µ   and   β^µ ≥ 0   for all µ

slack variables β^µ:    β^µ = 0 ↔ E^µ ≥ 1,     β^µ > 0 ↔ E^µ < 1    (includes errors with E^µ < 0)
rewritten in terms of embedding strengths (see above for the notation):

minimize_{x,β}   (1/2) x⊤C x + γ Σ_µ β^µ     subject to   C x ≥ 1 − β   and   β ≥ 0

dual problem (elimination of the slack variables!):

maximize_x   −(1/2) x⊤C x + 1⊤x     subject to   0 ≤ x^µ ≤ γ   for all µ

positive and upper-bounded embedding strengths; the parameter γ
- limits the growth of x^µ for misclassified data points
- controls a compromise between the aims of large margin and low error
- has to be chosen appropriately, e.g. by validation methods (later chapter)

note: even for linearly separable data the optimum can include misclassifications!
- it does not (in general) minimize the number of errors
example algorithm: AdaTron with errors (projected gradient ascent)

x̃^µ ← x^µ + η ( 1 − [C x]^µ )        gradient step
x̂^µ ← max { 0, x̃^µ }                 enforce non-negative embeddings
x^µ ← min { γ, x̂^µ }                 limit embedding strengths to x^µ ≤ γ
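A sketch of this clipped update (again with made-up variable names; gamma, eta and the epoch count are illustrative):

```python
import numpy as np

# AdaTron step followed by clipping to the box 0 <= x^mu <= gamma.
def adatron_soft(xi, S, gamma=1.0, eta=1.0, epochs=200):
    P, N = xi.shape
    Z = xi * S[:, None]
    C = (Z @ Z.T) / N
    x = np.zeros(P)
    for _ in range(epochs):
        for mu in range(P):
            x_tilde = x[mu] + eta * (1.0 - C[mu] @ x)   # gradient step
            x[mu] = min(gamma, max(0.0, x_tilde))       # clip to [0, gamma]
    w = (x @ Z) / N
    return w, x
```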
Classification beyond linear separability

assume D is not linearly separable — what can we do? (potential reasons: noisy data, a more complex problem)

- accept an approximation by a linearly separable function
  → see “pocket algorithm” and large margin with errors
- construct more complex architectures from perceptron-like units,
  e.g. multilayer networks (universal classifiers, difficult training)
  → see “committee and parity machine”
- consider ensembles of perceptrons: train several student perceptrons with the competing aims that
  - each student should make a small number of errors
  - the perceptrons should differ significantly
  and combine them into an ensemble classifier, e.g. by majority vote
  see also: Decision Trees and Forests (lectures by Dalya Baron)
- employ a linear decision boundary, but after a non-linear transformation of the data
  to an M-dim. feature space (M = N is possible, but not required), with an M-dim. weight vector
  and a non-linear transformation Ψ; for a given, explicit transformation Ψ, perceptron training
  can be applied → Support Vector Machines
- most frequent approach: approximate classification by continuous regression
The Support Vector Machine

- Perceptron of optimal stability: support vectors
- SVM: non-linear transformation to a high-dimensional feature space
- implicit kernel formulation, Mercer’s theorem

history: www.svms.org
basic idea: assume D is not linearly separable — what can we do?
- accept an approximation by a linearly separable function (limited flexibility and usefulness)
- construct more complex architectures from perceptron units, e.g. multilayer networks (universal approximators, difficult training)
- generate a non-linear decision surface for the original data:
  employ a linear decision boundary, but after a non-linear transformation of the data

S^µ_H = sign [ W · Ψ(ξ^µ) ],      ξ ∈ ℝ^N → Ψ(ξ) ∈ ℝ^M,   with weights W ∈ ℝ^M
(in general M ≠ N, mostly M > N)
SVM: transformation with M > N to a high-dimensional feature space

An illustrative example (c/o R. Dietrich, PhD thesis): consider original, two-dimensional data (x1, x2) and the non-linearly transformed data

Ψ(x1, x2) = ( x1², √2 x1 x2, x2² ) ∈ ℝ³

A linearly separable classification in ℝ³,   S^µ = sign ( W · Ψ(x1, x2) )   with   W = (1, 1, −1),
corresponds to a non-separable classification in ℝ² — the problem becomes linearly separable in ℝ³.
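To make the example concrete (my illustration, not from the slides): the explicit map Ψ and the fact that its dot products reduce to the quadratic kernel (a·b)².

```python
import numpy as np

# Quadratic feature map Psi(x1,x2) = (x1^2, sqrt(2) x1 x2, x2^2)
# and the identity Psi(a) . Psi(b) = (a . b)^2.
def psi(x):
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

a, b = np.array([0.3, -1.2]), np.array([0.7, 0.5])
print(np.dot(psi(a), psi(b)))       # feature-space dot product
print(np.dot(a, b) ** 2)            # same value via the kernel (a.b)^2
```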
basic idea: assume the transformation Ψ guarantees linear separability of { Ψ(ξ^µ), S^µ }
→ a vector W exists with   S^µ_H = sign ( W · Ψ(ξ^µ) )   for all µ.

Optimal stability:   maximize_W  κ(W)    where   κ(W) = min_µ { κ^µ = W · Ψ(ξ^µ) S^µ / |W| }

Exact same structure as the original perceptron problem — all of the above results from optimization theory apply accordingly.

re-formulate:   minimize_X  (1/2) X⊤Γ X    subject to   Γ X ≥ 1

here:   W = (1/M) Σ_{µ=1}^{P} X^µ Ψ(ξ^µ) S^µ,      Γ^{µν} = (1/M) S^µ Ψ(ξ^µ) · Ψ(ξ^ν) S^ν,      |W|² = (1/M) X⊤Γ X
Kernel formulation

consider the function K : ℝ^N × ℝ^N → ℝ   with   K(ξ^µ, ξ^ν) = (1/M) Ψ(ξ^µ) · Ψ(ξ^ν)

re-write the classification scheme in terms of this kernel function:

S_H(ξ) = sign ( W · Ψ(ξ) ) = sign ( Σ_{µ=1}^{P} X^µ S^µ Ψ(ξ^µ) · Ψ(ξ) ) = sign ( Σ_{µ=1}^{P} X^µ S^µ K(ξ^µ, ξ) )

training algorithms for the embedding strengths — just one example:

Kernel AdaTron:    X^µ → max { 0, X^µ + η ( 1 − S^µ Σ_{ν=1}^{P} S^ν X^ν K(ξ^µ, ξ^ν) ) }

– no explicit use of the transformed feature vectors Ψ(ξ)
– only dot products are required, and these can be expressed in terms of the kernel
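A sketch of the Kernel AdaTron update above; the quadratic kernel default, eta and the epoch count are illustrative choices, and the Gram-matrix construction is naive (O(P²) kernel evaluations).

```python
import numpy as np

# Kernel AdaTron: projected gradient ascent using only kernel evaluations.
def kernel_adatron(xi, S, K=lambda a, b: np.dot(a, b) ** 2, eta=0.5, epochs=200):
    P = len(S)
    G = np.array([[K(xi[m], xi[n]) for n in range(P)] for m in range(P)])  # Gram matrix
    X = np.zeros(P)                      # embedding strengths
    for _ in range(epochs):
        for mu in range(P):
            grad = 1.0 - S[mu] * np.sum(S * X * G[mu])
            X[mu] = max(0.0, X[mu] + eta * grad)

    # classify a new point: sign( sum_mu X^mu S^mu K(xi^mu, xi_new) )
    def classify(x_new):
        k = np.array([K(xi[m], x_new) for m in range(P)])
        return np.sign(np.sum(X * S * k))
    return X, classify
```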
so far: define a non-linear Ψ(ξ) ∈ ℝ^M, find the corresponding kernel function K(ξ^µ, ξ^ν)

now: as we will never use Ψ(ξ) explicitly, why not start by defining a kernel function in the first place? For practical purposes, we need not know Ψ nor its dimension M.

Question: does a given kernel K correspond to some valid transformation Ψ?

Mercer’s Theorem (sufficient condition): a given kernel function K can be written as K(ξ^µ, ξ^ν) = Ψ(ξ^µ) · Ψ(ξ^ν), if

∫∫ g(ξ^µ) K(ξ^µ, ξ^ν) g(ξ^ν) d^N ξ^µ d^N ξ^ν ≥ 0

holds true for all functions g with finite norm  ∫ g(ξ)² d^N ξ < ∞.
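As a finite-sample illustration of Mercer's condition (not from the slides): for a valid kernel, the Gram matrix on any data set is positive semi-definite; the RBF kernel and its width here are arbitrary choices.

```python
import numpy as np

# For a Mercer kernel, the Gram matrix has non-negative eigenvalues.
rng = np.random.default_rng(2)
data = rng.normal(size=(15, 4))

def rbf(a, b, width=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * width**2))

G = np.array([[rbf(a, b) for b in data] for a in data])
print("smallest eigenvalue:", np.linalg.eigvalsh(G).min())   # >= 0 up to round-off
```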
popular classes of kernels (which satisfy Mercer’s condition):

- polynomial kernels of degree (up to) q, e.g.
  - linear kernel: = perceptron with threshold in the original space
  - quadratic kernel: → perceptron with respect to feature vectors containing all single features and all products of 2 original features
- Radial basis function (RBF) kernel: involves all powers of the features, “M → ∞”

attractive aspects of the SVM approach:
- the optimization problem is uniquely solvable (no local minima)
- efficient training algorithms are known
- maximum stability facilitates good generalization ability
… if the kernel (and its parameters) is appropriately chosen

in practice:
- select simple kernels, allow for violations of some of the linear constraints by means of slack variables (e.g. the kernel version of the AdaTron with errors, see above)
- choose the kernel (and kernel parameters) by means of cross-validation procedures
- use approximate schemes for huge amounts of data (many support vectors)

(“kernelized” maximum stability algorithms) — so much for the “curse of dimensionality” ☺
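Minimal definitions of the kernel classes listed above — one common parameterization, not necessarily the exact forms used on the slides; the constants c, q and the width are illustrative.

```python
import numpy as np

def linear_kernel(a, b):
    return np.dot(a, b)

def polynomial_kernel(a, b, c=1.0, q=2):
    return (np.dot(a, b) + c) ** q          # degree up to q

def rbf_kernel(a, b, width=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * width**2))

# any of these can be passed to the kernel_adatron() sketch above, e.g.:
# X, classify = kernel_adatron(xi, S, K=rbf_kernel)
```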