Auditing Machine Learning Models for Individual Bias and Unfairness - PowerPoint PPT Presentation

Auditing Machine Learning Models for Individual Bias and Unfairness
Songkai Xue, Department of Statistics, University of Michigan
Joint work with Mikhail Yurochkin and Yuekai Sun

Introduction
High-stakes decision making involves
• Recidivism prediction (Angwin et al., 2016);
• Housing advertisement (Angwin, Tobin and Varner, 2017);
• Resume screening (Jeffrey, 2018).
Who makes the decision?
• Human $\overset{?}{=}$ Bias
• Machine $\overset{?}{=}$ No Bias

Northpointe’s COMPAS Dataset
COMPAS: Correctional Offender Management Profiling for Alternative Sanctions.
Disparate impact on
• Minorities;
• Underprivileged groups.
Protected/sensitive attributes include
• Race (black, white, ...);
• Gender (female, male, ...).
These attributes are protected by federal anti-discrimination law.

Northpointe’s COMPAS Dataset (Cont.)
Prediction fails differently for black defendants.
                                             White    Black
Labeled higher risk, but didn’t re-offend    23.5%    44.9%
Labeled lower risk, but did re-offend        47.7%    28.0%
(Source: Machine Bias, by ProPublica.)

Algorithmic Fairness
Formal definitions of algorithmic fairness? YES.
• Dwork et al. (2012);
• Kleinberg, Mullainathan and Raghavan (2017);
• Chouldechova (2017);
• ...
Individual fairness + (statistically) inferential tools? Lacking. (This is what we wish to do.)

Group Fairness
Group fairness is amenable to statistical analysis, ...
• Calibration: equal false discovery and non-discovery rates.
• Equalized odds: equal false positive and negative rates.
... but fails under scrutiny.
• ML models that satisfy group fairness may be blatantly unfair for individual users (Dwork et al., 2012).
• There are fundamental incompatibilities between common notions of group fairness (Kleinberg et al., 2017; Chouldechova, 2017).
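
As a rough illustration of the equalized-odds criterion mentioned on this slide, the sketch below computes false positive and false negative rates per group on toy data. The function name `error_rates_by_group` and the data are made up for this example; they are not from the talk.

```python
import numpy as np

def error_rates_by_group(y_true, y_pred, group):
    """Return {group: (false positive rate, false negative rate)}."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        in_group = group == g
        negatives = in_group & (y_true == 0)
        positives = in_group & (y_true == 1)
        # Rates are undefined (NaN) if a group has no negatives/positives.
        fpr = np.mean(y_pred[negatives] == 1) if negatives.any() else np.nan
        fnr = np.mean(y_pred[positives] == 0) if positives.any() else np.nan
        rates[g.item()] = (float(fpr), float(fnr))
    return rates

# Toy data (hypothetical): group "a" gets worse error rates than group "b".
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = error_rates_by_group(y_true, y_pred, group)
```

Equalized odds would ask for the two tuples in `rates` to match across groups; here they do not.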

Individual Fairness
Main idea: “Treat similar users similarly”.
Definition (Individual fairness, Dwork et al., 2012)
An ML model $h : \mathcal{X} \to \mathcal{Y}$ is individually fair if there exists $L > 0$ such that
$d_y(h(x_1), h(x_2)) \le L \, d_x(x_1, x_2)$ for any $x_1, x_2 \in \mathcal{X}$,
where $d_x : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ (resp. $d_y : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$) measures similarity between users (resp. outputs).
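
To make the Lipschitz condition concrete, here is a minimal sketch that computes the largest ratio d_y(h(x1), h(x2)) / d_x(x1, x2) over all pairs in a finite sample. The helper name `max_lipschitz_ratio`, the toy model and the metrics are assumptions for illustration. A finite sample can only give a lower bound on the smallest valid L, so this is a diagnostic, not a test of individual fairness.

```python
import numpy as np

def max_lipschitz_ratio(h, X, d_x, d_y):
    """Largest d_y(h(x1), h(x2)) / d_x(x1, x2) over distinct pairs in X."""
    worst = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dx = d_x(X[i], X[j])
            dy = d_y(h(X[i]), h(X[j]))
            if dx == 0:
                # Identical users (under d_x) treated differently: no finite L works.
                if dy > 0:
                    return np.inf
                continue
            worst = max(worst, dy / dx)
    return worst

# Toy example (hypothetical): a linear score with slope 2 and absolute-difference metrics.
h = lambda x: 2.0 * x
abs_diff = lambda a, b: abs(a - b)
ratio = max_lipschitz_ratio(h, [0.0, 0.5, 2.0], abs_diff, abs_diff)
```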

What’s in the Pipeline?
1. Training individually fair ML models: Yurochkin, Bower, Sun, ICLR 2020.
2. Testing whether an ML model is individually fair or not: Xue, Yurochkin, Sun, AISTATS 2020.

Benefits of Our Methods
Main benefits are
1. Black-box: Observing the outputs of ML models is sufficient.
2. Computational efficiency: The auditor solves a convex optimization problem.
3. Interpretability: Specific metric leads to specific interpretation.

Mathematical Preliminaries
• The sample space: $\mathcal{Z} \triangleq \mathcal{X} \times \mathcal{Y}$.
• The induced metric on $\mathcal{Z}$: $d_z((x_1, y_1), (x_2, y_2)) \triangleq d_x(x_1, x_2) + \infty \cdot \mathbf{1}\{y_1 \neq y_2\}$.
• The Wasserstein distance on $\Delta(\mathcal{Z})$: $W(P, Q) = \inf_{\Pi \in \mathcal{C}(P, Q)} \int_{\mathcal{Z} \times \mathcal{Z}} c(z_1, z_2) \, d\Pi(z_1, z_2)$,
where
• $\Delta(\mathcal{Z})$ is the set of probability distributions on $\mathcal{Z}$;
• $\mathcal{C}(P, Q)$ is the set of couplings between $P$ and $Q$;
• $c(\cdot, \cdot) = d_z^2(\cdot, \cdot)$ is the transportation cost function.
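
On a finite sample space the transport problem above is a linear program. The sketch below solves it with `scipy.optimize.linprog`, treating an infinite cost (different labels) as a forbidden move by fixing the corresponding coupling entries to zero. The function name `transport_cost`, the four-point sample space and the choice d_x(x1, x2) = |x1 - x2| are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def transport_cost(p, q, cost):
    """Optimal transport cost between distributions p and q on K points.

    cost[i, j] may be np.inf, meaning mass may not move from i to j.
    """
    p, q, cost = np.asarray(p, float), np.asarray(q, float), np.asarray(cost, float)
    K = len(p)
    finite = np.isfinite(cost).ravel()
    c = np.where(finite, cost.ravel(), 0.0)
    # Flattened coupling: Pi[i, j] sits at index i * K + j.
    A_eq = np.zeros((2 * K, K * K))
    for i in range(K):
        A_eq[i, i * K:(i + 1) * K] = 1.0   # row sum i equals p[i]
        A_eq[K + i, i::K] = 1.0            # column sum i equals q[i]
    b_eq = np.concatenate([p, q])
    # Entries with infinite cost are pinned to zero instead of priced.
    bounds = [(0, None) if ok else (0, 0) for ok in finite]
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if res.status == 2:        # infeasible: no coupling avoids the forbidden moves
        return np.inf
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun

# Sample space of (x, y) pairs; cost is d_z^2 with d_x(x1, x2) = |x1 - x2|.
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
cost = np.array([[(x1 - x2) ** 2 if y1 == y2 else np.inf
                  for (x2, y2) in points] for (x1, y1) in points], dtype=float)
p = [1.0, 0.0, 0.0, 0.0]
w_same_label = transport_cost(p, [0.0, 1.0, 0.0, 0.0], cost)
w_other_label = transport_cost(p, [0.0, 0.0, 1.0, 0.0], cost)
```

Moving all mass to a point with the same label costs the squared feature distance, while moving it to a point with a different label is impossible, which mirrors the infinite term in d_z.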

The Auditor’s Problem
Population version of the auditor’s problem:
$\max_{P \in \Delta(\mathcal{Z})} \; \mathbb{E}_{Z \sim P}[\ell_h(Z)] - \mathbb{E}_{Z \sim P^\star}[\ell_h(Z)]$ subject to $W(P, P^\star) \le \varepsilon$,
where $\varepsilon \ge 0$ is a transportation budget parameter and $\ell_h : \mathcal{Z} \to \mathbb{R}_+$ is a loss function picked by the auditor.
Main idea: If the ML model is entirely free of bias/unfairness, then the auditor cannot increase the risk by moving probability mass to similar areas of the sample space.

The Auditor’s Problem (Cont.)
Empirical version of the auditor’s problem:
$\max_{P \in \Delta(\mathcal{Z})} \; \mathbb{E}_{Z \sim P}[\ell_h(Z)] - \mathbb{E}_{Z \sim P_n}[\ell_h(Z)]$ subject to $W(P, P_n) \le \varepsilon$,
where $P_n$ is the empirical distribution of the collected audit data $\{(x_i, y_i)\}_{i=1}^n$, since $P^\star$ is unknown in practice.
FaiTH statistic: We call the optimal value of this optimization problem the Fair Transport Hypothesis (FaiTH) test statistic.

The Auditor’s Problem (Cont.)
Original problem:
$\max_{P : W(P, P_n) \le \varepsilon} \mathbb{E}_{Z \sim P}[\ell_h(Z)]$.
Dual problem (Blanchet and Murthy, 2019):
$\max_{P : W(P, P_n) \le \varepsilon} \mathbb{E}_{Z \sim P}[\ell_h(Z)] = \min_{\lambda \ge 0} \{\lambda \varepsilon + \mathbb{E}_{Z \sim P_n}[\ell^c_{h,\lambda}(Z)]\}$,
$\ell^c_{h,\lambda}(x_i, y_i) = \max_{x \in \mathcal{X}} \{\ell_h(x, y_i) - \lambda \, d_x^2(x, x_i)\}$.
Pros: univariate problem; amenable to stochastic optimization.
Cons: no global convergence guarantee; hard to establish limiting distribution of test statistic.
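
The dual formula can be evaluated directly when the inner maximization runs over a finite candidate set. The sketch below does this for one-dimensional features with d_x(x, x') = |x - x'| and replaces the minimization over λ with a plain grid search. The grid search, the function name `dual_value`, the threshold classifier and the toy data are all assumptions for illustration, not the stochastic optimization scheme the slide alludes to.

```python
import numpy as np

def dual_value(loss, x_grid, x_data, y_data, eps, lambdas):
    """Grid-search approximation of min_lambda { lambda*eps + mean_i c-transform_i }."""
    x_data = np.asarray(x_data, float)
    # Make sure every observed x_i is itself a candidate in the inner maximum.
    x_grid = np.union1d(x_grid, x_data)
    # losses[i, k] = loss of candidate x_grid[k] paired with the label y_i.
    losses = np.array([[loss(x, y) for x in x_grid] for y in y_data], dtype=float)
    sq_dist = (x_data[:, None] - x_grid[None, :]) ** 2
    values = [lam * eps + np.max(losses - lam * sq_dist, axis=1).mean()
              for lam in lambdas]
    return min(values)

# Toy setup (hypothetical): threshold classifier with 0-1 loss.
h = lambda x: int(x > 0.5)
zero_one = lambda x, y: float(h(x) != y)
x_data, y_data = [0.4, 0.9], [0, 1]
x_grid = np.linspace(0.0, 1.0, 11)
lambdas = np.linspace(0.0, 100.0, 2001)
value_with_budget = dual_value(zero_one, x_grid, x_data, y_data, eps=0.02, lambdas=lambdas)
value_no_budget = dual_value(zero_one, x_grid, x_data, y_data, eps=0.0, lambdas=lambdas)
```

In this toy setup both points are classified correctly, so the value is zero without a budget. With ε = 0.02 the budget is just enough to push the first point (mass 1/2, squared distance 0.04 to the nearest misclassified candidate) across the threshold, so the worst-case risk is about 0.5.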

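The Auditor’s Problem (Cont.)
Empirical version of the auditor’s problem on finite sample space:
$\max_{\Pi \in \mathbb{R}_+^{|\mathcal{Z}| \times |\mathcal{Z}|}} \; l^\top (\Pi^\top 1_{|\mathcal{Z}|} - f_{|\mathcal{Z}|})$ subject to $\langle C, \Pi \rangle \le \varepsilon$, $\Pi 1_{|\mathcal{Z}|} = f_{|\mathcal{Z}|}$,
where
• $l \in \mathbb{R}^{|\mathcal{Z}|}$ is the vector of losses;
• $C \in \mathbb{R}^{|\mathcal{Z}| \times |\mathcal{Z}|}$ is the matrix of transportation costs;
• $f_{|\mathcal{Z}|} \in \Delta^{|\mathcal{Z}|}$ is the empirical distribution of the data.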

Asymptotics of the FaiTH Statistic
Let
• $K = |\mathcal{Z}|$, $l \in \mathbb{R}_+^K$ and $\varepsilon \ge 0$;
• $f^\star \in \Delta^K$ and $n f_n \sim \mathrm{Multinomial}(n; f^\star)$;
• $C \in \mathbb{R}_+^{K \times K}$ and $D \in \{0, 1\}^{K \times K}$.
The FaiTH statistic is given by the value function
$\psi(f_n) \triangleq \max_{\Pi \in \mathbb{R}_+^{K \times K}} \; l^\top (\Pi^\top 1_K - f_n)$ subject to $\langle C, \Pi \rangle \le \varepsilon$, $\langle D, \Pi \rangle = 0$, $\Pi 1_K = f_n$.
The audit value is given by $\psi(f^\star)$.
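
Since the FaiTH statistic is the value of a linear program, it can be computed with an off-the-shelf LP solver. The following is a minimal sketch using `scipy.optimize.linprog`; the function name `faith_statistic` and the two-point toy inputs are assumptions, and the constraint ⟨D, Π⟩ = 0 is implemented by pinning Π_ij = 0 wherever D_ij = 1 (equivalent because Π ≥ 0 and D is binary).

```python
import numpy as np
from scipy.optimize import linprog

def faith_statistic(l, C, D, f, eps):
    """Value of max_Pi l^T (Pi^T 1 - f) s.t. <C,Pi> <= eps, <D,Pi> = 0, Pi 1 = f, Pi >= 0."""
    l, f = np.asarray(l, float), np.asarray(f, float)
    C, D = np.asarray(C, float), np.asarray(D)
    K = len(f)
    # Flattened coupling: Pi[i, j] sits at index i * K + j, and its payoff is l[j].
    objective = -np.tile(l, K)                 # linprog minimizes, so negate
    A_ub, b_ub = C.ravel()[None, :], [eps]     # transport budget <C, Pi> <= eps
    A_eq = np.zeros((K, K * K))
    for i in range(K):
        A_eq[i, i * K:(i + 1) * K] = 1.0       # row sum i equals f[i]
    bounds = [(0, 0) if d else (0, None) for d in D.ravel()]
    res = linprog(objective, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=f,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun - l @ f

# Two-point toy space (hypothetical): the model errs on point 1 only.
l = [0.0, 1.0]
f = [0.5, 0.5]
no_block = np.zeros((2, 2), dtype=int)
psi_free = faith_statistic(l, np.zeros((2, 2)), no_block, f, eps=0.0)
psi_budget = faith_statistic(l, [[0, 1], [1, 0]], no_block, f, eps=0.1)
psi_blocked = faith_statistic(l, [[0, 1], [1, 0]], [[0, 1], [1, 0]], f, eps=0.1)
```

When moves between the two points are free, all mass can be sent to the high-loss point; with unit cost and budget 0.1 only 0.1 of the mass can move; with the move forbidden by D nothing changes.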

Asymptotics of the FaiTH Statistic (Cont.)
Theorem (Asymptotic distribution of the FaiTH statistic)
The asymptotic distribution of $\psi(f_n)$ is the infimum of a Gaussian process:
$\sqrt{n} \, \{\psi(f_n) - \psi(f^\star)\} \xrightarrow{d} \inf\{(\lambda + l)^\top Z : (\nu, \mu, \lambda) \in \Lambda\}$,
where $Z \sim N(0_K, \Sigma(f^\star))$, $\Sigma(f^\star)$ is the multinomial covariance matrix of $f^\star$, and
$\Lambda = \arg\max_{\nu, \mu \ge 0, \, \lambda \in \mathbb{R}^K} \{\varepsilon \nu + f^{\star\top} \lambda : \nu C + \mu D + \lambda 1_K^\top - 1_K l^\top \in \mathbb{R}_+^{K \times K}\}$.
Proof: Canonical perturbation theory ⟹ Hadamard directional differentiability ⟹ Delta method.
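
For reference, the multinomial covariance in the theorem has a standard closed form, which follows from the central limit theorem for multinomial proportions. It is written out here as a reminder and is not taken from the slides:

```latex
\sqrt{n}\,(f_n - f^\star) \xrightarrow{d} N\bigl(0_K, \Sigma(f^\star)\bigr),
\qquad
\Sigma(f^\star) = \operatorname{diag}(f^\star) - f^\star f^{\star\top}.
```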

Asymptotics of the FaiTH Statistic (Cont.)
A non-Gaussian example: [figure not reproduced]

Bootstrapping the Audit Value
Efron’s $n$-out-of-$n$ bootstrap is not consistent because $\psi$ is not smooth enough. Instead, we use the $m$-out-of-$n$ bootstrap.
Theorem (Consistency of $m$-out-of-$n$ bootstrap)
Let $m f^*_{n,m} \sim \mathrm{Multinomial}(m; f_n)$. As long as $m = m(n) \to \infty$ and $m/n \to 0$, we have
$\sup_{g \in BL_1(\mathbb{R})} \bigl| \mathbb{E}^*\bigl[ g\bigl(\sqrt{m} \, \{\psi(f^*_{n,m}) - \psi(f_n)\}\bigr) \mid f_n \bigr] - \mathbb{E}\bigl[ g\bigl(\sqrt{n} \, \{\psi(f_n) - \psi(f^\star)\}\bigr) \bigr] \bigr| \xrightarrow{p} 0$,
where $BL_1(\mathbb{R})$ is the set of 1-Lipschitz functions in the $\|\cdot\|_\infty$ unit ball.
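
The resampling scheme in the theorem can be sketched in a few lines. Below, `psi` is any callable mapping a probability vector to a number (in an actual audit it would be the FaiTH linear program); the function name `m_out_of_n_bootstrap`, the stand-in functional `np.max` (chosen only because it is also non-smooth) and the values of `m` and `B` are assumptions for illustration. The theorem only requires m → ∞ with m/n → 0, so the specific `m` here is arbitrary.

```python
import numpy as np

def m_out_of_n_bootstrap(psi, f_n, m, B, rng):
    """Draw B replicates of sqrt(m) * (psi(f*_{n,m}) - psi(f_n))."""
    f_n = np.asarray(f_n, float)
    psi_n = psi(f_n)
    draws = np.empty(B)
    for b in range(B):
        # m * f_star ~ Multinomial(m; f_n): resample m points from the audit data.
        f_star = rng.multinomial(m, f_n) / m
        draws[b] = np.sqrt(m) * (psi(f_star) - psi_n)
    return draws

rng = np.random.default_rng(0)
f_n = np.array([0.2, 0.3, 0.5])          # toy empirical distribution
draws = m_out_of_n_bootstrap(np.max, f_n, m=50, B=200, rng=rng)
```

The empirical quantiles of `draws` then serve as the bootstrap quantiles used for inference on the audit value.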

Bootstrapping the Audit Value (Cont.)
A non-Gaussian example: [figure not reproduced]

Fair Transport Hypothesis Test
Definition ($\delta$-fairness)
For a constant $\delta \ge 0$, an ML system is called $\delta$-fair if $\psi(f^\star) \le \delta$.
Fair Transport Hypothesis Test (FaiTH test):
$H_0 : \psi(f^\star) \le \delta$ versus $H_1 : \psi(f^\star) > \delta$.
The auditor considers this hypothesis testing problem in order to test whether or not an ML system is $\delta$-fair.

Inference for the Audit Value
Two-sided confidence interval for the audit value $\psi(f^\star)$:
$\mathrm{CI}_{\text{two-sided}} = \left[ \psi(f_n) - \frac{c^*_{1-\alpha/2}}{\sqrt{n}}, \; \psi(f_n) - \frac{c^*_{\alpha/2}}{\sqrt{n}} \right]$,
where $c^*_q$ is the $q$-th quantile of the bootstrap distribution.
Theorem (Asymptotic coverage of two-sided CI)
$\liminf_{n \to \infty} \mathbb{P}(\psi(f^\star) \in \mathrm{CI}_{\text{two-sided}}) \ge 1 - \alpha$.
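
Given bootstrap replicates of the centered and scaled statistic, the two-sided interval is a direct plug-in of their empirical quantiles. A minimal sketch, assuming `draws` holds such replicates; the function name `two_sided_ci` and the evenly spaced stand-in values (used so the arithmetic is easy to follow) are made up for this example.

```python
import numpy as np

def two_sided_ci(psi_n, draws, n, alpha):
    """[psi_n - c_{1-alpha/2}/sqrt(n), psi_n - c_{alpha/2}/sqrt(n)] from bootstrap draws."""
    c_low, c_high = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    # The upper bootstrap quantile sets the lower endpoint, and vice versa.
    return psi_n - c_high / np.sqrt(n), psi_n - c_low / np.sqrt(n)

# Stand-in bootstrap replicates (hypothetical), evenly spaced on [-1, 1].
draws = np.linspace(-1.0, 1.0, 201)
ci_lower, ci_upper = two_sided_ci(psi_n=0.2, draws=draws, n=100, alpha=0.1)
```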

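Inference for the Audit Value (Cont.)
One-sided confidence interval for the audit value $\psi(f^\star)$:
$\mathrm{CI}_{\text{one-sided}} = \left[ \psi(f_n) - \frac{c^*_{1-\alpha}}{\sqrt{n}}, \; \infty \right)$.
We reject the null hypothesis $H_0$ if $\delta \notin \left[ \psi(f_n) - \frac{c^*_{1-\alpha}}{\sqrt{n}}, \; \infty \right)$.
Theorem (Asymptotic validity of test)
For any $\delta \ge 0$, we have
$\limsup_{n \to \infty} \sup_{f^\star \in \Delta_+^K : \psi(f^\star) \le \delta} \mathbb{P}_{f^\star}(\delta \notin \mathrm{CI}_{\text{one-sided}}) \le \alpha$.
If $\psi(f^\star) > \delta$, then $\lim_{n \to \infty} \mathbb{P}(\delta \notin \mathrm{CI}_{\text{one-sided}}) = 1$.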
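
The rejection rule amounts to comparing δ with the lower endpoint of the one-sided interval. A minimal sketch of that decision, with a hypothetical function name `faith_test` and stand-in bootstrap replicates rather than output from a real audit:

```python
import numpy as np

def faith_test(psi_n, draws, n, delta, alpha):
    """Return (lower endpoint of the one-sided CI, whether H0: psi(f*) <= delta is rejected)."""
    c = np.quantile(draws, 1 - alpha)
    lower = psi_n - c / np.sqrt(n)
    # delta lies outside [lower, infinity) exactly when delta < lower.
    return lower, bool(delta < lower)

# Stand-in bootstrap replicates (hypothetical), evenly spaced on [-1, 1].
draws = np.linspace(-1.0, 1.0, 201)
lower, reject_small_delta = faith_test(psi_n=0.2, draws=draws, n=100, delta=0.0365, alpha=0.05)
_, reject_large_delta = faith_test(psi_n=0.2, draws=draws, n=100, delta=0.2, alpha=0.05)
```

With these stand-in numbers the lower endpoint is about 0.11, so a tolerance of 0.0365 is rejected while a tolerance of 0.2 is not.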

COMPAS Results
Experiment setup:
• Total number of data points: 5278;
• 70% for training and 30% for auditing ($n = 1584$);
• Discrete space $\mathcal{Z}$ with $|\mathcal{Z}| = 144$;
• Mass can move freely between two samples that differ only in race or gender;
• 0-1 loss, and $\delta = 0.0365$.
The FaiTH value can be interpreted as the misclassification rate induced by the solution of the auditor’s problem.
3.65% is the midpoint of the estimated range of the proportion of innocent prisoners in the United States. (Source: Miscarriage of justice, by B. A. Garner.)

COMPAS Results (Cont.)
[Figure: heatmap, color bar labeled "Total number of individuals". Rows: Black Female, White Female, Black Male, White Male. Axis labels: More than 3 prior crimes; Age greater than 45; 1 to 3 prior crimes; Misconduct charge; Age from 25 to 45; Age less than 25; No prior crimes; Felony charge; Recidivism.]
Cell values by row, in extracted order:
• Black Female: 0.0, 4.0, 6.0, 6.0, 0.0, 4.0, 6.0, 4.0
• White Female: 0.0, 46.0, 6.0, 6.0, 0.0, 46.0, 47.0, 5.0
• Black Male: 0.0, -31.0, -18.0, -18.0, 0.0, -31.0, -44.0, -5.0
• White Male: 0.0, -19.0, 6.0, 6.0, 0.0, -19.0, -9.0, -4.0
