
Bias and Fairness in AI

Detect, measure, and mitigate bias in machine learning models

⏱️ 29 min · 20 interactions

Why AI Fairness Matters

The study of bias and fairness in AI addresses how machine learning systems can discriminate against protected groups. Even well-intentioned models trained on historical data can perpetuate and amplify societal inequalities. Understanding bias sources and mitigation strategies is crucial for ethical AI.

⚠️ The Core Problem

📊 Biased Data: Historical discrimination in training data
🔄 Amplification: Feedback loops worsen bias over time
⚖️ Real Harm: Discriminatory outcomes affect lives

Understanding the Origins of AI Bias

What is AI Bias?

AI bias occurs when machine learning systems systematically favor or discriminate against certain groups, individuals, or outcomes. Unlike random errors that affect predictions equally, bias creates systematic disparities that often disadvantage already marginalized groups.

The critical insight: AI doesn't create bias from nothing. It learns and often amplifies existing patterns in data, design choices, and societal structures. Even models trained on "objective" data can perpetuate historical discrimination.

🎯 Key Distinction: Bias vs Variance

Statistical Bias (model error): Difference between model's expected prediction and true value. This is technical, not necessarily unfair.

Bias = E[f̂(x)] - f(x)

Fairness Bias (social harm): Systematic discrimination against protected groups. This is what we address in fairness research.

Example: A hiring model with 5% error rate for men but 20% error rate for women has fairness bias even if statistical bias is low.
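The distinction is easy to see by computing error rates per group. A minimal sketch with toy numbers (the helper name and data are illustrative, not from a real system):

```python
def group_error_rates(y_true, y_pred, group):
    """Return {group_value: error_rate} for binary predictions."""
    rates = {}
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        errors = sum(y_true[i] != y_pred[i] for i in idx)
        rates[g] = errors / len(idx)
    return rates

# Toy data: group "M" has 1 wrong prediction out of 20, "F" has 4 out of 20.
y_true = [1] * 40
y_pred = [1] * 19 + [0] + [1] * 16 + [0] * 4
group = ["M"] * 20 + ["F"] * 20

print(group_error_rates(y_true, y_pred, group))  # {'M': 0.05, 'F': 0.2}
```

Overall error here is only 12.5%, yet the per-group gap (5% vs 20%) is the fairness bias the example describes.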

The Four Primary Sources of Bias

AI bias isn't monolithic—it emerges from multiple, often interconnected sources throughout the machine learning pipeline. Understanding each source is crucial for effective mitigation.

📊

1. Data Bias: The Foundation Problem

Data bias occurs when training data fails to represent reality fairly. Since models learn patterns from data, biased data produces biased models—garbage in, garbage out.

Types of Data Bias:

📉 Representation Bias (Sampling Bias)

Training data underrepresents or misrepresents certain groups. The data doesn't reflect the true population distribution.

Example: ImageNet Dataset

Early image recognition datasets were predominantly Western-centric. A model trained on these would fail to recognize non-Western weddings, clothing, or food because those images were severely underrepresented (e.g., 45% US images vs 3% from India despite India having 4× population).

⏳ Historical Bias

Data accurately reflects historical reality, but that reality contains discrimination. The model learns to perpetuate past injustices.

Example: Hiring Data

Amazon's recruiting tool trained on 10 years of resumes (mostly male engineers) learned that male-associated terms correlated with "good hire." The data accurately reflected past hiring, but past hiring was discriminatory. Model penalized resumes containing "women's chess club" or graduates of women's colleges.

🏷️ Label Bias (Measurement Bias)

Labels/outcomes are measured differently or incorrectly across groups, encoding bias into the "ground truth."

Example: Criminal Risk Assessment

Using "arrest" or "conviction" as labels assumes the criminal justice system is unbiased. In reality, Black defendants are arrested and convicted at higher rates for similar behavior due to policing patterns. The model learns "Black → high risk" not because of actual recidivism but because of biased measurement.

🔍 Aggregation Bias

Using a single model for different populations when their relationships between features and outcomes differ.

Example: Medical Diagnostics

Heart attack symptoms differ by gender (men: chest pain; women: nausea, fatigue). A model trained on pooled data learns predominantly male patterns. Result: Lower accuracy for women (women are 50% more likely to be misdiagnosed).

⚙️

2. Algorithm Bias: Design Choices Matter

Even with perfect data, algorithmic choices—objective functions, regularization, feature selection—can introduce or amplify bias.

🎯 Objective Function Bias

Optimizing overall accuracy often prioritizes majority group performance since they provide most training examples.

Mathematical Example:

Loss = (1/N) Σ loss(ŷᵢ, yᵢ)

Dataset: 900 Group A samples, 100 Group B samples. Achieve 95% accuracy on A, 60% accuracy on B.

Overall accuracy = (900×0.95 + 100×0.60)/1000 = 91.5%

Model is incentivized to improve Group A (larger contribution to loss) while neglecting Group B. 40% error rate on B is acceptable if it improves A!
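The arithmetic above can be checked in a few lines; this sketch (reusing the example's 900/100 split, with everything else illustrative) also makes the incentive explicit:

```python
n_a, n_b = 900, 100            # Group A and Group B sample counts
acc_a, acc_b = 0.95, 0.60      # per-group accuracy from the example

overall = (n_a * acc_a + n_b * acc_b) / (n_a + n_b)
print(f"Overall accuracy: {overall:.1%}")  # Overall accuracy: 91.5%

# A 1-point accuracy gain on Group A moves the overall metric 9x
# as much as the same gain on Group B, so optimization favors A.
gain_from_a = n_a * 0.01 / (n_a + n_b)
gain_from_b = n_b * 0.01 / (n_a + n_b)
print(round(gain_from_a / gain_from_b))  # 9
```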

🔧 Feature Selection Bias

Features that work well for majority may poorly represent minority experiences or encode stereotypes.

Example: Using "years of experience" in hiring models may disadvantage women who took career breaks for caregiving (societal pressure women face). Feature is technically accurate but encodes gendered societal structures.

👤

3. Human Bias: Designer Choices

ML practitioners' implicit biases and cultural assumptions shape system design, from problem framing to feature engineering to evaluation criteria.

🧠 Cognitive Biases in ML Development:

  • Automation Bias: Over-trusting model predictions, assuming AI is "objective" when it reflects training data's biases
  • Confirmation Bias: Interpreting results in ways that confirm existing beliefs about group differences
  • WEIRD Bias: Designing for Western, Educated, Industrialized, Rich, Democratic populations while claiming universality

Example: "Professionalism" Definition

A resume screening tool might penalize non-Western names, non-standard English, or employment gaps—reflecting designers' culturally-specific notions of "professional." These are subjective human judgments encoded as "objective" features.

🔄

4. Feedback Loop Bias: Self-Fulfilling Prophecies

The most insidious source: model predictions influence future data, creating self-reinforcing cycles that amplify initial biases exponentially.

📈 How Feedback Loops Work:

1. Initial Model: trained on biased historical data (e.g., more arrests in Black neighborhoods)

2. Model Predicts: higher crime risk in those neighborhoods

3. Police Deploy: more officers to "high-risk" areas based on predictions

4. More Arrests: occur in over-policed areas (not because of more crime, but because of more scrutiny)

5. New Data: shows even higher arrest rates in those neighborhoods

6. Model Retrains: on this data, and predictions become even more biased

⚠️ Result: Initial bias compounds with each iteration, creating a self-fulfilling prophecy that's extremely difficult to break.

Example: Content Recommendation

YouTube recommends videos → Users watch recommended content → Model learns these patterns → Recommends similar content → Creates "filter bubbles" and echo chambers → Users see increasingly extreme content → Model amplifies this pattern. Initial 10% bias toward sensational content becomes 80% dominance after iterations.

🔗 Intersecting Biases: The Compounding Effect

These sources don't operate in isolation—they interact and amplify each other:

  • Historical data bias + Objective function that prioritizes accuracy = Model that optimizes for majority group
  • Human biases in feature selection + Feedback loops = Self-reinforcing stereotypes encoded as "objective" patterns
  • Representation bias + Aggregation bias = Systematically poor performance on underrepresented groups

1. Sources of AI Bias

🔍 Interactive: Explore Bias Origins

📊

Data Bias

Training data reflects historical discrimination or underrepresents groups

Examples:
  • Historical hiring data favors one gender
  • Image datasets with racial imbalance
  • Loan data from redlined neighborhoods
⚠️ Impact:
Model learns and perpetuates existing inequalities

2. Dataset Representation Bias

📊 Interactive: Visualize Data Imbalance

Group A (Majority): 70% of the data → model learns patterns well
Group B (Minority): 30% of the data → model may underperform

🚨 Severe Imbalance: High risk of poor performance on minority group

Mathematical Foundations of Fairness

The Challenge: Defining "Fair"

Unlike accuracy or precision (which have clear mathematical definitions), fairness is context-dependent and often contested. Different stakeholders have competing notions of what constitutes fair treatment. The machine learning community has formalized several mathematical definitions, each capturing different intuitions about fairness.

Critical insight: These fairness definitions are often mathematically incompatible—satisfying one makes it impossible to satisfy others except in trivial cases. This is known as the Impossibility Theorem of Fairness.

🎯 Notation

A ∈ {0,1}

Protected attribute (0=Group A, 1=Group B)

Example: A=0 (male), A=1 (female)

Y ∈ {0,1}

True outcome/label

Example: Y=1 (qualified), Y=0 (not qualified)

Ŷ ∈ {0,1}

Model prediction

Example: Ŷ=1 (predict hire), Ŷ=0 (predict reject)

S ∈ [0,1]

Model score/probability

Example: S=0.73 (73% probability of positive)

⚖️

1. Demographic Parity (Statistical Parity)

Mathematical Definition:

P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)

Probability of positive prediction is independent of protected attribute

Demographic parity requires that both groups receive positive outcomes at the same rate, regardless of ground truth qualifications. This is "fairness as equal representation in outcomes."

✓ Example: University Admissions

1000 applicants: 500 Group A, 500 Group B

Admit 200 total students (20% rate)

Demographic Parity requires:

Admit 100 from Group A (20%) AND 100 from Group B (20%)

P(admit | A) = 100/500 = 0.20
P(admit | B) = 100/500 = 0.20
✓ 0.20 = 0.20 → Parity satisfied
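The parity check above is easy to express in code. A minimal sketch using the admissions numbers (the helper names are mine, not a standard API):

```python
def selection_rate(preds):
    """Fraction of positive (1) predictions."""
    return sum(preds) / len(preds)

def demographic_parity_gap(preds_a, preds_b):
    """Absolute difference in positive-prediction rates between groups."""
    return abs(selection_rate(preds_a) - selection_rate(preds_b))

# 100 admits out of 500 applicants in each group.
preds_a = [1] * 100 + [0] * 400
preds_b = [1] * 100 + [0] * 400
print(demographic_parity_gap(preds_a, preds_b))  # 0.0 -> parity satisfied
```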

✓ When to Use:

  • Equal opportunity policies (quotas)
  • Outcomes should match population distribution
  • Historical exclusion needs correction
  • Ground truth labels may be biased

✗ Limitations:

  • Ignores actual qualifications/merit
  • May require unequal treatment
  • Can violate individual fairness
  • Assumes equal base rates across groups

⚠️ Controversial Issue: If groups have genuinely different qualification rates (e.g., different application pools), demographic parity requires selecting less qualified candidates from one group or more qualified from another. Critics call this "reverse discrimination."

🎯

2. Equal Opportunity (Equalized TPR)

Mathematical Definition:

P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1)

True Positive Rate (TPR) / Recall / Sensitivity is equal across groups

Equal opportunity focuses on the qualified individuals. It requires that among people who should get positive outcomes (Y=1), both groups have equal probability of receiving them. This is "fairness as equal treatment of qualified candidates."

✓ Example: Job Hiring

Group A: 80 qualified (Y=1), 20 not qualified (Y=0)
Group B: 60 qualified (Y=1), 40 not qualified (Y=0)

Model predictions:

Group A: Hire 64 of 80 qualified → TPR = 64/80 = 0.80 (80%)
Group B: Hire 48 of 60 qualified → TPR = 48/60 = 0.80 (80%)

✓ Both groups: qualified individuals have 80% chance of being hired

Note: False positive rates can differ. Maybe Group A has 5 false positives (5/20=0.25) while Group B has 8 (8/40=0.20). Equal Opportunity doesn't constrain FPR.
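The same TPR comparison can be sketched with the hiring counts above encoded as toy label/prediction lists:

```python
def tpr(y_true, y_pred):
    """True positive rate: fraction of actual positives predicted positive."""
    hits = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(hits) / len(hits)

# Group A: 80 qualified, 64 hired. Group B: 60 qualified, 48 hired.
y_true_a, y_pred_a = [1] * 80, [1] * 64 + [0] * 16
y_true_b, y_pred_b = [1] * 60, [1] * 48 + [0] * 12

print(tpr(y_true_a, y_pred_a), tpr(y_true_b, y_pred_b))  # 0.8 0.8
```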

✓ When to Use:

  • Merit-based decisions important
  • False negatives more harmful than false positives
  • "Don't deny qualified people" priority
  • Hiring, admissions, opportunity allocation

✗ Limitations:

  • Ignores false positive rates (can differ)
  • Requires accurate ground truth labels
  • Doesn't address unequal base rates
  • May allow different treatment of unqualified

💡 Intuition: Equal opportunity says "qualified people should have equal chances regardless of group." It's less restrictive than demographic parity—allows different selection rates if groups have different qualification rates, but ensures qualified individuals aren't disadvantaged by their group membership.

📊

3. Equalized Odds (Equalized TPR and FPR)

Mathematical Definition:

P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1)

AND

P(Ŷ=1 | Y=0, A=0) = P(Ŷ=1 | Y=0, A=1)

Both TPR and FPR must be equal across groups

Equalized odds is the most restrictive: it requires equal treatment of both qualified (Y=1) and unqualified (Y=0) individuals. Groups must have same TPR (opportunity for qualified) and same FPR (protection for unqualified). This is "fairness as equal error rates."

✓ Example: Loan Approval

Group A: 70 will repay (Y=1), 30 will default (Y=0)
Group B: 60 will repay (Y=1), 40 will default (Y=0)

Equalized Odds requires:

TPR (approve those who'll repay):

Group A: 63/70 = 0.90 = Group B: 54/60 = 0.90 ✓

FPR (approve those who'll default):

Group A: 3/30 = 0.10 = Group B: 4/40 = 0.10 ✓

✓ Both error types equalized: 90% of good borrowers get loans, 10% of bad borrowers get loans (both groups)
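Both conditions can be verified together. A sketch using the loan counts above (the `rates` helper is illustrative):

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

# Group A: 70 repay (63 approved), 30 default (3 approved).
y_true_a = [1] * 70 + [0] * 30
y_pred_a = [1] * 63 + [0] * 7 + [1] * 3 + [0] * 27
# Group B: 60 repay (54 approved), 40 default (4 approved).
y_true_b = [1] * 60 + [0] * 40
y_pred_b = [1] * 54 + [0] * 6 + [1] * 4 + [0] * 36

print(rates(y_true_a, y_pred_a))  # (0.9, 0.1)
print(rates(y_true_b, y_pred_b))  # (0.9, 0.1)
```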

✓ When to Use:

  • Both types of errors matter
  • False positives and negatives both harmful
  • High-stakes decisions (lending, criminal justice)
  • Want comprehensive fairness guarantee

✗ Limitations:

  • Most restrictive (hardest to satisfy)
  • Often reduces overall accuracy
  • Requires very accurate labels
  • May be impossible with different base rates

🎓 Key Insight: Equalized odds = Equal opportunity + Equal FPR. It's stricter because it protects both qualified people (via TPR) and unqualified people (via FPR) from group-based discrimination. Most "fair" in theory, but hardest to achieve in practice.

🚫

The Impossibility Theorem of Fairness

Proven mathematical result: Except in trivial cases (perfect prediction or equal base rates), you cannot simultaneously satisfy demographic parity, equal opportunity, and equalized odds. Fairness is trade-offs, not absolutes.

Why These Metrics Conflict:

Scenario: Medical School Admissions

Population:

Group A: 100 applicants, 80 qualified (80%)

Group B: 100 applicants, 40 qualified (40%)

Perfect classifier (100% accuracy):

Admits all 80 qualified from A, all 40 qualified from B

✓ Equal Opportunity: SATISFIED

TPR(A) = 80/80 = 1.0 = TPR(B) = 40/40 = 1.0

All qualified candidates admitted (both groups)

✓ Equalized Odds: SATISFIED

TPR(A)=TPR(B)=1.0 and FPR(A)=FPR(B)=0.0

Perfect predictions for both groups

✗ Demographic Parity: VIOLATED

P(Ŷ=1|A) = 80/100 = 0.80 ≠ P(Ŷ=1|B) = 40/100 = 0.40

Admission rates differ: 80% vs 40%

⚠️ The Dilemma: To satisfy demographic parity, must admit 60 from each group (equal rates). But Group B only has 40 qualified → must either admit 20 unqualified from B (violates merit) or reject 20 qualified from A (violates equal opportunity). No solution satisfies all constraints!

Mathematical Proof Sketch:

When base rates differ [P(Y=1|A=0) ≠ P(Y=1|A=1)], demographic parity forces equal positive prediction rates, but equal opportunity/equalized odds forces predictions to track true positives. You can't have equal overall rates AND equal conditional rates unless base rates are equal or predictor is perfect. QED.
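The dilemma can be reproduced numerically. A short sketch of the perfect-classifier admissions scenario above:

```python
# 100 applicants per group; a perfect classifier admits exactly the qualified.
qualified = {"A": 80, "B": 40}
admitted = dict(qualified)

tpr = {g: admitted[g] / qualified[g] for g in qualified}   # equal opportunity
sel_rate = {g: admitted[g] / 100 for g in qualified}       # demographic parity

assert tpr["A"] == tpr["B"] == 1.0        # equal opportunity: satisfied
assert sel_rate["A"] != sel_rate["B"]     # demographic parity: violated
print(sel_rate)  # {'A': 0.8, 'B': 0.4}
```

Even with zero prediction error, the unequal base rates force unequal selection rates, which is exactly the conflict the theorem describes.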

🤔 How to Choose a Fairness Metric

Since you can't satisfy all metrics, choosing a fairness definition is a value judgment, not a technical decision. It depends on context, stakeholder input, and ethical considerations.

Choose Demographic Parity when:

  • Historical exclusion needs active correction
  • Outcomes should match demographics
  • Ground truth labels may be biased
  • Example: Diverse candidate slates

Choose Equal Opportunity when:

  • Merit/qualifications matter
  • False negatives more harmful
  • "Equal chance if qualified" principle
  • Example: Competitive admissions

Choose Equalized Odds when:

  • Both error types matter equally
  • High-stakes decisions
  • Comprehensive fairness needed
  • Example: Criminal sentencing

⚠️ Critical Warning: No metric is "correct." Each embeds different values. Involve stakeholders (especially affected communities) in choosing fairness criteria. Technical optimization alone cannot resolve ethical questions about how to treat people fairly.

3. Fairness Metrics

📐 Interactive: Compare Fairness Definitions

Demographic Parity

Mathematical Definition:
P(Ŷ=1|A=0) = P(Ŷ=1|A=1)

Equal positive prediction rates across groups

✓ When to Use:
When equal representation in outcomes is desired
⚠️ Limitation:
Ignores actual qualifications or base rates

The Dynamics of Bias Amplification

Feedback Loops: From Bad to Catastrophic

One of the most dangerous aspects of AI bias is amplification through feedback loops. Unlike static bias that remains constant, feedback loops create self-reinforcing cycles where model predictions influence future data collection, which influences future predictions, creating exponential growth in bias.

A model with just 10% initial bias can reach 80-90% bias after only a few feedback iterations. This isn't a bug—it's the natural consequence of deploying ML systems that interact with the world.

🔄

The Feedback Loop Mechanism

Feedback loops occur when model outputs become model inputs in the next iteration. This creates a circular dependency that can amplify small biases exponentially.

The Six-Step Cycle:

1. Initial Training Data (t=0): Historical data contains existing bias (e.g., 60% of arrests in Black neighborhoods, 40% in white neighborhoods due to historical policing patterns, not actual crime rates)

2. Model Learns Patterns: The algorithm learns "Black neighborhood → high crime probability", not because it is racist, but because it optimizes to fit the biased training data patterns

3. Deployment & Predictions: The model predicts higher crime risk in Black neighborhoods. System output: "Deploy 70% of police to Black neighborhoods, 30% to white neighborhoods"

4. Human Actions Based on Predictions: Police follow model recommendations. More officers patrol Black neighborhoods → more surveillance, more stops, more scrutiny

5. New Data Collection (t=1): More policing → more arrests in Black neighborhoods (not because more crime occurs, but because more eyes are watching). New data shows 75% of arrests in Black neighborhoods

6. Model Retraining → Amplification: The model is retrained on the new data (75% vs 25%) and learns an even stronger association. The next iteration predicts 80% policing in Black neighborhoods. The cycle continues and bias grows exponentially

🚨 The Catastrophic Result:

Each iteration amplifies the bias. After 3-5 iterations, the system creates a self-fulfilling prophecy: predictions appear "accurate" because the system itself creates the reality it predicts. The bias becomes institutionalized and nearly impossible to detect from accuracy metrics alone.

📈

Mathematical Model of Amplification

We can model bias amplification mathematically to understand the exponential growth dynamics:

Basic Amplification Model:

B(t+1) = B(t) × (1 + α)

Where:

B(t) = bias at time t

α = amplification rate per iteration (typically 0.2-0.5)

Example: Content Recommendation Bias

Initial: 10% of recommended videos are extreme content
Amplification rate: α = 0.3 (30% increase per iteration)

Iteration 0: B(0) = 10%

Iteration 1: B(1) = 10% × 1.3 = 13%

Iteration 2: B(2) = 13% × 1.3 = 16.9%

Iteration 3: B(3) = 16.9% × 1.3 = 22%

Iteration 4: B(4) = 22% × 1.3 = 28.6%

Iteration 5: B(5) = 28.6% × 1.3 = 37.2%

Iteration 10: B(10) = 10% × (1.3)^10 = 137% → 100% (capped)

⚠️ After just 10 iterations (maybe 10 weeks), nearly all recommended content is extreme. Started at 10%, ended at saturation.

General Solution (Exponential Growth):

B(t) = B(0) × (1 + α)^t

This is exponential growth with base (1+α). Similar to compound interest, but for bias instead of money.

Doubling time: t = log(2) / log(1+α)
For α=0.3: doubles every ~2.6 iterations

📊 Key Insight: The amplification rate α depends on how strongly the model's predictions influence future data. Higher influence → faster amplification. This is why deployed systems with direct feedback (recommendations, policing, lending) are especially dangerous.
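The iteration table above can be generated by a short simulation of B(t+1) = B(t) × (1 + α), with the cap at 100% matching the example's saturation:

```python
def simulate_bias(b0, alpha, steps):
    """Apply B(t+1) = B(t) * (1 + alpha), capping bias at 100%."""
    levels, b = [], b0
    for _ in range(steps):
        b = min(b * (1 + alpha), 1.0)
        levels.append(b)
    return levels

levels = simulate_bias(0.10, 0.30, 10)
print([round(b, 3) for b in levels])
# [0.13, 0.169, 0.22, 0.286, 0.371, 0.483, 0.627, 0.816, 1.0, 1.0]
```

Note the system hits the cap at iteration 9, consistent with the exponential formula B(10) = 10% × 1.3¹⁰ ≈ 138% before capping.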

🌍

Real-World Amplification Cases

🚨 Predictive Policing (PredPol)

Initial bias: Historical arrest data overrepresented minority neighborhoods (legacy of discriminatory policing)

Iteration 1: System predicts high crime in those areas → More police deployed

Iteration 2-5: More arrests in predicted areas (observation bias) → Model confidence increases → Even more policing

Result: After multiple iterations, system creates massively disproportionate policing. Studies found some neighborhoods had 10× policing intensity vs demographically similar areas, purely due to algorithmic feedback loop

📱 YouTube Radicalization Pipeline

Initial: User watches one conspiracy video (10% of watch history)

Amplification: Algorithm notices high engagement (controversial content gets clicks) → Recommends similar content → User watches more → Algorithm updates: "This user likes extreme content" → Recommends even more extreme videos

Outcome: Studies showed users can go from mainstream content to extremist content in 5-10 recommendations (hours/days). The algorithm doesn't "intend" radicalization—it optimizes for watch time, but creates radicalization pipeline as side effect

💳 Credit Scoring Spiral

t=0: Model denies loan to marginalized group at slightly higher rate due to historical bias

t=1: Denied applicants can't build credit history → Next model sees "no history = high risk" → Even higher denial rate

t=2-3: Compound effect: No loans → No assets → Lower income → Model predicts higher default risk → More denials

Generational impact: Creates self-perpetuating poverty cycles. Initial 5-10% disparity becomes 40-50% wealth gap over decades

🛡️

Breaking Feedback Loops

Feedback loops are hard to break because the system appears to be working correctly (predictions match observations). Intervention requires understanding the causal structure:

✓ Effective Interventions:

  • Randomized deployments: Randomly vary predictions to gather unbiased data (like A/B testing but for fairness)
  • Causal modeling: Model counterfactuals ("what would have happened if...") instead of just correlations
  • External data sources: Use data not influenced by model predictions
  • Temporal discounting: Weight recent data less if it's influenced by model

✗ Ineffective (Common Mistakes):

  • Just retraining: Using biased new data amplifies rather than fixes bias
  • Removing protected attributes: Doesn't stop feedback if proxies exist
  • Accuracy monitoring alone: Feedback loops can increase accuracy while worsening fairness
  • One-time correction: Feedback loops require continuous monitoring, not one-time fixes

🔬 Research Finding: Studies show that once a feedback loop is established, it requires 3-5× more effort to reverse than it would have taken to prevent initially. Prevention is far easier than cure—design systems to avoid feedback loops from the start.

⚠️ Critical Takeaway: Bias Amplification is the Default

Without active intervention, deployed ML systems with any feedback component will naturally amplify existing biases. This isn't a failure of engineering—it's a mathematical consequence of optimizing on biased feedback. Fairness requires constant vigilance, not just good initial training.

4. Bias Amplification Over Time

🔄 Interactive: Feedback Loop Simulation

Iteration 1: 10%
Iteration 2: 13%
Iteration 3: 17%

Final Bias After 3 Iterations: 22%
Moderate: Some amplification but manageable

🔄 Feedback Loop: Model predictions influence future data collection, creating a self-reinforcing cycle. Small initial biases compound exponentially without intervention.

5. Protected Attributes

🛡️ Interactive: Identify Sensitive Features

Protected attributes are characteristics that should not influence decisions in fair systems. Click to toggle which attributes to protect:

Protected Attributes

⚠️ Note: Removing protected attributes from features doesn't guarantee fairness. Proxy variables (zip code for race, name for gender) can still encode bias.

Disparate Impact: Legal and Statistical Framework

From Legal Doctrine to ML Metric

Disparate impact is a legal concept that became a fundamental fairness metric in machine learning. Unlike disparate treatment (intentional discrimination), disparate impact focuses on outcomes: even neutral policies can be discriminatory if they disproportionately harm protected groups.

The 80% Rule (Four-Fifths Rule) is the most widely used quantitative standard for detecting discrimination in employment, lending, and increasingly, in AI systems. It provides a clear mathematical threshold for when outcome disparities constitute evidence of bias.

⚖️

Legal Origins and Framework

The disparate impact doctrine originated from Griggs v. Duke Power Co. (1971), a landmark U.S. Supreme Court case. The court ruled that employment practices with discriminatory effects violate civil rights law, even without discriminatory intent.

Historical Case: Griggs v. Duke Power (1971)

Situation: Company required high school diploma for employment
Claimed Purpose: "Ensure quality workers" (race-neutral policy)
Impact: 34% of white applicants had diploma vs 12% of Black applicants
Disparate Impact Ratio: 12/34 = 0.35 = 35% (far below 80%)

Court Decision: Policy was discriminatory despite neutral language because (1) disparate impact existed and (2) diploma requirement wasn't related to job performance (couldn't be justified as business necessity).

Disparate Treatment

  • Intentional discrimination
  • Different rules for different groups
  • Requires proof of intent
  • Example: "No women allowed"

Disparate Impact

  • Unintentional, outcomes-based
  • Same rules, different results
  • Proven by statistical evidence
  • Example: Height requirements excluding women
📊

The 80% Rule: Mathematical Definition

Formal Definition:

Disparate Impact Ratio = min(SR_A, SR_B) / max(SR_A, SR_B)

Where SR = Selection Rate for each group

DI Ratio ≥ 0.80 → No prima facie discrimination
DI Ratio < 0.80 → Evidence of disparate impact

The 80% Rule states: The selection rate for the protected group should be at least 80% (four-fifths) of the selection rate for the group with the highest rate. Ratios below this threshold trigger legal scrutiny.

✓ Step-by-Step Calculation Example

Scenario: Loan Approval System

Group A (majority): 500 applicants, 400 approved

Group B (minority): 300 applicants, 180 approved

Step 1: Calculate Selection Rates

SR_A = 400/500 = 0.80 = 80%

SR_B = 180/300 = 0.60 = 60%

Step 2: Identify Min and Max

min(80%, 60%) = 60% (Group B)

max(80%, 60%) = 80% (Group A)

Step 3: Calculate Ratio

DI Ratio = 60% / 80% = 0.75 = 75%

Step 4: Interpret

75% < 80% → Disparate impact detected!

Group B's approval rate is only 75% of Group A's rate. This falls below the 80% threshold and constitutes prima facie evidence of discrimination requiring justification.
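The four steps reduce to a few lines of code. A sketch using the loan numbers above:

```python
def disparate_impact_ratio(rate_a, rate_b):
    """Four-fifths rule ratio: min selection rate over max selection rate."""
    return min(rate_a, rate_b) / max(rate_a, rate_b)

sr_a = 400 / 500   # Group A selection rate: 0.80
sr_b = 180 / 300   # Group B selection rate: 0.60

ratio = disparate_impact_ratio(sr_a, sr_b)
print(f"DI ratio: {ratio:.0%}")          # DI ratio: 75%
print("flag" if ratio < 0.80 else "ok")  # flag -> evidence of disparate impact
```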

💡 Why 80%? The threshold is somewhat arbitrary (established in 1978 Uniform Guidelines on Employee Selection), but has become the legal standard. It represents "substantial disparity"—a 20% difference is considered significant, while smaller differences might be due to chance or legitimate factors.

📐

Relationship to Statistical Parity

The 80% Rule is closely related to statistical parity (demographic parity) but is less strict. Statistical parity requires exact equality (100% ratio), while the 80% Rule allows for some disparity (recognizing real-world variability).

Statistical Parity (Strict)

P(Ŷ=1|A=0) = P(Ŷ=1|A=1)

Requires: Exactly equal selection rates

Example: 80% = 80% (ratio: 100%)

✓ Ideal fairness standard

✗ May be unrealistic/impossible

80% Rule (Pragmatic)

min/max ≥ 0.80

Allows: Up to 20% disparity

Example: 72% vs 90% (ratio: 80%)

✓ Legally enforceable threshold

✓ Accounts for statistical noise

Interpretation Guide:

Ratio ≥ 80% → ✓ Acceptable
70% ≤ Ratio < 80% → ⚠️ Concerning (investigate)
60% ≤ Ratio < 70% → 🚨 Significant disparity
Ratio < 60% → 🔴 Severe discrimination
⚖️

Legal Burden and Defenses

Proving disparate impact shifts the burden of proof to the employer/system deployer. The three-stage legal framework:

Stage 1: Prima Facie Case (Plaintiff)

Show that 80% Rule is violated using statistical evidence. If ratio < 80%, establishes presumption of discrimination. Burden shifts to defendant.

Stage 2: Business Necessity Defense (Defendant)

Must prove the practice is job-related and consistent with business necessity. Not enough to say "our model is accurate"—must show discriminatory criteria are essential.

Example Defenses:

  • ✓ Physical strength requirement for construction workers (genuinely necessary)
  • ✗ College degree for janitor position (not necessary for the job)
  • ✓ Credit check for financial officer (related to job duties)
  • ✗ Credit check for retail cashier (not clearly necessary)

Stage 3: Less Discriminatory Alternative (Plaintiff)

If business necessity shown, plaintiff can still prevail by proving a less discriminatory alternative exists that achieves the same business goal. Forces consideration of fairness in design choices.

⚠️ For ML Systems: "Our algorithm is accurate" is not a valid defense for disparate impact. Must show (1) the features causing disparity are necessary for the stated purpose and (2) no alternative approach with less disparity exists. This is a high bar!

🤖

Applying the 80% Rule to ML Systems

The 80% Rule provides a clear, testable criterion for ML fairness. Unlike abstract fairness concepts, it gives developers and auditors a specific threshold to measure against.

Practical Implementation Steps:

  1. Identify Protected Groups: Determine which attributes (race, gender, age, etc.) are protected by law in your jurisdiction
  2. Measure Selection Rates: Calculate the positive prediction rate for each group (even if the protected attribute is not used as a feature; test on outcomes!)
  3. Compute DI Ratio: min/max of selection rates. Flag if < 80%
  4. Document Justification: If the ratio is < 80%, document business necessity and explore alternatives
  5. Monitor Continuously: The DI ratio can drift over time due to feedback loops or distribution shift
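Steps 2 and 3 can be sketched as a helper that flags any pair of groups whose ratio falls below the threshold (function name and data are illustrative):

```python
from itertools import combinations

def audit_80_rule(outcomes_by_group, threshold=0.80):
    """outcomes_by_group maps group -> list of 0/1 decisions.
    Returns (selection rates, pairs whose DI ratio falls below threshold)."""
    rates = {g: sum(v) / len(v) for g, v in outcomes_by_group.items()}
    flags = []
    for g1, g2 in combinations(rates, 2):
        ratio = min(rates[g1], rates[g2]) / max(rates[g1], rates[g2])
        if ratio < threshold:
            flags.append((g1, g2, round(ratio, 3)))
    return rates, flags

rates, flags = audit_80_rule({
    "A": [1] * 40 + [0] * 10,   # 80% selection rate
    "B": [1] * 30 + [0] * 20,   # 60% selection rate
})
print(flags)  # [('A', 'B', 0.75)]
```

Checking all pairs also partially addresses the rule's binary-comparison limitation when more than two groups are present.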

✓ Advantages:

  • Clear numerical threshold (80%)
  • Legally recognized standard
  • Easy to compute and explain
  • Focuses on outcomes, not process
  • Courts understand it (decades of precedent)

✗ Limitations:

  • Binary comparison (only 2 groups at once)
  • Doesn't consider error types (FP vs FN)
  • 80% threshold somewhat arbitrary
  • Doesn't account for intersectionality
  • May conflict with accuracy/merit

🔍 Best Practice: Use the 80% Rule as a minimum baseline, not the only fairness criterion. Systems passing the 80% Rule can still have unfair error distributions (unequal TPR/FPR). Combine with equal opportunity or equalized odds for comprehensive fairness.

🎯 Key Takeaway: Quantifying Discrimination

The 80% Rule bridges law and machine learning by providing a concrete, enforceable standard for disparate impact. ML practitioners should treat it as a mandatory baseline check, not an optional "nice-to-have." Violating the 80% Rule exposes organizations to legal liability and ethical scrutiny.

6. Disparate Impact: The 80% Rule

⚖️ Interactive: Hiring Bias Simulator

Male Approval: 80%
Female Approval: 50%

Disparate Impact Ratio: min(rate₁, rate₂) / max(rate₁, rate₂) = 50% / 80% ≈ 63%

🚨 Concerning: Below the 80% threshold

📋 80% Rule: Legal guideline that selection rate for protected group should be at least 80% of the highest group's rate. Below this threshold suggests discrimination.

7. Bias Mitigation Strategies

🛠️ Interactive: Apply Mitigation Techniques

No Mitigation

Standard training without fairness constraints

Bias Reduction
0%
Model Accuracy
92%
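One concrete pre-processing mitigation is reweighing. This toy sketch is in the spirit of Kamiran and Calders' method (simplified here to discrete groups and binary labels; the function name and data are illustrative):

```python
def reweigh(groups, labels):
    """Weight each sample by expected / observed (group, label) frequency,
    so group membership and label appear independent to the learner."""
    n = len(groups)
    weights = []
    for g, y in zip(groups, labels):
        p_g = groups.count(g) / n
        p_y = labels.count(y) / n
        p_gy = sum(1 for gi, yi in zip(groups, labels)
                   if gi == g and yi == y) / n
        weights.append(p_g * p_y / p_gy)
    return weights

# Toy data: positive labels are overrepresented in group "A".
groups = ["A", "A", "A", "B", "B", "B"]
labels = [1, 1, 0, 1, 0, 0]
print([round(w, 2) for w in reweigh(groups, labels)])
# [0.75, 0.75, 1.5, 1.5, 0.75, 0.75]
```

Overrepresented (group, label) pairs are down-weighted and underrepresented ones up-weighted before training, without altering features or labels.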

8. Group-Specific Thresholds

🎚️ Interactive: Adjust Decision Boundaries

Group A Threshold: 0.50 (Balanced)
Group B Threshold: 0.50 (Balanced)

✅ Equal Thresholds: Same decision boundary for both groups

🎯 Threshold Adjustment: One post-processing technique is to set different classification thresholds for different groups to achieve equal error rates or equal opportunity.
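Threshold adjustment as post-processing can be sketched in a few lines (all scores and threshold values here are illustrative):

```python
def apply_thresholds(scores, groups, thresholds):
    """Classify each score against its group's decision threshold."""
    return [1 if s >= thresholds[g] else 0 for s, g in zip(scores, groups)]

scores = [0.55, 0.45, 0.70, 0.40, 0.52, 0.48]
groups = ["A", "A", "A", "B", "B", "B"]

# Lowering Group B's threshold can raise its TPR toward Group A's if
# B's scores are systematically depressed by biased features.
decisions = apply_thresholds(scores, groups, {"A": 0.50, "B": 0.45})
print(decisions)  # [1, 0, 1, 0, 1, 1]
```

In practice the thresholds would be chosen on held-out data to equalize TPR (equal opportunity) or both TPR and FPR (equalized odds).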

9. Real-World Bias Cases

📰 Interactive: Learn from History

💼

Amazon Hiring Tool

2018
⚠️ The Issue:
Resume screening AI penalized women due to male-dominated training data
📊 Outcome:
System scrapped after bias discovered
💡 Lesson Learned:
Historical data perpetuates discrimination

10. Fairness-Accuracy Tradeoff

⚖️ Interactive: Balance Competing Goals

Maximize Accuracy ↔ Balanced ↔ Maximize Fairness

Model Accuracy: 85.0%
Fairness Score: 50%
Combined Score: 67.5%

⚖️ The Tradeoff: Balanced approach considers both fairness and accuracy. Often the best practical choice.

🎯 Key Takeaways

📊

Bias Has Multiple Sources

Data, algorithms, human designers, and feedback loops all contribute to AI bias. Address all sources, not just one.

📐

No Universal Fairness Metric

Demographic parity, equal opportunity, and equalized odds are incompatible. Choose based on context and stakeholder values.

🔄

Feedback Loops Amplify Bias

Initial small biases compound over time as model predictions influence future data. Monitor and intervene continuously.

🛡️

Removing Features Isn't Enough

Proxy variables (zip code, name, school) can encode protected attributes. Audit for disparate impact even without explicit sensitive features.

🛠️

Multiple Mitigation Approaches

Pre-processing (data), in-processing (training), and post-processing (predictions) each have tradeoffs. Often combine multiple techniques.

⚖️

Fairness vs Accuracy Tradeoff

Some accuracy loss is often acceptable for fairness. The "right" balance depends on domain, risks, and ethical considerations; no technical solution alone can resolve it.