fairness_training

Guaranteed fairness constraints for PyTorch neural networks

Python 3.9+ | PyTorch 2.0+ | License: MIT


What is fairness_training?

fairness_training lets you train any PyTorch network with hard fairness constraints. Unlike penalty-based methods that only encourage fairness, this library guarantees that predictions satisfy specified criteria on every training batch.

The approach: append a differentiable fairness layer — a convex optimization problem solved via cvxpylayers — that projects raw predictions onto the feasible set defined by your constraints. Because the projection is differentiable, gradients flow back through it into the network weights during training.


Key Features

  • Verified Fairness


    Fairness criteria are imposed as hard constraints, so predictions are guaranteed to satisfy them on every training batch

  • Constraint-Aware End-to-End Learning


    The fairness layer is fully differentiable, so the model learns to satisfy the constraints during training rather than relying on post-hoc corrections applied after training

  • Flexible Architecture


    Works with any classification or regression architecture. Just append the fairness layer to the end of your model

  • Online Inference


    A novel primal-dual algorithm provides aggregate fairness guarantees over time, even when real-time inference uses small batch sizes


How It Works

flowchart LR
    A[Input X] --> B["Neural Network f(·)"]
    B --> C["Raw Predictions ẑ = f(X)"]
    C --> D["Fairness Layer g(·)"]
    D --> E["Fair Predictions ŷ = g(ẑ)"]

    style D fill:#e1f5fe

The fairness layer solves a convex optimization problem:

\[ g(z) = \arg\min_{\tilde{y}} \|\tilde{y} - z\|_2^2 \quad \text{s.t.} \quad \text{constraints satisfied} \]

This projection is differentiable via implicit differentiation through the KKT conditions, enabling standard backpropagation.
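
To make the projection concrete, here is a minimal sketch for one special case: a single mean prediction parity constraint with two groups. In that case the feasible set is a slab between two parallel hyperplanes, and the Euclidean projection has a closed form. This is an illustration of the math only, not the library's implementation (which solves the general problem via cvxpylayers); the names `project_mean_parity` and `mean_gap` are hypothetical.

```python
def mean_gap(y, group):
    """Signed gap mean(y | group == 0) - mean(y | group == 1)."""
    g0 = [v for v, g in zip(y, group) if g == 0]
    g1 = [v for v, g in zip(y, group) if g == 1]
    return sum(g0) / len(g0) - sum(g1) / len(g1)


def project_mean_parity(z, group, eps):
    """Project z onto {y : |mean(y | g=0) - mean(y | g=1)| <= eps}.

    Illustrative closed form for a single constraint; hypothetical helper,
    not part of the fairness_training API.
    """
    n0 = sum(1 for g in group if g == 0)
    n1 = len(group) - n0
    if n0 == 0 or n1 == 0:
        raise ValueError("both groups must be present in the batch")

    # The gap is linear in y: gap(y) = a . y, with
    # a_i = 1/n0 for group 0 and a_i = -1/n1 for group 1.
    a = [1.0 / n0 if g == 0 else -1.0 / n1 for g in group]
    gap = sum(ai * zi for ai, zi in zip(a, z))

    if abs(gap) <= eps:
        return list(z)  # already feasible, the projection is the identity

    # Otherwise move along -a until the nearest face of the slab is reached.
    target = eps if gap > 0 else -eps
    step = (gap - target) / sum(ai * ai for ai in a)
    return [zi - step * ai for ai, zi in zip(a, z)]


# Toy batch: group 0 receives much higher raw scores than group 1.
z = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1]
group = [0, 0, 0, 1, 1, 1]
fair = project_mean_parity(z, group, eps=0.1)
print(round(mean_gap(z, group), 6), round(mean_gap(fair, group), 6))
```

In this special case the derivative is easy to read off: when the constraint is inactive the Jacobian of the projection is the identity, and when it is active the Jacobian is \( I - a a^\top / \|a\|_2^2 \), where \( a \) is the vector of group weights defined in the code. With several simultaneous constraints no such closed form is available in general, which is where implicit differentiation through the KKT conditions comes in.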


Supported Fairness Criteria

| Metric | Description | Use Case |
| --- | --- | --- |
| Mean Prediction Parity | \( \lvert E[\hat{y} \mid x_j=0] - E[\hat{y} \mid x_j=1] \rvert \leq \epsilon \) | Regression or Classification Scores |
| Mean Residual Fairness | \( \lvert E[y - \hat{y} \mid x_j=a] \rvert \leq \epsilon \ \forall a \) | Regression or Classification Scores |
| Equalized Odds | \( \lvert E[\hat{y} \mid x_j=0, y=a] - E[\hat{y} \mid x_j=1, y=a] \rvert \leq \epsilon \ \forall a \in \{0,1\} \) | Binary Classification |
| Custom Metrics | Extend FairnessMetric base class | Affine fairness constraints |
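
As a rough illustration of what the three built-in criteria measure, the sketch below computes the left-hand side of each inequality on a small batch, replacing expectations with batch means. It assumes a binary sensitive attribute and, for equalized odds, binary labels. The function names are hypothetical and are not part of the fairness_training API.

```python
def _mean(values):
    values = list(values)
    if not values:
        raise ValueError("cannot average an empty subgroup")
    return sum(values) / len(values)


def mean_prediction_gap(y_hat, group):
    """|E[y_hat | group=0] - E[y_hat | group=1]| estimated on the batch."""
    m0 = _mean(p for p, g in zip(y_hat, group) if g == 0)
    m1 = _mean(p for p, g in zip(y_hat, group) if g == 1)
    return abs(m0 - m1)


def mean_residual_gaps(y, y_hat, group):
    """|E[y - y_hat | group=a]| for each group value a present in the batch."""
    return {
        a: abs(_mean(t - p for t, p, g in zip(y, y_hat, group) if g == a))
        for a in sorted(set(group))
    }


def equalized_odds_gaps(y, y_hat, group):
    """|E[y_hat | group=0, y=a] - E[y_hat | group=1, y=a]| for a in {0, 1}."""
    gaps = {}
    for a in (0, 1):
        m0 = _mean(p for t, p, g in zip(y, y_hat, group) if g == 0 and t == a)
        m1 = _mean(p for t, p, g in zip(y, y_hat, group) if g == 1 and t == a)
        gaps[a] = abs(m0 - m1)
    return gaps


# Toy batch: true labels, predicted scores, and a binary sensitive attribute.
y = [1, 0, 1, 0, 1, 0]
y_hat = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1]
group = [0, 0, 0, 1, 1, 1]

print(mean_prediction_gap(y_hat, group))
print(mean_residual_gaps(y, y_hat, group))
print(equalized_odds_gaps(y, y_hat, group))
```

A criterion holds on the batch when every value it reports is at most \( \epsilon \). Each quantity inside the absolute value is an affine function of \( \hat{y} \) for fixed labels and group membership, which is the property the table's last row requires of custom metrics.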

Citation

If you use fairness_training in your research, please cite:

@inproceedings{troxell2026fairness,
  title     = {Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning},
  author    = {Troxell, David and Roemer, Noah and Mont{\'u}far, Guido},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026},
  note      = {To appear}
}

This package relies heavily on the wonderful cvxpylayers package. We encourage you to also cite their work:

@inproceedings{agrawal2019differentiable,
  title     = {Differentiable Convex Optimization Layers},
  author    = {Agrawal, Akshay and Amos, Brandon and Barratt, Shane and Boyd, Stephen and Diamond, Steven and Kolter, Zico},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {32},
  year      = {2019}
}

Next Steps

  • Getting Started


    Install the fairness_training package

  • User Guide


    Learn the core concepts, assumptions, and how to use fairness_training effectively

  • API Reference


    Complete documentation of all classes and functions

  • Examples


    End-to-end examples on real datasets