fairness_training¶
Guaranteed fairness constraints for PyTorch neural networks
What is fairness_training?¶
fairness_training lets you train any PyTorch network with hard fairness constraints. Unlike penalty-based methods that only encourage fairness, this library guarantees that predictions satisfy specified criteria on every training batch.
The approach: append a differentiable fairness layer — a convex optimization problem solved via cvxpylayers — that projects raw predictions onto the feasible set defined by your constraints. Because the projection is differentiable, gradients flow back through it into the network weights during training.
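To make the projection step concrete, here is a minimal, dependency-free sketch for a single constraint, mean prediction parity with a binary group attribute. In that special case the Euclidean projection has a closed form, so no solver is needed. The function name `project_mean_parity` and the sample data are hypothetical and not part of the fairness_training API; the library itself, as described above, solves the general problem through cvxpylayers.

```python
def project_mean_parity(z, group, eps):
    """Project scores z onto {y : |mean(y | g=0) - mean(y | g=1)| <= eps}.

    Hypothetical helper for illustration; assumes a binary group
    attribute (values 0 and 1) and that both groups are non-empty.
    """
    n0 = sum(1 for g in group if g == 0)
    n1 = len(group) - n0
    # The parity gap is the linear function a . y with these coefficients.
    a = [1.0 / n0 if g == 0 else -1.0 / n1 for g in group]
    gap = sum(ai * zi for ai, zi in zip(a, z))
    # Closest admissible gap: clip the raw gap into [-eps, eps].
    target = max(-eps, min(eps, gap))
    # Move z along a by the least amount that reaches the target gap.
    step = (gap - target) / sum(ai * ai for ai in a)
    return [zi - step * ai for ai, zi in zip(a, z)]


raw = [0.9, 0.8, 0.2, 0.1]   # raw predictions from the network
groups = [0, 0, 1, 1]        # binary protected attribute
fair = project_mean_parity(raw, groups, eps=0.1)
gap_after = (fair[0] + fair[1]) / 2 - (fair[2] + fair[3]) / 2
print(fair, gap_after)  # gap_after equals eps up to floating-point rounding
```

Predictions that already satisfy the constraint are returned unchanged, and violating predictions are moved by the smallest Euclidean distance that restores feasibility. Every operation here is piecewise linear in the inputs, which is one way to see why gradients can pass through such a layer.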
Key Features¶
- Verified Fairness: The fairness criteria are enforced as hard constraints, so predictions satisfy them by construction rather than merely being encouraged to.
- End-to-End, Constraint-Aware Learning: The fairness layer is fully differentiable, so the model learns to satisfy the constraints during training instead of relying on post-hoc corrections applied after training.
- Flexible Architecture: Works with any classification or regression architecture; just append the fairness layer to the end of your model.
- Online Inference: A novel primal-dual algorithm provides aggregate fairness guarantees over time during real-time inference, even with small batch sizes.
How It Works¶
flowchart LR
A[Input X] --> B["Neural Network f(·)"]
B --> C["Raw Predictions ẑ = f(X)"]
C --> D["Fairness Layer g(·)"]
D --> E["Fair Predictions ŷ = g(ẑ)"]
style D fill:#e1f5fe
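The fairness layer solves a convex optimization problem: it projects the raw predictions ẑ onto the set of predictions that satisfy the fairness constraints.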
This projection is differentiable via implicit differentiation through the KKT conditions, enabling standard backpropagation.
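One plausible way to write this problem, assuming a Euclidean projection and affine constraints collected as \( A y \leq b \) (the symbols \( A \) and \( b \) are notation introduced here for illustration, not taken from the library), is:

\[
\hat{y} \;=\; g(\hat{z}) \;=\; \operatorname*{arg\,min}_{y} \; \tfrac{1}{2} \lVert y - \hat{z} \rVert_2^2
\quad \text{subject to} \quad A y \leq b .
\]

Under this reading, mean prediction parity with tolerance \( \epsilon \) contributes two rows, \( a^\top y \leq \epsilon \) and \( -a^\top y \leq \epsilon \), where \( a^\top y \) is the difference between the two group means of \( y \). Because the objective is strongly convex and the constraints are affine, the solution is unique whenever the feasible set is non-empty.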
Supported Fairness Criteria¶
| Metric | Description | Use Case |
|---|---|---|
| Mean Prediction Parity | \( \lvert E[\hat{y} \mid x_j=0] - E[\hat{y} \mid x_j=1] \rvert \leq \epsilon \) | Regression or Classification Scores |
| Mean Residual Fairness | \( \lvert E[y - \hat{y} \mid x_j=a] \rvert \leq \epsilon \ \forall a \) | Regression or Classification Scores |
| Equalized Odds | \( \lvert E[\hat{y} \mid x_j=0, y=a] - E[\hat{y} \mid x_j=1, y = a] \rvert \leq \epsilon\ \forall a \in \{0,1\} \) | Binary Classification |
| Custom Metrics | Extend the FairnessMetric base class | Affine fairness constraints |
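The quantities bounded in the table can be checked empirically on a batch. The sketch below computes each gap with plain Python, replacing expectations with sample means. The function names are hypothetical and are not part of the fairness_training API, and the code assumes a binary group attribute and, for equalized odds, binary labels with every (group, label) combination present.

```python
from statistics import fmean


def mean_prediction_parity_gap(y_hat, group):
    """|mean(y_hat | g=0) - mean(y_hat | g=1)|."""
    m0 = fmean(p for p, g in zip(y_hat, group) if g == 0)
    m1 = fmean(p for p, g in zip(y_hat, group) if g == 1)
    return abs(m0 - m1)


def mean_residual_by_group(y, y_hat, group):
    """|mean(y - y_hat | g=a)| for each group value a."""
    return {
        a: abs(fmean(t - p for t, p, g in zip(y, y_hat, group) if g == a))
        for a in sorted(set(group))
    }


def equalized_odds_gaps(y, y_hat, group):
    """|mean(y_hat | g=0, y=a) - mean(y_hat | g=1, y=a)| for a in {0, 1}."""
    gaps = {}
    for a in (0, 1):
        m0 = fmean(p for t, p, g in zip(y, y_hat, group) if g == 0 and t == a)
        m1 = fmean(p for t, p, g in zip(y, y_hat, group) if g == 1 and t == a)
        gaps[a] = abs(m0 - m1)
    return gaps


labels = [1, 0, 1, 0]
scores = [0.8, 0.4, 0.6, 0.2]
groups = [0, 0, 1, 1]
parity = mean_prediction_parity_gap(scores, groups)
residuals = mean_residual_by_group(labels, scores, groups)
odds = equalized_odds_gaps(labels, scores, groups)
print(parity, residuals, odds)
```

A criterion holds on the batch when the corresponding gap is at most the chosen tolerance \( \epsilon \). Each gap is the absolute value of a linear function of the predictions, which is what makes the constraints affine.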
Citation¶
If you use fairness_training in your research, please cite:
@inproceedings{troxell2026fairness,
title = {Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning},
author = {Troxell, David and Roemer, Noah and Mont{\'u}far, Guido},
booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
year = {2026},
note = {To appear}
}
@inproceedings{agrawal2019differentiable,
title = {Differentiable Convex Optimization Layers},
author = {Agrawal, Akshay and Amos, Brandon and Barratt, Shane and Boyd, Stephen and Diamond, Steven and Kolter, Zico},
booktitle = {Advances in Neural Information Processing Systems},
volume = {32},
year = {2019}
}
Next Steps¶
- Install the fairness_training package
- Learn the core concepts, assumptions, and how to use fairness_training effectively
- Complete documentation of all classes and functions
- End-to-end examples on real datasets