Inference¶

The fairness_training package provides two inference regimes to handle different deployment scenarios. This guide explains when to use each and how they work.

The Two Regimes¶

Regime	Batch Size	Guarantee	Use Case
Large-Batch	≥ b_tau	Per-batch fairness	Batch processing, offline inference
Small-Batch	< b_tau	Aggregate fairness over time	Real-time, streaming

Use Large-Batch When:

You're doing batch processing (not real-time)
You can control batch composition
You need per-batch and aggregate fairness guarantees

Use Small-Batch When:

You're doing real-time or streaming inference
You can't control when requests arrive
Fairness over time (i.e., after a large number of inference inputs has been seen) is sufficient
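The choice between the two regimes comes down to a single comparison against b_tau. The sketch below only illustrates that rule; `select_regime` is a hypothetical helper written for this guide, not a function of the fairness_training package.

```python
def select_regime(batch_size: int, b_tau: int = 64) -> str:
    """Return the regime that applies to a batch of the given size.

    Hypothetical helper: it mirrors the table above (>= b_tau is
    large-batch, < b_tau is small-batch) and is not part of the package.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return "large-batch" if batch_size >= b_tau else "small-batch"


# A batch exactly at the threshold already counts as large-batch.
regime_at_threshold = select_regime(64)
regime_below = select_regime(63)
```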

Large-Batch Inference¶

When your inference batch size is ≥ b_tau (default 64), fairness_training enforces hard per-batch constraints.

Requirements

Batch size ≥ b_tau
Both groups must be present in each batch, and batch composition (i.e., group membership ratios) must be the same in every batch

Critical Requirement

Similar to training, the fairness_training package assumes you can use stratified sampling in your inference dataset when operating in the large-batch regime. This leads to exact fairness guarantees.
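One way to obtain batches with identical group ratios is to shuffle each group separately and deal a fixed quota from each group into every batch. This is a minimal sketch in plain Python; `stratified_batches` is a hypothetical helper, and how the package itself expects stratified batches to be supplied is not specified here.

```python
import random


def stratified_batches(groups, batch_size, seed=0):
    """Yield index lists whose per-group counts are identical in every batch.

    Hypothetical helper, not part of fairness_training. `groups` holds one
    group label per sample. Quotas are rounded, so the realized batch size
    can differ slightly from `batch_size`, and leftover samples that do not
    fill a complete batch are dropped.
    """
    rng = random.Random(seed)
    by_group = {}
    for idx, label in enumerate(groups):
        by_group.setdefault(label, []).append(idx)
    for members in by_group.values():
        rng.shuffle(members)

    total = len(groups)
    # Per-batch quota for each group, proportional to its share of the data.
    quota = {g: max(1, round(batch_size * len(m) / total))
             for g, m in by_group.items()}
    n_batches = min(len(m) // quota[g] for g, m in by_group.items())

    for b in range(n_batches):
        batch = []
        for g, members in by_group.items():
            batch.extend(members[b * quota[g]:(b + 1) * quota[g]])
        yield batch


# 60 samples from group 0 and 40 from group 1, dealt into batches of 10.
groups = [0] * 60 + [1] * 40
batches = list(stratified_batches(groups, batch_size=10))
```

With these inputs every batch holds six samples from group 0 and four from group 1, so the group ratio never varies between batches.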

Small-Batch Inference (Primal-Dual Algorithm)¶

For real-time inference where you can't control batch sizes, fairness_training uses an online primal-dual algorithm with the following guarantee:

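Key insight: individual batches may violate the constraint, but the sample-weighted average violation is at most ε in the limit as the number of batches T → ∞. Here n_t is the size of batch t, Δ_t is the fairness gap measured on that batch, and N_T = n_1 + … + n_T is the total number of samples seen:

\[\bar{\Delta}_T = \frac{1}{N_T} \sum_{t=1}^{T} n_t \cdot \Delta_t, \qquad \limsup_{T \to \infty} \bar{\Delta}_T \;\leq\; \varepsilon\]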

This is the quantity reported as weighted_avg_fairness_gap in trainer.evaluate() output. It is not the same as the pooled gap (all predictions concatenated) — see Core Concepts for details. There is no finite-T guarantee: individual batches and short sequences may still show gaps above ε.
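The difference between the sample-weighted average gap and the pooled gap is easiest to see on a toy case. The sketch below assumes, purely for illustration, that the per-batch gap is the absolute difference in mean prediction between two groups labeled 0 and 1; the package's actual gap definition is covered in Core Concepts and may differ. All function names are hypothetical.

```python
def gap(preds, groups):
    """Assumed gap: |mean prediction of group 0 - mean prediction of group 1|."""
    g0 = [p for p, g in zip(preds, groups) if g == 0]
    g1 = [p for p, g in zip(preds, groups) if g == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))


def weighted_avg_gap(batches):
    """Sample-weighted average of per-batch gaps: (1/N_T) * sum(n_t * gap_t)."""
    total = sum(len(preds) for preds, _ in batches)
    return sum(len(preds) * gap(preds, grp) for preds, grp in batches) / total


def pooled_gap(batches):
    """Gap computed once on all predictions concatenated."""
    preds = [p for ps, _ in batches for p in ps]
    grp = [g for _, gs in batches for g in gs]
    return gap(preds, grp)


# Two batches that each favor a different group by the same amount.
batches = [
    ([1, 1, 0, 0], [0, 0, 1, 1]),  # group 0 favored, per-batch gap = 1
    ([0, 0, 1, 1], [0, 0, 1, 1]),  # group 1 favored, per-batch gap = 1
]
weighted = weighted_avg_gap(batches)
pooled = pooled_gap(batches)
```

In this example the opposite per-batch disparities cancel when pooled, while the weighted average still records them, which is why the two numbers should not be compared directly.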

Parameters¶

Parameter	Default	Description
`eta_0`	0.5	Initial dual step size. Increasing it makes enforcement stricter
`b_tau`	64	Threshold between large-batch (hard constraints) and small-batch (primal-dual) regimes
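To see why a larger eta_0 leads to stricter enforcement, it can help to look at a generic projected dual-ascent step. The update rule and the eta_0 / sqrt(t) step-size schedule below are textbook assumptions chosen for illustration; this guide does not specify the package's actual update, and `dual_step` is a hypothetical function.

```python
import math


def dual_step(lambda_dual, gap, epsilon, eta_0=0.5, t=1):
    """One projected dual-ascent step (generic sketch, not the package's rule).

    The multiplier grows when the observed gap exceeds epsilon, shrinks when
    it is below epsilon, and is clipped at zero. The decaying step size
    eta_0 / sqrt(t) is an assumed schedule.
    """
    eta_t = eta_0 / math.sqrt(t)
    return max(0.0, lambda_dual + eta_t * (gap - epsilon))


# Run a few steps on made-up gaps with tolerance epsilon = 0.1.
lam = 0.0
history = []
for t, observed_gap in enumerate([0.30, 0.30, 0.02], start=1):
    lam = dual_step(lam, observed_gap, epsilon=0.1, eta_0=0.5, t=t)
    history.append(lam)
```

Under these assumptions the multiplier rises during the two violating batches and falls once the gap drops below the tolerance; doubling eta_0 doubles the size of each move.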

State Management¶

Always reset before a new inference sequence:

model.reset_inference_state()

This resets:

lambda_dual → 0
dual_update_count → 0
cumulative_samples → 0
cumulative_weighted_violation → 0
lambda_max → 0

When to Reset:

Before evaluating on a new test set
At the start of each day/session for a production system
When you want to measure fairness over a specific time window

When NOT to Reset:

During a continuous inference session where you want aggregate guarantees
In the middle of processing a batch sequence
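The reset rules above amount to "reset once per measurement window, never inside it". The sketch below shows that pattern with a stub; `StubModel` and `run_window` are hypothetical and only imitate the `reset_inference_state()` call, the `inference=True` call style, and the `cumulative_samples` attribute mentioned in this guide.

```python
class StubModel:
    """Hypothetical stand-in that mimics only the calls named in this guide."""

    def __init__(self):
        self.cumulative_samples = 0

    def reset_inference_state(self):
        self.cumulative_samples = 0

    def __call__(self, X, y=None, inference=False):
        # Track how many samples have been seen since the last reset.
        self.cumulative_samples += len(X)
        return [0.0 for _ in X]


def run_window(model, batches):
    """Reset once, then process every batch in the window without resetting."""
    model.reset_inference_state()
    predictions = []
    for X in batches:
        predictions.extend(model(X, inference=True))
    return predictions, model.cumulative_samples


model = StubModel()
_, seen_first = run_window(model, [[1, 2, 3], [4, 5]])  # first test set
_, seen_second = run_window(model, [[6]])               # second test set
```

Because the reset happens at the start of each window, the second window's statistics are not contaminated by samples from the first.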

Debugging Inference Issues¶

Common Issues

Fairness gap exceeds tolerance in aggregate stats for small-batch regime:

You likely need more samples for convergence
Check whether group proportions are skewed
Try increasing eta_0

Fairness gap exceeds tolerance in aggregate stats for large-batch regime:

Check whether stratified sampling is used
Try drastically increasing the width of the prediction bounds
Optimization solvers often carry slight numerical error; try slightly decreasing epsilon to leave a margin for it

Solver failures during inference:

Ensure both groups are present in the batch
Check for NaN/Inf values in the inputs
Relax fairness_tolerance if it is too tight
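The first two checks can be automated before a batch reaches the solver. This is a minimal sketch in plain Python, assuming inputs arrive as rows of floats and groups as a list of labels 0 and 1; `check_batch` is a hypothetical helper, not part of the package.

```python
import math


def check_batch(inputs, groups, expected_groups=(0, 1)):
    """Return a list of human-readable problems found in one batch.

    Hypothetical pre-flight check: verifies that every expected group label
    appears and that all input values are finite (no NaN or Inf).
    """
    problems = []
    missing = set(expected_groups) - set(groups)
    if missing:
        problems.append(f"missing groups: {sorted(missing)}")
    if any(not math.isfinite(value) for row in inputs for value in row):
        problems.append("non-finite value in inputs")
    return problems


# A batch that contains only group 0 and a NaN feature fails both checks.
issues = check_batch([[1.0, float("nan")]], [0])
```

An empty list means the batch passed both checks; anything else can be logged or used to skip the batch before calling the model.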

Logging for Production¶

import logging

logger = logging.getLogger('fairness_training')

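def inference_with_logging(model, X, y=None):
    """Run one inference batch and log the primal-dual state afterwards."""
    predictions = model(X, y=y, inference=True)

    # Log batch-level metrics; lazy %-style arguments skip formatting
    # when the INFO level is disabled
    logger.info("Batch size: %d", len(X))
    logger.info("Lambda: %.4f", model.lambda_dual)
    logger.info("Cumulative samples: %s", model.cumulative_samples)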

    return predictions

Next Steps¶

Examples: Small-Batch Inference: Complete example
API Reference: FairModel: Full API documentation