
Inference

The fairness_training package provides two inference regimes to handle different deployment scenarios. This guide explains when to use each and how they work.


The Two Regimes

Regime       | Batch Size | Guarantee                    | Use Case
Large-Batch  | ≥ b_tau    | Per-batch fairness           | Batch processing, offline inference
Small-Batch  | < b_tau    | Aggregate fairness over time | Real-time, streaming

Use Large-Batch When:

  • You're doing batch processing (not real-time)
  • You can control batch composition
  • You need per-batch and aggregate fairness guarantees

Use Small-Batch When:

  • You're doing real-time or streaming inference
  • You can't control when requests arrive
  • Fairness over time (i.e. after a large number of inference inputs have been seen) is sufficient
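
The choice between the two regimes comes down to comparing the batch size against b_tau. A minimal sketch of that rule, assuming the default threshold of 64. Note that select_regime is a hypothetical helper written for illustration and is not part of the fairness_training API:

```python
def select_regime(batch_size: int, b_tau: int = 64) -> str:
    """Return which inference regime a batch of this size falls into.

    Hypothetical helper: mirrors the rule in the table above
    (batch size >= b_tau -> large-batch, otherwise small-batch).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return "large-batch" if batch_size >= b_tau else "small-batch"


if __name__ == "__main__":
    for size in (1, 63, 64, 512):
        print(size, select_regime(size))
```

A batch of exactly b_tau samples counts as large-batch, matching the "≥ b_tau" condition.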

Large-Batch Inference

When your inference batch size is ≥ b_tau (default 64), fairness_training enforces hard per-batch constraints.

Requirements

  • Batch size ≥ b_tau
  • Both groups must be present in each batch, and the batch composition (i.e. the ratio of group memberships) must be the same in every batch

Critical Requirement

Similar to training, the fairness_training package assumes you can use stratified sampling in your inference dataset when operating in the large-batch regime. This leads to exact fairness guarantees.
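
One way to build such batches is sketched below, assuming "stratified" means that every batch reproduces the dataset's group proportions and that there are exactly two groups. The function stratified_batches is a hypothetical helper using only the standard library; it is not taken from fairness_training, and the package may provide its own sampler.

```python
import random
from typing import Iterator, List, Sequence


def stratified_batches(
    groups: Sequence[int], batch_size: int, seed: int = 0
) -> Iterator[List[int]]:
    """Yield lists of sample indices with a fixed per-group quota per batch.

    Hypothetical helper (assumption, not the package API). Samples that
    cannot fill a complete batch are dropped.
    """
    rng = random.Random(seed)
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch expects exactly two groups")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")

    first = [i for i, g in enumerate(groups) if g == labels[0]]
    second = [i for i, g in enumerate(groups) if g == labels[1]]
    rng.shuffle(first)
    rng.shuffle(second)

    # Per-batch quota for each group, proportional to its share of the
    # dataset and clamped so that both groups appear in every batch.
    q_first = round(batch_size * len(first) / len(groups))
    q_first = min(max(q_first, 1), batch_size - 1)
    q_second = batch_size - q_first

    n_batches = min(len(first) // q_first, len(second) // q_second)
    for b in range(n_batches):
        yield (
            first[b * q_first:(b + 1) * q_first]
            + second[b * q_second:(b + 1) * q_second]
        )
```

Every yielded batch has exactly batch_size indices, so choosing batch_size ≥ b_tau keeps the whole sequence in the large-batch regime.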


Small-Batch Inference (Primal-Dual Algorithm)

For real-time inference where you can't control batch sizes, fairness_training uses an online primal-dual algorithm with the following guarantee:

Key insight: Individual batches may violate constraints, but the sample-weighted average violation converges to at most ε as the number of batches T → ∞:

\[\bar{\Delta}_T = \frac{1}{N_T} \sum_{t=1}^{T} n_t \cdot \Delta_t \;\leq\; \varepsilon\]

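Here n_t is the number of samples in batch t, Δ_t is the fairness gap measured on that batch, and N_T = n_1 + ... + n_T is the total number of samples seen so far. This is the quantity reported as weighted_avg_fairness_gap in trainer.evaluate() output. It is not the same as the pooled gap (computed on all predictions concatenated); see Core Concepts for details. There is no finite-T guarantee: individual batches and short sequences may still show gaps above ε.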
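
To make the formula concrete, the sketch below computes the sample-weighted average from a list of (batch size, per-batch gap) pairs. The function name and the numbers are made up for illustration; they are not output from the package.

```python
from typing import Iterable, Tuple


def weighted_avg_gap(batches: Iterable[Tuple[int, float]]) -> float:
    """Sample-weighted average of per-batch gaps: sum(n_t * gap_t) / sum(n_t)."""
    total_samples = 0
    weighted_sum = 0.0
    for n_t, gap_t in batches:
        total_samples += n_t
        weighted_sum += n_t * gap_t
    if total_samples == 0:
        raise ValueError("no samples seen")
    return weighted_sum / total_samples


# Illustrative (made-up) batch sizes and per-batch gaps.
history = [(8, 0.10), (4, 0.30), (4, 0.02)]
print(f"{weighted_avg_gap(history):.3f}")  # prints 0.130
```

With a tolerance of, say, ε = 0.15, the second batch on its own violates the constraint (0.30), while the weighted average (0.13) does not. This is the situation the guarantee allows.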

Parameters

Parameter | Default | Description
eta_0     | 0.5     | Initial dual step size. Increasing it enforces the constraint more strictly
b_tau     | 64      | Threshold between large-batch (hard constraints) and small-batch (primal-dual) regimes
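
This page does not spell out the dual update itself. As a rough illustration of why a larger eta_0 tightens enforcement, the following is a generic projected dual-ascent step of the kind commonly used in online primal-dual methods. The decaying step size eta_0 / sqrt(t), the function names, and the update form are all assumptions made for this sketch, not the package's documented behavior.

```python
import math
from typing import Iterable


def dual_step(lambda_dual: float, gap: float, epsilon: float,
              eta_0: float, t: int) -> float:
    """One projected dual-ascent step (generic textbook form, assumed here).

    The multiplier grows when the observed gap exceeds epsilon and shrinks
    otherwise; it is clipped at zero so it stays non-negative.
    """
    eta_t = eta_0 / math.sqrt(t)  # assumed decaying schedule, t starts at 1
    return max(0.0, lambda_dual + eta_t * (gap - epsilon))


def run(gaps: Iterable[float], epsilon: float, eta_0: float) -> float:
    """Apply dual_step over a sequence of per-batch gaps, starting from 0."""
    lam = 0.0
    for t, gap in enumerate(gaps, start=1):
        lam = dual_step(lam, gap, epsilon, eta_0, t)
    return lam
```

Under these assumptions, the same sequence of violations produces a larger multiplier when eta_0 is larger, so the penalty on unfair predictions ramps up faster.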

State Management

Always reset before a new inference sequence:

model.reset_inference_state()

This resets:

  • lambda_dual → 0
  • dual_update_count → 0
  • cumulative_samples → 0
  • cumulative_weighted_violation → 0
  • lambda_max → 0

When to Reset:

  • Before evaluating on a new test set
  • At the start of each day/session for a production system
  • When you want to measure fairness over a specific time window

When NOT to Reset:

  • During a continuous inference session where you want aggregate guarantees
  • In the middle of processing a batch sequence
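
The reset pattern can be sketched as follows: reset once at the start of each measurement window, then leave the state alone until the window ends. StubModel is a stand-in that only mimics the state fields listed above so the example runs on its own; with the real model you would call reset_inference_state() in the same places.

```python
class StubModel:
    """Stand-in exposing the state fields listed above (not the real model)."""

    def __init__(self):
        self.reset_inference_state()

    def reset_inference_state(self):
        self.lambda_dual = 0.0
        self.dual_update_count = 0
        self.cumulative_samples = 0
        self.cumulative_weighted_violation = 0.0
        self.lambda_max = 0.0

    def __call__(self, X, inference=True):
        # Only tracks sample counts; a real model would also update the duals.
        self.cumulative_samples += len(X)
        return [0.0 for _ in X]


def evaluate_windows(model, windows):
    """Run each window as its own inference sequence, resetting in between."""
    samples_per_window = []
    for batches in windows:
        model.reset_inference_state()      # start of a new sequence
        for X in batches:                  # no reset inside the sequence
            model(X, inference=True)
        samples_per_window.append(model.cumulative_samples)
    return samples_per_window
```

Because the state is cleared only between windows, the cumulative statistics at the end of each window describe that window alone.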

Debugging Inference Issues

Common Issues

Fairness gap exceeds tolerance in aggregate stats for small-batch regime:

  • You likely need more samples for convergence, since the guarantee holds only as the number of batches grows
  • Check whether group proportions are skewed
  • Try increasing eta_0

Fairness gap exceeds tolerance in aggregate stats for large-batch regime:

  • Check whether stratified sampling is used
  • Try drastically increasing the width of the prediction bounds
  • Optimization solvers often operate with slight numerical error. Try slightly decreasing epsilon to account for this.

Solver failures during inference:

  • Ensure both groups are present in the batch
  • Check for NaN/Inf in inputs
  • Relax fairness_tolerance if it is too tight
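
The first two checks are cheap to run before every call. A minimal sketch using only the standard library, assuming inputs are given as rows of numbers and group membership as a parallel sequence of labels. check_batch is a hypothetical helper, not a fairness_training function:

```python
import math
from typing import List, Sequence


def check_batch(X: Sequence[Sequence[float]], groups: Sequence[int]) -> List[str]:
    """Return a list of problems found in a batch (empty list means OK)."""
    problems = []
    if len(X) != len(groups):
        problems.append("X and groups have different lengths")
    if len(set(groups)) < 2:
        problems.append("batch does not contain both groups")
    if any(not math.isfinite(v) for row in X for v in row):
        problems.append("inputs contain NaN or Inf")
    return problems
```

Logging the returned messages alongside the solver error usually makes it clear whether the failure came from the data or from the tolerance setting.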

Logging for Production

import logging

logger = logging.getLogger('fairness_training')


def inference_with_logging(model, X, y=None):
    """Run one inference batch and log the primal-dual state afterwards."""
    predictions = model(X, y=y, inference=True)

    # Log batch-level metrics. Lazy %-style arguments avoid building the
    # message when INFO logging is disabled.
    logger.info("Batch size: %d", len(X))
    logger.info("Lambda: %.4f", model.lambda_dual)
    logger.info("Cumulative samples: %s", model.cumulative_samples)

    return predictions

Next Steps