Why must a weighted DVIR scoring model be deterministic?

The same defect must always resolve to the same score so a compliance action is reproducible and audit-defensible under 49 CFR § 396.11(a)(3). Externalize weights into a version-tagged manifest, use Decimal half-up rounding to avoid floating-point drift at band boundaries, and persist the weight version and factor vector on every scored record so an auditor can reconstruct any historical score exactly.

How do you stop a critical defect from scoring below the out-of-service threshold?

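Apply two floors: any defect on a CVSA out-of-service component (brakes, steering, coupling) floors the safety factor at 0.90, and a driver-attested critical flag hard-clamps the final score to at least 70. The driver flag is treated as a floor, never a ceiling, so a brake defect mislabeled minor is still pulled upward by the component-based safety factor. Whether the safety floor alone reaches the critical band depends on the safety weight: 0.90 × weight × 100 must be at least 70, so with a 0.5 safety weight the floor contributes 45 points and the defect lands in the major band unless a component-based clamp is also applied.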

How do you keep vectorized batch scoring consistent with per-record scoring?

Keep the scalar function as the source of truth and pin the NumPy path to it with a parity test that scores the same rows both ways and asserts byte-identical results. Deduplicate by vehicle_vin and component_code before scoring so re-synced offline submissions do not double-count recurrence, and reapply the critical floor after clamping in both paths.

Building a Weighted Defect Scoring Model in Python

A normalized Driver Vehicle Inspection Report (DVIR) defect record answers what is wrong; it does not answer how urgently the vehicle must come off the road, and that gap is exactly where carriers get burned. 49 CFR § 396.11(a)(3) obliges the carrier to correct — and certify corrected — every defect that would affect safe operation before the unit is dispatched again, but the regulation gives no arithmetic for ranking a slack adjuster against a cracked mirror. If your model produces a different number for the same defect on two different runs, or if it lets a brake defect resolve below the out-of-service (OOS) threshold, the failure does not surface at ingestion — it surfaces months later as a Vehicle Maintenance BASIC violation at a roadside inspection, with no defensible audit trail behind the score. This page answers one focused question: how to build a deterministic, version-controlled weighted scoring function in Python that turns a validated defect into a reproducible 0–100 integer and maps it to the same severity bands the rest of the pipeline enforces. It is the concrete implementation of the algorithm specified by the parent Severity Scoring Algorithms for DVIR Defects reference, and its output is consumed directly by the Critical vs Non-Critical Routing Logic engine.

Prerequisites

This scoring function is a pure, side-effect-free consumer: it receives an already-validated defect and returns a score plus its factor breakdown. It does not fetch records and it does not route them. Before implementing it you need Python 3.10+ (band_for uses a match statement) and:

pydantic>=2.6 — immutable input models and strict type coercion at the boundary.
numpy>=1.26 — vectorized factor arithmetic for fleet-scale batches.
pyyaml>=6.0 — version-controlled weight matrices loaded from configuration, never hard-coded.
decimal (stdlib) — fixed-point rounding so the same inputs never straddle a band boundary because of floating-point drift.
pandas (any release compatible with your NumPy version): DataFrame input and deduplication for the batch path in Step 5.
pytest (test-only dependency): runs the suite in Verification and Testing.

Two upstream contracts feed this page. Every inbound defect must already conform to the canonical record shape defined by the Standardized DVIR JSON Schema Design — a payload that fails that contract must be rejected before it reaches this layer, never defaulted to a low score. The component_code the safety factor keys off is the controlled vocabulary produced by Defect Code Standardization Across Fleets, so a code that resolves to a brake or steering group carries its OOS weight identically everywhere in the pipeline.

The scoring model emits exactly one integer per defect on the canonical scale, and the band cutoffs are held identical to every other page that references them:

Severity band	Score range	Routing obligation
Minor	`0–34`	Log to preventative-maintenance queue for trend analysis
Major	`35–69`	Schedule a regulated repair window before next dispatch
Critical	`70–100`	Trigger an immediate OOS hold under § 396.11(a)(3)

Step-by-Step Implementation

Step 1 — Load and validate the weight matrix

Hard-coded thresholds destroy the audit trail: when an auditor asks why this defect scored 72, you must be able to name the exact weight version that produced it. Externalize the four policy weights into a version-tagged YAML manifest and validate it with Pydantic so a malformed matrix fails loudly at load, not silently at scoring time. The four weights must sum to 1.0 — a matrix that does not is a configuration error, not a rounding curiosity.

import yaml
from pydantic import BaseModel, Field, model_validator


class WeightMatrix(BaseModel):
    """Version-controlled policy weights; the four must sum to 1.0."""
    version: str = Field(min_length=1)          # audit key, e.g. "2026-Q3-heavy"
    safety_impact: float = Field(ge=0.0, le=1.0)
    regulatory_risk: float = Field(ge=0.0, le=1.0)
    operational_downtime: float = Field(ge=0.0, le=1.0)
    historical_recurrence: float = Field(ge=0.0, le=1.0)

    @model_validator(mode="after")
    def weights_sum_to_one(self) -> "WeightMatrix":
        total = (
            self.safety_impact + self.regulatory_risk
            + self.operational_downtime + self.historical_recurrence
        )
        # A drifted matrix silently rescales every score — reject it.
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"weights sum to {total}, must equal 1.0")
        return self


def load_weights(path: str) -> WeightMatrix:
    """Load + validate the manifest; raise before any defect is scored."""
    with open(path, "r", encoding="utf-8") as fh:
        raw = yaml.safe_load(fh)
    return WeightMatrix.model_validate(raw)  # ValidationError propagates by design
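To see why an unbalanced matrix is a configuration error rather than a rounding curiosity, consider the maximum attainable score: with every factor at 1.0, the composite equals the sum of the weights times 100. The sketch below is illustrative only. `max_attainable_score` is a hypothetical helper that is not part of the pipeline, and the weight values are made up for the example.

```python
from decimal import Decimal


def max_attainable_score(weights: dict[str, str]) -> Decimal:
    """Composite score when every factor is 1.0: sum of the weights x 100."""
    return sum((Decimal(w) for w in weights.values()), Decimal("0")) * 100


balanced = {"safety_impact": "0.50", "regulatory_risk": "0.20",
            "operational_downtime": "0.15", "historical_recurrence": "0.15"}
# Hypothetical edit: safety weight lowered without rebalancing the others.
drifted = {**balanced, "safety_impact": "0.40"}

print(max_attainable_score(balanced))  # 100.00
print(max_attainable_score(drifted))   # 90.00
```

With the drifted matrix the scale tops out at 90, so every band cutoff effectively moves even though no threshold was edited.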

Step 2 — Define the immutable defect input

The scoring function must never mutate its input, and it must reject anything it cannot type. Model the defect as a frozen Pydantic model whose fields match the canonical schema. Note that driver_severity_flag is treated as a floor, never a ceiling: a driver may under-report, but the § 396.11(a)(3) attestation means a driver-flagged critical defect can never be scored down below the critical band.

from datetime import datetime
from enum import Enum
from pydantic import BaseModel, Field


class DriverFlag(str, Enum):
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


class DefectInput(BaseModel):
    model_config = {"frozen": True}  # pure function contract: no mutation

    dvir_id: str = Field(min_length=1)
    vehicle_vin: str = Field(min_length=17, max_length=17)
    component_code: str = Field(min_length=1)   # SAE J1939 SPN / OEM fault tree
    driver_severity_flag: DriverFlag
    timestamp_utc: datetime
    odometer_reading: int = Field(ge=0)
    recurrence_count: int = Field(ge=0)          # prior occurrences, this VIN+component
    is_oos_component: bool                        # brake/steering/coupling group, etc.

Step 3 — Extract the four normalized factors

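Each factor is a pure mapping into the closed interval 0.0–1.0. Keeping the factors normalized is what lets the policy weights be interpreted as percentages and keeps the model auditable: every factor is independently explainable. The safety factor is anchored to the CVSA out-of-service criteria: any defect on an OOS-eligible component (brakes, steering, coupling) floors the safety factor at 0.90 regardless of the driver flag. That floor guarantees the critical band on its own only when the safety weight is at least 0.70 / 0.90 ≈ 0.78; with a lower safety weight, an OOS component flagged minor can still land in the major band, so check the deployed matrix against this bound or add a component-based clamp next to the driver-flag clamp in Step 4.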

from decimal import Decimal, ROUND_HALF_UP

# CVSA OOS components must dominate the safety factor regardless of other inputs.
OOS_SAFETY_FLOOR = 0.90
FLAG_FLOOR = {DriverFlag.MINOR: 0.0, DriverFlag.MAJOR: 0.35, DriverFlag.CRITICAL: 0.70}


def safety_impact(defect: DefectInput) -> float:
    """At least 0.90 for an OOS-eligible component; otherwise the driver flag floor."""
    if defect.is_oos_component:
        return max(OOS_SAFETY_FLOOR, FLAG_FLOOR[defect.driver_severity_flag])
    return FLAG_FLOOR[defect.driver_severity_flag]


def regulatory_risk(defect: DefectInput) -> float:
    """Weight the § 396.11(a)(3) driver attestation into a 0.0-1.0 risk."""
    return FLAG_FLOOR[defect.driver_severity_flag]


def operational_downtime(defect: DefectInput) -> float:
    """Higher-mileage units carry more downtime exposure; capped at 1.0."""
    return min(defect.odometer_reading / 750_000, 1.0)


def historical_recurrence(defect: DefectInput) -> float:
    """A repeat defect on the same component escalates; saturates at 3 priors."""
    return min(defect.recurrence_count / 3, 1.0)
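How far the 0.90 floor carries the composite depends on the safety weight, so it is worth checking against the matrix you deploy. The following is a minimal sketch, assuming the 0.5 / 0.2 / 0.15 / 0.15 weights used in the tests on this page and a brake defect the driver flagged minor, with zero mileage and no history. `composite` is a hypothetical stand-alone helper that works on Decimal values to keep the arithmetic exact.

```python
from decimal import Decimal, ROUND_HALF_UP

WEIGHTS = {"safety": Decimal("0.50"), "regulatory": Decimal("0.20"),
           "downtime": Decimal("0.15"), "recurrence": Decimal("0.15")}


def composite(factors: dict[str, Decimal]) -> int:
    """Weighted sum of the four factors, scaled to 0-100, rounded half-up."""
    total = sum((factors[k] * WEIGHTS[k] for k in WEIGHTS), Decimal("0"))
    return int((total * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# OOS component, driver flag "minor": safety is floored at 0.90,
# regulatory risk follows the minor flag (0.0), no mileage, no history.
mislabeled_brake = {"safety": Decimal("0.90"), "regulatory": Decimal("0.0"),
                    "downtime": Decimal("0.0"), "recurrence": Decimal("0.0")}
print(composite(mislabeled_brake))  # 45, which falls in the major band

# Safety weight needed for the 0.90 floor alone to reach a score of 70.
min_safety_weight = Decimal("0.70") / Decimal("0.90")
print(round(min_safety_weight, 3))  # 0.778
```

With these weights the floor alone contributes 45 points, so holding a mislabeled OOS component in the critical band takes either a safety weight of roughly 0.78 or higher, or an explicit component-based clamp alongside the driver-flag clamp.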

Step 4 — Compute the deterministic composite score

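The composite is a linear combination of the four factors and the weight matrix, scaled to 0–100. Use Decimal with explicit half-up rounding before the integer cast so a score of 69.5 always rounds to 70. Built-in round() rounds ties to the nearest even integer (round(68.5) is 68, round(69.5) is 70), so the direction of a tie would depend on which boundary it sits on. Note that the weighted sum itself is still computed in binary floating point, so a product that is mathematically x.5 can arrive a hair below it; if exact ties matter for your weights, carry the factors and weights as Decimal end to end. Clamp to [0, 100] and apply the driver flag as a hard floor so a critical attestation can never resolve below 70.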

def score_defect(defect: DefectInput, weights: WeightMatrix) -> dict:
    """Pure function: (defect, weights) -> {score, band, factor_vector, weight_version}."""
    factors = {
        "safety_impact": safety_impact(defect),
        "regulatory_risk": regulatory_risk(defect),
        "operational_downtime": operational_downtime(defect),
        "historical_recurrence": historical_recurrence(defect),
    }
    weighted = (
        factors["safety_impact"] * weights.safety_impact
        + factors["regulatory_risk"] * weights.regulatory_risk
        + factors["operational_downtime"] * weights.operational_downtime
        + factors["historical_recurrence"] * weights.historical_recurrence
    )
    # Deterministic half-up rounding, then clamp to the canonical 0-100 scale.
    raw = Decimal(str(weighted * 100)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    score = int(max(0, min(100, raw)))

    # § 396.11(a)(3): a driver-attested critical defect can never score below OOS.
    if defect.driver_severity_flag is DriverFlag.CRITICAL:
        score = max(score, 70)

    return {
        "score": score,
        "band": band_for(score),
        "factor_vector": {k: round(v, 4) for k, v in factors.items()},
        "weight_version": weights.version,  # persisted for audit reconstruction
    }


def band_for(score: int) -> str:
    """Map a 0-100 score to the canonical band the routing engine expects."""
    match score:
        case s if s >= 70:
            return "critical"   # 70-100 -> immediate OOS hold, § 396.11(a)(3)
        case s if s >= 35:
            return "major"      # 35-69  -> regulated repair window
        case _:
            return "minor"      # 0-34   -> preventative-maintenance queue
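The rounding-mode difference is easy to confirm with the standard library. The sketch below contrasts built-in `round()` (ties to even) with `ROUND_HALF_UP`, and shows one way to keep the whole composite in `Decimal` so that no binary-float product sits just under a boundary. `half_up` and `composite_decimal` are hypothetical names, and the factor and weight values are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def half_up(value: Decimal) -> int:
    """Round to the nearest integer; ties go away from zero."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Built-in round() sends ties to the nearest even integer.
print(round(68.5), round(69.5))                             # 68 70
print(half_up(Decimal("68.5")), half_up(Decimal("69.5")))   # 69 70


def composite_decimal(factors: list[str], weights: list[str]) -> int:
    """Weighted sum computed entirely in Decimal, scaled to 0-100."""
    total = sum((Decimal(f) * Decimal(w) for f, w in zip(factors, weights)),
                Decimal("0"))
    return half_up(total * 100)


# 0.90*0.50 + 0.70*0.20 + 0.70*0.15 + 0.00*0.15 = 0.695 exactly -> 69.5 -> 70
print(composite_decimal(["0.90", "0.70", "0.70", "0.00"],
                        ["0.50", "0.20", "0.15", "0.15"]))  # 70
```

Passing factors and weights as strings is a deliberate choice in this sketch: constructing a Decimal from a float would import the float's binary representation error along with it.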

Step 5 — Vectorize for fleet-scale batches

The pure function above is correct but per-row; at fleet scale you score millions of defects per audit window. The band logic and OOS floor must produce byte-identical results whether run one row at a time or vectorized, so keep the scalar function as the source of truth and pin the two paths together in tests (Step 6). Before scoring a batch, deduplicate: a single inspection can carry the same defect twice from driver re-submission or an offline sync conflict, and scoring both double-counts recurrence. Group by vehicle_vin and component_code and keep the highest-severity instance.

import numpy as np
import pandas as pd


def score_batch(df: pd.DataFrame, weights: WeightMatrix) -> pd.DataFrame:
    """Vectorized scoring; expects the same fields as DefectInput. Dedupes first."""
    rank = {"minor": 1, "major": 2, "critical": 3}
    df = df.copy()
    df["_rank"] = df["driver_severity_flag"].map(rank)
    df = (
        df.sort_values("_rank", kind="stable")  # stable sort keeps ties in input order
          .drop_duplicates(subset=["vehicle_vin", "component_code"], keep="last")
          .drop(columns="_rank")
    )

    flag_floor = df["driver_severity_flag"].map(FLAG_FLOOR).to_numpy()
    safety = np.where(df["is_oos_component"].to_numpy(),
                      np.maximum(OOS_SAFETY_FLOOR, flag_floor), flag_floor)
    downtime = np.minimum(df["odometer_reading"].to_numpy() / 750_000, 1.0)
    recurrence = np.minimum(df["recurrence_count"].to_numpy() / 3, 1.0)

    weighted = (
        safety * weights.safety_impact
        + flag_floor * weights.regulatory_risk
        + downtime * weights.operational_downtime
        + recurrence * weights.historical_recurrence
    )
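    # Half-up rounding to match ROUND_HALF_UP in score_defect; np.rint would
    # round ties to even (68.5 -> 68) and break parity at a band boundary.
    scores = np.clip(np.floor(weighted * 100 + 0.5).astype(int), 0, 100)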
    # Reapply the § 396.11(a)(3) critical floor after clamping.
    scores = np.where(df["driver_severity_flag"].to_numpy() == "critical",
                      np.maximum(scores, 70), scores)
    df["severity_score"] = scores
    return df[["dvir_id", "vehicle_vin", "component_code", "severity_score"]]
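The dedupe rule is small enough to restate without pandas, which gives the batch path a plain-Python reference to check against. This is a sketch under the assumption that rows are dicts carrying the DefectInput field names. `dedupe_highest_severity` is a hypothetical helper, the component codes are made up, and on a tie it keeps the first row seen, which may differ from the row the pandas path retains.

```python
SEVERITY_RANK = {"minor": 1, "major": 2, "critical": 3}


def dedupe_highest_severity(rows: list[dict]) -> list[dict]:
    """Keep one row per (vehicle_vin, component_code): the highest flag wins."""
    best: dict[tuple[str, str], dict] = {}
    for row in rows:
        key = (row["vehicle_vin"], row["component_code"])
        current = best.get(key)
        # Strict ">" keeps the first row on a tie, so the result is order-stable.
        if current is None or (SEVERITY_RANK[row["driver_severity_flag"]]
                               > SEVERITY_RANK[current["driver_severity_flag"]]):
            best[key] = row
    return list(best.values())


rows = [
    {"dvir_id": "d1", "vehicle_vin": "1FUJGLDR1CLBP8834",
     "component_code": "BRK-101", "driver_severity_flag": "minor"},
    {"dvir_id": "d2", "vehicle_vin": "1FUJGLDR1CLBP8834",
     "component_code": "BRK-101", "driver_severity_flag": "critical"},
    {"dvir_id": "d3", "vehicle_vin": "1FUJGLDR1CLBP8834",
     "component_code": "LGT-204", "driver_severity_flag": "minor"},
]
print([r["dvir_id"] for r in dedupe_highest_severity(rows)])  # ['d2', 'd3']
```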

Verification and Testing

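The score is a compliance artifact, so test the behaviors that keep a vehicle off the road, not merely that the code runs. The tests below assert four things: a critical-flagged OOS defect holds the critical band, the critical flag is a hard floor, an unbalanced weight matrix is rejected, and the scalar and vectorized paths agree. Rounding at the band boundary (a composite of exactly 69.5) deserves its own case in your suite.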

import pytest

W = WeightMatrix(version="test", safety_impact=0.5, regulatory_risk=0.2,
                 operational_downtime=0.15, historical_recurrence=0.15)


def _defect(**over) -> DefectInput:
    base = dict(dvir_id="d1", vehicle_vin="1FUJGLDR1CLBP8834",
                component_code="BRK-101", driver_severity_flag=DriverFlag.CRITICAL,
                timestamp_utc="2026-07-01T12:00:00Z", odometer_reading=0,
                recurrence_count=0, is_oos_component=True)
    return DefectInput(**{**base, **over})


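def test_critical_oos_component_scores_critical():
    # A critical-flagged brake defect with zero mileage and no history must be OOS.
    # The default fixture carries DriverFlag.CRITICAL, so this exercises the
    # flag clamp together with the 0.90 safety floor.
    assert score_defect(_defect(), W)["band"] == "critical"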


def test_critical_flag_is_a_hard_floor():
    d = _defect(is_oos_component=False, odometer_reading=0, recurrence_count=0)
    assert score_defect(d, W)["score"] >= 70  # § 396.11(a)(3) floor


def test_weight_matrix_must_sum_to_one():
    with pytest.raises(ValueError):
        WeightMatrix(version="bad", safety_impact=0.5, regulatory_risk=0.5,
                     operational_downtime=0.5, historical_recurrence=0.5)


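def test_scalar_and_vectorized_agree():
    # Each row gets its own component_code; otherwise score_batch would dedupe
    # the shared VIN + component down to a single row and the lookup would fail.
    rows = [_defect(dvir_id=f"d{i}", component_code=f"LGT-{i}",
                    odometer_reading=i * 90_000,
                    driver_severity_flag=DriverFlag.MAJOR, is_oos_component=False,
                    recurrence_count=i % 3).model_dump() for i in range(1, 8)]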
    df = pd.DataFrame(rows)
    batch = score_batch(df, W).set_index("dvir_id")["severity_score"].to_dict()
    for r in rows:
        expected = score_defect(DefectInput(**r), W)["score"]
        assert batch[r["dvir_id"]] == expected  # both paths must match exactly

Run these under pytest in CI and fail the build on any regression. The scalar-vs-vectorized parity test is the one that catches the most dangerous class of bug: a fast path that quietly disagrees with the reference path on a boundary case.

Common Failure Modes and Gotchas

Weight-matrix drift. Editing one weight without rebalancing the others changes every score in the fleet, and if the four no longer sum to 1.0 the whole scale silently rescales. The weights_sum_to_one validator rejects this at load, and persisting weight_version on every scored record lets an auditor reproduce any historical score exactly. Never edit weights in place without minting a new version tag.
Banker’s rounding at the band boundary. Python’s built-in round(69.5) returns 70 but round(68.5) returns 68: ties go to the nearest even integer, which is deterministic but rarely what a threshold policy intends, and np.rint follows the same rule. A defect that computes to exactly 69.5 must resolve the same way in every path, so use Decimal with ROUND_HALF_UP in the scalar path and pin the vectorized path to it in tests.
Duplicate defects inflating recurrence. An offline mobile app that re-syncs on reconnect can submit the same defect twice, doubling the recurrence_count contribution and pushing a minor defect into the major band. Deduplicate by vehicle_vin + component_code before scoring, keeping the highest-severity instance, and reconcile offline-queued timestamps against the inspection instant, not the sync time.
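Treating the driver flag as a ceiling. Clamping a score down to match a driver’s minor flag defeats the model: the flag is a floor under § 396.11(a)(3), not a cap. A brake defect a driver mislabels minor is still raised by the component-based safety factor, but that factor reaches the critical band by itself only if the safety weight is at least about 0.78 (0.70 / 0.90); below that, enforce the critical band for OOS components with an explicit clamp. The model overrides upward but never downward.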
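Audit reconstruction from a persisted record can be sketched as follows. The record layout mirrors the dict returned by score_defect. The in-memory `WEIGHT_REGISTRY` keyed by version tag and the `reconstruct_score` helper are assumptions for illustration, since this page does not specify where historical manifests are archived.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical archive of every weight manifest ever deployed, keyed by version.
WEIGHT_REGISTRY = {
    "2026-Q3-heavy": {"safety_impact": "0.50", "regulatory_risk": "0.20",
                      "operational_downtime": "0.15",
                      "historical_recurrence": "0.15"},
}


def reconstruct_score(record: dict) -> int:
    """Recompute the pre-floor composite from a stored factor vector."""
    weights = WEIGHT_REGISTRY[record["weight_version"]]  # KeyError if unarchived
    total = sum((Decimal(str(record["factor_vector"][name])) * Decimal(w)
                 for name, w in weights.items()), Decimal("0"))
    rounded = (total * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(max(0, min(100, rounded)))


# OOS component with a major flag: 0.9*0.50 + 0.35*0.20 + 0.4*0.15 = 0.58
stored = {
    "score": 58, "weight_version": "2026-Q3-heavy",
    "factor_vector": {"safety_impact": 0.9, "regulatory_risk": 0.35,
                      "operational_downtime": 0.4, "historical_recurrence": 0.0},
}
assert reconstruct_score(stored) == stored["score"]
```

Two caveats apply to this sketch. The factor vector does not carry the driver flag, so the critical-flag clamp has to be reapplied from the stored defect record. And because the factor vector is persisted rounded to four decimals, a reconstruction from it can differ from the original by a point when the composite sits on a rounding tie; recomputing from the original inputs avoids that.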

Severity Scoring Algorithms for DVIR Defects — the parent reference specifying the four-factor algorithm this page implements.
Critical vs Non-Critical Routing Logic — the engine that acts on the bands this model emits.
Dynamic Threshold Tuning for Fleet Types — how to version and tune the weight matrices per fleet class.
Standardized DVIR JSON Schema Design — the canonical record shape every scored defect is validated against.
Defect Code Standardization Across Fleets — the controlled vocabulary the safety factor keys off.
