Building a Weighted Defect Scoring Model in Python

Fleet compliance pipelines require deterministic translation of unstructured Driver Vehicle Inspection Report (DVIR) entries into quantifiable risk metrics. A weighted defect scoring model serves as the computational bridge between raw inspection telemetry and actionable maintenance dispatch. When implementing this model, precision in configuration, rigorous edge-case handling, and predictable score normalization are non-negotiable. The architecture must align with established Severity Scoring Algorithms for DVIR Defects to ensure regulatory alignment and operational consistency across mixed-fleet deployments.

1. Configuration Architecture & Schema Validation

The foundation of a reliable scoring engine lies in a strictly typed configuration schema. Hard-coded thresholds introduce technical debt and obscure audit trails. Instead, externalize severity weights, regulatory multipliers, and component decay factors into a version-controlled YAML manifest. A production-ready configuration maps defect codes to a three-tier severity matrix aligned with FMCSA guidance:

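  • Critical (Tier 1): Defects likely to affect safe operation, which 49 CFR § 396.11 requires to be repaired before the vehicle is operated again; treat these as immediate out-of-service (OOS) conditions.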
  • Major (Tier 2): Repair required within twenty-four hours or prior to next dispatch.
  • Minor (Tier 3): Monitor, schedule preventative maintenance, or log for trend analysis.

Each tier requires a base weight, a fleet-type multiplier (e.g., heavy-duty vs. light commercial), and a regulatory penalty coefficient. When parsing this configuration, validate it against a Pydantic model to catch schema drift before runtime. Left unvalidated, missing keys or malformed multipliers can silently corrupt downstream routing logic. Enforce strict validation at ingestion with model_validate(), and handle ValidationError explicitly so that a bad manifest fails loudly instead of being partially applied. Refer to Pydantic’s official validation documentation for robust schema enforcement patterns.
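As an illustration of validation at ingestion, the sketch below checks a manifest after it has been parsed into a plain dictionary (for example by yaml.safe_load). The model names TierWeights and ScoringManifest, the manifest_version key, and every numeric value are hypothetical and not taken from any real fleet configuration; Pydantic v2 is assumed.

```python
from pydantic import BaseModel, Field, ValidationError


class TierWeights(BaseModel):
    # Field constraints reject negative weights and non-positive multipliers.
    base_weight: float = Field(ge=0)
    fleet_multiplier: float = Field(gt=0)
    regulatory_penalty: float = Field(ge=0)


class ScoringManifest(BaseModel):
    manifest_version: str  # hypothetical key for tying scores to a config version
    decay_lambda: float = Field(gt=0)
    tiers: dict[str, TierWeights]


def validation_problems(raw: dict) -> list[str]:
    """Return dotted paths of every invalid field, or an empty list if valid."""
    try:
        ScoringManifest.model_validate(raw)
    except ValidationError as exc:
        return [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
    return []


# Hypothetical manifest content, shaped as it would be after YAML parsing.
valid_manifest = {
    "manifest_version": "2024.05.1",
    "decay_lambda": 0.15,
    "tiers": {
        "Critical": {"base_weight": 40.0, "fleet_multiplier": 1.5, "regulatory_penalty": 30.0},
        "Major": {"base_weight": 20.0, "fleet_multiplier": 1.5, "regulatory_penalty": 10.0},
        "Minor": {"base_weight": 5.0, "fleet_multiplier": 1.0, "regulatory_penalty": 2.0},
    },
}

# Same manifest with a malformed multiplier and a missing penalty.
broken_manifest = {
    "manifest_version": "2024.05.1",
    "decay_lambda": 0.15,
    "tiers": {"Critical": {"base_weight": 40.0, "fleet_multiplier": "heavy"}},
}

print(validation_problems(valid_manifest))
print(validation_problems(broken_manifest))
```

Reporting the full path of each bad field, rather than only the first failure, tends to make manifest review faster when several keys drift at once.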

2. Vectorized Scoring Engine & Time Decay

The scoring function should use vectorized operations via pandas or numpy so that batch scoring avoids per-row Python loops and scales roughly linearly with the number of inspection rows. Initialize a scoring matrix where each row represents a unique vehicle-defect pair and columns represent weight components. Apply the deterministic formula:

composite_score = (base_weight * fleet_multiplier) + (regulatory_penalty * time_decay_factor)

Time decay should follow an exponential decay curve so that stale defects do not artificially inflate current risk profiles. The decay factor is exp(-λ * days_since_inspection), where λ is a configurable decay rate (typically 0.1 to 0.3 for daily fleet cycles, which corresponds to a half-life of roughly 2.3 to 6.9 days). When aggregating scores per vehicle, cap the total at 100 to maintain interpretability for dispatch dashboards. Use np.clip() to enforce the [0, 100] bounds; note that clipping only bounds the reported value and does not rescale the underlying sum. Always isolate the scoring logic into a pure function with explicit type hints to enable static analysis and unit testing without side effects.
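The formula and the decay factor can be checked on a handful of rows before wiring them into a pipeline. The sketch below assumes numpy; the weights, the λ value of 0.15, and the helper names time_decay_factor and composite_score are illustrative choices, not values prescribed by any regulation.

```python
import numpy as np

DECAY_LAMBDA = 0.15  # hypothetical decay rate within the 0.1 to 0.3 range


def time_decay_factor(days_since_inspection, decay_lambda: float) -> np.ndarray:
    """exp(-lambda * days), with negative ages treated as zero days."""
    days = np.clip(np.asarray(days_since_inspection, dtype=float), 0.0, None)
    return np.exp(-decay_lambda * days)


def composite_score(base_weight, fleet_multiplier, regulatory_penalty, decay):
    """(base_weight * fleet_multiplier) + (regulatory_penalty * time_decay_factor)."""
    return base_weight * fleet_multiplier + regulatory_penalty * decay


# Three defects of the same hypothetical tier, reported 0, 5, and 30 days ago.
days = np.array([0, 5, 30])
decay = time_decay_factor(days, DECAY_LAMBDA)
scores = composite_score(
    base_weight=np.array([40.0, 40.0, 40.0]),
    fleet_multiplier=1.5,
    regulatory_penalty=np.array([30.0, 30.0, 30.0]),
    decay=decay,
)

# Days until the penalty term halves: ln(2) / lambda.
half_life_days = np.log(2) / DECAY_LAMBDA

# Per-vehicle total, bounded to the 0 to 100 dashboard range.
vehicle_total = float(np.clip(scores.sum(), 0.0, 100.0))

print(decay, scores, half_life_days, vehicle_total)
```

One property worth noticing: only the regulatory penalty term is multiplied by the decay factor, so under this formula an old defect's score tends toward base_weight * fleet_multiplier rather than toward zero.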

3. Data Hygiene & Deduplication Logic

Real-world DVIR data is inherently noisy. The most common failure mode occurs when a single inspection report contains duplicate defect entries for the same component. This typically stems from driver re-submission, offline caching, or telematics sync conflicts. Implement a deduplication pass that groups by vehicle_id, component_code, and inspection_timestamp, retaining only the highest-severity instance before weight calculation.
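A minimal sketch of that deduplication pass, assuming pandas and the column names vehicle_id, component_code, inspection_timestamp, and severity_tier. The sample rows and the helper name dedupe_defects are hypothetical.

```python
import pandas as pd

SEVERITY_RANK = {"Minor": 1, "Major": 2, "Critical": 3}
DEDUP_KEYS = ["vehicle_id", "component_code", "inspection_timestamp"]


def dedupe_defects(reports: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per key group, preferring the highest severity."""
    # assign() works on a copy, so the caller's frame is left untouched.
    ranked = reports.assign(_rank=reports["severity_tier"].map(SEVERITY_RANK))
    return (
        ranked.sort_values("_rank", ascending=False, kind="stable")
        .drop_duplicates(subset=DEDUP_KEYS, keep="first")
        .drop(columns="_rank")
        .sort_index()
    )


# Hypothetical batch: rows 0 and 1 are the same defect submitted twice.
reports = pd.DataFrame(
    {
        "vehicle_id": ["T100", "T100", "T100", "T200"],
        "component_code": ["BRK-01", "BRK-01", "LGT-04", "BRK-01"],
        "inspection_timestamp": pd.to_datetime(["2024-05-01 06:00"] * 4),
        "severity_tier": ["Minor", "Critical", "Minor", "Major"],
    }
)

deduped = dedupe_defects(reports)
print(deduped)
```

Sorting by rank and keeping the first row per key avoids a groupby-apply, which usually matters on large batches. Rows whose tier is not in SEVERITY_RANK sort last, so they survive only when no recognized duplicate exists.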

Secondary noise sources include unstructured free-text notes and OCR artifacts from paper-based forms. Apply a normalization layer that maps synonymous defect descriptions to standardized component_code values. Failing to sanitize inputs at this stage will cascade into false-positive routing triggers and unnecessary shop floor congestion.
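One simple form of such a normalization layer is a phrase lookup applied after canonicalizing the text. The sketch below uses only the standard library; the phrase table, the component codes, and the function names are hypothetical, and a real deployment would likely need a much larger table or fuzzy matching.

```python
import re
from typing import Optional

# Hypothetical mapping from canonical phrases to component codes.
PHRASE_TO_CODE = {
    "brake pad worn": "BRK-01",
    "worn brake pads": "BRK-01",
    "headlight out": "LGT-04",
    "head light out": "LGT-04",
}


def canonicalize(note: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = note.lower()
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def to_component_code(note: str) -> Optional[str]:
    """Return the mapped component code, or None when the note is unrecognized."""
    return PHRASE_TO_CODE.get(canonicalize(note))


print(to_component_code("  Worn  BRAKE-pads!! "))
print(to_component_code("Head-light out."))
print(to_component_code("strange noise from cab"))
```

Returning None for unmatched notes, rather than guessing a code, lets the pipeline send those entries to manual review instead of scoring them under the wrong component.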

The following implementation demonstrates a production-ready, type-safe pipeline that integrates configuration validation, deduplication, vectorized scoring, and boundary enforcement.

python
import numpy as np
import pandas as pd
import yaml
from pydantic import BaseModel, ValidationError, model_validator
from typing import Dict, Optional

# --- Configuration Schema ---
class SeverityTier(BaseModel):
    base_weight: float
    fleet_multiplier: float
    regulatory_penalty: float

class DefectConfig(BaseModel):
    tiers: Dict[str, SeverityTier]
    decay_lambda: float = 0.15

    @model_validator(mode="after")
    def validate_weights(self) -> "DefectConfig":
        # Pydantic v2 "after" validators receive the constructed instance.
        for tier_name, tier in self.tiers.items():
            if tier.base_weight < 0 or tier.fleet_multiplier <= 0:
                raise ValueError(f"Invalid weights for {tier_name}")
        return self

def load_config(path: str) -> DefectConfig:
    """Loads and validates the version-controlled YAML manifest."""
    with open(path, "r", encoding="utf-8") as fh:
        raw = yaml.safe_load(fh)
    try:
        return DefectConfig.model_validate(raw)
    except ValidationError as exc:
        raise RuntimeError(f"Invalid defect config at {path}: {exc}") from exc

# --- Pure Scoring Function ---
def compute_dvir_scores(
    inspections: pd.DataFrame,
    config: DefectConfig,
    as_of: Optional[pd.Timestamp] = None,
) -> pd.DataFrame:
    """
    Computes weighted defect scores per vehicle using vectorized operations.
    Expects DataFrame columns: vehicle_id, component_code, severity_tier, inspection_date
    Pass as_of to pin the reference time for reproducible runs; defaults to now.
    """
    as_of = pd.Timestamp.now() if as_of is None else pd.Timestamp(as_of)

    # Fail loudly on tiers missing from the config instead of scoring them as NaN.
    unknown = set(inspections["severity_tier"]) - set(config.tiers)
    if unknown:
        raise ValueError(f"Unknown severity tiers: {sorted(map(str, unknown))}")

    # 1. Deduplicate: keep the highest severity per vehicle/component/inspection.
    #    assign() returns a copy, so the caller's DataFrame is not mutated.
    tier_order = {"Critical": 3, "Major": 2, "Minor": 1}
    ranked = inspections.assign(severity_rank=inspections["severity_tier"].map(tier_order))
    deduped = ranked.sort_values("severity_rank", na_position="first").drop_duplicates(
        subset=["vehicle_id", "component_code", "inspection_date"], keep="last"
    ).drop(columns="severity_rank")

    # 2. Map configuration weights (dump each tier model to a plain dict first)
    tier_weights = pd.DataFrame.from_dict(
        {name: tier.model_dump() for name, tier in config.tiers.items()}, orient="index"
    )
    merged = deduped.merge(tier_weights, left_on="severity_tier", right_index=True, how="left")

    # 3. Calculate time decay (future-dated inspections count as 0 days old)
    days_elapsed = (as_of - pd.to_datetime(merged["inspection_date"])).dt.days
    merged["days_elapsed"] = days_elapsed.clip(lower=0)
    merged["time_decay_factor"] = np.exp(-config.decay_lambda * merged["days_elapsed"])

    # 4. Vectorized composite score
    merged["composite_score"] = (
        (merged["base_weight"] * merged["fleet_multiplier"]) +
        (merged["regulatory_penalty"] * merged["time_decay_factor"])
    )

    # 5. Aggregate & clip
    vehicle_scores = merged.groupby("vehicle_id")["composite_score"].sum().reset_index()
    vehicle_scores["normalized_score"] = np.clip(vehicle_scores["composite_score"], 0, 100.0)

    return vehicle_scores[["vehicle_id", "normalized_score"]]

5. Compliance Integration & Dispatch Routing

Once scores are normalized, they must feed directly into maintenance dispatch logic. Threshold mapping should follow a deterministic routing matrix:

  • Score ≥ 85: Trigger immediate OOS workflow, notify compliance officer, and generate a digital repair order with mandatory pre-trip clearance.
  • 50 ≤ Score < 85: Schedule within 24 hours, flag for next available bay, and attach historical defect trends.
  • Score < 50: Log to preventative maintenance queue for trend analysis and parts forecasting.
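The routing matrix above can be expressed as a small pure function. This is a sketch using the thresholds from the list; the Route enum, its member names, and the function name are hypothetical, and scores between 84 and 85 are assumed to fall in the 24-hour band.

```python
from enum import Enum


class Route(Enum):
    OUT_OF_SERVICE = "out_of_service"
    SCHEDULE_24H = "schedule_24h"
    PM_QUEUE = "pm_queue"


OOS_THRESHOLD = 85.0
SCHEDULE_THRESHOLD = 50.0


def route_for_score(normalized_score: float) -> Route:
    """Map a normalized 0 to 100 score onto a dispatch route."""
    # The range check also rejects NaN, since comparisons with NaN are False.
    if not 0.0 <= normalized_score <= 100.0:
        raise ValueError(f"Score out of range: {normalized_score!r}")
    if normalized_score >= OOS_THRESHOLD:
        return Route.OUT_OF_SERVICE
    if normalized_score >= SCHEDULE_THRESHOLD:
        return Route.SCHEDULE_24H
    return Route.PM_QUEUE


for score in (92.0, 84.9, 50.0, 12.5):
    print(score, route_for_score(score).value)
```

Keeping the thresholds as named constants (or loading them from the same versioned manifest as the weights) means a routing change shows up in version control rather than in scattered comparisons.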

This routing architecture directly supports Defect Classification & Repair Order Routing by ensuring that computational risk scores translate into auditable maintenance actions. All scoring events, configuration versions, and routing decisions must be persisted to an immutable audit log to satisfy FMCSA recordkeeping requirements and internal QA audits. Vectorized processing keeps batch latency low as fleet size grows, though actual timings depend on batch size and hardware and should be measured. Strict Pydantic validation catches malformed configuration before it reaches the scoring path, which reduces the risk that configuration drift compromises regulatory compliance.
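As a rough illustration of what one audit entry might capture, the sketch below writes scoring events as JSON Lines to an append-only sink. The record fields, the class name ScoringAuditRecord, and the sample values are hypothetical. A frozen dataclass only prevents in-process mutation; real immutability depends on the storage layer (for example write-once storage), which is outside this sketch.

```python
import io
import json
from dataclasses import FrozenInstanceError, asdict, dataclass
from typing import TextIO


@dataclass(frozen=True)
class ScoringAuditRecord:
    vehicle_id: str
    normalized_score: float
    route: str
    config_version: str  # version of the manifest used for this score
    scored_at: str  # ISO 8601 timestamp in UTC


def append_audit_record(sink: TextIO, record: ScoringAuditRecord) -> None:
    """Append one record as a single JSON line; existing lines are never rewritten."""
    sink.write(json.dumps(asdict(record), sort_keys=True) + "\n")


# An in-memory sink stands in for an append-only file or log service.
audit_sink = io.StringIO()
record = ScoringAuditRecord(
    vehicle_id="T100",
    normalized_score=92.0,
    route="out_of_service",
    config_version="2024.05.1",
    scored_at="2024-05-01T06:15:00+00:00",
)
append_audit_record(audit_sink, record)

print(audit_sink.getvalue(), end="")
```

Storing the configuration version alongside each score is what allows a later audit to reproduce the routing decision from the same weights.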

By treating defect scoring as a deterministic, version-controlled computation rather than an ad-hoc heuristic, fleet operators achieve predictable maintenance dispatch, reduced roadside violations, and transparent compliance reporting across all vehicle classes.