Design Principles¶
This document explains the key architectural decisions and design principles underlying the RWA calculator.
Core Principles¶
1. Single Codebase, Dual Framework¶
Decision: Support both CRR and Basel 3.1 in a single codebase.
Rationale:
- Avoids code duplication
- Enables easy comparison between frameworks
- Simplifies maintenance
- Supports transition planning
Implementation:
# Framework-specific behavior controlled by configuration
if config.framework == RegulatoryFramework.CRR:
    rwa = rwa * config.scaling_factor  # 1.06
else:
    rwa = apply_output_floor(rwa, sa_rwa, config.output_floor)
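The `apply_output_floor` helper is called above but not shown. A minimal sketch of what such a helper might look like, assuming the floor is expressed as a fraction of standardised-approach RWA (72.5% once Basel 3.1 is fully phased in) and that amounts are `Decimal`. The signature is inferred from the call site and is an assumption, not the project's actual implementation:

```python
from decimal import Decimal


def apply_output_floor(rwa: Decimal, sa_rwa: Decimal, output_floor: Decimal) -> Decimal:
    """Return the modelled RWA, floored at a fraction of standardised RWA.

    Hypothetical helper: parameter names mirror the call shown above,
    but the real implementation may differ.
    """
    floor_amount = sa_rwa * output_floor
    return max(rwa, floor_amount)


# Modelled RWA of 60 against SA RWA of 100 with a 72.5% floor is lifted to the floor
floored = apply_output_floor(Decimal("60"), Decimal("100"), Decimal("0.725"))
# Modelled RWA already above the floor is returned unchanged
unfloored = apply_output_floor(Decimal("80"), Decimal("100"), Decimal("0.725"))
```

Keeping the floor percentage in configuration rather than in the function means a transitional phase-in schedule can be handled by the config factory alone.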
2. Pure LazyFrame Operations¶
Decision: Use Polars LazyFrames exclusively for all data operations.
Rationale:
- Enables query optimization by Polars
- Reduces memory usage through lazy evaluation
- Allows automatic parallelization
- Avoids row-by-row iteration (major performance gain)
Anti-pattern (Avoided):
# BAD: Row-by-row processing
results = []
for row in dataframe.iter_rows():
    result = calculate_rwa(row)
    results.append(result)
Pattern (Used):
# GOOD: Vectorized operation
result = df.with_columns(
    rwa=pl.col("ead") * pl.col("risk_weight") * pl.col("factor")
)
3. Protocol-Based Interfaces¶
Decision: Define component interfaces using Python Protocols.
Rationale:
- Enables dependency injection
- Supports testing with mocks
- Allows multiple implementations
- Documents expected behavior
Implementation:
from pathlib import Path
from typing import Protocol


class LoaderProtocol(Protocol):
    def load(self, path: Path) -> RawDataBundle:
        """Load raw data from source."""
        ...


# Any class with a matching signature satisfies the protocol
class ParquetLoader:
    def load(self, path: Path) -> RawDataBundle:
        # Implementation
        ...


class MockLoader:
    def load(self, path: Path) -> RawDataBundle:
        # Test implementation
        ...
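To show how the protocol enables dependency injection, here is a small self-contained sketch. `StubBundle`, `InMemoryLoader` and `run_pipeline` are hypothetical names introduced only for illustration; `StubBundle` stands in for `RawDataBundle` so the example runs without Polars:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class StubBundle:
    """Hypothetical stand-in for RawDataBundle, used only in this sketch."""
    source: str


class LoaderProtocol(Protocol):
    def load(self, path: Path) -> StubBundle:
        ...


class InMemoryLoader:
    """Test double: note that it does not inherit from LoaderProtocol."""

    def load(self, path: Path) -> StubBundle:
        return StubBundle(source=f"in-memory:{path.name}")


def run_pipeline(loader: LoaderProtocol, path: Path) -> StubBundle:
    """Depends only on the protocol, so any conforming loader can be injected."""
    return loader.load(path)


bundle = run_pipeline(InMemoryLoader(), Path("data/loans.parquet"))
```

Because conformance is structural, a static type checker accepts `InMemoryLoader` wherever `LoaderProtocol` is expected, and tests can swap loaders without touching the pipeline code.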
4. Immutable Data Contracts¶
Decision: All data transfer objects (bundles) are frozen dataclasses.
Rationale:
- Prevents accidental mutation
- Ensures thread safety
- Supports lazy evaluation
- Makes data flow predictable
Implementation:
from dataclasses import dataclass

import polars as pl


@dataclass(frozen=True)
class RawDataBundle:
    counterparties: pl.LazyFrame
    facilities: pl.LazyFrame
    loans: pl.LazyFrame
    collateral: pl.LazyFrame | None = None
    guarantees: pl.LazyFrame | None = None
5. Error Accumulation¶
Decision: Accumulate errors rather than fail fast.
Rationale:
- Reports all validation issues at once
- Supports audit requirements
- Allows partial results
- Better user experience
Implementation:
from dataclasses import dataclass, field

import polars as pl


@dataclass
class LazyFrameResult:
    frame: pl.LazyFrame
    errors: list[CalculationError] = field(default_factory=list)

    @property
    def has_errors(self) -> bool:
        # Warnings alone do not count as errors
        return any(e.severity != ErrorSeverity.WARNING for e in self.errors)

    @property
    def warnings(self) -> list[CalculationError]:
        return [e for e in self.errors if e.severity == ErrorSeverity.WARNING]


# Usage
result = processor.apply_crm(data, config)
if result.has_errors:
    for error in result.errors:
        logger.error(f"{error.exposure_reference}: {error.message}")
6. Factory Methods for Configuration¶
Decision: Use factory methods rather than direct construction.
Rationale:
- Self-documenting code
- Encapsulates complex initialization
- Ensures valid combinations
- Easier to use correctly
Implementation:
class CalculationConfig:
    @classmethod
    def crr(cls, reporting_date: date, **kwargs) -> "CalculationConfig":
        """Create CRR configuration with appropriate defaults."""
        return cls(
            framework=RegulatoryFramework.CRR,
            reporting_date=reporting_date,
            scaling_factor=Decimal("1.06"),
            pd_floors=PDFloors.crr(),
            # ... other CRR defaults
            **kwargs
        )

    @classmethod
    def basel_3_1(cls, reporting_date: date, **kwargs) -> "CalculationConfig":
        """Create Basel 3.1 configuration with appropriate defaults."""
        return cls(
            framework=RegulatoryFramework.BASEL_3_1,
            reporting_date=reporting_date,
            scaling_factor=Decimal("1.0"),
            pd_floors=PDFloors.basel_3_1(),
            # ... other Basel 3.1 defaults
            **kwargs
        )
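A trimmed-down, runnable sketch of the factory pattern and how a caller might use it. `ExampleConfig` is hypothetical and carries only three fields. Unlike the snippet above, it merges defaults and overrides into one dictionary first, which is one possible way to let a caller override a default without hitting a duplicate-keyword `TypeError`:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from enum import Enum


class RegulatoryFramework(Enum):
    CRR = "crr"
    BASEL_3_1 = "basel_3_1"


@dataclass(frozen=True)
class ExampleConfig:
    """Hypothetical, trimmed-down config used only for this sketch."""
    framework: RegulatoryFramework
    reporting_date: date
    scaling_factor: Decimal

    @classmethod
    def crr(cls, reporting_date: date, **overrides) -> "ExampleConfig":
        defaults = {"scaling_factor": Decimal("1.06")}
        # Later keys win, so caller-supplied overrides replace defaults
        return cls(
            framework=RegulatoryFramework.CRR,
            reporting_date=reporting_date,
            **{**defaults, **overrides},
        )

    @classmethod
    def basel_3_1(cls, reporting_date: date, **overrides) -> "ExampleConfig":
        defaults = {"scaling_factor": Decimal("1.0")}
        return cls(
            framework=RegulatoryFramework.BASEL_3_1,
            reporting_date=reporting_date,
            **{**defaults, **overrides},
        )


crr_config = ExampleConfig.crr(date(2025, 12, 31))
b31_config = ExampleConfig.basel_3_1(date(2027, 1, 1))
custom = ExampleConfig.crr(date(2025, 12, 31), scaling_factor=Decimal("1.00"))
```

The call sites name the framework explicitly, so a reader can tell which rule set applies without inspecting individual parameters.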
7. Hierarchical Join Pattern¶
Decision: Use iterative joins for hierarchy resolution instead of Python dictionaries.
Rationale:
- 50-100x performance improvement
- Stays within Polars optimization
- Handles deep hierarchies efficiently
- Avoids Python GIL limitations
Anti-pattern (Avoided):
# BAD: Python dictionary lookup
parent_dict = {row['id']: row['parent'] for row in df.iter_rows(named=True)}
results = [parent_dict.get(x) for x in ids]
Pattern (Used):
# GOOD: Polars join
result = (
    df
    .join(parent_df, left_on="parent_id", right_on="id")
    .select(["id", "resolved_parent"])
)
Module Organization¶
Main Entry Point First¶
Decision: Place main entry points at the top of modules.
Rationale:
- Reads like a book
- Easy to find key functionality
- Follows "newspaper" style
- Better developer experience
Implementation:
# module.py

# Main entry point at top
def calculate_rwa(exposures: Bundle, config: Config) -> Result:
    """Calculate RWA for all exposures."""
    validated = _validate_exposures(exposures)
    enriched = _apply_crm(validated, config)
    return _compute_rwa(enriched, config)


# Supporting functions below
def _validate_exposures(exposures: Bundle) -> Bundle:
    ...


def _apply_crm(exposures: Bundle, config: Config) -> Bundle:
    ...


def _compute_rwa(exposures: Bundle, config: Config) -> Result:
    ...
Clean Imports¶
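Decision: Group imports into clearly separated blocks: standard library first, then third-party packages, then local modules (contracts before engine components).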
# Standard library
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
# Third-party
import polars as pl
from polars_normal_stats import normal_cdf, normal_ppf
# Local - contracts first
from rwa_calc.contracts.bundles import ResultBundle
from rwa_calc.contracts.config import CalculationConfig
# Local - engine components
from rwa_calc.engine.irb.formulas import calculate_k
Error Handling Strategy¶
Validation Errors¶
Collected and reported:
errors = []
if pd < config.pd_floors.minimum:
    errors.append(ValidationError(
        field="pd",
        message=f"PD {pd} below floor {config.pd_floors.minimum}"
    ))
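A self-contained sketch of the same idea with more than one check, showing that every failure is collected before anything is reported. The `ValidationError` dataclass here is a hypothetical stand-in, and the LGD range check is an illustrative assumption rather than a rule taken from the project:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationError:
    """Hypothetical error record; the project's own class may carry more fields."""
    field: str
    message: str


def validate_inputs(pd: float, lgd: float, pd_floor: float) -> list[ValidationError]:
    """Run every check and return all failures instead of stopping at the first."""
    errors: list[ValidationError] = []
    if pd < pd_floor:
        errors.append(ValidationError("pd", f"PD {pd} below floor {pd_floor}"))
    # Illustrative second check (assumed, not from the source)
    if not 0.0 <= lgd <= 1.0:
        errors.append(ValidationError("lgd", f"LGD {lgd} outside [0, 1]"))
    return errors


issues = validate_inputs(pd=0.0001, lgd=1.2, pd_floor=0.0003)
clean = validate_inputs(pd=0.01, lgd=0.45, pd_floor=0.0003)
```

Here `issues` contains both the PD and the LGD problem, so a user can fix the input in a single pass.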
Calculation Errors¶
Logged with context:
try:
    k = calculate_k(pd, lgd, correlation)
except ValueError as e:
    errors.append(CalculationError(
        exposure_id=exposure.id,
        stage="IRB",
        message=str(e)
    ))
System Errors¶
Raised immediately:
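No example accompanies this category in the source. As a rough sketch, assuming system errors cover conditions such as a missing input file or an unusable configuration, they could propagate as ordinary exceptions rather than being appended to an error list. `ConfigurationError` and `require_input_file` are hypothetical names:

```python
from pathlib import Path


class ConfigurationError(RuntimeError):
    """Hypothetical exception for unrecoverable setup problems."""


def require_input_file(path: Path) -> Path:
    """Raise immediately if a required input is missing; nothing is accumulated."""
    if not path.is_file():
        raise ConfigurationError(f"Required input file not found: {path}")
    return path


# The caller decides whether to abort the run or report and exit
try:
    require_input_file(Path("does/not/exist/loans.parquet"))
    raised = False
except ConfigurationError:
    raised = True
```

The distinction from the two categories above is that no partial result is meaningful here, so there is nothing to gain from continuing.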
Testing Philosophy¶
Test-Driven Development¶
1. Write acceptance test (what should happen)
2. Write unit tests (how it should work)
3. Implement to pass tests
4. Refactor with confidence
Test Organization¶
tests/
├── acceptance/ # End-to-end scenarios
│ ├── crr/ # CRR scenarios
│ └── basel31/ # Basel 3.1 scenarios
├── contracts/ # Interface compliance
├── unit/ # Component tests
│ ├── test_loader.py
│ ├── test_hierarchy.py
│ └── crr/ # Framework-specific
└── fixtures/ # Test data generation
Test Naming¶
def test_sa_corporate_rated_cqs2_returns_50_percent_risk_weight():
    """Clear description of what is being tested."""
    ...


def test_sme_factor_tiered_calculation_above_threshold():
    """Documents the specific scenario."""
    ...
Documentation Philosophy¶
Code as Documentation¶
Self-documenting names and types:
def calculate_maturity_adjustment(
    pd: float,
    effective_maturity_years: float
) -> float:
    """Calculate maturity adjustment factor for IRB."""
    ...
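For reference, a sketch of how a function with this signature might be filled in, using the standard Basel IRB maturity adjustment for corporate-type exposures. The function name and the input guard are assumptions; the project's own implementation may differ, for example by operating on Polars expressions:

```python
import math


def maturity_adjustment(pd: float, effective_maturity_years: float) -> float:
    """Basel IRB maturity adjustment (sketch, not the project's implementation).

    b  = (0.11852 - 0.05478 * ln(PD)) ** 2
    MA = (1 + (M - 2.5) * b) / (1 - 1.5 * b)
    """
    if pd <= 0.0:
        raise ValueError("pd must be positive to take its logarithm")
    b = (0.11852 - 0.05478 * math.log(pd)) ** 2
    return (1.0 + (effective_maturity_years - 2.5) * b) / (1.0 - 1.5 * b)


ma_one_year = maturity_adjustment(pd=0.01, effective_maturity_years=1.0)
ma_base = maturity_adjustment(pd=0.01, effective_maturity_years=2.5)
ma_long = maturity_adjustment(pd=0.01, effective_maturity_years=5.0)
```

At a one-year maturity the numerator and denominator coincide, so the adjustment is neutral; it grows as maturity lengthens.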
Regulatory References¶
Link to articles:
def calculate_sme_supporting_factor(
    total_exposure: Decimal,
    threshold: Decimal = EUR_2_5M
) -> Decimal:
    """
    Calculate SME supporting factor per CRR Article 501.

    The factor provides capital relief for SME exposures using
    a tiered approach based on total exposure amount.
    """
    ...
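The tiered approach mentioned in the docstring can be sketched as follows. This assumes the two-tier formula of Article 501 as amended by CRR2 (0.7619 on the portion up to the threshold, 0.85 above it); the constant names and the positive-exposure guard are assumptions, and the project may apply the factor differently:

```python
from decimal import Decimal

EUR_2_5M = Decimal("2500000")
LOWER_TIER_FACTOR = Decimal("0.7619")  # applied up to the threshold (assumed)
UPPER_TIER_FACTOR = Decimal("0.85")    # applied above the threshold (assumed)


def sme_supporting_factor(
    total_exposure: Decimal,
    threshold: Decimal = EUR_2_5M,
) -> Decimal:
    """Exposure-weighted blend of the two tier factors (sketch)."""
    if total_exposure <= 0:
        raise ValueError("total_exposure must be positive")
    lower_portion = min(total_exposure, threshold)
    upper_portion = max(total_exposure - threshold, Decimal("0"))
    weighted = lower_portion * LOWER_TIER_FACTOR + upper_portion * UPPER_TIER_FACTOR
    return weighted / total_exposure


# Entirely below the threshold: only the lower-tier factor applies
factor_small = sme_supporting_factor(Decimal("1000000"))
# Half below, half above the threshold: the factor is the average of the tiers
factor_large = sme_supporting_factor(Decimal("5000000"))
```

Citing the article in the docstring, as the source does, lets a reviewer check constants like these against the regulation directly.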
Next Steps¶
- Pipeline Architecture - Detailed pipeline documentation
- Data Flow - How data moves through the system
- Component Overview - Individual components