Testing Guide¶
This guide covers the testing approach, test organisation, and how to write effective tests. The test suite currently contains ~2,065 tests across unit, acceptance, contract, integration, and benchmark categories.
| Category | Tests | Description |
|---|---|---|
| Unit | 1,537 | Component tests in isolation |
| Acceptance | 275 | End-to-end regulatory scenarios (CRR + Basel 3.1 + comparison) |
| Contract | 125 | Protocol/interface compliance |
| Benchmark | 27–34 | Performance at various scales (7 slow tests deselected by default) |
| Integration | 101 | Cross-component integration |
| Total | ~2,065 | |
Test Organisation¶
tests/
├── unit/ # Component unit tests (1,537 tests)
│ ├── crr/ # CRR-specific tests
│ ├── crm/ # Credit risk mitigation tests
│ ├── irb/ # IRB-specific tests
│ ├── basel31/ # Basel 3.1-specific tests
│ └── api/ # API tests
├── acceptance/ # End-to-end scenario tests (275 tests)
│ ├── crr/ # CRR framework scenarios (97 tests)
│ ├── basel31/ # Basel 3.1 framework scenarios (116 tests)
│ └── comparison/ # Cross-framework comparison tests (62 tests)
├── contracts/ # Interface compliance tests (125 tests)
├── integration/ # Cross-component integration tests (101 tests)
├── benchmarks/ # Performance tests (27–34 tests)
│ ├── data_generators.py # Dataset generation for various scales
│ └── data/ # Cached benchmark datasets (parquet)
├── bdd/ # BDD-style tests (step definitions)
├── fixtures/ # Test data generators (parquet fixtures)
│ ├── counterparty/ # Counterparty fixtures (5 types)
│ ├── exposures/ # Facility, loan, contingent, mapping fixtures
│ ├── collateral/ # Collateral fixtures
│ ├── guarantee/ # Guarantee fixtures
│ ├── provision/ # Provision fixtures
│ ├── ratings/ # Rating fixtures
│ ├── mapping/ # Hierarchy mapping fixtures
│ ├── fx_rates/ # FX rates fixtures
│ └── generate_all.py # Master fixture generation script
└── expected_outputs/ # Golden files for acceptance tests
    ├── crr/ # CRR expected RWA outputs
    └── basel31/ # Basel 3.1 expected RWA outputs
Running Tests¶
All Tests¶
# Run entire test suite (benchmarks and slow tests excluded by default)
uv run pytest
# With verbose output
uv run pytest -v
# With coverage
uv run pytest --cov=src/rwa_calc --cov-report=html
By Category¶
# Unit tests only
uv run pytest tests/unit
# Acceptance tests
uv run pytest tests/acceptance
uv run pytest tests/acceptance/crr
uv run pytest tests/acceptance/basel31
uv run pytest tests/acceptance/comparison
# Contract tests
uv run pytest tests/contracts
# Benchmarks (requires --benchmark-only or --benchmark-enable)
uv run pytest tests/benchmarks --benchmark-only
# Include slow tests (10M+ scale)
uv run pytest -m slow
Specific Tests¶
# Run specific file
uv run pytest tests/unit/test_pipeline.py
# Run specific test
uv run pytest tests/unit/test_pipeline.py::test_crr_basic_calculation
# Run by pattern
uv run pytest -k "test_sa_"
# Stop on first failure
uv run pytest -x
# Show local variables in tracebacks
uv run pytest -l
# Run last failed tests
uv run pytest --lf
Test Categories¶
Unit Tests¶
Test individual components in isolation:
# tests/unit/test_ccf.py
import pytest
from rwa_calc.engine.ccf import get_ccf
from rwa_calc.domain.enums import RegulatoryFramework


class TestCCF:
    """Tests for credit conversion factor calculation."""

    def test_unconditionally_cancellable_crr_returns_zero(self):
        """Unconditionally cancellable commitments have 0% CCF under CRR."""
        ccf = get_ccf(
            item_type="UNDRAWN_COMMITMENT",
            is_unconditionally_cancellable=True,
            original_maturity_years=5,
            framework=RegulatoryFramework.CRR,
        )
        assert ccf == 0.0

    def test_unconditionally_cancellable_basel31_returns_ten_percent(self):
        """Unconditionally cancellable commitments have 10% CCF under Basel 3.1."""
        ccf = get_ccf(
            item_type="UNDRAWN_COMMITMENT",
            is_unconditionally_cancellable=True,
            original_maturity_years=5,
            framework=RegulatoryFramework.BASEL_3_1,
        )
        assert ccf == 0.10
Contract Tests¶
Test interface compliance:
# tests/contracts/test_calculator_protocol.py
import pytest
from rwa_calc.contracts.protocols import SACalculatorProtocol
from rwa_calc.engine.sa.calculator import SACalculator


class TestSACalculatorProtocol:
    """Verify SACalculator implements protocol correctly."""

    def test_implements_protocol(self):
        """SACalculator should implement SACalculatorProtocol."""
        calculator = SACalculator()
        assert isinstance(calculator, SACalculatorProtocol)

    def test_calculate_returns_result_bundle(self, sample_exposures, config):
        """Calculate should return SAResultBundle."""
        calculator = SACalculator()
        result = calculator.calculate(sample_exposures, config)
        assert hasattr(result, "data")
        assert hasattr(result, "errors")
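Assuming the protocol is a typing.Protocol, the isinstance check in test_implements_protocol only works if the protocol class is decorated with typing.runtime_checkable. Even then it verifies only that the required attributes exist, not their signatures or return types. A minimal sketch with hypothetical stand-in classes (none of the names below are taken from rwa_calc):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CalculatorProtocol(Protocol):
    """Hypothetical stand-in for a calculator protocol."""

    def calculate(self, exposures: list, config: dict) -> dict: ...


class GoodCalculator:
    """Satisfies the protocol structurally; no inheritance needed."""

    def calculate(self, exposures: list, config: dict) -> dict:
        return {"data": exposures, "errors": []}


class WrongSignatureCalculator:
    """Has a `calculate` method, but with an incompatible signature."""

    def calculate(self) -> None:
        return None


class NoCalculator:
    """Lacks `calculate` entirely."""


# isinstance() against a runtime-checkable protocol checks member presence only.
good = isinstance(GoodCalculator(), CalculatorProtocol)
wrong_signature = isinstance(WrongSignatureCalculator(), CalculatorProtocol)
missing = isinstance(NoCalculator(), CalculatorProtocol)
```

Because the wrong-signature class still passes the isinstance check, a contract test benefits from also calling the method and inspecting the result, as the second test above does.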
Acceptance Tests¶
End-to-end tests that run fixture data through the full production pipeline and compare results against pre-calculated expected outputs (golden files). There are three suites:
CRR scenarios (97 tests across 9 files):
| File | Tests | Covers |
|---|---|---|
| test_scenario_crr_a_sa.py | 14 | Standardised Approach risk weights |
| test_scenario_crr_b_firb.py | 13 | Foundation IRB |
| test_scenario_crr_c_airb.py | 7 | Advanced IRB |
| test_scenario_crr_d_crm.py | 9 | Credit Risk Mitigation |
| test_scenario_crr_e_slotting.py | 9 | Specialised Lending Slotting |
| test_scenario_crr_f_supporting_factors.py | 15 | SME/Infrastructure factors |
| test_scenario_crr_g_provisions.py | 17 | Provision resolution |
| test_scenario_crr_h_complex.py | 4 | Complex/combined scenarios |
| test_scenario_crr_i_defaulted.py | 9 | Defaulted exposures |
Basel 3.1 scenarios (116 tests across 9 files):
| File | Tests | Covers |
|---|---|---|
| test_scenario_b31_a_sa.py | 14 | SA risk weights (PRA PS1/26) |
| test_scenario_b31_b_firb.py | 16 | Foundation IRB |
| test_scenario_b31_c_airb.py | 13 | Advanced IRB |
| test_scenario_b31_d_crm.py | 15 | Credit Risk Mitigation |
| test_scenario_b31_d7_parameter_substitution.py | 5 | IRB parameter substitution |
| test_scenario_b31_e_slotting.py | 13 | Specialised Lending Slotting |
| test_scenario_b31_f_output_floor.py | 6 | Output floor (72.5%) |
| test_scenario_b31_g_provisions.py | 24 | Provision resolution |
| test_scenario_b31_h_complex.py | 10 | Complex/combined scenarios |
Comparison tests (62 tests) validate that CRR and Basel 3.1 results relate to each other correctly (e.g. the output floor binds when 72.5% of SA RWA exceeds IRB RWA).
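As a rough illustration of the kind of relationship the comparison tests check, the Basel 3.1 output floor can be sketched as taking the larger of the modelled RWA and 72.5% of the SA RWA. This is a simplified sketch: the function name is hypothetical, and transitional phase-in percentages and any other adjustments the calculator applies are ignored.

```python
OUTPUT_FLOOR = 0.725  # fully phased-in floor percentage


def apply_output_floor(
    irb_rwa: float, sa_rwa: float, floor: float = OUTPUT_FLOOR
) -> tuple:
    """Return (floored RWA, whether the floor binds). Hypothetical helper."""
    floor_rwa = floor * sa_rwa
    binds = floor_rwa > irb_rwa
    return max(irb_rwa, floor_rwa), binds


# IRB RWA below 72.5% of SA RWA: the floor binds and replaces the IRB figure.
floored, binds = apply_output_floor(irb_rwa=600.0, sa_rwa=1_000.0)

# IRB RWA above the floor: it does not bind, even though SA RWA exceeds IRB RWA.
unfloored, not_binding = apply_output_floor(irb_rwa=800.0, sa_rwa=1_000.0)
```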
Each acceptance test looks up a specific exposure in the pipeline results and asserts against the expected output:
# tests/acceptance/crr/test_scenario_crr_a_sa.py
from typing import Any

import polars as pl


class TestCRRGroupA_StandardisedApproach:
    def test_crr_a1_uk_sovereign_zero_rw(
        self,
        sa_results_df: pl.DataFrame,
        expected_outputs_dict: dict[str, dict[str, Any]],
    ) -> None:
        """
        CRR-A1: UK Sovereign with CQS 1 should have 0% risk weight.

        Input: £1,000,000 loan to UK Government (CQS 1)
        Expected: RWA = £0 (0% RW per CRR Art. 114)
        """
        expected = expected_outputs_dict["CRR-A1"]
        # Look up the single exposure under test in the shared pipeline results
        result = get_sa_result_for_exposure(sa_results_df, "LOAN_SOV_UK_001")
        assert result is not None, "Exposure LOAN_SOV_UK_001 not found in SA results"
        assert_risk_weight_match(
            result["risk_weight"], expected["risk_weight"], scenario_id="CRR-A1"
        )
        assert_rwa_within_tolerance(
            result["rwa_post_factor"], expected["rwa_after_sf"], scenario_id="CRR-A1"
        )
The session-scoped fixtures (sa_results_df, expected_outputs_dict, pipeline_results, etc.) are defined in each suite's conftest.py. The pipeline runs once per session and results are shared across all tests.
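The assertion helpers used in the acceptance test are not shown in this guide. As a minimal sketch, assuming signatures inferred from the call sites, they might look like the following. The tolerance values, the exposure_reference column name, and the list-of-dicts lookup are illustrative assumptions; the real helpers take a Polars DataFrame and may behave differently.

```python
import math
from typing import Any, Optional


def get_result_for_exposure(rows: list, exposure_id: str) -> Optional[dict]:
    """Return the first row matching exposure_id, or None (hypothetical stand-in)."""
    return next((r for r in rows if r["exposure_reference"] == exposure_id), None)


def assert_risk_weight_match(
    actual: float, expected: float, scenario_id: str, abs_tol: float = 1e-4
) -> None:
    """Risk weights are small ratios, so an absolute tolerance is used here."""
    assert math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol), (
        f"{scenario_id}: risk weight {actual} != expected {expected}"
    )


def assert_rwa_within_tolerance(
    actual: float,
    expected: float,
    scenario_id: str,
    rel_tol: float = 1e-4,
    abs_tol: float = 0.01,
) -> None:
    """Relative tolerance for large RWAs; absolute tolerance covers expected == 0."""
    assert math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol), (
        f"{scenario_id}: RWA {actual} != expected {expected}"
    )


# Illustrative data only: a zero-risk-weight sovereign exposure.
rows: list = [
    {"exposure_reference": "LOAN_SOV_UK_001", "risk_weight": 0.0, "rwa_post_factor": 0.0}
]
result: Any = get_result_for_exposure(rows, "LOAN_SOV_UK_001")
assert_risk_weight_match(result["risk_weight"], 0.0, scenario_id="CRR-A1")
assert_rwa_within_tolerance(result["rwa_post_factor"], 0.0, scenario_id="CRR-A1")
```

Including the scenario ID in each failure message makes it easier to trace a failing assertion back to the matching golden-file record.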
Benchmark Tests¶
Performance tests at various scales. See Benchmark Tests for full details.
# Run benchmarks (uses cached datasets)
uv run pytest tests/benchmarks --benchmark-only
# Force regenerate all benchmark datasets
uv run pytest tests/benchmarks --benchmark-only --benchmark-regenerate
# Force regenerate a specific scale
uv run pytest tests/benchmarks --benchmark-only --benchmark-regenerate-scale=100k
Scales: 10K (quick, ~1s), 100K (standard, ~5s), 1M (large, ~60s), 10M (production, slow marker).
Acceptance Test Datasets¶
Acceptance tests depend on two things: input fixture data (parquet files in tests/fixtures/) and expected outputs (golden files in tests/expected_outputs/). Understanding how to generate and maintain these is essential for working with acceptance tests.
Data flow¶
tests/fixtures/                                 tests/expected_outputs/
├── counterparty/*.py                           ├── crr/expected_rwa_crr.json
├── exposures/*.py                              └── basel31/expected_rwa_b31.json
├── collateral/*.py                                     │
├── ...                                                 │
        │                                               │
        ▼  generate_all.py                              │
        │                                               │
tests/fixtures/                                         │
├── counterparty/*.parquet                              │
├── exposures/*.parquet                                 │
├── ...                                                 │
        │                                               │
        ▼  conftest.py: load_fixtures()                 │
        │                                               │
RawDataBundle (LazyFrames)                              │
        │                                               │
        ▼  conftest.py: PipelineOrchestrator.run()      │
        │                                               │
AggregatedResultBundle                                  │
        │                                               │
        ▼  test assertions ◄────────────────────────────┘
Generating fixture data¶
Each subdirectory in tests/fixtures/ contains Python modules with create_*() and save_*() functions that produce parquet files. The master script generates everything in the correct order:
uv run python tests/fixtures/generate_all.py
This produces parquet files across 8 fixture groups:
| Group | Directory | Files | Description |
|---|---|---|---|
| Counterparties | counterparty/ | sovereign, institution, corporate, retail, specialised_lending | All counterparty types with various CQS bands |
| Mappings | mapping/ | org_mapping, lending_mapping | Organisational and lending group hierarchies |
| Ratings | ratings/ | ratings | External and internal ratings |
| Exposures | exposures/ | facilities, loans, contingents, facility_mapping | All exposure types with facility-to-loan mappings |
| Collateral | collateral/ | collateral | Financial and non-financial collateral |
| Guarantees | guarantee/ | guarantee | Guarantee and credit protection |
| Provisions | provision/ | provision | Specific and general provisions |
| FX Rates | fx_rates/ | fx_rates | Currency conversion rates |
The script also runs data integrity checks (referential integrity between counterparties, exposures, collateral, etc.).
When to regenerate: after modifying any create_*() function in tests/fixtures/, or when adding new test scenarios that require new input data.
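The integrity checks themselves are not reproduced in this guide. As a minimal sketch of the idea, a referential integrity check confirms that every foreign key in one dataset resolves to a key in another. The function name and the loan_id field are hypothetical, and plain Python lists stand in for the parquet data.

```python
def find_orphans(child_rows: list, parent_rows: list, key: str) -> list:
    """Return sorted child key values that have no matching parent row."""
    parent_keys = {row[key] for row in parent_rows}
    child_keys = {row[key] for row in child_rows}
    return sorted(child_keys - parent_keys)


counterparties = [{"counterparty_id": "C001"}, {"counterparty_id": "C002"}]
loans = [
    {"loan_id": "L001", "counterparty_id": "C001"},
    {"loan_id": "L002", "counterparty_id": "C999"},  # dangling reference
]

orphans = find_orphans(loans, counterparties, key="counterparty_id")
# A generation script could fail fast when `orphans` is non-empty.
```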
Expected outputs (golden files)¶
Expected outputs live in tests/expected_outputs/ and define the correct RWA results for each test scenario. Three formats are supported (checked in priority order):
- Parquet (fastest): expected_rwa_crr.parquet / expected_rwa_b31.parquet
- CSV (fallback): expected_rwa_crr.csv / expected_rwa_b31.csv
- JSON (source of truth): expected_rwa_crr.json / expected_rwa_b31.json
Each record has at minimum:
- scenario_id: unique test identifier (e.g. "CRR-A1", "B31-B03")
- scenario_group: grouping for fixture filtering (e.g. "CRR-A", "B31-D")
- risk_weight, rwa_after_sf, ead: expected calculation outputs
The JSON file is the canonical source of truth. Parquet/CSV are derived for faster loading. When updating expected values, edit the JSON and regenerate the other formats.
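A minimal sketch of the format lookup described above, using only the standard library. The function names are hypothetical, the top-level JSON layout (a list of records) is an assumption, the sample record is illustrative, and only the JSON branch is parsed here; the real loader may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

FORMAT_PRIORITY = (".parquet", ".csv", ".json")  # checked in this order


def find_golden_file(directory: Path, stem: str) -> Optional[Path]:
    """Return the highest-priority golden file that exists, or None."""
    for suffix in FORMAT_PRIORITY:
        candidate = directory / f"{stem}{suffix}"
        if candidate.exists():
            return candidate
    return None


def load_json_records(path: Path) -> dict:
    """Index records by scenario_id (assumes the file holds a list of records)."""
    records = json.loads(path.read_text(encoding="utf-8"))
    return {record["scenario_id"]: record for record in records}


# Demonstrate against a temporary directory that contains only the JSON file.
with tempfile.TemporaryDirectory() as tmp:
    golden_dir = Path(tmp)
    sample = [
        {
            "scenario_id": "CRR-A1",
            "scenario_group": "CRR-A",
            "risk_weight": 0.0,
            "rwa_after_sf": 0.0,
            "ead": 1_000_000.0,
        }
    ]
    (golden_dir / "expected_rwa_crr.json").write_text(
        json.dumps(sample), encoding="utf-8"
    )

    found = find_golden_file(golden_dir, "expected_rwa_crr")
    found_name = found.name if found else None
    expected = load_json_records(found) if found else {}
```

One consequence of a priority lookup like this is that a stale Parquet or CSV file would shadow an updated JSON file, which is why the derived formats need regenerating after every JSON edit.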
How conftest.py wires it together¶
Each acceptance suite's conftest.py (e.g. tests/acceptance/crr/conftest.py) provides session-scoped fixtures:
- load_test_fixtures: calls load_fixtures() from workbooks/shared/fixture_loader.py, which reads all parquet files into a FixtureData container of LazyFrames
- raw_data_bundle: assembles the FixtureData into a RawDataBundle (the pipeline's input type)
- pipeline_results: runs PipelineOrchestrator().run_with_data(bundle, config) once per session
- sa_results_df / irb_results_df / slotting_results_df: collected DataFrames for each approach
- expected_outputs_df / expected_outputs_dict: loaded golden file data for assertions
Different configs are used for different scenario groups (SA-only, full IRB, slotting permissions, etc.).
Adding a new acceptance test scenario¶
- Add input data in the appropriate tests/fixtures/ module (e.g. a new counterparty in corporate.py, a new loan in loans.py)
- Regenerate fixtures: uv run python tests/fixtures/generate_all.py
- Add expected outputs to the relevant JSON golden file in tests/expected_outputs/
- Write the test in the appropriate test_scenario_*.py file, using conftest fixtures
- Run and verify: uv run pytest tests/acceptance/crr/test_scenario_crr_a_sa.py -v
Test Fixtures¶
Unit test fixtures¶
Unit tests use inline Polars DataFrames or @pytest.fixture functions:
@pytest.fixture
def sample_counterparty():
    """Single corporate counterparty."""
    return pl.DataFrame({
        "counterparty_id": ["C001"],
        "counterparty_name": ["Acme Corp"],
        "counterparty_type": ["CORPORATE"],
        "country_code": ["GB"],
        "annual_turnover": [30_000_000.0],
    }).lazy()
Parametrized fixtures¶
@pytest.fixture(params=[
    ("CQS_1", 0.20),
    ("CQS_2", 0.50),
    ("CQS_3", 0.75),
    ("CQS_4", 1.00),
    ("UNRATED", 1.00),
])
def corporate_risk_weight_case(request):
    """Parametrized corporate risk weight test cases."""
    cqs, expected_rw = request.param
    return {"cqs": cqs, "expected_risk_weight": expected_rw}
Configuration fixtures¶
import pytest
from datetime import date
from rwa_calc.contracts.config import CalculationConfig


@pytest.fixture
def crr_config():
    """Standard CRR configuration."""
    return CalculationConfig.crr(reporting_date=date(2026, 12, 31))


@pytest.fixture
def basel31_config():
    """Standard Basel 3.1 configuration."""
    return CalculationConfig.basel_3_1(reporting_date=date(2027, 1, 1))


@pytest.fixture(params=["crr", "basel31"])
def both_frameworks(request, crr_config, basel31_config):
    """Run test under both frameworks."""
    if request.param == "crr":
        return crr_config
    return basel31_config
Writing Effective Tests¶
Test Naming¶
Use descriptive names that explain what is being tested, under what conditions, and the expected outcome:
# Good
def test_sme_factor_tiered_calculation_exposure_above_threshold_returns_blended_factor():
    ...

# Bad
def test_sme():
    ...
Test Structure (AAA Pattern)¶
def test_irb_capital_requirement_calculation():
    """Test IRB K formula calculation."""
    # Arrange
    pd = 0.01
    lgd = 0.45
    correlation = 0.20

    # Act
    k = calculate_k(pd, lgd, correlation)

    # Assert
    assert k == pytest.approx(0.0445, rel=0.01)
Assertions¶
# Exact equality
assert result == expected
# Approximate equality
assert result == pytest.approx(expected, rel=0.01) # 1% tolerance
assert result == pytest.approx(expected, abs=0.001) # Absolute tolerance
# Collections
assert set(result) == set(expected)
assert result in expected_values
# DataFrame assertions
assert len(df) == expected_count
assert df["column"].sum() == expected_sum
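Exact equality is rarely appropriate for floating-point results such as risk weights or RWA, because binary rounding can make mathematically equal values differ in the last digits. The sketch below uses the standard library's math.isclose as a stand-in for pytest.approx to show the difference between relative and absolute tolerance:

```python
import math

total = 0.1 + 0.2

# Binary rounding means the sum is not exactly 0.3.
exact_match = total == 0.3
close_match = math.isclose(total, 0.3, rel_tol=1e-9)

# A relative tolerance alone cannot succeed against an expected value of zero,
# because the allowed difference scales with the magnitudes being compared.
rel_only_at_zero = math.isclose(1e-12, 0.0, rel_tol=0.01)
with_abs_at_zero = math.isclose(1e-12, 0.0, rel_tol=0.01, abs_tol=1e-9)
```

The same consideration applies when the expected value is zero, as in zero-RWA scenarios: supply an absolute tolerance rather than relying on a relative one.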
Testing Errors¶
def test_invalid_pd_raises_error():
    """Negative PD should raise ValueError."""
    with pytest.raises(ValueError, match="PD must be positive"):
        calculate_k(pd=-0.01, lgd=0.45, correlation=0.20)


def test_calculation_accumulates_errors():
    """Invalid exposures should accumulate errors."""
    result = pipeline.run_with_data(invalid_data, config)
    assert result.has_errors
    assert any("Invalid PD" in e.message for e in result.errors)
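The second test relies on the pipeline collecting errors rather than raising on the first one. A minimal sketch of that pattern, with hypothetical class and function names that are not taken from rwa_calc:

```python
from dataclasses import dataclass, field


@dataclass
class CalcError:
    """Hypothetical error record attached to a result."""

    exposure_id: str
    message: str


@dataclass
class Result:
    """Hypothetical result bundle carrying both valid rows and errors."""

    rows: list = field(default_factory=list)
    errors: list = field(default_factory=list)

    @property
    def has_errors(self) -> bool:
        return bool(self.errors)


def run(exposures: list) -> Result:
    """Process every exposure, recording invalid ones instead of raising."""
    result = Result()
    for exposure in exposures:
        if not 0.0 < exposure["pd"] <= 1.0:
            result.errors.append(
                CalcError(exposure["id"], f"Invalid PD: {exposure['pd']}")
            )
            continue  # keep going so later exposures are still processed
        result.rows.append(exposure)
    return result


result = run([{"id": "E1", "pd": 0.01}, {"id": "E2", "pd": -0.01}])
```

With this design a single bad record does not hide problems in the rest of the portfolio, and tests can assert on the full list of reported errors.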
Test Markers¶
Markers are configured in pyproject.toml:
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --tb=short --benchmark-disable -m 'not slow'"
markers = [
"benchmark: mark test as a benchmark (deselect with --benchmark-skip)",
"slow: mark test as slow (10M+ scale, may take several minutes)",
]
Usage:
Run by marker:
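A minimal sketch of how the slow marker registered above might be applied and selected. The test name is hypothetical; the marker name and the default deselection come from the configuration shown earlier.

```python
import pytest


@pytest.mark.slow
def test_pipeline_at_10m_scale():
    """Hypothetical slow test; deselected by default via -m 'not slow'."""
    ...


# Marks applied by decorator are stored on the function's `pytestmark` list.
applied_marks = [mark.name for mark in test_pipeline_at_10m_scale.pytestmark]

# Selecting by marker from the command line:
#   uv run pytest -m slow          # only tests marked slow
#   uv run pytest -m "not slow"    # everything except slow tests
```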
Coverage¶
Generate Coverage Report¶
# Terminal report
uv run pytest --cov=src/rwa_calc
# HTML report
uv run pytest --cov=src/rwa_calc --cov-report=html
open htmlcov/index.html
Coverage Configuration¶
# pyproject.toml
[tool.coverage.run]
source = ["src/rwa_calc"]
branch = true
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"if TYPE_CHECKING:",
"raise NotImplementedError",
]
Next Steps¶
- Specifications - Regulatory specifications and scenarios
- Adding Features - Extending the calculator
- Code Style - Coding conventions
- Architecture - System design