Testing Guide

This guide covers the testing approach, test organisation, and how to write effective tests. The test suite currently contains ~2,065 tests across unit, acceptance, contract, integration, and benchmark categories.

| Category | Tests | Description |
|---|---|---|
| Unit | 1,537 | Component tests in isolation |
| Acceptance | 275 | End-to-end regulatory scenarios (CRR + Basel 3.1 + comparison) |
| Contract | 125 | Protocol/interface compliance |
| Benchmark | 27–34 | Performance at various scales (7 slow tests deselected by default) |
| Integration | 101 | Cross-component integration |
| Total | ~2,065 | |

Test Organisation

tests/
├── unit/                    # Component unit tests (1,537 tests)
│   ├── crr/                 # CRR-specific tests
│   ├── crm/                 # Credit risk mitigation tests
│   ├── irb/                 # IRB-specific tests
│   ├── basel31/             # Basel 3.1-specific tests
│   └── api/                 # API tests
├── acceptance/              # End-to-end scenario tests (275 tests)
│   ├── crr/                 # CRR framework scenarios (97 tests)
│   ├── basel31/             # Basel 3.1 framework scenarios (116 tests)
│   └── comparison/          # Cross-framework comparison tests (62 tests)
├── contracts/               # Interface compliance tests (125 tests)
├── integration/             # Cross-component integration tests (101 tests)
├── benchmarks/              # Performance tests (27–34 tests)
│   ├── data_generators.py   # Dataset generation for various scales
│   └── data/                # Cached benchmark datasets (parquet)
├── bdd/                     # BDD-style tests (step definitions)
├── fixtures/                # Test data generators (parquet fixtures)
│   ├── counterparty/        # Counterparty fixtures (5 types)
│   ├── exposures/           # Facility, loan, contingent, mapping fixtures
│   ├── collateral/          # Collateral fixtures
│   ├── guarantee/           # Guarantee fixtures
│   ├── provision/           # Provision fixtures
│   ├── ratings/             # Rating fixtures
│   ├── mapping/             # Hierarchy mapping fixtures
│   ├── fx_rates/            # FX rates fixtures
│   └── generate_all.py      # Master fixture generation script
└── expected_outputs/        # Golden files for acceptance tests
    ├── crr/                 # CRR expected RWA outputs
    └── basel31/             # Basel 3.1 expected RWA outputs

Running Tests

All Tests

# Run entire test suite (benchmark timing disabled and slow tests deselected by default)
uv run pytest

# With extra verbosity (addopts already includes -v)
uv run pytest -v

# With coverage
uv run pytest --cov=src/rwa_calc --cov-report=html

By Category

# Unit tests only
uv run pytest tests/unit

# Acceptance tests
uv run pytest tests/acceptance
uv run pytest tests/acceptance/crr
uv run pytest tests/acceptance/basel31
uv run pytest tests/acceptance/comparison

# Contract tests
uv run pytest tests/contracts

# Benchmarks (requires --benchmark-only or --benchmark-enable)
uv run pytest tests/benchmarks --benchmark-only

# Run only the slow tests (10M+ scale); -m slow overrides the default "not slow" filter
uv run pytest -m slow

Specific Tests

# Run specific file
uv run pytest tests/unit/test_pipeline.py

# Run specific test
uv run pytest tests/unit/test_pipeline.py::test_crr_basic_calculation

# Run by pattern
uv run pytest -k "test_sa_"

# Stop on first failure
uv run pytest -x

# Show local variables in tracebacks
uv run pytest -l

# Run last failed tests
uv run pytest --lf

Test Categories

Unit Tests

Test individual components in isolation:

# tests/unit/test_ccf.py

import pytest
from rwa_calc.engine.ccf import get_ccf
from rwa_calc.domain.enums import RegulatoryFramework

class TestCCF:
    """Tests for credit conversion factor calculation."""

    def test_unconditionally_cancellable_crr_returns_zero(self):
        """Unconditionally cancellable commitments have 0% CCF under CRR."""
        ccf = get_ccf(
            item_type="UNDRAWN_COMMITMENT",
            is_unconditionally_cancellable=True,
            original_maturity_years=5,
            framework=RegulatoryFramework.CRR,
        )
        assert ccf == 0.0

    def test_unconditionally_cancellable_basel31_returns_ten_percent(self):
        """Unconditionally cancellable has 10% CCF under Basel 3.1."""
        ccf = get_ccf(
            item_type="UNDRAWN_COMMITMENT",
            is_unconditionally_cancellable=True,
            original_maturity_years=5,
            framework=RegulatoryFramework.BASEL_3_1,
        )
        assert ccf == pytest.approx(0.10)

Contract Tests

Test interface compliance:

# tests/contracts/test_calculator_protocol.py

import pytest
from rwa_calc.contracts.protocols import SACalculatorProtocol
from rwa_calc.engine.sa.calculator import SACalculator

class TestSACalculatorProtocol:
    """Verify SACalculator implements protocol correctly."""

    def test_implements_protocol(self):
        """SACalculator should implement SACalculatorProtocol."""
        calculator = SACalculator()
        assert isinstance(calculator, SACalculatorProtocol)

    def test_calculate_returns_result_bundle(self, sample_exposures, config):
        """Calculate should return SAResultBundle."""
        calculator = SACalculator()
        result = calculator.calculate(sample_exposures, config)
        assert hasattr(result, "data")
        assert hasattr(result, "errors")

Acceptance Tests

End-to-end tests that run fixture data through the full production pipeline and compare results against pre-calculated expected outputs (golden files). There are three suites:

CRR scenarios (97 tests across 9 files):

| File | Tests | Covers |
|---|---|---|
| `test_scenario_crr_a_sa.py` | 14 | Standardised Approach risk weights |
| `test_scenario_crr_b_firb.py` | 13 | Foundation IRB |
| `test_scenario_crr_c_airb.py` | 7 | Advanced IRB |
| `test_scenario_crr_d_crm.py` | 9 | Credit Risk Mitigation |
| `test_scenario_crr_e_slotting.py` | 9 | Specialised Lending Slotting |
| `test_scenario_crr_f_supporting_factors.py` | 15 | SME/Infrastructure factors |
| `test_scenario_crr_g_provisions.py` | 17 | Provision resolution |
| `test_scenario_crr_h_complex.py` | 4 | Complex/combined scenarios |
| `test_scenario_crr_i_defaulted.py` | 9 | Defaulted exposures |

Basel 3.1 scenarios (116 tests across 9 files):

| File | Tests | Covers |
|---|---|---|
| `test_scenario_b31_a_sa.py` | 14 | SA risk weights (PRA PS1/26) |
| `test_scenario_b31_b_firb.py` | 16 | Foundation IRB |
| `test_scenario_b31_c_airb.py` | 13 | Advanced IRB |
| `test_scenario_b31_d_crm.py` | 15 | Credit Risk Mitigation |
| `test_scenario_b31_d7_parameter_substitution.py` | 5 | IRB parameter substitution |
| `test_scenario_b31_e_slotting.py` | 13 | Specialised Lending Slotting |
| `test_scenario_b31_f_output_floor.py` | 6 | Output floor (72.5%) |
| `test_scenario_b31_g_provisions.py` | 24 | Provision resolution |
| `test_scenario_b31_h_complex.py` | 10 | Complex/combined scenarios |

Comparison tests (62 tests) validate that CRR and Basel 3.1 results relate to each other correctly (e.g. the output floor binds when 72.5% of SA RWA exceeds IRB RWA).
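
As a rough illustration of the output-floor relationship that the comparison suite checks, the sketch below applies a 72.5% floor to a pair of RWA totals. The function name `apply_output_floor` and the idea of passing two pre-aggregated totals are assumptions made for illustration, not the calculator's actual API, and transitional floor percentages are ignored.

```python
OUTPUT_FLOOR = 0.725  # fully phased-in output floor percentage


def apply_output_floor(
    irb_rwa: float, sa_rwa: float, floor: float = OUTPUT_FLOOR
) -> tuple[float, bool]:
    """Return (floored RWA, whether the floor binds). Hypothetical helper."""
    floor_rwa = floor * sa_rwa
    binds = floor_rwa > irb_rwa
    return (floor_rwa if binds else irb_rwa), binds


# Floor binds: 72.5% of the SA figure is higher than the modelled figure.
floored, binds = apply_output_floor(irb_rwa=600.0, sa_rwa=1_000.0)
assert binds and abs(floored - 725.0) < 1e-9

# Floor does not bind: the modelled figure already exceeds 72.5% of SA.
floored, binds = apply_output_floor(irb_rwa=800.0, sa_rwa=1_000.0)
assert not binds and floored == 800.0
```

A comparison test can then assert on the `binds` flag as well as on the floored amount, which makes a failure easier to diagnose than a single RWA mismatch.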

Each acceptance test looks up a specific exposure in the pipeline results and asserts against the expected output:

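# tests/acceptance/crr/test_scenario_crr_a_sa.py

from typing import Any

import polars as pl

# get_sa_result_for_exposure, assert_risk_weight_match and assert_rwa_within_tolerance
# are project test helpers; import them from wherever the suite defines them.
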
class TestCRRGroupA_StandardisedApproach:

    def test_crr_a1_uk_sovereign_zero_rw(
        self,
        sa_results_df: pl.DataFrame,
        expected_outputs_dict: dict[str, dict[str, Any]],
    ) -> None:
        """
        CRR-A1: UK Sovereign with CQS 1 should have 0% risk weight.

        Input: £1,000,000 loan to UK Government (CQS 1)
        Expected: RWA = £0 (0% RW per CRR Art. 114)
        """
        expected = expected_outputs_dict["CRR-A1"]
        result = get_sa_result_for_exposure(sa_results_df, "LOAN_SOV_UK_001")

        assert result is not None, "Exposure LOAN_SOV_UK_001 not found in SA results"
        assert_risk_weight_match(
            result["risk_weight"], expected["risk_weight"], scenario_id="CRR-A1"
        )
        assert_rwa_within_tolerance(
            result["rwa_post_factor"], expected["rwa_after_sf"], scenario_id="CRR-A1"
        )

The session-scoped fixtures (sa_results_df, expected_outputs_dict, pipeline_results, etc.) are defined in each suite's conftest.py. The pipeline runs once per session and results are shared across all tests.
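
The assertion helpers used in the test above (`assert_risk_weight_match`, `assert_rwa_within_tolerance`) are not shown in this guide. The following is a minimal sketch of what a tolerance helper of this kind might look like. Only the two positional values and the `scenario_id` keyword come from the call above; the tolerance parameters and their defaults are assumptions.

```python
import math


def assert_rwa_within_tolerance(
    actual: float,
    expected: float,
    *,
    scenario_id: str,
    rel_tol: float = 1e-4,  # assumed relative tolerance
    abs_tol: float = 0.01,  # assumed absolute tolerance, needed when expected is 0
) -> None:
    """Fail with a scenario-tagged message if actual RWA differs from the golden value."""
    if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
        raise AssertionError(
            f"{scenario_id}: RWA {actual:,.2f} != expected {expected:,.2f} "
            f"(rel_tol={rel_tol}, abs_tol={abs_tol})"
        )


# A zero-weighted exposure: only the absolute tolerance can absorb rounding noise,
# because a relative tolerance of an expected value of 0 is itself 0.
assert_rwa_within_tolerance(0.0000001, 0.0, scenario_id="CRR-A1")
```

Including the scenario ID in the failure message means a failing assertion points straight at the golden-file record involved.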

Benchmark Tests

Performance tests at various scales. See Benchmark Tests for full details.

# Run benchmarks (uses cached datasets)
uv run pytest tests/benchmarks --benchmark-only

# Force regenerate all benchmark datasets
uv run pytest tests/benchmarks --benchmark-only --benchmark-regenerate

# Force regenerate a specific scale
uv run pytest tests/benchmarks --benchmark-only --benchmark-regenerate-scale=100k

Scales: 10K (quick, ~1s), 100K (standard, ~5s), 1M (large, ~60s), 10M (production, slow marker).
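
The regeneration flags above do not appear to be built-in pytest-benchmark options, so they are presumably registered by the project's own conftest.py. A hedged sketch of how such flags could be declared with pytest's `pytest_addoption` hook follows. The flag names come from the commands above; the defaults, help text, and the `should_regenerate` helper are assumptions.

```python
def pytest_addoption(parser):
    """Register dataset-regeneration flags (sketch; belongs in a root-level conftest.py)."""
    parser.addoption(
        "--benchmark-regenerate",
        action="store_true",
        default=False,
        help="Rebuild every cached benchmark dataset before running.",
    )
    parser.addoption(
        "--benchmark-regenerate-scale",
        action="store",
        default=None,
        help="Rebuild only the dataset for one scale, e.g. 100k.",
    )


def should_regenerate(config, scale: str) -> bool:
    """Decide whether the cached dataset for `scale` must be rebuilt (assumed logic)."""
    if config.getoption("--benchmark-regenerate"):
        return True
    return config.getoption("--benchmark-regenerate-scale") == scale
```

A dataset fixture could call `should_regenerate(request.config, "100k")` and fall back to the cached parquet file when it returns `False`.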

Acceptance Test Datasets

Acceptance tests depend on two things: input fixture data (parquet files in tests/fixtures/) and expected outputs (golden files in tests/expected_outputs/). Understanding how to generate and maintain these is essential for working with acceptance tests.

Data flow

tests/fixtures/               tests/expected_outputs/
    ├── counterparty/*.py             ├── crr/expected_rwa_crr.json
    ├── exposures/*.py                └── basel31/expected_rwa_b31.json
    ├── collateral/*.py
    ├── ...                                     │
    │                                           │
    ▼  generate_all.py                          │
    │                                           │
tests/fixtures/                                 │
    ├── counterparty/*.parquet                  │
    ├── exposures/*.parquet                     │
    ├── ...                                     │
    │                                           │
    ▼  conftest.py: load_fixtures()             │
    │                                           │
    RawDataBundle (LazyFrames)                  │
    │                                           │
    ▼  conftest.py: PipelineOrchestrator.run()  │
    │                                           │
    AggregatedResultBundle                      │
    │                                           │
    ▼  test assertions ◄────────────────────────┘

Generating fixture data

Each subdirectory in tests/fixtures/ contains Python modules with create_*() and save_*() functions that produce parquet files. The master script generates everything in the correct order:

# Generate all fixture parquet files
uv run python tests/fixtures/generate_all.py

This produces parquet files across 8 fixture groups:

| Group | Directory | Files | Description |
|---|---|---|---|
| Counterparties | `counterparty/` | sovereign, institution, corporate, retail, specialised_lending | All counterparty types with various CQS bands |
| Mappings | `mapping/` | org_mapping, lending_mapping | Organisational and lending group hierarchies |
| Ratings | `ratings/` | ratings | External and internal ratings |
| Exposures | `exposures/` | facilities, loans, contingents, facility_mapping | All exposure types with facility-to-loan mappings |
| Collateral | `collateral/` | collateral | Financial and non-financial collateral |
| Guarantees | `guarantee/` | guarantee | Guarantee and credit protection |
| Provisions | `provision/` | provision | Specific and general provisions |
| FX Rates | `fx_rates/` | fx_rates | Currency conversion rates |

The script also runs data integrity checks (referential integrity between counterparties, exposures, collateral, etc.).
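
To make the idea of a referential integrity check concrete, here is a small plain-Python sketch that finds child rows pointing at a parent key that does not exist. The function name `find_orphans` and the row layout are hypothetical; the real script most likely performs the equivalent check on the generated Polars frames.

```python
def find_orphans(child_rows, parent_rows, key):
    """Return sorted key values present in child_rows but missing from parent_rows."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parent_keys)


counterparties = [{"counterparty_id": "C001"}, {"counterparty_id": "C002"}]
loans = [
    {"loan_id": "L1", "counterparty_id": "C001"},
    {"loan_id": "L2", "counterparty_id": "C999"},  # dangling reference
]

orphans = find_orphans(loans, counterparties, key="counterparty_id")
print(orphans)  # ['C999']
```

The same check applies to each parent/child pair, for example collateral rows against the exposures they secure.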

When to regenerate: after modifying any create_*() function in tests/fixtures/, or when adding new test scenarios that require new input data.

Expected outputs (golden files)

Expected outputs live in tests/expected_outputs/ and define the correct RWA results for each test scenario. Three formats are supported, checked in this priority order:

1. Parquet (fastest): expected_rwa_crr.parquet / expected_rwa_b31.parquet
2. CSV (fallback): expected_rwa_crr.csv / expected_rwa_b31.csv
3. JSON (source of truth): expected_rwa_crr.json / expected_rwa_b31.json
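
The priority order above can be sketched as a simple lookup that returns the first file that exists. The function name `find_expected_file` is an assumption, and the suite's real loader may differ in detail.

```python
from pathlib import Path

# Priority order described above: parquet first, then CSV, then JSON.
FORMAT_PRIORITY = (".parquet", ".csv", ".json")


def find_expected_file(directory: Path, stem: str) -> Path:
    """Return the highest-priority golden file that exists for `stem`."""
    for suffix in FORMAT_PRIORITY:
        candidate = directory / f"{stem}{suffix}"
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"No expected-output file for {stem!r} in {directory}")
```

For example, `find_expected_file(Path("tests/expected_outputs/crr"), "expected_rwa_crr")` would return the parquet file when one is present. One consequence of this ordering is that a stale parquet or CSV file takes precedence over a freshly edited JSON file until the derived formats are regenerated.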

Each record has at minimum:

scenario_id — unique test identifier (e.g. "CRR-A1", "B31-B03")
scenario_group — grouping for fixture filtering (e.g. "CRR-A", "B31-D")
risk_weight, rwa_after_sf, ead — expected calculation outputs
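
A sketch of how records with these fields could be turned into a lookup keyed by `scenario_id`, similar in spirit to `expected_outputs_dict`. It assumes the JSON file holds a top-level list of records, which this guide does not confirm; `index_by_scenario` is a hypothetical name, and the sample record is illustrative rather than copied from the golden file.

```python
import json

REQUIRED_FIELDS = {"scenario_id", "scenario_group", "risk_weight", "rwa_after_sf", "ead"}


def index_by_scenario(records: list[dict]) -> dict[str, dict]:
    """Key golden records by scenario_id, rejecting incomplete or duplicate entries."""
    indexed: dict[str, dict] = {}
    for record in records:
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"Record {record.get('scenario_id')!r} lacks {sorted(missing)}")
        if record["scenario_id"] in indexed:
            raise ValueError(f"Duplicate scenario_id {record['scenario_id']!r}")
        indexed[record["scenario_id"]] = record
    return indexed


# Illustrative record only (assumed layout).
raw = """[
  {"scenario_id": "CRR-A1", "scenario_group": "CRR-A",
   "risk_weight": 0.0, "rwa_after_sf": 0.0, "ead": 1000000.0}
]"""
expected = index_by_scenario(json.loads(raw))
print(expected["CRR-A1"]["risk_weight"])  # 0.0
```

Validating the minimum fields at load time surfaces a malformed golden record once, instead of as a `KeyError` inside whichever test happens to read it first.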

The JSON file is the canonical source of truth. Parquet/CSV are derived for faster loading. When updating expected values, edit the JSON and regenerate the other formats.

How conftest.py wires it together

Each acceptance suite's conftest.py (e.g. tests/acceptance/crr/conftest.py) provides session-scoped fixtures:

load_test_fixtures — calls load_fixtures() from workbooks/shared/fixture_loader.py, which reads all parquet files into a FixtureData container of LazyFrames
raw_data_bundle — assembles the FixtureData into a RawDataBundle (the pipeline's input type)
pipeline_results — runs PipelineOrchestrator().run_with_data(bundle, config) once per session
sa_results_df / irb_results_df / slotting_results_df — collected DataFrames for each approach
expected_outputs_df / expected_outputs_dict — loaded golden file data for assertions

Different configs are used for different scenario groups (SA-only, full IRB, slotting permissions, etc.).

Adding a new acceptance test scenario

1. Add input data in the appropriate tests/fixtures/ module (e.g. a new counterparty in corporate.py, a new loan in loans.py)
2. Regenerate fixtures: uv run python tests/fixtures/generate_all.py
3. Add expected outputs to the relevant JSON golden file in tests/expected_outputs/
4. Write the test in the appropriate test_scenario_*.py file, using conftest fixtures
5. Run and verify: uv run pytest tests/acceptance/crr/test_scenario_crr_a_sa.py -v

Test Fixtures

Unit test fixtures

Unit tests use inline Polars frames or @pytest.fixture functions:

import polars as pl
import pytest

@pytest.fixture
def sample_counterparty():
    """Single corporate counterparty, returned as a LazyFrame."""
    return pl.DataFrame({
        "counterparty_id": ["C001"],
        "counterparty_name": ["Acme Corp"],
        "counterparty_type": ["CORPORATE"],
        "country_code": ["GB"],
        "annual_turnover": [30_000_000.0],
    }).lazy()

Parametrized fixtures

@pytest.fixture(params=[
    ("CQS_1", 0.20),
    ("CQS_2", 0.50),
    ("CQS_3", 0.75),
    ("CQS_4", 1.00),
    ("UNRATED", 1.00),
])
def corporate_risk_weight_case(request):
    """Parametrized corporate risk weight cases (CQS 3 at 75% is the Basel 3.1 value; CRR assigns 100%)."""
    cqs, expected_rw = request.param
    return {"cqs": cqs, "expected_risk_weight": expected_rw}

Configuration fixtures

import pytest
from datetime import date
from rwa_calc.contracts.config import CalculationConfig

@pytest.fixture
def crr_config():
    """Standard CRR configuration."""
    return CalculationConfig.crr(reporting_date=date(2026, 12, 31))

@pytest.fixture
def basel31_config():
    """Standard Basel 3.1 configuration."""
    return CalculationConfig.basel_3_1(reporting_date=date(2027, 1, 1))

@pytest.fixture(params=["crr", "basel31"])
def both_frameworks(request, crr_config, basel31_config):
    """Run test under both frameworks."""
    if request.param == "crr":
        return crr_config
    return basel31_config

Writing Effective Tests

Test Naming

Use descriptive names that explain what is being tested, under what conditions, and the expected outcome:

# Good
def test_sme_factor_tiered_calculation_exposure_above_threshold_returns_blended_factor():
    ...

# Bad
def test_sme():
    ...

Test Structure (AAA Pattern)

def test_irb_capital_requirement_calculation():
    """Test IRB K formula calculation."""
    # Arrange
    pd = 0.01
    lgd = 0.45
    correlation = 0.20

    # Act
    k = calculate_k(pd, lgd, correlation)

    # Assert
    assert k == pytest.approx(0.0445, rel=0.01)

Assertions

# Exact equality
assert result == expected

# Approximate equality
assert result == pytest.approx(expected, rel=0.01)  # 1% tolerance
assert result == pytest.approx(expected, abs=0.001)  # Absolute tolerance

# Collections
assert set(result) == set(expected)
assert result in expected_values

# DataFrame assertions
assert len(df) == expected_count
assert df["column"].sum() == expected_sum
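
Exact equality on floating-point totals is fragile, and a relative tolerance alone is not enough when the expected value is zero (as with a 0% risk-weighted exposure). The short sketch below demonstrates both points with `pytest.approx`; the numbers are arbitrary illustrations, not project data.

```python
import pytest

# Summing floats accumulates rounding error, so exact equality can fail unexpectedly.
rwa_values = [0.1, 0.2, 0.3]
total = sum(rwa_values)

assert total != 0.6                 # binary floating point gives a value just above 0.6
assert total == pytest.approx(0.6)  # default tolerances absorb the rounding error

# A relative tolerance scales with the expected value, so it collapses when expected is 0.
assert 0.0005 != pytest.approx(0.0, rel=0.01)
assert 0.0005 == pytest.approx(0.0, abs=0.001)
```

For aggregated monetary values, an explicit `abs=` tolerance is usually the safer choice whenever zero is a legitimate expected result.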

Testing Errors

def test_invalid_pd_raises_error():
    """Negative PD should raise ValueError."""
    with pytest.raises(ValueError, match="PD must be positive"):
        calculate_k(pd=-0.01, lgd=0.45, correlation=0.20)

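def test_calculation_accumulates_errors(pipeline, invalid_data, config):
    """Invalid exposures should accumulate errors rather than raise."""
    # pipeline, invalid_data and config are assumed to be fixtures defined elsewhere.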
    result = pipeline.run_with_data(invalid_data, config)
    assert result.has_errors
    assert any("Invalid PD" in e.message for e in result.errors)
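
The two tests above exercise different error styles: a low-level function that raises immediately, and a pipeline that keeps going and records problems on its result. A minimal sketch of the accumulating style follows, using hypothetical `CalcError` and `ResultBundle` classes. Only the `has_errors`, `errors`, and `message` names are taken from the test above; the rest is assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CalcError:
    message: str


@dataclass
class ResultBundle:
    data: list[dict] = field(default_factory=list)
    errors: list[CalcError] = field(default_factory=list)

    @property
    def has_errors(self) -> bool:
        return bool(self.errors)


def run(exposures: list[dict]) -> ResultBundle:
    """Process every row, recording problems instead of stopping at the first one."""
    result = ResultBundle()
    for row in exposures:
        if not 0.0 <= row["pd"] <= 1.0:
            result.errors.append(CalcError(f"Invalid PD for {row['id']}: {row['pd']}"))
            continue
        result.data.append(row)
    return result


result = run([{"id": "E1", "pd": 0.01}, {"id": "E2", "pd": -0.5}, {"id": "E3", "pd": 1.2}])
assert result.has_errors
assert [e.message for e in result.errors] == [
    "Invalid PD for E2: -0.5",
    "Invalid PD for E3: 1.2",
]
assert len(result.data) == 1  # the valid row is still processed
```

Tests for this style should assert on both the recorded errors and the rows that were still processed, so that a regression which silently drops valid data is caught too.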

Test Markers

Markers are configured in pyproject.toml:

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --tb=short --benchmark-disable -m 'not slow'"
markers = [
    "benchmark: mark test as a benchmark (deselect with --benchmark-skip)",
    "slow: mark test as slow (10M+ scale, may take several minutes)",
]

Usage:

@pytest.mark.slow
def test_large_portfolio_calculation():
    ...

Run by marker:

uv run pytest -m "not slow"    # Default — excludes slow tests
uv run pytest -m slow           # Only slow tests

Coverage

Generate Coverage Report

# Terminal report
uv run pytest --cov=src/rwa_calc

# HTML report
uv run pytest --cov=src/rwa_calc --cov-report=html
open htmlcov/index.html  # macOS; use xdg-open on Linux

Coverage Configuration

# pyproject.toml
[tool.coverage.run]
source = ["src/rwa_calc"]
branch = true

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "raise NotImplementedError",
]

Next Steps

Specifications - Regulatory specifications and scenarios
Adding Features - Extending the calculator
Code Style - Coding conventions
Architecture - System design