RiskLab 01: Building the Market Data Core

🔗 GitHub Repository: prashant-fintech/risklab
🚀 Live Demo: risklab.streamlit.app

In quantitative finance, a risk engine is only as good as the data feeding it. Before we can calculate Value at Risk (VaR) or run stress tests, we need a robust, standardized way to ingest, align, and clean financial time series.

Post 01 establishes the Market Data Core for RiskLab. Instead of writing ad-hoc pandas scripts for every new model, we have built a reusable “kernel” that enforces strict data contracts and handles the messy reality of financial data—missing observations, different time zones, and market holidays.

1. The “Contracts First” Approach

We use Pydantic to define strict interfaces for all data operations. This ensures that every function in the pipeline—whether it’s calculating returns or resampling prices—knows exactly what arguments to expect and validates them at runtime.

Instead of passing loose dictionaries or relying on default arguments hidden in functions, we define explicit Specifications:

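# from risklab_core/contracts/market_data.py
from typing import Literal

from pydantic import Field

# RiskLabModel is RiskLab's base model class (its import is omitted in this excerpt).

class ReturnsSpec(RiskLabModel):
    method: Literal["simple", "log"] = "simple"
    dropna: bool = True

class OutlierSpec(RiskLabModel):
    method: Literal[None, "winsorize", "clip"] = None  # None disables outlier handling
    lower_q: float = Field(0.01, ge=0.0, le=0.5)  # lower quantile, constrained to [0, 0.5]
    upper_q: float = Field(0.99, ge=0.5, le=1.0)  # upper quantile, constrained to [0.5, 1]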

This design allows us to save these configurations as JSON/YAML later, making the entire risk pipeline reproducible and serializable.
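As a rough illustration of that round trip, the sketch below rebuilds the two specs on top of plain pydantic.BaseModel (an assumption standing in for RiskLabModel, whose definition is not shown in this post). It rejects an out-of-range quantile at construction time and serializes a spec to JSON with the standard library only. The JSON handling here is illustrative and is not necessarily how RiskLab persists its configurations.

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


# Assumption: plain BaseModel stands in for the project's RiskLabModel.
class ReturnsSpec(BaseModel):
    method: Literal["simple", "log"] = "simple"
    dropna: bool = True


class OutlierSpec(BaseModel):
    method: Optional[Literal["winsorize", "clip"]] = None
    lower_q: float = Field(0.01, ge=0.0, le=0.5)
    upper_q: float = Field(0.99, ge=0.5, le=1.0)


# Invalid arguments fail when the spec is built, not deep inside the pipeline.
try:
    OutlierSpec(method="winsorize", lower_q=0.9)  # violates le=0.5
    rejected = False
except ValidationError:
    rejected = True

# Round trip through JSON; dict(model) yields the field/value pairs.
spec = ReturnsSpec(method="log")
payload = json.dumps(dict(spec))
restored = ReturnsSpec(**json.loads(payload))
```

Because the restored object compares equal to the original, a saved configuration can be reloaded and passed to the same functions without any translation layer.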

2. Core Transforms: From Prices to Returns

The heart of the module is transforms.py. It provides the foundational math for converting raw asset prices into the return series required for volatility modeling.

Return Calculation

We support both Simple Returns ($P_t / P_{t-1} - 1$) and Log Returns ($\ln(P_t / P_{t-1})$). The implementation is vectorized with pandas and NumPy, and the return convention is selected through a validated ReturnsSpec:

# from risklab_core/market_data/transforms.py
import numpy as np
import pandas as pd

def to_returns(prices: pd.DataFrame, spec: ReturnsSpec = ReturnsSpec()) -> pd.DataFrame:
    if spec.method == "simple":
        returns = prices.pct_change()    # P_t / P_{t-1} - 1
    elif spec.method == "log":
        returns = np.log(prices).diff()  # ln(P_t / P_{t-1})
    # ... handles dropna logic automatically
    return returns
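To make the two conventions concrete, here is a small worked example on invented prices for a made-up ticker. It computes both return types directly from the formulas above (not through to_returns) and checks two standard identities: a log return equals ln(1 + simple return), and log returns sum over time to ln(P_end / P_start).

```python
import numpy as np
import pandas as pd

# Hypothetical prices for one made-up ticker.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

simple = prices / prices.shift(1) - 1  # P_t / P_{t-1} - 1
log = np.log(prices).diff()            # ln(P_t / P_{t-1})

# The two conventions are linked by log = ln(1 + simple).
linked = np.allclose(np.log1p(simple.dropna()), log.dropna())

# Log returns add up over time: their sum is ln(P_end / P_start).
total_log = log["AAA"].sum()  # the leading NaN is skipped by sum()
direct = np.log(prices["AAA"].iloc[-1] / prices["AAA"].iloc[0])
```

The additivity of log returns is the usual reason to prefer them when aggregating over multi-day horizons, while simple returns aggregate naturally across assets in a portfolio.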

Asset Alignment

A common pain point in portfolio risk is “ragged” data, where Asset A trades on a day that Asset B does not. The align_assets function standardizes this: users choose between an inner join (the intersection of dates) and an outer join (the union), with a configurable fill policy (ffill or bfill) for the gaps an outer join leaves behind.
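The post does not show align_assets itself, so the following is only a minimal sketch of the idea. The function name align_assets_sketch, its parameters, and the sample data are all hypothetical; the real signature in RiskLab may differ.

```python
from typing import Dict, Optional

import pandas as pd


def align_assets_sketch(
    series: Dict[str, pd.Series],
    how: str = "inner",
    fill: Optional[str] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in for align_assets: join on dates, then fill gaps."""
    if how not in ("inner", "outer"):
        raise ValueError("how must be 'inner' or 'outer'")
    # Dict keys become column names; join controls intersection vs. union.
    frame = pd.concat(series, axis=1, join=how).sort_index()
    if fill == "ffill":
        frame = frame.ffill()
    elif fill == "bfill":
        frame = frame.bfill()
    elif fill is not None:
        raise ValueError("fill must be 'ffill', 'bfill', or None")
    return frame


# Two made-up assets whose trading dates only partly overlap.
a = pd.Series([10.0, 11.0, 12.0],
              index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]))
b = pd.Series([20.0, 21.0, 22.0],
              index=pd.to_datetime(["2024-01-02", "2024-01-04", "2024-01-05"]))

inner = align_assets_sketch({"A": a, "B": b}, how="inner")
outer = align_assets_sketch({"A": a, "B": b}, how="outer", fill="ffill")
```

One design consequence worth keeping in mind: forward-filled prices produce zero returns on the filled days, which lowers measured volatility, so the choice of join and fill policy feeds directly into downstream risk numbers.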

3. Handling Outliers

Real market data is noisy. A single bad tick can blow up a covariance matrix. We implemented a handle_outliers utility that supports:

Winsorization: Capping extreme values at chosen percentiles (e.g., the 1st and 99th) rather than removing them.
Clipping: Applying hard limits to returns (e.g., ±10%).

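# from risklab_core/market_data/outliers.py
import pandas as pd

def winsorize(df: pd.DataFrame, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.DataFrame:
    lo = df.quantile(lower_q)  # per-column lower bound
    hi = df.quantile(upper_q)  # per-column upper bound
    # axis=1 aligns the per-column bounds with the DataFrame's columns
    return df.clip(lower=lo, upper=hi, axis=1)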
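A quick way to see the difference between the two policies is to run them on a series with one obviously bad tick. The example below repeats the winsorize function from the post so that it runs on its own; the return series and the ±10% clip limits are made up for illustration.

```python
import numpy as np
import pandas as pd


def winsorize(df: pd.DataFrame, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.DataFrame:
    lo = df.quantile(lower_q)
    hi = df.quantile(upper_q)
    return df.clip(lower=lo, upper=hi, axis=1)


# 99 ordinary made-up returns between -1% and +1%, plus one bad tick of +50%.
returns = pd.DataFrame({"AAA": np.append(np.linspace(-0.01, 0.01, 99), 0.50)})

winsorized = winsorize(returns)                  # bounds come from the data
clipped = returns.clip(lower=-0.10, upper=0.10)  # bounds are fixed in advance

worst_before = returns["AAA"].max()
worst_winsorized = winsorized["AAA"].max()
worst_clipped = clipped["AAA"].max()
```

Winsorizing adapts to each column's own distribution, so the bad tick is pulled down close to the ordinary range, whereas clipping leaves it at the fixed limit regardless of how quiet the rest of the series is. Neither drops the row, so the index stays aligned with other assets.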

4. Calendar Logic

Exchanges are closed on weekends and holidays. To model time horizons accurately (e.g., a “10-day VaR”, which counts trading days rather than calendar days), we need to know which days are actual trading days.

We introduced a TradingCalendar abstraction and a CalendarRegistry. This allows the system to be market-aware. The base implementation includes a WeekdayCalendar (Mon-Fri) and a StaticHolidayCalendar that can ingest specific holiday sets (e.g., NYSE, LSE).

# from risklab_core/market_data/calendars/static_holidays.py

class StaticHolidayCalendar(TradingCalendar):
    def is_trading_day(self, d) -> bool:
        if d.weekday() >= 5:  # Saturday (5) or Sunday (6)
            return False
        if self.is_holiday(d):  # explicit holidays
            return False
        return True
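To show why this matters for horizon calculations, here is a self-contained sketch that steps forward a given number of trading days. The class HolidayCalendarSketch and the helper add_trading_days are hypothetical names used only for this illustration (they are not the RiskLab TradingCalendar API), and the holiday set contains a single example date.

```python
from datetime import date, timedelta
from typing import Iterable


class HolidayCalendarSketch:
    """Hypothetical minimal calendar: weekdays minus an explicit holiday set."""

    def __init__(self, holidays: Iterable[date]) -> None:
        self.holidays = frozenset(holidays)

    def is_trading_day(self, d: date) -> bool:
        return d.weekday() < 5 and d not in self.holidays


def add_trading_days(cal: HolidayCalendarSketch, start: date, n: int) -> date:
    """Return the date n trading days after start (start itself is not counted)."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if cal.is_trading_day(current):
            remaining -= 1
    return current


# One illustrative holiday: Thursday, 4 July 2024.
cal = HolidayCalendarSketch({date(2024, 7, 4)})

next_session = add_trading_days(cal, date(2024, 7, 3), 1)  # skips the holiday
horizon_end = add_trading_days(cal, date(2024, 7, 3), 10)  # end of a 10-day horizon
```

Starting from Wednesday 3 July, the next session is Friday 5 July, and the ten-trading-day horizon ends fifteen calendar days later. A calendar-day count would place the horizon end on the wrong date.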

5. Verification & Testing

We follow a strict Test-Driven Development (TDD) cycle. The unit tests in tests/unit/test_market_data.py ensure we meet our Acceptance Criteria:

Precision: Verified that log returns match manual np.log calculations.
Alignment: Verified that merging disjoint time series results in a single, monotonic index.
Resiliency: Verified that the pipeline handles NaNs and empty DataFrames gracefully.
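The actual test file is not reproduced in this post, so the following is only a sketch of what checks in that spirit might look like. The helper log_returns is a hypothetical stand-in for calling to_returns with a log ReturnsSpec, and all data are invented.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for to_returns(prices, ReturnsSpec(method="log")).
    return np.log(prices).diff().dropna()


def test_log_returns_match_manual() -> None:
    prices = pd.DataFrame({"AAA": [100.0, 105.0, 102.9]})
    got = log_returns(prices)["AAA"].to_numpy()
    expected = np.array([np.log(105.0 / 100.0), np.log(102.9 / 105.0)])
    assert np.allclose(got, expected)


def test_outer_alignment_is_monotonic() -> None:
    # Two series with no dates in common.
    a = pd.Series([1.0, 2.0], index=pd.to_datetime(["2024-01-02", "2024-01-03"]))
    b = pd.Series([3.0, 4.0], index=pd.to_datetime(["2024-01-04", "2024-01-05"]))
    merged = pd.concat({"A": a, "B": b}, axis=1, join="outer").sort_index()
    assert merged.index.is_monotonic_increasing
    assert merged.index.is_unique
    assert len(merged) == 4


def test_empty_frame_is_handled() -> None:
    empty = pd.DataFrame({"AAA": pd.Series(dtype=float)})
    assert log_returns(empty).empty


# pytest would collect the test_* functions; calling them directly also works.
for test in (test_log_returns_match_manual,
             test_outer_alignment_is_monotonic,
             test_empty_frame_is_handled):
    test()
```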

Architecture Overview

The Market Data Core follows a layered architecture:

Key Takeaways

Contracts Before Code: Using Pydantic to enforce data contracts makes the pipeline safer and more maintainable.
Standardization is Critical: Financial data is messy—building reusable utilities for alignment, outlier handling, and calendar logic saves hours of debugging.
Test Everything: TDD ensures that each component works in isolation before integrating into larger models.

Next Steps

With the foundation laid, the next phase (Post 02) will focus on Volatility Engines, which take these aligned return series and forecast risk with EWMA and GARCH models.

Ready to dive into the code? Check out the full implementation on GitHub:
→ github.com/prashant-fintech/risklab

Try it live: Explore the interactive demo:
→ risklab.streamlit.app

Questions or feedback? Feel free to open an issue or reach out!
