GitHub Repository: prashant-fintech/risklab
Live Demo: risklab.streamlit.app
In quantitative finance, a risk engine is only as good as the data feeding it. Before we can calculate Value at Risk (VaR) or run stress tests, we need a robust, standardized way to ingest, align, and clean financial time series.
Post 01 establishes the Market Data Core for RiskLab. Instead of writing ad-hoc pandas scripts for every new model, we have built a reusable “kernel” that enforces strict data contracts and handles the messy reality of financial data: missing observations, different time zones, and market holidays.
1. The “Contracts First” Approach
We use Pydantic to define strict interfaces for all data operations. This ensures that every function in the pipeline, whether it calculates returns or resamples prices, knows exactly what arguments to expect and validates them at runtime.
Instead of passing loose dictionaries or relying on default arguments hidden in functions, we define explicit Specifications:
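```python
# from risklab_core/contracts/market_data.py
from typing import Literal
from pydantic import Field
# RiskLabModel is RiskLab's shared Pydantic base class; its import is not shown in the post

class ReturnsSpec(RiskLabModel):
    method: Literal["simple", "log"] = "simple"  # which return definition to use
    dropna: bool = True  # whether to drop NaNs after computing returns

class OutlierSpec(RiskLabModel):
    method: Literal[None, "winsorize", "clip"] = None  # None disables outlier handling
    lower_q: float = Field(0.01, ge=0.0, le=0.5)  # validated to lie in [0, 0.5]
    upper_q: float = Field(0.99, ge=0.5, le=1.0)  # validated to lie in [0.5, 1]
```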
This design allows us to save these configurations as JSON/YAML later, making the entire risk pipeline reproducible and serializable.
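To illustrate the serialization claim, here is a minimal sketch using Pydantic v2's model_dump_json() and model_validate_json(). The post does not show how RiskLabModel is configured. The stand-in below is therefore an assumption: a BaseModel with extra="forbid", which rejects unknown keys. That is one way to keep stored configurations strict.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


# Assumption: a stand-in for RiskLab's base model that rejects unknown keys.
class RiskLabModel(BaseModel):
    model_config = ConfigDict(extra="forbid")


class OutlierSpec(RiskLabModel):
    method: Literal[None, "winsorize", "clip"] = None
    lower_q: float = Field(0.01, ge=0.0, le=0.5)
    upper_q: float = Field(0.99, ge=0.5, le=1.0)


# Serialize a configured spec to JSON (this string could be stored next to a run).
spec = OutlierSpec(method="winsorize", lower_q=0.05, upper_q=0.95)
payload = spec.model_dump_json()

# Rebuild an identical spec from the stored JSON.
restored = OutlierSpec.model_validate_json(payload)
assert restored == spec

# Out-of-range values are rejected when the spec is constructed.
try:
    OutlierSpec(lower_q=0.7)
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```

For YAML, spec.model_dump() returns a plain dict that a YAML library could write out. That step is left out here.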
2. Core Transforms: From Prices to Returns
The heart of the module is transforms.py. It provides the foundational math for converting raw asset prices into the return series required for volatility modeling.
Return Calculation
We support both Simple Returns ($P_t / P_{t-1} - 1$) and Log Returns ($\ln(P_t / P_{t-1})$). The implementation is vectorized using pandas for performance but guarded by our specs:
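```python
# from risklab_core/market_data/transforms.py
import numpy as np
import pandas as pd
# ReturnsSpec is defined in risklab_core/contracts/market_data.py

def to_returns(prices: pd.DataFrame, spec: ReturnsSpec = ReturnsSpec()) -> pd.DataFrame:
    if spec.method == "simple":
        returns = prices.pct_change()  # P_t / P_{t-1} - 1
    elif spec.method == "log":
        returns = np.log(prices).diff()  # ln(P_t / P_{t-1})
    # ... handles dropna logic automatically
    return returns
```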
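The post elides the dropna handling. Below is a self-contained sketch of one plausible version. The function name to_returns_sketch is hypothetical. Dropping only rows that are entirely NaN, such as the first row after differencing, is an assumption and not necessarily what RiskLab does. The sketch also checks the identity $\ln(P_t / P_{t-1}) = \ln(1 + r_t)$, where $r_t$ is the simple return. This identity is why log returns add up across periods.

```python
from typing import Literal

import numpy as np
import pandas as pd


def to_returns_sketch(
    prices: pd.DataFrame,
    method: Literal["simple", "log"] = "simple",
    dropna: bool = True,
) -> pd.DataFrame:
    """Convert a price panel into simple or log returns (hypothetical helper)."""
    if method == "simple":
        # P_t / P_{t-1} - 1, computed without any forward filling of gaps
        returns = prices / prices.shift(1) - 1.0
    elif method == "log":
        returns = np.log(prices).diff()  # ln(P_t / P_{t-1})
    else:
        raise ValueError(f"unknown return method: {method!r}")
    if dropna:
        # Assumption: drop rows where every asset is NaN (e.g. the first row).
        returns = returns.dropna(how="all")
    return returns


idx = pd.date_range("2024-01-01", periods=4, freq="B")
prices = pd.DataFrame(
    {"A": [100.0, 102.0, 101.0, 103.0], "B": [50.0, 50.5, 51.0, 50.0]}, index=idx
)

simple_r = to_returns_sketch(prices, "simple")
log_r = to_returns_sketch(prices, "log")

# Log returns equal log(1 + simple returns) and sum to the log of the total price ratio.
assert np.allclose(log_r, np.log1p(simple_r))
assert np.isclose(log_r["A"].sum(), np.log(103.0 / 100.0))
```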
Asset Alignment
A common pain point in portfolio risk is “ragged” data, where Asset A trades on a day that Asset B does not. The align_assets function standardizes this, allowing users to choose between inner joins (intersection) or outer joins (union) with configurable fill policies (ffill, bfill).
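Here is a minimal sketch of that alignment step. Its signature is an assumption and is not taken from RiskLab: align_assets_sketch takes a dict of per-asset price Series, joins their date indexes with pd.concat(..., join=how), sorts the result into a monotonic index, and optionally fills gaps. The function and parameter names are hypothetical.

```python
from typing import Literal, Optional

import pandas as pd


def align_assets_sketch(
    series: dict[str, pd.Series],
    how: Literal["inner", "outer"] = "inner",
    fill: Optional[Literal["ffill", "bfill"]] = None,
) -> pd.DataFrame:
    """Align per-asset price series on one date index (hypothetical helper)."""
    # inner keeps dates every asset traded; outer keeps dates any asset traded.
    panel = pd.concat(series, axis=1, join=how).sort_index()
    if fill == "ffill":
        panel = panel.ffill()  # carry the last observed price forward
    elif fill == "bfill":
        panel = panel.bfill()  # pull the next observed price backward
    return panel


a = pd.Series([100.0, 101.0, 102.0], index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]))
b = pd.Series([50.0, 51.0], index=pd.to_datetime(["2024-01-02", "2024-01-04"]))  # no Jan 3 price

inner = align_assets_sketch({"A": a, "B": b}, how="inner")
outer = align_assets_sketch({"A": a, "B": b}, how="outer", fill="ffill")
```

The inner join drops January 3 entirely. The outer join keeps it and forward-fills B's January 2 price into the gap. The join choice therefore decides whether a missing print removes a date or carries a stale price.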
3. Handling Outliers
Real market data is noisy. A single bad tick can blow up a covariance matrix. We implemented a handle_outliers utility that supports:
- Winsorization: Capping extreme values at chosen lower and upper percentiles (e.g., the 1st and 99th) rather than removing them.
- Clipping: Hard limits on returns (e.g., +/- 10%).
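```python
# from risklab_core/market_data/outliers.py
import pandas as pd

def winsorize(df: pd.DataFrame, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.DataFrame:
    lo = df.quantile(lower_q)  # per-column lower cutoff (Series indexed by column)
    hi = df.quantile(upper_q)  # per-column upper cutoff
    return df.clip(lower=lo, upper=hi, axis=1)  # axis=1 aligns the cutoffs with columns
```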
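The post names a handle_outliers utility but only shows winsorize. Here is a minimal sketch of a dispatcher over the two policies, under these assumptions: the name handle_outliers_sketch and the clip_limit parameter (defaulting to the ±10% example above) are hypothetical, and the settings are passed as plain arguments rather than as an OutlierSpec.

```python
from typing import Literal, Optional

import numpy as np
import pandas as pd


def winsorize(df: pd.DataFrame, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.DataFrame:
    # Same winsorize as in the post, repeated so this block runs on its own.
    lo = df.quantile(lower_q)
    hi = df.quantile(upper_q)
    return df.clip(lower=lo, upper=hi, axis=1)


def handle_outliers_sketch(
    returns: pd.DataFrame,
    method: Optional[Literal["winsorize", "clip"]] = None,
    lower_q: float = 0.01,
    upper_q: float = 0.99,
    clip_limit: float = 0.10,  # hypothetical: hard cap of +/-10% per period
) -> pd.DataFrame:
    if method is None:
        return returns  # no outlier treatment requested
    if method == "winsorize":
        return winsorize(returns, lower_q, upper_q)  # data-driven, per-column caps
    if method == "clip":
        return returns.clip(lower=-clip_limit, upper=clip_limit)  # same fixed caps for every asset
    raise ValueError(f"unknown outlier method: {method!r}")


rng = np.random.default_rng(0)
rets = pd.DataFrame({"A": rng.normal(0.0, 0.01, 250), "B": rng.normal(0.0, 0.02, 250)})
rets.iloc[100, 0] = 0.35  # inject a bad tick into asset A

clipped = handle_outliers_sketch(rets, "clip")
wins = handle_outliers_sketch(rets, "winsorize")
```

Neither policy removes rows, so the return panel keeps its shape and date alignment. Only the extreme values change.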
4. Calendar Logic
Risk doesn’t happen on weekends. To accurately model time horizons (e.g., “10-day VaR”), we need to know which days are actual trading days.
We introduced a TradingCalendar abstraction and a CalendarRegistry. This allows the system to be market-aware. The base implementation includes a WeekdayCalendar (Mon-Fri) and a StaticHolidayCalendar that can ingest specific holiday sets (e.g., NYSE, LSE).
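```python
# from risklab_core/market_data/calendars/static_holidays.py
from datetime import date
# TradingCalendar and is_holiday() are part of RiskLab but not shown in the post

class StaticHolidayCalendar(TradingCalendar):
    def is_trading_day(self, d: date) -> bool:
        if d.weekday() >= 5:  # Saturday (5) or Sunday (6)
            return False
        if self.is_holiday(d):  # explicit holiday from the ingested set
            return False
        return True
```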
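To make the "10-day" horizon concrete, here is a self-contained sketch of the calendar abstraction. Class names follow the post. The registry methods (register, get), the add_trading_days helper, and the single example holiday are assumptions for illustration and do not reflect RiskLab's actual interface.

```python
from abc import ABC, abstractmethod
from datetime import date, timedelta


class TradingCalendar(ABC):
    @abstractmethod
    def is_trading_day(self, d: date) -> bool: ...

    def add_trading_days(self, start: date, n: int) -> date:
        """Hypothetical helper: the date that is n trading days after start."""
        d = start
        while n > 0:
            d += timedelta(days=1)
            if self.is_trading_day(d):
                n -= 1
        return d


class WeekdayCalendar(TradingCalendar):
    def is_trading_day(self, d: date) -> bool:
        return d.weekday() < 5  # Monday=0 ... Friday=4


class StaticHolidayCalendar(TradingCalendar):
    def __init__(self, holidays: set[date]):
        self.holidays = frozenset(holidays)

    def is_holiday(self, d: date) -> bool:
        return d in self.holidays

    def is_trading_day(self, d: date) -> bool:
        return d.weekday() < 5 and not self.is_holiday(d)


class CalendarRegistry:
    """Hypothetical name -> calendar lookup."""

    def __init__(self) -> None:
        self._calendars: dict[str, TradingCalendar] = {}

    def register(self, name: str, cal: TradingCalendar) -> None:
        self._calendars[name] = cal

    def get(self, name: str) -> TradingCalendar:
        return self._calendars[name]


registry = CalendarRegistry()
registry.register("WEEKDAY", WeekdayCalendar())
# Example holiday set with one date (Monday 2024-01-15); a real set would be longer.
registry.register("EXAMPLE", StaticHolidayCalendar({date(2024, 1, 15)}))

start = date(2024, 1, 5)  # a Friday
print(registry.get("WEEKDAY").add_trading_days(start, 10))  # 2024-01-19
print(registry.get("EXAMPLE").add_trading_days(start, 10))  # 2024-01-22
```

A single holiday pushes the end of the 10-day window out by one business day. That is the kind of drift a weekday-only count would miss.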
5. Verification & Testing
We follow a strict Test-Driven Development (TDD) cycle. The unit tests in tests/unit/test_market_data.py ensure we meet our Acceptance Criteria:
- Precision: Verified that log returns match manual np.log calculations.
- Alignment: Verified that merging disjoint time series results in a single, monotonic index.
- Resiliency: Verified that the pipeline handles NaNs and empty DataFrames gracefully.
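The actual tests in tests/unit/test_market_data.py are not reproduced in the post. As an illustration of what checks for these three criteria could look like, here are pytest-style test functions written against plain pandas and NumPy operations. The test names and data are hypothetical.

```python
import numpy as np
import pandas as pd


def test_log_returns_match_manual_np_log():
    prices = pd.Series([100.0, 105.0, 99.0, 101.0])
    vectorized = np.log(prices).diff().dropna()  # mirrors the log branch of to_returns
    manual = [np.log(prices.iloc[i] / prices.iloc[i - 1]) for i in range(1, len(prices))]
    np.testing.assert_allclose(vectorized.to_numpy(), manual)


def test_merging_disjoint_series_gives_single_monotonic_index():
    # Two assets with no dates in common, deliberately given out of order.
    a = pd.Series([1.0, 2.0], index=pd.to_datetime(["2024-01-04", "2024-01-02"]))
    b = pd.Series([3.0, 4.0], index=pd.to_datetime(["2024-01-05", "2024-01-03"]))
    merged = pd.concat({"A": a, "B": b}, axis=1, join="outer").sort_index()
    assert merged.index.is_unique
    assert merged.index.is_monotonic_increasing
    assert len(merged) == 4


def test_empty_frame_is_handled():
    empty = pd.DataFrame(columns=["A"], dtype=float)
    returns = np.log(empty).diff()  # should not raise on an empty input
    assert returns.empty
```

Running pytest on a file containing these functions collects and executes them.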
Architecture Overview
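The Market Data Core follows a layered architecture: the Pydantic contracts in risklab_core/contracts/market_data.py define the specifications, and the modules under risklab_core/market_data/ (transforms.py, outliers.py, and calendars/) implement the operations those specifications configure.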
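To show how the layers might fit together, here is a sketch of a single function that chains alignment, a calendar filter, return calculation, and winsorization using plain pandas. The function name build_returns_panel, the order of the steps, and the holiday format (ISO date strings) are assumptions for illustration. RiskLab's own pipeline may compose these pieces differently.

```python
from typing import Literal

import numpy as np
import pandas as pd


def build_returns_panel(
    prices: dict[str, pd.Series],
    holidays: tuple[str, ...] = (),
    method: Literal["simple", "log"] = "log",
    lower_q: float = 0.01,
    upper_q: float = 0.99,
) -> pd.DataFrame:
    """Hypothetical pipeline: align -> calendar filter -> returns -> winsorize."""
    # 1. Alignment: keep only dates on which every asset has a price (inner join).
    panel = pd.concat(prices, axis=1, join="inner").sort_index()
    # 2. Calendar: drop weekends and any listed holidays.
    is_weekday = panel.index.dayofweek < 5
    is_holiday = panel.index.normalize().isin(pd.to_datetime(list(holidays)))
    panel = panel[is_weekday & ~is_holiday]
    # 3. Returns: log or simple, then drop the empty first row.
    if method == "log":
        rets = np.log(panel).diff()
    else:
        rets = panel / panel.shift(1) - 1.0
    rets = rets.dropna(how="all")
    # 4. Outliers: winsorize each column at its own quantiles.
    return rets.clip(lower=rets.quantile(lower_q), upper=rets.quantile(upper_q), axis=1)


idx = pd.bdate_range("2024-01-01", periods=30)
rng = np.random.default_rng(1)
prices = {
    "A": pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 30))), index=idx),
    "B": pd.Series(50 * np.exp(np.cumsum(rng.normal(0, 0.02, 30))), index=idx),
}
rets = build_returns_panel(prices, holidays=("2024-01-15",))
```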
Key Takeaways
- Contracts Before Code: Using Pydantic to enforce data contracts makes the pipeline safer and more maintainable.
- Standardization is Critical: Financial data is messy, so building reusable utilities for alignment, outlier handling, and calendar logic saves hours of debugging.
- Test Everything: TDD ensures that each component works in isolation before it is integrated into larger models.
Next Steps
With the foundation laid, the next phase (Post 02) will focus on Volatility Engines: using these aligned return series to forecast risk with EWMA and GARCH models.
Ready to dive into the code? Check out the full implementation on GitHub:
github.com/prashant-fintech/risklab
Try it live in the interactive demo:
risklab.streamlit.app
Questions or feedback? Feel free to open an issue or reach out!
