Volatility Analysis
Analyse realised volatility with rolling standard deviation, Average True Range, Bollinger Bands, and annualized volatility.
Volatility Analysis Framework
This notebook defines a standardized protocol for measuring price volatility in OHLCV time-series data. It covers rolling standard deviation, Average True Range, Bollinger Bands, and annualized volatility on a representative dummy dataset.
1. Dependency Installation
!pip install pandas numpy
2. Library Imports
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
3. Key Concepts
Volatility
Volatility measures how much a price moves over a given period. High volatility means the price is making large, rapid moves, usually because the market is uncertain or actively reacting to news. Low volatility means the price is moving in small, steady increments. In trading, volatility is used to size positions (larger positions in low-volatility markets, smaller ones in high-volatility markets), to set stop-losses, and to price options.
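One common way to turn "larger positions in low-volatility markets" into a number is volatility targeting: choose the exposure so that the position's expected volatility matches a fixed budget. The helper below is a hypothetical sketch under that assumption (the name `inverse_vol_position_size` and the 10% target are illustrative, not part of this notebook).

```python
def inverse_vol_position_size(capital: float, target_vol: float, asset_vol: float) -> float:
    """Hypothetical helper: notional exposure that targets a fixed volatility budget.

    capital:    account equity in currency units
    target_vol: desired annualized volatility of the position as a fraction of capital
    asset_vol:  annualized volatility of the asset, e.g. 0.40 for 40%
    """
    if asset_vol <= 0:
        raise ValueError("asset_vol must be positive")
    # Exposure * asset_vol / capital == target_vol, so exposure shrinks as volatility grows
    return capital * target_vol / asset_vol


calm = inverse_vol_position_size(100_000, 0.10, 0.20)    # quiet market: larger exposure
stormy = inverse_vol_position_size(100_000, 0.10, 0.80)  # volatile market: smaller exposure
print(calm, stormy)
```

Quadrupling the asset's volatility cuts the exposure to a quarter, so both positions carry roughly the same risk.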
Standard Deviation of Returns
The most basic volatility measure: it quantifies how spread out returns are around their average. For log returns, a standard deviation of 0.01 means returns typically deviate from their average by about 1%. It is computed on a rolling window so that it reflects recent conditions rather than the full history.
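To see why a rolling window matters, the sketch below builds a synthetic return series (an assumption: 50 calm periods followed by 50 turbulent ones, drawn from a normal distribution) and compares a single full-history standard deviation with a 20-period rolling one.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
# Synthetic log returns: 0.1% std for 50 periods, then 0.5% std for 50 periods
returns = pd.Series(np.concatenate([
    rng.normal(0.0, 0.001, 50),
    rng.normal(0.0, 0.005, 50),
]))

full_history_std = returns.std()                # one number for the whole sample
rolling_std = returns.rolling(window=20).std()  # one number per row, last 20 periods only

print(f"full-history std:                  {full_history_std:.4f}")
print(f"rolling std at row 40 (calm):      {rolling_std.iloc[40]:.4f}")
print(f"rolling std at row 99 (turbulent): {rolling_std.iloc[99]:.4f}")
```

The full-history figure blends both regimes, while the rolling value tracks whichever regime the recent window sits in.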
True Range (TR)
True Range is the largest of three measurements for each candle:
1. high − low: The range within the current candle.
2. |high − previous close|: How far the high reached above or below where the previous candle ended.
3. |low − previous close|: How far the low reached above or below where the previous candle ended.
Measurements 2 and 3 exist to capture gaps — situations where the
price opens significantly higher or lower than the previous close.
A simple high − low would miss these gaps entirely. True Range
always takes the largest of the three to ensure the full price
movement since the last close is captured.
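A two-candle example makes the gap case concrete. The prices are hypothetical: the second candle opens well above the prior close, so its own high − low understates how far price travelled.

```python
import pandas as pd

# Hypothetical candles: the second one gaps up above the previous close of 100
candles = pd.DataFrame({
    "high":  [101.0, 110.0],
    "low":   [99.0, 106.0],
    "close": [100.0, 108.0],
})

prev_close = candles["close"].shift(1)
components = pd.DataFrame({
    "high_low":        candles["high"] - candles["low"],
    "high_prev_close": (candles["high"] - prev_close).abs(),
    "low_prev_close":  (candles["low"] - prev_close).abs(),
})
# Row-wise maximum of the three measurements
candles["true_range"] = components.max(axis=1)

print(components)
print(candles["true_range"].tolist())  # [2.0, 10.0]
```

For the second candle, high − low is only 4, but |high − previous close| is 10, so True Range reports the full 10-point move since the last close.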
Average True Range (ATR)
ATR is the rolling average of True Range over N candles. It gives a smoothed, continuous estimate of how much the price typically moves per candle. A high ATR indicates a volatile period; a low ATR indicates a quiet period. ATR is used directly in trading to set stop-loss distances: placing a stop 2× ATR below entry, for example, puts the stop outside normal market noise.
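The 2× ATR stop rule is simple arithmetic. The helper below is a hypothetical sketch (its name and the `side` parameter are illustrative), using an ATR of 400, the value the dummy dataset later in this notebook produces, and an entry at 42,400.

```python
def atr_stop_loss(entry_price: float, atr: float, multiple: float = 2.0, side: str = "long") -> float:
    """Hypothetical helper: place a stop `multiple` ATRs away from the entry price."""
    if side == "long":
        return entry_price - multiple * atr  # long position: stop below entry
    if side == "short":
        return entry_price + multiple * atr  # short position: stop above entry
    raise ValueError("side must be 'long' or 'short'")


print(atr_stop_loss(42_400, 400))                # 41600.0
print(atr_stop_loss(42_400, 400, side="short"))  # 43200.0
```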
Bollinger Bands
Bollinger Bands consist of three lines plotted around the price:
- Middle Band: Rolling mean of the close price over N candles.
- Upper Band: Middle Band + (K × rolling standard deviation).
- Lower Band: Middle Band − (K × rolling standard deviation).
The bands expand when volatility is high (standard deviation is large) and contract when volatility is low. Price touching or crossing the upper band indicates the price is unusually high relative to recent history; the lower band indicates unusually low. The standard parameters are N=20 candles and K=2 standard deviations.
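As a worked example, the sketch below computes the bands by hand for the first five closes of the dummy dataset. It uses N=5 rather than 20 only because the dataset has ten rows, and it uses the sample standard deviation (`ddof=1`), which is what pandas' `rolling().std()` computes by default; the result should match row 4 of the execution output.

```python
import numpy as np

# First five closes of the dummy dataset: one 5-candle window
closes = np.array([42200, 42150, 42300, 42250, 42400], dtype=float)
k = 2.0  # number of standard deviations

middle = closes.mean()       # 42260.0
std = closes.std(ddof=1)     # sample std: sqrt(37000 / 4)
upper = middle + k * std
lower = middle - k * std

print(middle, round(std, 6), round(upper, 6), round(lower, 6))
```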
%B (Percent B)
%B measures where the current close price sits within the Bollinger Bands. It lies between 0 and 1 while the close is inside the bands and moves outside that range when the close breaks through a band:
- %B = 1.0: Close is exactly on the upper band.
- %B = 0.5: Close is exactly on the middle band.
- %B = 0.0: Close is exactly on the lower band.
- %B > 1.0: Close is above the upper band (extreme high).
- %B < 0.0: Close is below the lower band (extreme low).
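The zones listed above can be turned into labels with `np.select`. This is a hypothetical helper (the name `pct_b_zone` and the "upper half" / "lower half" labels are illustrative); NaN values from the warm-up rows are left unlabelled.

```python
import numpy as np
import pandas as pd

def pct_b_zone(pct_b: pd.Series) -> pd.Series:
    """Hypothetical helper: label each %B value by its position relative to the bands."""
    conditions = [
        pct_b > 1.0,   # above the upper band
        pct_b < 0.0,   # below the lower band
        pct_b >= 0.5,  # between the middle and upper band
    ]
    labels = ["above upper band", "below lower band", "upper half"]
    zones = np.select(conditions, labels, default="lower half")
    # Warm-up rows have NaN %B; keep them unlabelled instead of "lower half"
    return pd.Series(zones, index=pct_b.index).where(pct_b.notna())


out = pct_b_zone(pd.Series([1.2, 0.86, 0.5, 0.2, -0.1, np.nan]))
print(out.tolist())
```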
Annualized Volatility
Volatility computed on a per-minute basis is difficult to compare
across assets or timeframes. Annualized volatility scales the per-period
volatility up to a full year by multiplying by the square root of the
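number of periods in a year. For 1-minute data on a market that trades
around the clock (such as crypto), that is √(525,600 minutes/year).
Markets with fixed sessions use the number of trading periods instead,
e.g. 252 trading days for daily equity data. The result is a fraction
(0.25 means 25% per year) that can be compared directly to annual
volatility figures published for stocks, bonds, or other assets.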
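A minimal sketch of the scaling, assuming per-period volatility is measured as the standard deviation of log returns. The 0.002591 input is the per-minute value from row 5 of the execution output; the constant names are illustrative.

```python
import math

MINUTES_PER_YEAR_24_7 = 365 * 24 * 60  # 525,600: markets that never close
TRADING_DAYS_PER_YEAR = 252            # common convention for daily equity data

def annualize_vol(per_period_std: float, periods_per_year: int) -> float:
    """Scale a per-period standard deviation of log returns to an annual figure."""
    return per_period_std * math.sqrt(periods_per_year)


# Per-minute std on a 24/7 market: about 1.878, i.e. roughly 188% per year
print(annualize_vol(0.002591, MINUTES_PER_YEAR_24_7))
# Daily std of 1% on an equity: about 0.159, i.e. roughly 16% per year
print(annualize_vol(0.01, TRADING_DAYS_PER_YEAR))
```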
4. Dummy Dataset
raw_data = {
"datetime": [
"2024-01-01 00:00:00+00:00",
"2024-01-01 00:01:00+00:00",
"2024-01-01 00:02:00+00:00",
"2024-01-01 00:03:00+00:00",
"2024-01-01 00:04:00+00:00",
"2024-01-01 00:05:00+00:00",
"2024-01-01 00:06:00+00:00",
"2024-01-01 00:07:00+00:00",
"2024-01-01 00:08:00+00:00",
"2024-01-01 00:09:00+00:00",
],
"open": [42100, 42200, 42150, 42300, 42250,
42400, 42350, 42500, 42450, 42600],
"high": [42300, 42400, 42350, 42500, 42450,
42600, 42550, 42700, 42650, 42800],
"low": [41900, 42000, 41950, 42100, 42050,
42200, 42150, 42300, 42250, 42400],
"close": [42200, 42150, 42300, 42250, 42400,
42350, 42500, 42450, 42600, 42550],
"volume": [10.5, 8.2, 9.1, 11.3, 7.6,
12.4, 6.8, 13.1, 9.9, 10.2],
}
df = pd.DataFrame(raw_data)
df["datetime"] = pd.to_datetime(df["datetime"], utc=True)
print("--- Raw OHLCV Data ---")
display(df)
--- Raw OHLCV Data ---
| | datetime | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2024-01-01 00:00:00+00:00 | 42100 | 42300 | 41900 | 42200 | 10.5 |
| 1 | 2024-01-01 00:01:00+00:00 | 42200 | 42400 | 42000 | 42150 | 8.2 |
| 2 | 2024-01-01 00:02:00+00:00 | 42150 | 42350 | 41950 | 42300 | 9.1 |
| 3 | 2024-01-01 00:03:00+00:00 | 42300 | 42500 | 42100 | 42250 | 11.3 |
| 4 | 2024-01-01 00:04:00+00:00 | 42250 | 42450 | 42050 | 42400 | 7.6 |
| 5 | 2024-01-01 00:05:00+00:00 | 42400 | 42600 | 42200 | 42350 | 12.4 |
| 6 | 2024-01-01 00:06:00+00:00 | 42350 | 42550 | 42150 | 42500 | 6.8 |
| 7 | 2024-01-01 00:07:00+00:00 | 42500 | 42700 | 42300 | 42450 | 13.1 |
| 8 | 2024-01-01 00:08:00+00:00 | 42450 | 42650 | 42250 | 42600 | 9.9 |
| 9 | 2024-01-01 00:09:00+00:00 | 42600 | 42800 | 42400 | 42550 | 10.2 |
5. Volatility Function
def analyze_volatility(
df: pd.DataFrame,
rolling_window: int = 5,
bb_window: int = 5,
bb_std: float = 2.0,
periods_per_year: int = 525_600 # minutes in a year
) -> pd.DataFrame:
"""
Compute volatility indicators from OHLCV data.
Args:
df (pd.DataFrame): Cleaned OHLCV DataFrame with UTC datetime column.
rolling_window (int): Window size for rolling std and ATR.
bb_window (int): Window size for Bollinger Bands.
bb_std (float): Number of standard deviations for Bollinger Band width.
periods_per_year (int): Number of candle periods in one year for annualization.
Returns:
pd.DataFrame: Input DataFrame extended with computed volatility metrics.
"""
df = df.copy().sort_values("datetime", ignore_index=True)
# Log return: required for standard deviation volatility calculation
log_return = np.log(df["close"] / df["close"].shift(1))
# Rolling standard deviation of log returns
df["rolling_std"] = log_return.rolling(window=rolling_window).std()
# Annualized volatility: scales per-minute std to a full-year equivalent
df["annualized_vol"] = df["rolling_std"] * np.sqrt(periods_per_year)
# True Range components
hl = df["high"] - df["low"] # high minus low
hpc = (df["high"] - df["close"].shift(1)).abs() # high minus previous close
lpc = (df["low"] - df["close"].shift(1)).abs() # low minus previous close
# True Range: maximum of the three components per candle
df["true_range"] = pd.concat([hl, hpc, lpc], axis=1).max(axis=1)
# Average True Range: rolling mean of True Range
df["atr"] = df["true_range"].rolling(window=rolling_window).mean()
# Bollinger Bands
bb_mean = df["close"].rolling(window=bb_window).mean()
bb_std_val = df["close"].rolling(window=bb_window).std()
df["bb_upper"] = bb_mean + (bb_std * bb_std_val)
df["bb_middle"] = bb_mean
df["bb_lower"] = bb_mean - (bb_std * bb_std_val)
# %B: position of close within Bollinger Bands (0 = lower, 1 = upper)
df["bb_pct_b"] = (df["close"] - df["bb_lower"]) / (df["bb_upper"] - df["bb_lower"])
# Bandwidth: width of the bands relative to the middle band
# High bandwidth = high volatility; low bandwidth = low volatility (squeeze)
df["bb_bandwidth"] = (df["bb_upper"] - df["bb_lower"]) / df["bb_middle"]
return df[[
"datetime", "open", "high", "low", "close", "volume",
"rolling_std", "annualized_vol",
"true_range", "atr",
"bb_upper", "bb_middle", "bb_lower", "bb_pct_b", "bb_bandwidth",
]]
Code Logic
Log return
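np.log(df["close"] / df["close"].shift(1)): Computed internally as the input to the standard deviation. Log returns are used rather than simple returns because they are additive over time: the log return across several candles is exactly the sum of the per-candle log returns, which makes them the natural input for the square-root-of-time scaling used for annualization. See Notebook 15 for the full definition.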
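The additivity property can be checked directly on the dummy closes: per-candle log returns sum to the total log return, while simple returns only reproduce the total when compounded.

```python
import numpy as np

# Closes from the dummy dataset
closes = np.array([42200, 42150, 42300, 42250, 42400,
                   42350, 42500, 42450, 42600, 42550], dtype=float)

log_returns = np.diff(np.log(closes))  # same values as np.log(close / close.shift(1))
simple_returns = closes[1:] / closes[:-1] - 1

total_log = np.log(closes[-1] / closes[0])
total_simple = closes[-1] / closes[0] - 1

# Log returns add up to the total log return
print(np.isclose(log_returns.sum(), total_log))                    # True
# Simple returns do not add; they compound
print(np.isclose(simple_returns.sum(), total_simple))              # False
print(np.isclose(np.prod(1 + simple_returns) - 1, total_simple))   # True
```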
Rolling standard deviation
log_return.rolling(window=rolling_window).std(): At each row, computes the standard deviation of the last rolling_window log returns. Because the first log return is itself NaN (there is no previous close), the first rolling_window rows produce NaN, one more than a rolling statistic on the close itself. A larger value means returns have been more spread out (more volatile) over the recent window.
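The warm-up can be verified on the dummy closes. The sketch also shows `min_periods`, a standard `rolling()` argument, as one way to obtain earlier (noisier) estimates if that trade-off is acceptable.

```python
import numpy as np
import pandas as pd

closes = pd.Series([42200, 42150, 42300, 42250, 42400,
                    42350, 42500, 42450, 42600, 42550], dtype=float)
window = 5

log_return = np.log(closes / closes.shift(1))  # row 0 is NaN: no previous close
rolling_std = log_return.rolling(window=window).std()

# The first valid value needs `window` real returns, i.e. rows 1..window
print(rolling_std.isna().sum())         # 5
print(rolling_std.first_valid_index())  # 5

# min_periods=2: a sample std is produced as soon as two returns are available
early = log_return.rolling(window=window, min_periods=2).std()
print(early.first_valid_index())        # 2
```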
Annualized volatility
df["rolling_std"] * np.sqrt(periods_per_year): If per-period log returns are independent with constant variance, their variances add across periods, so the standard deviation grows with the square root of the number of periods. Multiplying by √(periods per year) therefore converts per-minute volatility to its annual equivalent. See Section 3 for the full definition.
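The square-root-of-time rule can be checked by simulation. This sketch assumes independent, normally distributed per-minute log returns with a 0.1% standard deviation (synthetic data, not market data) and aggregates them into hourly returns.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
minutes_per_hour = 60
# 20,000 simulated hours, each made of 60 i.i.d. per-minute log returns
minute_returns = rng.normal(loc=0.0, scale=0.001, size=(20_000, minutes_per_hour))

# Log returns add over time, so each hour's return is the sum of its minute returns
hourly_returns = minute_returns.sum(axis=1)

ratio = hourly_returns.std() / minute_returns.std()
print(f"observed ratio: {ratio:.3f}, sqrt(60): {np.sqrt(minutes_per_hour):.3f}")
```

Real returns show volatility clustering and some autocorrelation, so the rule is an approximation outside this idealised setting.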
True Range
pd.concat([hl, hpc, lpc], axis=1).max(axis=1): Stacks the three True Range components as columns and takes the row-wise maximum. abs() is applied to components 2 and 3 because the gap can be in either direction; upward and downward gaps both represent movement that must be captured. On the first row there is no previous close, so hpc and lpc are NaN; max(axis=1) skips NaN by default, so that row's True Range falls back to high − low. See Section 3 for the full definition.
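The NaN handling is easy to get wrong when switching libraries. The sketch below compares pandas' row-wise `max` (which skips NaN) with NumPy's `np.maximum` (which propagates it) on the first two dummy candles.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "high":  [42300.0, 42400.0],
    "low":   [41900.0, 42000.0],
    "close": [42200.0, 42150.0],
})
prev_close = df["close"].shift(1)
hl = df["high"] - df["low"]
hpc = (df["high"] - prev_close).abs()
lpc = (df["low"] - prev_close).abs()

# pandas skips NaN by default, so row 0 falls back to high - low
tr_pandas = pd.concat([hl, hpc, lpc], axis=1).max(axis=1)
# np.maximum propagates NaN, so row 0 becomes missing (np.fmax would ignore NaN instead)
tr_numpy = np.maximum(np.maximum(hl, hpc), lpc)

print(tr_pandas.tolist())  # [400.0, 400.0]
print(tr_numpy.tolist())   # [nan, 400.0]
```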
ATR
df["true_range"].rolling(window=rolling_window).mean(): Rolling average of True Range. Smooths candle-to-candle spikes to produce a stable estimate of typical price movement per period.
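The notebook uses a simple rolling mean. Wilder's original ATR instead uses recursive smoothing, ATR_t = ATR_{t-1} + (TR_t − ATR_{t-1}) / N, which pandas can express as an exponentially weighted mean with `alpha = 1/N` and `adjust=False`. This is a sketch of that swapped-in variant; it seeds with the first TR value, whereas Wilder seeds with the average of the first N values, so early rows differ until the seed's influence decays.

```python
import pandas as pd

def wilder_atr(true_range: pd.Series, n: int = 14) -> pd.Series:
    """Sketch of Wilder-style ATR via ewm(alpha=1/n, adjust=False).

    Assumption: seeded with the first True Range value rather than an
    n-period simple average, so the earliest values are approximate.
    """
    return true_range.ewm(alpha=1.0 / n, adjust=False).mean()


tr = pd.Series([400.0, 400.0, 800.0, 400.0, 400.0])  # one hypothetical spike
sma_atr = tr.rolling(window=3).mean()
rma_atr = wilder_atr(tr, n=3)
print(pd.DataFrame({"tr": tr, "sma_atr": sma_atr, "wilder_atr": rma_atr}))
```

The simple average holds the spike at full weight for N rows and then drops it abruptly; the Wilder version lets it decay gradually.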
Bollinger Bands
bb_mean: Rolling mean of the close, computed once and reused for the middle, upper, and lower bands to avoid redundant computation.
bb_std_val: Rolling standard deviation of the close. pandas computes the sample standard deviation (ddof=1) by default.
df["bb_upper"] / df["bb_lower"]: Middle band ± (K × rolling standard deviation). The bands widen when bb_std_val is large (volatile period) and narrow when it is small (quiet period). See Section 3 for the full definition.
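Bollinger's own description is commonly cited as using the population standard deviation (ddof=0), so values from other charting tools may differ slightly from this notebook's sample-based bands. The sketch below compares both on the dummy closes; population std equals sample std × √((N−1)/N), a noticeable gap at N=5 and a small one at N=20.

```python
import pandas as pd

closes = pd.Series([42200, 42150, 42300, 42250, 42400,
                    42350, 42500, 42450, 42600, 42550], dtype=float)
window, k = 5, 2.0

mean = closes.rolling(window).mean()
sample_std = closes.rolling(window).std()            # ddof=1, as in analyze_volatility
population_std = closes.rolling(window).std(ddof=0)  # ddof=0, population convention

bands = pd.DataFrame({
    "upper_sample": mean + k * sample_std,
    "upper_population": mean + k * population_std,
})
print(bands.dropna().round(4))
```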
%B
(df["close"] - df["bb_lower"]) / (df["bb_upper"] - df["bb_lower"]): Normalizes the close price to the 0–1 range defined by the lower and upper bands. Values outside 0–1 indicate the price has moved beyond the bands. See Section 3 for full definition.
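One edge case: if every close in the window is identical, the bands collapse to zero width and %B is undefined. A minimal guarded sketch (the helper name `safe_pct_b` is hypothetical) maps non-positive widths to NaN before dividing; the first row reuses the band values from row 4 of the execution output.

```python
import pandas as pd

def safe_pct_b(close: pd.Series, lower: pd.Series, upper: pd.Series) -> pd.Series:
    """Hypothetical %B helper that returns NaN when the bands have zero width."""
    width = (upper - lower).where(lambda w: w > 0)  # zero or negative widths become NaN
    return (close - lower) / width


close = pd.Series([42400.0, 100.0])
lower = pd.Series([42067.646159, 100.0])
upper = pd.Series([42452.353841, 100.0])  # second row: a perfectly flat window
result = safe_pct_b(close, lower, upper)
print(result.tolist())
```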
Bandwidth
(df["bb_upper"] - df["bb_lower"]) / df["bb_middle"]: Expresses band width as a fraction of the middle band price, making it comparable across different price levels. A narrow bandwidth (Bollinger Squeeze) often precedes a large directional move as compressed volatility is released.
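One simple way to flag a squeeze is to mark rows where bandwidth sits at its lowest value over a lookback window. The rule and the default `lookback=120` below are hypothetical choices for illustration, not a standard definition.

```python
import pandas as pd

def squeeze_flag(bandwidth: pd.Series, lookback: int = 120) -> pd.Series:
    """Hypothetical squeeze rule: bandwidth equals its minimum over the last `lookback` rows."""
    rolling_min = bandwidth.rolling(window=lookback).min()
    return bandwidth <= rolling_min  # NaN during warm-up compares as False


bw = pd.Series([0.05, 0.04, 0.03, 0.035, 0.02])
print(squeeze_flag(bw, lookback=3).tolist())  # [False, False, True, False, True]
```

In practice this would be applied to the `bb_bandwidth` column returned by `analyze_volatility`.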
6. Execution
df_volatility = analyze_volatility(
df,
rolling_window = 5,
bb_window = 5,
bb_std = 2.0,
periods_per_year = 525_600,
)
print("--- Volatility Analysis Output ---")
display(df_volatility)
--- Volatility Analysis Output ---
| | datetime | open | high | low | close | volume | rolling_std | annualized_vol | true_range | atr | bb_upper | bb_middle | bb_lower | bb_pct_b | bb_bandwidth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-01-01 00:00:00+00:00 | 42100 | 42300 | 41900 | 42200 | 10.5 | NaN | NaN | 400.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2024-01-01 00:01:00+00:00 | 42200 | 42400 | 42000 | 42150 | 8.2 | NaN | NaN | 400.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2024-01-01 00:02:00+00:00 | 42150 | 42350 | 41950 | 42300 | 9.1 | NaN | NaN | 400.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2024-01-01 00:03:00+00:00 | 42300 | 42500 | 42100 | 42250 | 11.3 | NaN | NaN | 400.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2024-01-01 00:04:00+00:00 | 42250 | 42450 | 42050 | 42400 | 7.6 | NaN | NaN | 400.0 | 400.0 | 42452.353841 | 42260.0 | 42067.646159 | 0.863913 | 0.009103 |
| 5 | 2024-01-01 00:05:00+00:00 | 42400 | 42600 | 42200 | 42350 | 12.4 | 0.002591 | 1.878609 | 400.0 | 400.0 | 42482.353841 | 42290.0 | 42097.646159 | 0.655963 | 0.009097 |
| 6 | 2024-01-01 00:06:00+00:00 | 42350 | 42550 | 42150 | 42500 | 6.8 | 0.002588 | 1.876395 | 400.0 | 400.0 | 42552.353841 | 42360.0 | 42167.646159 | 0.863913 | 0.009082 |
| 7 | 2024-01-01 00:07:00+00:00 | 42500 | 42700 | 42300 | 42450 | 13.1 | 0.002585 | 1.874175 | 400.0 | 400.0 | 42582.353841 | 42390.0 | 42197.646159 | 0.655963 | 0.009075 |
| 8 | 2024-01-01 00:08:00+00:00 | 42450 | 42650 | 42250 | 42600 | 9.9 | 0.002582 | 1.871972 | 400.0 | 400.0 | 42652.353841 | 42460.0 | 42267.646159 | 0.863913 | 0.009060 |
| 9 | 2024-01-01 00:09:00+00:00 | 42600 | 42800 | 42400 | 42550 | 10.2 | 0.002579 | 1.869763 | 400.0 | 400.0 | 42682.353841 | 42490.0 | 42297.646159 | 0.655963 | 0.009054 |
7. Summary Statistics
print("--- Volatility Summary ---")
display(df_volatility[[
"rolling_std", "annualized_vol", "true_range", "atr",
"bb_bandwidth", "bb_pct_b"
]].describe().round(6))
print("\n--- Schema Summary ---")
df_volatility.info()
--- Volatility Summary ---
| | rolling_std | annualized_vol | true_range | atr | bb_bandwidth | bb_pct_b |
|---|---|---|---|---|---|---|
| count | 5.000000 | 5.000000 | 10.0 | 6.0 | 6.000000 | 6.000000 |
| mean | 0.002585 | 1.874183 | 400.0 | 400.0 | 0.009079 | 0.759938 |
| std | 0.000005 | 0.003497 | 0.0 | 0.0 | 0.000019 | 0.113899 |
| min | 0.002579 | 1.869763 | 400.0 | 400.0 | 0.009054 | 0.655963 |
| 25% | 0.002582 | 1.871972 | 400.0 | 400.0 | 0.009064 | 0.655963 |
| 50% | 0.002585 | 1.874175 | 400.0 | 400.0 | 0.009079 | 0.759938 |
| 75% | 0.002588 | 1.876395 | 400.0 | 400.0 | 0.009093 | 0.863913 |
| max | 0.002591 | 1.878609 | 400.0 | 400.0 | 0.009103 | 0.863913 |
--- Schema Summary ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   datetime        10 non-null     datetime64[ns, UTC]
 1   open            10 non-null     int64
 2   high            10 non-null     int64
 3   low             10 non-null     int64
 4   close           10 non-null     int64
 5   volume          10 non-null     float64
 6   rolling_std     5 non-null      float64
 7   annualized_vol  5 non-null      float64
 8   true_range      10 non-null     float64
 9   atr             6 non-null      float64
 10  bb_upper        6 non-null      float64
 11  bb_middle       6 non-null      float64
 12  bb_lower        6 non-null      float64
 13  bb_pct_b        6 non-null      float64
 14  bb_bandwidth    6 non-null      float64
dtypes: datetime64[ns, UTC](1), float64(10), int64(4)
memory usage: 1.3 KB