Volatility Analysis

Analyse realised and implied volatility using EWMA, Parkinson, and Garman-Klass estimators with regime breakdowns.

Volatility Analysis Framework

This notebook defines a standardized protocol for measuring price volatility in OHLCV time-series data. It covers rolling standard deviation, Average True Range, Bollinger Bands, and annualized volatility on a representative dummy dataset.


1. Dependency Installation

[28]
!pip install pandas numpy
Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2026.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)

2. Library Imports

[29]
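import warnings

import numpy as np
import pandas as pd
from IPython.display import display  # implicit in notebooks; imported so the cells also run as a script

# Silences all warnings to keep notebook output clean; remove this line when debugging.
warnings.filterwarnings("ignore")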

3. Key Concepts

Volatility

Volatility measures how much a price moves over a given period. High volatility means the price is making large, rapid moves, typically because the market is uncertain or actively reacting to news. Low volatility means the price is moving in small, steady steps. In trading, volatility is used to size positions (larger positions in low-volatility markets, smaller in high-volatility ones), set stop-losses, and price options.

Standard Deviation of Returns

The most basic volatility measure. It computes how spread out the returns are around their average. A standard deviation of 0.01 means returns typically deviate by about 1 percentage point from their average. It is computed on a rolling window so that it reflects recent conditions rather than the full history.

True Range (TR)

True Range is the largest of three measurements for each candle:

  1. high − low: The range within the current candle.
  2. |high − previous close|: How far the high reached above or below where the previous candle ended.
  3. |low − previous close|: How far the low reached above or below where the previous candle ended.

Measurements 2 and 3 exist to capture gaps — situations where the price opens significantly higher or lower than the previous close. A simple high − low would miss these gaps entirely. True Range always takes the largest of the three to ensure the full price movement since the last close is captured.
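As a minimal illustration of why the two close-based components matter, the sketch below evaluates a single candle that gaps up from the previous close. The prices and the helper name `true_range` are hypothetical and are not part of the notebook's dataset, which contains no gaps.

```python
def true_range(high: float, low: float, prev_close: float) -> float:
    """Largest of the three True Range components for one candle."""
    return max(
        high - low,              # 1. range within the current candle
        abs(high - prev_close),  # 2. distance from the previous close to the high
        abs(low - prev_close),   # 3. distance from the previous close to the low
    )


# Hypothetical gap-up candle: the previous candle closed at 100,
# then the current candle trades only between 108 and 110.
prev_close, high, low = 100.0, 110.0, 108.0

intrabar_range = high - low             # 2.0: ignores the jump from 100 to 108
tr = true_range(high, low, prev_close)  # 10.0: includes the gap

print(f"high - low : {intrabar_range}")
print(f"true range : {tr}")
```

When a candle does not gap, as in the dummy dataset used later, the first component is the largest and True Range reduces to high − low.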

Average True Range (ATR)

ATR is the rolling average of True Range over N candles. It gives a smoothed, continuous estimate of how much the price typically moves per candle. A high ATR indicates a volatile period; a low ATR indicates a quiet period. ATR is used directly in trading to set stop-loss distances: placing a stop at 2× ATR below entry, for example, puts the stop outside typical candle-to-candle noise.
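The stop-loss rule can be written as one line of arithmetic. The sketch below is illustrative only: `atr_stop` is a hypothetical helper, and the entry price and ATR value are example inputs in the same price range as the dummy dataset.

```python
def atr_stop(entry_price: float, atr: float, multiple: float = 2.0, long: bool = True) -> float:
    """Stop price placed `multiple` ATRs away from the entry, on the losing side."""
    distance = multiple * atr
    return entry_price - distance if long else entry_price + distance


# Example inputs: entry at 42,550 with an ATR of 400 price units.
long_stop = atr_stop(entry_price=42_550.0, atr=400.0)               # 2 x 400 below entry
short_stop = atr_stop(entry_price=42_550.0, atr=400.0, long=False)  # 2 x 400 above entry

print(f"long stop : {long_stop}")   # 41750.0
print(f"short stop: {short_stop}")  # 43350.0
```

Because the distance scales with ATR, the same multiple gives a wider stop in volatile periods and a tighter one in quiet periods.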

Bollinger Bands

Bollinger Bands consist of three lines plotted around the price:

  • Middle Band: Rolling mean of the close price over N candles.
  • Upper Band: Middle Band + (K × rolling standard deviation).
  • Lower Band: Middle Band − (K × rolling standard deviation).

The bands expand when volatility is high (standard deviation is large) and contract when volatility is low. Price touching or crossing the upper band indicates the price is unusually high relative to recent history; the lower band indicates unusually low. The standard parameters are N=20 candles and K=2 standard deviations.

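%B (Percent B)

%B measures where the current close price sits relative to the Bollinger Bands. It is scaled so that the lower band maps to 0 and the upper band maps to 1. The value is not clipped, so a close outside the bands gives a value below 0 or above 1: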

  • %B = 1.0: Close is exactly on the upper band.
  • %B = 0.5: Close is exactly on the middle band.
  • %B = 0.0: Close is exactly on the lower band.
  • %B > 1.0: Close is above the upper band (extreme high).
  • %B < 0.0: Close is below the lower band (extreme low).
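The reference points above can be checked with a few lines of arithmetic. This sketch uses the band values that the notebook's output in Section 6 reports for row 4 (5-candle window, K=2); `percent_b` is a hypothetical helper, and the close of 42,500 is a made-up value chosen to land above the upper band.

```python
def percent_b(close: float, lower: float, upper: float) -> float:
    """Position of the close between the bands: 0 at the lower band, 1 at the upper band."""
    return (close - lower) / (upper - lower)


# Band values reported for row 4 of the volatility output.
lower, upper = 42067.646159, 42452.353841
middle = (lower + upper) / 2  # the bands are symmetric around the middle band

inside = percent_b(42400.0, lower, upper)  # actual close of row 4, inside the bands
on_mid = percent_b(middle, lower, upper)   # close on the middle band
above = percent_b(42500.0, lower, upper)   # hypothetical close above the upper band

print(f"inside the bands : {inside:.6f}")
print(f"on the middle    : {on_mid:.6f}")
print(f"above the upper  : {above:.6f}")
```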

Annualized Volatility

Volatility computed on a per-minute basis is difficult to compare across assets or timeframes. Annualized volatility scales the per-period volatility up to a full year by multiplying by the square root of the number of periods in a year. For 1-minute data on a market that trades around the clock, that factor is √525,600 (365 × 24 × 60 minutes per year); a market with limited trading hours has fewer periods per year. The result is a fraction (1.87 means 187% per year) that can be compared directly to annual volatility figures published for stocks, bonds, or other assets.
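The scaling factor depends only on how many candles fit into a year. The sketch below shows the arithmetic for a few candle sizes; the `annualize` helper and the `PERIODS_PER_YEAR` table are hypothetical, and the 252-trading-day figure is a common equity-market convention rather than something this notebook uses.

```python
import math

# Candle counts per year under stated assumptions.
PERIODS_PER_YEAR = {
    "1-minute, 24/7 market": 365 * 24 * 60,  # 525,600
    "1-hour, 24/7 market": 365 * 24,         # 8,760
    "daily, 252 trading days": 252,          # assumption: equity-style calendar
}


def annualize(per_period_vol: float, periods_per_year: int) -> float:
    """Scale a per-period standard deviation of returns to a one-year horizon."""
    return per_period_vol * math.sqrt(periods_per_year)


per_minute_vol = 0.002591  # rolling_std reported for row 5 of the notebook's output

for label, n in PERIODS_PER_YEAR.items():
    print(f"{label:<26} sqrt({n:>7,}) = {math.sqrt(n):8.2f}")

annual = annualize(per_minute_vol, PERIODS_PER_YEAR["1-minute, 24/7 market"])
print(f"annualized 1-minute vol: {annual:.4f} ({annual:.0%} per year)")
```

Only the first entry is appropriate for the 1-minute candles in this notebook; the others are listed to show how quickly the factor shrinks for coarser candles.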


4. Dummy Dataset

[30]
raw_data = {
    "datetime": [
        "2024-01-01 00:00:00+00:00",
        "2024-01-01 00:01:00+00:00",
        "2024-01-01 00:02:00+00:00",
        "2024-01-01 00:03:00+00:00",
        "2024-01-01 00:04:00+00:00",
        "2024-01-01 00:05:00+00:00",
        "2024-01-01 00:06:00+00:00",
        "2024-01-01 00:07:00+00:00",
        "2024-01-01 00:08:00+00:00",
        "2024-01-01 00:09:00+00:00",
    ],
    "open":   [42100, 42200, 42150, 42300, 42250,
               42400, 42350, 42500, 42450, 42600],
    "high":   [42300, 42400, 42350, 42500, 42450,
               42600, 42550, 42700, 42650, 42800],
    "low":    [41900, 42000, 41950, 42100, 42050,
               42200, 42150, 42300, 42250, 42400],
    "close":  [42200, 42150, 42300, 42250, 42400,
               42350, 42500, 42450, 42600, 42550],
    "volume": [10.5, 8.2, 9.1, 11.3, 7.6,
               12.4, 6.8, 13.1, 9.9, 10.2],
}

df = pd.DataFrame(raw_data)
df["datetime"] = pd.to_datetime(df["datetime"], utc=True)

print("--- Raw OHLCV Data ---")
display(df)
--- Raw OHLCV Data ---
                    datetime   open   high    low  close  volume
0  2024-01-01 00:00:00+00:00  42100  42300  41900  42200    10.5
1  2024-01-01 00:01:00+00:00  42200  42400  42000  42150     8.2
2  2024-01-01 00:02:00+00:00  42150  42350  41950  42300     9.1
3  2024-01-01 00:03:00+00:00  42300  42500  42100  42250    11.3
4  2024-01-01 00:04:00+00:00  42250  42450  42050  42400     7.6
5  2024-01-01 00:05:00+00:00  42400  42600  42200  42350    12.4
6  2024-01-01 00:06:00+00:00  42350  42550  42150  42500     6.8
7  2024-01-01 00:07:00+00:00  42500  42700  42300  42450    13.1
8  2024-01-01 00:08:00+00:00  42450  42650  42250  42600     9.9
9  2024-01-01 00:09:00+00:00  42600  42800  42400  42550    10.2

5. Volatility Analysis Function

[31]
def analyze_volatility(
    df: pd.DataFrame,
    rolling_window: int = 5,
    bb_window: int = 5,
    bb_std: float = 2.0,
    periods_per_year: int = 525_600,  # minutes in a year for a market that trades 24/7
) -> pd.DataFrame:
    """
    Compute volatility indicators from OHLCV data.

    Args:
        df               (pd.DataFrame): Cleaned OHLCV DataFrame with UTC datetime column.
        rolling_window   (int):          Window size for rolling std and ATR.
        bb_window        (int):          Window size for Bollinger Bands.
        bb_std           (float):        Number of standard deviations for Bollinger Band width.
        periods_per_year (int):          Number of candle periods in one year for annualization.

    Returns:
        pd.DataFrame: Input DataFrame extended with computed volatility metrics.
    """
    df = df.copy().sort_values("datetime", ignore_index=True)

    # Log return: required for standard deviation volatility calculation
    log_return = np.log(df["close"] / df["close"].shift(1))

    # Rolling standard deviation of log returns
    df["rolling_std"] = log_return.rolling(window=rolling_window).std()

    # Annualized volatility: scales per-minute std to a full-year equivalent
    df["annualized_vol"] = df["rolling_std"] * np.sqrt(periods_per_year)

    # True Range components
    hl  = df["high"] - df["low"]                           # high minus low
    hpc = (df["high"] - df["close"].shift(1)).abs()        # high minus previous close
    lpc = (df["low"]  - df["close"].shift(1)).abs()        # low  minus previous close

    # True Range: maximum of the three components per candle
    df["true_range"] = pd.concat([hl, hpc, lpc], axis=1).max(axis=1)

    # Average True Range: rolling mean of True Range
    df["atr"] = df["true_range"].rolling(window=rolling_window).mean()

    # Bollinger Bands
    bb_mean          = df["close"].rolling(window=bb_window).mean()
    bb_std_val       = df["close"].rolling(window=bb_window).std()

    df["bb_upper"]   = bb_mean + (bb_std * bb_std_val)
    df["bb_middle"]  = bb_mean
    df["bb_lower"]   = bb_mean - (bb_std * bb_std_val)

    # %B: position of close within Bollinger Bands (0 = lower, 1 = upper)
    df["bb_pct_b"]   = (df["close"] - df["bb_lower"]) / (df["bb_upper"] - df["bb_lower"])

    # Bandwidth: width of the bands relative to the middle band
    # High bandwidth = high volatility; low bandwidth = low volatility (squeeze)
    df["bb_bandwidth"] = (df["bb_upper"] - df["bb_lower"]) / df["bb_middle"]

    return df[[
        "datetime", "open", "high", "low", "close", "volume",
        "rolling_std", "annualized_vol",
        "true_range", "atr",
        "bb_upper", "bb_middle", "bb_lower", "bb_pct_b", "bb_bandwidth",
    ]]

Code Logic

Log return

  • np.log(df["close"] / df["close"].shift(1)): Computed internally as the input to the standard deviation. Log returns are used rather than simple returns because they add up across periods and are closer to symmetric around zero, which suits volatility estimation. The first row is NaN because it has no previous close. See Notebook 15 for the full definition.
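One property that makes log returns convenient is that they sum across periods, whereas simple returns do not. The sketch below checks this on the first three closes of the dummy dataset, using only the standard library; the variable names are illustrative.

```python
import math

closes = [42200.0, 42150.0, 42300.0]  # first three closes from the dummy dataset

# Per-period returns between consecutive closes.
log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
simple_returns = [b / a - 1 for a, b in zip(closes, closes[1:])]

# Return over the whole span, computed directly from the first and last close.
total_log = math.log(closes[-1] / closes[0])
total_simple = closes[-1] / closes[0] - 1

print(f"sum of log returns    : {sum(log_returns):.8f}  vs direct: {total_log:.8f}")
print(f"sum of simple returns : {sum(simple_returns):.8f}  vs direct: {total_simple:.8f}")
```

The log-return sum matches the direct figure up to floating-point error, while the simple-return sum differs slightly because simple returns compound rather than add.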

Rolling standard deviation

  • log_return.rolling(window=rolling_window).std(): At each row, computes the sample standard deviation of the last rolling_window log returns. Because the first log return is itself NaN, the first rolling_window rows produce NaN (rows 0 to 4 with the default window of 5). A larger value means returns have been more spread out (more volatile) over the recent window.
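A quick way to confirm the warm-up length is to count the leading NaN values directly. The sketch below reuses the close prices from Section 4 with a window of 5; `n_warmup` and `first_valid` are illustrative names. The shift(1) leaves the first log return undefined, which adds one row to the warm-up on top of the window itself.

```python
import numpy as np
import pandas as pd

close = pd.Series(
    [42200, 42150, 42300, 42250, 42400, 42350, 42500, 42450, 42600, 42550],
    dtype=float,
)
window = 5

log_return = np.log(close / close.shift(1))  # row 0 is NaN: no previous close
rolling_std = log_return.rolling(window=window).std()

n_warmup = int(rolling_std.isna().sum())         # rows without a full window of returns
first_valid = rolling_std.first_valid_index()    # first row with a usable value

print(f"NaN rows: {n_warmup}, first valid row: {first_valid}")
print(f"rolling_std at first valid row: {rolling_std.iloc[first_valid]:.6f}")
```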

Annualized volatility

  • df["rolling_std"] * np.sqrt(periods_per_year): If returns in successive periods are uncorrelated, their variances add, so volatility grows with the square root of the number of periods. Multiplying by √(periods per year) converts per-period volatility (per-minute here) to its annual equivalent. See Section 3 for the full definition.

True Range

  • pd.concat([hl, hpc, lpc], axis=1).max(axis=1): Stacks the three True Range components as columns and takes the row-wise maximum. abs() is applied to components 2 and 3 because the gap can be in either direction, and both upward and downward gaps are movement that must be captured. On the first row there is no previous close, so components 2 and 3 are NaN; max() skips NaN by default and True Range falls back to high − low. See Section 3 for the full definition.

ATR

  • df["true_range"].rolling(window=rolling_window).mean(): Rolling average of True Range. Smooths candle-to-candle spikes to produce a stable estimate of typical price movement per period.
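The function above uses a simple rolling mean. A widely used alternative is Wilder's smoothing, where each new ATR is (N − 1)/N of the previous ATR plus 1/N of the new True Range. The sketch below swaps in that technique for comparison only; it is not what analyze_volatility does. The True Range values are hypothetical, and seeding the recursion with the first observation (rather than an initial simple average) is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical True Range series with a single spike at row 3.
true_range = pd.Series([400.0, 400.0, 400.0, 900.0, 400.0, 400.0, 400.0, 400.0])
n = 3

# Simple rolling mean, as used in this notebook: the spike drops out after n candles.
atr_sma = true_range.rolling(window=n).mean()

# Wilder-style recursive smoothing: atr_t = (1 - 1/n) * atr_{t-1} + (1/n) * tr_t.
atr_wilder = true_range.ewm(alpha=1 / n, adjust=False).mean()

comparison = pd.DataFrame(
    {"true_range": true_range, "atr_sma": atr_sma, "atr_wilder": atr_wilder}
)
print(comparison.round(2))
```

With the rolling mean the spike leaves the window abruptly, while the recursive version decays gradually and never produces a leading NaN.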

Bollinger Bands

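  • bb_mean: Rolling mean of the close price, computed once and reused for the middle, upper, and lower bands. bb_std_val is a separate rolling standard deviation of the close price (not of returns), so the bands are expressed in price units.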
  • df["bb_upper"] / df["bb_lower"]: Middle band ± (K × rolling standard deviation). The bands widen when bb_std_val is large (volatile period) and narrow when it is small (quiet period). See Section 3 for full definition.
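One detail worth knowing when comparing these bands with other tools: pandas' rolling .std() defaults to the sample standard deviation (ddof=1), while some charting packages use the population form (ddof=0), which gives slightly narrower bands. The sketch below compares the two on the first five closes of the dummy dataset; which convention a given tool uses is something to verify, not something this notebook states.

```python
import pandas as pd

close = pd.Series([42200, 42150, 42300, 42250, 42400], dtype=float)
window, k = 5, 2.0

mid = close.rolling(window).mean()
std_sample = close.rolling(window).std()            # ddof=1, the pandas default
std_population = close.rolling(window).std(ddof=0)  # divides by N instead of N - 1

upper_sample = mid + k * std_sample
upper_population = mid + k * std_population

print(f"sample std     : {std_sample.iloc[-1]:.4f} -> upper band {upper_sample.iloc[-1]:.2f}")
print(f"population std : {std_population.iloc[-1]:.4f} -> upper band {upper_population.iloc[-1]:.2f}")
```

The gap between the two shrinks as the window grows, so it matters most for short windows like the 5-candle one used here.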

%B

  • (df["close"] - df["bb_lower"]) / (df["bb_upper"] - df["bb_lower"]): Normalizes the close price to the 0–1 range defined by the lower and upper bands. Values outside 0–1 indicate the price has moved beyond the bands. See Section 3 for full definition.
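The %B formula divides by the band width, which becomes zero when every close in the window is identical. The sketch below shows what pandas returns in that case and one possible way to handle it; the flat price series is hypothetical, and mapping a zero-width band to 0.5 is an assumption of this sketch, not a rule from the notebook.

```python
import pandas as pd

close = pd.Series([100.0] * 6)  # hypothetical flat market: every close identical
window, k = 3, 2.0

mid = close.rolling(window).mean()
sd = close.rolling(window).std()
upper, lower = mid + k * sd, mid - k * sd
width = upper - lower  # 0.0 once the window is full, NaN during warm-up

pct_b_raw = (close - lower) / width  # 0/0 evaluates to NaN; no exception is raised

# Assumed convention: a zero-width band means the close sits on the middle band.
pct_b_safe = pct_b_raw.mask(width == 0, 0.5)

print(pd.DataFrame({"width": width, "pct_b_raw": pct_b_raw, "pct_b_safe": pct_b_safe}))
```

Warm-up rows stay NaN in both versions, because the band itself is undefined there.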

Bandwidth

  • (df["bb_upper"] - df["bb_lower"]) / df["bb_middle"]: Expresses band width as a fraction of the middle band price, making it comparable across different price levels. A narrow bandwidth (Bollinger Squeeze) often precedes a large directional move as compressed volatility is released.
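A squeeze is usually judged relative to the bandwidth's own recent history rather than against a fixed number. The sketch below flags rows where bandwidth falls below a low quantile of the preceding window. The bandwidth values, the 5-row lookback, and the 0.2 quantile are all hypothetical choices for illustration, not thresholds taken from the notebook.

```python
import pandas as pd

# Hypothetical bandwidth series with one unusually narrow reading at row 5.
bandwidth = pd.Series([0.05, 0.04, 0.05, 0.04, 0.05, 0.01, 0.05])
lookback, q = 5, 0.2

# Threshold from the previous `lookback` rows only; shift(1) excludes the current row.
threshold = bandwidth.rolling(lookback).quantile(q).shift(1)

# Comparisons against NaN are False, so warm-up rows are never flagged.
squeeze = bandwidth < threshold

print(pd.DataFrame({"bandwidth": bandwidth, "threshold": threshold, "squeeze": squeeze}))
```

Excluding the current row from the threshold keeps a narrow reading from lowering the bar it is measured against.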

6. Execution

[32]
df_volatility = analyze_volatility(
    df,
    rolling_window   = 5,
    bb_window        = 5,
    bb_std           = 2.0,
    periods_per_year = 525_600,
)

print("--- Volatility Analysis Output ---")
display(df_volatility)
--- Volatility Analysis Output ---
                    datetime   open   high    low  close  volume  rolling_std  annualized_vol  true_range    atr      bb_upper  bb_middle      bb_lower  bb_pct_b  bb_bandwidth
0  2024-01-01 00:00:00+00:00  42100  42300  41900  42200    10.5          NaN             NaN       400.0    NaN           NaN        NaN           NaN       NaN           NaN
1  2024-01-01 00:01:00+00:00  42200  42400  42000  42150     8.2          NaN             NaN       400.0    NaN           NaN        NaN           NaN       NaN           NaN
2  2024-01-01 00:02:00+00:00  42150  42350  41950  42300     9.1          NaN             NaN       400.0    NaN           NaN        NaN           NaN       NaN           NaN
3  2024-01-01 00:03:00+00:00  42300  42500  42100  42250    11.3          NaN             NaN       400.0    NaN           NaN        NaN           NaN       NaN           NaN
4  2024-01-01 00:04:00+00:00  42250  42450  42050  42400     7.6          NaN             NaN       400.0  400.0  42452.353841    42260.0  42067.646159  0.863913      0.009103
5  2024-01-01 00:05:00+00:00  42400  42600  42200  42350    12.4     0.002591        1.878609       400.0  400.0  42482.353841    42290.0  42097.646159  0.655963      0.009097
6  2024-01-01 00:06:00+00:00  42350  42550  42150  42500     6.8     0.002588        1.876395       400.0  400.0  42552.353841    42360.0  42167.646159  0.863913      0.009082
7  2024-01-01 00:07:00+00:00  42500  42700  42300  42450    13.1     0.002585        1.874175       400.0  400.0  42582.353841    42390.0  42197.646159  0.655963      0.009075
8  2024-01-01 00:08:00+00:00  42450  42650  42250  42600     9.9     0.002582        1.871972       400.0  400.0  42652.353841    42460.0  42267.646159  0.863913      0.009060
9  2024-01-01 00:09:00+00:00  42600  42800  42400  42550    10.2     0.002579        1.869763       400.0  400.0  42682.353841    42490.0  42297.646159  0.655963      0.009054

7. Summary Statistics

[33]
print("--- Volatility Summary ---")
display(df_volatility[[
    "rolling_std", "annualized_vol", "true_range", "atr",
    "bb_bandwidth", "bb_pct_b"
]].describe().round(6))

print("\n--- Schema Summary ---")
df_volatility.info()
--- Volatility Summary ---
       rolling_std  annualized_vol  true_range    atr  bb_bandwidth  bb_pct_b
count     5.000000        5.000000        10.0    6.0      6.000000  6.000000
mean      0.002585        1.874183       400.0  400.0      0.009079  0.759938
std       0.000005        0.003497         0.0    0.0      0.000019  0.113899
min       0.002579        1.869763       400.0  400.0      0.009054  0.655963
25%       0.002582        1.871972       400.0  400.0      0.009064  0.655963
50%       0.002585        1.874175       400.0  400.0      0.009079  0.759938
75%       0.002588        1.876395       400.0  400.0      0.009093  0.863913
max       0.002591        1.878609       400.0  400.0      0.009103  0.863913

--- Schema Summary ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype              
---  ------          --------------  -----              
 0   datetime        10 non-null     datetime64[ns, UTC]
 1   open            10 non-null     int64              
 2   high            10 non-null     int64              
 3   low             10 non-null     int64              
 4   close           10 non-null     int64              
 5   volume          10 non-null     float64            
 6   rolling_std     5 non-null      float64            
 7   annualized_vol  5 non-null      float64            
 8   true_range      10 non-null     float64            
 9   atr             6 non-null      float64            
 10  bb_upper        6 non-null      float64            
 11  bb_middle       6 non-null      float64            
 12  bb_lower        6 non-null      float64            
 13  bb_pct_b        6 non-null      float64            
 14  bb_bandwidth    6 non-null      float64            
dtypes: datetime64[ns, UTC](1), float64(10), int64(4)
memory usage: 1.3 KB