Data·Cleaning·Beginner

Resample Timeframes

Aggregate tick, 1m, and 5m OHLCV data into higher timeframes with correct open/high/low/close/volume resampling semantics.

resampling · timeframes · aggregation

OHLCV Timeframe Resampling Framework

This notebook defines a standardized protocol for resampling 1-minute OHLCV data into higher timeframes (5-minute, 15-minute, 1-hour) using standard OHLCV aggregation rules on a representative dummy dataset. All timestamps are represented as UTC datetime objects.


1. Dependency Installation

[1]
!pip install pandas
Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2)
Requirement already satisfied: numpy>=1.26.0 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2026.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)

2. Library Imports

[2]
import warnings
warnings.filterwarnings("ignore")

import pandas as pd

3. What Is Timeframe Resampling?

Exchange APIs deliver OHLCV data at a base resolution — typically 1-minute bars. Quantitative strategies frequently operate on higher timeframes: 5-minute bars for short-term momentum, 1-hour bars for intraday trend following, or daily bars for position-based models. Rather than making separate API calls for each timeframe, higher timeframes are derived from the 1-minute base data using aggregation.

Resampling collapses multiple consecutive 1-minute candles into a single candle for the target window using the following aggregation rules — these are fixed conventions in financial data and do not vary by asset or exchange:

| Field  | Aggregation Rule                | Rationale                                             |
|--------|---------------------------------|-------------------------------------------------------|
| open   | First value in the window       | The first trade of the window defines the bar's open  |
| high   | Maximum value in the window     | The highest price reached at any point in the window  |
| low    | Minimum value in the window     | The lowest price reached at any point in the window   |
| close  | Last value in the window        | The final trade of the window defines the bar's close |
| volume | Sum of all values in the window | Total quantity traded across all constituent bars     |
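
As a minimal sketch of these rules, the first five candles of the dummy dataset from Section 4 can be collapsed by hand in plain Python. The `aggregate_window` helper is a hypothetical name introduced only for illustration; it is not part of pandas and is not the function used later in this notebook.

```python
# Five consecutive 1-minute candles as (open, high, low, close, volume) tuples,
# copied from the first five rows of the dummy dataset.
candles = [
    (42100, 42300, 41900, 42200, 10.5),
    (42200, 42400, 42000, 42150, 8.2),
    (42150, 42350, 41950, 42300, 9.1),
    (42300, 42500, 42100, 42250, 11.3),
    (42250, 42450, 42050, 42400, 7.6),
]


def aggregate_window(window):
    """Collapse a time-ordered list of candles into one candle (hypothetical helper)."""
    return {
        "open": window[0][0],                           # open of the first candle
        "high": max(c[1] for c in window),              # highest high in the window
        "low": min(c[2] for c in window),               # lowest low in the window
        "close": window[-1][3],                         # close of the last candle
        "volume": round(sum(c[4] for c in window), 8),  # total volume, rounded to hide float noise
    }


bar_5min = aggregate_window(candles)
print(bar_5min)
```

The values should agree with the first row of the 5-minute output in Section 6. Note that the rules only hold if the candles are in chronological order, since open and close depend on position.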

4. Dummy Dataset

[6]
raw_data = {
    "timestamp": [
        1704067200000, 1704067260000, 1704067320000, 1704067380000, 1704067440000,
        1704067500000, 1704067560000, 1704067620000, 1704067680000, 1704067740000,
        1704067800000, 1704067860000, 1704067920000, 1704067980000, 1704068040000,
        1704068100000, 1704068160000, 1704068220000, 1704068280000, 1704068340000,
    ],
    "open":   [42100, 42200, 42150, 42300, 42250,
               42400, 42350, 42500, 42450, 42600,
               42550, 42700, 42650, 42800, 42750,
               42900, 42850, 43000, 42950, 43100],
    "high":   [42300, 42400, 42350, 42500, 42450,
               42600, 42550, 42700, 42650, 42800,
               42750, 42900, 42850, 43000, 42950,
               43100, 43050, 43200, 43150, 43300],
    "low":    [41900, 42000, 41950, 42100, 42050,
               42200, 42150, 42300, 42250, 42400,
               42350, 42500, 42450, 42600, 42550,
               42700, 42650, 42800, 42750, 42900],
    "close":  [42200, 42150, 42300, 42250, 42400,
               42350, 42500, 42450, 42600, 42550,
               42700, 42650, 42800, 42750, 42900,
               42850, 43000, 42950, 43100, 43050],
    "volume": [10.5, 8.2, 9.1, 11.3, 7.6,
               12.4, 6.8, 13.1, 9.9, 10.2,
               8.7, 11.5, 7.3, 12.8, 9.4,
               10.1, 8.9, 13.5, 7.8, 11.0],
}

df_raw = pd.DataFrame(raw_data)

# Convert Unix millisecond timestamps to UTC datetimes and use them as the index
df_raw["timestamp"] = pd.to_datetime(df_raw["timestamp"], unit="ms", utc=True)
df_raw = df_raw.set_index("timestamp")
df_raw.index.name = "datetime"

print("--- Raw 1-Minute OHLCV Data ---")
display(df_raw.head())
--- Raw 1-Minute OHLCV Data ---
                            open   high    low  close  volume
datetime
2024-01-01 00:00:00+00:00  42100  42300  41900  42200    10.5
2024-01-01 00:01:00+00:00  42200  42400  42000  42150     8.2
2024-01-01 00:02:00+00:00  42150  42350  41950  42300     9.1
2024-01-01 00:03:00+00:00  42300  42500  42100  42250    11.3
2024-01-01 00:04:00+00:00  42250  42450  42050  42400     7.6

Code Logic

  • Twenty 1-minute candles covering 00:00 through 00:19 UTC on 2024-01-01, enough to demonstrate 5-minute and 15-minute resampling.
  • pd.to_datetime(..., unit="ms", utc=True): Converts Unix millisecond integers to UTC-aware datetime objects. The pandas resample method requires a DatetimeIndex (or a TimedeltaIndex or PeriodIndex) and raises a TypeError if the index holds raw integers.
  • set_index("timestamp") and index.name = "datetime": Promote the converted timestamp column to the DataFrame index, which is the structure the resample method operates on, and rename that index to datetime.
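
The DatetimeIndex requirement can be checked directly. The sketch below uses three timestamps and close prices from the dummy dataset; the variable names (`df_int`, `df_dt`, `raised`) are illustrative only.

```python
import pandas as pd

# Raw Unix millisecond timestamps used directly as an integer index.
ts_ms = [1704067200000, 1704067260000, 1704067320000]
df_int = pd.DataFrame({"close": [42200, 42150, 42300]}, index=ts_ms)

# Resampling an integer index is rejected by pandas.
try:
    df_int.resample("5min").last()
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# After converting the index to UTC datetimes, the same call succeeds.
df_dt = df_int.copy()
df_dt.index = pd.to_datetime(df_dt.index, unit="ms", utc=True)
result = df_dt.resample("5min").last()
print(result)
```

All three timestamps fall inside the 00:00 window, so the resampled frame holds a single row carrying the last close.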

5. Resampling Function

[4]
def resample_ohlcv(df: pd.DataFrame, timeframe: str) -> pd.DataFrame:
    agg_rules = {
        "open":   "first",
        "high":   "max",
        "low":    "min",
        "close":  "last",
        "volume": "sum",
    }

    df_resampled = (
        df.resample(timeframe)
          .agg(agg_rules)
          .dropna(subset=["open", "close"])   # drop windows with no source candles
          .rename_axis("datetime")            # guarantee the column name after reset_index
          .reset_index()
    )

    df_resampled = df_resampled.astype({
        "open":   "float64",
        "high":   "float64",
        "low":    "float64",
        "close":  "float64",
        "volume": "float64",
    })

    return df_resampled[["datetime", "open", "high", "low", "close", "volume"]]

Code Logic

  • agg_rules: The standard OHLCV aggregation contract. See Section 3 for the rationale behind each rule.
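  • df.resample(timeframe): Groups the DatetimeIndex into non-overlapping, contiguous windows of the specified frequency, for example "5min", "15min", "1h", "4h", or "1D". With the default settings (origin="start_day"), windows align to multiples of the frequency counted from midnight of the first day, not to the first timestamp: a "5min" resample of data starting at 00:01 produces windows 00:00–00:05, 00:05–00:10, and so on, not 00:01–00:06. For these frequencies each window is closed on the left and labelled with its start time, so the 00:05 bar covers 00:05:00 up to but not including 00:10:00.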
  • .dropna(subset=["open", "close"]): Removes empty windows produced by resample when no source candles fall within that period — common at series edges or during exchange maintenance windows.
  • .reset_index(): Converts the DatetimeIndex produced by resample back into a regular datetime column, restoring the flat DataFrame structure used throughout this pipeline.
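  • .astype({...}): Casts the price and volume columns to float64 so that the output schema is the same regardless of the input dtypes, and guards against unexpected dtypes after aggregation. In this dataset the integer prices become floats (42100 becomes 42100.0).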
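
The alignment and empty-window behaviour described above can be seen on a small, deliberately gappy series. This is a sketch on made-up prices (not taken from the dummy dataset), relying only on the pandas defaults for resample.

```python
import pandas as pd

# Four candles with gaps: nothing falls between 00:10 and 00:15.
idx = pd.to_datetime(
    ["2024-01-01 00:01", "2024-01-01 00:02", "2024-01-01 00:06", "2024-01-01 00:16"],
    utc=True,
)
df = pd.DataFrame(
    {
        "open": [100.0, 101.0, 102.0, 103.0],
        "close": [101.0, 102.0, 103.0, 104.0],
        "volume": [1.0, 2.0, 3.0, 4.0],
    },
    index=idx,
)

# Windows are labelled 00:00, 00:05, 00:10, 00:15 even though the data starts at 00:01.
agg = df.resample("5min").agg({"open": "first", "close": "last", "volume": "sum"})
print(agg)

# The empty 00:10 window has NaN open/close and a volume of 0; dropna removes it.
cleaned = agg.dropna(subset=["open", "close"])
print(cleaned)
```

Because an empty window sums to a volume of 0 rather than NaN, filtering on open and close (as resample_ohlcv does) is what actually removes it; filtering on volume alone would not.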

6. Execution

[5]
df_5min  = resample_ohlcv(df_raw, "5min")
df_15min = resample_ohlcv(df_raw, "15min")
df_1h    = resample_ohlcv(df_raw, "1h")

print("--- 5-Minute OHLCV ---")
display(df_5min)

print("\n--- 15-Minute OHLCV ---")
display(df_15min)

print("\n--- 1-Hour OHLCV ---")
display(df_1h)

print("\n--- Schema Summary (5-Minute) ---")
df_5min.info()
--- 5-Minute OHLCV ---
                   datetime     open     high      low    close  volume
0 2024-01-01 00:00:00+00:00  42100.0  42500.0  41900.0  42400.0    46.7
1 2024-01-01 00:05:00+00:00  42400.0  42800.0  42150.0  42550.0    52.4
2 2024-01-01 00:10:00+00:00  42550.0  43000.0  42350.0  42900.0    49.7
3 2024-01-01 00:15:00+00:00  42900.0  43300.0  42650.0  43050.0    51.3

--- 15-Minute OHLCV ---
                   datetime     open     high      low    close  volume
0 2024-01-01 00:00:00+00:00  42100.0  43000.0  41900.0  42900.0   148.8
1 2024-01-01 00:15:00+00:00  42900.0  43300.0  42650.0  43050.0    51.3

--- 1-Hour OHLCV ---
                   datetime     open     high      low    close  volume
0 2024-01-01 00:00:00+00:00  42100.0  43300.0  41900.0  43050.0   200.1

--- Schema Summary (5-Minute) ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype              
---  ------    --------------  -----              
 0   datetime  4 non-null      datetime64[ns, UTC]
 1   open      4 non-null      float64            
 2   high      4 non-null      float64            
 3   low       4 non-null      float64            
 4   close     4 non-null      float64            
 5   volume    4 non-null      float64            
dtypes: datetime64[ns, UTC](1), float64(5)
memory usage: 324.0 bytes
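
In the output above, the 15-minute bar at 00:15 and the single 1-hour bar are built from only 5 and 20 one-minute candles respectively, because the dummy dataset ends at 00:19. One possible way to flag such partial windows is to count the source candles per window. The sketch below assumes a gap-free 1-minute base series and uses illustrative names (`expected_bars`, `report`) and placeholder close values.

```python
import pandas as pd

# Twenty 1-minute timestamps matching the span of the dummy dataset.
index = pd.date_range("2024-01-01 00:00", periods=20, freq="1min", tz="UTC")
df_1min = pd.DataFrame({"close": list(range(20))}, index=index)  # placeholder values

# A complete 15-minute window should contain fifteen 1-minute candles.
expected_bars = 15

# Count the source candles that fall into each 15-minute window.
counts = df_1min["close"].resample("15min").count().rename("n_bars")

report = counts.to_frame()
report["complete"] = report["n_bars"] == expected_bars
print(report)
```

Whether incomplete bars are dropped, kept, or marked depends on the downstream use; a bar that is still forming at the end of a live feed is a common case where this matters.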