OHLCV Timeframe Resampling Framework

This notebook defines a standardized protocol for resampling 1-minute OHLCV data into higher timeframes (5-minute, 15-minute, 1-hour) using standard OHLCV aggregation rules on a representative dummy dataset. All timestamps are represented as UTC datetime objects.

1. Dependency Installation

[1]

!pip install pandas

Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2)
Requirement already satisfied: numpy>=1.26.0 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2026.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)

2. Library Imports

[2]

import warnings
warnings.filterwarnings("ignore")

import pandas as pd

3. What Is Timeframe Resampling?

Exchange APIs deliver OHLCV data at a base resolution — typically 1-minute bars. Quantitative strategies frequently operate on higher timeframes: 5-minute bars for short-term momentum, 1-hour bars for intraday trend following, or daily bars for position-based models. Rather than making separate API calls for each timeframe, higher timeframes are derived from the 1-minute base data using aggregation.

Resampling collapses multiple consecutive 1-minute candles into a single candle for the target window using the following aggregation rules — these are fixed conventions in financial data and do not vary by asset or exchange:

Field	Aggregation Rule	Rationale
`open`	First value in the window	The first trade of the window defines the bar's open
`high`	Maximum value in the window	The highest price reached at any point in the window
`low`	Minimum value in the window	The lowest price reached at any point in the window
`close`	Last value in the window	The final trade of the window defines the bar's close
`volume`	Sum of all values in the window	Total quantity traded across all constituent bars
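
The rules in the table can be sketched without pandas to make the arithmetic explicit. The helper name aggregate_window and the three sample candles below are hypothetical and exist only for illustration; they are not part of the notebook's pipeline.

```python
# Hypothetical helper: collapses consecutive candles into one bar
# using the first / max / min / last / sum rules from the table above.
def aggregate_window(candles):
    """candles: list of (open, high, low, close, volume) tuples in time order."""
    opens, highs, lows, closes, volumes = zip(*candles)
    return {
        "open": opens[0],        # first value in the window
        "high": max(highs),      # highest price reached
        "low": min(lows),        # lowest price reached
        "close": closes[-1],     # last value in the window
        "volume": sum(volumes),  # total quantity traded
    }

# Made-up sample window of three 1-minute candles.
window = [
    (100, 105, 99, 104, 2.0),
    (104, 108, 103, 107, 1.5),
    (107, 109, 101, 102, 3.0),
]

bar = aggregate_window(window)
print(bar)
```

Note that the bar's high and low come from any candle in the window, while open and close depend only on the first and last candle.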

4. Dummy Dataset

[6]

raw_data = {
    "timestamp": [
        1704067200000, 1704067260000, 1704067320000, 1704067380000, 1704067440000,
        1704067500000, 1704067560000, 1704067620000, 1704067680000, 1704067740000,
        1704067800000, 1704067860000, 1704067920000, 1704067980000, 1704068040000,
        1704068100000, 1704068160000, 1704068220000, 1704068280000, 1704068340000,
    ],
    "open":   [42100, 42200, 42150, 42300, 42250,
               42400, 42350, 42500, 42450, 42600,
               42550, 42700, 42650, 42800, 42750,
               42900, 42850, 43000, 42950, 43100],
    "high":   [42300, 42400, 42350, 42500, 42450,
               42600, 42550, 42700, 42650, 42800,
               42750, 42900, 42850, 43000, 42950,
               43100, 43050, 43200, 43150, 43300],
    "low":    [41900, 42000, 41950, 42100, 42050,
               42200, 42150, 42300, 42250, 42400,
               42350, 42500, 42450, 42600, 42550,
               42700, 42650, 42800, 42750, 42900],
    "close":  [42200, 42150, 42300, 42250, 42400,
               42350, 42500, 42450, 42600, 42550,
               42700, 42650, 42800, 42750, 42900,
               42850, 43000, 42950, 43100, 43050],
    "volume": [10.5, 8.2, 9.1, 11.3, 7.6,
               12.4, 6.8, 13.1, 9.9, 10.2,
               8.7, 11.5, 7.3, 12.8, 9.4,
               10.1, 8.9, 13.5, 7.8, 11.0],
}

df_raw = pd.DataFrame(raw_data)

# Convert Unix millisecond timestamps to UTC datetimes, then use them as the index
df_raw["timestamp"] = pd.to_datetime(df_raw["timestamp"], unit="ms", utc=True)
df_raw = df_raw.set_index("timestamp")
df_raw.index.name = "datetime"

print("--- Raw 1-Minute OHLCV Data ---")
display(df_raw.head())

--- Raw 1-Minute OHLCV Data ---

	open	high	low	close	volume
datetime
2024-01-01 00:00:00+00:00	42100	42300	41900	42200	10.5
2024-01-01 00:01:00+00:00	42200	42400	42000	42150	8.2
2024-01-01 00:02:00+00:00	42150	42350	41950	42300	9.1
2024-01-01 00:03:00+00:00	42300	42500	42100	42250	11.3
2024-01-01 00:04:00+00:00	42250	42450	42050	42400	7.6

Code Logic

Twenty 1-minute candles spanning 00:00 to 00:19 UTC, enough to fill four 5-minute bars and one complete 15-minute bar. The second 15-minute bar and the 1-hour bar are built from partial windows.
pd.to_datetime(..., unit="ms", utc=True): Converts Unix millisecond integers to UTC-aware datetime objects. The pandas resample engine requires a DatetimeIndex and will raise an error if the index is a raw integer.
set_index("timestamp"): Promotes the converted timestamp column to the DataFrame index, which is the structure the resample method operates on. The index is then renamed to "datetime" through df_raw.index.name.
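
A minimal sketch of the two points above, using a made-up three-row frame rather than the notebook's dataset: resampling fails while the index is still the default integer range, and succeeds once the millisecond integers are converted and promoted to the index.

```python
import pandas as pd

# Three sample timestamps in Unix milliseconds (illustrative values only).
ts_ms = [1704067200000, 1704067260000, 1704067320000]
df = pd.DataFrame({"timestamp": ts_ms, "close": [42200, 42150, 42300]})

# The frame still has an integer RangeIndex here, so resample cannot bin by time.
try:
    df.resample("5min").last()
    raised = False
except TypeError:
    raised = True
print("resample on integer index raised TypeError:", raised)

# Convert to timezone-aware datetimes and promote the column to the index.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms", utc=True)
df = df.set_index("timestamp")
df.index.name = "datetime"

first_ts = df.index[0]
resampled = df.resample("5min").last()
print(first_ts)
print(resampled)
```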

5. Resampling Function

[4]

def resample_ohlcv(df: pd.DataFrame, timeframe: str) -> pd.DataFrame:
    """Resample OHLCV bars indexed by a DatetimeIndex named "datetime" to `timeframe`."""
    agg_rules = {
        "open":   "first",
        "high":   "max",
        "low":    "min",
        "close":  "last",
        "volume": "sum",
    }

    df_resampled = (
        df.resample(timeframe)
          .agg(agg_rules)
          .dropna(subset=["open", "close"])
          .reset_index()
    )

    # No rename needed: resample keeps the index name, so reset_index()
    # already yields a "datetime" column.

    df_resampled = df_resampled.astype({
        "open":   "float64",
        "high":   "float64",
        "low":    "float64",
        "close":  "float64",
        "volume": "float64",
    })

    return df_resampled[["datetime", "open", "high", "low", "close", "volume"]]

Code Logic

agg_rules: The standard OHLCV aggregation contract. See Section 3 for the rationale behind each rule.
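df.resample(timeframe): Groups the DatetimeIndex into non-overlapping contiguous windows of the specified frequency. Typical frequency strings: "5min", "15min", "1h", "4h", "1D". With the default resample settings, windows align to clock boundaries rather than to the first timestamp in the data: a "5min" resample starting at 00:01 produces windows 00:00–00:05, 00:05–00:10, etc., not 00:01–00:06. Each window is labeled by its left edge and includes it, so the 00:00 bar covers the candles from 00:00 through 00:04.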
.dropna(subset=["open", "close"]): Removes empty windows produced by resample when no source candles fall within that period — common at series edges or during exchange maintenance windows.
.reset_index(): Converts the DatetimeIndex produced by resample back into a regular datetime column, restoring the flat DataFrame structure used throughout this pipeline.
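.astype({...}): Enforces float64 on every price and volume column after aggregation. The integer prices in the dummy dataset become floats here, and the cast guards against non-numeric dtypes that aggregation may produce in edge cases such as an empty input.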
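
The boundary alignment and the empty-window behavior described above can be checked on a small made-up frame. The three timestamps are illustrative assumptions, chosen so that the data starts at 00:01 and leaves the 00:05 window without any source rows.

```python
import pandas as pd

# Bars at 00:01, 00:02, and 00:12: an off-boundary start plus a gap.
idx = pd.to_datetime(
    ["2024-01-01 00:01", "2024-01-01 00:02", "2024-01-01 00:12"], utc=True
)
df = pd.DataFrame(
    {
        "open": [10.0, 11.0, 12.0],
        "close": [11.0, 12.0, 13.0],
        "volume": [1.0, 2.0, 3.0],
    },
    index=idx,
)

agg = df.resample("5min").agg({"open": "first", "close": "last", "volume": "sum"})

# Window labels snap to 00:00, 00:05, 00:10 even though the data starts at 00:01.
labels = [t.strftime("%H:%M") for t in agg.index]
print(labels)

# The empty 00:05 window has NaN open/close and a volume sum of 0.
print(agg)

# Dropping rows with missing open/close removes the empty window.
kept = agg.dropna(subset=["open", "close"])
print(kept)
```

Because the volume of an empty window sums to 0 rather than NaN, filtering on open and close is what actually removes those rows.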

6. Execution

[5]

df_5min  = resample_ohlcv(df_raw, "5min")
df_15min = resample_ohlcv(df_raw, "15min")
df_1h    = resample_ohlcv(df_raw, "1h")

print("--- 5-Minute OHLCV ---")
display(df_5min)

print("\n--- 15-Minute OHLCV ---")
display(df_15min)

print("\n--- 1-Hour OHLCV ---")
display(df_1h)

print("\n--- Schema Summary (5-Minute) ---")
df_5min.info()

--- 5-Minute OHLCV ---

	datetime	open	high	low	close	volume
0	2024-01-01 00:00:00+00:00	42100.0	42500.0	41900.0	42400.0	46.7
1	2024-01-01 00:05:00+00:00	42400.0	42800.0	42150.0	42550.0	52.4
2	2024-01-01 00:10:00+00:00	42550.0	43000.0	42350.0	42900.0	49.7
3	2024-01-01 00:15:00+00:00	42900.0	43300.0	42650.0	43050.0	51.3


--- 15-Minute OHLCV ---

	datetime	open	high	low	close	volume
0	2024-01-01 00:00:00+00:00	42100.0	43000.0	41900.0	42900.0	148.8
1	2024-01-01 00:15:00+00:00	42900.0	43300.0	42650.0	43050.0	51.3


--- 1-Hour OHLCV ---

	datetime	open	high	low	close	volume
0	2024-01-01 00:00:00+00:00	42100.0	43300.0	41900.0	43050.0	200.1


--- Schema Summary (5-Minute) ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype              
---  ------    --------------  -----              
 0   datetime  4 non-null      datetime64[ns, UTC]
 1   open      4 non-null      float64            
 2   high      4 non-null      float64            
 3   low       4 non-null      float64            
 4   close     4 non-null      float64            
 5   volume    4 non-null      float64            
dtypes: datetime64[ns, UTC](1), float64(5)
memory usage: 324.0 bytes
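
Output like the tables above can be sanity-checked against the source bars. The sketch below is one possible set of checks, run on a small synthetic frame (assumed values, not the notebook's dataset): total volume should be preserved, and each bar's high and low should bracket its open and close.

```python
import pandas as pd

# Ten synthetic 1-minute bars with steadily rising prices (illustrative only).
idx = pd.date_range("2024-01-01", periods=10, freq="1min", tz="UTC")
df_1m = pd.DataFrame(
    {
        "open": [100.0 + i for i in range(10)],
        "high": [102.0 + i for i in range(10)],
        "low": [98.0 + i for i in range(10)],
        "close": [101.0 + i for i in range(10)],
        "volume": [1.0] * 10,
    },
    index=idx,
)

bars = df_1m.resample("5min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)

# Volume is additive, so the totals before and after resampling should match.
volume_preserved = abs(bars["volume"].sum() - df_1m["volume"].sum()) < 1e-9

# Within every bar, high must be >= open/close and low must be <= open/close.
high_ok = bool((bars["high"] >= bars[["open", "close"]].max(axis=1)).all())
low_ok = bool((bars["low"] <= bars[["open", "close"]].min(axis=1)).all())

print(bars)
print("volume preserved:", volume_preserved)
print("high/low bracket open/close:", high_ok and low_ok)
```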
