Resample Timeframes
Aggregate tick, 1m, and 5m OHLCV data into higher timeframes with correct open/high/low/close/volume resampling semantics.
OHLCV Timeframe Resampling Framework
This notebook defines a standardized protocol for resampling 1-minute OHLCV data into higher timeframes (5-minute, 15-minute, 1-hour) using standard OHLCV aggregation rules on a representative dummy dataset. All timestamps are represented as UTC datetime objects.
1. Dependency Installation
!pip install pandas
Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2)
Requirement already satisfied: numpy>=1.26.0 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2026.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)
2. Library Imports
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
3. What Is Timeframe Resampling?
Exchange APIs deliver OHLCV data at a base resolution — typically 1-minute bars. Quantitative strategies frequently operate on higher timeframes: 5-minute bars for short-term momentum, 1-hour bars for intraday trend following, or daily bars for position-based models. Rather than making separate API calls for each timeframe, higher timeframes are derived from the 1-minute base data using aggregation.
Resampling collapses multiple consecutive 1-minute candles into a single candle for the target window using the following aggregation rules — these are fixed conventions in financial data and do not vary by asset or exchange:
| Field | Aggregation Rule | Rationale |
|---|---|---|
| open | First value in the window | The first trade of the window defines the bar's open |
| high | Maximum value in the window | The highest price reached at any point in the window |
| low | Minimum value in the window | The lowest price reached at any point in the window |
| close | Last value in the window | The final trade of the window defines the bar's close |
| volume | Sum of all values in the window | Total quantity traded across all constituent bars |
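As a minimal sketch of the rules in the table, the five rules can be applied by hand to a single window. The five candles below are invented for illustration and are not part of the notebook's dataset; the variable names `window` and `bar` are likewise hypothetical.

```python
import pandas as pd

# Five hypothetical 1-minute candles that together form one 5-minute window.
window = pd.DataFrame(
    {
        "open":   [100.0, 101.0, 102.0, 101.5, 103.0],
        "high":   [101.5, 102.5, 102.0, 103.5, 104.0],
        "low":    [99.5, 100.5, 101.0, 101.0, 102.5],
        "close":  [101.0, 102.0, 101.5, 103.0, 103.5],
        "volume": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Apply each aggregation rule explicitly to build the single 5-minute bar.
bar = {
    "open": window["open"].iloc[0],     # first value in the window
    "high": window["high"].max(),       # maximum value in the window
    "low": window["low"].min(),         # minimum value in the window
    "close": window["close"].iloc[-1],  # last value in the window
    "volume": window["volume"].sum(),   # sum of all values in the window
}
print(bar)
```

Note that the aggregated high and low come from different constituent candles, which is why each field needs its own rule rather than a single row being picked from the window.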
4. Dummy Dataset
raw_data = {
    "timestamp": [
        1704067200000, 1704067260000, 1704067320000, 1704067380000, 1704067440000,
        1704067500000, 1704067560000, 1704067620000, 1704067680000, 1704067740000,
        1704067800000, 1704067860000, 1704067920000, 1704067980000, 1704068040000,
        1704068100000, 1704068160000, 1704068220000, 1704068280000, 1704068340000,
    ],
    "open": [42100, 42200, 42150, 42300, 42250,
             42400, 42350, 42500, 42450, 42600,
             42550, 42700, 42650, 42800, 42750,
             42900, 42850, 43000, 42950, 43100],
    "high": [42300, 42400, 42350, 42500, 42450,
             42600, 42550, 42700, 42650, 42800,
             42750, 42900, 42850, 43000, 42950,
             43100, 43050, 43200, 43150, 43300],
    "low": [41900, 42000, 41950, 42100, 42050,
            42200, 42150, 42300, 42250, 42400,
            42350, 42500, 42450, 42600, 42550,
            42700, 42650, 42800, 42750, 42900],
    "close": [42200, 42150, 42300, 42250, 42400,
              42350, 42500, 42450, 42600, 42550,
              42700, 42650, 42800, 42750, 42900,
              42850, 43000, 42950, 43100, 43050],
    "volume": [10.5, 8.2, 9.1, 11.3, 7.6,
               12.4, 6.8, 13.1, 9.9, 10.2,
               8.7, 11.5, 7.3, 12.8, 9.4,
               10.1, 8.9, 13.5, 7.8, 11.0],
}
df_raw = pd.DataFrame(raw_data)
# Convert Unix millisecond timestamps to UTC datetimes, then use them as the index
df_raw["timestamp"] = pd.to_datetime(df_raw["timestamp"], unit="ms", utc=True)
df_raw = df_raw.set_index("timestamp")
df_raw.index.name = "datetime"
print("--- Raw 1-Minute OHLCV Data ---")
display(df_raw.head())
--- Raw 1-Minute OHLCV Data ---
| datetime | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2024-01-01 00:00:00+00:00 | 42100 | 42300 | 41900 | 42200 | 10.5 |
| 2024-01-01 00:01:00+00:00 | 42200 | 42400 | 42000 | 42150 | 8.2 |
| 2024-01-01 00:02:00+00:00 | 42150 | 42350 | 41950 | 42300 | 9.1 |
| 2024-01-01 00:03:00+00:00 | 42300 | 42500 | 42100 | 42250 | 11.3 |
| 2024-01-01 00:04:00+00:00 | 42250 | 42450 | 42050 | 42400 | 7.6 |
Code Logic
- Twenty 1-minute candles covering 00:00 through 00:19 UTC. This is enough for four complete 5-minute bars; the second 15-minute bar and the single 1-hour bar are partial, because the data ends before those windows close.
- `pd.to_datetime(..., unit="ms", utc=True)`: Converts Unix millisecond integers to UTC-aware datetime objects. The pandas `resample` engine requires a `DatetimeIndex` and raises a `TypeError` if the index holds raw integers.
- `set_index("timestamp")`: Promotes the converted timestamp column to the DataFrame index, which is the structure the `resample` method operates on. The next line renames that index to `datetime`.
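The index requirement described above can be checked directly. The following is a small sketch on a hypothetical two-row frame (`df_int` and `df_dt` are illustrative names, not part of the notebook): resampling fails while the index is still integer milliseconds and succeeds once it has been converted.

```python
import pandas as pd

# Hypothetical two-row frame whose index is still raw Unix milliseconds.
df_int = pd.DataFrame(
    {"close": [42200, 42150]},
    index=[1704067200000, 1704067260000],
)

try:
    df_int.resample("5min").last()
    raised = False
except TypeError as exc:
    # pandas refuses to resample anything that is not datetime-like.
    raised = True
    print(f"resample rejected the integer index: {exc}")

# After converting the index to UTC datetimes, the same call succeeds.
df_dt = df_int.copy()
df_dt.index = pd.to_datetime(df_dt.index, unit="ms", utc=True)
out = df_dt.resample("5min").last()
print(out)
```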
5. Resampling Function
def resample_ohlcv(df: pd.DataFrame, timeframe: str) -> pd.DataFrame:
    """Resample an OHLCV frame indexed by a DatetimeIndex to a higher timeframe."""
    agg_rules = {
        "open": "first",
        "high": "max",
        "low": "min",
        "close": "last",
        "volume": "sum",
    }
    df_resampled = (
        df.resample(timeframe)
        .agg(agg_rules)
        .dropna(subset=["open", "close"])  # drop windows with no source candles
        .rename_axis("datetime")  # guarantee the column name produced by reset_index
        .reset_index()
    )
    df_resampled = df_resampled.astype({
        "open": "float64",
        "high": "float64",
        "low": "float64",
        "close": "float64",
        "volume": "float64",
    })
    return df_resampled[["datetime", "open", "high", "low", "close", "volume"]]
Code Logic
- `agg_rules`: The standard OHLCV aggregation contract. See Section 3 for the rationale behind each rule.
- `df.resample(timeframe)`: Groups the `DatetimeIndex` into non-overlapping contiguous windows of the specified frequency. Accepted frequency strings include `"5min"`, `"15min"`, `"1h"`, `"4h"`, and `"1D"`. With the default settings, windows for these frequencies align to clock boundaries rather than to the first timestamp: a `"5min"` resample starting at 00:01 produces windows 00:00–00:05, 00:05–00:10, etc., not 00:01–00:06. Each window includes its start, excludes its end, and is labeled by its start time.
- `.dropna(subset=["open", "close"])`: Removes empty windows produced by `resample` when no source candles fall within that period, for example during exchange maintenance. The check uses `open` and `close` because `volume` sums to 0 rather than NaN in an empty window.
- `.reset_index()`: Converts the `DatetimeIndex` produced by `resample` back into a regular `datetime` column, restoring the flat DataFrame structure used throughout this pipeline.
- `.astype({...})`: Casts all five value columns to `float64` so the output schema is the same regardless of the input dtypes. The integer prices in the dummy dataset become floats at this step.
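To make the window alignment and the empty-window behaviour concrete, here is a short sketch on invented data. It assumes a hypothetical series that starts at 00:01 and has a gap covering the whole 00:05 window; the names `df`, `rules`, `resampled`, and `cleaned` are illustrative only.

```python
import pandas as pd

# Hypothetical 1-minute candles: the series starts at 00:01 and nothing
# trades between 00:03 and 00:10, so the 00:05 window has no source bars.
idx = pd.to_datetime(
    ["2024-01-01 00:01", "2024-01-01 00:02", "2024-01-01 00:11"], utc=True
)
df = pd.DataFrame(
    {
        "open": [10.0, 11.0, 12.0],
        "high": [10.5, 11.5, 12.5],
        "low": [9.5, 10.5, 11.5],
        "close": [11.0, 11.5, 12.0],
        "volume": [1.0, 2.0, 3.0],
    },
    index=idx,
)

rules = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# Windows are labeled 00:00, 00:05, 00:10 even though the data starts at 00:01.
resampled = df.resample("5min").agg(rules)
print(resampled)

# The empty 00:05 window has NaN prices but a volume of 0.0,
# so the filter has to look at open/close rather than volume.
cleaned = resampled.dropna(subset=["open", "close"])
print(cleaned)
```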
6. Execution
df_5min = resample_ohlcv(df_raw, "5min")
df_15min = resample_ohlcv(df_raw, "15min")
df_1h = resample_ohlcv(df_raw, "1h")
print("--- 5-Minute OHLCV ---")
display(df_5min)
print("\n--- 15-Minute OHLCV ---")
display(df_15min)
print("\n--- 1-Hour OHLCV ---")
display(df_1h)
print("\n--- Schema Summary (5-Minute) ---")
df_5min.info()
--- 5-Minute OHLCV ---
| | datetime | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2024-01-01 00:00:00+00:00 | 42100.0 | 42500.0 | 41900.0 | 42400.0 | 46.7 |
| 1 | 2024-01-01 00:05:00+00:00 | 42400.0 | 42800.0 | 42150.0 | 42550.0 | 52.4 |
| 2 | 2024-01-01 00:10:00+00:00 | 42550.0 | 43000.0 | 42350.0 | 42900.0 | 49.7 |
| 3 | 2024-01-01 00:15:00+00:00 | 42900.0 | 43300.0 | 42650.0 | 43050.0 | 51.3 |
--- 15-Minute OHLCV ---
| | datetime | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2024-01-01 00:00:00+00:00 | 42100.0 | 43000.0 | 41900.0 | 42900.0 | 148.8 |
| 1 | 2024-01-01 00:15:00+00:00 | 42900.0 | 43300.0 | 42650.0 | 43050.0 | 51.3 |
--- 1-Hour OHLCV ---
| | datetime | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2024-01-01 00:00:00+00:00 | 42100.0 | 43300.0 | 41900.0 | 43050.0 | 200.1 |
--- Schema Summary (5-Minute) ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   datetime  4 non-null      datetime64[ns, UTC]
 1   open      4 non-null      float64
 2   high      4 non-null      float64
 3   low       4 non-null      float64
 4   close     4 non-null      float64
 5   volume    4 non-null      float64
dtypes: datetime64[ns, UTC](1), float64(5)
memory usage: 324.0 bytes
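One way to sanity-check results like these is to confirm that the aggregation rules compose: building 15-minute bars directly from 1-minute data should give the same result as building them from 5-minute bars, provided the smaller timeframe divides the larger one evenly. The sketch below runs that check on synthetic data. The helper `resample_indexed` is a hypothetical variant of `resample_ohlcv` that keeps the `DatetimeIndex` so calls can be chained, and the price series is invented.

```python
import pandas as pd

RULES = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}


def resample_indexed(df: pd.DataFrame, timeframe: str) -> pd.DataFrame:
    """Hypothetical helper: same rules as resample_ohlcv, but keeps the DatetimeIndex."""
    return df.resample(timeframe).agg(RULES).dropna(subset=["open", "close"])


# Thirty synthetic 1-minute candles with steadily rising prices.
idx = pd.date_range("2024-01-01", periods=30, freq="1min", tz="UTC")
base = pd.Series(range(30), index=idx, dtype="float64")
df_1m = pd.DataFrame(
    {
        "open": 100 + base,
        "high": 102 + base,
        "low": 98 + base,
        "close": 101 + base,
        "volume": 1.0,
    }
)

# Path A: 1-minute -> 15-minute. Path B: 1-minute -> 5-minute -> 15-minute.
direct = resample_indexed(df_1m, "15min")
chained = resample_indexed(resample_indexed(df_1m, "5min"), "15min")

# Raises AssertionError if the two paths disagree.
pd.testing.assert_frame_equal(direct, chained, check_freq=False)
print(direct)
```

The same comparison would not be expected to hold when the intermediate timeframe does not divide the target evenly (for example 10-minute bars into 15-minute bars), since source bars would then straddle window boundaries.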