Timezone Standardisation
Convert timestamps across exchange timezones to UTC, handle DST edge cases, and align multi-source data to a common index.
Timezone Standardization Framework
This notebook defines a standardized protocol for normalizing OHLCV timestamp columns to UTC. It covers naive timestamp localization, timezone-aware conversion, and UTC datetime derivation on a representative dummy dataset containing mixed timezone sources.
1. Dependency Installation
!pip install pandasRequirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2) Requirement already satisfied: numpy>=1.26.0 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.0.2) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2) Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2026.1) Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)
2. Library Imports
import warnings
warnings.filterwarnings("ignore")
import pandas as pd3. Why Timezone Standardization Matters
Every exchange operates its API in its own timezone or returns timestamps with local offsets. When data from multiple exchanges is merged without normalization, the same real-world moment in time appears as different values, causing misaligned joins, incorrect resampling, and silently corrupted backtests.
UTC (Coordinated Universal Time) is the international standard reference timezone — it has no offset, no daylight saving adjustment, and no regional variation. All financial data pipelines standardize to UTC because it is the only timezone that remains constant and unambiguous across all exchanges, jurisdictions, and calendar events.
Timezone-naive vs timezone-aware datetimes
A naive datetime has no timezone attached — 2024-01-01 00:00:00
could mean midnight in any timezone. A timezone-aware datetime carries
an explicit offset — 2024-01-01 00:00:00+05:30 means midnight Indian
Standard Time, which equals 2024-12-31 18:30:00 UTC. Naive datetimes
must be localized (assigned a timezone) before they can be converted
to UTC. Aware datetimes are directly converted.
4. Dummy Dataset
raw_data = {
"exchange": ["Bybit", "Kraken", "OKX", "Binance", "Coinbase" ],
"datetime_str": [
"2024-01-01 00:00:00", # UTC naive
"2024-01-01 05:30:00+05:30", # IST (UTC+5:30)
"2024-01-01 09:00:00+09:00", # JST (UTC+9)
"2023-12-31 19:00:00-05:00", # EST (UTC-5) - Corrected to map to 2024-01-01 00:00:00 UTC
"2024-01-01 08:00:00+08:00", # SGT (UTC+8)
],
"open": [42100.0, 42200.0, 42150.0, 42300.0, 42250.0],
"high": [42300.0, 42400.0, 42350.0, 42500.0, 42450.0],
"low": [41900.0, 42000.0, 41950.0, 42100.0, 42050.0],
"close": [42200.0, 42150.0, 42300.0, 42250.0, 42400.0],
"volume": [10.5, 8.2, 9.1, 11.3, 7.6 ],
}
df_raw = pd.DataFrame(raw_data)
print("--- Raw Data (Mixed Timezones) ---")
display(df_raw[["exchange", "datetime_str"]])--- Raw Data (Mixed Timezones) ---
| exchange | datetime_str | |
|---|---|---|
| 0 | Bybit | 2024-01-01 00:00:00 |
| 1 | Kraken | 2024-01-01 05:30:00+05:30 |
| 2 | OKX | 2024-01-01 09:00:00+09:00 |
| 3 | Binance | 2023-12-31 19:00:00-05:00 |
| 4 | Coinbase | 2024-01-01 08:00:00+08:00 |
Code Logic
- All five rows represent the same absolute moment in time:
2024-01-01 00:00:00 UTC. After normalization alldatetimevalues must be identical — this is used as the verification condition in Section 6. - Row 0 is timezone-naive (Bybit returns UTC strings without an offset marker); all other rows carry explicit UTC offsets.
5. Timezone Standardization Function
def standardize_timezone(df: pd.DataFrame, datetime_col: str = "datetime_str") -> pd.DataFrame:
# Parse all datetime strings to UTC-aware datetime
# utc=True coerces all offsets to UTC; naive strings are treated as UTC
df["datetime"] = pd.to_datetime(df[datetime_col], utc=True, format='ISO8601')
# Explicitly convert to UTC to guarantee uniform tzinfo across all rows
df["datetime"] = df["datetime"].dt.tz_convert("UTC")
# Drop the raw source column
df = df.drop(columns=[datetime_col])
return df[["datetime", "exchange", "open", "high", "low", "close", "volume"]]Code Logic
pd.to_datetime(df[datetime_col], utc=True): Parses a mixed column of timezone-naive and timezone-aware ISO 8601 strings in a single vectorized pass.utc=Trueapplies two rules simultaneously — aware strings are converted to UTC by applying their offset arithmetic; naive strings (no offset marker) are assumed to be UTC and localized directly.dt.tz_convert("UTC"): Applies an explicit final conversion pass to guarantee that every row carries a uniformUTCtzinfo object regardless of how it was parsed. Without this step, edge-case aware strings in non-UTC zones may retain their original offset internally even afterutc=Trueparsing.df.drop(columns=[datetime_col]): Removes the raw source string column, which is fully superseded by the normalizeddatetimecolumn.
6. Execution
df_utc = standardize_timezone(df_raw, datetime_col="datetime_str")
print("--- Standardized UTC Datetimes ---")
display(df_utc[["exchange", "datetime"]])
print("\n--- Verification: All datetimes must be equal (same UTC moment) ---")
n_unique = df_utc["datetime"].nunique()
print(f"{n_unique} unique datetime value(s) — {'PASS' if n_unique == 1 else 'FAIL'}")
print("\n--- Schema Summary ---")
df_utc.info()--- Standardized UTC Datetimes ---
| exchange | datetime | |
|---|---|---|
| 0 | Bybit | 2024-01-01 00:00:00+00:00 |
| 1 | Kraken | 2024-01-01 00:00:00+00:00 |
| 2 | OKX | 2024-01-01 00:00:00+00:00 |
| 3 | Binance | 2024-01-01 00:00:00+00:00 |
| 4 | Coinbase | 2024-01-01 00:00:00+00:00 |
--- Verification: All datetimes must be equal (same UTC moment) --- 1 unique datetime value(s) — PASS --- Schema Summary --- <class 'pandas.core.frame.DataFrame'> RangeIndex: 5 entries, 0 to 4 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 datetime 5 non-null datetime64[ns, UTC] 1 exchange 5 non-null object 2 open 5 non-null float64 3 high 5 non-null float64 4 low 5 non-null float64 5 close 5 non-null float64 6 volume 5 non-null float64 dtypes: datetime64[ns, UTC](1), float64(5), object(1) memory usage: 412.0+ bytes