Symbol Format Handling
Normalise trading pair formats across exchanges — handle BTC/USDT, BTCUSDT, BTC-USDT variants in a unified pipeline.
Symbol Format Standardization Framework
This notebook defines a standardized protocol for normalizing trading symbol identifiers across exchange-specific formats into a unified canonical schema. It covers delimiter normalization, quote currency inference, instrument type tagging, and case enforcement on a representative dummy dataset.
1. Dependency Installation
!pip install pandasRequirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2) Requirement already satisfied: numpy>=1.26.0 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.0.2) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2) Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2026.1) Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)
2. Library Imports
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import reCode Logic
pandas: Provides the DataFrame structure for tabular symbol mapping operations.re: Supplies regex pattern matching for delimiter-agnostic symbol parsing.
3. Why Symbol Normalization Matters
Every exchange uses its own naming convention for the same underlying instrument. The Bitcoin/USDT perpetual futures contract appears as:
| Exchange | Raw Symbol | Problem |
|---|---|---|
| Bybit | BTCUSDT | No delimiter — base and quote are concatenated |
| Kraken | XBT/USD | Slash delimiter; XBT is a non-standard alias for BTC |
| OKX | BTC-USDT-SWAP | Hyphen delimiter with instrument type suffix |
| Binance | ETHUSDT | Same concatenation format as Bybit |
| Coinbase | BTC-USD | Hyphen delimiter, clean format |
| Bitfinex | tBTCUSD | Lowercase t prefix; no delimiter |
| Deribit | BTC-31JAN25-50000-C | Options contract with expiry, strike, and type |
Without normalization, a database join or strategy lookup on BTCUSDT
and XBT/USD will fail to match even though both refer to the same
instrument. Canonical normalization resolves all variants to a single
standard form: BTC-USDT.
4. Dummy Dataset
raw_data = {
"exchange": ["Bybit", "Kraken", "OKX", "Binance", "Coinbase", "Bitfinex", "Deribit" ],
"symbol": ["BTCUSDT", "XBT/USD", "BTC-USDT-SWAP", "ETHUSDT", "BTC-USD", "tBTCUSD", "BTC-31JAN25-50000-C" ],
"open": [42100.0, 42150.0, 42200.0, 3200.0, 42180.0, 42170.0, 42160.0 ],
"high": [42300.0, 42350.0, 42400.0, 3250.0, 42380.0, 42370.0, 42360.0 ],
"low": [41900.0, 41950.0, 42000.0, 3150.0, 41980.0, 41970.0, 41960.0 ],
"close": [42200.0, 42250.0, 42300.0, 3220.0, 42280.0, 42270.0, 42260.0 ],
"volume": [10.5, 8.2, 9.1, 150.3, 7.6, 6.4, 3.2 ]
}
df_raw = pd.DataFrame(raw_data)
print("--- Raw Symbol Formats ---")
display(df_raw[["exchange", "symbol"]])--- Raw Symbol Formats ---
| exchange | symbol | |
|---|---|---|
| 0 | Bybit | BTCUSDT |
| 1 | Kraken | XBT/USD |
| 2 | OKX | BTC-USDT-SWAP |
| 3 | Binance | ETHUSDT |
| 4 | Coinbase | BTC-USD |
| 5 | Bitfinex | tBTCUSD |
| 6 | Deribit | BTC-31JAN25-50000-C |
5. Symbol Normalization Function
ALIAS_MAP = {
"XBT": "BTC",
"XETH": "ETH",
"XXBT": "BTC",
"ZUSD": "USD",
}
QUOTE_CURRENCIES = ["USDT", "USDC", "USD", "BTC", "ETH", "BUSD", "EUR"]
def normalize_symbol(raw_symbol: str, exchange: str) -> dict:
symbol = raw_symbol.upper().strip()
# Strip Bitfinex leading 't' prefix
if exchange.lower() == "bitfinex" and symbol.startswith("T"):
symbol = symbol[1:]
# Detect Deribit options/futures by expiry date pattern
if re.search(r"-\d{2}[A-Z]{3}\d{2}-", symbol):
parts = symbol.split("-")
base = ALIAS_MAP.get(parts[0], parts[0])
return {
"canonical": symbol,
"base": base,
"quote": "USD",
"type": "option" if symbol.endswith(("-C", "-P")) else "future",
}
# Normalize slash delimiter to hyphen
symbol = symbol.replace("/", "-")
# Strip instrument type suffixes
for suffix in ["-SWAP", "-PERP", "-PERPETUAL"]:
if symbol.endswith(suffix):
symbol = symbol[: -len(suffix)]
break
# Split on hyphen delimiter
if "-" in symbol:
parts = symbol.split("-", 1)
base = ALIAS_MAP.get(parts[0], parts[0])
quote = ALIAS_MAP.get(parts[1], parts[1])
return {"canonical": f"{base}-{quote}", "base": base, "quote": quote, "type": "spot"}
# Split delimiter-free symbols using known quote currency list
for q in QUOTE_CURRENCIES:
if symbol.endswith(q):
base = ALIAS_MAP.get(symbol[: -len(q)], symbol[: -len(q)])
quote = ALIAS_MAP.get(q, q)
return {"canonical": f"{base}-{quote}", "base": base, "quote": quote, "type": "spot"}
return {"canonical": symbol, "base": symbol, "quote": "UNKNOWN", "type": "unknown"}
def handle_symbols(df: pd.DataFrame) -> pd.DataFrame:
df = df.copy()
normalized = df.apply(
lambda row: normalize_symbol(row["symbol"], row["exchange"]), axis=1
)
df["canonical"] = normalized.apply(lambda x: x["canonical"])
df["base"] = normalized.apply(lambda x: x["base"])
df["quote"] = normalized.apply(lambda x: x["quote"])
df["type"] = normalized.apply(lambda x: x["type"])
return df[["exchange", "symbol", "canonical", "base", "quote", "type",
"open", "high", "low", "close", "volume"]]Code Logic
ALIAS_MAP: Maps exchange-specific base currency aliases to canonical equivalents. Kraken usesXBTas its internal code for Bitcoin — a legacy identifier from before theBTCticker became universal. This table resolves it and similar aliases to the standard form.QUOTE_CURRENCIES: Ordered list of known quote currencies for delimiter-free symbol splitting. Order is critical —USDTmust appear beforeUSDto preventBTCUSDTbeing incorrectly split into baseBTCUSand quoteT.symbol.startswith("T")with Bitfinex guard: Strips the Bitfinex trading pair prefixt(uppercased toT). Bitfinex prependstto all spot trading pairs andfto all funding symbols — the guard ensures this strip is not applied to other exchanges where a leadingTwould be part of the base currency.re.search(r"-\d{2}[A-Z]{3}\d{2}-", symbol): Detects Deribit derivative contracts. The regex matches a segment like-31JAN25-— a two-digit day, three-letter month abbreviation, and two-digit year — that appears in Deribit expiry-dated instruments. Regular spot or perpetual symbols never contain this pattern.symbol.replace("/", "-"): Normalizes Kraken-style slash delimiters to the uniform hyphen separator used throughout the pipeline.- Suffix stripping loop: Removes OKX-style instrument type suffixes before base/quote extraction. Without this step,
-SWAPwould be treated as the quote currency. symbol.split("-", 1): Splits on the first hyphen only (maxsplit=1). Without the limit,BTC-USDT-EXTRAwould produce three parts instead of two.df.apply(lambda row: ..., axis=1): Applies row-level normalization, passing bothsymbolandexchange— exchange identity is required for the Bitfinex prefix rule and any future exchange-specific branching.
6. Execution
df_normalized = handle_symbols(df_raw)
print("--- Normalized Symbol Mapping ---")
display(df_normalized[["exchange", "symbol", "canonical", "base", "quote", "type"]])
print("\n--- Full Output with OHLCV ---")
display(df_normalized)
print("\n--- Schema Summary ---")
df_normalized.info()--- Normalized Symbol Mapping ---
| exchange | symbol | canonical | base | quote | type | |
|---|---|---|---|---|---|---|
| 0 | Bybit | BTCUSDT | BTC-USDT | BTC | USDT | spot |
| 1 | Kraken | XBT/USD | BTC-USD | BTC | USD | spot |
| 2 | OKX | BTC-USDT-SWAP | BTC-USDT | BTC | USDT | spot |
| 3 | Binance | ETHUSDT | ETH-USDT | ETH | USDT | spot |
| 4 | Coinbase | BTC-USD | BTC-USD | BTC | USD | spot |
| 5 | Bitfinex | tBTCUSD | BTC-USD | BTC | USD | spot |
| 6 | Deribit | BTC-31JAN25-50000-C | BTC-31JAN25-50000-C | BTC | USD | option |
--- Full Output with OHLCV ---
| exchange | symbol | canonical | base | quote | type | open | high | low | close | volume | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bybit | BTCUSDT | BTC-USDT | BTC | USDT | spot | 42100.0 | 42300.0 | 41900.0 | 42200.0 | 10.5 |
| 1 | Kraken | XBT/USD | BTC-USD | BTC | USD | spot | 42150.0 | 42350.0 | 41950.0 | 42250.0 | 8.2 |
| 2 | OKX | BTC-USDT-SWAP | BTC-USDT | BTC | USDT | spot | 42200.0 | 42400.0 | 42000.0 | 42300.0 | 9.1 |
| 3 | Binance | ETHUSDT | ETH-USDT | ETH | USDT | spot | 3200.0 | 3250.0 | 3150.0 | 3220.0 | 150.3 |
| 4 | Coinbase | BTC-USD | BTC-USD | BTC | USD | spot | 42180.0 | 42380.0 | 41980.0 | 42280.0 | 7.6 |
| 5 | Bitfinex | tBTCUSD | BTC-USD | BTC | USD | spot | 42170.0 | 42370.0 | 41970.0 | 42270.0 | 6.4 |
| 6 | Deribit | BTC-31JAN25-50000-C | BTC-31JAN25-50000-C | BTC | USD | option | 42160.0 | 42360.0 | 41960.0 | 42260.0 | 3.2 |
--- Schema Summary --- <class 'pandas.core.frame.DataFrame'> RangeIndex: 7 entries, 0 to 6 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 exchange 7 non-null object 1 symbol 7 non-null object 2 canonical 7 non-null object 3 base 7 non-null object 4 quote 7 non-null object 5 type 7 non-null object 6 open 7 non-null float64 7 high 7 non-null float64 8 low 7 non-null float64 9 close 7 non-null float64 10 volume 7 non-null float64 dtypes: float64(5), object(6) memory usage: 748.0+ bytes
7. Conclusion
This notebook demonstrates a robust framework for normalizing cryptocurrency trading symbols across various exchanges. By standardizing diverse naming conventions into a unified canonical format, it addresses critical challenges in data integration, ensuring consistent and reliable analysis across different platforms. The implemented normalize_symbol function effectively handles common variations such as delimiter differences, exchange-specific prefixes, instrument type suffixes, and alias mapping, providing a clean and actionable representation of trading instruments. This standardization is crucial for accurate data aggregation, backtesting, and live trading strategies, preventing mismatches and enabling seamless data interoperability.