Price Analysis
Exploratory price analysis: return distributions, autocorrelation, fat tails, and stationarity tests on OHLCV data.
Price Analysis Framework
This notebook defines a standardized protocol for analyzing price movements in OHLCV time-series data. It covers return calculation, trend detection, price range measurement, and summary statistics on a representative dummy dataset.
1. Dependency Installation
!pip install pandas numpy
2. Library Imports
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
Code Logic
pandas: Provides the DataFrame structure and vectorized price computation methods.
numpy: Supplies mathematical functions used in return and range calculations.
3. Key Concepts
Close Price
The close price is the final traded price within a candle window. It is the most widely used price field in analysis because it represents the market's last agreed-upon value for that period: buyers and sellers both accepted this price at the end of the window.
Return
A return measures how much the price changed between two periods, expressed as a percentage. A return of +2% means the price rose by 2% from the previous candle's close to the current candle's close. Returns are used instead of raw price differences because they are comparable across assets with different price levels: a $100 move in BTC at $42,000 is a return of about 0.24%, while a $100 move in a $200 stock is a return of 50%.
Simple Return
Simple return = (current close − previous close) / previous close. This is the standard percentage change between two consecutive periods.
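As a quick illustration of the formula, the sketch below applies it to the first two closes of the dummy dataset and to the $100-move comparison from the Return definition. The helper name simple_return is only illustrative and is not part of the notebook's pipeline.

```python
def simple_return(previous_close: float, current_close: float) -> float:
    """Fractional change from the previous close to the current close."""
    return (current_close - previous_close) / previous_close


# First two closes of the dummy dataset (42200 -> 42150).
btc_return = simple_return(42200.0, 42150.0)

# Same $100 absolute move on two very different price levels.
btc_100 = simple_return(42000.0, 42100.0)
stock_100 = simple_return(200.0, 300.0)

print(f"first dummy return: {btc_return:.6f}")
print(f"$100 move: {btc_100:.4%} on BTC vs {stock_100:.0%} on the stock")
```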
Log Return
Log return = natural log(current close / previous close). Log returns are often preferred in quantitative analysis because they are additive across time: the log return over two periods equals the sum of the individual period log returns. This property makes them easier to aggregate and model statistically.
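The additivity property can be checked numerically. The sketch below uses the first three closes of the dummy dataset and contrasts the behavior with simple returns, which compound multiplicatively instead of adding. Variable names are illustrative.

```python
import math

closes = [42200.0, 42150.0, 42300.0]  # first three closes of the dummy dataset

log_r1 = math.log(closes[1] / closes[0])
log_r2 = math.log(closes[2] / closes[1])
log_total = math.log(closes[2] / closes[0])

simple_r1 = closes[1] / closes[0] - 1
simple_r2 = closes[2] / closes[1] - 1
simple_total = closes[2] / closes[0] - 1

# Log returns add up across periods.
print(math.isclose(log_r1 + log_r2, log_total))

# Simple returns must be compounded; their plain sum is slightly off.
print(math.isclose((1 + simple_r1) * (1 + simple_r2) - 1, simple_total))
print(simple_r1 + simple_r2 - simple_total)
```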
Cumulative Return
Cumulative return shows the total percentage gain or loss from the starting price to any given point in time. It answers the question: "If purchased at the first candle, what is the total profit or loss at this point?"
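Cumulative return can be computed directly from the first close or, equivalently, by compounding the per-period simple returns. The following sketch checks that the two views agree on the first five closes of the dummy dataset; filling the first NaN return with 0 is an assumption made here so that the compounding starts at 0%.

```python
import numpy as np
import pandas as pd

close = pd.Series([42200, 42150, 42300, 42250, 42400], dtype="float64")

# Direct definition: each close relative to the first close.
direct = close / close.iloc[0] - 1

# Equivalent view: compound the per-period simple returns.
# The first pct_change() value is NaN, so it is treated as a 0% return.
compounded = (1 + close.pct_change().fillna(0)).cumprod() - 1

print(np.allclose(direct, compounded))
print(direct.round(6).tolist())
```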
Rolling Mean (Moving Average)
A rolling mean computes the average close price over a fixed window of recent candles. At each point in time, it looks back N candles and averages their close prices. It smooths out short-term price noise and reveals the underlying trend direction. A price that stays above its moving average suggests an uptrend; a price that stays below it suggests a downtrend.
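A minimal sketch of the idea, using a window of 3 purely for illustration (the analysis function later in this notebook defaults to 5). The above_mean flag is a hypothetical helper column, not something the notebook computes.

```python
import pandas as pd

close = pd.Series([42200, 42150, 42300, 42250, 42400, 42350], dtype="float64")
window = 3  # illustrative window length

rolling_mean = close.rolling(window=window).mean()

# True where the close sits above its rolling mean. Rows with an
# incomplete window have a NaN mean and therefore compare as False.
above_mean = close > rolling_mean

print(pd.DataFrame({"close": close, "rolling_mean": rolling_mean, "above_mean": above_mean}))
```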
Price Range
The price range of a candle is the difference between its high and
low: high − low. It measures how much the price moved within a
single candle window — a large range indicates high activity or
volatility in that period; a small range indicates a quiet period.
4. Dummy Dataset
raw_data = {
"datetime": [
"2024-01-01 00:00:00+00:00",
"2024-01-01 00:01:00+00:00",
"2024-01-01 00:02:00+00:00",
"2024-01-01 00:03:00+00:00",
"2024-01-01 00:04:00+00:00",
"2024-01-01 00:05:00+00:00",
"2024-01-01 00:06:00+00:00",
"2024-01-01 00:07:00+00:00",
"2024-01-01 00:08:00+00:00",
"2024-01-01 00:09:00+00:00",
],
"open": [42100, 42200, 42150, 42300, 42250,
42400, 42350, 42500, 42450, 42600],
"high": [42300, 42400, 42350, 42500, 42450,
42600, 42550, 42700, 42650, 42800],
"low": [41900, 42000, 41950, 42100, 42050,
42200, 42150, 42300, 42250, 42400],
"close": [42200, 42150, 42300, 42250, 42400,
42350, 42500, 42450, 42600, 42550],
"volume": [10.5, 8.2, 9.1, 11.3, 7.6,
12.4, 6.8, 13.1, 9.9, 10.2],
}
df = pd.DataFrame(raw_data)
df["datetime"] = pd.to_datetime(df["datetime"], utc=True)
print("--- Raw OHLCV Data ---")
display(df)
--- Raw OHLCV Data ---
| | datetime | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2024-01-01 00:00:00+00:00 | 42100 | 42300 | 41900 | 42200 | 10.5 |
| 1 | 2024-01-01 00:01:00+00:00 | 42200 | 42400 | 42000 | 42150 | 8.2 |
| 2 | 2024-01-01 00:02:00+00:00 | 42150 | 42350 | 41950 | 42300 | 9.1 |
| 3 | 2024-01-01 00:03:00+00:00 | 42300 | 42500 | 42100 | 42250 | 11.3 |
| 4 | 2024-01-01 00:04:00+00:00 | 42250 | 42450 | 42050 | 42400 | 7.6 |
| 5 | 2024-01-01 00:05:00+00:00 | 42400 | 42600 | 42200 | 42350 | 12.4 |
| 6 | 2024-01-01 00:06:00+00:00 | 42350 | 42550 | 42150 | 42500 | 6.8 |
| 7 | 2024-01-01 00:07:00+00:00 | 42500 | 42700 | 42300 | 42450 | 13.1 |
| 8 | 2024-01-01 00:08:00+00:00 | 42450 | 42650 | 42250 | 42600 | 9.9 |
| 9 | 2024-01-01 00:09:00+00:00 | 42600 | 42800 | 42400 | 42550 | 10.2 |
5. Price Analysis Function
def analyze_price(df: pd.DataFrame, rolling_window: int = 5) -> pd.DataFrame:
    """
    Compute price movement indicators from OHLCV data.

    Args:
        df (pd.DataFrame): Cleaned OHLCV DataFrame with UTC datetime column.
        rolling_window (int): Number of candles for rolling mean calculation.

    Returns:
        pd.DataFrame: Input DataFrame extended with computed price metrics.
    """
    df = df.copy().sort_values("datetime", ignore_index=True)

    # Simple return: percentage change from previous close to current close
    df["simple_return"] = df["close"].pct_change()

    # Log return: natural log of the price ratio between consecutive closes
    df["log_return"] = np.log(df["close"] / df["close"].shift(1))

    # Cumulative return: total percentage change from the first close price
    df["cumulative_return"] = (df["close"] / df["close"].iloc[0]) - 1

    # Rolling mean: average close price over the last N candles
    df["rolling_mean"] = df["close"].rolling(window=rolling_window).mean()

    # Price range: distance between high and low within each candle
    df["price_range"] = df["high"] - df["low"]

    # Price direction: +1 if close is higher than open (up candle),
    # -1 if close is lower than open (down candle),
    # 0 if unchanged
    df["direction"] = np.sign(df["close"] - df["open"]).astype(int)

    return df[[
        "datetime", "open", "high", "low", "close", "volume",
        "simple_return", "log_return", "cumulative_return",
        "rolling_mean", "price_range", "direction",
    ]]
Code Logic
Simple return
df["close"].pct_change(): Computes (current − previous) / previous for each row. The first row produces NaN because no prior close exists. A positive value indicates a price increase; a negative value indicates a decrease.
Log return
np.log(df["close"] / df["close"].shift(1)): Divides each close by the previous close (shift(1) moves the column down by one row) and takes the natural logarithm. The first row produces NaN. Log returns are numerically close to simple returns for small price changes and diverge from them as moves get larger; because they sum across periods, they are easier to aggregate over time.
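The relationship between the two measures is log return = ln(1 + simple return). The sketch below uses a few illustrative simple returns (not taken from the dataset) to show how small the gap is for minute-scale moves and how it widens for large ones.

```python
import numpy as np

simple = np.array([0.001, 0.01, 0.10, 0.50])  # illustrative simple returns
log = np.log1p(simple)                        # ln(1 + simple return)

for s, l in zip(simple, log):
    print(f"simple={s:.3f}  log={l:.6f}  gap={s - l:.6f}")
```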
Cumulative return
(df["close"] / df["close"].iloc[0]) - 1: Divides every close price by the very first close price in the series (iloc[0]) and subtracts 1. A value of 0.05 means the price is 5% above where it started; −0.03 means 3% below.
Rolling mean
df["close"].rolling(window=rolling_window).mean(): At each row, computes the average of the current and the previous rolling_window − 1 close prices. The first rolling_window − 1 rows produce NaN because insufficient prior data exists to fill the window.
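If the leading NaN values are inconvenient, pandas' min_periods argument lets the window produce a value as soon as a minimum number of observations is available. This is an optional variation, not what analyze_price does; the sketch compares the two behaviors on the first five closes.

```python
import pandas as pd

close = pd.Series([42200, 42150, 42300, 42250, 42400], dtype="float64")

# Default behavior: NaN until the full 5-candle window is available.
strict = close.rolling(window=5).mean()

# Variation: average whatever candles are available so far.
partial = close.rolling(window=5, min_periods=1).mean()

print(pd.DataFrame({"strict": strict, "partial": partial}))
```

Partial-window averages are based on fewer candles than later rows, so early values are noisier and not directly comparable with the rest of the series.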
Price range
df["high"] - df["low"]: The difference between the highest and lowest price within each candle. A large value indicates significant price movement within the period; a small value indicates a quiet, consolidating period.
Direction
np.sign(df["close"] - df["open"]): Returns +1 when close is above open (buyers dominated the candle), −1 when close is below open (sellers dominated), and 0 when they are equal. This is the standard definition of a bullish (+1) or bearish (−1) candle.
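The function expects cleaned input. If open or close contained a missing value, np.sign would return NaN for that row and the .astype(int) cast would raise. The sketch below, on a small hypothetical frame, shows the failure and one possible workaround using pandas' nullable Int64 dtype; whether to drop, fill, or keep such rows is a design decision outside this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing open price.
candles = pd.DataFrame({
    "open": [100.0, 101.0, np.nan],
    "close": [101.0, 100.5, 102.0],
})
diff = candles["close"] - candles["open"]

# A plain integer cast cannot represent NaN and raises a ValueError.
try:
    np.sign(diff).astype(int)
    plain_cast_failed = False
except ValueError:
    plain_cast_failed = True

# The nullable integer dtype keeps the missing row as <NA>.
direction = np.sign(diff).astype("Int64")

print(plain_cast_failed)
print(direction)
```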
6. Execution
ROLLING_WINDOW = 5
df_analysis = analyze_price(df, rolling_window=ROLLING_WINDOW)
print("--- Price Analysis Output ---")
display(df_analysis)
--- Price Analysis Output ---
| | datetime | open | high | low | close | volume | simple_return | log_return | cumulative_return | rolling_mean | price_range | direction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-01-01 00:00:00+00:00 | 42100 | 42300 | 41900 | 42200 | 10.5 | NaN | NaN | 0.000000 | NaN | 400 | 1 |
| 1 | 2024-01-01 00:01:00+00:00 | 42200 | 42400 | 42000 | 42150 | 8.2 | -0.001185 | -0.001186 | -0.001185 | NaN | 400 | -1 |
| 2 | 2024-01-01 00:02:00+00:00 | 42150 | 42350 | 41950 | 42300 | 9.1 | 0.003559 | 0.003552 | 0.002370 | NaN | 400 | 1 |
| 3 | 2024-01-01 00:03:00+00:00 | 42300 | 42500 | 42100 | 42250 | 11.3 | -0.001182 | -0.001183 | 0.001185 | NaN | 400 | -1 |
| 4 | 2024-01-01 00:04:00+00:00 | 42250 | 42450 | 42050 | 42400 | 7.6 | 0.003550 | 0.003544 | 0.004739 | 42260.0 | 400 | 1 |
| 5 | 2024-01-01 00:05:00+00:00 | 42400 | 42600 | 42200 | 42350 | 12.4 | -0.001179 | -0.001180 | 0.003555 | 42290.0 | 400 | -1 |
| 6 | 2024-01-01 00:06:00+00:00 | 42350 | 42550 | 42150 | 42500 | 6.8 | 0.003542 | 0.003536 | 0.007109 | 42360.0 | 400 | 1 |
| 7 | 2024-01-01 00:07:00+00:00 | 42500 | 42700 | 42300 | 42450 | 13.1 | -0.001176 | -0.001177 | 0.005924 | 42390.0 | 400 | -1 |
| 8 | 2024-01-01 00:08:00+00:00 | 42450 | 42650 | 42250 | 42600 | 9.9 | 0.003534 | 0.003527 | 0.009479 | 42460.0 | 400 | 1 |
| 9 | 2024-01-01 00:09:00+00:00 | 42600 | 42800 | 42400 | 42550 | 10.2 | -0.001174 | -0.001174 | 0.008294 | 42490.0 | 400 | -1 |
7. Summary Statistics
print("--- Return Summary ---")
display(df_analysis[["simple_return", "log_return", "cumulative_return"]].describe().round(6))
print("\n--- Price Range Summary ---")
display(df_analysis[["price_range"]].describe().round(2))
print("\n--- Direction Distribution ---")
direction_map = {1: "Up", -1: "Down", 0: "Flat"}
print(df_analysis["direction"].map(direction_map).value_counts())
print("\n--- Schema Summary ---")
df_analysis.info()
--- Return Summary ---
| | simple_return | log_return | cumulative_return |
|---|---|---|---|
| count | 9.000000 | 9.000000 | 10.000000 |
| mean | 0.000921 | 0.000918 | 0.004147 |
| std | 0.002491 | 0.002488 | 0.003587 |
| min | -0.001185 | -0.001186 | -0.001185 |
| 25% | -0.001179 | -0.001180 | 0.001481 |
| 50% | -0.001174 | -0.001174 | 0.004147 |
| 75% | 0.003542 | 0.003536 | 0.006813 |
| max | 0.003559 | 0.003552 | 0.009479 |
--- Price Range Summary ---
| | price_range |
|---|---|
| count | 10.0 |
| mean | 400.0 |
| std | 0.0 |
| min | 400.0 |
| 25% | 400.0 |
| 50% | 400.0 |
| 75% | 400.0 |
| max | 400.0 |
--- Direction Distribution ---
direction
Up      5
Down    5
Name: count, dtype: int64

--- Schema Summary ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   datetime           10 non-null     datetime64[ns, UTC]
 1   open               10 non-null     int64
 2   high               10 non-null     int64
 3   low                10 non-null     int64
 4   close              10 non-null     int64
 5   volume             10 non-null     float64
 6   simple_return      9 non-null      float64
 7   log_return         9 non-null      float64
 8   cumulative_return  10 non-null     float64
 9   rolling_mean       6 non-null      float64
 10  price_range        10 non-null     int64
 11  direction          10 non-null     int64
dtypes: datetime64[ns, UTC](1), float64(5), int64(6)
memory usage: 1.1 KB
Code Logic
.describe(): Produces count, mean, standard deviation, min, 25th percentile, median, 75th percentile, and max for each numeric column in one call.
.value_counts(): Counts the up, down, and flat candles, giving a quick read on the proportion of bullish versus bearish candles in the sample.
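The description at the top also mentions autocorrelation and fat tails, which describe() does not report. As a rough sketch, both can be estimated with built-in pandas methods on the log returns of the dummy closes. With only nine observations the numbers are not statistically meaningful; the point is only to show the calls. Stationarity testing would need an additional library and is not sketched here.

```python
import numpy as np
import pandas as pd

close = pd.Series(
    [42200, 42150, 42300, 42250, 42400, 42350, 42500, 42450, 42600, 42550],
    dtype="float64",
)
log_return = np.log(close / close.shift(1)).dropna()

# Pearson correlation between the series and itself shifted by one period.
lag1_autocorr = log_return.autocorr(lag=1)

# Excess kurtosis: 0 for a normal distribution, > 0 suggests fat tails.
excess_kurtosis = log_return.kurt()

print(f"lag-1 autocorrelation: {lag1_autocorr:.3f}")
print(f"excess kurtosis:       {excess_kurtosis:.3f}")
```

Because the dummy closes alternate between down and up candles, the lag-1 autocorrelation comes out strongly negative; real market data would rarely look like this.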
8. Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(12, 6))
sns.lineplot(x='datetime', y='close', data=df_analysis, label='Close Price')
sns.lineplot(x='datetime', y='rolling_mean', data=df_analysis, label='Rolling Mean')
plt.title('Close Price and Rolling Mean Over Time')
plt.xlabel('Datetime')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.show()
plt.figure(figsize=(12, 6))
sns.lineplot(x='datetime', y='simple_return', data=df_analysis, label='Simple Return')
sns.lineplot(x='datetime', y='log_return', data=df_analysis, label='Log Return')
plt.title('Simple and Log Returns Over Time')
plt.xlabel('Datetime')
plt.ylabel('Return')
plt.legend()
plt.grid(True)
plt.show()
plt.figure(figsize=(12, 6))
sns.lineplot(x='datetime', y='cumulative_return', data=df_analysis, label='Cumulative Return', color='green')
plt.title('Cumulative Return Over Time')
plt.xlabel('Datetime')
plt.ylabel('Cumulative Return')
plt.legend()
plt.grid(True)
plt.show()
plt.figure(figsize=(10, 5))
sns.barplot(x='datetime', y='price_range', data=df_analysis, color='purple')
plt.title('Price Range per Candle')
plt.xlabel('Datetime')
plt.ylabel('Price Range (High - Low)')
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.tight_layout()
plt.show()
plt.figure(figsize=(8, 5))
direction_counts = df_analysis['direction'].map({1: 'Up', -1: 'Down', 0: 'Flat'}).value_counts()
sns.barplot(x=direction_counts.index, y=direction_counts.values, palette='viridis')
plt.title('Distribution of Price Direction (Up/Down Candles)')
plt.xlabel('Direction')
plt.ylabel('Count')
plt.show()