Data·Analysis·Beginner

Price Analysis

Exploratory price analysis on OHLCV data: simple, log, and cumulative returns, rolling means, price ranges, candle direction, and summary statistics.

returns · statistics · EDA

Price Analysis Framework

This notebook defines a standardized protocol for analyzing price movements in OHLCV time-series data. It covers return calculation, trend detection, price range measurement, and summary statistics on a representative dummy dataset.


1. Dependency Installation

[9]
!pip install pandas numpy
Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2026.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)

2. Library Imports

[10]
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np

Code Logic

  • pandas: Provides the DataFrame structure and vectorized price computation methods.
  • numpy: Supplies mathematical functions used in return and range calculations.

3. Key Concepts

Close Price The close price is the final traded price within a candle window. It is the most widely used price field in analysis because it represents the market's last agreed-upon value for that period — buyers and sellers both accepted this price at the end of the window.

Return A return measures how much the price changed between two periods, expressed as a percentage. A return of +2% means the price rose by 2% from the previous candle's close to the current candle's close. Returns are used instead of raw price differences because they are comparable across assets with different price levels: a $100 move in BTC priced at $42,000 is a return of about 0.24%, while a $100 move in a $200 stock is a return of 50%.

Simple Return Simple return = (current close − previous close) / previous close. This is the standard percentage change between two consecutive periods.
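As a quick check of the formula, the sketch below applies it by hand to the first three close prices of the dummy dataset in Section 4 and compares the result with pandas' `pct_change()`. The variable names are illustrative only.

```python
import pandas as pd

# First three closes from the dummy dataset in Section 4
close = pd.Series([42200, 42150, 42300], dtype="float64")

# Manual formula: (current close - previous close) / previous close
manual = (close - close.shift(1)) / close.shift(1)

# pandas shortcut for the same calculation
shortcut = close.pct_change()

print(manual.round(6).tolist())    # [nan, -0.001185, 0.003559]
print(shortcut.round(6).tolist())  # [nan, -0.001185, 0.003559]
```

The first entry is NaN in both cases because the first candle has no previous close to compare against.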

Log Return Log return = ln(current close / previous close). Log returns are often preferred in quantitative analysis because they are additive across time: the log return over two periods equals the sum of the individual period log returns. This property makes them easier to aggregate and model statistically.
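The additivity property can be verified numerically. The following minimal sketch, using three close prices taken from the dummy dataset, shows that per-period log returns sum to the two-period log return, whereas simple returns have to be compounded rather than summed. Variable names are illustrative.

```python
import numpy as np

closes = np.array([42200.0, 42150.0, 42300.0])

# Per-period log returns: ln(P_t / P_{t-1})
log_returns = np.diff(np.log(closes))

# Log return measured directly across both periods
two_period_log = np.log(closes[-1] / closes[0])

# Additivity: the per-period log returns sum to the two-period log return
print(np.isclose(log_returns.sum(), two_period_log))  # True

# Simple returns do not add; they compound multiplicatively instead
simple_returns = closes[1:] / closes[:-1] - 1
two_period_simple = closes[-1] / closes[0] - 1
print(np.isclose(np.prod(1 + simple_returns) - 1, two_period_simple))  # True
print(np.isclose(simple_returns.sum(), two_period_simple, rtol=0, atol=1e-9))  # False
```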

Cumulative Return Cumulative return shows the total percentage gain or loss from the starting price to any given point in time. It answers the question: "If purchased at the first candle, what is the total profit or loss at this point?"

Rolling Mean (Moving Average) A rolling mean computes the average close price over a fixed window of recent candles. At each point in time, it looks back N candles and averages their close prices. It smooths out short-term price noise and reveals the underlying trend direction. A price that stays above its moving average is commonly read as an uptrend; a price that stays below it, as a downtrend.
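One simple way to read price against its moving average is to look at the spread between the two. The sketch below is an illustration only, not part of the analysis function defined later; it uses the ten close prices of the dummy dataset with a 5-candle window, and the name `spread` is an assumption made for this example.

```python
import pandas as pd

# Close prices from the dummy dataset in Section 4
close = pd.Series(
    [42200, 42150, 42300, 42250, 42400, 42350, 42500, 42450, 42600, 42550],
    dtype="float64",
)

# 5-candle rolling mean; the first four rows are NaN (window not yet full)
rolling_mean = close.rolling(window=5).mean()

# Positive spread: close is above its rolling mean; negative: below
spread = close - rolling_mean

print(spread.dropna().tolist())
```

On this dataset every defined spread is positive, which matches the steady upward drift in the close prices.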

Price Range The price range of a candle is the difference between its high and low: high − low. It measures how much the price moved within a single candle window — a large range indicates high activity or volatility in that period; a small range indicates a quiet period.


4. Dummy Dataset

[11]
raw_data = {
    "datetime": [
        "2024-01-01 00:00:00+00:00",
        "2024-01-01 00:01:00+00:00",
        "2024-01-01 00:02:00+00:00",
        "2024-01-01 00:03:00+00:00",
        "2024-01-01 00:04:00+00:00",
        "2024-01-01 00:05:00+00:00",
        "2024-01-01 00:06:00+00:00",
        "2024-01-01 00:07:00+00:00",
        "2024-01-01 00:08:00+00:00",
        "2024-01-01 00:09:00+00:00",
    ],
    "open":   [42100, 42200, 42150, 42300, 42250,
               42400, 42350, 42500, 42450, 42600],
    "high":   [42300, 42400, 42350, 42500, 42450,
               42600, 42550, 42700, 42650, 42800],
    "low":    [41900, 42000, 41950, 42100, 42050,
               42200, 42150, 42300, 42250, 42400],
    "close":  [42200, 42150, 42300, 42250, 42400,
               42350, 42500, 42450, 42600, 42550],
    "volume": [10.5, 8.2, 9.1, 11.3, 7.6,
               12.4, 6.8, 13.1, 9.9, 10.2],
}

df = pd.DataFrame(raw_data)
df["datetime"] = pd.to_datetime(df["datetime"], utc=True)

print("--- Raw OHLCV Data ---")
display(df)
--- Raw OHLCV Data ---
                   datetime   open   high    low  close  volume
0 2024-01-01 00:00:00+00:00  42100  42300  41900  42200    10.5
1 2024-01-01 00:01:00+00:00  42200  42400  42000  42150     8.2
2 2024-01-01 00:02:00+00:00  42150  42350  41950  42300     9.1
3 2024-01-01 00:03:00+00:00  42300  42500  42100  42250    11.3
4 2024-01-01 00:04:00+00:00  42250  42450  42050  42400     7.6
5 2024-01-01 00:05:00+00:00  42400  42600  42200  42350    12.4
6 2024-01-01 00:06:00+00:00  42350  42550  42150  42500     6.8
7 2024-01-01 00:07:00+00:00  42500  42700  42300  42450    13.1
8 2024-01-01 00:08:00+00:00  42450  42650  42250  42600     9.9
9 2024-01-01 00:09:00+00:00  42600  42800  42400  42550    10.2
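The analysis function in the next section expects cleaned OHLCV input. A minimal consistency check might look like the sketch below. The helper `check_ohlcv` is hypothetical and not part of the notebook, and the third sample row is deliberately invalid to show a failing case.

```python
import pandas as pd


def check_ohlcv(df: pd.DataFrame) -> pd.Series:
    """Return True for each row whose candle is internally consistent."""
    # High must be at least as large as open, close, and low
    high_ok = df["high"] >= df[["open", "close", "low"]].max(axis=1)
    # Low must be no larger than open and close
    low_ok = df["low"] <= df[["open", "close"]].min(axis=1)
    # Traded volume cannot be negative
    volume_ok = df["volume"] >= 0
    return high_ok & low_ok & volume_ok


sample = pd.DataFrame({
    "open":   [42100, 42200, 42150],
    "high":   [42300, 42400, 42100],  # third high is below its open: deliberately invalid
    "low":    [41900, 42000, 41950],
    "close":  [42200, 42150, 42300],
    "volume": [10.5, 8.2, 9.1],
})

valid = check_ohlcv(sample)
print(valid.tolist())  # [True, True, False]
```

Rows flagged False can then be inspected or dropped before any return calculation is run.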

5. Price Analysis Function

[12]
def analyze_price(df: pd.DataFrame, rolling_window: int = 5) -> pd.DataFrame:
    """
    Compute price movement indicators from OHLCV data.

    Args:
        df             (pd.DataFrame): Cleaned OHLCV DataFrame with UTC datetime column.
        rolling_window (int):          Number of candles for rolling mean calculation.

    Returns:
        pd.DataFrame: Input DataFrame extended with computed price metrics.
    """
    df = df.copy().sort_values("datetime", ignore_index=True)

    # Simple return: percentage change from previous close to current close
    df["simple_return"]     = df["close"].pct_change()

    # Log return: natural log of the price ratio between consecutive closes
    df["log_return"]        = np.log(df["close"] / df["close"].shift(1))

    # Cumulative return: total percentage change from the first close price
    df["cumulative_return"] = (df["close"] / df["close"].iloc[0]) - 1

    # Rolling mean: average close price over the last N candles
    df["rolling_mean"]      = df["close"].rolling(window=rolling_window).mean()

    # Price range: distance between high and low within each candle
    df["price_range"]       = df["high"] - df["low"]

    # Price direction: +1 if close is higher than open (up candle),
    #                 -1 if close is lower than open (down candle),
    #                  0 if unchanged
    df["direction"]         = np.sign(df["close"] - df["open"]).astype(int)

    return df[[
        "datetime", "open", "high", "low", "close", "volume",
        "simple_return", "log_return", "cumulative_return",
        "rolling_mean", "price_range", "direction",
    ]]

Code Logic

Simple return

  • df["close"].pct_change(): Computes (current − previous) / previous for each row. The first row produces NaN because no prior close exists. A positive value indicates a price increase; negative indicates a decrease.

Log return

  • np.log(df["close"] / df["close"].shift(1)): Divides each close by the previous close (shift(1) moves the column down by one row) and takes the natural logarithm. The first row produces NaN. Log returns are numerically close to simple returns for small price changes; the two diverge as moves get larger, with the log return never exceeding the simple return. Their main advantage is that they sum across time, which makes multi-period aggregation straightforward.
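To see how simple and log returns relate as the size of the move grows, the sketch below converts a few hypothetical simple returns (chosen for illustration, not taken from the dataset) into log returns using the identity log return = ln(1 + simple return).

```python
import numpy as np

# Hypothetical simple returns: 0.1%, 1%, 10%, 50%
simple = np.array([0.001, 0.01, 0.10, 0.50])

# Equivalent log returns; log1p(x) computes ln(1 + x) accurately for small x
log = np.log1p(simple)

# Gap between the two definitions widens with the size of the move
gap = simple - log

for s, l, g in zip(simple, log, gap):
    print(f"simple={s:.3f}  log={l:.6f}  gap={g:.6f}")
```

At the one-minute scale of the dummy dataset the gap is negligible, which is why the two return columns in the output agree to about five decimal places.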

Cumulative return

  • (df["close"] / df["close"].iloc[0]) - 1: Divides every close price by the very first close price in the series (iloc[0]) and subtracts 1. A value of 0.05 means the price is 5% above where it started; −0.03 means 3% below.
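The same cumulative return can also be rebuilt from the per-period returns, which is a useful cross-check. The sketch below, with illustrative variable names, compares the direct ratio with compounded simple returns and with summed log returns on the dummy dataset's close prices.

```python
import numpy as np
import pandas as pd

close = pd.Series(
    [42200, 42150, 42300, 42250, 42400, 42350, 42500, 42450, 42600, 42550],
    dtype="float64",
)

# Direct definition: ratio to the first close, minus 1
direct = close / close.iloc[0] - 1

# Compounding simple returns: cumulative product of (1 + r), minus 1
simple = close.pct_change().fillna(0.0)
compounded = (1 + simple).cumprod() - 1

# Summing log returns, then converting back with exp
log_ret = np.log(close / close.shift(1)).fillna(0.0)
from_logs = np.exp(log_ret.cumsum()) - 1

print(np.allclose(direct, compounded))  # True
print(np.allclose(direct, from_logs))   # True
print(round(direct.iloc[-1], 6))        # 0.008294
```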

Rolling mean

  • df["close"].rolling(window=rolling_window).mean(): At each row, computes the average of the current and the previous rolling_window − 1 close prices. The first rolling_window − 1 rows produce NaN because insufficient prior data exists to fill the window.
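If the leading NaN values are inconvenient, pandas' `min_periods` argument lets the window produce a value as soon as a chosen number of observations is available. The sketch below contrasts the default behaviour with `min_periods=1` on a short series; the 3-candle window is chosen only to keep the example small.

```python
import pandas as pd

close = pd.Series([42200, 42150, 42300, 42250, 42400], dtype="float64")

# Default: a value appears only once the full 3-candle window is available
strict = close.rolling(window=3).mean()

# min_periods=1: early rows average whatever data exists so far
partial = close.rolling(window=3, min_periods=1).mean()

print(strict.isna().sum())   # 2
print(partial.isna().sum())  # 0
print(partial.iloc[1])       # 42175.0
```

Partial-window averages are based on fewer observations, so the early values are noisier than the rest of the series.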

Price range

  • df["high"] - df["low"]: The difference between the highest and lowest price within each candle. A large value indicates significant price movement within the period; a small value indicates a quiet, consolidating period.

Direction

  • np.sign(df["close"] - df["open"]): Returns +1 when close is above open (buyers dominated the candle), −1 when close is below open (sellers dominated), and 0 when they are equal. This is the standard definition of a bullish (+1) or bearish (−1) candle.
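The dummy dataset contains no flat candles, so the 0 case never appears in the output. The sketch below adds a hypothetical third candle with open equal to close to show all three outcomes.

```python
import numpy as np
import pandas as pd

candles = pd.DataFrame({
    "open":  [42100, 42200, 42300],
    "close": [42200, 42150, 42300],  # third candle is a hypothetical flat candle
})

# Sign of (close - open): +1 up, -1 down, 0 flat
direction = np.sign(candles["close"] - candles["open"]).astype(int)

labels = direction.map({1: "Up", -1: "Down", 0: "Flat"})

print(direction.tolist())  # [1, -1, 0]
print(labels.tolist())     # ['Up', 'Down', 'Flat']
```

The cast to int assumes open and close contain no missing values; a NaN in either column would make astype(int) raise an error.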

6. Execution

[13]
ROLLING_WINDOW = 5

df_analysis = analyze_price(df, rolling_window=ROLLING_WINDOW)

print("--- Price Analysis Output ---")
display(df_analysis)
--- Price Analysis Output ---
                   datetime   open   high    low  close  volume  simple_return  log_return  cumulative_return  rolling_mean  price_range  direction
0 2024-01-01 00:00:00+00:00  42100  42300  41900  42200    10.5            NaN         NaN           0.000000           NaN          400          1
1 2024-01-01 00:01:00+00:00  42200  42400  42000  42150     8.2      -0.001185   -0.001186          -0.001185           NaN          400         -1
2 2024-01-01 00:02:00+00:00  42150  42350  41950  42300     9.1       0.003559    0.003552           0.002370           NaN          400          1
3 2024-01-01 00:03:00+00:00  42300  42500  42100  42250    11.3      -0.001182   -0.001183           0.001185           NaN          400         -1
4 2024-01-01 00:04:00+00:00  42250  42450  42050  42400     7.6       0.003550    0.003544           0.004739       42260.0          400          1
5 2024-01-01 00:05:00+00:00  42400  42600  42200  42350    12.4      -0.001179   -0.001180           0.003555       42290.0          400         -1
6 2024-01-01 00:06:00+00:00  42350  42550  42150  42500     6.8       0.003542    0.003536           0.007109       42360.0          400          1
7 2024-01-01 00:07:00+00:00  42500  42700  42300  42450    13.1      -0.001176   -0.001177           0.005924       42390.0          400         -1
8 2024-01-01 00:08:00+00:00  42450  42650  42250  42600     9.9       0.003534    0.003527           0.009479       42460.0          400          1
9 2024-01-01 00:09:00+00:00  42600  42800  42400  42550    10.2      -0.001174   -0.001174           0.008294       42490.0          400         -1

7. Summary Statistics

[14]
print("--- Return Summary ---")
display(df_analysis[["simple_return", "log_return", "cumulative_return"]].describe().round(6))

print("\n--- Price Range Summary ---")
display(df_analysis[["price_range"]].describe().round(2))

print("\n--- Direction Distribution ---")
direction_map = {1: "Up", -1: "Down", 0: "Flat"}
print(df_analysis["direction"].map(direction_map).value_counts())

print("\n--- Schema Summary ---")
df_analysis.info()
--- Return Summary ---
       simple_return  log_return  cumulative_return
count       9.000000    9.000000          10.000000
mean        0.000921    0.000918           0.004147
std         0.002491    0.002488           0.003587
min        -0.001185   -0.001186          -0.001185
25%        -0.001179   -0.001180           0.001481
50%        -0.001174   -0.001174           0.004147
75%         0.003542    0.003536           0.006813
max         0.003559    0.003552           0.009479

--- Price Range Summary ---
price_range
count 10.0
mean 400.0
std 0.0
min 400.0
25% 400.0
50% 400.0
75% 400.0
max 400.0

--- Direction Distribution ---
direction
Up      5
Down    5
Name: count, dtype: int64

--- Schema Summary ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype              
---  ------             --------------  -----              
 0   datetime           10 non-null     datetime64[ns, UTC]
 1   open               10 non-null     int64              
 2   high               10 non-null     int64              
 3   low                10 non-null     int64              
 4   close              10 non-null     int64              
 5   volume             10 non-null     float64            
 6   simple_return      9 non-null      float64            
 7   log_return         9 non-null      float64            
 8   cumulative_return  10 non-null     float64            
 9   rolling_mean       6 non-null      float64            
 10  price_range        10 non-null     int64              
 11  direction          10 non-null     int64              
dtypes: datetime64[ns, UTC](1), float64(5), int64(6)
memory usage: 1.1 KB

Code Logic

  • .describe(): Produces count, mean, standard deviation, min, 25th percentile, median, 75th percentile, and max for each numeric column — a complete statistical summary in one call.
  • .value_counts(): Counts the number of up, down, and flat candles — provides a quick read on the proportion of bullish versus bearish candles in the sample.
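When proportions are more useful than raw counts, `value_counts(normalize=True)` returns shares directly. The sketch below applies it to the direction values produced in Section 6 and reindexes the result so that the "Flat" category is shown even when no flat candles occur; the reindexing step is an optional addition, not part of the original cell.

```python
import pandas as pd

# Direction values from the Section 6 output
direction = pd.Series([1, -1, 1, -1, 1, -1, 1, -1, 1, -1])

share = (
    direction
    .map({1: "Up", -1: "Down", 0: "Flat"})
    .value_counts(normalize=True)                       # fractions instead of counts
    .reindex(["Up", "Down", "Flat"], fill_value=0.0)    # keep absent categories visible
)

print(share.to_dict())  # {'Up': 0.5, 'Down': 0.5, 'Flat': 0.0}
```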

8. Visualizations

[15]
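# matplotlib and seaborn are not installed in Section 1;
# run `!pip install matplotlib seaborn` first if they are missing
import matplotlib.pyplot as plt
import seaborn as sns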

plt.figure(figsize=(12, 6))
sns.lineplot(x='datetime', y='close', data=df_analysis, label='Close Price')
sns.lineplot(x='datetime', y='rolling_mean', data=df_analysis, label='Rolling Mean')
plt.title('Close Price and Rolling Mean Over Time')
plt.xlabel('Datetime')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.show()
[Figure: Close Price and Rolling Mean Over Time]
[16]
plt.figure(figsize=(12, 6))
sns.lineplot(x='datetime', y='simple_return', data=df_analysis, label='Simple Return')
sns.lineplot(x='datetime', y='log_return', data=df_analysis, label='Log Return')
plt.title('Simple and Log Returns Over Time')
plt.xlabel('Datetime')
plt.ylabel('Return')
plt.legend()
plt.grid(True)
plt.show()
[Figure: Simple and Log Returns Over Time]
[17]
plt.figure(figsize=(12, 6))
sns.lineplot(x='datetime', y='cumulative_return', data=df_analysis, label='Cumulative Return', color='green')
plt.title('Cumulative Return Over Time')
plt.xlabel('Datetime')
plt.ylabel('Cumulative Return')
plt.legend()
plt.grid(True)
plt.show()
[Figure: Cumulative Return Over Time]
[18]
plt.figure(figsize=(10, 5))
sns.barplot(x='datetime', y='price_range', data=df_analysis, color='purple')
plt.title('Price Range per Candle')
plt.xlabel('Datetime')
plt.ylabel('Price Range (High - Low)')
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.tight_layout()
plt.show()
[Figure: Price Range per Candle]
[19]
plt.figure(figsize=(8, 5))
direction_counts = df_analysis['direction'].map({1: 'Up', -1: 'Down', 0: 'Flat'}).value_counts()
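# hue mirrors x so the palette applies without the seaborn >= 0.13 deprecation warning
sns.barplot(x=direction_counts.index, y=direction_counts.values,
            hue=direction_counts.index, palette='viridis', legend=False)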
plt.title('Distribution of Price Direction (Up/Down Candles)')
plt.xlabel('Direction')
plt.ylabel('Count')
plt.show()
[Figure: Distribution of Price Direction (Up/Down Candles)]