Price Analysis Framework

This notebook defines a standardized protocol for analyzing price movements in OHLCV time-series data. It covers return calculation, trend detection, price range measurement, and summary statistics on a representative dummy dataset.

1. Dependency Installation

[9]

!pip install pandas numpy

Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2026.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)

2. Library Imports

[10]

import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np

Code Logic

pandas: Provides the DataFrame structure and vectorized price computation methods.
numpy: Supplies mathematical functions used in return and range calculations.

3. Key Concepts

Close Price: The close price is the final traded price within a candle window. It is the most widely used price field in analysis because it represents the market's last agreed-upon value for that period: the price at which a buyer and a seller last traded before the window ended.

Return: A return measures how much the price changed between two periods, expressed as a percentage. A return of +2% means the price rose by 2% from the previous candle's close to the current candle's close. Returns are used instead of raw price differences because they are comparable across assets with different price levels. A $100 move in BTC priced at $42,000 is a return of about 0.24%, while a $100 move in a $200 stock is a return of 50%.

Simple Return: Simple return = (current close − previous close) / previous close. This is the standard percentage change between two consecutive periods.
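To make the comparability point concrete, the sketch below applies the simple-return formula to the two $100 moves described in the Return definition. The helper name `simple_return` is an illustrative assumption and is not part of the notebook's pipeline.

```python
def simple_return(previous_close: float, current_close: float) -> float:
    """Simple return: (current close - previous close) / previous close."""
    return (current_close - previous_close) / previous_close


# The same $100 move applied to two very different price levels.
btc_return = simple_return(42_000, 42_100)   # asset priced near $42,000
stock_return = simple_return(200, 300)       # hypothetical stock priced at $200

print(f"BTC:   {btc_return:.4%}")
print(f"Stock: {stock_return:.4%}")
```

The raw price difference is identical in both cases; only the return shows that the second move is far larger in relative terms.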

Log Return: Log return = ln(current close / previous close), where ln is the natural logarithm. Log returns are often preferred in quantitative analysis because they are additive across time: the log return over two periods equals the sum of the individual period log returns. This property makes them easier to aggregate and model statistically.
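The additivity property can be checked numerically. The following is a minimal sketch that uses three close prices matching the first three closes of the dummy dataset in Section 4; the variable names are illustrative.

```python
import numpy as np

# Three consecutive closes (same values as the start of the dummy dataset).
closes = np.array([42200.0, 42150.0, 42300.0])

# Per-period log returns: ln(close_t / close_{t-1}).
log_returns = np.log(closes[1:] / closes[:-1])

# Two-period log return computed directly from the first and last close.
two_period_log_return = np.log(closes[-1] / closes[0])

# Additivity: the sum of the per-period values matches the direct value.
print(log_returns.sum(), two_period_log_return)

# Simple returns do not add this way; they compound multiplicatively.
simple_returns = closes[1:] / closes[:-1] - 1
print(simple_returns.sum(), closes[-1] / closes[0] - 1)
```

The first printed pair agrees up to floating-point precision, while the second pair differs slightly, which is the reason summing simple returns is only an approximation.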

Cumulative Return: Cumulative return shows the total percentage gain or loss from the starting price to any given point in time. It answers the question: "If purchased at the first candle's close, what is the total profit or loss at this point?"
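Cumulative return can be reached two ways: directly from the first close, or by compounding the per-period simple returns. The sketch below, on a short illustrative series, suggests that both routes give the same numbers; treating the undefined first return as 0 is an assumption made here for the comparison.

```python
import pandas as pd

close = pd.Series([42200, 42150, 42300, 42250, 42400], dtype="float64")

# Direct definition: each close relative to the first close.
cumulative_direct = close / close.iloc[0] - 1

# Equivalent route: compound the per-period simple returns.
# The first return is NaN (no prior close), so it is treated as 0 here.
simple_returns = close.pct_change().fillna(0.0)
cumulative_compounded = (1 + simple_returns).cumprod() - 1

print(pd.DataFrame({
    "direct": cumulative_direct,
    "compounded": cumulative_compounded,
}))
```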

Rolling Mean (Moving Average): A rolling mean computes the average close price over a fixed window of recent candles. At each point in time, it looks back over the last N candles (including the current one) and averages their close prices. This smooths out short-term price noise and makes the underlying trend direction easier to see. A price that stays above its moving average is commonly read as an uptrend; one that stays below it, as a downtrend.
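One way to turn the trend reading into something checkable is to compare each close with its rolling mean. This is a minimal sketch on illustrative data; the window of 3 and the `above_mean` name are assumptions chosen to keep the example short.

```python
import pandas as pd

close = pd.Series([42200, 42150, 42300, 42250, 42400, 42350], dtype="float64")
window = 3  # illustrative window, chosen only to keep the example small

rolling_mean = close.rolling(window=window).mean()

# Only rows with a full window have a defined rolling mean.
valid = rolling_mean.notna()

# True where the close sits above its own rolling mean.
above_mean = close[valid] > rolling_mean[valid]

print(pd.DataFrame({"close": close, "rolling_mean": rolling_mean}))
print(above_mean)
```

A run of `True` values is consistent with the uptrend reading described above, though a six-candle sample is far too short to conclude anything about a real market.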

Price Range: The price range of a candle is the difference between its high and low: high − low. It measures how much the price moved within a single candle window. A large range indicates high activity or volatility in that period; a small range indicates a quiet period.

4. Dummy Dataset

[11]

raw_data = {
    "datetime": [
        "2024-01-01 00:00:00+00:00",
        "2024-01-01 00:01:00+00:00",
        "2024-01-01 00:02:00+00:00",
        "2024-01-01 00:03:00+00:00",
        "2024-01-01 00:04:00+00:00",
        "2024-01-01 00:05:00+00:00",
        "2024-01-01 00:06:00+00:00",
        "2024-01-01 00:07:00+00:00",
        "2024-01-01 00:08:00+00:00",
        "2024-01-01 00:09:00+00:00",
    ],
    "open":   [42100, 42200, 42150, 42300, 42250,
               42400, 42350, 42500, 42450, 42600],
    "high":   [42300, 42400, 42350, 42500, 42450,
               42600, 42550, 42700, 42650, 42800],
    "low":    [41900, 42000, 41950, 42100, 42050,
               42200, 42150, 42300, 42250, 42400],
    "close":  [42200, 42150, 42300, 42250, 42400,
               42350, 42500, 42450, 42600, 42550],
    "volume": [10.5, 8.2, 9.1, 11.3, 7.6,
               12.4, 6.8, 13.1, 9.9, 10.2],
}

df = pd.DataFrame(raw_data)
df["datetime"] = pd.to_datetime(df["datetime"], utc=True)

print("--- Raw OHLCV Data ---")
display(df)

--- Raw OHLCV Data ---

	datetime	open	high	low	close	volume
0	2024-01-01 00:00:00+00:00	42100	42300	41900	42200	10.5
1	2024-01-01 00:01:00+00:00	42200	42400	42000	42150	8.2
2	2024-01-01 00:02:00+00:00	42150	42350	41950	42300	9.1
3	2024-01-01 00:03:00+00:00	42300	42500	42100	42250	11.3
4	2024-01-01 00:04:00+00:00	42250	42450	42050	42400	7.6
5	2024-01-01 00:05:00+00:00	42400	42600	42200	42350	12.4
6	2024-01-01 00:06:00+00:00	42350	42550	42150	42500	6.8
7	2024-01-01 00:07:00+00:00	42500	42700	42300	42450	13.1
8	2024-01-01 00:08:00+00:00	42450	42650	42250	42600	9.9
9	2024-01-01 00:09:00+00:00	42600	42800	42400	42550	10.2

5. Price Analysis Function

[12]

def analyze_price(df: pd.DataFrame, rolling_window: int = 5) -> pd.DataFrame:
    """
    Compute price movement indicators from OHLCV data.

    Args:
        df             (pd.DataFrame): Cleaned OHLCV DataFrame with UTC datetime column.
        rolling_window (int):          Number of candles for rolling mean calculation.

    Returns:
        pd.DataFrame: Input DataFrame extended with computed price metrics.
    """
    df = df.copy().sort_values("datetime", ignore_index=True)

    # Simple return: percentage change from previous close to current close
    df["simple_return"]     = df["close"].pct_change()

    # Log return: natural log of the price ratio between consecutive closes
    df["log_return"]        = np.log(df["close"] / df["close"].shift(1))

    # Cumulative return: total percentage change from the first close price
    df["cumulative_return"] = (df["close"] / df["close"].iloc[0]) - 1

    # Rolling mean: average close price over the last N candles
    df["rolling_mean"]      = df["close"].rolling(window=rolling_window).mean()

    # Price range: distance between high and low within each candle
    df["price_range"]       = df["high"] - df["low"]

    # Price direction: +1 if close is higher than open (up candle),
    #                 -1 if close is lower than open (down candle),
    #                  0 if unchanged
    df["direction"]         = np.sign(df["close"] - df["open"]).astype(int)

    return df[[
        "datetime", "open", "high", "low", "close", "volume",
        "simple_return", "log_return", "cumulative_return",
        "rolling_mean", "price_range", "direction",
    ]]

Code Logic

Simple return

df["close"].pct_change(): Computes (current − previous) / previous for each row. The first row produces NaN because no prior close exists. A positive value indicates a price increase; negative indicates a decrease.

Log return

np.log(df["close"] / df["close"].shift(1)): Divides each close by the previous close (shift(1) moves the column down by one row) and takes the natural logarithm. The first row produces NaN. Log returns are numerically close to simple returns for small price changes; the two diverge as moves get larger, and the log return is never larger than the simple return. Unlike simple returns, log returns can be summed across periods to obtain the multi-period log return.
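The relationship between the two return types can be seen by converting a few simple returns to log returns with ln(1 + r). The return values below are illustrative and are not taken from the dataset.

```python
import numpy as np

# Illustrative simple returns, from a small move to a very large one.
simple_returns = np.array([0.001, 0.01, 0.10, 0.50])

# A log return equals ln(1 + simple return); log1p evaluates this
# accurately even when the input is very small.
log_returns = np.log1p(simple_returns)

for simple, log in zip(simple_returns, log_returns):
    print(f"simple={simple:.4f}  log={log:.4f}  gap={simple - log:.6f}")
```

For the one-minute candles in this notebook the returns are a fraction of a percent, which is why the simple_return and log_return columns agree to several decimal places.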

Cumulative return

(df["close"] / df["close"].iloc[0]) - 1: Divides every close price by the very first close price in the series (iloc[0]) and subtracts 1. A value of 0.05 means the price is 5% above where it started; −0.03 means 3% below.

Rolling mean

df["close"].rolling(window=rolling_window).mean(): At each row, computes the average of the current and the previous rolling_window − 1 close prices. The first rolling_window − 1 rows produce NaN because insufficient prior data exists to fill the window.
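The NaN behaviour at the start of the series can be verified directly. The sketch below also shows `min_periods=1`, a standard `rolling` argument that averages whatever history is available; using it is an assumption about what a reader might want and is not what `analyze_price` does.

```python
import pandas as pd

close = pd.Series([42200, 42150, 42300, 42250, 42400, 42350], dtype="float64")
window = 5

# Default: min_periods equals the window size, so the first
# window - 1 rows have no value.
strict = close.rolling(window=window).mean()

# Alternative: allow partial windows at the start of the series.
partial = close.rolling(window=window, min_periods=1).mean()

print(pd.DataFrame({"close": close, "strict": strict, "partial": partial}))
print("NaN rows (strict):", int(strict.isna().sum()))
```

Partial windows remove the leading NaN values, but the early averages are then based on fewer candles than later ones, so they are not directly comparable.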

Price range

df["high"] - df["low"]: The difference between the highest and lowest price within each candle. A large value indicates significant price movement within the period; a small value indicates a quiet, consolidating period.

Direction

np.sign(df["close"] - df["open"]): Returns +1 when close is above open (buyers dominated the candle), −1 when close is below open (sellers dominated), and 0 when they are equal. This is the standard definition of a bullish (+1) or bearish (−1) candle.
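The dummy dataset contains no flat candles, so the 0 case never shows up in the output. The following sketch uses three illustrative candles, one of them deliberately flat, to exercise all three outcomes.

```python
import numpy as np
import pandas as pd

# Three illustrative candles: up, down, and flat (open == close).
candles = pd.DataFrame({
    "open":  [42100, 42200, 42150],
    "close": [42200, 42150, 42150],
})

direction = np.sign(candles["close"] - candles["open"]).astype(int)
print(direction.tolist())
```

Note that `.astype(int)` raises an error if open or close contains NaN, which is one reason the function expects a cleaned DataFrame.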

6. Execution

[13]

ROLLING_WINDOW = 5

df_analysis = analyze_price(df, rolling_window=ROLLING_WINDOW)

print("--- Price Analysis Output ---")
display(df_analysis)

--- Price Analysis Output ---

	datetime	open	high	low	close	volume	simple_return	log_return	cumulative_return	rolling_mean	price_range	direction
0	2024-01-01 00:00:00+00:00	42100	42300	41900	42200	10.5	NaN	NaN	0.000000	NaN	400	1
1	2024-01-01 00:01:00+00:00	42200	42400	42000	42150	8.2	-0.001185	-0.001186	-0.001185	NaN	400	-1
2	2024-01-01 00:02:00+00:00	42150	42350	41950	42300	9.1	0.003559	0.003552	0.002370	NaN	400	1
3	2024-01-01 00:03:00+00:00	42300	42500	42100	42250	11.3	-0.001182	-0.001183	0.001185	NaN	400	-1
4	2024-01-01 00:04:00+00:00	42250	42450	42050	42400	7.6	0.003550	0.003544	0.004739	42260.0	400	1
5	2024-01-01 00:05:00+00:00	42400	42600	42200	42350	12.4	-0.001179	-0.001180	0.003555	42290.0	400	-1
6	2024-01-01 00:06:00+00:00	42350	42550	42150	42500	6.8	0.003542	0.003536	0.007109	42360.0	400	1
7	2024-01-01 00:07:00+00:00	42500	42700	42300	42450	13.1	-0.001176	-0.001177	0.005924	42390.0	400	-1
8	2024-01-01 00:08:00+00:00	42450	42650	42250	42600	9.9	0.003534	0.003527	0.009479	42460.0	400	1
9	2024-01-01 00:09:00+00:00	42600	42800	42400	42550	10.2	-0.001174	-0.001174	0.008294	42490.0	400	-1

7. Summary Statistics

[14]

print("--- Return Summary ---")
display(df_analysis[["simple_return", "log_return", "cumulative_return"]].describe().round(6))

print("\n--- Price Range Summary ---")
display(df_analysis[["price_range"]].describe().round(2))

print("\n--- Direction Distribution ---")
direction_map = {1: "Up", -1: "Down", 0: "Flat"}
print(df_analysis["direction"].map(direction_map).value_counts())

print("\n--- Schema Summary ---")
df_analysis.info()

--- Return Summary ---

	simple_return	log_return	cumulative_return
count	9.000000	9.000000	10.000000
mean	0.000921	0.000918	0.004147
std	0.002491	0.002488	0.003587
min	-0.001185	-0.001186	-0.001185
25%	-0.001179	-0.001180	0.001481
50%	-0.001174	-0.001174	0.004147
75%	0.003542	0.003536	0.006813
max	0.003559	0.003552	0.009479


--- Price Range Summary ---

	price_range
count	10.0
mean	400.0
std	0.0
min	400.0
25%	400.0
50%	400.0
75%	400.0
max	400.0


--- Direction Distribution ---
direction
Up      5
Down    5
Name: count, dtype: int64

--- Schema Summary ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype              
---  ------             --------------  -----              
 0   datetime           10 non-null     datetime64[ns, UTC]
 1   open               10 non-null     int64              
 2   high               10 non-null     int64              
 3   low                10 non-null     int64              
 4   close              10 non-null     int64              
 5   volume             10 non-null     float64            
 6   simple_return      9 non-null      float64            
 7   log_return         9 non-null      float64            
 8   cumulative_return  10 non-null     float64            
 9   rolling_mean       6 non-null      float64            
 10  price_range        10 non-null     int64              
 11  direction          10 non-null     int64              
dtypes: datetime64[ns, UTC](1), float64(5), int64(6)
memory usage: 1.1 KB

Code Logic

.describe(): Produces count, mean, standard deviation, min, 25th percentile, median, 75th percentile, and max for each numeric column in one call. The count excludes NaN, which is why simple_return and log_return show 9 rather than 10.
.value_counts(): Counts how many candles fall into each direction label, giving a quick read on the proportion of bullish versus bearish candles in the sample. Labels with no occurrences are omitted, so Flat does not appear in the output above.
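If proportions are more useful than raw counts, `value_counts` accepts `normalize=True`. The sketch below rebuilds the direction column from the analysis output by hand so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

# Direction values as they appear in the analysis output above.
direction = pd.Series([1, -1, 1, -1, 1, -1, 1, -1, 1, -1], name="direction")
labels = direction.map({1: "Up", -1: "Down", 0: "Flat"})

# Raw counts, as in the summary cell.
counts = labels.value_counts()

# Shares of the total instead of counts.
shares = labels.value_counts(normalize=True)

print(counts)
print(shares)
```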

8. Visualizations

[15]

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12, 6))
sns.lineplot(x='datetime', y='close', data=df_analysis, label='Close Price')
sns.lineplot(x='datetime', y='rolling_mean', data=df_analysis, label='Rolling Mean')
plt.title('Close Price and Rolling Mean Over Time')
plt.xlabel('Datetime')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.show()

[16]

plt.figure(figsize=(12, 6))
sns.lineplot(x='datetime', y='simple_return', data=df_analysis, label='Simple Return')
sns.lineplot(x='datetime', y='log_return', data=df_analysis, label='Log Return')
plt.title('Simple and Log Returns Over Time')
plt.xlabel('Datetime')
plt.ylabel('Return')
plt.legend()
plt.grid(True)
plt.show()

[17]

plt.figure(figsize=(12, 6))
sns.lineplot(x='datetime', y='cumulative_return', data=df_analysis, label='Cumulative Return', color='green')
plt.title('Cumulative Return Over Time')
plt.xlabel('Datetime')
plt.ylabel('Cumulative Return')
plt.legend()
plt.grid(True)
plt.show()

[18]

plt.figure(figsize=(10, 5))
sns.barplot(x='datetime', y='price_range', data=df_analysis, color='purple')
plt.title('Price Range per Candle')
plt.xlabel('Datetime')
plt.ylabel('Price Range (High - Low)')
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.tight_layout()
plt.show()

[19]

plt.figure(figsize=(8, 5))
direction_counts = df_analysis['direction'].map({1: 'Up', -1: 'Down', 0: 'Flat'}).value_counts()
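# seaborn >= 0.13 expects hue to be set whenever palette is used
sns.barplot(x=direction_counts.index, y=direction_counts.values,
            hue=direction_counts.index, palette='viridis', legend=False)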
plt.title('Distribution of Price Direction (Up/Down Candles)')
plt.xlabel('Direction')
plt.ylabel('Count')
plt.show()
