
LSTM Price Prediction

Build and train an LSTM neural network on sequences of closing prices to predict the next bar's close and derive a price-direction signal, with min-max feature scaling and a chronological train/test split.

Tags: LSTM, deep learning, prediction
[3]

Signals — LSTM for Price Prediction


1. Dependency Installation

[4]
!pip install pandas numpy plotly scikit-learn tensorflow

2. Library Imports

[5]
import warnings; warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

3. Strategy Overview

Long Short-Term Memory (LSTM) networks are a class of recurrent neural network (RNN) designed to capture temporal dependencies in sequential data. Their gated cell state lets them carry information across many timesteps, which makes them a common choice for price time series, although how much they can learn depends on how much predictable structure the series actually contains.

Architecture:

  • Input layer: sliding window of seq_length bars × feature count.
  • LSTM layers with dropout regularisation to prevent overfitting.
  • Dense output layer: single neuron predicting the next bar's normalised close price.
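The size of this architecture can be checked by hand. The sketch below assumes the layer sizes used in the model section of this notebook (64 and 32 LSTM units, one input feature, one output neuron) and the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel, and a bias. The helper names `lstm_param_count` and `dense_param_count` are hypothetical and exist only for this illustration.

```python
def lstm_param_count(units: int, input_dim: int) -> int:
    """Parameters of one LSTM layer: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(units: int, input_dim: int) -> int:
    """Parameters of a fully connected layer: weights plus one bias per unit."""
    return units * input_dim + units


layer1 = lstm_param_count(units=64, input_dim=1)    # first LSTM sees 1 feature per bar
layer2 = lstm_param_count(units=32, input_dim=64)   # second LSTM sees the 64 outputs of the first
head = dense_param_count(units=1, input_dim=32)     # single-neuron output layer

print(layer1, layer2, head, layer1 + layer2 + head)  # 16896 12416 33 29345
```

These figures match the parameter counts in the model summary printed in section 5. Dropout layers add no parameters.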

Training procedure:

  1. Normalise close prices to [0, 1] using MinMaxScaler.
  2. Construct overlapping sequences of length seq_length.
  3. Train with early stopping on validation loss.
  4. Inverse-transform predictions to original price scale.
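Steps 1, 2, and 4 can be illustrated on a toy series. This is a minimal NumPy-only sketch: the min-max arithmetic is written out by hand instead of calling MinMaxScaler, and the names `make_windows`, `lo`, and `hi` are hypothetical helpers introduced for the example.

```python
import numpy as np


def make_windows(series: np.ndarray, seq_length: int):
    """Return (X, y): X[k] holds seq_length consecutive values, y[k] is the value that follows."""
    X = np.array([series[i - seq_length:i] for i in range(seq_length, len(series))])
    y = series[seq_length:]
    return X[..., np.newaxis], y  # X has shape (samples, timesteps, 1)


close = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])

# Step 1: min-max scaling to [0, 1]
lo, hi = close.min(), close.max()
scaled = (close - lo) / (hi - lo)

# Step 2: overlapping windows of length 3
X, y = make_windows(scaled, seq_length=3)
print(X.shape, y.shape)  # (3, 3, 1) (3,)

# Step 4: map scaled values back to the original price scale
restored = y * (hi - lo) + lo
print(restored)
```

Six bars with a window of three give three training samples, because the first three bars have no complete lookback window.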

Signal derivation: Predicted close > current close → Buy (+1); < current close → Sell (−1).
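The rule above translates directly into a sign comparison. The sketch below is an illustration, not part of the notebook's pipeline: the function name `derive_signals` is hypothetical, and mapping a tie (predicted close equal to current close) to 0 is an assumption, since the rule does not specify that case.

```python
import numpy as np


def derive_signals(predicted_next: np.ndarray, current_close: np.ndarray) -> np.ndarray:
    """+1 where the predicted next close is above the current close, -1 where below, 0 on a tie."""
    return np.sign(predicted_next - current_close).astype(int)


current_close = np.array([100.0, 101.0, 99.0, 99.0])
predicted_next = np.array([100.5, 100.2, 99.0, 101.0])

signals = derive_signals(predicted_next, current_close)
print(signals)  # [ 1 -1  0  1]
```

When applying this to the arrays returned by lstm_model, the current close for preds[k] is the last bar of its input window, which is actuals[k-1] for k ≥ 1.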

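Limitation: LSTMs are slow to train on CPU; large datasets and hyperparameter searches are usually practical only on a GPU. The synthetic data used here is a random walk whose bar-to-bar changes are independent, so there is no directional pattern to learn: the model can at best approximate a last-value (persistence) forecast, and its predicted direction should be close to a coin flip.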
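The random-walk point can be checked numerically. The sketch below builds a simple additive random walk with unit-variance steps (an assumption made for clarity; it is not the notebook's generator), measures the lag-1 autocorrelation of its changes, and computes the error of a persistence forecast that predicts the next price as the current one.

```python
import numpy as np

rng = np.random.default_rng(0)              # fixed seed so the sketch is repeatable
steps = rng.normal(0.0, 1.0, size=5000)     # independent price changes
prices = 100.0 + np.cumsum(steps)

# Lag-1 autocorrelation of the changes: close to zero for a random walk
changes = np.diff(prices)
lag1_autocorr = np.corrcoef(changes[:-1], changes[1:])[0, 1]

# Persistence forecast: next price = current price
naive_pred = prices[:-1]
naive_rmse = np.sqrt(np.mean((prices[1:] - naive_pred) ** 2))

print(f"lag-1 autocorrelation of changes: {lag1_autocorr:+.3f}")
print(f"persistence RMSE: {naive_rmse:.3f}  (step std: {steps.std():.3f})")
```

The persistence error is essentially the standard deviation of one step, which is the floor any model faces on such data.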

4. Data Generation

[6]
def generate_data(periods: int) -> pd.DataFrame:
    """
    Generate synthetic OHLCV price data using a geometric random walk.

    Parameters
    ----------
    periods : int
        Number of 1-minute bars to generate.

    Returns
    -------
    pd.DataFrame
        DataFrame with columns: open, high, low, close, volume, datetime.
    """
    start_date     = pd.to_datetime("2024-01-01 00:00:00+00:00")
    datetime_index = pd.date_range(start_date, periods=periods, freq="1min", tz="UTC")
    price_data = []
    last_close = 42000
    for i in range(periods):
        open_price  = last_close + np.random.normal(0, last_close * 0.0005)
        close_price = open_price + np.random.normal(0, last_close * 0.005)
        body_high   = max(open_price, close_price)
        body_low    = min(open_price, close_price)
        # Wicks extend a non-negative distance beyond the candle body, so high >= low always holds.
        high_price  = body_high + abs(np.random.normal(0, last_close * 0.002))
        low_price   = body_low  - abs(np.random.normal(0, last_close * 0.002))
        price_data.append({
            "open":  max(1, int(open_price)),
            "high":  max(1, int(high_price)),
            "low":   max(1, int(low_price)),
            "close": max(1, int(close_price)),
        })
        last_close = close_price
    df = pd.DataFrame(price_data, index=datetime_index)
    df.index.name = "datetime"
    df["volume"]   = np.random.uniform(100.0, 500.0, periods)
    df["datetime"] = df.index.to_series()
    return df.reset_index(drop=True)

df = generate_data(500)
display(df.head())
    open   high    low  close      volume                   datetime
0  42016  42052  41727  41781  107.627914  2024-01-01 00:00:00+00:00
1  41767  41866  41675  41691  410.389430  2024-01-01 00:01:00+00:00
2  41701  41706  41269  41349  428.486384  2024-01-01 00:02:00+00:00
3  41340  41350  41182  41185  444.410807  2024-01-01 00:03:00+00:00
4  41190  41467  41075  41440  262.143437  2024-01-01 00:04:00+00:00

5. LSTM Model

[7]
def lstm_model(
    df: pd.DataFrame,
    seq_length: int = 30,
    epochs: int = 50,
    batch_size: int = 32,
    test_size: float = 0.2,
) -> tuple:
    """
    Build, train, and evaluate an LSTM model for next-bar close price prediction.

    Core logic
    ----------
    1. Extract and normalise the close price series with MinMaxScaler.
    2. Construct (X, y) pairs: X = sliding window of seq_length bars,
       y = the close price of the bar immediately following the window.
    3. Split chronologically into train/test sets.
    4. Define a two-layer LSTM with dropout, compiled with Adam and MSE loss.
    5. Train with early stopping (monitor val_loss, patience=10).
    6. Inverse-transform predictions and compute error metrics.

    Parameters
    ----------
    df : pd.DataFrame   OHLCV DataFrame with 'close' column.
    seq_length : int    Number of historical bars per input sequence.
    epochs : int        Maximum training epochs.
    batch_size : int    Mini-batch size.
    test_size : float   Fraction of data reserved for testing.

    Returns
    -------
    tuple
        (model, predictions, y_test_inv, scaler, history)
    """
    df = df.copy().sort_values("datetime", ignore_index=True)
    close = df["close"].values.reshape(-1, 1)

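    # ── Normalisation ─────────────────────────────────────────────────────────
    # Fit the scaler on the training portion only so that test-set prices do not
    # leak into the scaling; test values may therefore fall slightly outside [0, 1].
    n_train_rows = seq_length + int((len(close) - seq_length) * (1 - test_size))
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaler.fit(close[:n_train_rows])
    scaled = scaler.transform(close)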

    # ── Sequence construction ─────────────────────────────────────────────────
    X, y = [], []
    for i in range(seq_length, len(scaled)):
        X.append(scaled[i - seq_length: i, 0])   # lookback window
        y.append(scaled[i, 0])                    # next bar target
    X, y = np.array(X), np.array(y)
    X = X.reshape(X.shape[0], X.shape[1], 1)      # (samples, timesteps, features)

    # ── Train/test split ──────────────────────────────────────────────────────
    split   = int(len(X) * (1 - test_size))
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    # ── Model architecture ────────────────────────────────────────────────────
    model = Sequential([
        tf.keras.Input(shape=(seq_length, 1)),    # Keras 3 prefers an explicit Input over input_shape
        LSTM(64, return_sequences=True),
        Dropout(0.2),
        LSTM(32, return_sequences=False),
        Dropout(0.2),
        Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()

    # ── Training ──────────────────────────────────────────────────────────────
    es = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
    history = model.fit(
        X_train, y_train,
        validation_split=0.1,
        epochs=epochs,
        batch_size=batch_size,
        callbacks=[es],
        verbose=1,
    )

    # ── Inference and inverse transform ──────────────────────────────────────
    preds    = model.predict(X_test)
    preds_inv = scaler.inverse_transform(preds)
    y_inv     = scaler.inverse_transform(y_test.reshape(-1, 1))

    rmse = np.sqrt(mean_squared_error(y_inv, preds_inv))
    mae  = mean_absolute_error(y_inv, preds_inv)
    print(f"\nTest RMSE: {rmse:.2f}  |  MAE: {mae:.2f}")

    return model, preds_inv, y_inv, scaler, history

model, preds, actuals, scaler, history = lstm_model(df, seq_length=30, epochs=30)
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm (LSTM)                     │ (None, 30, 64)         │        16,896 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 30, 64)         │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_1 (LSTM)                   │ (None, 32)             │        12,416 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 1)              │            33 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 29,345 (114.63 KB)
 Trainable params: 29,345 (114.63 KB)
 Non-trainable params: 0 (0.00 B)
Epoch 1/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 9s 235ms/step - loss: 0.0330 - val_loss: 0.0717
Epoch 2/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 2s 140ms/step - loss: 0.0103 - val_loss: 0.1011
Epoch 3/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 102ms/step - loss: 0.0071 - val_loss: 0.0639
Epoch 4/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 84ms/step - loss: 0.0066 - val_loss: 0.0774
Epoch 5/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 104ms/step - loss: 0.0066 - val_loss: 0.0611
Epoch 6/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 33ms/step - loss: 0.0067 - val_loss: 0.0528
Epoch 7/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step - loss: 0.0058 - val_loss: 0.0492
Epoch 8/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step - loss: 0.0061 - val_loss: 0.0388
Epoch 9/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 0.0062 - val_loss: 0.0537
Epoch 10/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step - loss: 0.0057 - val_loss: 0.0346
Epoch 11/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 0.0052 - val_loss: 0.0291
Epoch 12/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step - loss: 0.0059 - val_loss: 0.0341
Epoch 13/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 33ms/step - loss: 0.0049 - val_loss: 0.0363
Epoch 14/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step - loss: 0.0052 - val_loss: 0.0226
Epoch 15/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step - loss: 0.0053 - val_loss: 0.0214
Epoch 16/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 0.0056 - val_loss: 0.0194
Epoch 17/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step - loss: 0.0052 - val_loss: 0.0433
Epoch 18/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 0.0056 - val_loss: 0.0366
Epoch 19/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 0.0051 - val_loss: 0.0167
Epoch 20/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 0.0042 - val_loss: 0.0198
Epoch 21/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step - loss: 0.0045 - val_loss: 0.0314
Epoch 22/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step - loss: 0.0045 - val_loss: 0.0203
Epoch 23/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 55ms/step - loss: 0.0043 - val_loss: 0.0129
Epoch 24/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 57ms/step - loss: 0.0041 - val_loss: 0.0166
Epoch 25/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 57ms/step - loss: 0.0045 - val_loss: 0.0175
Epoch 26/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step - loss: 0.0044 - val_loss: 0.0164
Epoch 27/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step - loss: 0.0044 - val_loss: 0.0157
Epoch 28/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 0.0040 - val_loss: 0.0087
Epoch 29/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step - loss: 0.0041 - val_loss: 0.0121
Epoch 30/30
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 0.0037 - val_loss: 0.0116
3/3 ━━━━━━━━━━━━━━━━━━━━ 1s 152ms/step

Test RMSE: 453.48  |  MAE: 365.23
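An RMSE in price units is hard to judge on its own. One way to put it in context is to compare it with a persistence forecast and to measure how often the predicted direction is right. The sketch below is a hypothetical helper (`compare_with_persistence` is not part of the notebook) that accepts arrays shaped like the `preds` and `actuals` returned by lstm_model; it is demonstrated on toy numbers, not on the run above.

```python
import numpy as np


def compare_with_persistence(preds: np.ndarray, actuals: np.ndarray) -> dict:
    """Compare model predictions with a last-value forecast on the same targets."""
    preds = np.asarray(preds, dtype=float).ravel()
    actuals = np.asarray(actuals, dtype=float).ravel()

    prev = actuals[:-1]     # close of the bar before each target
    target = actuals[1:]    # bars for which a previous close is available
    model = preds[1:]       # model predictions for those same bars

    model_rmse = float(np.sqrt(np.mean((target - model) ** 2)))
    naive_rmse = float(np.sqrt(np.mean((target - prev) ** 2)))

    # Directional accuracy, ignoring bars where the price did not move
    moved = target != prev
    hits = np.sign(model - prev)[moved] == np.sign(target - prev)[moved]
    accuracy = float(hits.mean()) if moved.any() else float("nan")

    return {"model_rmse": model_rmse, "naive_rmse": naive_rmse,
            "directional_accuracy": accuracy}


# Toy example with made-up numbers
toy_actuals = np.array([100.0, 102.0, 101.0, 104.0])
toy_preds = np.array([99.0, 101.0, 103.0, 103.0])
result = compare_with_persistence(toy_preds, toy_actuals)
print(result)
```

On random-walk data a model RMSE close to the persistence RMSE and a directional accuracy near 0.5 would be the expected outcome.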

6. Visualization — Training Loss and Predictions

[8]
fig = make_subplots(rows=1, cols=2,
    subplot_titles=["Training / Validation Loss", "Predicted vs Actual Close"])

fig.add_trace(go.Scatter(y=history.history["loss"],     name="Train Loss",
    line=dict(color="blue")),  row=1, col=1)
fig.add_trace(go.Scatter(y=history.history["val_loss"], name="Val Loss",
    line=dict(color="orange")), row=1, col=1)

fig.add_trace(go.Scatter(y=actuals[:, 0], name="Actual",    line=dict(color="blue")),  row=1, col=2)
fig.add_trace(go.Scatter(y=preds[:, 0],   name="Predicted", line=dict(color="red",  dash="dash")), row=1, col=2)

fig.update_layout(title_text="LSTM Price Prediction", height=500)
fig.show()
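The evaluation above relies on a single chronological split. A walk-forward scheme repeats that idea over several consecutive test blocks, each preceded by all earlier data. The following is a minimal sketch of the index bookkeeping only, under stated assumptions: `walk_forward_splits`, `n_folds`, and `min_train` are hypothetical names, and 470 is the number of samples produced by 500 bars with seq_length=30.

```python
import numpy as np


def walk_forward_splits(n_samples: int, n_folds: int, min_train: int):
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        # Training data always ends where the test block begins, so no future bars leak in
        yield np.arange(0, train_end), np.arange(train_end, test_end)


splits = list(walk_forward_splits(n_samples=470, n_folds=4, min_train=270))
for train_idx, test_idx in splits:
    print(f"train [0, {train_idx[-1]}]  test [{test_idx[0]}, {test_idx[-1]}]")
# train [0, 269]  test [270, 319]
# train [0, 319]  test [320, 369]
# train [0, 369]  test [370, 419]
# train [0, 419]  test [420, 469]
```

In each fold, the scaler and the model would be refit on the training indices and scored on the test indices; averaging the fold scores gives a less split-dependent estimate than one hold-out block.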