LSTM Price Prediction
Build and train an LSTM neural network on close-price sequences from OHLCV data to forecast the next bar's close, from which a price-direction signal is derived. Includes min-max feature scaling and a chronological train/test split.
1. Dependency Installation
```
!pip install pandas numpy plotly scikit-learn tensorflow
```
2. Library Imports
```python
import warnings
warnings.filterwarnings("ignore")  # hide library deprecation/info warnings in notebook output

import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
```

3. Strategy Overview
Long Short-Term Memory (LSTM) networks are a class of recurrent neural network (RNN) designed to capture temporal dependencies in sequential data. Their gated cell state can carry information across many timesteps, which makes them a common choice for price time series; in this notebook each prediction looks back over a window of `seq_length` bars (30 here).
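To make the "cell state" idea concrete, here is a minimal NumPy sketch of a single LSTM timestep. It is an illustration under stated assumptions, not TensorFlow's implementation: the helper names `sigmoid` and `lstm_cell_step`, the stacked weight layout, and the gate order (input, forget, candidate, output) are choices made for this example.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM timestep (hypothetical helper for illustration).

    x_t: (input_dim,)   h_prev, c_prev: (units,)
    W: (input_dim, 4 * units)   U: (units, 4 * units)   b: (4 * units,)
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    """
    z = x_t @ W + h_prev @ U + b        # pre-activations for all four gates at once
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g            # keep part of the old state, add gated new content
    h_t = o * np.tanh(c_t)              # gated view of the cell state passed onward
    return h_t, c_t


# Usage: run one 30-bar window of a single feature through a 4-unit cell.
rng = np.random.default_rng(0)
units, input_dim = 4, 1
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(30, input_dim)):
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
```

Because `c_t` is updated additively and scaled by the forget gate, a forget gate near 1 combined with an input gate near 0 carries the cell state forward almost unchanged, which is what lets information survive across many timesteps.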
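Architecture:
- Input layer: a sliding window of `seq_length` bars × feature count; this notebook uses a single feature (the close price), so each sample has shape `(seq_length, 1)`.
- Two stacked LSTM layers (64 and 32 units), each followed by dropout (rate 0.2) as regularisation against overfitting.
- Dense output layer: a single neuron predicting the next bar's normalised close price.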
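The parameter counts Keras reports for this architecture (see the model summary in section 5) follow from the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel, and a bias. A small sketch of that arithmetic, with hypothetical helper names, also shows how the first layer would grow if more features were fed in:

```python
def lstm_param_count(units: int, input_dim: int) -> int:
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * input_dim + units * units + units)


def dense_param_count(units: int, input_dim: int) -> int:
    # weight matrix + bias
    return units * input_dim + units


layers = {
    "lstm": lstm_param_count(64, 1),       # input: 1 feature (close) per timestep
    "lstm_1": lstm_param_count(32, 64),    # input: 64-dim outputs of the first LSTM
    "dense": dense_param_count(1, 32),     # input: final 32-dim hidden state
}
total = sum(layers.values())
for name, n in layers.items():
    print(f"{name:<8}{n:>8,}")
print(f"{'total':<8}{total:>8,}")

# Hypothetical variant: all five OHLCV columns used as input features.
print("lstm with 5 features:", lstm_param_count(64, 5))
```

Dropout layers add no parameters, and `seq_length` does not appear in the formula: the same weights are reused at every timestep.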
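Training procedure:
- Normalise close prices to [0, 1] with MinMaxScaler, fitting the scaler on the training period only so that test-period prices do not leak into the scaling.
- Construct overlapping sequences of length `seq_length`, each paired with the close of the bar that follows it.
- Split the sequences chronologically into training and test sets, and train with early stopping on validation loss.
- Inverse-transform predictions back to the original price scale.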
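The sequence construction and train-only scaling can be sketched with NumPy's `sliding_window_view`. This is a vectorised alternative to the Python loop in section 5, written under the assumption that min-max scaling is done by hand (mirroring `MinMaxScaler(feature_range=(0, 1))` for a single column); `make_sequences` is a hypothetical helper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_sequences(close, seq_length: int, test_size: float = 0.2):
    """Return (X_train, y_train), (X_test, y_test), (lo, scale) with train-only min-max scaling."""
    close = np.asarray(close, dtype=float)
    n_samples = len(close) - seq_length          # one sample per bar after the first window
    split = int(n_samples * (1 - test_size))     # index of the first test sample
    # Training targets end at bar split + seq_length - 1, so fit the scaling on bars up to there.
    train_bars = close[: split + seq_length]
    lo = train_bars.min()
    scale = train_bars.max() - lo or 1.0         # guard against a flat series
    scaled = (close - lo) / scale
    # Window j is scaled[j : j + seq_length]; the last window has no next bar, so drop it.
    X = sliding_window_view(scaled, seq_length)[:-1, :, np.newaxis]   # (samples, timesteps, 1)
    y = scaled[seq_length:]                      # y[j] is the bar right after window j
    return (X[:split], y[:split]), (X[split:], y[split:]), (lo, scale)


# Usage on a toy trending series: test targets fall outside [0, 1] because the
# scaling never saw test-period prices, which is the intended behaviour.
close = np.arange(100.0, 200.0)
(X_train, y_train), (X_test, y_test), (lo, scale) = make_sequences(close, seq_length=10)
prices_back = y_test * scale + lo                # inverse transform to the price scale
```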
Signal derivation: Predicted close > current close → Buy (+1); < current close → Sell (−1).
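A minimal sketch of this signal rule, assuming NumPy arrays of predicted next closes and current closes. The optional `threshold` (a hypothetical addition, default 0) and the choice to emit 0 when the forecast equals the current close are assumptions; the rule above leaves ties unspecified.

```python
import numpy as np


def derive_signals(predicted_next_close, current_close, threshold: float = 0.0) -> np.ndarray:
    """+1 = buy, -1 = sell, 0 = no signal (ties, or moves within the threshold)."""
    diff = np.asarray(predicted_next_close, dtype=float) - np.asarray(current_close, dtype=float)
    return np.where(diff > threshold, 1, np.where(diff < -threshold, -1, 0))


print(derive_signals([101.0, 99.0, 100.0], [100.0, 100.0, 100.0]))
```

Here `current_close` must be the last close inside each input window (the bar before the one being forecast); comparing against the target bar's own close would leak the answer.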
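Limitation: training and hyperparameter tuning are slow on CPU, so large datasets realistically call for a GPU. More fundamentally, the synthetic data is a random walk: bar-to-bar returns have essentially no autocorrelation, so the best achievable forecast is approximately the last observed close. Expect the model to do no better than this naive persistence forecast, and expect directional accuracy near 50%.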
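The random-walk point can be checked directly. The sketch below uses a hypothetical seeded geometric random walk rather than the notebook's data; it shows that returns have near-zero lag-1 autocorrelation while price levels are almost perfectly autocorrelated, and computes the RMSE of the naive "next close = current close" forecast that a model should beat before claiming skill. For the generator in section 4, the bar-to-bar close change has a standard deviation of roughly 0.5% of price (about 210 at 42,000).

```python
import numpy as np


def lag1_autocorr(x) -> float:
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def persistence_rmse(close) -> float:
    """RMSE of forecasting each close with the previous close."""
    close = np.asarray(close, dtype=float)
    return float(np.sqrt(np.mean((close[1:] - close[:-1]) ** 2)))


rng = np.random.default_rng(42)
log_returns = rng.normal(0.0, 0.005, size=5000)      # 0.5% per bar, independent draws
close = 42000 * np.exp(np.cumsum(log_returns))       # geometric random walk

print(f"lag-1 autocorr of returns: {lag1_autocorr(np.diff(np.log(close))):+.3f}")
print(f"lag-1 autocorr of prices:  {lag1_autocorr(close):+.3f}")
print(f"persistence RMSE:          {persistence_rmse(close):.2f}")
```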
4. Data Generation
```python
def generate_data(periods: int) -> pd.DataFrame:
    """
    Generate synthetic OHLCV price data using a geometric random walk.

    Parameters
    ----------
    periods : int
        Number of 1-minute bars to generate.

    Returns
    -------
    pd.DataFrame
        DataFrame with columns: open, high, low, close, volume, datetime.
    """
    start_date = pd.to_datetime("2024-01-01 00:00:00+00:00")
    datetime_index = pd.date_range(start_date, periods=periods, freq="1min", tz="UTC")
    price_data = []
    last_close = 42000
    for i in range(periods):
        # Small gap at the open, then the main move; both scale with the last close
        open_price = last_close + np.random.normal(0, last_close * 0.0005)
        close_price = open_price + np.random.normal(0, last_close * 0.005)
        # Wicks extend beyond the candle body by a half-normal amount
        body_high = max(open_price, close_price)
        body_low = min(open_price, close_price)
        high_price = max(body_high + abs(np.random.normal(0, last_close * 0.002)), open_price, close_price)
        low_price = min(body_low - abs(np.random.normal(0, last_close * 0.002)), open_price, close_price)
        if high_price < low_price:  # defensive guard; cannot trigger given the wicks above
            high_price, low_price = low_price, high_price
        price_data.append({
            "open": max(1, int(open_price)),
            "high": max(1, int(high_price)),
            "low": max(1, int(low_price)),
            "close": max(1, int(close_price)),
        })
        last_close = close_price
    df = pd.DataFrame(price_data, index=datetime_index)
    df.index.name = "datetime"
    df["volume"] = np.random.uniform(100.0, 500.0, periods)
    df["datetime"] = df.index.to_series()
    return df.reset_index(drop=True)


df = generate_data(500)
display(df.head())
```

| | open | high | low | close | volume | datetime |
|---|---|---|---|---|---|---|
| 0 | 42016 | 42052 | 41727 | 41781 | 107.627914 | 2024-01-01 00:00:00+00:00 |
| 1 | 41767 | 41866 | 41675 | 41691 | 410.389430 | 2024-01-01 00:01:00+00:00 |
| 2 | 41701 | 41706 | 41269 | 41349 | 428.486384 | 2024-01-01 00:02:00+00:00 |
| 3 | 41340 | 41350 | 41182 | 41185 | 444.410807 | 2024-01-01 00:03:00+00:00 |
| 4 | 41190 | 41467 | 41075 | 41440 | 262.143437 | 2024-01-01 00:04:00+00:00 |
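Before training, it is worth confirming that every bar is internally consistent. A minimal sanity check, using a hypothetical helper `check_ohlcv`, counts rows that violate basic OHLCV invariants; applied to the five rows shown above, every count should be zero.

```python
import pandas as pd


def check_ohlcv(df: pd.DataFrame) -> dict:
    """Count rows that break basic OHLCV invariants (hypothetical helper)."""
    body_max = df[["open", "close"]].max(axis=1)
    body_min = df[["open", "close"]].min(axis=1)
    prices = df[["open", "high", "low", "close"]]
    return {
        "high_below_body": int((df["high"] < body_max).sum()),
        "low_above_body": int((df["low"] > body_min).sum()),
        "non_positive_price": int((prices <= 0).any(axis=1).sum()),
        "negative_volume": int((df["volume"] < 0).sum()),
    }


# The first five generated bars, copied from the table above.
sample = pd.DataFrame({
    "open":   [42016, 41767, 41701, 41340, 41190],
    "high":   [42052, 41866, 41706, 41350, 41467],
    "low":    [41727, 41675, 41269, 41182, 41075],
    "close":  [41781, 41691, 41349, 41185, 41440],
    "volume": [107.627914, 410.389430, 428.486384, 444.410807, 262.143437],
})
print(check_ohlcv(sample))
```

In the notebook itself, `check_ohlcv(df)` can be run on the full 500-bar frame.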
5. LSTM Model
```python
def lstm_model(
    df: pd.DataFrame,
    seq_length: int = 30,
    epochs: int = 50,
    batch_size: int = 32,
    test_size: float = 0.2,
) -> tuple:
    """
    Build, train, and evaluate an LSTM model for next-bar close price prediction.

    Core logic
    ----------
    1. Choose a chronological split point and fit MinMaxScaler on the training
       bars only (no look-ahead into the test period), then scale the full series.
    2. Construct (X, y) pairs: X = sliding window of seq_length bars,
       y = the close price of the bar immediately following the window.
    3. Split the sequences into train/test sets at that point.
    4. Define a two-layer LSTM with dropout, compiled with Adam and MSE loss.
    5. Train with early stopping (monitor val_loss, patience=10); Keras holds
       out the last 10% of the training sequences as validation data.
    6. Inverse-transform predictions and compute error metrics.

    Parameters
    ----------
    df : pd.DataFrame
        OHLCV DataFrame with 'close' and 'datetime' columns.
    seq_length : int
        Number of historical bars per input sequence.
    epochs : int
        Maximum training epochs.
    batch_size : int
        Mini-batch size.
    test_size : float
        Fraction of sequences reserved for testing.

    Returns
    -------
    tuple
        (model, predictions, y_test_inv, scaler, history)
    """
    df = df.copy().sort_values("datetime", ignore_index=True)
    close = df["close"].values.reshape(-1, 1).astype(float)

    # ── Chronological split point (in sequence units) ────────────────────────
    n_samples = len(close) - seq_length
    split = int(n_samples * (1 - test_size))

    # ── Normalisation (fit on training bars only to avoid look-ahead) ────────
    # Training targets end at bar split + seq_length - 1.
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaler.fit(close[: split + seq_length])
    scaled = scaler.transform(close)

    # ── Sequence construction ─────────────────────────────────────────────────
    X, y = [], []
    for i in range(seq_length, len(scaled)):
        X.append(scaled[i - seq_length: i, 0])  # lookback window
        y.append(scaled[i, 0])                  # next bar target
    X, y = np.array(X), np.array(y)
    X = X.reshape(X.shape[0], X.shape[1], 1)    # (samples, timesteps, features)

    # ── Train/test split ──────────────────────────────────────────────────────
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    # ── Model architecture ────────────────────────────────────────────────────
    model = Sequential([
        tf.keras.Input(shape=(seq_length, 1)),  # preferred over input_shape= in Keras 3
        LSTM(64, return_sequences=True),
        Dropout(0.2),
        LSTM(32, return_sequences=False),
        Dropout(0.2),
        Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()

    # ── Training ──────────────────────────────────────────────────────────────
    es = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
    history = model.fit(
        X_train, y_train,
        validation_split=0.1,   # last 10% of the training sequences, taken before shuffling
        epochs=epochs,
        batch_size=batch_size,
        callbacks=[es],
        verbose=1,
    )

    # ── Inference and inverse transform ──────────────────────────────────────
    preds = model.predict(X_test)
    preds_inv = scaler.inverse_transform(preds)
    y_inv = scaler.inverse_transform(y_test.reshape(-1, 1))
    rmse = np.sqrt(mean_squared_error(y_inv, preds_inv))
    mae = mean_absolute_error(y_inv, preds_inv)
    print(f"\nTest RMSE: {rmse:.2f} | MAE: {mae:.2f}")
    return model, preds_inv, y_inv, scaler, history
```
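`lstm_model` evaluates on a single chronological hold-out. A walk-forward (expanding-window) scheme instead repeats the train/test cycle over successive blocks, with each fold training only on data that precedes its test block. The index generator below is a sketch under that assumption; `walk_forward_splits` is hypothetical, and scikit-learn's `TimeSeriesSplit` provides a similar expanding-window splitter. Inside each fold, both the scaler and the model would be refit on the training indices only.

```python
def walk_forward_splits(n_samples: int, n_folds: int, test_len: int):
    """Yield (train_range, test_range) pairs for expanding-window walk-forward validation."""
    first_test = n_samples - n_folds * test_len
    if first_test <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        start = first_test + k * test_len
        # Train on everything before the test block; test on the next test_len samples.
        yield range(0, start), range(start, start + test_len)


# 500 bars with seq_length=30 give 470 sequences; use four test blocks of 47.
for train_idx, test_idx in walk_forward_splits(470, n_folds=4, test_len=47):
    print(f"train [0, {train_idx.stop})  test [{test_idx.start}, {test_idx.stop})")
```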
```python
model, preds, actuals, scaler, history = lstm_model(df, seq_length=30, epochs=30)
```

Model: "sequential"

| Layer (type) | Output Shape | Param # |
|---|---|---|
| lstm (LSTM) | (None, 30, 64) | 16,896 |
| dropout (Dropout) | (None, 30, 64) | 0 |
| lstm_1 (LSTM) | (None, 32) | 12,416 |
| dropout_1 (Dropout) | (None, 32) | 0 |
| dense (Dense) | (None, 1) | 33 |

Total params: 29,345 (114.63 KB)
Trainable params: 29,345 (114.63 KB)
Non-trainable params: 0 (0.00 B)

Training log (11 steps per epoch):

| Epoch | loss | val_loss |
|---|---|---|
| 1/30 | 0.0330 | 0.0717 |
| 2/30 | 0.0103 | 0.1011 |
| 3/30 | 0.0071 | 0.0639 |
| 4/30 | 0.0066 | 0.0774 |
| 5/30 | 0.0066 | 0.0611 |
| 6/30 | 0.0067 | 0.0528 |
| 7/30 | 0.0058 | 0.0492 |
| 8/30 | 0.0061 | 0.0388 |
| 9/30 | 0.0062 | 0.0537 |
| 10/30 | 0.0057 | 0.0346 |
| 11/30 | 0.0052 | 0.0291 |
| 12/30 | 0.0059 | 0.0341 |
| 13/30 | 0.0049 | 0.0363 |
| 14/30 | 0.0052 | 0.0226 |
| 15/30 | 0.0053 | 0.0214 |
| 16/30 | 0.0056 | 0.0194 |
| 17/30 | 0.0052 | 0.0433 |
| 18/30 | 0.0056 | 0.0366 |
| 19/30 | 0.0051 | 0.0167 |
| 20/30 | 0.0042 | 0.0198 |
| 21/30 | 0.0045 | 0.0314 |
| 22/30 | 0.0045 | 0.0203 |
| 23/30 | 0.0043 | 0.0129 |
| 24/30 | 0.0041 | 0.0166 |
| 25/30 | 0.0045 | 0.0175 |
| 26/30 | 0.0044 | 0.0164 |
| 27/30 | 0.0044 | 0.0157 |
| 28/30 | 0.0040 | 0.0087 |
| 29/30 | 0.0041 | 0.0121 |
| 30/30 | 0.0037 | 0.0116 |

Test RMSE: 453.48 | MAE: 365.23
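The patience rule behind `EarlyStopping(monitor="val_loss", patience=10)` can be replayed on the logged validation losses. The sketch below is a simplified re-implementation (the helper `early_stopping_epoch` is hypothetical, and Keras's callback has further options such as `baseline`) that reports the best epoch and the epoch at which training would stop. With `patience=10` it finds the best val_loss at epoch 28 and no stop, consistent with the log running all 30 epochs; a tighter `patience=3` would have halted at epoch 22 with the best weights from epoch 19.

```python
def early_stopping_epoch(val_losses, patience: int = 10, min_delta: float = 0.0):
    """Return (best_epoch, stop_epoch), 1-based; stop_epoch is None if training never stops."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:        # improvement: reset the patience counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:           # patience exhausted: stop after this epoch
                return best_epoch, epoch
    return best_epoch, None


# val_loss values copied from the training log above.
val_losses = [
    0.0717, 0.1011, 0.0639, 0.0774, 0.0611, 0.0528, 0.0492, 0.0388, 0.0537, 0.0346,
    0.0291, 0.0341, 0.0363, 0.0226, 0.0214, 0.0194, 0.0433, 0.0366, 0.0167, 0.0198,
    0.0314, 0.0203, 0.0129, 0.0166, 0.0175, 0.0164, 0.0157, 0.0087, 0.0121, 0.0116,
]
print(early_stopping_epoch(val_losses, patience=10))
print(early_stopping_epoch(val_losses, patience=3))
```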
6. Visualization — Training Loss and Predictions
```python
# Left panel: loss per epoch; right panel: test-set predictions vs actual closes
fig = make_subplots(rows=1, cols=2,
                    subplot_titles=["Training / Validation Loss", "Predicted vs Actual Close"])
fig.add_trace(go.Scatter(y=history.history["loss"], name="Train Loss",
                         line=dict(color="blue")), row=1, col=1)
fig.add_trace(go.Scatter(y=history.history["val_loss"], name="Val Loss",
                         line=dict(color="orange")), row=1, col=1)
fig.add_trace(go.Scatter(y=actuals[:, 0], name="Actual", line=dict(color="blue")), row=1, col=2)
fig.add_trace(go.Scatter(y=preds[:, 0], name="Predicted", line=dict(color="red", dash="dash")), row=1, col=2)
fig.update_xaxes(title_text="Epoch (0-based)", row=1, col=1)
fig.update_xaxes(title_text="Test sample", row=1, col=2)
fig.update_layout(title_text="LSTM Price Prediction", height=500)
fig.show()
```
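Since the stated goal is price direction, the price-level plot can flatter the model: a forecast that lags the actual series by one bar can look close on a chart while carrying no directional information. A directional hit rate is a more telling check. The sketch assumes `preds` and `actuals` are the aligned test arrays returned by `lstm_model` and that consecutive test samples are consecutive bars, so the close preceding target j is `actuals[j - 1]`; `directional_hit_rate` is a hypothetical helper. On random-walk data, expect a value near 0.5.

```python
import numpy as np


def directional_hit_rate(preds, actuals) -> float:
    """Share of test bars where the forecast moved in the same direction as the price."""
    preds = np.asarray(preds, dtype=float).ravel()
    actuals = np.asarray(actuals, dtype=float).ravel()
    prev = actuals[:-1]                      # close of the bar before targets 1..n-1
    pred_dir = np.sign(preds[1:] - prev)
    true_dir = np.sign(actuals[1:] - prev)
    moved = true_dir != 0                    # skip bars where the price did not change
    return float(np.mean(pred_dir[moved] == true_dir[moved]))


# Toy example; in the notebook this would be directional_hit_rate(preds, actuals).
print(directional_hit_rate([100, 102, 99, 103, 103], [100, 101, 100, 102, 101]))
```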