Predicting the Price of Bitcoin, Intro to LSTM

Use artificial intelligence to predict the value of Bitcoin


Today we are going to discuss how to predict the price of Bitcoin by analyzing its pricing information over the last 6 years. Note that our analysis will focus only on the pricing information, leaving aside other factors that may impact the price of Bitcoin, such as news, which can play a very important role. For this reason, this article and the related project are intended for educational purposes only and should not be used in production. It is an overly simplistic model that will help us explain and understand time series forecasting using Python and Recurrent Neural Networks (RNNs); more precisely, we will build an LSTM (Long Short-Term Memory) model.

All the code for the article can be found here.

Introductory concepts

What is an RNN?

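A recurrent neural network (RNN) is a class of artificial neural networks where connections between the nodes form a directed graph along a temporal sequence. At each time step, the network combines the current input with a hidden state carried over from the previous step, which acts as a memory of what it has seen so far. This allows an RNN to exhibit temporal dynamic behavior and makes RNNs a great fit for time series analysis, speech recognition, grammar learning, literary composition, etc.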
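The recurrence can be sketched in a few lines of NumPy. This is a minimal, illustrative forward pass of a plain (Elman-style) RNN, not code from this project: the names `rnn_forward`, `W_x`, `W_h` and `b`, the tanh activation and the random weights are all assumptions. It shows the key idea that the same weights are reused at every time step while the hidden state `h` carries information forward.

```python
import numpy as np


def rnn_forward(xs, W_x, W_h, b):
    """Run a simple RNN over a sequence.

    xs:  inputs, shape (timesteps, input_dim)
    W_x: input-to-hidden weights, shape (input_dim, hidden_dim)
    W_h: hidden-to-hidden weights, shape (hidden_dim, hidden_dim)
    b:   bias, shape (hidden_dim,)
    Returns the hidden state after every step, shape (timesteps, hidden_dim).
    """
    h = np.zeros(W_h.shape[0])  # initial hidden state
    states = []
    for x_t in xs:  # walk the sequence in order
        # The new state depends on the current input AND the previous state
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 1))  # 5 time steps with 1 feature, like 5 days of prices
W_x = rng.normal(size=(1, 3))
W_h = rng.normal(size=(3, 3))
b = np.zeros(3)
print(rnn_forward(xs, W_x, W_h, b).shape)  # (5, 3)
```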

What is an LSTM?

Long Short-Term Memory (LSTM) is a type of RNN that can process not only single data points (such as images) but also entire sequences of data (such as speech or video). Its memory cells and gates let it retain information over longer sequences than a plain RNN, which struggles with vanishing gradients. This makes LSTMs a great choice for time series forecasting, and it is the architecture we will be using today.
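To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard gate equations. It is illustrative only: the function names, the stacked weight layout and the order of the four blocks (input, forget, candidate, output) are assumptions and are not meant to mirror Keras internals exactly.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    The four column blocks are taken here as the input, forget,
    candidate and output parts (an assumed layout).
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates, each in (0, 1)
    g = np.tanh(g)                                # candidate values
    c_t = f * c_prev + i * g   # keep part of the old memory, add new information
    h_t = o * np.tanh(c_t)     # expose a filtered view of the memory
    return h_t, c_t


units, input_dim = 4, 1
rng = np.random.default_rng(1)
W = rng.normal(size=(input_dim, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, input_dim)):  # a 5-step sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

In Keras, the LSTM layer's `activation` argument (set to 'sigmoid' later in this article) takes the place of the two `tanh` calls, while `recurrent_activation` controls the gates.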



Data Exploration

Before we do anything, we need to gather the data (which I have already done for you) and understand it. Let’s first load the dataset with Python and take a first look:

import pandas as pd

data = pd.read_csv("data/bitcoin.csv")
data = data.sort_values('Date')
data.head()
Bitcoin Data, source: Coinbase


The head() function already gives us some valuable information about the columns of the dataset and what their values look like. For our purposes, we are interested in the Close column, which contains the price of Bitcoin at the end of each day. Let’s see if we can build a chart that shows the price of Bitcoin over time using our dataset.

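import matplotlib.pyplot as plt

price = data[['Close']]

plt.figure(figsize = (15,9))
# Plot by position so the x axis follows the sorted (chronological) order
plt.plot(price.values)
plt.xticks(range(0, data.shape[0],50), data['Date'].iloc[::50],rotation=45)
plt.title("Bitcoin Price",fontsize=18, fontweight='bold')
plt.ylabel('Close Price (USD)',fontsize=18)
plt.show()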
Price of Bitcoin from Dec 2014 to Jun 2020


Remember the days when Bitcoin was almost $20k? Let’s not get distracted. Let’s check whether our Close column contains any null values: null values are of no use to us, and we would need to deal with them if there were any.

price.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2001 entries, 2000 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Close   2001 non-null   float64
dtypes: float64(1)
memory usage: 31.3 KB

A simple .info() call on our DataFrame gives us some useful information, such as the number of entries and the number of non-null entries. In our case, both are 2001, so there is nothing else to do here.
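Our data has no missing values, but if it did, two common pandas options are to drop those rows or to carry the last known price forward. The sketch below uses a small made-up series (the values are placeholders, not real prices) to show how to count the gaps and what each option produces.

```python
import numpy as np
import pandas as pd

# Placeholder Close series with one missing day
sample = pd.DataFrame({'Close': [1.0, 2.0, np.nan, 4.0]})

print(sample['Close'].isnull().sum())  # 1 missing value

# Option 1: drop rows with missing values (shortens the series, leaving a gap in time)
dropped = sample.dropna()

# Option 2: forward-fill, i.e. reuse the last known price (keeps one row per day)
filled = sample.ffill()
print(filled['Close'].tolist())  # [1.0, 2.0, 2.0, 4.0]
```

For a daily price series, forward-filling keeps the spacing between rows constant, which matters for the fixed-size windows built later.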

Data Preparation


The first step is to normalize the values in our data. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values.

For our purposes, we will use MinMaxScaler from the scikit-learn (sklearn) library, which by default rescales each column to the range [0, 1].

from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()

# Learn the min and max of the Close column and rescale it to [0, 1]
norm_data = min_max_scaler.fit_transform(price.values)
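With its default feature_range=(0, 1), MinMaxScaler maps each value x to (x - min) / (max - min), using the min and max seen during fit (stored on the scaler as data_min_ and data_max_). The NumPy sketch below reproduces that mapping and its inverse by hand on placeholder values; the helper names are hypothetical, and the point is only to show what fit_transform and inverse_transform compute.

```python
import numpy as np


def min_max_fit(values):
    """Return the per-column (min, max) a min-max scaler would learn."""
    return values.min(axis=0), values.max(axis=0)


def min_max_transform(values, data_min, data_max):
    # (x - min) / (max - min): the smallest value maps to 0, the largest to 1
    return (values - data_min) / (data_max - data_min)


def min_max_inverse(scaled, data_min, data_max):
    # Undo the scaling to get back to the original units (USD here)
    return scaled * (data_max - data_min) + data_min


prices = np.array([[200.0], [400.0], [1200.0]])  # placeholder prices, shape (n, 1)
lo, hi = min_max_fit(prices)
scaled = min_max_transform(prices, lo, hi)
print(scaled.ravel().tolist())                          # [0.0, 0.2, 1.0]
print(min_max_inverse(scaled, lo, hi).ravel().tolist())  # [200.0, 400.0, 1200.0]
```

The inverse mapping is what lets us turn the model’s normalized predictions back into dollar prices at the end.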

Let’s compare a few values before and after normalizing:

Real: [370.], Normalized: [0.01280082]
Real: [426.1], Normalized: [0.01567332]
Real: [8259.99], Normalized: [0.41679416]

Data Split

During this step, we are going to tackle two problems. The first is that we need to split our dataset into training data and test data. We will use the training data to teach our model, and the test data as a baseline against which to compare our predictions. This is very important because we want to make sure our predictions make sense, and we can’t test on the same data we trained our network on: a model that has overfit (memorized its training data) would look deceptively good.
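Because this is a time series, the split should be chronological rather than random: a random split would let the model train on days that come after the days it is tested on. A minimal sketch of an 80/20 chronological split, where chronological_split is a hypothetical helper and a range of integers stands in for consecutive daily prices:

```python
import numpy as np


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered array without shuffling: the oldest part
    is used for training, the most recent part for testing."""
    split = int(len(series) * train_fraction)
    return series[:split], series[split:]


days = np.arange(10)  # stand-in for 10 consecutive days
train, test = chronological_split(days)
print(train.tolist(), test.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```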

Second, we need to prepare our data for the LSTM network. This type of network requires us to feed the time series in chunks (windows), separating the “history” data the model uses as input from the “target”, which tells how far into the future the model needs to learn to predict.

This second part is handled by our univariate_data function:

import numpy as np

def univariate_data(dataset, start_index, end_index, history_size, target_size):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    # Reshape data from (history_size,) to (history_size, 1)
    data.append(np.reshape(dataset[indices], (history_size, 1)))
    # The label is the value target_size steps after the end of the window
    labels.append(dataset[i+target_size])
  return np.array(data), np.array(labels)

And the split will happen here:

past_history = 5
future_target = 0

TRAIN_SPLIT = int(len(norm_data) * 0.8)

x_train, y_train = univariate_data(norm_data, 0, TRAIN_SPLIT, past_history, future_target)

x_test, y_test = univariate_data(norm_data, TRAIN_SPLIT, None, past_history, future_target)

Setting past_history = 5 tells our network to use 5 days of data to learn to predict the next point in the time series. With future_target = 0, that target is the value immediately following the 5-day window, i.e. the next day’s price.
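To see exactly which (window, label) pairs this windowing produces, here is a vectorized alternative built on NumPy’s sliding_window_view (NumPy 1.20 or later). make_windows is a hypothetical helper, not part of the project; it is intended to build the same pairs as univariate_data with start_index=0 and end_index=None, and for the test set you would pass norm_data[TRAIN_SPLIT:]. A toy series of 0 to 9 makes the pairs easy to read.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, history_size, target_size=0):
    """Return (windows, labels) shaped for an LSTM:
    windows: (samples, history_size, 1), labels: (samples, 1)."""
    series = np.asarray(series, dtype=float).ravel()
    n_samples = len(series) - history_size - target_size
    # Every run of `history_size` consecutive values, keeping only those with a label
    windows = sliding_window_view(series, history_size)[:n_samples]
    # The label sits target_size steps after the end of each window
    labels = series[history_size + target_size:]
    return windows[..., np.newaxis], labels[:, np.newaxis]


toy = np.arange(10)
x, y = make_windows(toy, history_size=5, target_size=0)
print(x.shape, y.shape)          # (5, 5, 1) (5, 1)
print(x[0].ravel().tolist(), y[0].tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0] [5.0]
```

The trailing dimension of size 1 is the "features" axis the LSTM expects, matching input_shape=(None, 1) in the model below.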

Build the model

The next step is to build our model architecture. Finding the right model is an art: it takes several attempts, plus experience, to find the right layers and the right hyperparameters for each of them.

We won’t go into the details of each layer, as there’s enough complexity there to write a post about each one. But I’ll highlight that the model we build here is fairly simple and pretty standard for this kind of problem, at least in the types of layers used.

from keras.models import Sequential
from keras.optimizers import Adam
from keras.layers import Dense, LSTM, LeakyReLU, Dropout

num_units = 64
learning_rate = 0.0001
activation_function = 'sigmoid'
adam = Adam(learning_rate=learning_rate)
loss_function = 'mse'
batch_size = 5
num_epochs = 50

# Initialize the RNN
model = Sequential()
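model.add(LSTM(units = num_units, activation=activation_function, input_shape=(None, 1)))
# The model summary below also lists LeakyReLU and Dropout layers.
# The dropout rate of 0.1 is an assumed value; LeakyReLU uses its default slope.
model.add(LeakyReLU())
model.add(Dropout(0.1))
model.add(Dense(units = 1))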

# Compiling the RNN
model.compile(optimizer=adam, loss=loss_function)

Let’s see what our architecture looks like:

model.summary()

Model: "sequential_13"
Layer (type)                 Output Shape              Param #   
lstm_6 (LSTM)                (None, 64)                16896     
leaky_re_lu_4 (LeakyReLU)    (None, 64)                0         
dropout_4 (Dropout)          (None, 64)                0         
dense_6 (Dense)              (None, 1)                 65        
Total params: 16,961
Trainable params: 16,961
Non-trainable params: 0
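The parameter counts in this summary can be checked by hand. An LSTM layer has four gates, and each gate has an input weight matrix (units × input_dim), a recurrent weight matrix (units × units) and a bias vector (units). The Dense layer has one weight per input plus one bias, while LeakyReLU and Dropout have no trainable parameters. The helper names below are illustrative.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * input_dim + units * units + units)


def dense_params(inputs, outputs):
    # One weight per input/output pair plus one bias per output
    return inputs * outputs + outputs


lstm = lstm_params(64, 1)    # 64 units, 1 feature per time step
dense = dense_params(64, 1)  # 64 LSTM outputs -> 1 predicted price
print(lstm, dense, lstm + dense)  # 16896 65 16961
```

Note that the LSTM’s parameter count does not depend on the number of time steps, which is why input_shape=(None, 1) can leave the sequence length open.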

Train the model

Now that our data is ready and our model is compiled, we can start training, and with Keras it’s as simple as one line of code:

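# Using the training set to train the model
# (validation_split=0.1 is an assumed value; the val_loss used below requires some validation data)
history = model.fit(x_train, y_train, validation_split=0.1, batch_size=batch_size, epochs=num_epochs, shuffle=False)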

Finding the right hyperparameters here is also part of the art. However, it is very important that the shuffle parameter is set to False: our analysis depends entirely on the order of the information, so we keep the samples in chronological order.

You can train this model even without a GPU: the amount of data is very small and the network architecture is very simple. More advanced models working with more granular data can take hours or days to train.

After you finish training, it’s important to evaluate the result. Did we do well? What do our training loss and validation loss curves look like? If something doesn’t look right, you probably need to go back to the previous steps and play with your hyperparameters, or maybe even revise your model architecture.
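Besides eyeballing the curves, you can check them programmatically. The sketch below uses a hypothetical helper, summarize_losses, that reports the epoch with the lowest validation loss and flags a common overfitting pattern: training loss still falling while validation loss rises over the last few epochs. The loss values are placeholders; in the article they would come from history.history['loss'] and history.history['val_loss'].

```python
def summarize_losses(loss, val_loss, window=5):
    """Return (best_epoch, overfitting) for per-epoch loss lists.

    best_epoch:  index of the lowest validation loss
    overfitting: True if, over the last `window` epochs, training loss
                 went down while validation loss went up
    """
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    recent = slice(-window, None)
    train_trend = loss[recent][-1] - loss[recent][0]
    val_trend = val_loss[recent][-1] - val_loss[recent][0]
    overfitting = train_trend < 0 and val_trend > 0
    return best_epoch, overfitting


# Placeholder curves for illustration only
loss = [0.9, 0.5, 0.3, 0.2, 0.15, 0.12, 0.10]
val_loss = [1.0, 0.6, 0.4, 0.35, 0.38, 0.42, 0.45]
print(summarize_losses(loss, val_loss, window=4))  # (3, True)
```

If you would rather have Keras stop at that point automatically, its EarlyStopping callback (monitor='val_loss', with restore_best_weights=True) does a similar job during training.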

Here is a nice chart you can use to compare both curves:

import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(loss))

plt.figure()
plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title("Training and Validation Loss")
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.legend()
plt.show()
Training and Validation Loss


My results may not be ideal, but they are good enough for our purposes. There’s an interesting article that explains some of the issues you can see in this chart, and I recommend taking a look at it.


With our model trained, we can start making predictions and comparing them against our test data to see how well our model is doing:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Undo the normalization so both series are in USD
original = pd.DataFrame(min_max_scaler.inverse_transform(y_test))
predictions = pd.DataFrame(min_max_scaler.inverse_transform(model.predict(x_test)))

ax = sns.lineplot(x=original.index, y=original[0], label="Test Data", color='royalblue')
ax = sns.lineplot(x=predictions.index, y=predictions[0], label="Prediction", color='tomato')
ax.set_title('Bitcoin price', size = 14, fontweight='bold')
ax.set_xlabel("Days", size = 14)
ax.set_ylabel("Price (USD)", size = 14)
ax.set_xticks([])  # hide the day-index ticks on the x axis
plt.show()
Predicted vs Actual Bitcoin price


That chart looks pretty good to me, and I’m very happy with the results. What do you think?
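To put a number on "pretty good", a common sanity check (not part of the original project) is to compute an error metric such as RMSE or MAE for the model and for a naive persistence baseline that simply predicts yesterday’s price. With one-day-ahead targets, that baseline already follows the real curve closely, so the model adds value only if it beats it. The arrays below are placeholders, and the helper names are hypothetical.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the prices."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))


actual = np.array([100.0, 102.0, 101.0, 105.0])     # placeholder test targets
model_out = np.array([99.0, 101.0, 103.0, 104.0])   # placeholder model predictions
yesterday = np.array([98.0, 100.0, 102.0, 101.0])   # previous day's price (persistence)

print(rmse(actual, model_out), rmse(actual, yesterday))  # model ~1.32, baseline 2.5
print(mae(actual, model_out), mae(actual, yesterday))    # 1.25 2.25
```

With the article’s variables, and assuming future_target = 0 so that the last value in each window is the day before its target, the baseline would be min_max_scaler.inverse_transform(x_test[:, -1, :]), compared against min_max_scaler.inverse_transform(y_test).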


RNNs and LSTMs are great architectures for analyzing and predicting time series data. In this post, we focused more on the story than on the technical details of the implementation, but if you are interested in the topic, do your own research: check out the code, play with it, change the layers and the hyperparameters, try different columns or normalization methods, and read more detailed articles and papers on the subject.

Thanks for reading!
