Predicting the Price of Bitcoin, Intro to LSTM
Today we are going to discuss how to predict the price of Bitcoin by analyzing its pricing information for the last 6 years. Note that our analysis focuses only on pricing information and leaves aside every other factor that may affect the price of Bitcoin, such as news, which can play a very important role. For this reason, this article and the related project are intended for educational purposes only and should not be used in production. It is an overly simplistic model that will help us explain and understand time series forecasting using Python and Recurrent Neural Networks (RNNs). More precisely, we will build an LSTM (Long Short-Term Memory) model.
All the code for the article can be found here.
What is an RNN?
A recurrent neural network (RNN) is a class of artificial neural networks where connections between the nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior and makes it a good fit for time series analysis, speech recognition, grammar learning, literal compositions, etc.
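To make the "temporal" part more concrete, here is a minimal sketch of a recurrent cell with a single unit, written in plain Python. The weights w_x, w_h and b are arbitrary values picked for illustration, not trained ones, and this is not how Keras implements its layers. It only shows that each step reuses the state left by the previous step.

```python
import math

def rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit recurrent cell over a sequence and return every hidden state."""
    h = 0.0  # the hidden state starts empty
    states = []
    for x in sequence:
        # The new state mixes the current input with the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The same values in a different order end in a different state,
# because every step depends on what came before it.
forward = rnn_forward([0.1, 0.2, 0.3])
backward = rnn_forward([0.3, 0.2, 0.1])
print(forward[-1], backward[-1])
```

This dependence on order is what makes this family of networks interesting for time series.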
What is an LSTM?
Long Short-Term Memory (LSTM) is a type of RNN that can process not only single data points (such as images) but also entire sequences of data (such as speech or video). LSTMs are a great choice for time series forecasting, and they are the type of architecture we will be using today.
Before we do anything we need to gather the data (which I already did for you) and we need to understand the data. Let’s first load the dataset with Python and take a first look:
import pandas as pd

# Load the dataset and sort it chronologically
data = pd.read_csv("data/bitcoin.csv")
data = data.sort_values('Date')
data.head()
The head() function already gives us some valuable information about the columns of the dataset and what the data looks like. For our purposes, we are interested in the Close column, which contains the price of Bitcoin at the end of the day for that particular date.
Let’s see if we can build a chart that shows us the price of Bitcoin over time using our data set.
import matplotlib.pyplot as plt

price = data[['Close']]

plt.figure(figsize=(15, 9))
plt.plot(price)
# One tick every 50 rows, labelled with the matching date
plt.xticks(range(0, data.shape[0], 50), data['Date'].loc[::50], rotation=45)
plt.title("Bitcoin Price", fontsize=18, fontweight='bold')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price (USD)', fontsize=18)
plt.show()
Remember the days when Bitcoin was almost 20k? Let’s not get distracted, though. Let’s see if our Close column contains any null values. Null values are of no use to us, and we should deal with them if there are any.
price.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2001 entries, 2000 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Close   2001 non-null   float64
dtypes: float64(1)
memory usage: 31.3 KB
A simple .info() over our dataframe gives us some useful information, like the number of entries, and the number of Non-Null entries, which in our case are the same, so nothing else to do here.
The first transformation we will apply to our data is normalization. The goal of normalization is to change the values of numeric columns in the data set to a common scale, without distorting differences in the ranges of values.
For our purposes, we will use MinMaxScaler from scikit-learn:
from sklearn.preprocessing import MinMaxScaler

min_max_scaler = MinMaxScaler()
norm_data = min_max_scaler.fit_transform(price.values)
Let’s try to compare our values before and after normalizing:
Real: [370.], Normalized: [0.01280082]
Real: [426.1], Normalized: [0.01567332]
Real: [8259.99], Normalized: [0.41679416]
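With its default settings, MinMaxScaler maps the smallest value to 0 and the largest to 1 using (x - min) / (max - min). The sketch below reproduces that arithmetic in plain Python so you can see what happens to each value. The prices are made-up numbers, not rows from our dataset, and the two helper functions are hypothetical names used only for this illustration.

```python
def min_max_scale(values):
    """Scale values to the [0, 1] range and return them with the bounds used."""
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, (lo, hi)

def min_max_invert(scaled, bounds):
    """Map scaled values back to the original range."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]

prices = [200.0, 650.0, 1100.0, 2000.0]  # made-up prices for illustration
scaled, bounds = min_max_scale(prices)
print(scaled)    # [0.0, 0.25, 0.5, 1.0]

restored = min_max_invert(scaled, bounds)
print(restored)  # [200.0, 650.0, 1100.0, 2000.0]
```

The inverse step matters later on, when we want to read predictions in USD again rather than as values between 0 and 1.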
During this step we are actually going to tackle two problems. The first is that we need to split our data set into training data and test data. We will use the training data to teach our model, and the test data as a baseline to compare our predictions against. This is very important, as we want to make sure our predictions make sense, but we can’t test on the same data we trained our network on, because an overfitted model would then look better than it really is.
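For a time series, the split is chronological: the older part is used for training and the most recent part for testing. Here is a minimal sketch of that idea, using row positions 0 to 2000 as a stand-in for the 2001 rows reported by .info() and the 80% fraction used in this article. The function name chronological_split is hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a series in time order: the first part trains, the rest tests."""
    split = int(len(series) * train_fraction)
    return series[:split], series[split:]

rows = list(range(2001))  # stand-in for 2001 daily rows, oldest first
train, test = chronological_split(rows)
print(len(train), len(test))  # 1600 401
print(train[-1], test[0])     # 1599 1600
```

Every test row comes after every training row, so the model is never evaluated on days it has already seen.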
In addition, we will also prepare our data for the LSTM network. This type of network requires us to send the time series in chunks, separating the “history” data we will use for training from our “target”, which tells the model how far into the future it needs to learn to predict.
This second part is the responsibility of our univariate_data function:
import numpy as np

def univariate_data(dataset, start_index, end_index, history_size, target_size):
    data = []
    labels = []

    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size

    for i in range(start_index, end_index):
        indices = range(i - history_size, i)
        # Reshape data from (history_size,) to (history_size, 1)
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        labels.append(dataset[i + target_size])
    return np.array(data), np.array(labels)
And the split will happen here:
past_history = 5
future_target = 0
TRAIN_SPLIT = int(len(norm_data) * 0.8)

x_train, y_train = univariate_data(norm_data, 0, TRAIN_SPLIT, past_history, future_target)
x_test, y_test = univariate_data(norm_data, TRAIN_SPLIT, None, past_history, future_target)
With past_history = 5, we tell our network to use 5 days of data to learn to predict the next point in the time series. With future_target = 0, the label is the value that comes immediately after each 5-day window.
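If the windowing is hard to picture, the following sketch applies the same idea to a tiny made-up series using plain Python lists. The function make_windows is a simplified, hypothetical stand-in for univariate_data (no NumPy, no reshaping), meant only to show which values end up as history and which as target.

```python
def make_windows(series, history_size, target_size=0):
    """Pair each run of history_size values with the value that follows it."""
    windows, labels = [], []
    for i in range(history_size, len(series) - target_size):
        windows.append(series[i - history_size:i])
        labels.append(series[i + target_size])
    return windows, labels

series = [10, 11, 12, 13, 14, 15, 16, 17]  # made-up values
windows, labels = make_windows(series, history_size=5)
for window, label in zip(windows, labels):
    print(window, "->", label)
# [10, 11, 12, 13, 14] -> 15
# [11, 12, 13, 14, 15] -> 16
# [12, 13, 14, 15, 16] -> 17
```

Each window slides forward by one day, so consecutive samples share four of their five values.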
Build the model
The next step is to build our model architecture. Finding the right model is an art, and it will take several tries plus experience to find the right layers and hyper-parameters for each one of them.
We won’t go into the details of each layer; there’s enough complexity to write a post for each. But I’ll highlight that the model we build here is fairly simple and pretty standard for this kind of problem, at least in the type of layers used.
from keras.models import Sequential
from keras.optimizers import Adam
from keras.layers import Dense, LSTM, LeakyReLU, Dropout

num_units = 64
learning_rate = 0.0001
activation_function = 'sigmoid'
# Recent Keras releases name this argument learning_rate (older ones used lr)
adam = Adam(learning_rate=learning_rate)
loss_function = 'mse'
batch_size = 5
num_epochs = 50

# Initialize the RNN
model = Sequential()
model.add(LSTM(units=num_units, activation=activation_function, input_shape=(None, 1)))
model.add(LeakyReLU(alpha=0.5))
model.add(Dropout(0.1))
model.add(Dense(units=1))

# Compiling the RNN
model.compile(optimizer=adam, loss=loss_function)
Let’s see what our architecture looks like:
Model: "sequential_13"
_______________________________________________________________
Layer (type)                 Output Shape              Param #
===============================================================
lstm_6 (LSTM)                (None, 64)                16896
_______________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 64)                0
_______________________________________________________________
dropout_4 (Dropout)          (None, 64)                0
_______________________________________________________________
dense_6 (Dense)              (None, 1)                 65
===============================================================
Total params: 16,961
Trainable params: 16,961
Non-trainable params: 0
_______________________________________________________________
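The parameter counts in that summary can be checked by hand. A standard LSTM layer has four sets of weights (three gates plus the cell candidate), each with input weights, recurrent weights and a bias, while a dense layer has one weight per input plus a bias. The sketch below applies those formulas to our layer sizes. The helper names are hypothetical.

```python
def lstm_param_count(units, input_dim):
    """Four weight sets, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(inputs, units):
    """One weight per input for every unit, plus one bias per unit."""
    return inputs * units + units

lstm_params = lstm_param_count(units=64, input_dim=1)
dense_params = dense_param_count(inputs=64, units=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 16896 65 16961
```

The LeakyReLU and Dropout layers have nothing to learn, which is why they show 0 parameters.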
Train the model
Now that we have our data ready and our model compiled, we can start training. With Keras, it is as simple as one line of code:
# Using the training set to train the model
history = model.fit(
    x_train,
    y_train,
    validation_split=0.1,
    batch_size=batch_size,
    epochs=num_epochs,
    shuffle=False
)
Finding the right hyper-parameters here is also part of the art. However, it is very important that the shuffle parameter is set to False. Our analysis depends completely on the order of the information; if we change the order, our results will make no sense at all.
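Order also matters for validation_split. According to the Keras documentation, the validation data is taken from the last samples of x and y, before any shuffling. The sketch below imitates that behavior with plain lists. The function tail_validation_split is a hypothetical helper, the exact rounding inside Keras may differ, and 1595 is the number of training windows I would expect from a 1600-row training split with a 5-day history.

```python
def tail_validation_split(samples, validation_split=0.1):
    """Hold out the last fraction of the samples, keeping their order."""
    split_at = int(len(samples) * (1 - validation_split))
    return samples[:split_at], samples[split_at:]

windows = list(range(1595))  # stand-in for the training windows, oldest first
train_part, val_part = tail_validation_split(windows)
print(len(train_part), len(val_part))  # 1435 160
print(val_part[0], val_part[-1])       # 1435 1594
```

With the data sorted by date, this means the validation loss is measured on the most recent part of the training period.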
Training this model is something you can do even without a GPU: the amount of data is very low, and the network architecture is very simple. More advanced models trained on more granular information can take hours or days to train.
After you finish training, it’s important to evaluate the result. Did we do well? What do our training loss and validation loss look like? If something doesn’t look right, you probably need to go back to the previous steps and play with your hyper-parameters, or maybe even revise your model architecture.
Here is a nice chart you can use to compare both functions:
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(loss))

plt.figure()
plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title("Training and Validation Loss")
plt.legend()
plt.show()
My results may not be ideal, but they are enough for our purposes. There’s an interesting article that explains some issues you can spot visually in this chart. I recommend that you take a look at it: https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
With our model now trained, we can start making some predictions and comparing them against our test data to see how well our model is doing:
import seaborn as sns

# Undo the normalization so both series are back in USD
original = pd.DataFrame(min_max_scaler.inverse_transform(y_test))
predictions = pd.DataFrame(min_max_scaler.inverse_transform(model.predict(x_test)))

ax = sns.lineplot(x=original.index, y=original[0], label="Test Data", color='royalblue')
ax = sns.lineplot(x=predictions.index, y=predictions[0], label="Prediction", color='tomato')
ax.set_title('Bitcoin price', size=14, fontweight='bold')
ax.set_xlabel("Days", size=14)
ax.set_ylabel("Cost (USD)", size=14)
ax.set_xticklabels('', size=10)
That chart looks pretty good to me, and I’m very happy with the results. What do you think?
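A chart is a good sanity check, but a single number makes it easier to compare one training run against another. Two common choices are the mean absolute error (MAE) and the root mean squared error (RMSE). This is a minimal sketch in plain Python with made-up prices, not results from the model above, and it is not part of the original project code.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average size of the miss, in the same unit as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: like MAE, but large misses weigh more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

actual = [100.0, 110.0, 120.0, 130.0]     # made-up test prices
predicted = [102.0, 108.0, 123.0, 126.0]  # made-up predictions
print(mae(actual, predicted))   # 2.75
print(rmse(actual, predicted))
```

Since both metrics are in the unit of the data, computing them on the inverse-transformed values gives an error in USD.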
RNNs and LSTMs are great architectures we can use to analyze and predict time-series information. In this post, we focused more on the story than on the technical details of the implementation. If you are interested in the topic, do your research: check out the code, play with it, change the layers and the hyper-parameters, try different things, use different columns or normalization methods, and read more detailed articles and papers on the subject.
Thanks for reading!