In this article we look at how to normalize time series data, specifically daily stock prices. Normalization is useful when we want to compare several time series, such as the price trends of different stocks.
Whatever the original price range was, after normalization the values lie in the interval between 0 and 1. This makes it convenient to compare the price trends of multiple stocks in one graph.
Normalization is also a useful preprocessing step before feeding a time series into a neural network for prediction. Neural networks usually train faster on standardized or normalized inputs.
How to normalize time series:
# time series normalization equation
y = (x - min) / (max - min)
x ... price from input time series
y ... normalized time series value
min ... minimum price in the time series
max ... maximum price of the time series
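To make the equation concrete, here is a minimal sketch that applies it to a handful of made-up prices. The function name `min_max_normalize` and the sample values are assumptions for illustration only and do not come from real market data.

```python
import numpy as np

def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

# Made-up prices, for illustration only
prices = [10.0, 12.0, 15.0, 20.0]
normalized = min_max_normalize(prices)
print(normalized)  # values: 0.0, 0.2, 0.5, 1.0
```

The lowest price (10) maps to 0, the highest (20) maps to 1, and every other price lands proportionally in between.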
#optional installations:
#!pip install yfinance --upgrade --no-cache-dir
#!pip3 install pandas_datareader
# display is exposed via IPython.display; importing it from IPython.core.display is deprecated
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
# ___library_import_statements___
import pandas as pd
# for pandas_datareader, otherwise it might have issues, sometimes there is some version mismatch
pd.core.common.is_list_like = pd.api.types.is_list_like
# make pandas to print dataframes nicely
pd.set_option('expand_frame_repr', False)
import pandas_datareader.data as web
import numpy as np
import matplotlib.pyplot as plt
import datetime
import time
#newest yahoo API
import yfinance as yahoo_finance
#optional
#yahoo_finance.pdr_override()
%matplotlib inline
# ___variables___
ticker = 'TSLA'
start_time = datetime.datetime(2017, 10, 1)
#end_time = datetime.datetime(2019, 1, 20)
end_time = datetime.datetime.now().date().isoformat() # today
def get_data(ticker, start_time, end_time):
    # yahoo gives only daily historical data
    connected = False
    while not connected:
        try:
            ticker_df = web.get_data_yahoo(ticker, start=start_time, end=end_time)
            connected = True
            print('connected to yahoo')
        except Exception as e:
            # report the error and retry after a short pause
            print("error: " + str(e))
            time.sleep(5)
    # use numerical integer index instead of date
    ticker_df = ticker_df.reset_index()
    print(ticker_df.head(5))
    return ticker_df
df = get_data(ticker, start_time, end_time)
def normalize_data(df):
    # df on input should contain only one column with the price data (plus dataframe index)
    # names chosen so that the built-in min() and max() are not shadowed
    min_price = df.min()
    max_price = df.max()
    x = df
    # time series normalization part
    # y will be a column in a dataframe
    y = (x - min_price) / (max_price - min_price)
    return y
df['norm'] = normalize_data(df['Adj Close'])
print(df.head())
print(df.tail())
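One edge case of the normalization formula is worth keeping in mind: if every price in the series is the same, max - min is zero and the division is undefined. The following is a minimal sketch of a guarded variant. The name `normalize_series_safe` and the choice to return zeros for a constant series are assumptions made for illustration, not part of the code above.

```python
import pandas as pd

def normalize_series_safe(series):
    """Min-max normalize a pandas Series; return all zeros if the series is constant."""
    low, high = series.min(), series.max()
    if high == low:
        # max - min is zero, so the formula is undefined;
        # mapping everything to 0.0 is an arbitrary choice for this sketch
        return pd.Series(0.0, index=series.index)
    return (series - low) / (high - low)

# Made-up values, for illustration only
flat = pd.Series([5.0, 5.0, 5.0])
varying = pd.Series([5.0, 7.5, 10.0])
print(normalize_series_safe(flat).tolist())     # [0.0, 0.0, 0.0]
print(normalize_series_safe(varying).tolist())  # [0.0, 0.5, 1.0]
```

For real daily stock prices over a long window this situation is unlikely, but it can occur with very short windows or with missing data that has been forward-filled.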
Plot regular price chart:
# plot price
plt.figure(figsize=(15,5))
plt.plot(df['Date'], df['Adj Close'])
plt.title('Price chart (Adj Close) ' + ticker)
plt.show()
Compared with the price chart above, the normalized time series shows an identical trend. The only difference is that the normalized prices range between 0 and 1. This makes it very convenient to plot multiple stocks in one picture to compare trends.
# plot normalized price chart
plt.figure(figsize=(15,5))
plt.title('Normalized price chart ' + ticker)
plt.plot(df['Date'], df['norm'])
plt.show()
Let's download data for another company and plot both stocks in one graph using normalized prices.
ticker1='GE'
df1 = get_data(ticker1, start_time, end_time)
df1['norm'] = normalize_data(df1['Adj Close'])
# plot normalized price chart
plt.figure(figsize=(15,5))
plt.title('Normalized price chart ' + ticker + ' ' + ticker1)
# label each line so the two stocks can be told apart
plt.plot(df['Date'], df['norm'], label=ticker)
plt.plot(df1['Date'], df1['norm'], label=ticker1)
plt.legend()
plt.show()
As we can see, the normalized graph makes it easy to compare the trends of the two stock prices.
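The same idea extends to more than two stocks. As a minimal sketch, assuming the prices are already collected in one DataFrame with a column per ticker, all columns can be normalized in a single expression. The tickers `AAA` and `BBB` and their values are made up for illustration.

```python
import pandas as pd

# Made-up closing prices for two hypothetical tickers
prices = pd.DataFrame({
    'AAA': [100.0, 110.0, 105.0, 120.0],
    'BBB': [10.0, 9.0, 11.0, 12.0],
})

# DataFrame.min() and DataFrame.max() work column by column,
# so each ticker is scaled with its own minimum and maximum
normalized = (prices - prices.min()) / (prices.max() - prices.min())
print(normalized)
```

Because each column is scaled independently, every ticker ends up spanning the full range from 0 to 1 regardless of its original price level.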