There are many ways to smooth noisy time series data. One of the more sophisticated options is the Savitzky-Golay (savgol) filter. Its main advantage is that the approximation stays true to the original trend of the noisy dataset while suppressing high-frequency noise.
It works by fitting low-degree polynomials with the least squares method. The polynomial fits are not done over the whole dataset (that would lead to inaccurate smoothing), but only over a limited number of neighbouring data points, the window.
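To make the local-fit idea concrete, here is a minimal sketch on synthetic data (the sine signal, the noise level, the window of 21 points and the cubic order are all assumptions chosen for illustration, not values from this article). It refits a polynomial by least squares over the window centred on one interior point and compares the fitted value at that point with the output of SciPy's savgol_filter.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy signal (illustrative assumption, not the stock data used later)
rng = np.random.default_rng(0)
t = np.arange(200)
noisy = np.sin(t / 20.0) + rng.normal(scale=0.2, size=t.size)

window, order = 21, 3          # hypothetical parameters for this sketch
half = window // 2
smoothed = savgol_filter(noisy, window, order)

# Least-squares cubic over the window centred on one interior point
i = 100
idx = np.arange(i - half, i + half + 1)
coeffs = np.polyfit(idx - i, noisy[idx], order)  # x shifted so the centre is 0
local_fit_at_i = np.polyval(coeffs, 0.0)         # value of the fit at the centre

print(smoothed[i], local_fit_at_i)
```

For interior points the two numbers should agree up to floating-point error, which is the sense in which the filter is a sliding least-squares polynomial fit.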
To implement the Savitzky-Golay filter we will use the savgol_filter function provided by the SciPy library.
# __Savitzky-Golay smoothing__
# savgol_filter(input data, window size, polynomial order)
y_sg = signal.savgol_filter(y_data, 101, 8)
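Inputs for the function are the input data, the window size (the number of data points used for each local fit) and the polynomial order, which must be smaller than the window size.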
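A short sketch of how these inputs behave, again on synthetic data (the signal, the window sizes 11 and 51, and the `roughness` helper are assumptions made up for this illustration). It passes the parameters by their SciPy keyword names, `window_length` and `polyorder`, compares how much point-to-point jitter remains for a narrow and a wide window, and shows that a polynomial order that is not smaller than the window is rejected.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy signal (illustrative assumption)
rng = np.random.default_rng(1)
t = np.arange(300)
noisy = np.sin(t / 20.0) + rng.normal(scale=0.2, size=t.size)

# Same polynomial order, two different window sizes
narrow = savgol_filter(noisy, window_length=11, polyorder=3)
wide = savgol_filter(noisy, window_length=51, polyorder=3)

def roughness(y):
    """Hypothetical helper: spread of point-to-point changes."""
    return float(np.std(np.diff(y)))

print(roughness(noisy), roughness(narrow), roughness(wide))

# polyorder must be smaller than window_length, otherwise SciPy raises ValueError
try:
    savgol_filter(noisy, window_length=5, polyorder=5)
    order_check_raised = False
except ValueError:
    order_check_raised = True
print(order_check_raised)
```

In this sketch the wider window leaves less jitter at the same polynomial order, which is the trade-off to tune: a window that is too wide for the chosen order starts to flatten real features of the data.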
In this example we want to smooth stock price data. The smoothed data can then be used for further processing, for example trend analysis.
#optional installations:
#!pip install yfinance --upgrade --no-cache-dir
#!pip3 install pandas_datareader
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
# ___library_import_statements___
import pandas as pd
# for pandas_datareader, otherwise it might have issues, sometimes there is some version mismatch
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader.data as web
import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt
import datetime
import time
#newest yahoo API
import yfinance as yahoo_finance
%matplotlib inline
pd.set_option('display.expand_frame_repr', False)
# ___variables___
ticker = 'NGG'
#ticker = 'SNP' # SINOPEC China Petroleum and Chemical Corp
#ticker = 'CHA' # China Telecom
#ticker = 'CHL' # China Mobile
#ticker = 'NGG' # National Grid
start_time = datetime.datetime(2019, 1, 1)
#end_time = datetime.datetime(2019, 1, 20)
end_time = datetime.datetime.now().date().isoformat() # today
# yahoo gives only daily historical data
connected = False
while not connected:
    try:
        ticker_df = web.get_data_yahoo(ticker, start=start_time, end=end_time)
        connected = True
        print('connected to yahoo')
    except Exception as e:
        print("type error: " + str(e))
        time.sleep(5)
        pass
# use numerical integer index instead of date
ticker_df = ticker_df.reset_index()
print(ticker_df.head(5))
# discrete dataset
x_data = ticker_df.index.tolist() # the index will be our x axis, not date
y_data = ticker_df['Low']
# x values for the savgol filter
x = np.linspace(0, max(ticker_df.index.tolist()), max(ticker_df.index.tolist()) + 1)
# __ Savitzky-Golay smoothing __
# savgol_filter(input data, window size, polynomial order)
#y_sg = signal.savgol_filter(y_data, 101, 8)
y_sg_1 = signal.savgol_filter(y_data, 51, 3)
y_sg_2 = signal.savgol_filter(y_data, 61, 4)
y_sg_3 = signal.savgol_filter(y_data, 71, 5)
y_sg_4 = signal.savgol_filter(y_data, 81, 6)
# ___ plotting ___
plt.figure(figsize=(15, 6), dpi= 120, facecolor='w', edgecolor='k')
# plot stock data
plt.plot(x_data, y_data, 'o', markersize=1.5, color='grey', alpha=0.7)
# plot savgol fit
plt.plot(x, y_sg_1, '-', markersize=1.0, alpha=0.5)
plt.plot(x, y_sg_2, '-', markersize=1.0, alpha=0.5)
plt.plot(x, y_sg_3, '-', markersize=1.0, alpha=0.5)
plt.plot(x, y_sg_4, '-', markersize=1.0, alpha=0.5)
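# y_data is the 'Low' column, and each plotted line gets its own legend entry
plt.legend([ticker + ' ' + 'daily low price',
            'savgol fit (window 51, order 3)',
            'savgol fit (window 61, order 4)',
            'savgol fit (window 71, order 5)',
            'savgol fit (window 81, order 6)'])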
plt.show()
We see that the savgol fit works pretty nicely. It follows the price trend of the original input data and does not oscillate wildly, as polynomial fits of higher degrees like to do. It is also not delayed the way moving averages are. Of all the price approximation techniques I have tried so far, the savgol filter works best (we just need to tweak the window and polynomial parameters a bit). It is also nice to see that the fit behaves consistently across a range of smoothing windows and polynomial degrees.
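The point about delay can be checked with a small sketch. Everything in it is an assumption for illustration: a noise-free bump stands in for a price move, the window is 51 points, and the moving average is the trailing kind computed with pandas rolling(). The sketch compares where each smoothed curve puts the peak.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

# Noise-free bump centred at index 200 (illustrative stand-in for a price move)
t = np.arange(400)
bump = np.exp(-0.5 * ((t - 200) / 30.0) ** 2)

window = 51  # hypothetical window size, same for both methods

# Trailing moving average: each value averages the current and previous points
trailing_ma = pd.Series(bump).rolling(window).mean().to_numpy()

# Savitzky-Golay with a cubic fit over a window centred on each point
sg = savgol_filter(bump, window, 3)

peak_true = int(np.argmax(bump))
peak_ma = int(np.nanargmax(trailing_ma))  # first window-1 values are NaN
peak_sg = int(np.argmax(sg))
print(peak_true, peak_ma, peak_sg)
```

In this sketch the trailing average places the peak roughly half a window late, while the savgol curve keeps it at the original position. The reason is that the savgol window is centred on each point, so it uses data on both sides of that point, including later samples.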