There are many ways to smooth noisy time series data. One of the more sophisticated options is the Savitzky-Golay (savgol) filter. Its main advantage is that the approximation stays true to the original trend of the noisy dataset while suppressing high-frequency noise.

It combines low-degree polynomial fits with the least squares method. The polynomials are not fitted over the whole dataset (that would lead to inaccurate smoothing), but only over a limited number of neighbouring data points: a window that slides along the series.
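The local-fit idea can be checked directly. The sketch below is only an illustration on a synthetic random walk (not the stock data used later); the window of 11 points, the cubic order and the index `i = 100` are arbitrary choices. It fits a polynomial to one window with `np.polyfit` and compares the value at the window centre with what `savgol_filter` returns for the same point.

```python
import numpy as np
from scipy.signal import savgol_filter

# synthetic noisy series (an assumption for illustration, not real prices)
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))

window, order = 11, 3      # arbitrary example parameters
half = window // 2
i = 100                    # an interior point, far away from both edges

# least squares fit of a cubic to the 11 points centred on i,
# using offsets -5..5 so that the centre of the window sits at 0
offsets = np.arange(-half, half + 1)
coeffs = np.polyfit(offsets, y[i - half:i + half + 1], order)
local_fit_value = np.polyval(coeffs, 0.0)

# the same point taken from the Savitzky-Golay filter output
filtered_value = savgol_filter(y, window, order)[i]

print(local_fit_value, filtered_value)
```

For interior points the two numbers should agree up to floating point error, which is what "polynomial fit over nearby data points" means in practice.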

To implement the Savitzky-Golay filter we will use the function provided by the SciPy library.

# __Savitzky-Golay smoothing__
# savgol_filter(input data, window size, polynomial order)
y_sg = signal.savgol_filter(y_data, 101, 8)

The inputs of the function are:

  • initial time series data
  • window size for smoothing (has to be an odd number)
  • order of the polynomial used for the partial data fits (has to be smaller than the window size)
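A minimal sketch of how the three inputs fit together, using a synthetic noisy sine wave instead of stock prices (the signal, the noise level and the seed are assumptions made only for this illustration). It also shows that SciPy rejects a polynomial order that is not smaller than the window size.

```python
import numpy as np
from scipy.signal import savgol_filter

# synthetic data: a slow sine wave plus white noise (illustrative assumption)
rng = np.random.default_rng(42)
t = np.arange(400)
clean = np.sin(2 * np.pi * t / 200)
noisy = clean + rng.normal(scale=0.5, size=t.size)

# savgol_filter(input data, window size, polynomial order)
smooth = savgol_filter(noisy, 51, 3)

def rms(values):
    """Root mean square of an array."""
    return float(np.sqrt(np.mean(values ** 2)))

raw_error = rms(noisy - clean)        # error of the unfiltered data
smooth_error = rms(smooth - clean)    # error after smoothing
print(raw_error, smooth_error)

# the polynomial order has to be smaller than the window size
try:
    savgol_filter(noisy, 5, 5)
    order_rejected = False
except ValueError:
    order_rejected = True
print(order_rejected)
```

The output has the same length as the input, so it can be plotted against the same x axis as the raw data.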

In this example we look at ways to smooth stock price data. The smoothed data can then be used for further processing, for example trend analysis.

Import libraries

In [70]:
#optional installations: 
#!pip install yfinance --upgrade --no-cache-dir
#!pip3 install pandas_datareader

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

# ___library_import_statements___
import pandas as pd

# for pandas_datareader, otherwise it might have issues, sometimes there is some version mismatch
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader.data as web
import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt
import datetime
import time

#newest yahoo API 
import yfinance as yahoo_finance

%matplotlib inline
pd.set_option('display.expand_frame_repr', False)

Select index/company on the stock market

In [71]:
# ___variables___
ticker = 'NGG'   
#ticker = 'SNP'    # SINOPEC China Petroleum and Chemical Corp 
#ticker = 'CHA'    # China Telecom
#ticker = 'CHL'    # China Mobile
#ticker = 'NGG'    # National Grid

start_time = datetime.datetime(2019, 1, 1)
#end_time = datetime.datetime(2019, 1, 20)
end_time = datetime.datetime.now().date().isoformat()         # today

Get the price data from yahoo API

In [72]:
# yahoo gives only daily historical data
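ticker_df = None
for attempt in range(1, 6):    # give up after 5 attempts instead of retrying forever
    try:
        ticker_df = web.get_data_yahoo(ticker, start=start_time, end=end_time)
        print('connected to yahoo')
        break
    except Exception as e:
        print("download attempt " + str(attempt) + " failed: " + str(e))
        time.sleep(5)

if ticker_df is None:
    raise RuntimeError('could not download data for ' + ticker)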

# use numerical integer index instead of date    
ticker_df = ticker_df.reset_index()
print(ticker_df.head(5))
connected to yahoo
        Date       High        Low       Open      Close     Volume  Adj Close
0 2019-01-02  48.849998  47.959999  48.279999  48.750000  1329800.0  45.964916
1 2019-01-03  49.299999  48.720001  48.900002  48.990002  1022300.0  46.191208
2 2019-01-04  49.790001  49.090000  49.150002  49.779999   670200.0  46.936069
3 2019-01-07  49.930000  49.500000  49.630001  49.770000   820100.0  46.926643
4 2019-01-08  50.320000  49.610001  49.720001  50.299999   776900.0  47.426365

Savitzky-Golay implementation with Scipy

In [73]:
# discrete dataset
x_data = ticker_df.index.tolist()      # the index will be our x axis, not date
y_data = ticker_df['Low']

# x values for the savgol filter (0, 1, ..., n-1, same as the integer index)
x = np.arange(len(ticker_df))

# __ Savitzky-Golay smoothing __
# savgol_filter(input data, window size, polynomial order)
#y_sg = signal.savgol_filter(y_data, 101, 8)
y_sg_1 = signal.savgol_filter(y_data, 51, 3)
y_sg_2 = signal.savgol_filter(y_data, 61, 4)
y_sg_3 = signal.savgol_filter(y_data, 71, 5)
y_sg_4 = signal.savgol_filter(y_data, 81, 6)

# ___ plotting ___
plt.figure(figsize=(15, 6), dpi= 120, facecolor='w', edgecolor='k')

# plot stock data
plt.plot(x_data, y_data, 'o', markersize=1.5, color='grey', alpha=0.7)

# plot savgol fit
plt.plot(x, y_sg_1, '-', markersize=1.0, alpha=0.5)
plt.plot(x, y_sg_2, '-', markersize=1.0, alpha=0.5)
plt.plot(x, y_sg_3, '-', markersize=1.0, alpha=0.5)
plt.plot(x, y_sg_4, '-', markersize=1.0, alpha=0.5)

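# y_data is the 'Low' column, so label it as such; one legend entry per fit
plt.legend([ticker + ' daily low', 'savgol (51, 3)', 'savgol (61, 4)', 'savgol (71, 5)', 'savgol (81, 6)'])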

plt.show()

Summary:

We see that the savgol fit works pretty nicely. It follows the price trend of the original input data and does not oscillate wildly the way higher-degree polynomial fits tend to. It is also not delayed the way moving averages are. Of all the price approximation techniques I have tried so far, the savgol filter works best (we just need to tweak the window and polynomial parameters a bit). It is also nice to see that the fit behaves consistently across a range of smoothing windows and polynomial degrees.
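The remark about delay can be illustrated with a small sketch on a noise-free straight line (a synthetic series chosen only because the lag is then easy to read off; the window of 21 points is arbitrary). A trailing moving average, which only looks at past values, runs behind the trend, while the Savitzky-Golay filter reproduces the line.

```python
import numpy as np
from scipy.signal import savgol_filter

window = 21                            # arbitrary example window
ramp = np.arange(200, dtype=float)     # noise-free trend with slope 1

# trailing moving average: mean of the current and the previous window-1 points;
# with mode='valid' the k-th output belongs to ramp[k + window - 1]
kernel = np.ones(window) / window
trailing_ma = np.convolve(ramp, kernel, mode='valid')
ma_lag = ramp[window - 1:] - trailing_ma   # how far the average trails the trend

# Savitzky-Golay filter with the same window and a cubic polynomial
sg = savgol_filter(ramp, window, 3)
sg_lag = ramp - sg

print(ma_lag.max(), np.abs(sg_lag).max())
```

On this ramp the trailing average is behind by (window - 1) / 2 samples, while the savgol output matches the line up to rounding error. One caveat: the filter uses points on both sides of each sample, so with the default `mode='interp'` the most recent values come from a polynomial fitted to the last window of data and can shift when new data points are appended.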