Here we will see how we can aggregate daily OHLC stock data into weekly time window.
Will be using pandas library to perform the resampling
df.resample('W').agg(agg_dict)
resample('W')
means we will be using Weekly time window for aggregationagg(agg_dict)
takes dictionary as a parameter, the dictionary says in which way we will aggregate column data (we can use mean, min, max and so on)Weekly resampled OHLC data can be used for further computation of indicators (such as weekly RSI or weekly moving averages).
Import the libraries
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
# ___library_import_statements___
import pandas as pd
# make pandas to print dataframes nicely
pd.set_option('expand_frame_repr', False)
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import datetime
#newest yahoo API
import yfinance as yahoo_finance
%matplotlib inline
Specify stock ticker and time range for historical data
# ___variables___
ticker = 'AAPL'
start_time = datetime.datetime(2017, 10, 1)
#end_time = datetime.datetime(2019, 1, 20)
end_time = datetime.datetime.now().date().isoformat() # today
Download the data from Yahoo! Finance
# yahoo gives only daily historical data
def get_data(ticker, start_time, end_time):
connected = False
while not connected:
try:
df = web.get_data_yahoo(ticker, start=start_time, end=end_time)
connected = True
print('connected to yahoo')
except Exception as e:
print("type error: " + str(e))
time.sleep( 5 )
pass
return df
# run the function
df = get_data(ticker, start_time, end_time)
print(df.head(7))
print(df.tail(7))
Resample using dictionary, each dataframe column will be aggregated differently. For example when we want lowest price during the weekly window, we will use 'Low': 'min'
. Aggregation of other columns has similar logic.
agg_dict = {'Open': 'first',
'High': 'max',
'Low': 'min',
'Close': 'last',
'Adj Close': 'last',
'Volume': 'mean'}
# resampled dataframe
# 'W' means weekly aggregation
r_df = df.resample('W').agg(agg_dict)
print(r_df.head(7))
print('---------')
print(r_df.tail(7))
def plot_OHLC(data, ticker, window):
#Run these commands to avoid Future warning for matplotlib
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
plt.figure(figsize=(15,5))
plt.title('{} price data in {} timeframe'.format(ticker, window))
plt.plot(data['Open'])
plt.plot(data['High'])
plt.plot(data['Low'])
plt.plot(data['Close'])
plt.plot(data['Adj Close'])
plt.legend()
plt.show()
def plot_volume(data, ticker, window):
plt.figure(figsize=(15,2))
plt.title('{} volume data in {} timeframe'.format(ticker, window))
plt.plot(data['Volume'])
plt.legend()
plt.show()
Plot price:
plot_OHLC(df, ticker, 'daily')
plot_OHLC(r_df, ticker, 'weekly')
Plot volume:
plot_volume(df, ticker, 'daily')
plot_volume(r_df, ticker, 'weekly')
We have transformed daily stock data to weekly time window. This transformation might be benefitial for writing stock screening scripts for swing trading on larger timeframes. The weekly window should get us focused on the bigger picture and get rid of great part of the noise in the data.
The only concern is that we are aggregating on weekly time window, but stock market trades only 5 days a week. Will need to check further, maybe we will need 5 day window. But so far it looks like Pandas is clever, since weekly aggregation ends on Sunday every time. If today is Tuesday and we pass today's data into weekly aggregation, resampling shows the value for upcoming Sunday, which is exactly what we wanted. So we should be fine.
Further reading:
https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/
https://medium.com/python-data/time-series-aggregation-techniques-with-python-a-look-at-major-cryptocurrencies-a9eb1dd49c1b
https://towardsdatascience.com/basic-time-series-manipulation-with-pandas-4432afee64ea
https://chrisalbon.com/python/data_wrangling/pandas_group_data_by_time/