In this article we will get stock price data from an API, temporarily store it in a pandas DataFrame, and then move it persistently into a database.

For the database we will choose sqlite3, which is part of the Python standard library. There is no need to install a database server or to keep a database daemon running on our system. Another advantage of sqlite3 is that it makes database backups very easy: the database is just a file, so it can be moved around and backed up like any other file.
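Since the database is a single file, a backup really is just a file copy. As a small illustration (a sketch, not part of the notebook below; it assumes the stock_database.db file we create later in this article and an example target name stock_backup.db), Python 3.7+ also ships a dedicated backup API that can copy a database even while other connections are using it:

import sqlite3

# copy the database into a second file using sqlite3's built-in backup API
# ('stock_backup.db' is just an example target name)
src = sqlite3.connect('stock_database.db')
dst = sqlite3.connect('stock_backup.db')
with dst:
    src.backup(dst)
dst.close()
src.close()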

Let's import the necessary packages.

In [7]:
#optional installations: 
#!pip install yfinance --upgrade --no-cache-dir
#!pip3 install pandas_datareader

import sqlite3
import pandas as pd

# pretty printing of pandas dataframe
pd.set_option('expand_frame_repr', False)   

# for pandas_datareader, sometimes there can be version mismatch
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader.data as web

import datetime
import time

#newest yahoo API 
import yfinance as yahoo_finance

Call the Yahoo API and store the data in a pandas DataFrame.

In [8]:
# ___variables___
ticker = 'AAPL'    # Apple stock   

start_time = datetime.datetime(2017, 10, 1)
#end_time = datetime.datetime(2019, 1, 20)
end_time = datetime.datetime.now().date().isoformat()         # today

# get daily stock data from yahoo API
connected = False
while not connected:
    try:
        df = web.get_data_yahoo(ticker, start=start_time, end=end_time)
        connected = True
        print('connected to yahoo')
    except Exception as e:
        print("type error: " + str(e))
        time.sleep(5)
    
# databases do not like column names that contain spaces 
df.rename(columns={"Adj Close": "Adj_Close"}, inplace=True)

# use numerical integer index instead of date   
df = df.reset_index()
print(df.head(5))
print(df.tail(5))
connected to yahoo
        Date        High         Low        Open       Close      Volume   Adj_Close
0 2017-10-02  154.449997  152.720001  154.259995  153.809998  18698800.0  148.757111
1 2017-10-03  155.089996  153.910004  154.009995  154.479996  16230300.0  149.405106
2 2017-10-04  153.860001  152.460007  153.630005  153.479996  20163800.0  148.437958
3 2017-10-05  155.440002  154.050003  154.179993  155.389999  21283800.0  150.285172
4 2017-10-06  155.490005  154.559998  154.970001  155.300003  17407600.0  150.198135
          Date        High         Low        Open       Close      Volume   Adj_Close
553 2019-12-12  272.559998  267.320007  267.779999  271.459991  34327600.0  271.459991
554 2019-12-13  275.299988  270.929993  271.459991  275.149994  33396900.0  275.149994
555 2019-12-16  280.790009  276.980011  277.000000  279.859985  32046500.0  279.859985
556 2019-12-17  281.769989  278.799988  279.570007  280.410004  28539600.0  280.410004
557 2019-12-18  280.480011  279.119995  279.799988  280.130005  10753703.0  280.130005
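The Yahoo endpoint in pandas_datareader has been flaky in the past. As a fallback (a sketch, not executed in this notebook; column names and adjusted prices may differ slightly), the same daily data can be fetched directly through the yfinance package we already imported:

# fallback download via yfinance (imported above as yahoo_finance)
df_yf = yahoo_finance.download(ticker, start=start_time, end=end_time)
print(df_yf.tail())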

The sqlite3.connect call connects to the database and creates it if it does not exist. The cur object will act like the actual cursor on the command line when we interact with the database manually. We will feed the SQL commands to cur (to be more precise, to cur.execute).

In [9]:
# ___database_part___
engine = sqlite3.connect('stock_database.db')
cur = engine.cursor()
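Any SQL statement can now be fed to cur.execute. As a quick illustration (a sketch, not part of the original notebook), we can ask SQLite for its version before creating any tables:

# sanity check: run an arbitrary SQL statement through the cursor
cur.execute('SELECT sqlite_version();')
print(cur.fetchone())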

Exporting the whole DataFrame into a database table is a one-line command. The con argument uses our engine, which connects to the database of our choice (in this case the stock_database.db database). In this database it creates a table called stock_table. If such a table already exists, it is replaced (other options are to append the data or to raise an error). index=True makes sure we keep the index (row numbers) that we set for the DataFrame in the previous step.

In [10]:
# data frame to database
df.to_sql('stock_table', con=engine, if_exists='replace', index=True)
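To verify that the export worked (an optional check, not in the original notebook), we can read the table straight back into a DataFrame with pandas and list the tables that now exist in the file:

# read a few rows back from the freshly created table
print(pd.read_sql('SELECT * FROM stock_table LIMIT 5;', con=engine))

# list all tables in the database file
print(pd.read_sql("SELECT name FROM sqlite_master WHERE type='table';", con=engine))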

Finally, we can run an SQL query from Python.

In [11]:
print('printing results of the SELECT statement: ')
print(' ')

select_statement = 'SELECT * FROM stock_table WHERE `Adj_Close` > 270.0;'
cur.execute(select_statement)
for line in cur.fetchall():
    print(line)
printing results of the SELECT statement: 
 
(549, '2019-12-06 00:00:00', 271.0, 267.29998779296875, 267.4800109863281, 270.7099914550781, 26518900.0, 270.7099914550781)
(552, '2019-12-11 00:00:00', 271.1000061035156, 268.5, 268.80999755859375, 270.7699890136719, 19689200.0, 270.7699890136719)
(553, '2019-12-12 00:00:00', 272.55999755859375, 267.32000732421875, 267.7799987792969, 271.4599914550781, 34327600.0, 271.4599914550781)
(554, '2019-12-13 00:00:00', 275.29998779296875, 270.92999267578125, 271.4599914550781, 275.1499938964844, 33396900.0, 275.1499938964844)
(555, '2019-12-16 00:00:00', 280.7900085449219, 276.9800109863281, 277.0, 279.8599853515625, 32046500.0, 279.8599853515625)
(556, '2019-12-17 00:00:00', 281.7699890136719, 278.79998779296875, 279.57000732421875, 280.4100036621094, 28539600.0, 280.4100036621094)
(557, '2019-12-18 00:00:00', 280.4800109863281, 279.1199951171875, 279.79998779296875, 280.1300048828125, 10753703.0, 280.1300048828125)
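If the threshold comes from a variable or user input, it is safer to let sqlite3 substitute the value itself via a parameterized query instead of building the SQL string by hand (a sketch, not part of the original notebook):

# same query with a ? placeholder; sqlite3 handles the quoting/escaping
threshold = 270.0
cur.execute('SELECT * FROM stock_table WHERE Adj_Close > ?;', (threshold,))
print(cur.fetchall()[:2])   # show only the first two matching rows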
In [ ]:
# closing the database session
cur.close()
engine.close()
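For completeness (a sketch, not part of the original notebook): the open/close bookkeeping can also be handled automatically with contextlib.closing, so the cursor and connection are closed even if a query raises an exception:

from contextlib import closing

with closing(sqlite3.connect('stock_database.db')) as engine:
    with closing(engine.cursor()) as cur:
        cur.execute('SELECT COUNT(*) FROM stock_table;')
        print(cur.fetchone())
# both the cursor and the connection are closed here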

We can also connect to the database manually via the terminal to see what is there and whether the output matches the Python query:

[Screenshot: sqlite3 command-line session querying stock_table in stock_database.db]

Well, there are some rounding errors/differences between the two outputs, but that is a topic for another article.