Lets have an sqlite3
database with stock price data called stock_database.db
. We would like to load the data do pandas dataframe for further processing. We will read data from table called stock_table
, this table contains Apple stock data (ticker AAPL).
import sqlite3
import pandas as pd
from pandas import DataFrame
# pretty printing of pandas dataframe
pd.set_option('expand_frame_repr', False)
Below part creates database connection to our database file and invokes database cursor so we can run SQL queries. Then the dataframe simply grabs the output of the SQL query via cur.fetchall
statement.
engine = sqlite3.connect('stock_database.db')
cur = engine.cursor()
cur.execute('SELECT * FROM stock_table')
df = DataFrame(cur.fetchall(), columns=['index','Date','High','Low','Open','Close','Volume','Adj_Close'])
When we load the data to the dataframe we can see that dataframe added its own default index to the table data. The database had its own indexing, so the data ends up having two independent indices. Should not be an issue though. If necessary we can always reindex to different column.
print(df.head())
# closing database session
cur.close()
engine.close()
Once the data is in the dataframe, it can be easily visualized using matplotlib
.
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(15,4))
plt.title('AAPL stock price data')
plt.plot(df['index'], df['Adj_Close'])
Each stock can have its own table in the database, we can then join these tables based on primary key for further processing. Also databases are invaluable for large datasets that do not fit to the memory. In such case we can load partial dataset to the dataframe and go from there.
For other dataframe storing options see: https://stackoverflow.com/questions/17098654/how-to-store-a-dataframe-using-pandas