Lets have an sqlite3 database with stock price data called stock_database.db. We would like to load the data do pandas dataframe for further processing. We will read data from table called stock_table, this table contains Apple stock data (ticker AAPL).

image.png

In [28]:
import sqlite3
import pandas as pd
from pandas import DataFrame
# pretty printing of pandas dataframe
pd.set_option('expand_frame_repr', False)  

Below part creates database connection to our database file and invokes database cursor so we can run SQL queries. Then the dataframe simply grabs the output of the SQL query via cur.fetchall statement.

In [29]:
engine = sqlite3.connect('stock_database.db')
cur = engine.cursor()

cur.execute('SELECT * FROM stock_table')
df = DataFrame(cur.fetchall(), columns=['index','Date','High','Low','Open','Close','Volume','Adj_Close'])   

When we load the data to the dataframe we can see that dataframe added its own default index to the table data. The database had its own indexing, so the data ends up having two independent indices. Should not be an issue though. If necessary we can always reindex to different column.

In [30]:
print(df.head())
   index                 Date        High         Low        Open       Close      Volume   Adj_Close
0      0  2017-10-02 00:00:00  154.449997  152.720001  154.259995  153.809998  18698800.0  148.757111
1      1  2017-10-03 00:00:00  155.089996  153.910004  154.009995  154.479996  16230300.0  149.405106
2      2  2017-10-04 00:00:00  153.860001  152.460007  153.630005  153.479996  20163800.0  148.437958
3      3  2017-10-05 00:00:00  155.440002  154.050003  154.179993  155.389999  21283800.0  150.285172
4      4  2017-10-06 00:00:00  155.490005  154.559998  154.970001  155.300003  17407600.0  150.198135
In [31]:
# closing database session
cur.close()
engine.close()

Once the data is in the dataframe, it can be easily visualized using matplotlib.

In [32]:
import matplotlib.pyplot as plt
%matplotlib inline
In [33]:
plt.figure(figsize=(15,4))
plt.title('AAPL stock price data')
plt.plot(df['index'], df['Adj_Close'])
Out[33]:
[<matplotlib.lines.Line2D at 0x7f490867add8>]

Each stock can have its own table in the database, we can then join these tables based on primary key for further processing. Also databases are invaluable for large datasets that do not fit to the memory. In such case we can load partial dataset to the dataframe and go from there.

For other dataframe storing options see: https://stackoverflow.com/questions/17098654/how-to-store-a-dataframe-using-pandas