Testing a machine learning algorithm for regression on a dataset populated with fake data.
We own an imaginary shop that sells goods online through two channels: a mobile app and a website. We are given a dataset with customer information, and based on it we want to decide whether it is better to invest in improving the website or the mobile app to drive sales.
In this section we inspect the data to get a feel for it; the next section deals with machine learning.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# load fake customer data
customers = pd.read_csv('Ecommerce Customers')
customers.head()
customers.describe()
customers.info()
Time spent on the website and the yearly amount spent show little correlation.
sns.jointplot(data=customers, x="Time on Website", y="Yearly Amount Spent", kind="hex")
Time spent on the app, on the other hand, correlates more strongly with the yearly amount spent: customers who spend more time on the app tend to spend more in our e-shop.
sns.jointplot(data=customers, x="Time on App", y="Yearly Amount Spent", kind="hex")
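The visual impression from the two joint plots can also be checked numerically. The following is a minimal sketch that computes the Pearson correlation of every numeric column with the yearly amount spent. The helper name `spend_correlations` and the small `demo` frame are assumptions made for illustration only; the numbers in `demo` are made up and are not taken from the customer file.

```python
import pandas as pd


def spend_correlations(df, target="Yearly Amount Spent"):
    """Pearson correlation of each numeric column with the target column.

    Hypothetical helper, not part of the original notebook.
    """
    # Keep numeric columns only, so text columns do not break corr()
    numeric = df.select_dtypes(include="number")
    # Take the target column of the correlation matrix and drop the
    # trivial correlation of the target with itself
    return numeric.corr()[target].drop(target).sort_values(ascending=False)


# Made-up values, only to show the call
demo = pd.DataFrame({
    "Time on App": [10.0, 11.0, 12.0, 13.0, 14.0],
    "Time on Website": [37.0, 35.0, 38.0, 36.0, 37.0],
    "Yearly Amount Spent": [400.0, 450.0, 500.0, 550.0, 600.0],
})
corr = spend_correlations(demo)
print(corr)
```

In the notebook itself the call would be `spend_correlations(customers)`, assuming the column names match those used in the plots above.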
Let's see if there are any other interesting correlations. The pair plot shows that length of membership is strongly correlated with the yearly amount spent.
sns.pairplot(customers)
sns.lmplot(data=customers, x="Length of Membership", y="Yearly Amount Spent")
The linear fit has a rather narrow confidence band, which suggests that a linear model describes this relationship well.
Code snippets are from Jose Portilla's Udemy course "Python for Data Science and Machine Learning Bootcamp": https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/
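To put a number on how well a straight line describes the relationship, one option is to fit the line directly and compute the coefficient of determination (R squared). This is a minimal sketch using `numpy.polyfit`, which is not necessarily the estimator seaborn uses internally. The helper name `linear_fit_r2` and the demo values are assumptions for illustration; the values are made up.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Least-squares line through (x, y); returns slope, intercept, R^2.

    Hypothetical helper, not part of the original notebook.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Degree-1 polynomial fit: coefficients come back highest power first
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)        # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up values, only to show the call
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [310.0, 370.0, 440.0, 495.0, 560.0]
slope, intercept, r2 = linear_fit_r2(x_demo, y_demo)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.4f}")
```

In the notebook the call would be `linear_fit_r2(customers["Length of Membership"], customers["Yearly Amount Spent"])`. An R squared close to 1 would support the reading of the plot; a noticeably lower value would mean that the narrow band reflects a well-estimated line rather than tightly clustered points.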