Case Study 1: Bike Sales Analysis Using Python
I downloaded the dataset from Kaggle because I wanted to test my Python skills. It consists of 18 columns and 113,036 rows. This analysis aims to answer the following questions:
- What is the age distribution of the customers?
- What is the most profitable year?
- What is the most profitable month?
- Which gender has the most orders?
- Which country/state generates the highest revenue?
- Is there any correlation between the customer’s age and revenue?
- Which category/subcategory generates the most profit?
Importing the libraries and dataset
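The import and loading step can be sketched as follows. This is a minimal sketch assuming pandas and a CSV named `Sales.csv` with a `Date` column; the filename and column names are assumptions about the Kaggle download, not details from this post. To keep the sketch runnable without the file, it reads a made-up two-row inline sample instead.

```python
import io

import pandas as pd


def load_sales(source) -> pd.DataFrame:
    """Read the bike-sales CSV from a path or a file-like object.

    parse_dates converts the (assumed) 'Date' column to datetime64.
    """
    return pd.read_csv(source, parse_dates=["Date"])


# Two made-up rows standing in for the Kaggle file; with the real download
# you would call load_sales("Sales.csv") instead (filename is an assumption).
sample = io.StringIO(
    "Date,Day,Month,Year,Customer_Age,Customer_Gender,Country,Revenue\n"
    "2015-12-01,1,December,2015,40,M,United States,1200\n"
    "2016-06-15,15,June,2016,28,F,Australia,800\n"
)
df = load_sales(sample)
```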
Understanding the data
After importing the dataset, it is important to understand the data you’re working with, i.e., the number of rows and columns and what each column contains.
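A quick first look usually comes down to a handful of pandas calls. This sketch uses a small made-up frame with assumed column names; on the real file, `df.shape` would report the 113,036 rows and 18 columns mentioned above.

```python
import pandas as pd

# Toy frame with a few (assumed) column names; replace with the real data.
df = pd.DataFrame(
    {
        "Year": [2015, 2016, 2015],
        "Customer_Age": [40, 28, 51],
        "Customer_Gender": ["M", "F", "M"],
        "Profit": [300, 150, 420],
    }
)

n_rows, n_cols = df.shape        # (rows, columns)
print(f"{n_rows} rows x {n_cols} columns")
print(df.columns.tolist())       # names of the columns to work with
print(df.dtypes)                 # data type of each column
df.info()                        # non-null counts and memory usage
print(df.head())                 # first five rows
```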
Data Cleaning
The data was already fairly clean, so I only made a few changes for readability: I replaced ‘M’ and ‘F’ in the gender column with ‘Male’ and ‘Female’, and removed the ‘Day’ and ‘Date’ columns since I wouldn’t be needing them.
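Both cleaning changes map onto one pandas call each. A minimal sketch, assuming the gender column is named `Customer_Gender` (an assumption; the post only calls it "the gender column") and using made-up rows:

```python
import pandas as pd

# Toy rows; 'Customer_Gender' is an assumed column name.
df = pd.DataFrame(
    {
        "Date": ["2015-12-01", "2016-06-15"],
        "Day": [1, 15],
        "Month": ["December", "June"],
        "Customer_Gender": ["M", "F"],
    }
)

# Map the single-letter codes to readable labels; unmapped values are left as-is.
df["Customer_Gender"] = df["Customer_Gender"].replace({"M": "Male", "F": "Female"})

# Drop the columns the rest of the analysis does not use.
df = df.drop(columns=["Day", "Date"])
```

`replace` is used rather than `map` because `map` would turn any value missing from the dictionary into `NaN`.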
Now our data is ready for analysis!
The first thing I looked at was the age distribution of the customers. The largest group was adults aged 35 to 64. I then plotted a pie chart to visualize the results.
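One way to get age groups like these is to bin the raw ages with `pd.cut` and count the rows in each bin. This sketch assumes a `Customer_Age` column and uses made-up ages; only the 35 to 64 "Adults" band comes from the post, and the other labels and boundaries are assumptions.

```python
import pandas as pd

# Made-up ages for illustration only; each value is one row (order).
ages = pd.Series([19, 30, 40, 50, 70], name="Customer_Age")

# Bins are right-inclusive: (0, 24], (24, 34], (34, 64], (64, inf).
age_groups = pd.cut(
    ages,
    bins=[0, 24, 34, 64, float("inf")],
    labels=["Youth (<25)", "Young Adults (25-34)", "Adults (35-64)", "Seniors (65+)"],
)

counts = age_groups.value_counts()                # rows in each age group
shares = age_groups.value_counts(normalize=True)  # same, as fractions of the total
largest_group = counts.idxmax()
```

From here, `shares.plot.pie(autopct="%1.1f%%")` would draw a pie chart (this needs matplotlib installed). If the file already contains an age-group column, calling `value_counts()` on it directly gives the same kind of table without binning.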
What is the most profitable year?
I grouped the data by year, summed the profit for each year, and sorted the results from highest to lowest.
The results show that the most profitable year was 2015.
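The group-sum-sort step described here can be sketched in a single chained expression. The `Year` and `Profit` column names are assumptions, and the rows are made up purely for illustration.

```python
import pandas as pd

# Toy orders; 'Year' and 'Profit' are assumed column names.
df = pd.DataFrame(
    {"Year": [2013, 2014, 2015, 2015, 2016], "Profit": [100, 250, 300, 200, 150]}
)

# Total profit per year, highest first.
profit_by_year = df.groupby("Year")["Profit"].sum().sort_values(ascending=False)
best_year = profit_by_year.index[0]
```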
What is the most profitable month?
I ran the same code, grouping by month instead of year, to find the most profitable month.
The chart shows that the most profitable month is December, followed by June. One likely factor is that people tend to give bikes as presents at Christmas.
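Grouping by month follows the same pattern as grouping by year. One wrinkle: month names sort alphabetically by default, so a chart in calendar order needs an explicit reindex. This sketch assumes a `Month` column holding full English month names and uses made-up profits.

```python
import calendar

import pandas as pd

# Toy orders; 'Month' and 'Profit' are assumed column names.
df = pd.DataFrame(
    {
        "Month": ["December", "June", "January", "December", "June"],
        "Profit": [500, 300, 100, 400, 200],
    }
)

monthly = df.groupby("Month")["Profit"].sum()

# Ranking view: most profitable month first.
ranked = monthly.sort_values(ascending=False)

# Calendar view for a chart: January..December, 0 for months with no orders.
# calendar.month_name is locale-dependent; index 0 is an empty string.
month_order = list(calendar.month_name)[1:]
by_calendar = monthly.reindex(month_order, fill_value=0)
```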
Which gender has the most orders?
The chart shows that about 52% of orders were made by men.
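A percentage split like this comes from `value_counts(normalize=True)`, assuming each row is one order. The values below are made up and do not reproduce the 52% figure.

```python
import pandas as pd

# Made-up cleaned gender labels, one per order (row).
genders = pd.Series(["Male", "Female", "Male", "Male", "Female"], name="Customer_Gender")

# Percentage of orders per gender, rounded to one decimal place.
share_pct = genders.value_counts(normalize=True).mul(100).round(1)
```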
Which country/state generates the highest revenue?
The chart shows that the highest revenue-generating country is the United States, followed by Australia.
Similarly, the highest revenue-generating state is California.
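The country and state rankings are the same group-and-sum step applied to two different columns. The column names `Country`, `State`, and `Revenue` are assumptions, and the rows are made up.

```python
import pandas as pd

# Toy orders; column names are assumptions.
df = pd.DataFrame(
    {
        "Country": ["United States", "Australia", "United States", "Germany"],
        "State": ["California", "New South Wales", "Washington", "Hessen"],
        "Revenue": [900, 700, 400, 300],
    }
)

# Total revenue per country, highest first.
revenue_by_country = df.groupby("Country")["Revenue"].sum().sort_values(ascending=False)

# Top 3 states by revenue; nlargest(n) gives the same rows as
# sort_values(ascending=False).head(n).
top_states = df.groupby("State")["Revenue"].sum().nlargest(3)
```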
Is there any correlation between the customer’s age and revenue?
For this, I first generated a correlation table from the numeric columns and then plotted it as a heat map to visualize the relationships between variables.
The correlation heat map shows no meaningful relationship between the customer’s age and revenue. Instead, the strong correlations are among unit cost, price, profit, and revenue.
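A correlation table like this can be built by selecting the numeric columns and calling `.corr()`, which computes Pearson coefficients by default. The column names and values below are made up so that age is uncorrelated with revenue while cost, profit, and revenue move together; they are not the real dataset.

```python
import pandas as pd

# Toy data: age pattern is uncorrelated with revenue; cost and profit scale with it.
df = pd.DataFrame(
    {
        "Customer_Age": [30, 50, 50, 30],
        "Cost": [60, 120, 180, 240],
        "Profit": [40, 80, 120, 160],
        "Revenue": [100, 200, 300, 400],
        "Country": ["US", "US", "AU", "AU"],  # non-numeric, excluded below
    }
)

# Pearson correlation matrix over numeric columns only.
corr = df.select_dtypes(include="number").corr()
```

The post doesn't say which library drew the heat map; seaborn's `sns.heatmap(corr, annot=True, cmap="coolwarm")` is one common way to plot this matrix.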
Which category/subcategory generates the most profit?
The results show that the most profitable category is bikes.
Finally, the most profitable subcategory is Road bikes.
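Both rankings can come from `groupby`: one key for the category, and two keys for the subcategory so that each subcategory stays tied to its parent category. The column names `Product_Category`, `Sub_Category`, and `Profit` are assumptions, and the figures are made up.

```python
import pandas as pd

# Toy orders; column names are assumptions.
df = pd.DataFrame(
    {
        "Product_Category": ["Bikes", "Bikes", "Accessories", "Clothing"],
        "Sub_Category": ["Road Bikes", "Mountain Bikes", "Helmets", "Jerseys"],
        "Profit": [900, 600, 300, 200],
    }
)

# Total profit per category, highest first.
profit_by_category = df.groupby("Product_Category")["Profit"].sum().sort_values(ascending=False)

# Total profit per (category, subcategory) pair, highest first.
profit_by_sub = (
    df.groupby(["Product_Category", "Sub_Category"])["Profit"]
    .sum()
    .sort_values(ascending=False)
)

top_category = profit_by_category.idxmax()
top_sub = profit_by_sub.idxmax()  # a (category, subcategory) tuple
```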
Summary
The dataset was an excellent avenue to practice my data analysis skills with Python. That said, I generally wouldn’t use Python to visualize data, as I prefer dedicated tools like Power BI and Tableau; I will be uploading projects built with those tools soon.