import pandas as pd
import plotly as plt
import plotly.graph_objects as go
import plotly.express as px
import kaleido
Data Visualization of seaborn-data¶
Analyzing mpg.csv data from seaborn-data dataset: https://github.com/mwaskom/seaborn-data/blob/master/mpg.csv
I am using Plotly and Streamlit for this task.
mpg_data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/refs/heads/master/mpg.csv')
# or read from data/mpg.csv
mpg_data.head(10)
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |
5 | 15.0 | 8 | 429.0 | 198.0 | 4341 | 10.0 | 70 | usa | ford galaxie 500 |
6 | 14.0 | 8 | 454.0 | 220.0 | 4354 | 9.0 | 70 | usa | chevrolet impala |
7 | 14.0 | 8 | 440.0 | 215.0 | 4312 | 8.5 | 70 | usa | plymouth fury iii |
8 | 14.0 | 8 | 455.0 | 225.0 | 4425 | 10.0 | 70 | usa | pontiac catalina |
9 | 15.0 | 8 | 390.0 | 190.0 | 3850 | 8.5 | 70 | usa | amc ambassador dpl |
Initial Observation¶
The car dataset has many models of cars produced across different years. The origin and engine performance metrics are listed out.
Reading and cleaning dataset¶
Check for missing values¶
mpg_data.isna().sum()
mpg             0
cylinders       0
displacement    0
horsepower      6
weight          0
acceleration    0
model_year      0
origin          0
name            0
dtype: int64
mpg_data[mpg_data.isna().any(axis=1)]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
32 | 25.0 | 4 | 98.0 | NaN | 2046 | 19.0 | 71 | usa | ford pinto |
126 | 21.0 | 6 | 200.0 | NaN | 2875 | 17.0 | 74 | usa | ford maverick |
330 | 40.9 | 4 | 85.0 | NaN | 1835 | 17.3 | 80 | europe | renault lecar deluxe |
336 | 23.6 | 4 | 140.0 | NaN | 2905 | 14.3 | 80 | usa | ford mustang cobra |
354 | 34.5 | 4 | 100.0 | NaN | 2320 | 15.8 | 81 | europe | renault 18i |
374 | 23.0 | 4 | 151.0 | NaN | 3035 | 20.5 | 82 | usa | amc concord dl |
mpg_data.dropna(inplace=True)
mpg_data['horsepower'] = pd.to_numeric(mpg_data['horsepower'])
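Since the check above shows gaps only in `horsepower`, the drop can be restricted to that column and the number of removed rows reported. The following is a minimal sketch under that assumption; the small frame `sample` is hypothetical and hand-copied from rows displayed in this notebook, not the full dataset.

```python
import numpy as np
import pandas as pd

# A few rows copied from the tables above: three complete, three missing horsepower.
sample = pd.DataFrame(
    {
        "mpg": [18.0, 15.0, 18.0, 25.0, 21.0, 40.9],
        "horsepower": [130.0, 165.0, 150.0, np.nan, np.nan, np.nan],
        "name": [
            "chevrolet chevelle malibu",
            "buick skylark 320",
            "plymouth satellite",
            "ford pinto",
            "ford maverick",
            "renault lecar deluxe",
        ],
    }
)

rows_before = len(sample)
# Drop only rows where horsepower is missing; NaNs in other columns would be kept.
cleaned = sample.dropna(subset=["horsepower"]).reset_index(drop=True)
rows_dropped = rows_before - len(cleaned)
print(f"dropped {rows_dropped} of {rows_before} rows")
```

Restricting the drop with `subset` keeps the cleaning step tied to the column that actually has gaps, so a NaN appearing later in an unrelated column would not silently remove rows.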
Summary of the Data¶
mpg_data.describe()
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year |
---|---|---|---|---|---|---|---|
count | 392.000000 | 392.000000 | 392.000000 | 392.000000 | 392.000000 | 392.000000 | 392.000000 |
mean | 23.445918 | 5.471939 | 194.411990 | 104.469388 | 2977.584184 | 15.541327 | 75.979592 |
std | 7.805007 | 1.705783 | 104.644004 | 38.491160 | 849.402560 | 2.758864 | 3.683737 |
min | 9.000000 | 3.000000 | 68.000000 | 46.000000 | 1613.000000 | 8.000000 | 70.000000 |
25% | 17.000000 | 4.000000 | 105.000000 | 75.000000 | 2225.250000 | 13.775000 | 73.000000 |
50% | 22.750000 | 4.000000 | 151.000000 | 93.500000 | 2803.500000 | 15.500000 | 76.000000 |
75% | 29.000000 | 8.000000 | 275.750000 | 126.000000 | 3614.750000 | 17.025000 | 79.000000 |
max | 46.600000 | 8.000000 | 455.000000 | 230.000000 | 5140.000000 | 24.800000 | 82.000000 |
mpg_data.loc[mpg_data['horsepower'] == mpg_data['horsepower'].max()]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
116 | 16.0 | 8 | 400.0 | 230.0 | 4278 | 9.5 | 73 | usa | pontiac grand prix |
mpg_data.loc[mpg_data['acceleration'] == mpg_data['acceleration'].max()]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
299 | 27.2 | 4 | 141.0 | 71.0 | 3190 | 24.8 | 79 | europe | peugeot 504 |
mpg_data.loc[mpg_data['mpg'] == mpg_data['mpg'].max()]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
322 | 46.6 | 4 | 86.0 | 65.0 | 2110 | 17.9 | 80 | japan | mazda glc |
mpg_data.loc[mpg_data['mpg'] == mpg_data['mpg'].min()]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
28 | 9.0 | 8 | 304.0 | 193.0 | 4732 | 18.5 | 70 | usa | hi 1200d |
mpg_data.loc[mpg_data['cylinders']==3]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
71 | 19.0 | 3 | 70.0 | 97.0 | 2330 | 13.5 | 72 | japan | mazda rx2 coupe |
111 | 18.0 | 3 | 70.0 | 90.0 | 2124 | 13.5 | 73 | japan | maxda rx3 |
243 | 21.5 | 3 | 80.0 | 110.0 | 2720 | 13.5 | 77 | japan | mazda rx-4 |
334 | 23.7 | 3 | 70.0 | 100.0 | 2420 | 12.5 | 80 | japan | mazda rx-7 gs |
mpg_data.loc[mpg_data['cylinders']==3].sort_values(by= 'horsepower',ascending=False).iloc[0]
mpg 21.5 cylinders 3 displacement 80.0 horsepower 110.0 weight 2720 acceleration 13.5 model_year 77 origin japan name mazda rx-4 Name: 243, dtype: object
mpg_data.loc[mpg_data['cylinders']==4].sort_values(by= 'horsepower',ascending=False).iloc[0]
mpg 25.0 cylinders 4 displacement 121.0 horsepower 115.0 weight 2671 acceleration 13.5 model_year 75 origin europe name saab 99le Name: 180, dtype: object
mpg_data.loc[mpg_data['cylinders']==5].sort_values(by= 'horsepower',ascending=False).iloc[0]
mpg 20.3 cylinders 5 displacement 131.0 horsepower 103.0 weight 2830 acceleration 15.9 model_year 78 origin europe name audi 5000 Name: 274, dtype: object
mpg_data.loc[mpg_data['cylinders']==6].sort_values(by= 'horsepower',ascending=False).iloc[0]
mpg 17.7 cylinders 6 displacement 231.0 horsepower 165.0 weight 3445 acceleration 13.4 model_year 78 origin usa name buick regal sport coupe (turbo) Name: 263, dtype: object
mpg_data.loc[mpg_data['cylinders']==8].sort_values(by= 'horsepower',ascending=False).iloc[0]
mpg 16.0 cylinders 8 displacement 400.0 horsepower 230.0 weight 4278 acceleration 9.5 model_year 73 origin usa name pontiac grand prix Name: 116, dtype: object
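The five near-identical queries above can be collapsed into one grouped lookup. This is a sketch, not the notebook's original approach: it uses `groupby(...).idxmax()` on a hypothetical frame `sample` built from rows shown in this notebook.

```python
import pandas as pd

# Rows copied from the outputs above; `sample` is a stand-in for mpg_data.
sample = pd.DataFrame(
    {
        "cylinders": [3, 3, 4, 6, 8, 8],
        "horsepower": [97.0, 110.0, 115.0, 165.0, 165.0, 230.0],
        "name": [
            "mazda rx2 coupe",
            "mazda rx-4",
            "saab 99le",
            "buick regal sport coupe (turbo)",
            "buick skylark 320",
            "pontiac grand prix",
        ],
    }
)

# idxmax gives, for each cylinder count, the index label of the highest-horsepower row.
top_idx = sample.groupby("cylinders")["horsepower"].idxmax()
most_powerful = sample.loc[top_idx].set_index("cylinders")
print(most_powerful)
```

One difference worth knowing: when two cars tie for the maximum, `idxmax` keeps the first occurrence, which may not be the same row that `sort_values(...).iloc[0]` returns.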
from plotly.subplots import make_subplots
from skimage import io
path = "../assets/"
images = ['1977_mazda_rx-4.jpg', '1980_mazda_rx-7.jpg', 'audi_5000.jpg',
'buick_regal_sport_coupe.jpg', 'harvester_intl_1200D.png',
'mazda_glc.png', 'pontiac_GP.jpg', 'pugeot_504.png', 'saab_99le.jpg']
names = ['Powerful 3 cylinder car','Newer gen 3 cylinder car','Powerful 5 cylinder car',
'Powerful 6 cylinder car','Least fuel efficient in the dataset',
'Most fuel efficient in the dataset','Highest Horsepower in the dataset',
'Highest acceleration in the dataset','Powerful 4 cylinder car'
]
fig = make_subplots(rows=5, cols=2)
subplots = [(1,1),(1,2),(2,1),(2,2),(3,1),(3,2),(4,1),(4,2),(5,1),(5,2)]
for i in range(len(images)):
    row, cols = subplots[i]
    img = io.imread(path + images[i])
    fig.add_trace(go.Image(z=img), row=row, col=cols)
    fig.update_xaxes(title_text=f"{names[i]} : {images[i].split('.')[0]}", row=row, col=cols, showticklabels=False)
fig.update_yaxes(showticklabels=False)
# Layout
fig.update_layout(title_text="Cars of different Categories", height=int(1920*0.75), width=int(1440*0.75))
fig.write_image("images/plt_1.png")
Univariate Analysis¶
Graphs and Visualizations¶
fig = px.histogram(data_frame=mpg_data, x='cylinders')
fig.write_image("images/plt_2.png")
This chart shows that most cars have 4, 6 or 8 cylinders. Even cylinder counts may offer better manufacturing cost, efficiency or performance, which would explain why manufacturers choose them.
A quick search gives us a good overview: why-arent-there-seven-cylinder-engines. It turns out that engines with odd cylinder counts are harder to balance and vibrate more, making even configurations preferable for balance and smoothness.
We can also observe that 4 cylinder cars are the most common in our data. One hypothesis is that 4 cylinder cars are more fuel efficient. Alternatively, 6 and 8 cylinder cars could be the more fuel efficient ones but cost more, and hence be less sought after by average buyers.
One final consideration is that the data collection might be uneven, giving us this particular distribution of cars and making both of our hypotheses invalid.
fig = px.box(data_frame=mpg_data.sort_values(by='cylinders'), y=['horsepower'], facet_col='cylinders')
fig.write_image("images/plt_3.png")
fig = px.box(data_frame=mpg_data.sort_values(by='cylinders'), y=['mpg'], facet_col='cylinders')
fig.write_image("images/plt_4.png")
Inspecting Outliers¶
mpg_data[mpg_data.horsepower == 165 ]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
38 | 14.0 | 8 | 350.0 | 165.0 | 4209 | 12.0 | 71 | usa | chevrolet impala |
62 | 13.0 | 8 | 350.0 | 165.0 | 4274 | 12.0 | 72 | usa | chevrolet impala |
263 | 17.7 | 6 | 231.0 | 165.0 | 3445 | 13.4 | 78 | usa | buick regal sport coupe (turbo) |
mpg_data[mpg_data.horsepower == 230 ]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
116 | 16.0 | 8 | 400.0 | 230.0 | 4278 | 9.5 | 73 | usa | pontiac grand prix |
mpg_data[(mpg_data.mpg == 38) & (mpg_data.cylinders == 6)]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
387 | 38.0 | 6 | 262.0 | 85.0 | 3015 | 17.0 | 82 | usa | oldsmobile cutlass ciera (diesel) |
mpg_data[(mpg_data.mpg == 26.6) & (mpg_data.cylinders == 8)]
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
---|---|---|---|---|---|---|---|---|---|
364 | 26.6 | 8 | 350.0 | 105.0 | 3725 | 19.0 | 81 | usa | oldsmobile cutlass ls |
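The lookups above hard-code values read off the box plots. As a sketch of how such points could be found programmatically, the helper below applies the usual 1.5 × IQR fence within each group. The function name `iqr_outliers` and the toy numbers are assumptions for illustration, and Plotly may compute quartiles with a slightly different method, so the flagged points can differ from what the plots show.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, column: str, group: str) -> pd.DataFrame:
    """Return rows whose `column` value lies outside 1.5 * IQR within its `group`."""
    grouped = df.groupby(group)[column]
    # transform broadcasts each group's quartile back onto that group's rows.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    outside = (df[column] < q1 - 1.5 * iqr) | (df[column] > q3 + 1.5 * iqr)
    return df[outside]


# Toy values, not taken from the dataset: one obviously high value per group.
toy = pd.DataFrame(
    {
        "cylinders": [6, 6, 6, 6, 6, 8, 8, 8, 8, 8],
        "horsepower": [95.0, 100.0, 105.0, 110.0, 165.0, 140.0, 150.0, 150.0, 160.0, 230.0],
    }
)
flagged = iqr_outliers(toy, "horsepower", "cylinders")
print(flagged)
```

On the real data the call would look like `iqr_outliers(mpg_data, "horsepower", "cylinders")`, and the same helper works for `mpg`.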
Observations¶
From these two graphs we can draw some insights:
- 4 cylinder cars have lower horsepower but are more fuel efficient.
- 6 cylinder cars seem to balance horsepower and fuel consumption.
- In contrast, 8 cylinder cars have greater horsepower but are quite fuel hungry in general.
All this seems to be in line with the hypothesis that the 3 and 5 cylinder engines are not very performant, giving relatively low horsepower for only average fuel efficiency. The 4 cylinder cars seem to have a good balance between horsepower and miles per gallon.
Spread and Outliers¶
The 8 cylinder cars show a wide spread in both mpg and horsepower, meaning the group contains regular cars as well as muscle cars.
There are some outliers in both the mpg and horsepower graphs.
Muscle Cars¶
The 6 cylinder buick regal sport coupe (turbo) is a muscle car with very high horsepower for its class, in line with an average 8 cylinder car. Similarly, the 230 hp pontiac grand prix is an 8 cylinder beast of a muscle car, which explains its very high horsepower.
Efficient Cars¶
Certain economy or diesel versions are exceptionally fuel efficient for their cylinder count, like the 6 cylinder oldsmobile cutlass ciera (diesel) or the 8 cylinder oldsmobile cutlass ls.
To get a more complete picture, we need further analysis. Let us gather more insights about engine performance with respect to number of cylinders, displacement, weight and country of origin.
fig = px.histogram(data_frame=mpg_data, y = 'origin', color='origin')
fig.write_image("images/plt_5.png")
Most cars in our dataset are from the USA, and there are only two other origins: Japan and Europe. We can now analyze the cars categorically by origin. Let us see how the cars of each origin differ.
fig = px.histogram(data_frame=mpg_data, x = 'model_year', facet_col='origin')
fig.write_image("images/plt_6.png")
fig = px.box(data_frame=mpg_data, y = ['mpg', 'acceleration'], facet_col='origin')
fig.write_image("images/plt_7.png")
fig = px.box(data_frame=mpg_data, y = 'horsepower', facet_col='origin')
fig.write_image("images/plt_8.png")
mpg_data.groupby('origin').agg(mean_horsepower = ('horsepower','mean')).reset_index()
| | origin | mean_horsepower |
---|---|---|
0 | europe | 80.558824 |
1 | japan | 79.835443 |
2 | usa | 119.048980 |
The box plots show that Japanese cars are the most fuel efficient, while the USA has, on average, more powerful cars. Also worth noting is that the USA has a higher upper fence for horsepower, meaning that there are very powerful variants (muscle cars) that are lacking in the other origins.
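The table above only reports mean horsepower, while the fuel efficiency comparison is read from the box plots. A single aggregation can put both side by side. This is a sketch on a hypothetical nine-row frame `sample` copied from rows displayed in this notebook, so its numbers are not representative of the full dataset.

```python
import pandas as pd

# Three cars per origin, copied from earlier outputs in this notebook.
sample = pd.DataFrame(
    {
        "origin": ["usa", "usa", "usa", "japan", "japan", "japan", "europe", "europe", "europe"],
        "mpg": [18.0, 15.0, 16.0, 46.6, 21.5, 19.0, 27.2, 25.0, 20.3],
        "horsepower": [130.0, 165.0, 230.0, 65.0, 110.0, 97.0, 71.0, 115.0, 103.0],
    }
)

# Named aggregation: one output column per (input column, function) pair.
summary = sample.groupby("origin").agg(
    mean_mpg=("mpg", "mean"),
    median_mpg=("mpg", "median"),
    mean_horsepower=("horsepower", "mean"),
    cars=("mpg", "size"),
)
print(summary)
```

Including the group size next to the means makes it easier to judge how much weight each origin's average deserves.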
Bivariate analysis¶
fig = px.scatter(data_frame=mpg_data, x='cylinders', y = 'origin', color= 'origin')
fig.write_image("images/plt_9.png")
In the dataset, US cars have higher cylinder counts, while Japanese and European cars have fewer cylinders but also experiment with 3 and 5 cylinder engines. This might be due to racing being more popular and prevalent in the US, or due to better road networks in the US and a consumer culture demanding more powerful cars.
#px.box(data_frame=mpg_data, x='mpg')
fig = px.scatter(data_frame=mpg_data, x='model_year', y='mpg',facet_col='origin')
fig.write_image("images/plt_10.png")
avg_group = pd.DataFrame()
avg_group = mpg_data.groupby(by='model_year').agg(average_mpg = ('mpg', 'mean'))
avg_group.reset_index()
| | model_year | average_mpg |
---|---|---|
0 | 70 | 17.689655 |
1 | 71 | 21.111111 |
2 | 72 | 18.714286 |
3 | 73 | 17.100000 |
4 | 74 | 22.769231 |
5 | 75 | 20.266667 |
6 | 76 | 21.573529 |
7 | 77 | 23.375000 |
8 | 78 | 24.061111 |
9 | 79 | 25.093103 |
10 | 80 | 33.803704 |
11 | 81 | 30.185714 |
12 | 82 | 32.000000 |
A trend of mpg getting better across the years can be seen in all the countries.
fig = px.line(data_frame=avg_group, y='average_mpg')
fig.write_image("images/plt_11.png")
Averaging the mpg of the models released each year across all countries shows an overall increase in fuel efficiency over the years, though not a monotonic one: the average drops in model years 72 and 73 and again in 81.
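The per-origin trend mentioned above is read from scatter plots, while the yearly average pools all origins. One way to check each origin separately is to group by both keys and unstack. The sketch below assumes a hypothetical frame `sample` with a handful of rows copied from this notebook, so most year and origin combinations are empty.

```python
import pandas as pd

# Rows copied from earlier outputs; far too few to show a real trend.
sample = pd.DataFrame(
    {
        "model_year": [70, 70, 81, 82, 72, 80, 80, 75, 79],
        "origin": ["usa", "usa", "usa", "usa", "japan", "japan", "japan", "europe", "europe"],
        "mpg": [18.0, 15.0, 26.6, 38.0, 19.0, 46.6, 23.7, 25.0, 27.2],
    }
)

# Mean mpg per (year, origin), reshaped so each origin becomes a column.
yearly = sample.groupby(["model_year", "origin"])["mpg"].mean().unstack("origin")
print(yearly)
```

With the full data, each column of `yearly` could be drawn as its own line to compare the origins year by year.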
Company names¶
Extract the company name from each car name to see how cars are distributed across companies.
def get_first_name(x):
    full_name = x.split(' ')
    company = full_name[0]
    return company
mpg_data['company'] = mpg_data['name'].apply(get_first_name)
company_hist = px.histogram(data_frame= mpg_data, x='company')
company_hist.write_image("images/plt_12.png")
country_df= mpg_data.groupby('company').agg(mean_mpg = ('mpg','mean'))
country_df.reset_index().sort_values(by='mean_mpg', ascending=False).head()
| | company | mean_mpg |
---|---|---|
36 | vw | 39.016667 |
21 | nissan | 36.000000 |
32 | triumph | 35.000000 |
15 | honda | 33.761538 |
10 | datsun | 31.113043 |
country_df= mpg_data.groupby('company').agg(mean_horsepower = ('horsepower','mean'))
country_df.reset_index().sort_values(by='mean_horsepower', ascending=False).head()
| | company | mean_horsepower |
---|---|---|
14 | hi | 193.000000 |
9 | chrysler | 153.666667 |
4 | cadillac | 152.500000 |
8 | chevy | 142.333333 |
26 | pontiac | 136.937500 |
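Taking the first word of the name leaves spelling variants as separate companies: the outputs in this notebook show `vw`, `chevy` and `maxda` (in `maxda rx3`). A small alias map can fold these together before grouping. This is a sketch; `COMPANY_ALIASES` and `get_company` are hypothetical names, and the map is deliberately partial, covering only spellings visible above.

```python
import pandas as pd

# Partial, illustrative alias map; extend it after inspecting the unique company values.
COMPANY_ALIASES = {
    "maxda": "mazda",
    "chevy": "chevrolet",
    "vw": "volkswagen",
}


def get_company(name: str) -> str:
    """Return the first word of a car name, mapped through the alias table."""
    first = name.split(" ")[0]
    return COMPANY_ALIASES.get(first, first)


# Names copied from tables in this notebook.
names = pd.Series(["maxda rx3", "mazda rx-4", "chevrolet impala", "pontiac grand prix"])
companies = names.apply(get_company)
print(companies.value_counts())
```

Without such a step, per-company averages like the ones above are split across aliases, which can push a small alias group to the top of a ranking.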
Bivariate Analysis of Numerical Data¶
To gain proper insights and explore the data, I wrote code to plot every numerical column against every other.
from plotly.subplots import make_subplots
col_names = ['mpg','cylinders','displacement','horsepower','weight','acceleration','model_year']
graph_names = [f'{col_names[0].upper()} vs {x.upper()}' for x in col_names[1:]]
print(graph_names)
fig = make_subplots(rows=3,cols=2)
subplots = [(1,1),(1,2),(2,1),(2,2),(3,1),(3,2)]
for i in range(1,len(col_names)):
    row,col = subplots[i-1]
    fig.add_trace(
        go.Scatter(x=mpg_data[col_names[0]], y=mpg_data[col_names[i]], mode='markers',name=graph_names[i-1]),
        row=row, col=col
    )
    fig.update_xaxes(title_text= col_names[0],row=row,col=col)
    fig.update_yaxes(title_text= col_names[i],row=row,col=col)
fig.update_layout(height=720, width=1080, title_text=f"{col_names[0]} vs others")
fig.write_image("images/plt_13.png")
['MPG vs CYLINDERS', 'MPG vs DISPLACEMENT', 'MPG vs HORSEPOWER', 'MPG vs WEIGHT', 'MPG vs ACCELERATION', 'MPG vs MODEL_YEAR']
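The `subplots` list of `(row, col)` pairs is written out by hand and has to be edited whenever the number of columns changes. As a sketch, the positions can be derived from the plot count instead. `grid_positions` is a hypothetical helper, not part of Plotly.

```python
import math
from typing import List, Tuple


def grid_positions(n_plots: int, n_cols: int = 2) -> List[Tuple[int, int]]:
    """Return 1-based (row, col) pairs that fill a grid row by row."""
    return [(i // n_cols + 1, i % n_cols + 1) for i in range(n_plots)]


col_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year']
others = col_names[1:]

n_cols = 2
n_rows = math.ceil(len(others) / n_cols)  # rows needed for make_subplots
positions = grid_positions(len(others), n_cols)
print(n_rows, positions)
```

The resulting `n_rows` and `positions` could be passed to `make_subplots(rows=n_rows, cols=n_cols)` and used in place of the hand-written list in the loop.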
Observations: mpg vs others¶
From these graphs, we can make a number of observations.
- More cylinders go with lower fuel efficiency.
- Higher horsepower, displacement and weight go with lower fuel efficiency.
- Acceleration and mpg do not have an obvious relationship because the plot is highly scattered. We do see a weak positive trend, which is quite counterintuitive.
- As the years progress, cars are getting more fuel efficient.
from plotly.subplots import make_subplots
col_names = ['cylinders','displacement','horsepower','weight','acceleration','model_year']
graph_names = [f'{col_names[0].upper()} vs {x.upper()}' for x in col_names[1:]]
print(graph_names)
fig = make_subplots(rows=3,cols=2)
subplots = [(1,1),(1,2),(2,1),(2,2),(3,1),(3,2)]
for i in range(1,len(col_names)):
    row,col = subplots[i-1]
    fig.add_trace(
        go.Scatter(y=mpg_data[col_names[0]], x=mpg_data[col_names[i]], mode='markers',name=graph_names[i-1]),
        row=row, col=col
    )
    fig.update_yaxes(title_text= col_names[0],row=row,col=col)
    fig.update_xaxes(title_text= col_names[i],row=row,col=col)
fig.update_layout(height=720, width=1080, title_text=f"{col_names[0]} vs others")
fig.write_image("images/plt_14.png")
['CYLINDERS vs DISPLACEMENT', 'CYLINDERS vs HORSEPOWER', 'CYLINDERS vs WEIGHT', 'CYLINDERS vs ACCELERATION', 'CYLINDERS vs MODEL_YEAR']
Observations: Cylinders vs others¶
We can see that more cylinders mean higher horsepower, more displacement and a heavier car.
There is no clear relationship between the model year and the number of cylinders, indicating that cars with varying engine types were produced across all years.
Looking at the 4 cylinder engines, we observe that they give great fuel efficiency in light cars but with lower horsepower. This could also mean they are cheaper both to manufacture and to run, which could explain why more cars with 4 cylinder engines are produced.
We cannot, however, say for sure that these 4 cylinder cars are the most popular or most bought cars. For that, we would need sales data for these models.
6 cylinder engines have a good balance of efficiency and horsepower. 8 cylinder engines pack a punch with higher horsepower but guzzle a lot of fuel, and they are some of the most powerful engines in the data. This makes sense, as more cylinders give more displacement and burn more fuel, producing more power at the cost of fuel efficiency. At the same time, bigger and heavier engines are needed to accommodate more cylinders.
from plotly.subplots import make_subplots
col_names = ['displacement','horsepower','weight','acceleration','model_year']
graph_names = [f'{col_names[0].upper()} vs {x.upper()}' for x in col_names[1:]]
print(graph_names)
fig = make_subplots(rows=2,cols=2)
subplots = [(1,1),(1,2),(2,1),(2,2)]
for i in range(1,len(col_names)):
    row,col = subplots[i-1]
    fig.add_trace(
        go.Scatter(x=mpg_data[col_names[0]], y=mpg_data[col_names[i]], mode='markers',name=graph_names[i-1]),
        row=row, col=col
    )
    fig.update_xaxes(title_text= col_names[0],row=row,col=col)
    fig.update_yaxes(title_text= col_names[i],row=row,col=col)
fig.update_layout(height=720, width=1080, title_text=f"{col_names[0]} vs others")
fig.write_image("images/plt_15.png")
['DISPLACEMENT vs HORSEPOWER', 'DISPLACEMENT vs WEIGHT', 'DISPLACEMENT vs ACCELERATION', 'DISPLACEMENT vs MODEL_YEAR']
from plotly.subplots import make_subplots
col_names = ['horsepower','weight','acceleration','model_year']
graph_names = [f'{col_names[0].upper()} vs {x.upper()}' for x in col_names[1:]]
print(graph_names)
fig = make_subplots(rows=2,cols=2)
subplots = [(1,1),(1,2),(2,1),(2,2)]
for i in range(1,len(col_names)):
    row,col = subplots[i-1]
    fig.add_trace(
        go.Scatter(x=mpg_data[col_names[0]], y=mpg_data[col_names[i]], mode='markers',name=graph_names[i-1]),
        row=row, col=col
    )
    fig.update_xaxes(title_text= col_names[0],row=row,col=col)
    fig.update_yaxes(title_text= col_names[i],row=row,col=col)
fig.update_layout(height=720, width=1080, title_text=f"{col_names[0]} vs others")
fig.write_image("images/plt_16.png")
['HORSEPOWER vs WEIGHT', 'HORSEPOWER vs ACCELERATION', 'HORSEPOWER vs MODEL_YEAR']
Observation: horsepower vs others & displacement vs others¶
We can see the relationships between weight, displacement and horsepower, i.e.
more displacement → more horsepower
more displacement → more weight
meaning that higher horsepower cars are heavier, as we observed before.
The relationship between acceleration and displacement, and between acceleration and horsepower, is negative though. This is rather counterintuitive, as one might expect cars with higher horsepower or displacement to provide higher acceleration as well. But that is not the case at all.
We can also see that newer models have lower horsepower.
To make more sense of this, we can create a correlation heatmap and perform further multivariate analysis.
CORRELATION HEATMAP¶
correlation = mpg_data.select_dtypes('number').corr('pearson')
correlation
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year |
---|---|---|---|---|---|---|---|
mpg | 1.000000 | -0.777618 | -0.805127 | -0.778427 | -0.832244 | 0.423329 | 0.580541 |
cylinders | -0.777618 | 1.000000 | 0.950823 | 0.842983 | 0.897527 | -0.504683 | -0.345647 |
displacement | -0.805127 | 0.950823 | 1.000000 | 0.897257 | 0.932994 | -0.543800 | -0.369855 |
horsepower | -0.778427 | 0.842983 | 0.897257 | 1.000000 | 0.864538 | -0.689196 | -0.416361 |
weight | -0.832244 | 0.897527 | 0.932994 | 0.864538 | 1.000000 | -0.416839 | -0.309120 |
acceleration | 0.423329 | -0.504683 | -0.543800 | -0.689196 | -0.416839 | 1.000000 | 0.290316 |
model_year | 0.580541 | -0.345647 | -0.369855 | -0.416361 | -0.309120 | 0.290316 | 1.000000 |
fig = px.imshow(correlation, text_auto=True, color_continuous_scale='thermal', aspect='auto')
fig.write_image("images/plt_17.png")
The correlation heatmap confirms the relationship between cylinders, displacement, horsepower and weight. We also see that these affect mpg negatively, as expected.
We can also see a moderate positive correlation (0.58) between mpg and model year, meaning newer models tend to be more fuel efficient.
But acceleration is negatively correlated with cylinders, displacement, horsepower and weight, which is still unexplained. What's more, there is a weakly positive correlation (0.42) between acceleration and mpg, which would mean that higher acceleration gives better mileage. One possible explanation lies in the unit: this column is commonly documented as the time in seconds to reach 60 mph, in which case a larger value means a slower car and these signs are what one would expect.
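Reading individual cells off a heatmap gets tedious, so it can help to rank the correlations with `mpg` by strength. The sketch below does this on a hypothetical eight-row frame `sample` copied from rows shown in this notebook; its coefficients will not match the full-data table above.

```python
import pandas as pd

# Eight cars copied from earlier outputs (numeric columns only).
sample = pd.DataFrame(
    {
        "mpg": [18.0, 16.0, 27.2, 46.6, 9.0, 25.0, 20.3, 17.7],
        "cylinders": [8, 8, 4, 4, 8, 4, 5, 6],
        "displacement": [307.0, 400.0, 141.0, 86.0, 304.0, 121.0, 131.0, 231.0],
        "horsepower": [130.0, 230.0, 71.0, 65.0, 193.0, 115.0, 103.0, 165.0],
        "weight": [3504, 4278, 3190, 2110, 4732, 2671, 2830, 3445],
        "acceleration": [12.0, 9.5, 24.8, 17.9, 18.5, 13.5, 15.9, 13.4],
    }
)

correlation = sample.corr(method="pearson")
# Keep the mpg column, drop its self-correlation, then order by absolute strength.
with_mpg = correlation["mpg"].drop("mpg")
order = with_mpg.abs().sort_values(ascending=False).index
ranked = with_mpg.reindex(order)
print(ranked)
```

Sorting by absolute value keeps strong negative relationships at the top instead of burying them at the bottom of a plain sort.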
fig= px.scatter(data_frame=mpg_data, x='horsepower', y='mpg', color= 'model_year')
fig.write_image("images/plt_18.png")
From this plot we can confirm that newer models are more efficient than the older ones. However, we can see that the newer models also have lower horsepower.
fig= px.scatter(data_frame=mpg_data, x='weight', y='horsepower', color= 'model_year')
fig.write_image("images/plt_19.png")
This graph shows that the heaviest models are all older models, which also have a lot of horsepower. The newer models are not only fuel efficient but also lighter, with lower power output.
This raises the question of why companies would build cars with low power and weight but better fuel efficiency.
fig= px.scatter(data_frame=mpg_data, x='horsepower', y='acceleration', color= 'model_year')
fig.write_image("images/plt_20.png")
This tells us that almost all new models have low horsepower but high acceleration, although some older models also had low horsepower and high acceleration.
fig = px.scatter(data_frame=mpg_data, x='weight', y='acceleration', color= 'horsepower', color_continuous_scale='temps')
fig.write_image("images/plt_21.png")
Higher horsepower cars are almost always rather heavy with lower acceleration.
cyl = mpg_data.groupby('model_year').aggregate(avg_no_of_cylinders = ('cylinders','mean'), avg_mpg=('mpg','mean') ).reset_index()
fig = px.scatter(data_frame=cyl, x='model_year', y='avg_no_of_cylinders', color='avg_mpg')
fig.write_image("images/plt_22.png")
Newer cars have fewer cylinders on average.
These final graphs give us answers to a few questions we were asking.
Conclusions¶
Why would acceleration increase when horsepower is decreasing?¶
Newer cars have higher acceleration despite lower horsepower. Along with good acceleration, these cars are also fuel efficient.
It turns out that acceleration is increasing because cars are getting lighter, requiring less power to accelerate.
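If weight is what links horsepower to acceleration, then power relative to weight should track the acceleration column more directly than raw horsepower does. The sketch below derives a `power_to_weight` column, which is not in the original data, on a hypothetical eight-row frame `sample` copied from this notebook. With so few rows the coefficient is only indicative.

```python
import pandas as pd

# Eight cars copied from earlier outputs in this notebook.
sample = pd.DataFrame(
    {
        "horsepower": [130.0, 230.0, 71.0, 65.0, 193.0, 115.0, 103.0, 165.0],
        "weight": [3504, 4278, 3190, 2110, 4732, 2671, 2830, 3445],
        "acceleration": [12.0, 9.5, 24.8, 17.9, 18.5, 13.5, 15.9, 13.4],
    }
)

# Derived feature (an assumption of this sketch): horsepower per unit of weight.
sample["power_to_weight"] = sample["horsepower"] / sample["weight"]

# Pearson correlation between the derived ratio and the acceleration column.
r = sample["power_to_weight"].corr(sample["acceleration"])
print(f"power_to_weight vs acceleration: r = {r:.2f}")
```

Running the same two lines on `mpg_data` and plotting `power_to_weight` against `acceleration`, colored by `model_year`, would show whether the weight explanation holds across the whole dataset.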
Why did the companies start making lower power, fuel efficient cars?¶
The data makes it clear that cars do not need raw horsepower alone. They can be fast and fuel efficient even with lower horsepower. With fuel emissions being a major concern and with major technological breakthroughs, it is completely sensible that companies chose to make their cars this way. What is more, the cars could be cheaper to build with fewer materials being used, and less expensive for customers to operate, all while being better for the environment.
There might be several factors that the data does not capture. Newer cars would have better technology, such as lighter materials, better aerodynamics, better fuel composition and better engineering. But car companies did make lighter, fuel efficient cars earlier too; what changed is that they chose to keep making newer cars that were lower powered and less fuel hungry.
Why are there so many 4 cylinder cars?¶
This falls in line with the fact that newer models have, on average, fewer cylinders. It also helps explain why there are so many 4 cylinder cars: they are the lightest and most fuel efficient, with better stability, lower vibration and possibly lower cost.