In [96]:

import pandas as pd
import plotly as plt
import plotly.graph_objects as go
import plotly.express as px
import kaleido

Data Visualization of seaborn-data¶

Analyzing mpg.csv data from seaborn-data dataset: https://github.com/mwaskom/seaborn-data/blob/master/mpg.csv

I am using Plotly and Streamlit for this task.

In [97]:

mpg_data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/refs/heads/master/mpg.csv')
# or read from data/mpg.csv
mpg_data.head(10)

Out[97]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
0	18.0	8	307.0	130.0	3504	12.0	70	usa	chevrolet chevelle malibu
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320
2	18.0	8	318.0	150.0	3436	11.0	70	usa	plymouth satellite
3	16.0	8	304.0	150.0	3433	12.0	70	usa	amc rebel sst
4	17.0	8	302.0	140.0	3449	10.5	70	usa	ford torino
5	15.0	8	429.0	198.0	4341	10.0	70	usa	ford galaxie 500
6	14.0	8	454.0	220.0	4354	9.0	70	usa	chevrolet impala
7	14.0	8	440.0	215.0	4312	8.5	70	usa	plymouth fury iii
8	14.0	8	455.0	225.0	4425	10.0	70	usa	pontiac catalina
9	15.0	8	390.0	190.0	3850	8.5	70	usa	amc ambassador dpl

Initial Observation¶

The dataset lists many car models produced across different model years, along with each car's origin and engine performance metrics.

Reading and cleaning dataset¶

Check for missing values¶

In [98]:

mpg_data.isna().sum()

Out[98]:

mpg             0
cylinders       0
displacement    0
horsepower      6
weight          0
acceleration    0
model_year      0
origin          0
name            0
dtype: int64

In [99]:

mpg_data[mpg_data.isna().any(axis=1)]

Out[99]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
32	25.0	4	98.0	NaN	2046	19.0	71	usa	ford pinto
126	21.0	6	200.0	NaN	2875	17.0	74	usa	ford maverick
330	40.9	4	85.0	NaN	1835	17.3	80	europe	renault lecar deluxe
336	23.6	4	140.0	NaN	2905	14.3	80	usa	ford mustang cobra
354	34.5	4	100.0	NaN	2320	15.8	81	europe	renault 18i
374	23.0	4	151.0	NaN	3035	20.5	82	usa	amc concord dl

In [100]:

mpg_data.dropna(inplace=True)

In [101]:

mpg_data['horsepower'] = pd.to_numeric(mpg_data['horsepower'])
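
As a hedged alternative to the in-place dropna above, here is a minimal sketch that restricts the check to the horsepower column and returns a new frame, so the raw data stays available. The small `sample` frame is an illustrative assumption built from three rows shown in the outputs above; it is not the full dataset.

```python
import pandas as pd

# Illustrative sample only: three rows copied from the outputs above.
sample = pd.DataFrame(
    {
        "mpg": [18.0, 25.0, 21.0],
        "horsepower": [130.0, None, None],
        "name": ["chevrolet chevelle malibu", "ford pinto", "ford maverick"],
    }
)

# Drop rows only where horsepower is missing; `sample` itself is left untouched.
cleaned = sample.dropna(subset=["horsepower"]).reset_index(drop=True)

print(len(sample), "rows before,", len(cleaned), "rows after")
print(cleaned["horsepower"].dtype)
```

Keeping the original frame makes it easy to revisit the dropped rows later, for example if the missing values were to be imputed instead.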

Summary of the Data¶

In [102]:

mpg_data.describe()

Out[102]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year
count	392.000000	392.000000	392.000000	392.000000	392.000000	392.000000	392.000000
mean	23.445918	5.471939	194.411990	104.469388	2977.584184	15.541327	75.979592
std	7.805007	1.705783	104.644004	38.491160	849.402560	2.758864	3.683737
min	9.000000	3.000000	68.000000	46.000000	1613.000000	8.000000	70.000000
25%	17.000000	4.000000	105.000000	75.000000	2225.250000	13.775000	73.000000
50%	22.750000	4.000000	151.000000	93.500000	2803.500000	15.500000	76.000000
75%	29.000000	8.000000	275.750000	126.000000	3614.750000	17.025000	79.000000
max	46.600000	8.000000	455.000000	230.000000	5140.000000	24.800000	82.000000

In [103]:

mpg_data.loc[mpg_data['horsepower'] == mpg_data['horsepower'].max()]

Out[103]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
116	16.0	8	400.0	230.0	4278	9.5	73	usa	pontiac grand prix

In [104]:

mpg_data.loc[mpg_data['acceleration'] == mpg_data['acceleration'].max()]

Out[104]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
299	27.2	4	141.0	71.0	3190	24.8	79	europe	peugeot 504

In [105]:

mpg_data.loc[mpg_data['mpg'] == mpg_data['mpg'].max()]

Out[105]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
322	46.6	4	86.0	65.0	2110	17.9	80	japan	mazda glc

In [106]:

mpg_data.loc[mpg_data['mpg'] == mpg_data['mpg'].min()]

Out[106]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
28	9.0	8	304.0	193.0	4732	18.5	70	usa	hi 1200d

In [107]:

mpg_data.loc[mpg_data['cylinders']==3]

Out[107]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
71	19.0	3	70.0	97.0	2330	13.5	72	japan	mazda rx2 coupe
111	18.0	3	70.0	90.0	2124	13.5	73	japan	maxda rx3
243	21.5	3	80.0	110.0	2720	13.5	77	japan	mazda rx-4
334	23.7	3	70.0	100.0	2420	12.5	80	japan	mazda rx-7 gs

In [108]:

mpg_data.loc[mpg_data['cylinders']==3].sort_values(by= 'horsepower',ascending=False).iloc[0]

Out[108]:

mpg                   21.5
cylinders                3
displacement          80.0
horsepower           110.0
weight                2720
acceleration          13.5
model_year              77
origin               japan
name            mazda rx-4
Name: 243, dtype: object

In [109]:

mpg_data.loc[mpg_data['cylinders']==4].sort_values(by= 'horsepower',ascending=False).iloc[0]

Out[109]:

mpg                  25.0
cylinders               4
displacement        121.0
horsepower          115.0
weight               2671
acceleration         13.5
model_year             75
origin             europe
name            saab 99le
Name: 180, dtype: object

In [110]:

mpg_data.loc[mpg_data['cylinders']==5].sort_values(by= 'horsepower',ascending=False).iloc[0]

Out[110]:

mpg                  20.3
cylinders               5
displacement        131.0
horsepower          103.0
weight               2830
acceleration         15.9
model_year             78
origin             europe
name            audi 5000
Name: 274, dtype: object

In [111]:

mpg_data.loc[mpg_data['cylinders']==6].sort_values(by= 'horsepower',ascending=False).iloc[0]

Out[111]:

mpg                                        17.7
cylinders                                     6
displacement                              231.0
horsepower                                165.0
weight                                     3445
acceleration                               13.4
model_year                                   78
origin                                      usa
name            buick regal sport coupe (turbo)
Name: 263, dtype: object

In [112]:

mpg_data.loc[mpg_data['cylinders']==8].sort_values(by= 'horsepower',ascending=False).iloc[0]

Out[112]:

mpg                           16.0
cylinders                        8
displacement                 400.0
horsepower                   230.0
weight                        4278
acceleration                   9.5
model_year                      73
origin                         usa
name            pontiac grand prix
Name: 116, dtype: object
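
The five near-identical cells above (one per cylinder count) can be collapsed into a single grouped lookup. The following is a minimal sketch, not the notebook's original method; the `sample` frame is an illustrative assumption made of rows copied from the outputs above.

```python
import pandas as pd

# Illustrative sample only: rows copied from the outputs above.
sample = pd.DataFrame(
    {
        "cylinders": [3, 3, 4, 5, 6, 8, 8],
        "horsepower": [97.0, 110.0, 115.0, 103.0, 165.0, 165.0, 230.0],
        "name": [
            "mazda rx2 coupe",
            "mazda rx-4",
            "saab 99le",
            "audi 5000",
            "buick regal sport coupe (turbo)",
            "buick skylark 320",
            "pontiac grand prix",
        ],
    }
)

# For each cylinder group, idxmax gives the index label of the row
# with the highest horsepower (the first one if there is a tie).
top_idx = sample.groupby("cylinders")["horsepower"].idxmax()
most_powerful = sample.loc[top_idx].set_index("cylinders")

print(most_powerful)
```

On the full data the same two lines would be applied to mpg_data instead of `sample`.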

In [113]:

from plotly.subplots import make_subplots
from skimage import io

path = "../assets/"

images = ['1977_mazda_rx-4.jpg', '1980_mazda_rx-7.jpg', 'audi_5000.jpg',
          'buick_regal_sport_coupe.jpg', 'harvester_intl_1200D.png',
          'mazda_glc.png', 'pontiac_GP.jpg', 'pugeot_504.png', 'saab_99le.jpg']

names = ['Powerful 3 cylinder car','Newer gen 3 cylinder car','Powerful 5 cylinder car',
         'Powerful 6 cylinder car','Least fuel efficient in the dataset',
         'Most fuel efficient in the dataset','Highest Horsepower in the dataset',
         'Highest acceleration in the dataset','Powerful 4 cylinder car'
        ]
fig = make_subplots(rows=5, cols=2)
subplots =  [(1,1),(1,2),(2,1),(2,2),(3,1),(3,2),(4,1),(4,2),(5,1),(5,2)]

for i in range(len(images)):
    row,cols=subplots[i]
    img = io.imread(path + images[i])
    fig.add_trace(go.Image(z=img), row=row, col=cols)
    fig.update_xaxes(title_text = f"{names[i]} : {images[i].split('.')[0]}", row= row, col=cols, showticklabels=False)
    fig.update_yaxes(showticklabels=False)
# Layout
fig.update_layout(title_text="Cars of different Categories", height=int(1920*0.75), width=int(1440*0.75))
fig.write_image("images/plt_1.png")

Plot 1

Univariate Analysis¶

Graphs and Visualizations¶

In [114]:

fig = px.histogram(data_frame=mpg_data, x='cylinders')
fig.write_image("images/plt_2.png")

Plot 2

We can see from this chart that most cars have 4, 6 or 8 cylinders. This might be because an even number of cylinders offers better manufacturing cost, efficiency or performance, which would lead car manufacturers to choose these configurations.

A quick search gives us a good overview: why-arent-there-seven-cylinder-engines. It turns out that engines with an odd number of cylinders are rather unstable and prone to vibration, making even configurations preferable for balance and smoothness.

We can also observe that 4 cylinder cars are the most numerous in our data. One hypothesis is that 4 cylinder cars are more fuel efficient. Alternatively, 6 and 8 cylinder cars might be more fuel efficient but also more expensive, and hence less sought after by average buyers.

One final consideration is that the data collection might be uneven, giving us this particular distribution of cars and making both of our hypotheses invalid.

In [115]:

fig = px.box(data_frame=mpg_data.sort_values(by='cylinders'), y=['horsepower'], facet_col='cylinders')
fig.write_image("images/plt_3.png")

Plot 3

In [116]:

fig = px.box(data_frame=mpg_data.sort_values(by='cylinders'), y=['mpg'], facet_col='cylinders')
fig.write_image("images/plt_4.png")

Plot 4

Inspecting Outliers¶

In [117]:

mpg_data[mpg_data.horsepower == 165 ]

Out[117]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
1	15.0	8	350.0	165.0	3693	11.5	70	usa	buick skylark 320
38	14.0	8	350.0	165.0	4209	12.0	71	usa	chevrolet impala
62	13.0	8	350.0	165.0	4274	12.0	72	usa	chevrolet impala
263	17.7	6	231.0	165.0	3445	13.4	78	usa	buick regal sport coupe (turbo)

In [118]:

mpg_data[mpg_data.horsepower == 230 ]

Out[118]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
116	16.0	8	400.0	230.0	4278	9.5	73	usa	pontiac grand prix

In [119]:

mpg_data[(mpg_data.mpg == 38) & (mpg_data.cylinders == 6)]

Out[119]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
387	38.0	6	262.0	85.0	3015	17.0	82	usa	oldsmobile cutlass ciera (diesel)

In [120]:

mpg_data[(mpg_data.mpg == 26.6) & (mpg_data.cylinders == 8)]

Out[120]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year	origin	name
364	26.6	8	350.0	105.0	3725	19.0	81	usa	oldsmobile cutlass ls

Observations¶

From these two graphs we can gain some insight into how horsepower and fuel efficiency vary with cylinder count:

4 cylinder cars have lower horsepower but are more fuel efficient.
6 cylinder cars seem to balance horsepower and fuel consumption.
8 cylinder cars, by contrast, have greater horsepower but are quite fuel hungry in general.

All this seems to be in line with the hypothesis that the 3 and 5 cylinder engines are not very performant, giving relatively low horsepower for average fuel efficiency. The 4 cylinder cars seem to have a good balance between horsepower and miles per gallon.

Spread and Outliers¶

The 8 cylinder cars show a wide spread in both mpg and horsepower, meaning that the group includes both regular cars and muscle cars.

There are some outliers in both the mpg and horsepower graphs.
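
Box plots conventionally flag points beyond 1.5 times the interquartile range from the quartiles as outliers. Below is a minimal sketch of that rule using pandas. The horsepower values are hypothetical and chosen only to illustrate the arithmetic, and Plotly may compute quartiles with a different method, so the fences it draws can differ slightly from these.

```python
import pandas as pd

# Hypothetical horsepower values, chosen only to illustrate the calculation.
hp = pd.Series([85, 90, 95, 100, 105, 110, 165], name="horsepower")

q1 = hp.quantile(0.25)  # first quartile (pandas default: linear interpolation)
q3 = hp.quantile(0.75)  # third quartile
iqr = q3 - q1

# Points outside [lower_fence, upper_fence] are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = hp[(hp < lower_fence) | (hp > upper_fence)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence})")
print("outliers:", outliers.tolist())
```

Applying the same calculation per cylinder group (for example with groupby) would give one pair of fences for each facet of the box plots.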

Muscle Cars¶

The 6 cylinder buick regal sport coupe (turbo) is a muscle car with very high horsepower for its class (165 hp), which puts it in line with average 8 cylinder cars. Similarly, the 230 hp pontiac grand prix is an 8 cylinder beast of a muscle car, which explains its very high horsepower.

Efficient Cars¶

Certain economy or diesel versions of cars are exceptionally fuel efficient for their cylinder count, like the 6 cylinder oldsmobile cutlass ciera (diesel) or the 8 cylinder oldsmobile cutlass ls.

To get a more complete picture, we need to perform further analysis. Let us therefore try to gather more insights from our data regarding engine performance with respect to number of cylinders, displacement, weight and country of origin.

In [121]:

fig = px.histogram(data_frame=mpg_data, y = 'origin', color='origin')
fig.write_image("images/plt_5.png")

Plot 5

It seems that most cars in our dataset are from the USA, and there are only two other origins: Japan and Europe. We can now analyze our cars categorically by origin. Let us see how the cars from each origin differ.

In [122]:

fig = px.histogram(data_frame=mpg_data, x = 'model_year', facet_col='origin')
fig.write_image("images/plt_6.png")

Plot 6

In [123]:

fig = px.box(data_frame=mpg_data, y = ['mpg', 'acceleration'], facet_col='origin')
fig.write_image("images/plt_7.png")

Plot 7

In [124]:

fig = px.box(data_frame=mpg_data, y = 'horsepower', facet_col='origin')
fig.write_image("images/plt_8.png")

Plot 8

In [125]:

mpg_data.groupby('origin').agg(mean_horsepower = ('horsepower','mean')).reset_index()

Out[125]:

	origin	mean_horsepower
0	europe	80.558824
1	japan	79.835443
2	usa	119.048980

This shows us that Japanese cars are the most fuel efficient, while the USA has, on average, more powerful cars. Also worth noting is that the USA has a higher upper fence for horsepower, meaning that there are very powerful car variants (muscle cars) that are lacking in the other origins.
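
The table above only aggregates horsepower, while the comparison also concerns fuel efficiency. A minimal sketch of computing both means in one grouped aggregation follows. The `sample` frame is an illustrative assumption of six rows copied from earlier outputs, so its averages are not those of the full dataset.

```python
import pandas as pd

# Illustrative sample only: six rows copied from earlier outputs
# (two per origin); the means below do not represent the full dataset.
sample = pd.DataFrame(
    {
        "origin": ["usa", "usa", "japan", "japan", "europe", "europe"],
        "mpg": [16.0, 18.0, 46.6, 21.5, 27.2, 25.0],
        "horsepower": [230.0, 130.0, 65.0, 110.0, 71.0, 115.0],
    }
)

# Named aggregation: one output column per (source column, function) pair.
summary = (
    sample.groupby("origin")
    .agg(mean_mpg=("mpg", "mean"), mean_horsepower=("horsepower", "mean"))
    .reset_index()
)

print(summary)
```

Run against mpg_data, the same aggregation would put mean mpg and mean horsepower side by side for each origin.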

Bivariate analysis¶

In [126]:

fig = px.scatter(data_frame=mpg_data, x='cylinders', y = 'origin', color= 'origin')
fig.write_image("images/plt_9.png")

Plot 9

In the dataset, US based cars have higher cylinder counts, while Japanese and European cars have fewer cylinders but experiment with 3 cylinder or 5 cylinder engines. This might be due to racing being more popular and prevalent in the US, or due to road networks being better in the US, with consumer culture demanding more powerful cars.

In [127]:

#px.box(data_frame=mpg_data, x='mpg')
fig = px.scatter(data_frame=mpg_data, x='model_year', y='mpg',facet_col='origin')
fig.write_image("images/plt_10.png")

Plot 10

In [128]:

# Average mpg per model year; model_year stays as the index so it can serve as the x axis later
avg_group = mpg_data.groupby(by='model_year').agg(average_mpg = ('mpg', 'mean'))
avg_group.reset_index()

Out[128]:

	model_year	average_mpg
0	70	17.689655
1	71	21.111111
2	72	18.714286
3	73	17.100000
4	74	22.769231
5	75	20.266667
6	76	21.573529
7	77	23.375000
8	78	24.061111
9	79	25.093103
10	80	33.803704
11	81	30.185714
12	82	32.000000

A trend of mpg improving over the years can be seen for all three origins.

In [148]:

fig = px.line(data_frame=avg_group,  y='average_mpg')
fig.write_image("images/plt_11.png")

Plot 11

Taking the average mpg of the models released each year across all countries shows an overall increase in fuel efficiency, from about 17.7 mpg for model year 70 to 32.0 mpg for model year 82, although individual years such as 72 and 73 dip.

Company names¶

Extracting the company name from each car name to see the distribution of cars by company.

In [130]:

def get_first_name(x):
    # The company is the first word of the car name
    full_name = x.split(' ')
    company = full_name[0]
    return company

mpg_data['company'] = mpg_data['name'].apply(get_first_name)
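
Taking the first word of each name leaves misspellings and abbreviations as separate companies (the outputs in this notebook show "maxda" next to "mazda", and "chevy" next to "chevrolet"). A minimal sketch of normalizing them with an alias map follows. The ALIASES dictionary is an assumption for illustration, not an exhaustive list, and the two "placeholder" names are hypothetical.

```python
import pandas as pd

# Illustrative sample: the first four names appear in the outputs of this notebook;
# the two "placeholder" names are hypothetical and only carry the abbreviation.
names = pd.Series(
    [
        "mazda rx-4",
        "maxda rx3",
        "chevrolet impala",
        "chevy placeholder",
        "vw placeholder",
        "hi 1200d",
    ]
)

# Assumed alias map (illustrative, not exhaustive).
ALIASES = {"maxda": "mazda", "chevy": "chevrolet", "vw": "volkswagen"}

# First word of each name, with known aliases mapped to a canonical spelling.
company = names.str.split().str[0].replace(ALIASES)

print(company.value_counts())
```

Any alias not listed in the map is left unchanged, so the map can be extended as more variants are spotted in the histogram.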

In [131]:

company_hist = px.histogram(data_frame= mpg_data, x='company')
company_hist.write_image("images/plt_12.png")

Plot 12

In [132]:

country_df= mpg_data.groupby('company').agg(mean_mpg = ('mpg','mean'))
country_df.reset_index().sort_values(by='mean_mpg', ascending=False).head()

Out[132]:

	company	mean_mpg
36	vw	39.016667
21	nissan	36.000000
32	triumph	35.000000
15	honda	33.761538
10	datsun	31.113043

In [133]:

country_df= mpg_data.groupby('company').agg(mean_horsepower = ('horsepower','mean'))
country_df.reset_index().sort_values(by='mean_horsepower', ascending=False).head()

Out[133]:

	company	mean_horsepower
14	hi	193.000000
9	chrysler	153.666667
4	cadillac	152.500000
8	chevy	142.333333
26	pontiac	136.937500

Bivariate Analysis of Numerical Data¶

To gain proper insights and explore the data, I wrote code to plot each numerical column against all the others.

In [134]:

from plotly.subplots import make_subplots

col_names = ['mpg','cylinders','displacement','horsepower','weight','acceleration','model_year']
graph_names = [f'{col_names[0].upper()} vs {x.upper()}' for x in col_names[1:]]
print(graph_names)
fig = make_subplots(rows=3,cols=2)
subplots =  [(1,1),(1,2),(2,1),(2,2),(3,1),(3,2)]

for i in range(1,len(col_names)):
    row,col = subplots[i-1]
    fig.add_trace(
        go.Scatter(x=mpg_data[col_names[0]], y=mpg_data[col_names[i]], mode='markers',name=graph_names[i-1]),
        row=row, col=col
    )
    fig.update_xaxes(title_text= col_names[0],row=row,col=col)
    fig.update_yaxes(title_text= col_names[i],row=row,col=col)

fig.update_layout(height=720, width=1080, title_text=f"{col_names[0]} vs others")
fig.write_image("images/plt_13.png")

['MPG vs CYLINDERS', 'MPG vs DISPLACEMENT', 'MPG vs HORSEPOWER', 'MPG vs WEIGHT', 'MPG vs ACCELERATION', 'MPG vs MODEL_YEAR']

Plot 13
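
The subplot cells in this notebook hard-code their list of (row, col) positions. As a minimal sketch, the same positions can be derived from the number of panels. The helper names `grid_positions` and `grid_rows` are hypothetical and not part of Plotly.

```python
from math import ceil


def grid_positions(n_plots, n_cols=2):
    """Return 1-based (row, col) pairs for n_plots panels, filled row by row."""
    return [(i // n_cols + 1, i % n_cols + 1) for i in range(n_plots)]


def grid_rows(n_plots, n_cols=2):
    """Return how many rows are needed to fit n_plots panels in n_cols columns."""
    return ceil(n_plots / n_cols)


col_names = ['mpg', 'cylinders', 'displacement', 'horsepower',
             'weight', 'acceleration', 'model_year']

# One panel for every column plotted against the first one.
n_panels = len(col_names) - 1
positions = grid_positions(n_panels)

print(grid_rows(n_panels), positions)
```

The results could be passed to make_subplots(rows=grid_rows(n_panels), cols=2) and used in place of the hand-written subplots list, which keeps the grid in step with col_names when columns are added or removed.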

Observations: mpg vs others¶

From these graphs, we can make a number of observations.

More cylinders are associated with lower fuel efficiency.
Higher horsepower, displacement and weight are associated with lower fuel efficiency.
Acceleration and mpg do not have an obvious relationship because the plot is highly scattered. We do see a weak positive trend, which is quite counterintuitive.
As the years progress, cars are getting more fuel efficient.

In [ ]:

from plotly.subplots import make_subplots

col_names = ['cylinders','displacement','horsepower','weight','acceleration','model_year']
graph_names = [f'{col_names[0].upper()} vs {x.upper()}' for x in col_names[1:]]
print(graph_names)
fig = make_subplots(rows=3,cols=2)
subplots =  [(1,1),(1,2),(2,1),(2,2),(3,1),(3,2)]

for i in range(1,len(col_names)):
    row,col = subplots[i-1]
    fig.add_trace(
        go.Scatter(y=mpg_data[col_names[0]], x=mpg_data[col_names[i]], mode='markers',name=graph_names[i-1]),
        row=row, col=col
    )
    fig.update_yaxes(title_text= col_names[0],row=row,col=col)
    fig.update_xaxes(title_text= col_names[i],row=row,col=col)

fig.update_layout(height=720, width=1080, title_text=f"{col_names[0]} vs others")
fig.write_image("images/plt_14.png")

['CYLINDERS vs DISPLACEMENT', 'CYLINDERS vs HORSEPOWER', 'CYLINDERS vs WEIGHT', 'CYLINDERS vs ACCELERATION', 'CYLINDERS vs MODEL_YEAR']

Plot 14

Observations: Cylinders vs others¶

We can see that a higher number of cylinders means higher horsepower, more displacement and also a heavier car.

There is no clear relationship between the model year and the number of cylinders, indicating that cars with varying engine types were produced across all years.

Looking at the 4 cylinder engines, we observe that they give great fuel efficiency at low weight but with lower horsepower. This could also mean they are cheaper both to manufacture and to run, which could explain why more cars with 4 cylinder engines are produced.

We cannot, however, say for sure that these 4 cylinder cars are the most popular or most bought cars. For that, we would need the sales data of these models.

6 cylinder engines have a good balance of efficiency and horsepower. 8 cylinder engines pack a punch with higher horsepower but guzzle a lot of fuel; they are some of the most powerful engines in the dataset. This makes sense, as more cylinders give more displacement and burn more fuel, producing more power at the cost of lower fuel efficiency. At the same time, bigger and heavier engines are needed to accommodate more cylinders.

In [136]:

from plotly.subplots import make_subplots

col_names = ['displacement','horsepower','weight','acceleration','model_year']
graph_names = [f'{col_names[0].upper()} vs {x.upper()}' for x in col_names[1:]]
print(graph_names)
fig = make_subplots(rows=2,cols=2)
subplots =  [(1,1),(1,2),(2,1),(2,2)]

for i in range(1,len(col_names)):
    row,col = subplots[i-1]
    fig.add_trace(
        go.Scatter(x=mpg_data[col_names[0]], y=mpg_data[col_names[i]], mode='markers',name=graph_names[i-1]),
        row=row, col=col
    )
    fig.update_xaxes(title_text= col_names[0],row=row,col=col)
    fig.update_yaxes(title_text= col_names[i],row=row,col=col)

fig.update_layout(height=720, width=1080, title_text=f"{col_names[0]} vs others")
fig.write_image("images/plt_15.png")

['DISPLACEMENT vs HORSEPOWER', 'DISPLACEMENT vs WEIGHT', 'DISPLACEMENT vs ACCELERATION', 'DISPLACEMENT vs MODEL_YEAR']

Plot 15

In [137]:

from plotly.subplots import make_subplots

col_names = ['horsepower','weight','acceleration','model_year']
graph_names = [f'{col_names[0].upper()} vs {x.upper()}' for x in col_names[1:]]
print(graph_names)
fig = make_subplots(rows=2,cols=2)
subplots =  [(1,1),(1,2),(2,1),(2,2)]

for i in range(1,len(col_names)):
    row,col = subplots[i-1]
    fig.add_trace(
        go.Scatter(x=mpg_data[col_names[0]], y=mpg_data[col_names[i]], mode='markers',name=graph_names[i-1]),
        row=row, col=col
    )
    fig.update_xaxes(title_text= col_names[0],row=row,col=col)
    fig.update_yaxes(title_text= col_names[i],row=row,col=col)

fig.update_layout(height=720, width=1080, title_text=f"{col_names[0]} vs others")
fig.write_image("images/plt_16.png")

['HORSEPOWER vs WEIGHT', 'HORSEPOWER vs ACCELERATION', 'HORSEPOWER vs MODEL_YEAR']

Plot 16

Observation: horsepower vs others & displacement vs others¶

We can see the relationships between weight, displacement and horsepower, i.e.

more displacement → more horsepower

more displacement → more weight

meaning higher horsepower engines are heavier, as we observed before.

The relationship of acceleration with both displacement and horsepower is negative, though. This is rather counterintuitive, as you might think cars with higher horsepower/displacement should provide higher acceleration as well. But that is not the case here.

What we can also see is that newer models have lower horsepower.

To make more sense of this, we can create a correlation heatmap and perform further multivariate analysis.

CORRELATION HEATMAP¶

In [138]:

correlation = mpg_data.select_dtypes('number').corr('pearson')
correlation

Out[138]:

	mpg	cylinders	displacement	horsepower	weight	acceleration	model_year
mpg	1.000000	-0.777618	-0.805127	-0.778427	-0.832244	0.423329	0.580541
cylinders	-0.777618	1.000000	0.950823	0.842983	0.897527	-0.504683	-0.345647
displacement	-0.805127	0.950823	1.000000	0.897257	0.932994	-0.543800	-0.369855
horsepower	-0.778427	0.842983	0.897257	1.000000	0.864538	-0.689196	-0.416361
weight	-0.832244	0.897527	0.932994	0.864538	1.000000	-0.416839	-0.309120
acceleration	0.423329	-0.504683	-0.543800	-0.689196	-0.416839	1.000000	0.290316
model_year	0.580541	-0.345647	-0.369855	-0.416361	-0.309120	0.290316	1.000000
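
To make the numbers in this table concrete, here is a minimal sketch that computes a Pearson coefficient by hand and compares it with pandas. It uses only the mpg and weight values of the first ten rows shown in the head() output earlier, so the result describes that small sample and not the full table above. The `pearson` helper is written for illustration.

```python
import math

import pandas as pd

# mpg and weight of the first ten rows, as shown in the head() output earlier.
mpg = [18.0, 15.0, 18.0, 16.0, 17.0, 15.0, 14.0, 14.0, 14.0, 15.0]
weight = [3504, 3693, 3436, 3433, 3449, 4341, 4354, 4312, 4425, 3850]


def pearson(xs, ys):
    """Pearson r: covariance of xs and ys divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_manual = pearson(mpg, weight)
r_pandas = pd.Series(mpg).corr(pd.Series(weight))  # Pearson is the default method

print(round(r_manual, 4), round(r_pandas, 4))
```

Both values agree, and the sign is negative: in this sample, heavier cars have lower mpg, which matches the direction of the mpg and weight entry in the table.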

In [ ]:

fig = px.imshow(correlation, text_auto=True, color_continuous_scale='thermal', aspect='auto')
fig.write_image("images/plt_17.png")

Plot 17

The correlation heatmap confirms the relationship between cylinders, displacement, horsepower and weight. We also see that these affect mpg negatively, as expected.

We can also see a moderate positive correlation (about 0.58) between mpg and model year, meaning newer models tend to be more fuel efficient.

But then acceleration is negatively correlated with cylinders, displacement, horsepower and weight, which is still unexplained. What's more, there is a weakly positive correlation (about 0.42) between acceleration and mpg, meaning higher acceleration goes with better mileage, which is not intuitive.

In [140]:

fig= px.scatter(data_frame=mpg_data, x='horsepower', y='mpg', color= 'model_year')
fig.write_image("images/plt_18.png")

Plot 18

From this plot we can confirm that newer models are more efficient than the older ones. However, we can see that the newer models also have lower horsepower.

In [141]:

fig= px.scatter(data_frame=mpg_data, x='weight', y='horsepower', color= 'model_year')
fig.write_image("images/plt_19.png")

Plot 19

With this graph we find that the heaviest models are all older models, which also have a lot of horsepower. The newer models are not only fuel efficient but also lighter, with lower power output.

This raises the question of why companies would build such cars, with low power and weight but better fuel efficiency.

In [142]:

fig= px.scatter(data_frame=mpg_data, x='horsepower', y='acceleration', color= 'model_year')
fig.write_image("images/plt_20.png")

Plot 20

This tells us that almost all new models have low horsepower but high acceleration, even though some older models also had low horsepower and high acceleration.

In [143]:

# Assign the figure to fig so that write_image saves this plot and not the previous one
fig = px.scatter(data_frame=mpg_data, x='weight', y='acceleration', color= 'horsepower', color_continuous_scale='temps')
fig.write_image("images/plt_21.png")

Plot 21

Higher horsepower cars are almost always rather heavy with lower acceleration.

In [144]:

cyl = mpg_data.groupby('model_year').aggregate(avg_no_of_cylinders = ('cylinders','mean'), avg_mpg=('mpg','mean') ).reset_index()
fig = px.scatter(data_frame=cyl, x='model_year', y='avg_no_of_cylinders', color='avg_mpg')
fig.write_image("images/plt_22.png")

Plot 22

Newer cars have fewer cylinders on average.

These final graphs give us answers to a few questions we were asking.

Conclusions¶

Why would acceleration increase when horsepower is decreasing?¶

Newer cars have higher acceleration despite lower horsepower. Along with good acceleration, these cars are also fuel efficient.

It turns out that acceleration is increasing because cars are getting lighter, requiring less power to accelerate.

Why did the companies start making lower power, fuel efficient cars?¶

The data makes it clear that cars do not need raw horsepower alone. They can be fast and fuel efficient even with lower horsepower. With fuel emissions being a major concern, and with major technological breakthroughs, it is completely sensible that the companies chose to make their cars this way. What is more, the cars could be cheaper to build because they use less material, and less expensive for customers to operate, all while being better for the environment.

There might be several factors that the data does not capture. The newer cars would have better technology, such as lighter materials, better aerodynamics, better fuel composition and better engineering. But the fact is that car companies made lighter, fuel efficient cars earlier too; over time they simply chose to keep making newer cars that were lower powered and less fuel hungry.

Why are there so many 4 cylinder cars?¶

This falls in line with the fact that the newer models have, on average, fewer cylinders. It also helps us understand why there are more 4 cylinder cars: they are the lightest and most fuel efficient, with better stability, lower vibration and possibly lower cost.