This report explores world happiness. We visualize the distribution of countries' happiness scores from 2015 to 2019 and analyze features such as GDP, Health, and Trust to explore how they contribute to those scores.
In case you need the source data, visit this webpage:
Data Source
Exploring Data
The five datasets use different column names. The meaning of each column is as follows:
- Country / Country or region: Name of the country.
- Region: Region the country belongs to.
- Happiness Rank / Happiness.Rank / Overall rank: Rank of the country based on the Happiness Score.
- Happiness Score / Happiness.Score / Score: A metric measured in 2015 by asking the sampled people the question: "How would you rate your happiness on a scale of 0 to 10, where 10 is the happiest?"
- Standard Error: The standard error of the Happiness Score.
- Economy (GDP per Capita) / Economy..GDP.per.Capita. / GDP per capita: The extent to which GDP contributes to the calculation of the Happiness Score.
- Family: The extent to which Family contributes to the calculation of the Happiness Score.
- Health (Life Expectancy) / Health..Life.Expectancy. / Healthy life expectancy: The extent to which Life Expectancy contributes to the calculation of the Happiness Score.
- Freedom / Freedom to make life choices: The extent to which Freedom contributes to the calculation of the Happiness Score.
- Trust (Government Corruption) / Trust..Government.Corruption. / Perceptions of corruption: The extent to which Perception of Corruption contributes to the calculation of the Happiness Score.
- Generosity: The extent to which Generosity contributes to the calculation of the Happiness Score.
- Dystopia Residual / Dystopia.Residual: The extent to which Dystopia Residual contributes to the calculation of the Happiness Score.
- Lower Confidence Interval: Lower bound of the confidence interval of the Happiness Score.
- Upper Confidence Interval: Upper bound of the confidence interval of the Happiness Score.
- Whisker.high: Upper whisker of the Happiness Score.
- Whisker.low: Lower whisker of the Happiness Score.
- Social support: The extent to which Social support contributes to the calculation of the Happiness Score.
First we need to have a look at the column names.
import pandas as pd
import numpy as np
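years = list(range(2015, 2020))
# At notebook (module) level, locals() is the global namespace, so assigning
# into it creates the variables happiness2015 ... happiness2019.
names = locals()
for year in years:
    names["happiness" + str(year)] = pd.read_csv(f"{year}.csv")
    print(names["happiness" + str(year)].columns)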
Index(['Country', 'Region', 'Happiness Rank', 'Happiness Score',
'Standard Error', 'Economy (GDP per Capita)', 'Family',
'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)',
'Generosity', 'Dystopia Residual'],
dtype='object')
Index(['Country', 'Region', 'Happiness Rank', 'Happiness Score',
'Lower Confidence Interval', 'Upper Confidence Interval',
'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)',
'Freedom', 'Trust (Government Corruption)', 'Generosity',
'Dystopia Residual'],
dtype='object')
Index(['Country', 'Happiness.Rank', 'Happiness.Score', 'Whisker.high',
'Whisker.low', 'Economy..GDP.per.Capita.', 'Family',
'Health..Life.Expectancy.', 'Freedom', 'Generosity',
'Trust..Government.Corruption.', 'Dystopia.Residual'],
dtype='object')
Index(['Overall rank', 'Country or region', 'Score', 'GDP per capita',
'Social support', 'Healthy life expectancy',
'Freedom to make life choices', 'Generosity',
'Perceptions of corruption'],
dtype='object')
Index(['Overall rank', 'Country or region', 'Score', 'GDP per capita',
'Social support', 'Healthy life expectancy',
'Freedom to make life choices', 'Generosity',
'Perceptions of corruption'],
dtype='object')
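The five headers printed above name the same quantities in different ways. As a minimal sketch, assuming a hand-written alias table (the `COLUMN_ALIASES` dictionary and the `unify_columns` helper are hypothetical and not part of the original notebook), a name-based `DataFrame.rename` could map every spelling to one label without depending on column order. The toy frame holds placeholder values only.

```python
import pandas as pd

# Hypothetical alias table: every spelling seen in the five files -> one unified name.
COLUMN_ALIASES = {
    "Country or region": "Country",
    "Overall rank": "Happiness Rank",
    "Happiness.Rank": "Happiness Rank",
    "Score": "Happiness Score",
    "Happiness.Score": "Happiness Score",
    "GDP per capita": "Economy (GDP per Capita)",
    "Economy..GDP.per.Capita.": "Economy (GDP per Capita)",
    "Healthy life expectancy": "Health (Life Expectancy)",
    "Health..Life.Expectancy.": "Health (Life Expectancy)",
    "Freedom to make life choices": "Freedom",
    "Perceptions of corruption": "Trust (Government Corruption)",
    "Trust..Government.Corruption.": "Trust (Government Corruption)",
    "Dystopia.Residual": "Dystopia Residual",
    "Whisker.high": "Whisker High",
    "Whisker.low": "Whisker Low",
    "Social support": "Social Support",
}


def unify_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with known aliases renamed; other columns are kept as-is."""
    return df.rename(columns=COLUMN_ALIASES)


# Toy frame with a 2019-style header (placeholder values, not real data).
toy = pd.DataFrame(
    [[1, "Country A", 7.0, 1.2]],
    columns=["Overall rank", "Country or region", "Score", "GDP per capita"],
)
unified = unify_columns(toy)
print(list(unified.columns))
```

Because `rename` ignores keys that are absent from a frame, the same table could be applied to every year. The notebook itself takes the more direct route of assigning full column lists by position.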
Then we need to unify the column names.
happiness2017.columns = [
"Country",
"Happiness Rank",
"Happiness Score",
"Whisker High",
"Whisker Low",
"Economy (GDP per Capita)",
"Family",
"Health (Life Expectancy)",
"Freedom",
"Generosity",
"Trust (Government Corruption)",
"Dystopia Residual",
]
happiness2018.columns = [
"Happiness Rank",
"Country",
"Happiness Score",
"Economy (GDP per Capita)",
"Social Support",
"Health (Life Expectancy)",
"Freedom",
"Generosity",
"Trust (Government Corruption)",
]
happiness2019.columns = [
"Happiness Rank",
"Country",
"Happiness Score",
"Economy (GDP per Capita)",
"Social Support",
"Health (Life Expectancy)",
"Freedom",
"Generosity",
"Trust (Government Corruption)",
]
# Add Region column to happiness2017, happiness2018, and happiness2019
names = locals()
for year in range(2017, 2020):
    # Merging with 2015 adds "Region"; merging with 2016 then yields
    # "Region_x" (from 2015) and "Region_y" (from 2016).
    for i in range(2015, 2017):
        names["happiness" + str(year)] = names["happiness" + str(year)].merge(
            names["happiness" + str(i)][["Country", "Region"]], on="Country", how="left"
        )
    null_index = names["happiness" + str(year)][
        names["happiness" + str(year)].isnull().T.any()
    ].index.to_list()
    for ind in null_index:
        # pd.isna is a reliable missing-value test; `is np.nan` only compares identity.
        if pd.isna(names["happiness" + str(year)].loc[ind, "Region_x"]):
            names["happiness" + str(year)].loc[ind, "Region_x"] = names[
                "happiness" + str(year)
            ].loc[ind, "Region_y"]
    names["happiness" + str(year)] = (
        names["happiness" + str(year)]
        .drop(["Region_y"], axis=1)
        .rename(columns={"Region_x": "Region"})
    )
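The region fill above can also be expressed as a single lookup. The following is only a sketch on toy frames (the names `older_a`, `older_b`, `newer`, and `lookup` are illustrative, and the rows are placeholders): it builds one country-to-region Series from the two older years and maps it onto the newer frame, which avoids the temporary `Region_x` / `Region_y` columns.

```python
import pandas as pd

# Toy stand-ins for the 2015, 2016 and a later frame (placeholder rows only).
older_a = pd.DataFrame({"Country": ["A", "B"], "Region": ["North", "South"]})
older_b = pd.DataFrame({"Country": ["B", "C"], "Region": ["South", "East"]})
newer = pd.DataFrame({"Country": ["A", "C", "D"], "Happiness Score": [7.1, 5.2, 4.3]})

# One country -> region lookup; the earlier year wins when both list a country.
lookup = (
    pd.concat([older_a, older_b])
    .drop_duplicates(subset="Country", keep="first")
    .set_index("Country")["Region"]
)

# Map each country to its region; countries unknown to both years become NaN.
newer["Region"] = newer["Country"].map(lookup)
print(newer)
```

Country "D" has no match in either older frame, so its region stays missing, which mirrors the null values the notebook fills by hand.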
Because the set of ranked countries differs from year to year and some country names changed, the Region column still has null values after the merge. We need to fill these values manually.
happiness2017[happiness2017.isnull().T.any()]
| | Country | Happiness Rank | Happiness Score | Whisker High | Whisker Low | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Generosity | Trust (Government Corruption) | Dystopia Residual | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | Taiwan Province of China | 33 | 6.422 | 6.494596 | 6.349404 | 1.433627 | 1.384565 | 0.793984 | 0.361467 | 0.258360 | 0.063829 | 2.126607 | NaN |
| 70 | Hong Kong S.A.R., China | 71 | 5.472 | 5.549594 | 5.394406 | 1.551675 | 1.262791 | 0.943062 | 0.490969 | 0.374466 | 0.293934 | 0.554633 | NaN |
happiness2017.loc[32, "Region"] = 'Eastern Asia'
happiness2017.loc[70, "Region"] = 'Eastern Asia'
happiness2018[happiness2018.isnull().T.any()]
| | Happiness Rank | Country | Happiness Score | Economy (GDP per Capita) | Social Support | Health (Life Expectancy) | Freedom | Generosity | Trust (Government Corruption) | Region |
|---|---|---|---|---|---|---|---|---|---|---|
| 19 | 20 | United Arab Emirates | 6.774 | 2.096 | 0.776 | 0.670 | 0.284 | 0.186 | NaN | Middle East and Northern Africa |
| 37 | 38 | Trinidad & Tobago | 6.192 | 1.223 | 1.492 | 0.564 | 0.575 | 0.171 | 0.019 | NaN |
| 57 | 58 | Northern Cyprus | 5.835 | 1.229 | 1.211 | 0.909 | 0.495 | 0.179 | 0.154 | NaN |
happiness2018.loc[37, "Region"] = 'Latin America and Caribbean'
happiness2018.loc[57, "Region"] = 'Western Europe'
happiness2019[happiness2019.isnull().T.any()]
| | Happiness Rank | Country | Happiness Score | Economy (GDP per Capita) | Social Support | Health (Life Expectancy) | Freedom | Generosity | Trust (Government Corruption) | Region |
|---|---|---|---|---|---|---|---|---|---|---|
| 38 | 39 | Trinidad & Tobago | 6.192 | 1.231 | 1.477 | 0.713 | 0.489 | 0.185 | 0.016 | NaN |
| 63 | 64 | Northern Cyprus | 5.718 | 1.263 | 1.252 | 1.042 | 0.417 | 0.191 | 0.162 | NaN |
| 83 | 84 | North Macedonia | 5.274 | 0.983 | 1.294 | 0.838 | 0.345 | 0.185 | 0.034 | NaN |
| 119 | 120 | Gambia | 4.516 | 0.308 | 0.939 | 0.428 | 0.382 | 0.269 | 0.167 | NaN |
happiness2019.loc[38, "Region"] = 'Latin America and Caribbean'
happiness2019.loc[63, "Region"] = 'Western Europe'
happiness2019.loc[83, "Region"] = 'Central and Eastern Europe'
happiness2019.loc[119, "Region"] = 'Sub-Saharan Africa'
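The assignments above address rows by their integer index, which only holds as long as the row order of the file does not change. As a hedged alternative sketch, the same regions could be keyed by country name. The `manual_regions` dictionary is a hypothetical helper that reuses the 2019 assignments from the text, and the frame is a toy stand-in.

```python
import pandas as pd

# Toy frame: one country with a known region, four with a missing one.
df = pd.DataFrame(
    {
        "Country": [
            "United Arab Emirates",
            "Trinidad & Tobago",
            "Northern Cyprus",
            "North Macedonia",
            "Gambia",
        ],
        "Region": ["Middle East and Northern Africa", None, None, None, None],
    }
)

# Hypothetical lookup keyed by country name instead of row position.
manual_regions = {
    "Trinidad & Tobago": "Latin America and Caribbean",
    "Northern Cyprus": "Western Europe",
    "North Macedonia": "Central and Eastern Europe",
    "Gambia": "Sub-Saharan Africa",
}

# Fill only the missing regions; existing values are left untouched.
df["Region"] = df["Region"].fillna(df["Country"].map(manual_regions))
print(df)
```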
Now let's look at the indicators of each dataset.
print(happiness2015.shape)
happiness2015.describe()
(158, 12)
| | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 158.000000 | 158.000000 | 158.000000 | 158.000000 | 158.000000 | 158.000000 | 158.000000 | 158.000000 | 158.000000 | 158.000000 |
| mean | 79.493671 | 5.375734 | 0.047885 | 0.846137 | 0.991046 | 0.630259 | 0.428615 | 0.143422 | 0.237296 | 2.098977 |
| std | 45.754363 | 1.145010 | 0.017146 | 0.403121 | 0.272369 | 0.247078 | 0.150693 | 0.120034 | 0.126685 | 0.553550 |
| min | 1.000000 | 2.839000 | 0.018480 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.328580 |
| 25% | 40.250000 | 4.526000 | 0.037268 | 0.545808 | 0.856823 | 0.439185 | 0.328330 | 0.061675 | 0.150553 | 1.759410 |
| 50% | 79.500000 | 5.232500 | 0.043940 | 0.910245 | 1.029510 | 0.696705 | 0.435515 | 0.107220 | 0.216130 | 2.095415 |
| 75% | 118.750000 | 6.243750 | 0.052300 | 1.158448 | 1.214405 | 0.811013 | 0.549092 | 0.180255 | 0.309883 | 2.462415 |
| max | 158.000000 | 7.587000 | 0.136930 | 1.690420 | 1.402230 | 1.025250 | 0.669730 | 0.551910 | 0.795880 | 3.602140 |
print(happiness2016.shape)
happiness2016.describe()
(157, 13)
| | Happiness Rank | Happiness Score | Lower Confidence Interval | Upper Confidence Interval | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 |
| mean | 78.980892 | 5.382185 | 5.282395 | 5.481975 | 0.953880 | 0.793621 | 0.557619 | 0.370994 | 0.137624 | 0.242635 | 2.325807 |
| std | 45.466030 | 1.141674 | 1.148043 | 1.136493 | 0.412595 | 0.266706 | 0.229349 | 0.145507 | 0.111038 | 0.133756 | 0.542220 |
| min | 1.000000 | 2.905000 | 2.732000 | 3.078000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.817890 |
| 25% | 40.000000 | 4.404000 | 4.327000 | 4.465000 | 0.670240 | 0.641840 | 0.382910 | 0.257480 | 0.061260 | 0.154570 | 2.031710 |
| 50% | 79.000000 | 5.314000 | 5.237000 | 5.419000 | 1.027800 | 0.841420 | 0.596590 | 0.397470 | 0.105470 | 0.222450 | 2.290740 |
| 75% | 118.000000 | 6.269000 | 6.154000 | 6.434000 | 1.279640 | 1.021520 | 0.729930 | 0.484530 | 0.175540 | 0.311850 | 2.664650 |
| max | 157.000000 | 7.526000 | 7.460000 | 7.669000 | 1.824270 | 1.183260 | 0.952770 | 0.608480 | 0.505210 | 0.819710 | 3.837720 |
print(happiness2017.shape)
happiness2017.describe()
(155, 13)
| | Happiness Rank | Happiness Score | Whisker High | Whisker Low | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Generosity | Trust (Government Corruption) | Dystopia Residual |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 155.000000 | 155.000000 | 155.000000 | 155.000000 | 155.000000 | 155.000000 | 155.000000 | 155.000000 | 155.000000 | 155.000000 | 155.000000 |
| mean | 78.000000 | 5.354019 | 5.452326 | 5.255713 | 0.984718 | 1.188898 | 0.551341 | 0.408786 | 0.246883 | 0.123120 | 1.850238 |
| std | 44.888751 | 1.131230 | 1.118542 | 1.145030 | 0.420793 | 0.287263 | 0.237073 | 0.149997 | 0.134780 | 0.101661 | 0.500028 |
| min | 1.000000 | 2.693000 | 2.864884 | 2.521116 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.377914 |
| 25% | 39.500000 | 4.505500 | 4.608172 | 4.374955 | 0.663371 | 1.042635 | 0.369866 | 0.303677 | 0.154106 | 0.057271 | 1.591291 |
| 50% | 78.000000 | 5.279000 | 5.370032 | 5.193152 | 1.064578 | 1.253918 | 0.606042 | 0.437454 | 0.231538 | 0.089848 | 1.832910 |
| 75% | 116.500000 | 6.101500 | 6.194600 | 6.006527 | 1.318027 | 1.414316 | 0.723008 | 0.516561 | 0.323762 | 0.153296 | 2.144654 |
| max | 155.000000 | 7.537000 | 7.622030 | 7.479556 | 1.870766 | 1.610574 | 0.949492 | 0.658249 | 0.838075 | 0.464308 | 3.117485 |
print(happiness2018.shape)
happiness2018.describe()
(156, 10)
| | Happiness Rank | Happiness Score | Economy (GDP per Capita) | Social Support | Health (Life Expectancy) | Freedom | Generosity | Trust (Government Corruption) |
|---|---|---|---|---|---|---|---|---|
| count | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 155.000000 |
| mean | 78.500000 | 5.375917 | 0.891449 | 1.213237 | 0.597346 | 0.454506 | 0.181006 | 0.112000 |
| std | 45.177428 | 1.119506 | 0.391921 | 0.302372 | 0.247579 | 0.162424 | 0.098471 | 0.096492 |
| min | 1.000000 | 2.905000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 39.750000 | 4.453750 | 0.616250 | 1.066750 | 0.422250 | 0.356000 | 0.109500 | 0.051000 |
| 50% | 78.500000 | 5.378000 | 0.949500 | 1.255000 | 0.644000 | 0.487000 | 0.174000 | 0.082000 |
| 75% | 117.250000 | 6.168500 | 1.197750 | 1.463000 | 0.777250 | 0.578500 | 0.239000 | 0.137000 |
| max | 156.000000 | 7.632000 | 2.096000 | 1.644000 | 1.030000 | 0.724000 | 0.598000 | 0.457000 |
print(happiness2019.shape)
happiness2019.describe()
(156, 10)
| | Happiness Rank | Happiness Score | Economy (GDP per Capita) | Social Support | Health (Life Expectancy) | Freedom | Generosity | Trust (Government Corruption) |
|---|---|---|---|---|---|---|---|---|
| count | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 |
| mean | 78.500000 | 5.407096 | 0.905147 | 1.208814 | 0.725244 | 0.392571 | 0.184846 | 0.110603 |
| std | 45.177428 | 1.113120 | 0.398389 | 0.299191 | 0.242124 | 0.143289 | 0.095254 | 0.094538 |
| min | 1.000000 | 2.853000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 39.750000 | 4.544500 | 0.602750 | 1.055750 | 0.547750 | 0.308000 | 0.108750 | 0.047000 |
| 50% | 78.500000 | 5.379500 | 0.960000 | 1.271500 | 0.789000 | 0.417000 | 0.177500 | 0.085500 |
| 75% | 117.250000 | 6.184500 | 1.232500 | 1.452500 | 0.881750 | 0.507250 | 0.248250 | 0.141250 |
| max | 156.000000 | 7.769000 | 1.684000 | 1.624000 | 1.141000 | 0.631000 | 0.566000 | 0.453000 |
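The per-year summaries above are easier to compare once the frames are stacked into one long table. This is only a sketch under stated assumptions: the `frames` dictionary and its numbers are placeholders rather than values from the report, and `Year` is a column added here for illustration.

```python
import pandas as pd

# Toy per-year frames; the numbers are placeholders, not values from the report.
frames = {
    2018: pd.DataFrame({"Country": ["A", "B"], "Happiness Score": [7.0, 5.0]}),
    2019: pd.DataFrame({"Country": ["A", "B"], "Happiness Score": [7.5, 4.5]}),
}

# Stack the years into one long table with an explicit Year column.
combined = pd.concat(
    [df.assign(Year=year) for year, df in frames.items()], ignore_index=True
)

# Mean and standard deviation of the score per year.
summary = combined.groupby("Year")["Happiness Score"].agg(["mean", "std"])
print(summary)
```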
Visualization
Some country names in the datasets differ from the names Pyecharts uses, so we must rename those countries to names Pyecharts can recognize.
countries = {
"South Korea": "Korea",
"Congo (Kinshasa)": "Dem. Rep. Congo",
"Congo (Brazzaville)": "Congo",
"South Sudan": "S. Sudan",
"Somaliland region": "Somalia",
"Central African Republic": "Central African Rep.",
"Ivory Coast": "Côte d'Ivoire",
"Dominican Republic": "Dominican Rep.",
"Czech Republic": "Czech Rep.",
"North Korea": "Dem. Rep. Korea",
}
names = locals()
for i in range(2015, 2020):
    df_name = names["happiness" + str(i)]
    # Rename every matching country in place; names that do not occur in a
    # given year are simply left unchanged, so no try/except is needed.
    df_name["Country"] = df_name["Country"].replace(countries)
Put all info into a dictionary for visualization.
data = dict()
data[2015] = happiness2015.to_dict("list")
data[2016] = happiness2016.to_dict("list")
data[2017] = happiness2017.to_dict("list")
data[2018] = happiness2018.to_dict("list")
data[2019] = happiness2019.to_dict("list")
from pyecharts.charts import Map, Timeline, Tab
from pyecharts import options as opts
import math
col_list = [
"Happiness Score",
"Economy (GDP per Capita)",
"Family",
"Health (Life Expectancy)",
"Freedom",
"Trust (Government Corruption)",
"Generosity",
"Dystopia Residual",
]
tab = Tab()
for col in col_list:
    tl = Timeline()
    for year in range(2015, 2020):
        try:
            data_pair = []
            # range of values for each feature
            minimum = min(names["happiness" + str(year)][col].round(2))
            maximum = max(names["happiness" + str(year)][col].round(2))
            data_pair = [
                list(z)
                for z in zip(
                    names["happiness" + str(year)]["Country"].to_list(),
                    names["happiness" + str(year)][col].round(3),
                )
            ]
            c = (
                Map()
                .add(col, data_pair, "world", is_map_symbol_show=False)
                .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
                .set_global_opts(
                    title_opts=opts.TitleOpts(title=f"World {col} Index of {year}"),
                    visualmap_opts=opts.VisualMapOpts(
                        min_=minimum, max_=maximum, is_piecewise=False
                    ),
                    legend_opts=opts.LegendOpts(is_show=False),
                )
            )
            tl = tl.add(c, year).add_schema(
                is_auto_play=False, play_interval=1000, is_loop_play=True
            )
        except KeyError:
            # The feature is missing from this year's dataset; skip the year.
            pass
    tab = tab.add(tl, col)
tab.render_notebook()
# Click a tab above the charts to see the distribution of each feature.
# Click the timeline below the charts to switch between years.
# Click the play button to step through the years automatically.
In developed regions such as North America, Oceania, and Europe, happiness scores are higher than in Asia and Africa.
The maps for GDP, Freedom, Trust, and Generosity look similar, so we guess that higher GDP, Freedom, Trust, and Generosity are correlated with a higher happiness score.
The Family Index does not differ much across countries, but the values are low in war-torn regions such as Iraq and Afghanistan and in poor countries such as Sudan and some other African countries. The Health Index shows a similar distribution.
The Dystopia Residual Index is extremely high in Latin America; among the other countries there is not much difference.
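The guess about correlation can be checked numerically as well as visually. As a minimal sketch, assuming Pearson correlation is an acceptable measure here, `DataFrame.corr` gives the correlation of each feature with the Happiness Score. The frame below holds placeholder values chosen only so that the example runs; it is not taken from the datasets.

```python
import pandas as pd

# Toy data: placeholder values, not taken from the happiness datasets.
toy = pd.DataFrame(
    {
        "Happiness Score": [7.5, 6.8, 5.9, 5.1, 4.2, 3.6],
        "Economy (GDP per Capita)": [1.40, 1.30, 1.00, 0.90, 0.50, 0.30],
        "Generosity": [0.30, 0.10, 0.25, 0.15, 0.20, 0.22],
    }
)

# Pearson correlation of every feature with the happiness score, strongest first.
corr = (
    toy.corr(method="pearson")["Happiness Score"]
    .drop("Happiness Score")
    .sort_values(ascending=False)
)
print(corr)
```

Applied to one of the real yearly frames, the same call would need the non-numeric columns (Country, Region) dropped first.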
from pyecharts.charts import Scatter
from pyecharts.commons.utils import JsCode
col_list = [
"Economy (GDP per Capita)",
"Family",
"Health (Life Expectancy)",
"Freedom",
"Trust (Government Corruption)",
"Generosity",
"Dystopia Residual",
]
tab = Tab()
for col in col_list:
    tl = Timeline()
    # tooltip format: "Region - Country", then the x (score) and y (feature) values
    js_code_str = """
        function(params){
            return params.data[4]+' - '+params.data[3]+'<br/>'
                +'X: '+params.data[5]+'<br/>'
                +'Y: '+params.data[1];
        }
        """
    for year in range(2015, 2020):
        try:
            df = names["happiness" + str(year)]
            region_list = df["Region"].unique().tolist()
            # Encode each region as an integer so the visual map can color by it.
            df["R_transformed"] = df["Region"].apply(lambda x: region_list.index(x))
            y_data = [
                z
                for z in zip(
                    df[col].round(3),
                    df["R_transformed"].to_list(),
                    df["Country"],
                    df["Region"],
                    df["Happiness Score"].round(3),
                )
            ]
            c = (
                Scatter()
                .add_xaxis(df["Happiness Score"].to_list())
                .add_yaxis("", y_data)
                .set_global_opts(
                    title_opts=opts.TitleOpts(
                        title=f"Relationship between Happiness Score and {col} in {year}"
                    ),
                    visualmap_opts=[
                        opts.VisualMapOpts(
                            is_show=False, type_="color", dimension=2, min_=0, max_=10
                        )
                    ],
                    xaxis_opts=opts.AxisOpts(
                        type_="value",
                        name="Happiness Score",
                        is_scale=True,
                        name_location="middle",
                    ),
                    yaxis_opts=opts.AxisOpts(type_="value", name=col, is_scale=True),
                    tooltip_opts=opts.TooltipOpts(formatter=JsCode(js_code_str)),
                )
                .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
            )
            tl = tl.add(c, year).add_schema(
                is_auto_play=False, play_interval=1000, is_loop_play=True
            )
        except KeyError:
            # The feature is missing from this year's dataset; skip the year.
            pass
    tab = tab.add(tl, col)
tab.render_notebook()
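The scatter code colors points by turning each region into an integer, namely its position in the list of unique regions. A small sketch on a toy column (the labels are placeholders) shows that `pd.factorize` produces the same codes in one vectorized call, since it also numbers labels in order of first appearance:

```python
import pandas as pd

# Toy region column (placeholder labels).
regions = pd.Series(["West", "East", "West", "South", "East"])

# Approach used in the scatter code: position in the list of unique values.
region_list = regions.unique().tolist()
codes_by_index = regions.apply(region_list.index)

# pd.factorize numbers the labels in order of first appearance.
codes, uniques = pd.factorize(regions)

print(codes.tolist())   # [0, 1, 0, 2, 1]
print(list(uniques))    # ['West', 'East', 'South']
```

One difference to keep in mind: `pd.factorize` assigns -1 to missing values, so any remaining null regions would need handling before the codes are passed to the visual map.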
The scatter plots support our guess that a higher GDP and Health Index are correlated with a higher happiness score. However, a higher Freedom Index or Generosity Index doesn't necessarily mean a higher Happiness Score. The Trust Index shows little correlation for countries with a Happiness Score below 6, while for scores above 6 the positive correlation is much stronger. The Family Index and Dystopia Residual Index also show a positive correlation with Happiness Score to some extent.
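The Trust observation can be tested by computing the correlation separately on each side of the threshold. The sketch below assumes the threshold of 6 refers to the Happiness Score, since the Trust values in the summary tables never exceed about 0.55. The data are placeholders built so that the two groups behave differently, and `trust_correlation` and `THRESHOLD` are illustrative names.

```python
import pandas as pd

# Toy data: placeholder values, not taken from the happiness datasets.
toy = pd.DataFrame(
    {
        "Happiness Score": [3.5, 4.0, 4.5, 5.0, 5.5, 6.2, 6.6, 7.0, 7.4],
        "Trust (Government Corruption)": [
            0.10, 0.06, 0.12, 0.05, 0.09, 0.15, 0.22, 0.30, 0.41,
        ],
    }
)


def trust_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between trust and happiness score within df."""
    return df["Trust (Government Corruption)"].corr(df["Happiness Score"])


THRESHOLD = 6.0  # assumed split point on the Happiness Score axis
low = toy[toy["Happiness Score"] < THRESHOLD]
high = toy[toy["Happiness Score"] >= THRESHOLD]

print("below threshold:", round(trust_correlation(low), 3))
print("at or above threshold:", round(trust_correlation(high), 3))
```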
Countries in North America, Western Europe, and Latin America, as well as Australia, New Zealand, Singapore, and some countries in the Middle East, rank among the happiest in the world, and most of them are also among the richest. As for the countries in Latin America, I think citizens' happiness may be rooted in their nature, which might be explained by their enthusiasm and optimism.