This report explores world happiness. We visualize the distribution of countries' happiness scores from 2015 to 2019 and analyze features such as GDP, Health, and Trust to explore how they contribute to the happiness score.

In case you need the source data, visit this webpage:
Data Source

Exploring Data

The column names differ across the five datasets. The meaning of each column is as follows:

Country / Country or region: Name of the country.

Region: Region the country belongs to.

Happiness Rank / Overall rank: Rank of the country based on the Happiness Score.

Happiness Score / Score: A metric measured in 2015 by asking the sampled people the question: "How would you rate your happiness on a scale of 0 to 10, where 10 is the happiest?"

Standard Error: The standard error of the happiness score.

Economy (GDP per Capita) / Economy..GDP.per.Capita. / GDP per capita: The extent to which GDP contributes to the calculation of the Happiness Score.

Family: The extent to which Family contributes to the calculation of the Happiness Score.

Health (Life Expectancy) / Health..Life.Expectancy. / Healthy life expectancy: The extent to which Life Expectancy contributes to the calculation of the Happiness Score.

Freedom / Freedom to make life choices: The extent to which Freedom contributes to the calculation of the Happiness Score.

Trust (Government Corruption) / Trust..Government.Corruption. / Perceptions of corruption: The extent to which Perception of Corruption contributes to the calculation of the Happiness Score.

Generosity: The extent to which Generosity contributes to the calculation of the Happiness Score.

Dystopia Residual / Dystopia.Residual: The extent to which Dystopia Residual contributes to the calculation of the Happiness Score.

Lower Confidence Interval: Lower bound of the confidence interval of the Happiness Score.

Upper Confidence Interval: Upper bound of the confidence interval of the Happiness Score.

Whisker.high: Upper whisker of the Happiness Score.

Whisker.low: Lower whisker of the Happiness Score.

Social support: The extent to which Social Support contributes to the calculation of the Happiness Score.

First we need to have a look at the column names.

import pandas as pd
import numpy as np

years = list(range(2015, 2020))
names = locals()
for year in years:
    names["happiness" + str(year)] = pd.read_csv(f"{year}.csv")
    print(names["happiness" + str(year)].columns)

Index(['Country', 'Region', 'Happiness Rank', 'Happiness Score',
       'Standard Error', 'Economy (GDP per Capita)', 'Family',
       'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)',
       'Generosity', 'Dystopia Residual'],
      dtype='object')
Index(['Country', 'Region', 'Happiness Rank', 'Happiness Score',
       'Lower Confidence Interval', 'Upper Confidence Interval',
       'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)',
       'Freedom', 'Trust (Government Corruption)', 'Generosity',
       'Dystopia Residual'],
      dtype='object')
Index(['Country', 'Happiness.Rank', 'Happiness.Score', 'Whisker.high',
       'Whisker.low', 'Economy..GDP.per.Capita.', 'Family',
       'Health..Life.Expectancy.', 'Freedom', 'Generosity',
       'Trust..Government.Corruption.', 'Dystopia.Residual'],
      dtype='object')
Index(['Overall rank', 'Country or region', 'Score', 'GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption'],
      dtype='object')
Index(['Overall rank', 'Country or region', 'Score', 'GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption'],
      dtype='object')

Then we need to unify the column names.

happiness2017.columns = [
    "Country",
    "Happiness Rank",
    "Happiness Score",
    "Whisker High",
    "Whisker Low",
    "Economy (GDP per Capita)",
    "Family",
    "Health (Life Expectancy)",
    "Freedom",
    "Generosity",
    "Trust (Government Corruption)",
    "Dystopia Residual",
]

happiness2018.columns = [
    "Happiness Rank",
    "Country",
    "Happiness Score",
    "Economy (GDP per Capita)",
    "Social Support",
    "Health (Life Expectancy)",
    "Freedom",
    "Generosity",
    "Trust (Government Corruption)",
]

happiness2019.columns = [
    "Happiness Rank",
    "Country",
    "Happiness Score",
    "Economy (GDP per Capita)",
    "Social Support",
    "Health (Life Expectancy)",
    "Freedom",
    "Generosity",
    "Trust (Government Corruption)",
]
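The assignments above rely on the columns arriving in a fixed order. As a hedged alternative, the sketch below renames columns by name instead of by position. The `COLUMN_ALIASES` table and the `unify_columns` helper are hypothetical names introduced here for illustration; the aliases themselves are taken from the column listings printed earlier, and the toy values are made up.

```python
import pandas as pd

# Hypothetical lookup table: each spelling seen in the printed column lists
# maps to the unified name used in this report.
COLUMN_ALIASES = {
    "Country or region": "Country",
    "Overall rank": "Happiness Rank",
    "Happiness.Rank": "Happiness Rank",
    "Score": "Happiness Score",
    "Happiness.Score": "Happiness Score",
    "GDP per capita": "Economy (GDP per Capita)",
    "Economy..GDP.per.Capita.": "Economy (GDP per Capita)",
    "Healthy life expectancy": "Health (Life Expectancy)",
    "Health..Life.Expectancy.": "Health (Life Expectancy)",
    "Freedom to make life choices": "Freedom",
    "Perceptions of corruption": "Trust (Government Corruption)",
    "Trust..Government.Corruption.": "Trust (Government Corruption)",
    "Dystopia.Residual": "Dystopia Residual",
    "Whisker.high": "Whisker High",
    "Whisker.low": "Whisker Low",
    "Social support": "Social Support",
}


def unify_columns(df):
    """Return a copy of df with known aliases renamed; other columns are left as-is."""
    return df.rename(columns=COLUMN_ALIASES)


# Toy frame (made-up values) using the 2018/2019 header style.
toy = pd.DataFrame(
    {"Overall rank": [1], "Country or region": ["Country A"], "Score": [7.0]}
)
unified = unify_columns(toy)
print(list(unified.columns))
```

Because `rename` ignores keys that are not present, the same helper could be applied to every year's frame without caring which header style it uses.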

# Add Region column to happiness2017, happiness2018, and happiness2019
names = locals()
for year in range(2017, 2020):
    for i in range(2015, 2017):
        # The first merge adds "Region"; the second adds "Region_x" (2015) and "Region_y" (2016)
        names["happiness" + str(year)] = names["happiness" + str(year)].merge(
            names["happiness" + str(i)][["Country", "Region"]], on="Country", how="left"
        )
    df = names["happiness" + str(year)]
    # Use the 2015 region where available, otherwise fall back to the 2016 region
    df["Region"] = df["Region_x"].fillna(df["Region_y"])
    names["happiness" + str(year)] = df.drop(["Region_x", "Region_y"], axis=1)
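To make the region lookup easier to follow, here is a minimal sketch of the same idea on made-up data: left-merge two lookup tables onto a target frame, then take the first available region. The frames `lookup_a`, `lookup_b`, and `target` are hypothetical stand-ins for the 2015 data, the 2016 data, and a later year.

```python
import pandas as pd

# Made-up lookup tables standing in for the 2015 and 2016 data.
lookup_a = pd.DataFrame({"Country": ["A", "B"], "Region": ["North", "South"]})
lookup_b = pd.DataFrame({"Country": ["B", "C"], "Region": ["South", "East"]})

# Made-up target frame standing in for a later year without a Region column.
target = pd.DataFrame(
    {"Country": ["A", "B", "C", "D"], "Happiness Score": [7.0, 6.0, 5.0, 4.0]}
)

merged = (
    target
    # First merge: adds a plain "Region" column from lookup_a.
    .merge(lookup_a, on="Country", how="left")
    # Second merge: both sides now have "Region", so suffixes are applied.
    .merge(lookup_b, on="Country", how="left", suffixes=("_a", "_b"))
)

# Prefer lookup_a's region, fall back to lookup_b's, then drop the helpers.
merged["Region"] = merged["Region_a"].fillna(merged["Region_b"])
merged = merged.drop(columns=["Region_a", "Region_b"])
print(merged)
```

Country "D" appears in neither lookup table, so its region stays missing. A left merge keeps one output row per target row only if each lookup table lists every country at most once.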

Because the set of ranked countries differs from year to year and the names of some countries changed, the Region column still has some null values. We need to fill these null values manually.

happiness2017[happiness2017.isnull().T.any()]

	Country	Happiness Rank	Happiness Score	Whisker High	Whisker Low	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Generosity	Trust (Government Corruption)	Dystopia Residual	Region
32	Taiwan Province of China	33	6.422	6.494596	6.349404	1.433627	1.384565	0.793984	0.361467	0.258360	0.063829	2.126607	NaN
70	Hong Kong S.A.R., China	71	5.472	5.549594	5.394406	1.551675	1.262791	0.943062	0.490969	0.374466	0.293934	0.554633	NaN

happiness2017.loc[32, "Region"] = 'Eastern Asia'
happiness2017.loc[70, "Region"] = 'Eastern Asia'

happiness2018[happiness2018.isnull().T.any()]

	Happiness Rank	Country	Happiness Score	Economy (GDP per Capita)	Social Support	Health (Life Expectancy)	Freedom	Generosity	Trust (Government Corruption)	Region
19	20	United Arab Emirates	6.774	2.096	0.776	0.670	0.284	0.186	NaN	Middle East and Northern Africa
37	38	Trinidad & Tobago	6.192	1.223	1.492	0.564	0.575	0.171	0.019	NaN
57	58	Northern Cyprus	5.835	1.229	1.211	0.909	0.495	0.179	0.154	NaN

happiness2018.loc[37, "Region"] = 'Latin America and Caribbean'
happiness2018.loc[57, "Region"] = 'Western Europe'

happiness2019[happiness2019.isnull().T.any()]

	Happiness Rank	Country	Happiness Score	Economy (GDP per Capita)	Social Support	Health (Life Expectancy)	Freedom	Generosity	Trust (Government Corruption)	Region
38	39	Trinidad & Tobago	6.192	1.231	1.477	0.713	0.489	0.185	0.016	NaN
63	64	Northern Cyprus	5.718	1.263	1.252	1.042	0.417	0.191	0.162	NaN
83	84	North Macedonia	5.274	0.983	1.294	0.838	0.345	0.185	0.034	NaN
119	120	Gambia	4.516	0.308	0.939	0.428	0.382	0.269	0.167	NaN

happiness2019.loc[38, "Region"] = 'Latin America and Caribbean'
happiness2019.loc[63, "Region"] = 'Western Europe'
happiness2019.loc[83, "Region"] = 'Central and Eastern Europe'
happiness2019.loc[119, "Region"] = 'Sub-Saharan Africa'
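The manual fixes above address rows by their integer index, which would silently point at the wrong country if the row order changed. A hedged alternative is to fill by country name. In the sketch below, `MANUAL_REGIONS` and `fill_missing_regions` are hypothetical names; the country-to-region pairs are the ones assigned above, and the toy frame is made up.

```python
import pandas as pd

# Country -> region pairs taken from the manual assignments in this report.
MANUAL_REGIONS = {
    "Taiwan Province of China": "Eastern Asia",
    "Hong Kong S.A.R., China": "Eastern Asia",
    "Trinidad & Tobago": "Latin America and Caribbean",
    "Northern Cyprus": "Western Europe",
    "North Macedonia": "Central and Eastern Europe",
    "Gambia": "Sub-Saharan Africa",
}


def fill_missing_regions(df, manual=MANUAL_REGIONS):
    """Return a copy of df where missing Region values are looked up by Country."""
    out = df.copy()
    missing = out["Region"].isna()
    # Only rows with a missing Region are touched; unknown countries stay missing.
    out.loc[missing, "Region"] = out.loc[missing, "Country"].map(manual)
    return out


# Toy frame: two countries with a missing region, one already filled.
toy = pd.DataFrame(
    {
        "Country": ["Gambia", "Northern Cyprus", "Elsewhere"],
        "Region": [None, None, "Some Region"],
    }
)
filled = fill_missing_regions(toy)
print(filled)
```

The same dictionary could then be reused for 2017, 2018, and 2019, since a country missing from one year's frame is simply never looked up.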

Now let's look at the indicators of each dataset.

print(happiness2015.shape)
happiness2015.describe()

(158, 12)

	Happiness Rank	Happiness Score	Standard Error	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual
count	158.000000	158.000000	158.000000	158.000000	158.000000	158.000000	158.000000	158.000000	158.000000	158.000000
mean	79.493671	5.375734	0.047885	0.846137	0.991046	0.630259	0.428615	0.143422	0.237296	2.098977
std	45.754363	1.145010	0.017146	0.403121	0.272369	0.247078	0.150693	0.120034	0.126685	0.553550
min	1.000000	2.839000	0.018480	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.328580
25%	40.250000	4.526000	0.037268	0.545808	0.856823	0.439185	0.328330	0.061675	0.150553	1.759410
50%	79.500000	5.232500	0.043940	0.910245	1.029510	0.696705	0.435515	0.107220	0.216130	2.095415
75%	118.750000	6.243750	0.052300	1.158448	1.214405	0.811013	0.549092	0.180255	0.309883	2.462415
max	158.000000	7.587000	0.136930	1.690420	1.402230	1.025250	0.669730	0.551910	0.795880	3.602140

print(happiness2016.shape)
happiness2016.describe()

(157, 13)

	Happiness Rank	Happiness Score	Lower Confidence Interval	Upper Confidence Interval	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual
count	157.000000	157.000000	157.000000	157.000000	157.000000	157.000000	157.000000	157.000000	157.000000	157.000000	157.000000
mean	78.980892	5.382185	5.282395	5.481975	0.953880	0.793621	0.557619	0.370994	0.137624	0.242635	2.325807
std	45.466030	1.141674	1.148043	1.136493	0.412595	0.266706	0.229349	0.145507	0.111038	0.133756	0.542220
min	1.000000	2.905000	2.732000	3.078000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.817890
25%	40.000000	4.404000	4.327000	4.465000	0.670240	0.641840	0.382910	0.257480	0.061260	0.154570	2.031710
50%	79.000000	5.314000	5.237000	5.419000	1.027800	0.841420	0.596590	0.397470	0.105470	0.222450	2.290740
75%	118.000000	6.269000	6.154000	6.434000	1.279640	1.021520	0.729930	0.484530	0.175540	0.311850	2.664650
max	157.000000	7.526000	7.460000	7.669000	1.824270	1.183260	0.952770	0.608480	0.505210	0.819710	3.837720

print(happiness2017.shape)
happiness2017.describe()

(155, 13)

	Happiness Rank	Happiness Score	Whisker High	Whisker Low	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Generosity	Trust (Government Corruption)	Dystopia Residual
count	155.000000	155.000000	155.000000	155.000000	155.000000	155.000000	155.000000	155.000000	155.000000	155.000000	155.000000
mean	78.000000	5.354019	5.452326	5.255713	0.984718	1.188898	0.551341	0.408786	0.246883	0.123120	1.850238
std	44.888751	1.131230	1.118542	1.145030	0.420793	0.287263	0.237073	0.149997	0.134780	0.101661	0.500028
min	1.000000	2.693000	2.864884	2.521116	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.377914
25%	39.500000	4.505500	4.608172	4.374955	0.663371	1.042635	0.369866	0.303677	0.154106	0.057271	1.591291
50%	78.000000	5.279000	5.370032	5.193152	1.064578	1.253918	0.606042	0.437454	0.231538	0.089848	1.832910
75%	116.500000	6.101500	6.194600	6.006527	1.318027	1.414316	0.723008	0.516561	0.323762	0.153296	2.144654
max	155.000000	7.537000	7.622030	7.479556	1.870766	1.610574	0.949492	0.658249	0.838075	0.464308	3.117485

print(happiness2018.shape)
happiness2018.describe()

(156, 10)

	Happiness Rank	Happiness Score	Economy (GDP per Capita)	Social Support	Health (Life Expectancy)	Freedom	Generosity	Trust (Government Corruption)
count	156.000000	156.000000	156.000000	156.000000	156.000000	156.000000	156.000000	155.000000
mean	78.500000	5.375917	0.891449	1.213237	0.597346	0.454506	0.181006	0.112000
std	45.177428	1.119506	0.391921	0.302372	0.247579	0.162424	0.098471	0.096492
min	1.000000	2.905000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
25%	39.750000	4.453750	0.616250	1.066750	0.422250	0.356000	0.109500	0.051000
50%	78.500000	5.378000	0.949500	1.255000	0.644000	0.487000	0.174000	0.082000
75%	117.250000	6.168500	1.197750	1.463000	0.777250	0.578500	0.239000	0.137000
max	156.000000	7.632000	2.096000	1.644000	1.030000	0.724000	0.598000	0.457000

print(happiness2019.shape)
happiness2019.describe()

(156, 10)

	Happiness Rank	Happiness Score	Economy (GDP per Capita)	Social Support	Health (Life Expectancy)	Freedom	Generosity	Trust (Government Corruption)
count	156.000000	156.000000	156.000000	156.000000	156.000000	156.000000	156.000000	156.000000
mean	78.500000	5.407096	0.905147	1.208814	0.725244	0.392571	0.184846	0.110603
std	45.177428	1.113120	0.398389	0.299191	0.242124	0.143289	0.095254	0.094538
min	1.000000	2.853000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
25%	39.750000	4.544500	0.602750	1.055750	0.547750	0.308000	0.108750	0.047000
50%	78.500000	5.379500	0.960000	1.271500	0.789000	0.417000	0.177500	0.085500
75%	117.250000	6.184500	1.232500	1.452500	0.881750	0.507250	0.248250	0.141250
max	156.000000	7.769000	1.684000	1.624000	1.141000	0.631000	0.566000	0.453000
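The per-year summaries above are easier to compare side by side. One possible approach, sketched here on made-up frames, is to stack the yearly data with a `Year` column and aggregate per year. The `frames` dictionary and its values are hypothetical; in the notebook it would hold the five unified dataframes.

```python
import pandas as pd

# Made-up stand-ins for two of the yearly dataframes.
frames = {
    2015: pd.DataFrame({"Country": ["A", "B"], "Happiness Score": [7.0, 5.0]}),
    2016: pd.DataFrame({"Country": ["A", "B"], "Happiness Score": [7.2, 5.2]}),
}

# Tag each frame with its year and stack them into one long table.
combined = pd.concat(
    [df.assign(Year=year) for year, df in frames.items()],
    ignore_index=True,
)

# One row per year with the same statistics describe() reports.
yearly = combined.groupby("Year")["Happiness Score"].agg(["count", "mean", "min", "max"])
print(yearly)
```

Columns that exist only in some years (for example Family or Social Support) would simply contain missing values for the other years after `concat`.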

Visualization

Some country names in Pyecharts differ from those in the datasets, so we must rename those countries to the names Pyecharts can recognize.

countries = {
    "South Korea": "Korea",
    "Congo (Kinshasa)": "Dem. Rep. Congo",
    "Congo (Brazzaville)": "Congo",
    "South Sudan": "S. Sudan",
    "Somaliland region": "Somalia",
    "Central African Republic": "Central African Rep.",
    "Ivory Coast": "Côte d'Ivoire",
    "Dominican Republic": "Dominican Rep.",
    "Czech Republic": "Czech Rep.",
    "North Korea": "Dem. Rep. Korea",
}

names = locals()
for i in range(2015, 2020):
    df_name = names["happiness" + str(i)]
    # Replace only the names listed in `countries`; all other names are left unchanged
    df_name["Country"] = df_name["Country"].replace(countries)

Put all info into a dictionary for visualization.

data = dict()
data[2015] = happiness2015.to_dict("list")
data[2016] = happiness2016.to_dict("list")
data[2017] = happiness2017.to_dict("list")
data[2018] = happiness2018.to_dict("list")
data[2019] = happiness2019.to_dict("list")

from pyecharts.charts import Map, Timeline, Tab
from pyecharts import options as opts
import math

col_list = [
    "Happiness Score",
    "Economy (GDP per Capita)",
    "Family",
    "Health (Life Expectancy)",
    "Freedom",
    "Trust (Government Corruption)",
    "Generosity",
    "Dystopia Residual",
]

tab = Tab()
for col in col_list:
    tl = Timeline()
    for year in range(2015, 2020):
        try:
            data_pair = []
            # range of values for each feature (Series.min/max skip missing values)
            minimum = round(float(names["happiness" + str(year)][col].min()), 2)
            maximum = round(float(names["happiness" + str(year)][col].max()), 2)

            data_pair = [
                list(z)
                for z in zip(
                    names["happiness" + str(year)]["Country"].to_list(),
                    names["happiness" + str(year)][col].round(3),
                )
            ]

            c = (
                Map()
                .add(col, data_pair, "world", is_map_symbol_show=False)
                .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
                .set_global_opts(
                    title_opts=opts.TitleOpts(title=f"World {col} Index of {year}"),
                    visualmap_opts=opts.VisualMapOpts(
                        min_=minimum, max_=maximum, is_piecewise=False
                    ),
                    legend_opts=opts.LegendOpts(is_show=False),
                )
            )
            tl = tl.add(c, year).add_schema(
                is_auto_play=False, play_interval=1000, is_loop_play=True
            )
        except KeyError:
            pass
    tab = tab.add(tl, col)
tab.render_notebook()

# Click a tab above the charts to see the distribution of each feature.
# Click the timeline below the charts to switch between years.
# Click the play button to automatically step through the years.
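One detail worth checking when computing the color range of the map: the 2018 data has one missing value in the Trust column (its count is 155 rather than 156). The sketch below, on made-up numbers, shows how Python's built-in `min` and the pandas `Series.min` method treat a missing value differently.

```python
import math

import pandas as pd

# Made-up numbers with one missing value in first position.
values = [float("nan"), 0.3, 0.1]
series = pd.Series(values)

# Every comparison with NaN is False, so the built-in keeps the leading NaN.
builtin_min = min(values)

# pandas skips missing values by default (skipna=True).
pandas_min = series.min()
pandas_max = series.max()

print(math.isnan(builtin_min), pandas_min, pandas_max)
```

With the built-in functions the result depends on where the NaN sits in the sequence, so the pandas methods are the safer choice for a value range.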

In developed countries, like those in North America, Oceania, and Europe, happiness scores are higher than in countries in Asia and Africa.

The situation is similar for GDP, Freedom, Trust, and Generosity, so we guess that higher GDP, Freedom, Trust, and Generosity correlate with a higher happiness score.

The World Family Index doesn't differ much across countries, but in war regions, like Iraq and Afghanistan, and in poor countries, like Sudan and some other countries in Africa, the values are low. The Health Index shows a similar distribution.

The Dystopia Residual Index is extremely high in Latin America, while it doesn't differ much among the other countries.

from pyecharts.charts import Scatter
from pyecharts.commons.utils import JsCode

col_list = [
    "Economy (GDP per Capita)",
    "Family",
    "Health (Life Expectancy)",
    "Freedom",
    "Trust (Government Corruption)",
    "Generosity",
    "Dystopia Residual",
]
tab = Tab()
for col in col_list:
    tl = Timeline()
    # tooltip format
    js_code_str = """
function(params){
return params.data[4]+' - '+params.data[3]+'<br/>'
+'X: '+params.data[5]+'<br/>'
+'Y: '+params.data[1];
}
"""
    for year in range(2015, 2020):
        try:
            df = names["happiness" + str(year)]
            region_list = df["Region"].unique().tolist()
            df["R_transformed"] = df["Region"].apply(lambda x: region_list.index(x))
            y_data = [
                z
                for z in zip(
                    df[col].round(3),
                    df["R_transformed"].to_list(),
                    df["Country"],
                    df["Region"],
                    df["Happiness Score"].round(3),
                )
            ]
            c = (
                Scatter()
                .add_xaxis(df["Happiness Score"].to_list())
                .add_yaxis("", y_data)
                .set_global_opts(
                    title_opts=opts.TitleOpts(
                        title=f"Relationship between Happiness Score and {col} in {year}"
                    ),
                    visualmap_opts=[
                        opts.VisualMapOpts(
                            is_show=False, type_="color", dimension=2, min_=0, max_=10
                        )
                    ],
                    xaxis_opts=opts.AxisOpts(
                        type_="value",
                        name="Happiness Score",
                        is_scale=True,
                        name_location="middle",
                    ),
                    yaxis_opts=opts.AxisOpts(type_="value", name=col, is_scale=True),
                    tooltip_opts=opts.TooltipOpts(formatter=JsCode(js_code_str)),
                )
                .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
            )
            tl = tl.add(c, year).add_schema(
                is_auto_play=False, play_interval=1000, is_loop_play=True
            )
        except KeyError:
            pass
    tab = tab.add(tl, col)
tab.render_notebook()
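The scatter code colors points by turning each region name into an integer via `region_list.index(x)`. As a hedged aside, `pd.factorize` produces the same codes (numbered in order of first appearance) in a single call. The sketch below compares both on a made-up Region column.

```python
import pandas as pd

# Made-up Region column with a repeated value.
df = pd.DataFrame(
    {"Region": ["Western Europe", "Eastern Asia", "Western Europe", "Sub-Saharan Africa"]}
)

# Approach used in the scatter code: position of each name in the unique list.
region_list = df["Region"].unique().tolist()
codes_via_index = df["Region"].apply(region_list.index).tolist()

# Equivalent one-call encoding; codes follow order of first appearance.
codes, uniques = pd.factorize(df["Region"])

print(codes_via_index, codes.tolist(), list(uniques))
```

One difference to keep in mind: `pd.factorize` assigns the code -1 to missing values, so rows without a region would need separate handling before being passed to the chart.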

The scatter plots support our guess that higher GDP and Health Index values correlate with a higher happiness score. However, a higher Freedom Index or Generosity Index doesn't necessarily mean a higher Happiness Score. The Trust Index shows little correlation with the Happiness Score when the score is below 6; above 6, the positive correlation is much stronger. The Family Index and Dystopia Residual Index also show a positive correlation with the Happiness Score to some extent.
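The visual impressions above can be quantified with Pearson correlation coefficients. The sketch below uses a made-up frame (the numbers are invented, not taken from the datasets) to show the overall correlation of each feature with the Happiness Score, and the Trust correlation computed separately below and above a score of 6.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame(
    {
        "Happiness Score": [3.0, 4.0, 5.0, 6.5, 7.0, 7.5],
        "Economy (GDP per Capita)": [0.2, 0.5, 0.8, 1.1, 1.3, 1.5],
        "Trust (Government Corruption)": [0.10, 0.08, 0.11, 0.20, 0.30, 0.40],
    }
)

# Correlation of every feature with the Happiness Score.
overall = toy.corr()["Happiness Score"].drop("Happiness Score")

# Split the countries at a score threshold and correlate Trust within each group.
threshold = 6.0
low = toy[toy["Happiness Score"] < threshold]
high = toy[toy["Happiness Score"] >= threshold]
trust_low = low["Happiness Score"].corr(low["Trust (Government Corruption)"])
trust_high = high["Happiness Score"].corr(high["Trust (Government Corruption)"])

print(overall)
print(trust_low, trust_high)
```

Applied to the real yearly frames, the same calls would show whether the pattern seen in the plots holds numerically. Correlations computed on small subsets are noisy, so they are best read alongside the scatter plots rather than instead of them.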

Countries in North America, Western Europe, and Latin America, as well as Australia, New Zealand, Singapore, and some countries in the Middle East, rank among the happiest in the world, and most of them are also among the richest. As for the countries in Latin America, I think citizens' happiness may be rooted in their nature, which might be explained by their enthusiasm and optimism.