The dataset comes from the National Center for Education Statistics. In this article, I will demonstrate some data cleaning and dataframe manipulation techniques. Hope this helps!
For everything you may need, visit:
Source Data
nces-ed-attainment.csv
National Center for Education.py

The original dataset is titled: Percentage of persons 25 to 29 years old with selected levels of educational attainment, by race/ethnicity and sex: Selected years, 1920 through 2018. The cleaned version has columns for Year, Sex, Educational Attainment, and race/ethnicity categories considered in the dataset. Note that not all columns will have data starting at 1920.

Exploring the data

# packages needed
import pandas as pd
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('nces-ed-attainment.csv')
df.head()

	Year	Sex	Min degree	Total	White	Black	Hispanic	Asian	Pacific Islander	American Indian/Alaska Native	Two or more races
0	1920	A	high school	---	22.0	6.3	---	---	---	---	---
1	1940	A	high school	38.1	41.2	12.3	---	---	---	---	---
2	1950	A	high school	52.8	56.3	23.6	---	---	---	---	---
3	1960	A	high school	60.7	63.7	38.6	---	---	---	---	---
4	1970	A	high school	75.4	77.8	58.4	---	---	---	---	---

There are so many "---" in dataset. We should convert it into null values for further manipulation.

df = df.replace('---', np.nan)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 11 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Year                           214 non-null    int64  
 1   Sex                            214 non-null    object 
 2   Min degree                     214 non-null    object 
 3   Total                          214 non-null    object 
 4   White                          214 non-null    float64
 5   Black                          214 non-null    float64
 6   Hispanic                       214 non-null    object 
 7   Asian                          214 non-null    object 
 8   Pacific Islander               214 non-null    object 
 9   American Indian/Alaska Native  214 non-null    object 
 10  Two or more races              214 non-null    object 
dtypes: float64(2), int64(1), object(8)
memory usage: 18.5+ KB

Several columns have wrong type.

race = [
    "Total",
    "Hispanic",
    "Asian",
    "Pacific Islander",
    "American Indian/Alaska Native",
    "Two or more races",
]
df[race] = df[race].apply(pd.to_numeric)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 11 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Year                           214 non-null    int64  
 1   Sex                            214 non-null    object 
 2   Min degree                     214 non-null    object 
 3   Total                          212 non-null    float64
 4   White                          214 non-null    float64
 5   Black                          214 non-null    float64
 6   Hispanic                       204 non-null    float64
 7   Asian                          168 non-null    float64
 8   Pacific Islander               93 non-null     float64
 9   American Indian/Alaska Native  131 non-null    float64
 10  Two or more races              157 non-null    float64
dtypes: float64(8), int64(1), object(2)
memory usage: 18.5+ KB

Have a look at some indicators of each column.

df.describe(include="all").T

	count	unique	top	freq	mean	std	min	25%	50%	75%	max
Year	214.0	NaN	NaN	NaN	2005.48	15.6	1920.0	2005.0	2010.0	2014.0	2018.0
Sex	214	3	A	78	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Min degree	214	4	high school	59	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Total	212.0	NaN	NaN	NaN	42.77	30.15	4.1	22.12	36.1	84.4	94.0
White	214.0	NaN	NaN	NaN	47.21	31.32	4.5	22.3	42.65	85.9	96.4
Black	214.0	NaN	NaN	NaN	35.19	32.36	1.1	10.12	23.2	70.62	93.5
Hispanic	204.0	NaN	NaN	NaN	27.68	26.85	0.6	7.6	16.6	57.02	87.2
Asian	168.0	NaN	NaN	NaN	62.02	27.03	15.0	46.55	66.0	81.05	98.5
Pacific Islander	93.0	NaN	NaN	NaN	50.79	33.24	10.0	24.0	34.0	90.7	100.0
American Indian/Alaska Native	131.0	NaN	NaN	NaN	42.67	32.1	2.1	16.55	24.4	83.05	95.1
Two or more races	157.0	NaN	NaN	NaN	44.46	31.58	2.9	25.7	34.7	87.9	98.2

Visulization

Correlation Matrix Heatmap

cormat = df[race].corr().round(2)
sns.heatmap(cormat, annot=True)

Enrollment Date Distribution

f, axes = plt.subplots(4, 2, sharey=False, figsize=(15, 12))
for ind, val in enumerate(race):
    sns.histplot(
        data=df,
        x=val,
        hue="Min degree",
        label="100% Equities",
        kde=True,
        stat="density",
        linewidth=0,
        ax=axes[ind // 2, ind % 2],
        bins=200,
    ).set(title=val)
f.tight_layout(pad=3.0)
plt.show()

For Black, Hispanic, Pacific Islander people and people with two or more races, the enrollment rate is low for most of the past time. For Asian and White people, the overall enrollment rate is high for most of the past time.

f, axes = plt.subplots(4, 2, sharey=False, figsize=(14, 14))
for ind, val in enumerate(race):
    sns.lineplot(
        data=df, x="Year", y=val, hue="Min degree", ci=None, ax=axes[ind // 2, ind % 2]
    )

The Black and White people has low enrollment rate at the 1920s to 1970s, but the enrollments increased dramatically. For Black, White and Hispanic people, the enrollment rates for the four degrees has been increasing overtime. The enrollment rates of the four degrees keeps quite static since 2006 for Asian, Pacific Islander, American Indian/Alaska Native and people with two or more races.

Questions

Q1

What are the percent of different degrees completed for a given year range and sex? Parameter arguments are as follows: two year arguments, and a value for sex (’A’, ’F’, or ’M’). Function should return all rows of the data which match the given sex, and have data between the given years (inclusive for the start, exclusive for the end). If no data is found for the parameters, return Python keyword None.

def completion_bet_years(dataframe, year1, year2, sex):
    """ Return percent of different degrees completed between year1 and year2 for Sex==sex.

    Args:
    dataframe (DataFrame): A dataframe containing the needed data
    year1     (int)      : Year number: the  earlier one
    year2     (int)      : Year number: the  later one
    sex       (str)      : Gender

    Returns:
    DataFrame: The percent of degrees completed between year1 and year2 for Sex==sex.
    """
    result = dataframe.loc[
        (dataframe["Sex"] == sex)
        & (dataframe["Year"] >= year1)
        & (dataframe["Year"] < year2)
    ]
    if result.shape[0] == 0:
        return None
    else:
        return result

print(completion_bet_years(df, 1920, 1941, "A"))

    Year Sex   Min degree  Total  White  Black  Hispanic  Asian  \
0   1920   A  high school    NaN   22.0    6.3       NaN    NaN   
1   1940   A  high school   38.1   41.2   12.3       NaN    NaN   
39  1920   A   bachelor's    NaN    4.5    1.2       NaN    NaN   
40  1940   A   bachelor's    5.9    6.4    1.6       NaN    NaN   

    Pacific Islander  American Indian/Alaska Native  Two or more races  
0                NaN                            NaN                NaN  
1                NaN                            NaN                NaN  
39               NaN                            NaN                NaN  
40               NaN                            NaN                NaN

Q2

What were the percentages for women vs men having earned a Bachelor’s Degree in a given year? Parameter list argument is the year in question and return the percentages as a tuple: (% for men, % for women)

def compare_bachelors_in_year(dataframe, year):
    """ Return the percentages for women vs men having earned a Bachelor’s Degree in Year==year

    Args:
    df   (Dataframe): A dataframe containing the needed data
    year (int)      : The year in which you want to know the info

    Returns:
    tuple: A tuple returns the percentage of Bachelor’s Degree male and femal in Year==year.
    """
    women = dataframe[
        (dataframe["Sex"] == "F")
        & (dataframe["Min degree"] == "bachelor's")
        & (dataframe["Year"] == year)
    ].iloc[0]["Total"]
    men = dataframe[
        (dataframe["Sex"] == "M")
        & (dataframe["Min degree"] == "bachelor's")
        & (dataframe["Year"] == year)
    ].iloc[0]["Total"]
    return (f"{men} % for men", f"{women} % for women")

compare_bachelors_in_year(df, 2010)

('27.8 % for men', '35.7 % for women')

Q3

What were the two most commonly awarded levels of educational attainment awarded between 2000-2010 (inclusive)? Use the mean percent over the years to compare the education levels. Return a list as follows: [#1 level, mean % of #1 level, #2 level, mean % of #2 level].

def top_2_2000s(dataframe):
    """Return the two most common educational attainment between 2000-2010.

    Args:
    DataFrame (DataFrame): A dataframe containing the needed data

    Returns:
    list: A list returns the two most common educational attainment between 2000-2010.
    """
    df_2000s = dataframe.loc[
        (dataframe["Year"] >= 2000)
        & (dataframe["Year"] <= 2010)
        & (dataframe["Sex"] == "A")
    ][["Total", "Min degree"]]
    mean_percent_edu = (
        df_2000s.groupby("Min degree").mean().sort_values(by="Total", ascending=False)
    )
    level1 = mean_percent_edu.iloc[0]["Total"].round(2)
    level2 = mean_percent_edu.iloc[1]["Total"].round(2)
    edu1 = mean_percent_edu.index[0]
    edu2 = mean_percent_edu.index[1]
    return [f"#1 level, {level1} % of {edu1}", f"#2 level, {level2} % of {edu2}"]

top_2_2000s(df)

['#1 level, 87.557 % of high school', "#2 level, 38.757 % of associate's"]