6. Scientist Names

import pandas as pd
import matplotlib.pyplot as plt

6.1. Get Data

df_raw = pd.read_json("https://raw.githubusercontent.com/dariusk/corpora/master/data/humans/scientists.json")
df_raw.head()
                              description          scientists
0  List of particularly famous scientists           Aage Bohr
1  List of particularly famous scientists   Abdul Qadeer Khan
2  List of particularly famous scientists  Abu Nasr Al-Farabi
3  List of particularly famous scientists        Ada Lovelace
4  List of particularly famous scientists     Adalbert Czerny
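The description column repeats the same text on every row. A minimal offline sketch of why that can happen, assuming the JSON holds one description string next to a list of names (the `sample` dict below is a hand-made stand-in, not the downloaded file):

```python
import pandas as pd

# Hypothetical stand-in for the downloaded JSON: one scalar plus one list.
sample = {
    "description": "List of particularly famous scientists",
    "scientists": ["Aage Bohr", "Abdul Qadeer Khan", "Ada Lovelace"],
}

# pandas broadcasts the scalar value across every row of the list column.
df_sample = pd.DataFrame(sample)

# Selecting only the list column drops the redundant text.
names = df_sample["scientists"]
```

This is the reason the next step keeps only the scientists column.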

6.2. Prepare the dataset

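# Keep only the name column; copy so df_raw stays untouched.
df = df_raw[['scientists']].copy()
# First and last space-separated tokens; middle names are ignored.
df['first_name'] = df['scientists'].str.split(' ').str[0]
df['last_name'] = df['scientists'].str.split(' ').str[-1]
# Positive when the first name is longer than the last name.
df['len_diff'] = df['first_name'].str.len() - df['last_name'].str.len()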
df.head()
           scientists  first_name  last_name  len_diff
0           Aage Bohr        Aage       Bohr         0
1   Abdul Qadeer Khan       Abdul       Khan         1
2  Abu Nasr Al-Farabi         Abu  Al-Farabi        -6
3        Ada Lovelace         Ada   Lovelace        -5
4     Adalbert Czerny    Adalbert     Czerny         2
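The split rule has a few edge cases worth checking by hand: middle names are skipped, hyphens count toward the length, and a single-word name acts as both first and last name. A small pure-Python sketch of the same rule, where `name_length_diff` is a hypothetical helper and "Avicenna" is a made-up single-word input (not confirmed to be in the dataset):

```python
def name_length_diff(full_name):
    """Hypothetical helper: len(first token) - len(last token)."""
    tokens = full_name.split(' ')
    return len(tokens[0]) - len(tokens[-1])


examples = ["Abdul Qadeer Khan", "Abu Nasr Al-Farabi", "Avicenna"]

# Map each sample name to its first-minus-last length difference.
diffs = {name: name_length_diff(name) for name in examples}
```

A single-word name always yields 0, so such entries would add to the count at zero in any histogram of len_diff.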

6.3. Plots

fig, ax = plt.subplots( figsize=(10,6.18) )
df.plot.hist(ax=ax)
<AxesSubplot:ylabel='Frequency'>
[Figure: histogram of len_diff; y-axis: Frequency]
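Because len_diff only takes integer values, default bin edges can merge neighbouring values into one bar. One possible variant, sketched on a hand-made stand-in for the column (the five values shown by df.head() above), uses one bin per integer and labels the x-axis. The bin choice and the label text are assumptions, not part of the original notebook:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hand-made stand-in for df['len_diff'].
len_diff = pd.Series([0, 1, -6, -5, 2], name="len_diff")

# One bin per integer value: edges sit at half-integers.
edges = np.arange(len_diff.min() - 0.5, len_diff.max() + 1.5, 1)

fig, ax = plt.subplots(figsize=(10, 6.18))
counts, _, _ = ax.hist(len_diff, bins=edges)
ax.set_xlabel("len(first_name) - len(last_name)")
ax.set_ylabel("Frequency")
```

With the real data, replacing the stand-in series with df['len_diff'] should be enough.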