6. Scientist Names

import pandas as pd

import matplotlib.pyplot as plt

6.1. Get Data

df_raw = pd.read_json("https://raw.githubusercontent.com/dariusk/corpora/master/data/humans/scientists.json")

df_raw.head()

	description	scientists
0	List of particularly famous scientists	Aage Bohr
1	List of particularly famous scientists	Abdul Qadeer Khan
2	List of particularly famous scientists	Abu Nasr Al-Farabi
3	List of particularly famous scientists	Ada Lovelace
4	List of particularly famous scientists	Adalbert Czerny
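The output shows the same `description` string on every row. That fits a file made of one JSON object with a scalar `description` field and a list `scientists` field: pandas reads each key as a column and repeats the scalar to match the list. Below is a minimal offline sketch of that behaviour. It assumes this file structure, which is inferred from the output rather than confirmed. The inline JSON string stands in for the downloaded file, and its three names are copied from the output above.

```python
import io

import pandas as pd

# Assumed shape of scientists.json: a scalar "description" plus a list "scientists".
raw_json = """
{
  "description": "List of particularly famous scientists",
  "scientists": ["Aage Bohr", "Abdul Qadeer Khan", "Ada Lovelace"]
}
"""

# Wrap the literal in StringIO; recent pandas deprecates passing JSON text directly.
df_sample = pd.read_json(io.StringIO(raw_json))

# The scalar description is repeated on every row of the list column.
print(df_sample)
```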

6.2. Prepare the Dataset

df = df_raw[['scientists']].copy()
# Split each full name on whitespace: first token = first name, last token = last name
parts = df['scientists'].str.split()
df['first_name'] = parts.str[0]
df['last_name'] = parts.str[-1]
# Positive when the first name is longer, negative when the last name is longer
df['len_diff'] = df['first_name'].str.len() - df['last_name'].str.len()

df.head()

	scientists	first_name	last_name	len_diff
0	Aage Bohr	Aage	Bohr	0
1	Abdul Qadeer Khan	Abdul	Khan	1
2	Abu Nasr Al-Farabi	Abu	Al-Farabi	-6
3	Ada Lovelace	Ada	Lovelace	-5
4	Adalbert Czerny	Adalbert	Czerny	2
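Taking the first and last whitespace-separated token is a heuristic. Any middle parts, such as "Qadeer" in "Abdul Qadeer Khan", play no role in `len_diff`. A one-word name would also count as both first and last name, so its `len_diff` would be 0. The sketch below makes these cases visible. The first three names come from the output above. "Euclid" is a hypothetical single-word entry added only to illustrate that edge case, and the `check` frame is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

# First three names are from the df.head() output; "Euclid" is a hypothetical
# single-word entry included only to show that edge case.
names = pd.Series(["Aage Bohr", "Abdul Qadeer Khan", "Abu Nasr Al-Farabi", "Euclid"])

tokens = names.str.split()  # list of whitespace-separated parts per name

check = pd.DataFrame({
    "name": names,
    "n_tokens": tokens.str.len(),
    # Parts between the first and last token, which the split heuristic ignores
    "ignored": tokens.apply(lambda t: " ".join(t[1:-1])),
    # True when the name has a single token, so first_name == last_name
    "first_is_last": tokens.str[0] == tokens.str[-1],
})
print(check)
```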

6.3. Plots

fig, ax = plt.subplots(figsize=(10, 6.18))
df['len_diff'].plot.hist(ax=ax)  # len_diff is the only numeric column
ax.set_xlabel('len(first_name) - len(last_name)')
plt.show()

[Figure: histogram of len_diff; y-axis: Frequency]
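By default, `plot.hist` uses 10 equal-width bins. With integer-valued data such as `len_diff`, that can leave some bins covering two integers and others covering none. One alternative is to place bin edges at half-integers so each integer value gets exactly its own bar. The sketch below is a presentation choice, not the notebook's original plot. It uses only the five `len_diff` values from the `df.head()` output above as a stand-in for the full frame.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The five len_diff values from the df.head() output; use the full df in practice.
df = pd.DataFrame({"len_diff": [0, 1, -6, -5, 2]})

# Edges at half-integers give exactly one bin per integer between min and max.
lo, hi = int(df["len_diff"].min()), int(df["len_diff"].max())
bins = np.arange(lo - 0.5, hi + 1.5, 1)

fig, ax = plt.subplots(figsize=(10, 6.18))
counts, edges, _ = ax.hist(df["len_diff"], bins=bins, edgecolor="black")
ax.set_xlabel("len(first_name) - len(last_name)")
ax.set_ylabel("Number of scientists")
ax.set_xticks(range(lo, hi + 1))  # one tick per integer value
```

Here `counts` holds one entry per integer from `lo` to `hi`, so empty integers show up as explicit zero-height bins instead of being merged into a neighbour.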