HTML tables store each row in a <tr>...</tr> element, and each row holds its values in <td>...</td> cells. pandas can read such tables directly from a file or URL. The read_html() function parses the source and returns every table it finds as a list of pandas DataFrames:
import pandas as pd
# Read all HTML tables from the given URL
table_url = 'https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_North_America'
df_list = pd.read_html(table_url)
print("Number of DataFrames:", len(df_list))
This results in the following output:
Number of DataFrames: 7
In the preceding code example, the read_html() function fetched the given web page and returned every table on it as a list of DataFrames, seven in this case. Let's check one of the DataFrames from the list:
# Check first DataFrame
df_list[0].head()
This results in the following output:
In the preceding...
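Because the Wikipedia page can change over time, it may help to try the same call on a small, fixed piece of HTML. The following is a minimal sketch, not part of the original example: the two-table markup and the variable names are made up for illustration, and it assumes an HTML parser backend such as lxml is installed. It wraps the markup in StringIO so that read_html() treats it as a file-like object, and it uses the match parameter to keep only the tables whose text matches a given string:

```python
from io import StringIO

import pandas as pd

# Hypothetical two-table page, made up for illustration only.
html = """
<table>
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>Canada</td><td>Ottawa</td></tr>
  <tr><td>Mexico</td><td>Mexico City</td></tr>
</table>
<table>
  <tr><th>Territory</th><th>Sovereign state</th></tr>
  <tr><td>Greenland</td><td>Denmark</td></tr>
</table>
"""

# Wrap the markup in StringIO so it is read like a file.
tables = pd.read_html(StringIO(html))
print("Number of DataFrames:", len(tables))

# match= keeps only the tables whose text matches the string (or regex).
territories = pd.read_html(StringIO(html), match="Territory")
print(territories[0])
```

On a page with many tables, passing match is often more robust than picking a table by its position in the list, since the position can shift when the page is edited.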