pandas is very effective at scraping HTML tables and converting them directly into DataFrames: pd.read_html returns a list with one DataFrame per table it finds on the page. This way we can easily scrape the Hang Seng Index components from Yahoo! Finance without calling their API.
import pandas as pd
# There is only one HTML table on this Yahoo! Finance page
payload = pd.read_html('https://finance.yahoo.com/quote/%5EHSI/components/')
table_0 = payload[0]
df = table_0
df.head()
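The code above assumes the components table is the first one on the page. If the page layout changes, selecting the table by its column names may be more robust than relying on its position. A minimal sketch, assuming a hypothetical helper find_table (not part of pandas) and the Symbol and Company Name columns used in this post; the rows are placeholders, not real index data:

```python
import pandas as pd


def find_table(tables, required_columns):
    """Return the first DataFrame that contains every required column.

    find_table is a hypothetical helper for illustration, not a pandas API.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"No table has columns {sorted(required)}")


# Stand-in for the list returned by pd.read_html(...); values are placeholders
tables = [
    pd.DataFrame({"Other": [1, 2]}),
    pd.DataFrame({"Symbol": ["AAA", "BBB"], "Company Name": ["Alpha", "Beta"]}),
]
df_components = find_table(tables, ["Symbol", "Company Name"])
print(df_components)
```

With the real page, the list returned by pd.read_html would be passed in place of the placeholder tables list.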
Tickers:
symbols = df['Symbol'].values.tolist()
print(symbols[:10]) # first few tickers
Company names:
names = df['Company Name'].values.tolist()
print(names[:10]) # first few company names
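Since the tickers and company names come from the same rows, it can be handy to pair them in a lookup instead of keeping two separate lists. A small sketch, assuming the same two column names; df_example and its rows are placeholders standing in for the scraped table:

```python
import pandas as pd

# Placeholder rows standing in for the scraped table (not real index data)
df_example = pd.DataFrame(
    {"Symbol": ["AAA", "BBB"], "Company Name": ["Alpha", "Beta"]}
)

# Build a ticker -> company name lookup from the two columns
name_by_symbol = dict(zip(df_example["Symbol"], df_example["Company Name"]))
print(name_by_symbol["AAA"])
```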
All companies in the Hang Seng Index:
df
Source:
Some web pages may block pandas web scraping. In that case, follow this link:
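One common workaround, offered here as an assumption rather than as a summary of the linked page, is to download the HTML yourself with a browser-like User-Agent header and hand the text to pd.read_html. The sketch below uses the standard library's urllib.request; build_request, read_html_with_headers, and the header value are hypothetical names chosen for illustration, and a site may still refuse the request.

```python
from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd

# Assumed header value; some sites reject requests without a browser-like User-Agent
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url, headers=BROWSER_HEADERS):
    """Create a request object that carries the custom headers."""
    return Request(url, headers=headers)


def read_html_with_headers(url, timeout=30):
    """Fetch the page ourselves, then let pandas parse every table in it."""
    with urlopen(build_request(url), timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    # Wrap the text in StringIO so pandas treats it as markup, not as a path or URL
    return pd.read_html(StringIO(html))
```

Calling read_html_with_headers with the Yahoo! Finance URL used earlier would return the same kind of list of DataFrames as pd.read_html, provided the site accepts the request.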