Acquiring the Data
Learn how to acquire data from the website in a specified form.
Date formatting
The website mentioned in the previous lesson uses the following URL format:
https://us.trend-calendar.com/trend/{date}.html
The {date} placeholder has to be replaced by the date that we want a word cloud of. It has to be in the YYYY-MM-DD format. For simplicity, we'll scrape the data in intervals of seven days, in the following way: [2020-01-01, 2020-01-08, 2020-01-15, 2020-01-22, ...]
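As a minimal sketch of how the {date} placeholder might be filled in, the snippet below builds a page URL from a date string. The names URL_TEMPLATE and build_url are hypothetical and are not part of the lesson's code. The date is parsed and re-formatted so that an invalid date raises an error and the result is always zero-padded.

```python
from datetime import datetime

# URL pattern from the lesson; URL_TEMPLATE is a hypothetical name.
URL_TEMPLATE = "https://us.trend-calendar.com/trend/{date}.html"


def build_url(date_str):
    """Return the page URL for a date given as a YYYY-MM-DD string.

    Hypothetical helper: parsing the date first means an invalid date
    raises ValueError, and re-formatting guarantees zero-padded output.
    """
    parsed = datetime.strptime(date_str, "%Y-%m-%d")
    return URL_TEMPLATE.format(date=parsed.strftime("%Y-%m-%d"))


print(build_url("2020-01-08"))
# https://us.trend-calendar.com/trend/2020-01-08.html
```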
Generate the dates
The pandas library has a function date_range(), which is like the range() function but for dates. The function takes the start date, end date, and frequency as parameters.
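import pandas as pd

def get_dates():
    # One date every seven days, from 2020-01-01 up to (at most) 2020-12-27
    dates = pd.date_range('2020-01-01', '2020-12-27', freq='7D')
    # Convert each Timestamp to a YYYY-MM-DD string
    dates = [d.strftime('%Y-%m-%d') for d in dates]
    return dates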
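To make it clearer what the date range contains, here is a rough standard-library equivalent that steps through the year seven days at a time. The function name get_dates_stdlib and its parameters are assumptions made for illustration; the lesson itself uses pandas. Both endpoints are treated as inclusive, as with date_range().

```python
from datetime import date, timedelta


def get_dates_stdlib(start=date(2020, 1, 1), end=date(2020, 12, 27), step_days=7):
    """Return YYYY-MM-DD strings from start to end (inclusive), step_days apart.

    Hypothetical helper that mirrors the pandas version without pandas.
    """
    dates = []
    current = start
    while current <= end:
        dates.append(current.isoformat())  # isoformat() yields YYYY-MM-DD
        current += timedelta(days=step_days)
    return dates


dates = get_dates_stdlib()
print(len(dates), dates[0], dates[-1])
# 52 2020-01-01 2020-12-23
```

The last date is 2020-12-23 rather than 2020-12-27, because the next seven-day step (2020-12-30) would fall past the end date.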
Define a function to get data for a given day
We’ll only store the top 10 keywords and hashtags. The following figure illustrates the tags for Twitter hashtags.