Acquiring the Data

Date formatting

The website mentioned in the previous lesson uses the following URL format.

https://us.trend-calendar.com/trend/{date}.html

Replace {date} with the date we want a word cloud for, written in the YYYY-MM-DD format. To keep things simple, we’ll scrape the data at seven-day intervals, like this: [2020-01-01, 2020-01-08, 2020-01-15, 2020-01-22, ...]
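As a quick illustration of the substitution described above, here is a minimal sketch that fills the {date} placeholder for a single day. The names BASE_URL and build_url are hypothetical helpers introduced for this example; they are not part of the lesson's code.

```python
from datetime import date

# URL template from the lesson; {date} is the placeholder to fill in.
BASE_URL = "https://us.trend-calendar.com/trend/{date}.html"

def build_url(day):
    """Return the page URL for one day (hypothetical helper)."""
    # strftime('%Y-%m-%d') produces the YYYY-MM-DD form the site expects.
    return BASE_URL.format(date=day.strftime("%Y-%m-%d"))

print(build_url(date(2020, 1, 8)))
# https://us.trend-calendar.com/trend/2020-01-08.html
```

Using strftime guarantees zero-padded months and days, so January 8 becomes 01-08 rather than 1-8.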

Generate the dates

The pandas library has a function date_range(), which is like the range() function but for dates. The function takes the start date, end date, and frequency as parameters.

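import pandas as pd

def get_dates():
    # One date every 7 days, starting 2020-01-01 and not going past 2020-12-27
    dates = pd.date_range('2020-01-01', '2020-12-27', freq='7D')
    # Convert each Timestamp to a YYYY-MM-DD string for use in the URL
    dates = [d.strftime('%Y-%m-%d') for d in dates]
    return dates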
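To sanity-check what a seven-day step over this range should produce, the same sequence can be rebuilt with only the standard library. This is a cross-check sketch, not the lesson's method: weekly_dates is a hypothetical helper that uses datetime and timedelta in place of pandas.

```python
from datetime import date, timedelta

def weekly_dates(start, end, step_days=7):
    """Return YYYY-MM-DD strings from start to end (inclusive), step_days apart."""
    dates = []
    current = start
    while current <= end:
        dates.append(current.strftime("%Y-%m-%d"))
        current += timedelta(days=step_days)
    return dates

dates = weekly_dates(date(2020, 1, 1), date(2020, 12, 27))
print(len(dates), dates[0], dates[-1])
# 52 2020-01-01 2020-12-23
```

Note that the last date is 2020-12-23, not 2020-12-27: the end date is only an upper bound, and the next seven-day step (2020-12-30) would overshoot it.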

Define a function to get data for a given day

We’ll only store the top 10 keywords and hashtags. The following figure illustrates the tags for Twitter hashtags.
