US Hurricane Data: 1851–2019

Learn how to clean up hurricane data using different Python scripts.

Cleaning up the hurricane data

If you quickly look through the webpage itself, you’ll see some formatting that’ll need cleaning up. Each decade is introduced with a single row containing nothing but a string such as “1850s”. We’ll want to drop those rows. Years with no events have the string “None” in the second column. Those, too, will need to go.
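As a rough illustration of those two filters, here is a minimal pandas sketch. The tiny DataFrame is made-up sample data that only mimics the layout described above, and the use of "Month" as the name of the second column is an assumption, so the code selects that column by position instead of by name.

```python
import pandas as pd

# Hypothetical sample rows that mimic the scraped table; not real data.
df = pd.DataFrame(
    {
        "Year": ["1850s", "1851", "1852", "1853"],
        "Month": [None, "Jun", "Aug", "None"],
    }
)

# Decade header rows hold a string such as "1850s" in the first column.
is_decade_header = df["Year"].str.match(r"^\d{4}s$")

# Years without events carry the literal string "None" in the second column.
is_empty_year = df.iloc[:, 1] == "None"

# Keep only the rows that are neither decade headers nor empty years.
cleaned = df[~(is_decade_header | is_empty_year)].reset_index(drop=True)
print(cleaned)
```

Building the two boolean masks separately keeps each rule readable and makes it easy to check how many rows each one removes.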

Some events have no data for their “Max Wind (kt)” speeds. Instead of a number (measured in knots), the speed values for those events are represented by five dashes (-----). We’ll have to convert that to something we can work with. And finally, while three-letter abbreviations generally represent months, a couple of events stretched across two months. To be able to process those properly, we’ll convert “Sp-Oc” and “Jl-Au” to “Sep” and “Jul” respectively. We won’t actually be using the month column, so this won’t make any difference to our results. But it’s a useful technique to know.
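One possible way to make both substitutions is pandas’ `replace` method. This is only a sketch: the sample values are invented, the column name "Month" is an assumption, and using NaN as the stand-in for the five dashes is one reasonable choice rather than necessarily the one this lesson uses.

```python
import pandas as pd

# Hypothetical sample values; the "Month" column name is an assumption.
df = pd.DataFrame(
    {
        "Month": ["Jun", "Sp-Oc", "Jl-Au", "Aug"],
        "Max Wind (kt)": ["80", "-----", "100", "-----"],
    }
)

# Map each two-month span to its first month.
df["Month"] = df["Month"].replace({"Sp-Oc": "Sep", "Jl-Au": "Jul"})

# Swap the five-dash placeholder for NaN so pandas treats it as missing.
df["Max Wind (kt)"] = df["Max Wind (kt)"].replace("-----", float("nan"))
print(df)
```

Passing a dictionary to `replace` handles several substitutions in one call, and any value that is not a key in the dictionary is left unchanged.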

Let’s look at the data types for each column. We can ignore the strings in the “States” and “Name” columns, since we’re not interested in those anyway. But we will need to do something with the “Year” and “Max Wind (kt)” columns, because they won’t do us any good as long as pandas stores them with the generic object data type.
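To make the data type problem concrete, the sketch below inspects the types and then converts the two columns to numbers. The values are made-up samples, and `pd.to_numeric` with `errors="coerce"` is just one common way to do the conversion, not necessarily the approach taken later in this lesson.

```python
import pandas as pd

# Hypothetical sample: both columns start out as text, one with a gap.
df = pd.DataFrame(
    {
        "Year": ["1851", "1852"],
        "Max Wind (kt)": ["80", None],
    }
)

# Text columns are not numeric yet, so arithmetic on them would fail.
print(df.dtypes)

# Convert to numbers; anything unparseable becomes NaN instead of an error.
df["Year"] = pd.to_numeric(df["Year"], errors="coerce")
df["Max Wind (kt)"] = pd.to_numeric(df["Max Wind (kt)"], errors="coerce")

print(df.dtypes)
```

A column that contains a missing value ends up as a floating-point type, because NaN cannot be stored in a plain integer column.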

Here’s how we set things up.
