Handling Irrelevant Data
Learn how to deal with irrelevant data using Python.
Irrelevant data
Irrelevant data is data that we don’t need for our analysis. For example, if a dataset contains 15 columns and we need only 10 of them to answer our project’s research question, we can drop the other five. This step makes our work easier for other stakeholders to read and helps us avoid accidentally using irrelevant data during analysis, which could skew the project’s outcomes.
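As a rough illustration of the idea, the sketch below removes two unneeded columns from a small table. It assumes the pandas library, and the dataset and column names (`region`, `sales`, `internal_id`, `notes`) are hypothetical, invented only for this example.

```python
import pandas as pd

# Hypothetical dataset: assume only "region" and "sales" are needed
# to answer the research question.
df = pd.DataFrame({
    "region": ["North", "South", "East"],
    "sales": [120, 95, 143],
    "internal_id": [101, 102, 103],
    "notes": ["", "late entry", ""],
})

# Drop the columns we don't need. By default, drop() returns a new
# DataFrame and leaves the original unchanged.
relevant = df.drop(columns=["internal_id", "notes"])

print(relevant.columns.tolist())
```

Because the original `df` is left untouched here, the dropped columns remain available if they turn out to be needed later.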
In the same way that we delete columns we don’t need, we can also delete rows that we don’t need. For instance, in a dataset that contains records for the past 10 years, we can choose to work with only the records for the past year and delete the rest if they don’t help answer our research question. Deleting irrelevant data can help conserve memory space and reduce the project’s complexity.
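Filtering rows by date could look something like the following sketch. It assumes pandas, and the `order_date` and `amount` columns are hypothetical. "The past year" is interpreted here as the 12 months leading up to the most recent record, which is only one possible reading.

```python
import pandas as pd

# Hypothetical records spanning several years.
df = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2015-03-01", "2020-07-15", "2023-11-30", "2024-02-10"]
    ),
    "amount": [250, 410, 180, 320],
})

# Assumption: keep only rows dated within one year of the latest record.
latest = df["order_date"].max()
cutoff = latest - pd.DateOffset(years=1)

# Boolean indexing keeps the rows that satisfy the condition;
# reset_index(drop=True) renumbers the remaining rows from 0.
recent = df[df["order_date"] > cutoff].reset_index(drop=True)

print(recent)
```

Selecting the rows to keep, rather than deleting the rest one by one, tends to be the simpler way to express this kind of filter.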
Deleting an irrelevant column
We use the drop() function to delete an irrelevant column. When doing so, we pass three parameters to it.
The ...