We often need to iterate over only those files that have a specific extension. For example, to read all the CSV files in a folder, we iterate over files with the “.csv” extension. Similarly, to process all the images in a folder, we iterate over files with extensions such as “.png” and “.jpg”.
Two modules from Python's standard library are commonly used when dealing with files: glob and os. In this Answer, we'll learn to use each for this task.
We use the os module when we need to interact with the operating system or the file system. We can import it and list all the entries in a folder with the following command (path represents the path of the folder):
    import os
    os.listdir(path)
If we don't specify a path, the command returns the names of all the entries (files and subfolders) in the current directory.
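As a quick check of this default, the sketch below builds a throwaway folder and compares os.listdir() with os.listdir("."). The file and folder names (data.csv, archive) are made up for illustration, and the os.path.isfile filter is an optional extra for when subfolders should be skipped.

```python
import os
import tempfile

# Build a throwaway folder; the names used here are hypothetical.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "data.csv"), "w").close()
    os.mkdir(os.path.join(folder, "archive"))

    previous_dir = os.getcwd()
    os.chdir(folder)  # make the throwaway folder the current directory
    try:
        default_listing = sorted(os.listdir())      # no path given
        explicit_listing = sorted(os.listdir("."))  # current directory, spelled out
        # listdir returns subfolders too, so keep only regular files if needed
        files_only = [name for name in default_listing if os.path.isfile(name)]
    finally:
        os.chdir(previous_dir)  # restore the original working directory

print(default_listing)  # ['archive', 'data.csv']
print(files_only)       # ['data.csv']
```

Sorting is used only to make the output predictable, since os.listdir returns names in arbitrary order.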
The following code shows different ways of iterating over files with specific extensions within a folder using os:
    # Import the os module and list every entry in the 'All' folder
    import os

    allfiles = os.listdir("./All")  # 'allfiles' holds the names of all entries in the folder
    print("Following are all the files inside 'All' Folder: \n\n", allfiles)
    print("\n\n")  # Empty lines

    # Display all files with the ".csv" extension
    print("Following are files inside 'All' Folder with '.csv' extension: \n")
    for name in allfiles:
        # Include the dot so that a name such as "notcsv" is not matched
        if name.endswith(".csv"):  # endswith accepts a string or a tuple of strings
            print(name)
    print("\n\n")  # Empty lines

    # Display all files with the ".csv" or ".xlsx" extension
    print("Following are files inside 'All' Folder with '.csv' or '.xlsx' extension: \n")
    for name in allfiles:
        if name.endswith((".csv", ".xlsx")):  # a tuple matches any of its suffixes
            print(name)
The first for loop prints the three filenames in the “All” folder that have the csv extension.
The second for loop prints the six filenames in the “All” folder that have either the csv or the xlsx extension.
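The filtering above can be wrapped in a small reusable helper. This is a minimal sketch, not part of the os module: the function name files_with_extensions and the sample filenames are assumptions made for illustration. It also lowercases names so that "b.CSV" is matched, and skips subfolders.

```python
import os
import tempfile


def files_with_extensions(folder, extensions):
    """Return sorted names of files in `folder` that end with any of `extensions`.

    `extensions` is expected to be an iterable of strings such as [".csv"].
    """
    # str.endswith accepts a tuple, so convert whatever iterable we were given
    suffixes = tuple(ext.lower() for ext in extensions)
    return sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(suffixes)               # case-insensitive match
        and os.path.isfile(os.path.join(folder, name))   # skip subfolders
    )


# Demonstrate on a throwaway folder with hypothetical filenames
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.csv", "b.CSV", "c.xlsx", "notes.txt", "notcsv"):
        open(os.path.join(folder, name), "w").close()

    csv_only = files_with_extensions(folder, [".csv"])
    csv_or_xlsx = files_with_extensions(folder, [".csv", ".xlsx"])

print(csv_only)     # ['a.csv', 'b.CSV']
print(csv_or_xlsx)  # ['a.csv', 'b.CSV', 'c.xlsx']
```

Because the suffixes include the leading dot, the file named "notcsv" is not reported as a CSV file.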
The glob module is primarily used for finding files whose paths match a pattern. A pattern can name an explicit extension to find specific files in a folder, or use a bare * to match everything in a folder.
We can import the module and list all the entries in a folder with the following command (path represents the path to the folder):
    import glob
    glob.glob("path/*")
If the pattern passed to glob.glob does not contain a wildcard such as *, the function returns a list holding that exact path if it exists, and an empty list otherwise. An empty pattern string also returns an empty list.
However, if the pattern is just *, every entry in the current directory is listed, except those whose names begin with a dot. The names returned by glob include whatever path prefix the pattern contained, so glob.glob("path/*") returns entries of the form path/filename.
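These pattern rules can be checked with a short experiment. The sketch below assumes a throwaway folder with two hypothetical files, a.csv and b.xlsx. It uses glob.escape on the folder name only so that any special characters in the temporary path are treated literally.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    for name in ("a.csv", "b.xlsx"):
        open(os.path.join(folder, name), "w").close()

    existing = os.path.join(folder, "a.csv")
    missing = os.path.join(folder, "missing.csv")
    safe_folder = glob.escape(folder)  # treat the folder name literally

    # No wildcard: the path itself comes back if it exists, otherwise nothing
    no_wildcard_hit = glob.glob(glob.escape(existing))
    no_wildcard_miss = glob.glob(glob.escape(missing))

    # Empty pattern string: nothing matches
    empty_pattern = glob.glob("")

    # Bare * after the folder: every visible entry, with the folder prefix kept
    everything = sorted(glob.glob(os.path.join(safe_folder, "*")))

print(no_wildcard_hit == [existing])                # True
print(no_wildcard_miss)                             # []
print(empty_pattern)                                # []
print([os.path.basename(p) for p in everything])    # ['a.csv', 'b.xlsx']
```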
The following code shows different ways of iterating over files with specific extensions within a folder using glob:
    # Import the glob module and list every entry in the 'All' folder
    import glob

    allfiles = glob.glob("All/*")  # 'allfiles' holds the paths of all entries in the folder
    print("Following are all the files inside 'All' Folder: \n\n", allfiles)
    print("\n\n")  # Empty lines

    # Display all files with the ".csv" extension
    print("Following are files inside 'All' Folder with '.csv' extension: \n")
    print(glob.glob("All/*.csv"))
    print("\n\n")  # Empty lines

    # Display all files with the ".csv" or ".xlsx" extension
    print("Following are files inside 'All' Folder with '.csv' or '.xlsx' extension: \n")
    print(glob.glob("All/*.csv") + glob.glob("All/*.xlsx"))
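The glob.glob("All/*.csv") call prints the three filenames in the “All” folder that have the csv extension.
Concatenating the results of glob.glob("All/*.csv") and glob.glob("All/*.xlsx") prints the six filenames in the “All” folder that have either the csv or the xlsx extension.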
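When more than two extensions are involved, concatenating calls by hand gets repetitive. One possible way to generalize it is sketched below. The helper name glob_extensions and the sample filenames are hypothetical, and the result is sorted because glob returns matches in arbitrary order.

```python
import glob
import os
import tempfile


def glob_extensions(folder, extensions):
    """Return sorted paths in `folder` whose names end with any of `extensions`."""
    matches = []
    for ext in extensions:
        # glob.escape keeps special characters in the folder name literal
        pattern = os.path.join(glob.escape(folder), "*" + ext)
        matches.extend(glob.glob(pattern))
    return sorted(matches)


# Demonstrate on a throwaway folder with hypothetical filenames
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.csv", "b.xlsx", "c.txt"):
        open(os.path.join(folder, name), "w").close()

    found = glob_extensions(folder, [".csv", ".xlsx"])
    found_names = [os.path.basename(path) for path in found]

print(found_names)  # ['a.csv', 'b.xlsx']
```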
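Both os.listdir and glob.glob return a list. One major difference is that glob does the pattern matching for us and returns paths that include the folder prefix, whereas os.listdir returns bare names that we have to filter ourselves. The paths from glob can be stored and later passed directly to functions such as open, without joining them to the folder path first.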
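To make the comparison concrete, the sketch below runs both approaches on the same throwaway folder (the filenames are made up for illustration). Joining each name from os.listdir with the folder path produces the same list that glob returns.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    for name in ("a.csv", "b.csv"):
        open(os.path.join(folder, name), "w").close()

    # os.listdir: bare names, filtered by hand
    names = sorted(n for n in os.listdir(folder) if n.endswith(".csv"))

    # glob: pattern matching built in, results keep the folder prefix
    paths = sorted(glob.glob(os.path.join(glob.escape(folder), "*.csv")))

    # Joining the folder onto each bare name gives the glob-style paths
    joined = [os.path.join(folder, n) for n in names]

print(names)            # ['a.csv', 'b.csv']
print(joined == paths)  # True
```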
In this Answer, we learned how to iterate over the files in a folder that have specific extensions using two modules from Python's standard library, glob and os.