Python glob: How to Use Pattern Matching for File and Directory Search

If you’ve ever needed to find or filter files in a directory using flexible rules, Python’s built-in glob module is one of the simplest and most effective tools available. It lets you search for filenames and paths that match shell-style wildcard patterns, much like the wildcards you would type in a Unix or Linux shell, without writing directory traversal logic by hand.

In this article you’ll learn what glob does, why it’s useful, how its pattern matching works, and how to use it in real Python scripts. We’ll also look at recursive searching and iterator-based alternatives for more efficient file handling.

What the glob Module Is

The glob module comes with the Python standard library and provides functions to find pathnames matching a specified pattern according to rules similar to those used by the Unix shell. Patterns can include special wildcard characters such as:

  • * — matches zero or more characters
  • ? — matches exactly one character
  • [abc] or [a-z] — character ranges or sets
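
The wildcard rules above can be tried out on plain strings, without touching the filesystem. The sketch below uses fnmatch.fnmatchcase() from the standard library, which applies the same shell-style rules that glob builds on; the filenames are made up for illustration. It also shows the negated form [!...], which matches any character not in the set.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames, used only to illustrate the matching rules.
names = ["a.py", "ab.py", "data1.csv", "data2.csv", "dataX.csv", "notes.txt"]

# * matches zero or more characters
star = [n for n in names if fnmatchcase(n, "*.py")]

# ? matches exactly one character
question = [n for n in names if fnmatchcase(n, "?.py")]

# [0-9] matches one character from the range
char_range = [n for n in names if fnmatchcase(n, "data[0-9].csv")]

# [!0-9] matches one character that is NOT in the range
negated = [n for n in names if fnmatchcase(n, "data[!0-9].csv")]

print(star)        # ['a.py', 'ab.py']
print(question)    # ['a.py']
print(char_range)  # ['data1.csv', 'data2.csv']
print(negated)     # ['dataX.csv']
```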

Unlike scanning directories manually with os.listdir() and filtering results, glob matches patterns directly and returns the file and directory names that fit those patterns.

glob makes no guarantee about the order of its results, which depends on the operating system and filesystem. If you need a stable order, apply Python’s sorted() to the output.
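
As a minimal sketch of that advice, the helper below wraps glob.glob() in sorted(). The helper name list_sorted and the file names are assumptions for illustration, and the demo runs in a temporary directory so it does not depend on your own files.

```python
import glob
import os
import tempfile


def list_sorted(directory, pattern):
    """Return paths in `directory` matching `pattern`, sorted alphabetically."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Demo in a throwaway directory with files created out of order.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt", "c.txt"):
        open(os.path.join(tmp, name), "w").close()

    sorted_names = [os.path.basename(p) for p in list_sorted(tmp, "*.txt")]

print(sorted_names)  # ['a.txt', 'b.txt', 'c.txt']
```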

Basic Usage of glob.glob()

The most commonly used function is glob.glob(). It takes a pattern string describing the paths you want and returns a list of matching paths, or an empty list if nothing matches.

Example: Find Files with a Specific Extension

    import glob

    # List all Python files in the current directory
    python_files = glob.glob("*.py")
    print("Python files:", python_files)

Explanation:

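  • *.py matches any filename ending in .py, except names that begin with a dot, which glob skips by default unless the pattern itself starts with a dot.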
  • The returned list contains all paths that fit the pattern in the current directory.

Wildcards and Character Ranges

You can mix and match wildcards and ranges to create powerful filters:

    import glob

    # Files whose names begin with a digit and contain a dot
    digit_files = glob.glob("[0-9]*.*")
    print("Digit files:", digit_files)

    # Files with exactly four characters before .txt
    four_char_txt = glob.glob("????.txt")
    print("Four char .txt files:", four_char_txt)

Explanation:

  • [0-9]*.* matches names that start with a digit and contain at least one dot; a name with no dot, such as 2024, is not matched.
  • ????.txt matches files where exactly four characters precede the .txt extension.
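
To check both patterns against real files, here is a small sketch that creates a few sample files in a temporary directory. The file names are invented for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Sample files: note that "2024" has no dot in its name.
    for name in ("1report.txt", "2024", "abcd.txt", "abc.txt", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()

    digit_files = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "[0-9]*.*"))
    )
    four_char_txt = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "????.txt"))
    )

print(digit_files)    # ['1report.txt']
print(four_char_txt)  # ['abcd.txt']
```

The file named 2024 starts with a digit but is left out, because the pattern requires a literal dot somewhere after the first character.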

Recursive Searching with **

Since Python 3.5, the glob module supports recursive searching using the ** pattern combined with the recursive=True argument. This lets you match files not only in the top-level directory but also within all nested subdirectories.

Example: Recursive Search

    import glob

    # Find all text files in the current directory and subdirectories
    text_files = glob.glob("**/*.txt", recursive=True)
    for file in text_files:
        print(file)

Explanation:

  • "**/*.txt" means “search the current directory and every subdirectory below it for files ending in .txt.”
  • recursive=True enables the special meaning of **. Without it, ** behaves like a single * and the pattern only looks one directory level down.

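This pattern makes it easy to gather large collections of files from deep directory trees with minimal code. Keep in mind that ** walks the entire tree, so a search over a very large directory tree can take a long time.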
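
The effect of the recursive flag is easiest to see side by side. The sketch below builds a small three-level tree in a temporary directory and runs the same pattern with and without recursive=True. The helper relative_matches and the file names are assumptions for illustration.

```python
import glob
import os
import tempfile


def relative_matches(root, pattern, recursive):
    """Return matches as sorted, '/'-separated paths relative to root."""
    found = glob.glob(os.path.join(root, pattern), recursive=recursive)
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in found)


with tempfile.TemporaryDirectory() as tmp:
    # Build: top.txt, sub/inner.txt, sub/deep/deepest.txt
    os.makedirs(os.path.join(tmp, "sub", "deep"))
    for rel in ("top.txt", "sub/inner.txt", "sub/deep/deepest.txt"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()

    with_recursive = relative_matches(tmp, "**/*.txt", recursive=True)
    without_recursive = relative_matches(tmp, "**/*.txt", recursive=False)

print(with_recursive)     # ['sub/deep/deepest.txt', 'sub/inner.txt', 'top.txt']
print(without_recursive)  # ['sub/inner.txt']
```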

Using an Iterator with glob.iglob()

If you expect a very large number of matches and want to process them one by one without storing the full list in memory, you can use iglob(), which returns an iterator instead of a list:

    import glob

    for path in glob.iglob("**/*.log", recursive=True):
        print("Log file:", path)

This approach is particularly useful when scanning large file systems or when memory efficiency is a priority.
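
One way to benefit from the iterator is to stop early once you have what you need. The sketch below, which uses made-up log file names in a temporary directory, takes only the first two matches with itertools.islice() instead of building the full list.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create five sample log files.
    for i in range(5):
        open(os.path.join(tmp, f"app{i}.log"), "w").close()

    matches = glob.iglob(os.path.join(tmp, "*.log"))

    # An iterator returns itself from iter(); a list would not.
    is_iterator = iter(matches) is matches

    # Pull just two results; the remaining matches are never collected.
    first_two = [os.path.basename(p) for p in itertools.islice(matches, 2)]

print(is_iterator)     # True
print(len(first_two))  # 2
```

Which two files come back is not guaranteed, since glob does not define an ordering.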

Advanced Pattern Matching

Beyond * and ?, you can use character sets to match one of several characters. For example:

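    import glob

    # Match files ending in .jpg or .png (the pattern also accepts .jng and .ppg)
    image_files = glob.glob("*.[jp][pn]g")
    print(image_files)

Here, *.[jp][pn]g uses two character sets to cover both image extensions in one pattern. Because each set is matched independently, the pattern also accepts .jng and .ppg, and glob has no brace alternation such as {jpg,png}. Glob patterns are not as flexible as full regular expressions, but they are simpler for common file patterns.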
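
A character-set pattern such as *.[jp][pn]g matches every combination of the listed letters, so it can pick up extensions like .jng as well. When you need an exact list of extensions, one option is to run one pattern per extension and merge the results. The helper glob_any below is a hypothetical name for this sketch, not part of the glob module.

```python
import glob
import os
import tempfile


def glob_any(directory, patterns):
    """Return sorted paths in `directory` matching any of `patterns`."""
    matches = set()
    for pattern in patterns:
        matches.update(glob.glob(os.path.join(directory, pattern)))
    return sorted(matches)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("photo.jpg", "icon.png", "odd.jng", "doc.txt"):
        open(os.path.join(tmp, name), "w").close()

    charset_matches = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*.[jp][pn]g"))
    )
    exact_matches = [
        os.path.basename(p) for p in glob_any(tmp, ["*.jpg", "*.png"])
    ]

print(charset_matches)  # ['icon.png', 'odd.jng', 'photo.jpg']
print(exact_matches)    # ['icon.png', 'photo.jpg']
```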

If you need to match wildcard characters literally (for example, a filename that contains *, ?, or [), use glob.escape(). It returns a copy of the string in which those characters are escaped, so glob treats them as ordinary characters rather than wildcards.
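
The sketch below shows why escaping matters, using a filename that contains square brackets (chosen because brackets are allowed in filenames on all major platforms). The file names are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ("report[1].txt", "report1.txt"):
        open(os.path.join(tmp, name), "w").close()

    literal_name = "report[1].txt"

    # Unescaped: [1] is read as a character set, so this finds report1.txt.
    unescaped = [
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, literal_name))
    ]

    # Escaped: the bracket is matched literally.
    escaped_pattern = glob.escape(literal_name)
    escaped = [
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, escaped_pattern))
    ]

print(escaped_pattern)  # report[[]1].txt
print(unescaped)        # ['report1.txt']
print(escaped)          # ['report[1].txt']
```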

Using glob with Other Python Modules

Often you’ll use glob together with modules like os or pathlib to perform file operations. For example, you might process each matched file with os.path functions or move them with shutil. In modern Python, pathlib also offers Path.glob() and Path.rglob() methods that integrate globbing into object-oriented path manipulation.
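
As a brief sketch of the pathlib approach, the example below compares Path.glob(), which searches one directory level for this pattern, with Path.rglob(), which searches recursively. The directory layout and file names are assumptions made for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.csv").touch()
    (root / "nested" / "b.csv").touch()

    # Path.glob("*.csv") only looks directly inside root.
    top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*.csv"))

    # Path.rglob("*.csv") also descends into subdirectories.
    all_levels = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))

print(top_level)   # ['a.csv']
print(all_levels)  # ['a.csv', 'nested/b.csv']
```

Both methods yield Path objects rather than strings, so attributes such as .name, .suffix, and .parent are available on each result.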

Common Use Cases

The glob module is typically used for:

  • Listing files of a given extension or pattern in a directory
  • Searching nested folders for specific file types
  • Filtering files before batch processing (e.g., renaming, copying, parsing)
  • Collecting file paths for further automation workflows, such as automatic testing or data ingestion

Because it’s part of the standard library, glob is widely available and doesn’t require extra dependencies, making it ideal for quick scripting tasks.
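
To make the batch-processing use case concrete, here is a minimal sketch that copies every file matching a pattern into a backup folder with shutil.copy2(). The function name copy_matching, the folder names, and the file names are all hypothetical.

```python
import glob
import os
import shutil
import tempfile


def copy_matching(src_dir, pattern, dest_dir):
    """Copy files in src_dir that match pattern into dest_dir; return their names."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        if os.path.isfile(path):  # skip directories that happen to match
            shutil.copy2(path, dest_dir)
            copied.append(os.path.basename(path))
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "incoming")
    dest = os.path.join(tmp, "backup")
    os.makedirs(src)
    for name in ("jan.csv", "feb.csv", "readme.md"):
        open(os.path.join(src, name), "w").close()

    copied = copy_matching(src, "*.csv", dest)
    backup_listing = sorted(os.listdir(dest))

print(copied)          # ['feb.csv', 'jan.csv']
print(backup_listing)  # ['feb.csv', 'jan.csv']
```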

Conclusion

Python’s glob module offers a powerful yet approachable way to find files and directories based on pattern matching rules similar to the Unix shell. Whether you need a quick list of Python scripts, a recursive set of logs, or complex filters based on character patterns, glob gives you flexible control with minimal code.

For situations where you need more advanced path objects or chained operations, combining glob with pathlib opens the door to both powerful pattern matching and modern, readable filesystem code.
