Understanding XPath contains() Text — A Practical Guide
Learn how XPath contains() works for matching partial text and attributes in HTML or XML, with practical examples for scraping and automation.
When working with XML or HTML documents — whether for web scraping, browser automation, or test frameworks like Selenium or Scrapy — you’ll often need to locate elements based on part of the text they contain. XPath’s contains() function makes this possible.
In this guide, we’ll explain what contains() does, why it’s useful, and how to use it effectively with real-world examples.
What contains() Does in XPath
contains() is a built-in XPath function that allows you to match elements using a substring rather than requiring an exact text match.
In real-world pages, visible text often changes slightly due to dynamic content, formatting, or UI updates. Using contains() makes your XPath selectors more flexible and resilient.
Basic syntax:
//tag[contains(text(), 'partial text')]
Where:
- tag is the HTML or XML tag you’re targeting (e.g., div, span, button)
- contains(text(), 'partial text') checks whether the element’s visible text includes the specified substring
Why Partial Text Matching Matters
Using contains() in XPath offers several advantages:
- Handles dynamic content where full text may change
- Makes scraping and automation scripts less brittle
- Improves test stability when UI labels vary slightly
- Reduces maintenance caused by small text changes
Partial text matching helps prevent selectors from breaking due to minor UI updates.
Core Syntax and Examples
Find Elements by Partial Text
To locate a paragraph element containing the word Welcome:
//p[contains(text(), 'Welcome')]
This matches elements like:
<p>Welcome back to the site!</p>
<p>Please, Welcome our new members</p>
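To try this expression outside a browser, here is a minimal Python sketch. It assumes the third-party lxml library is installed (the article does not prescribe a parser), and the sample markup and variable names are made up for illustration.

```python
from lxml import html

# Hypothetical sample markup, based on the paragraphs shown above
doc = html.fromstring("""
<div>
  <p>Welcome back to the site!</p>
  <p>Please, Welcome our new members</p>
  <p>Goodbye for now</p>
</div>
""")

# Select <p> elements whose text contains the substring 'Welcome'
matches = doc.xpath("//p[contains(text(), 'Welcome')]")
welcome_texts = [p.text for p in matches]
print(welcome_texts)
```

The third paragraph is skipped because its text does not include the substring.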
Match Attributes Using contains()
You can also match substrings within attributes, such as class names:
//div[contains(@class, 'menu')]
This matches:
<div class="main-menu"></div>
<div class="menu-item large"></div>
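Because contains(@class, 'menu') is a plain substring test, it matches any class value that includes those characters. When only a whole class token should match, a common XPath 1.0 idiom pads the normalized class value with spaces. The sketch below compares both approaches. It assumes the third-party lxml library, and the markup and id values are hypothetical.

```python
from lxml import html

# Hypothetical markup; the id attributes exist only to label the results
doc = html.fromstring("""
<section>
  <div id="a" class="main-menu"></div>
  <div id="b" class="menu-item large"></div>
  <div id="c" class="large menu"></div>
  <div id="d" class="footer"></div>
</section>
""")

# Substring match: any class value containing the characters 'menu'
substring_ids = [d.get("id") for d in doc.xpath("//div[contains(@class, 'menu')]")]

# Token match: only elements with 'menu' as a complete class name
token_ids = [d.get("id") for d in doc.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' menu ')]"
)]

print(substring_ids)
print(token_ids)
```

Which form is appropriate depends on whether names like main-menu are meant to count as matches.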
Combine Multiple Conditions
XPath supports logical operators like and and or:
//a[contains(text(), 'Learn') and contains(@href, '/guide')]
This selects links that:
- Contain “Learn” in the visible text
- Have /guide in the href attribute
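As a rough illustration of how and and or change the result set, the following Python sketch runs both operators against the same links. It assumes the third-party lxml library, and the links are invented sample data.

```python
from lxml import html

# Hypothetical navigation links
doc = html.fromstring("""
<nav>
  <a href="/guide/xpath">Learn XPath</a>
  <a href="/blog/xpath">Learn more on the blog</a>
  <a href="/guide/css">CSS selectors</a>
</nav>
""")

# 'and': both the text and the href condition must hold
both = doc.xpath(
    "//a[contains(text(), 'Learn') and contains(@href, '/guide')]/@href"
)

# 'or': either condition is enough
either = doc.xpath(
    "//a[contains(text(), 'Learn') or contains(@href, '/guide')]/@href"
)

print(both)
print(either)
```

Only the first link satisfies both conditions, while all three satisfy at least one.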
Choosing Between Exact and Partial Text
XPath also allows exact text matching:
//button[text() = 'Submit']
This matches only elements with exactly “Submit” as the text.
If the text may vary (e.g., “Submit Now” or “Submit Form”), use:
//button[contains(text(), 'Submit')]
Partial matches tolerate wording changes on dynamic pages, at the cost of possibly matching more elements than intended.
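The difference between the two forms can be checked with a short Python sketch. It assumes the third-party lxml library, and the form markup is hypothetical.

```python
from lxml import html

# Hypothetical form with three buttons
doc = html.fromstring("""
<form>
  <button>Submit</button>
  <button>Submit Now</button>
  <button>Cancel</button>
</form>
""")

# Exact match: the text must equal 'Submit' character for character
exact = [b.text for b in doc.xpath("//button[text() = 'Submit']")]

# Partial match: the text only needs to include 'Submit'
partial = [b.text for b in doc.xpath("//button[contains(text(), 'Submit')]")]

print(exact)
print(partial)
```

The exact form selects one button, and the partial form also picks up “Submit Now”.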
Case Sensitivity and Edge Cases
XPath is case-sensitive by default. For example:
contains(text(), 'submit')
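This does not match “Submit”.
To handle case differences, lowercase the text with translate() before comparing:
contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'submit')
Whitespace can also cause mismatches. normalize-space(text()) strips leading and trailing whitespace and collapses internal runs of whitespace into a single space before matching.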
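The sketch below exercises these edge cases in Python. It assumes the third-party lxml library, which implements XPath 1.0, and the buttons and ids are hypothetical. It also shows a related XPath 1.0 detail: when text() selects several text nodes, contains() uses only the first one, so text that follows a nested tag is missed unless the whole string value (.) is tested instead.

```python
from lxml import html

# Hypothetical buttons covering case, whitespace, and nested markup
doc = html.fromstring("""
<div>
  <button id="a">Submit</button>
  <button id="b">SUBMIT ORDER</button>
  <button id="c">
     Submit   form
  </button>
  <button id="d">Cancel</button>
  <button id="e">Please <b>click</b> to submit</button>
</div>
""")

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# translate() lowercases the first text node before the substring test
case_insensitive = doc.xpath(
    f"//button[contains(translate(text(), '{UPPER}', '{LOWER}'), 'submit')]/@id"
)

# Using '.' tests the full string value, including text after nested tags
whole_text = doc.xpath(
    f"//button[contains(translate(., '{UPPER}', '{LOWER}'), 'submit')]/@id"
)

# normalize-space() trims the ends and collapses inner whitespace runs
normalized = doc.xpath("//button[normalize-space(text()) = 'Submit form']/@id")

print(case_insensitive)
print(whole_text)
print(normalized)
```

Button e appears only in whole_text because its first text node is “Please ”, which does not contain the search term.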
Practical Usage in Automation and Scraping
In real-world scraping workflows, having a dedicated tool can make complex tasks much easier. For example, if you’re scraping Twitter profile content or tweet data, tools like MrScraper’s Twitter Scraper Made Simple guide show how to set up and extract structured social media data efficiently.
Selenium Example (JavaScript)
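// Assumes `driver` is an existing WebDriver instance and this runs inside an async function
const { By } = require('selenium-webdriver');
const element = await driver.findElement(
  By.xpath("//a[contains(text(), 'Log In')]")
);
await element.click();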
This matches links such as:
- “Log In”
- “Log In Here”
- “Click to Log In”
Scrapy Example (Python)
response.xpath("//a[contains(text(), 'article')]/@href").getall()
This extracts links whose anchor text contains the word “article”.
Performance Considerations
Overusing contains() — especially with // at the beginning — can slow down scraping or automation.
Best practices:
- Narrow the search scope whenever possible
- Combine contains() with attributes or parent elements
- Avoid overly broad XPath expressions
Example of a more efficient selector:
//div[@class='header']//*[contains(text(), 'Login')]
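One way to apply the same scoping idea in code is to locate the container first and then run a relative expression inside it. The Python sketch below assumes the third-party lxml library, and the page markup is hypothetical. It shows how scoping changes what gets searched and matched, and does not measure speed.

```python
from lxml import html

# Hypothetical page with 'Login' text in two different regions
doc = html.fromstring("""
<div id="page">
  <div class="header"><a href="/login">Login</a></div>
  <div class="content"><p>Login problems? Contact support.</p></div>
</div>
""")

# Broad: every element in the document is a candidate
broad = [el.tag for el in doc.xpath("//*[contains(text(), 'Login')]")]

# Scoped: find the header once, then search only its descendants
header = doc.xpath("//div[@class='header']")[0]
scoped = [el.tag for el in header.xpath(".//*[contains(text(), 'Login')]")]

print(broad)
print(scoped)
```

The leading dot in .//* keeps the second search relative to the header element, so the paragraph in the content area is never considered.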
Conclusion
XPath’s contains() function is an essential tool for matching partial text or attribute values in HTML and XML documents. It makes selectors more flexible, resilient, and suitable for real-world scraping and automation tasks.
By combining contains() with logical operators, attribute filters, and functions like normalize-space() or starts-with(), you can build powerful XPath expressions that remain reliable even when page content changes.