Understanding XPath Contains Text — A Practical Guide

When you work with XML or HTML documents, whether for web scraping with Scrapy or test automation with Selenium, you often need to locate an element based on part of the text it contains. XPath’s contains() function makes this possible. This article explains what it does, why it’s useful, and how to use it, with real examples.

What contains() Does in XPath

contains() is a built-in XPath function that lets you match elements based on a substring rather than requiring an exact match. In many real-world pages, the visible text inside elements can vary slightly — it might include dynamic data, extra words, or formatting changes. Using contains() lets you locate elements if they include a certain text fragment, making your selectors more flexible and robust.

The basic syntax looks like this:

//tag[contains(text(), 'partial text')]

Here:

  • tag is the HTML (or XML) tag you’re targeting, such as div, span, or button.
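  • contains(text(), 'partial text') checks whether the element’s own text includes the substring partial text. In XPath 1.0 (the version implemented by browsers and by lxml-based tools such as Scrapy), only the element’s first direct text node is tested. To test all of the element’s text, including text inside nested elements, use contains(., 'partial text').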
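To see the syntax in action, here is a minimal sketch in Python. It assumes the third-party lxml library is installed, and the markup is invented for illustration. It also shows the difference between testing text() and testing the element's full string value with a dot.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
doc = etree.fromstring(
    "<div>"
    "<button>Submit order</button>"
    "<button>Cancel</button>"
    "<button>Click <i>here</i> to Submit</button>"
    "</div>"
)

# contains(text(), ...) tests only the first direct text node of each button.
# For the third button that node is "Click ", so it is not matched.
first_text_node = [
    "".join(b.itertext())
    for b in doc.xpath("//button[contains(text(), 'Submit')]")
]

# contains(., ...) tests the whole string value, nested elements included.
full_string = [
    "".join(b.itertext())
    for b in doc.xpath("//button[contains(., 'Submit')]")
]

print(first_text_node)  # ['Submit order']
print(full_string)      # ['Submit order', 'Click here to Submit']
```

If an element you expect to match is skipped, checking whether its text is split by a child element is a good first step.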

Why Partial Text Matching Matters

XPath expressions with contains() give you several advantages:

  • They handle dynamic content, where the full text may change but part of it remains predictable.
  • They make scrapers or automation scripts less brittle, since they don’t depend on exact text.
  • They are useful in test automation when UI labels vary slightly under different conditions.

By using partial text matches, you avoid brittle locators that break whenever the text changes by a word or two.

Core Syntax and Examples

Here’s how to write XPath expressions using contains() for different scenarios.

Find Elements by Partial Text

To locate a paragraph <p> element that contains the word Welcome anywhere in its text:

//p[contains(text(), 'Welcome')]

This will match something like:

<p>Welcome back to the site!</p>
<p>Please, Welcome our new members</p>

Match Attributes Using contains()

You can also use contains() to match part of an attribute value, for example to find elements with class names that include a certain substring:

//div[contains(@class, 'menu')]

This matches <div class="main-menu">, <div class="menu-item large">, etc.
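The attribute match can be checked with a short Python sketch. It assumes the third-party lxml library is installed; the sample markup and the footer element are invented for illustration.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
doc = etree.fromstring(
    "<body>"
    '<div class="main-menu">A</div>'
    '<div class="menu-item large">B</div>'
    '<div class="footer">C</div>'
    "</body>"
)

# Keep every <div> whose class attribute contains the substring 'menu'.
menu_classes = [
    d.get("class") for d in doc.xpath("//div[contains(@class, 'menu')]")
]

print(menu_classes)  # ['main-menu', 'menu-item large']
```

Because this is a plain substring test, any class value that happens to include the letters "menu" is matched too, so pick a fragment that is specific enough for your page.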

Combine Multiple Conditions

XPath lets you combine conditions with the logical operators “and” and “or”. For example:

//a[contains(text(), 'Learn') and contains(@href, '/guide')]

This selects all links that contain the word “Learn” in the text and also include /guide in the href attribute.
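The following Python sketch compares the two operators on the same links. It assumes the third-party lxml library is installed, and the link targets are invented for illustration.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
doc = etree.fromstring(
    "<nav>"
    '<a href="/guide/xpath">Learn XPath</a>'
    '<a href="/blog/xpath">Learn more</a>'
    '<a href="/guide/css">CSS selectors</a>'
    "</nav>"
)

# 'and': both conditions must hold for the same <a> element.
both = [
    str(h)
    for h in doc.xpath(
        "//a[contains(text(), 'Learn') and contains(@href, '/guide')]/@href"
    )
]

# 'or': either condition is enough.
either = [
    str(h)
    for h in doc.xpath(
        "//a[contains(text(), 'Learn') or contains(@href, '/guide')]/@href"
    )
]

print(both)    # ['/guide/xpath']
print(either)  # ['/guide/xpath', '/blog/xpath', '/guide/css']
```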

Choosing Between Exact and Partial Text

XPath also supports exact text matching:

//button[text() = 'Submit']

This only matches if the button text is exactly “Submit” with no extra characters or spaces. But if the text varies (e.g., “Submit Now” or “Submit Form”), this exact match will fail.

In contrast, a partial match such as:

//button[contains(text(), 'Submit')]

will match all buttons whose text includes the word “Submit”, regardless of what else appears.
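A side-by-side comparison makes the difference concrete. This is a minimal Python sketch assuming the third-party lxml library is installed; the form markup is invented for illustration.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
doc = etree.fromstring(
    "<form>"
    "<button>Submit</button>"
    "<button>Submit Now</button>"
    "<button>Submit Form</button>"
    "<button>Cancel</button>"
    "</form>"
)

# Exact match: the text node must equal 'Submit' character for character.
exact = [b.text for b in doc.xpath("//button[text() = 'Submit']")]

# Partial match: 'Submit' may appear anywhere in the text.
partial = [b.text for b in doc.xpath("//button[contains(text(), 'Submit')]")]

print(exact)    # ['Submit']
print(partial)  # ['Submit', 'Submit Now', 'Submit Form']
```

An exact match is the safer choice when the label is stable and you want a single element; a partial match trades that precision for tolerance to wording changes.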

Case Sensitivity and Edge Cases

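  • XPath string matching is case-sensitive, so contains(text(), 'submit') won’t match “Submit”. XPath 1.0 has no lowercase function, so the usual workaround is translate(), which maps each uppercase letter to its lowercase counterpart before the comparison: contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'submit').
  • Whitespace can also cause mismatches. normalize-space(text()) strips leading and trailing whitespace and collapses internal runs of whitespace into single spaces before matching.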
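Both edge cases can be reproduced in a few lines. The sketch below is Python and assumes the third-party lxml library is installed; the markup and the UPPER and LOWER helper constants are invented for illustration.

```python
from lxml import etree

# Hypothetical sample markup; the third button has padded text.
doc = etree.fromstring(
    "<form>"
    "<button>Submit</button>"
    "<button>SUBMIT NOW</button>"
    "<button>\n   Submit   </button>"
    "</form>"
)

# Helper alphabets for translate(); these names are illustrative only.
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# Case-sensitive: lowercase 'submit' appears in none of the buttons.
case_sensitive = doc.xpath("//button[contains(text(), 'submit')]")

# Lowercase the text with translate() first, then compare.
case_insensitive = doc.xpath(
    f"//button[contains(translate(text(), '{UPPER}', '{LOWER}'), 'submit')]"
)

# Exact match fails on the padded button unless whitespace is normalized.
exact_raw = doc.xpath("//button[text() = 'Submit']")
exact_trimmed = doc.xpath("//button[normalize-space(text()) = 'Submit']")

print(len(case_sensitive), len(case_insensitive))  # 0 3
print(len(exact_raw), len(exact_trimmed))          # 1 2
```

Note that translate() as written here only covers the ASCII letters listed in the two alphabets; accented characters would need to be added to both strings.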

Practical Usage in Automation and Scraping

In tools like Selenium, contains() is widely used because exact text or attribute matches often fail due to dynamic UIs. Here’s a simple example in Selenium WebDriver (JavaScript):

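const { By } = require('selenium-webdriver');

// Assumes `driver` is an already-built WebDriver instance and this code runs inside an async function.
const element = await driver.findElement(By.xpath("//a[contains(text(), 'Log In')]"));
await element.click();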

This finds any link where the text includes “Log In”, such as “Log In Here” or “Click to Log In”.

Similarly, in Scrapy (Python), you might use an XPath like:

response.xpath("//a[contains(text(), 'article')]/@href").getall()

This extracts the href of all links whose text contains “article”.

Performance Considerations

Using contains() indiscriminately (especially with // at the start of your XPath) can slow down scraping or automation because it searches through the entire document tree. Whenever possible:

  • Restrict the search to more specific structures (e.g., //div[@class='header']/*[contains(text(), '…')]).
  • Combine contains() with other filters to narrow down results.
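The effect of scoping on the result set can be sketched as follows. This is Python, assuming the third-party lxml library is installed; the page structure is invented for illustration, and the sketch shows which elements each expression selects rather than measuring speed.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
doc = etree.fromstring(
    "<body>"
    '<div class="header"><a>Log In</a><span>Log In help</span></div>'
    '<div class="content"><p>Log In to see more</p></div>'
    "</body>"
)

# Unscoped: every element in the document is tested.
everywhere = [e.tag for e in doc.xpath("//*[contains(text(), 'Log In')]")]

# Scoped: only direct children of the header <div> are tested.
header_only = [
    e.tag
    for e in doc.xpath("//div[@class='header']/*[contains(text(), 'Log In')]")
]

print(everywhere)   # ['a', 'span', 'p']
print(header_only)  # ['a', 'span']
```

Besides reducing the number of nodes examined, the scoped expression avoids accidental matches in unrelated parts of the page.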

Conclusion

XPath’s contains() function is an essential tool when you need to match partial text or attribute values in HTML or XML documents. It increases flexibility and resilience in both scraping and automation workflows by locating elements based on substrings rather than exact text matches.

By combining contains() with logical operators, attribute checks, and other XPath functions (like starts-with() or normalize-space()), you can write powerful selectors that work reliably even when page content changes or text varies slightly.
