Understanding XPath contains() Text — A Practical Guide

Learn how XPath contains() works for matching partial text and attributes in HTML or XML, with practical examples for scraping and automation.

When working with XML or HTML documents, whether for web scraping with Scrapy, browser automation, or UI tests with Selenium, you’ll often need to locate elements based on part of the text they contain. XPath’s contains() function makes this possible.

In this guide, we’ll explain what contains() does, why it’s useful, and how to use it effectively with real-world examples.

What contains() Does in XPath

contains() is a built-in XPath function that allows you to match elements using a substring rather than requiring an exact text match.

In real-world pages, visible text often changes slightly due to dynamic content, formatting, or UI updates. Using contains() makes your XPath selectors more flexible and resilient.

Basic syntax:

//tag[contains(text(), 'partial text')]

Where:

  • tag is the HTML or XML tag you’re targeting (e.g., div, span, button)
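  • contains(text(), 'partial text') checks whether the element’s own text includes the specified substring. In XPath 1.0 only the first direct text node is tested, so text inside child elements is ignored; to test the full text including descendants, use contains(., 'partial text')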
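
The difference between text() and . is easiest to see on nested markup. The following is a minimal sketch, assuming the third-party lxml library (not used elsewhere in this guide) and a made-up HTML snippet:

```python
from lxml import html

# Hypothetical markup, for illustration only.
doc = html.fromstring("""
<div>
  <button>Add to cart</button>
  <button><span>Add</span> to wishlist</button>
  <button>Remove</button>
</div>
""")

# text() selects the button's direct text nodes; XPath 1.0 contains()
# converts that node-set to a string using only its first node.
by_text_node = doc.xpath("//button[contains(text(), 'Add')]")

# "." is the element's full string value, including descendant text.
by_string_value = doc.xpath("//button[contains(., 'Add')]")

print([b.text_content() for b in by_text_node])     # ['Add to cart']
print([b.text_content() for b in by_string_value])  # ['Add to cart', 'Add to wishlist']
```

The second button is missed by the text() form because its "Add" sits inside a span, and its first direct text node is " to wishlist".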

Why Partial Text Matching Matters

Using contains() in XPath offers several advantages:

  • Handles dynamic content where full text may change
  • Makes scraping and automation scripts less brittle
  • Improves test stability when UI labels vary slightly
  • Reduces maintenance caused by small text changes

Partial text matching helps prevent selectors from breaking due to minor UI updates.

Core Syntax and Examples

Find Elements by Partial Text

To locate a paragraph element containing the word Welcome:

//p[contains(text(), 'Welcome')]

This matches elements like:

<p>Welcome back to the site!</p>
<p>Please, Welcome our new members</p>

Match Attributes Using contains()

You can also match substrings within attributes, such as class names:

//div[contains(@class, 'menu')]

This matches any div whose class attribute contains the substring menu anywhere, for example:

<div class="main-menu"></div>
<div class="menu-item large"></div>
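
Because contains() works on raw substrings, it also matches class names that merely include the letters "menu". The sketch below, which assumes the lxml library and invented class names, compares the plain substring test with a common whole-token idiom that pads the class list with spaces:

```python
from lxml import html

# Hypothetical markup; the last two class names are added for illustration.
doc = html.fromstring("""
<div>
  <div class="main-menu"></div>
  <div class="menu-item large"></div>
  <div class="menu"></div>
  <div class="submenu-hidden"></div>
</div>
""")

# Substring match: any class attribute containing "menu" anywhere.
substring_hits = doc.xpath("//div[contains(@class, 'menu')]/@class")

# Token match: pad the normalized class list with spaces, then look for
# " menu " so only the standalone class name "menu" qualifies.
token_hits = doc.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' menu ')]/@class"
)

print(substring_hits)  # ['main-menu', 'menu-item large', 'menu', 'submenu-hidden']
print(token_hits)      # ['menu']
```

Which form is appropriate depends on whether related class names such as main-menu should be included.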

Combine Multiple Conditions

XPath supports logical operators like and and or:

//a[contains(text(), 'Learn') and contains(@href, '/guide')]

This selects links that:

  • Contain “Learn” in the visible text
  • Have /guide in the href attribute
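
To see how and narrows a selection while or widens it, here is a small sketch. It assumes the lxml library, and the links and paths are invented for the example:

```python
from lxml import html

# Hypothetical navigation links, for illustration only.
doc = html.fromstring("""
<div>
  <a href="/guide/xpath">Learn XPath</a>
  <a href="/blog/xpath">Learn more on the blog</a>
  <a href="/guide/css">CSS selectors</a>
  <a href="/pricing">Pricing</a>
</div>
""")

# Both conditions must hold.
both = doc.xpath(
    "//a[contains(text(), 'Learn') and contains(@href, '/guide')]/@href"
)

# Either condition is enough.
either = doc.xpath(
    "//a[contains(text(), 'Learn') or contains(@href, '/guide')]/@href"
)

print(both)    # ['/guide/xpath']
print(either)  # ['/guide/xpath', '/blog/xpath', '/guide/css']
```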

Choosing Between Exact and Partial Text

XPath also allows exact text matching:

//button[text() = 'Submit']

This matches only buttons that have a text node equal to “Submit”, with no extra words or surrounding whitespace.

If the text may vary (e.g., “Submit Now” or “Submit Form”), use:

//button[contains(text(), 'Submit')]

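Partial matches tolerate this kind of variation on dynamic pages, but they can also match more than intended: the selector above would also select a button labeled “Do Not Submit”.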
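
The trade-off can be checked side by side. This sketch assumes the lxml library and made-up button labels, and it also includes starts-with() as a middle ground between exact and substring matching:

```python
from lxml import html

# Hypothetical buttons, for illustration only.
doc = html.fromstring("""
<div>
  <button>Submit</button>
  <button>Submit Now</button>
  <button>Do Not Submit</button>
  <button>Cancel</button>
</div>
""")

def labels(xpath_expr):
    """Return the text of every button selected by xpath_expr."""
    return [b.text for b in doc.xpath(xpath_expr)]

exact = labels("//button[text() = 'Submit']")
partial = labels("//button[contains(text(), 'Submit')]")
prefix = labels("//button[starts-with(text(), 'Submit')]")

print(exact)    # ['Submit']
print(partial)  # ['Submit', 'Submit Now', 'Do Not Submit']
print(prefix)   # ['Submit', 'Submit Now']
```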

Case Sensitivity and Edge Cases

XPath is case-sensitive by default. For example:

contains(text(), 'submit')

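This will not match “Submit”.

To handle case differences, map uppercase letters to lowercase with translate() before comparing, since XPath 1.0 has no lower-case() function.

Whitespace can also cause mismatches. normalize-space(text()) strips leading and trailing whitespace and collapses each run of internal whitespace into a single space before matching.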
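
Both techniques can be sketched as follows. The example assumes the lxml library and invented button markup, and the translate() mapping covers only ASCII letters:

```python
from lxml import html

# Hypothetical buttons; the third one has irregular whitespace on purpose.
doc = html.fromstring("""
<div>
  <button>Submit</button>
  <button>SUBMIT ORDER</button>
  <button>
      Submit   form
  </button>
  <button>Cancel</button>
</div>
""")

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# translate() maps each uppercase ASCII letter to its lowercase
# counterpart, so the comparison effectively ignores case.
case_insensitive = doc.xpath(
    f"//button[contains(translate(text(), '{UPPER}', '{LOWER}'), 'submit')]"
)

# Without normalization, "Submit   form" (three spaces) does not contain
# "Submit form"; normalize-space() collapses the run to a single space.
raw = doc.xpath("//button[contains(text(), 'Submit form')]")
collapsed = doc.xpath("//button[contains(normalize-space(text()), 'Submit form')]")

print(len(case_insensitive))  # 3
print(len(raw))               # 0
print(len(collapsed))         # 1
```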

Practical Usage in Automation and Scraping

In real-world scraping workflows, having a dedicated tool can make complex tasks much easier. For example, if you’re scraping Twitter profile content or tweet data, tools like MrScraper’s Twitter Scraper Made Simple guide show how to set up and extract structured social media data efficiently.

Selenium Example (JavaScript)

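// Assumes `driver` is an initialized WebDriver and `By` is imported
// from 'selenium-webdriver'; run this inside an async function.
const element = await driver.findElement(
  By.xpath("//a[contains(text(), 'Log In')]")
);
await element.click();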

This matches links such as:

  • “Log In”
  • “Log In Here”
  • “Click to Log In”

Scrapy Example (Python)

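# Inside a spider callback, where `response` is the downloaded page
response.xpath("//a[contains(text(), 'article')]/@href").getall()

This extracts the href of every link whose anchor text contains the substring “article”. The match is case-sensitive, so “Article” is not matched, while “articles” is.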

Performance Considerations

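A leading // makes the engine consider every node in the document, and contains() is then evaluated against each candidate. On large pages, or when many such selectors run, this can slow down scraping or automation.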

Best practices:

  • Narrow the search scope whenever possible
  • Combine contains() with attributes or parent elements
  • Avoid overly broad XPath expressions

Example of a more narrowly scoped selector (note that @class='header' requires the class attribute to be exactly header):

//div[@class='header']//*[contains(text(), 'Login')]
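
Scoping also reduces false positives, not just work. The sketch below assumes the lxml library and an invented page layout; it resolves the container once and then searches only inside it with a relative path:

```python
from lxml import html

# Hypothetical page, for illustration only.
doc = html.fromstring("""
<div>
  <div class="header"><a href="/login">Login</a></div>
  <div class="content"><p>Login problems? Contact support.</p></div>
</div>
""")

# Broad: every element in the document is a candidate.
broad = doc.xpath("//*[contains(text(), 'Login')]")

# Scoped: find the header once, then use a relative path (leading ".")
# so only its descendants are searched.
header = doc.xpath("//div[@class='header']")[0]
scoped = header.xpath(".//*[contains(text(), 'Login')]")

print([el.tag for el in broad])   # ['a', 'p']
print([el.tag for el in scoped])  # ['a']
```

Actual speed gains depend on the XPath engine and the page size, so it is worth measuring on real pages before restructuring selectors.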

Conclusion

XPath’s contains() function is an essential tool for matching partial text or attribute values in HTML and XML documents. It makes selectors more flexible, resilient, and suitable for real-world scraping and automation tasks.

By combining contains() with logical operators, attribute filters, and functions like normalize-space() or starts-with(), you can build powerful XPath expressions that remain reliable even when page content changes.
