Scraping Amazon Product Data With Python: A Step-by-Step Tutorial

Learn how to scrape Amazon product data using Python and Playwright. A step-by-step guide to handling bot checks, extracting product info, and improving scraping reliability.

Web scraping has become a key tool for developers, analysts, and businesses that rely on structured product information. Whether you're tracking competitor prices, monitoring product availability, or researching marketplace trends, being able to extract accurate data reliably is essential.

Scraping large e-commerce platforms like Amazon, however, is not as simple as scraping regular websites. Strong anti-bot systems, CAPTCHAs, dynamic HTML rendering, and bot-detection signals often prevent standard HTTP requests from accessing the real product page.

In this guide, you’ll learn how to scrape Amazon product data using Python and Playwright, a browser automation tool that drives a real browser and so holds up better against blocking than a plain HTTP client. We’ll walk through a complete working script and explain how it detects and responds to Amazon’s blocking pages.

Why Traditional Requests + BeautifulSoup Fails on Amazon

If you've tried scraping Amazon using the typical approach:

  • requests for fetching HTML
  • BeautifulSoup for parsing
  • custom headers / user-agent spoofing

…you’ve likely run into pages like:

  • “To discuss automated access to Amazon data…”
  • “Click the button below to continue shopping”
  • CAPTCHA screens
  • Empty HTML pages with no product content

This happens because Amazon uses:

  • JavaScript-based DOM hydration
  • Bot-detection that blocks non-browser traffic
  • Redirect loops to CAPTCHA pages
  • Dynamic HTML selectors that frequently change
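The block pages listed above can often be recognized from the returned HTML itself, whichever client fetched it. The following is a minimal sketch, assuming the phrases quoted earlier still appear verbatim on those pages. The function name `looks_like_block_page`, the marker list, and the length threshold are illustrative choices, not part of any library.

```python
# Marker phrases taken from the block pages quoted above. The generic
# "captcha" marker and the length threshold are assumptions for illustration
# and may produce false positives on some real pages.
BLOCK_MARKERS = (
    "to discuss automated access to amazon data",
    "click the button below to continue shopping",
    "captcha",
)


def looks_like_block_page(html: str, min_length: int = 500) -> bool:
    """Return True if the HTML looks like a bot-check page or an empty page."""
    text = html.lower()
    if any(marker in text for marker in BLOCK_MARKERS):
        return True
    # Very short documents are treated as "empty" pages with no product content
    return len(text.strip()) < min_length
```

A check like this complements URL-based detection, since a block page may be served at the original product URL without any redirect.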

Solution: use a real browser — and that’s exactly what Playwright provides.

Why Playwright Is the Best Tool for Scraping Amazon

Playwright (by Microsoft) is a browser automation framework that runs in headless or headed mode and:

  • Loads websites in a real browser engine
  • Executes JavaScript automatically
  • Presents a genuine browser environment, which passes many basic bot checks (though not every fingerprinting check)
  • Gives full control of navigation and selectors
  • Works with Chromium, Firefox, and WebKit

Because the request comes from a real browser, Amazon is much more likely to deliver the actual product page rather than a fallback bot-protection page.

Step-by-Step: Scrape an Amazon Product With Playwright

Below is the full working Python code. This script:

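  • Detects Amazon bot-check pages by inspecting the final URL
  • Tries to click through the bot-check interstitial, and stops if the page is still blocked
  • Waits for the real product page to load
  • Extracts title, price, rating, image, and availability
  • Falls back to alternative selectors for the title and image if the primary ones stop matching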
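The selector fallback mentioned in the last bullet can be sketched independently of Playwright. This is a minimal illustration, assuming a `lookup` callable that returns the text for a selector or `None`. The helper `first_match` and the `fake_page` dictionary are hypothetical stand-ins, not part of the script or of any library.

```python
from typing import Callable, Iterable, Optional


def first_match(selectors: Iterable[str],
                lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the first non-empty value produced by `lookup`, else None."""
    for selector in selectors:
        value = lookup(selector)
        if value:
            return value
    return None


# Stand-in for a rendered page: only the second selector has content here.
fake_page = {"span.a-size-large.product-title-word-break": "Example product"}

title = first_match(
    ["#productTitle", "span.a-size-large.product-title-word-break"],
    fake_page.get,
)
print(title)  # Example product
```

Keeping the candidates in an ordered list means a new selector can be added in one place when Amazon changes its markup.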

Python Code: Scrape Amazon With Playwright

from playwright.sync_api import sync_playwright
import time

URL = "https://www.amazon.com/dp/B0CFC7Q4V3"  # Replace with any real product

def is_amazon_block_page(page):
    url = page.url.lower()
    
    block_keywords = [
        "captcha",
        "validatecaptcha",
        "/ap/cvf",
        "amazonbotcheck",
        "robot-check",
        "signin",
        "503",
        "sorry",
        "not-found"
    ]
    
    return any(k in url for k in block_keywords)


def scrape_amazon(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            ),
            locale="en-US",
        )
        page = context.new_page()

        print("Navigating...")
        page.goto(url, timeout=60000, wait_until="domcontentloaded")

        # Detect anti-bot page
        if is_amazon_block_page(page):
            print("Amazon returned a bot-detection page. Trying to bypass...")

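            try:
                submit = page.locator("button[type='submit']")
                if submit.count() > 0:
                    # Click the first submit button on the interstitial page
                    submit.first.click()
                    page.wait_for_load_state("networkidle")
                    time.sleep(3)
            except Exception:
                # Best effort only; the re-check below decides whether to continue
                pass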

            if is_amazon_block_page(page):
                print("Still blocked. You need proxies or a session cookie.")
                browser.close()
                return None

        # Wait for product page
        product_selectors = [
            "#productTitle",
            "span.a-size-large.product-title-word-break"
        ]

        got_content = False
        for sel in product_selectors:
            try:
                page.wait_for_selector(sel, timeout=5000)
                got_content = True
                break
            except Exception:
                # Selector did not appear in time; try the next candidate
                pass

        if not got_content:
            print("Product content never loaded. Amazon likely blocked scraping.")
            print("Final URL:", page.url)
            browser.close()
            return None

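        # Helpers
        # .first avoids a strict-mode error when a selector matches several
        # elements (span.a-offscreen and span.a-icon-alt usually do), and the
        # short timeout keeps a missing element from stalling for 30 seconds.
        # Note that the first match is not guaranteed to be the main price.
        def safe_text(selector):
            try:
                return page.locator(selector).first.inner_text(timeout=2000).strip()
            except Exception:
                return None

        def safe_attr(selector, attr):
            try:
                return page.locator(selector).first.get_attribute(attr, timeout=2000)
            except Exception:
                return None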

        # Extract fields
        title = safe_text("#productTitle") or safe_text("span.a-size-large.product-title-word-break")
        price = safe_text("span.a-offscreen")
        rating = safe_text("span.a-icon-alt")
        image_url = safe_attr("#landingImage", "src") or safe_attr("img[data-old-hires]", "src")
        availability = safe_text("#availability")

        result = {
            "title": title,
            "price": price,
            "rating": rating,
            "image_url": image_url,
            "availability": availability,
        }

        browser.close()
        return result


# Run only when executed as a script, not when imported
if __name__ == "__main__":
    data = scrape_amazon(URL)
    print("\nResult:\n", data)

Example Output

{
 'title': 'Amazon Echo Dot (newest model)...',
 'price': '$49.99',
 'rating': '4.6 out of 5 stars',
 'image_url': 'https://m.media-amazon.com/images/I/71vtuXXQdDL._AC_SY606_.jpg',
 'availability': 'In Stock'
}

This shows the scraper successfully:

  • passed bot checks
  • executed JavaScript
  • loaded the real product page
  • extracted structured data
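The extracted values are raw strings, so a follow-up step usually converts them before storing or comparing them. The sketch below assumes US-style formatting like the example output ('$49.99', '4.6 out of 5 stars'). The helper names `parse_price` and `parse_rating` are illustrative, and other locales or price ranges would need different handling.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Convert a string such as '$49.99' or '$1,299.00' to a Decimal."""
    if not text:
        return None
    # First run of digits, with optional thousands separators and decimals
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    return Decimal(match.group().replace(",", ""))


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Convert a string such as '4.6 out of 5 stars' to 4.6."""
    if not text:
        return None
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of", text)
    return float(match.group(1)) if match else None


print(parse_price("$49.99"))               # 49.99
print(parse_rating("4.6 out of 5 stars"))  # 4.6
```

`Decimal` is used for the price to avoid floating-point rounding when prices are later compared or summed.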

Before You Continue: A Faster, Easier Way With MrScraper

Building your own Amazon scraper is possible, but you must deal with:

  • rotating proxies
  • browser fingerprinting
  • HTML changes
  • CAPTCHA loops
  • maintenance
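To give a sense of the first item, here is a minimal round-robin sketch built on the `proxy` option that Playwright's `launch()` accepts. The proxy endpoints are hypothetical placeholders, and the helpers `proxy_settings` and `launch_with_next_proxy` are illustrative names. A production setup would also need health checks, credentials, and retry logic.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def proxy_settings(servers: List[str]) -> Iterator[Dict[str, str]]:
    """Yield Playwright-style proxy dicts, cycling through `servers` forever."""
    for server in cycle(servers):
        yield {"server": server}


# Hypothetical proxy endpoints; replace with ones you actually control.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]
rotation = proxy_settings(PROXIES)


def launch_with_next_proxy(playwright):
    """Launch Chromium through the next proxy in the rotation.

    `playwright` is the object yielded by `sync_playwright()`.
    """
    return playwright.chromium.launch(proxy=next(rotation))
```

Each call to `launch_with_next_proxy` starts a browser behind a different endpoint, so consecutive scraping attempts do not share an IP address.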

If you want a zero-maintenance approach, MrScraper provides:

  • Automatic proxy rotation
  • Anti-bot bypass
  • Instant JSON output
  • No Playwright setup required

Conclusion

Scraping Amazon reliably today requires more than standard HTTP requests — it requires:

  • a real browser
  • JavaScript rendering
  • bot detection handling
  • fallback logic

Playwright handles these effectively, making it one of the best tools for scraping Amazon.

With the script above, you can extract product title, price, rating, images, and availability. And if you prefer an easier, automated solution, MrScraper’s Amazon scraper handles everything for you.
