Scraping Amazon Product Data With Python: A Step-by-Step Tutorial

Web scraping has become a key tool for developers, analysts, and businesses that rely on structured product information. Whether you're tracking competitor prices, monitoring product availability, or researching marketplace trends, being able to extract accurate data reliably is essential.

Scraping large e-commerce platforms like Amazon, however, is harder than scraping ordinary websites. Anti-bot systems, CAPTCHAs, and JavaScript-rendered HTML often prevent standard HTTP requests from reaching the real product page.

In this guide, you’ll learn how to scrape Amazon product data using Python and Playwright, a browser automation library that drives a real browser. We’ll walk through a complete script and explain how it detects and responds to Amazon’s blocking pages.

Why Traditional Requests + BeautifulSoup Fails on Amazon

If you've tried scraping Amazon using the typical approach:

requests for fetching HTML
BeautifulSoup for parsing
custom headers / user-agent spoofing

…you’ve likely run into pages like:

“To discuss automated access to Amazon data…”
“Click the button below to continue shopping”
CAPTCHA screens
Empty HTML pages with no product content

This happens because Amazon uses:

JavaScript-based DOM hydration
Bot-detection that blocks non-browser traffic
Redirect loops to CAPTCHA pages
Dynamic HTML selectors that frequently change

Solution: use a real browser, which is exactly what Playwright provides.
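To make the failure mode concrete, here is a minimal sketch of how a script could recognize the block pages quoted above from the HTML it receives. The function name `looks_like_block_page` and the phrase list are illustrative assumptions, not part of any library. The wording of Amazon's pages can change at any time, and a plain substring check like this can produce false positives.

```python
# Phrases taken from the block pages quoted in this article. The list is
# illustrative, not exhaustive, and the wording may change at any time.
BLOCK_PHRASES = (
    "to discuss automated access to amazon data",
    "click the button below to continue shopping",
    "captcha",
)


def looks_like_block_page(html: str) -> bool:
    """Return True if the HTML is empty or contains a known block phrase."""
    text = html.strip().lower()
    if not text:
        return True  # an empty body means no product content was served
    return any(phrase in text for phrase in BLOCK_PHRASES)


blocked = "<html><body>Click the button below to continue shopping</body></html>"
product = '<html><body><span id="productTitle">Echo Dot</span></body></html>'
print(looks_like_block_page(blocked))  # True
print(looks_like_block_page(product))  # False
```

A check like this is useful mainly for logging why a plain HTTP request failed. It does not get past the block.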

Why Playwright Is the Best Tool for Scraping Amazon

Playwright (by Microsoft) is a browser automation framework that can run headless or with a visible window. It:

Loads websites in a real browser engine
Executes JavaScript automatically
Avoids many of the checks that block plain HTTP clients
Gives full control of navigation and selectors
Works with Chromium, Firefox, and WebKit

Because the traffic comes from a real browser, Amazon is much more likely to deliver the actual product page instead of a fallback bot-protection page.

Step-by-Step: Scrape an Amazon Product With Playwright

Below is the full working Python code. This script:

Detects Amazon bot-check pages by their URL
Tries to click through a bot-check page once
Waits for the real product page to load
Extracts title, price, rating, image, and availability
Falls back to alternative selectors for the title and image
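The selector fallback in the last bullet is a general pattern that can be sketched independently of Playwright. In the hedged example below, `first_match` is a hypothetical helper name and `fake_page` is a stand-in dictionary, not a real page. In the actual script, the lookup would be a function such as `safe_text`.

```python
from typing import Callable, Iterable, Optional


def first_match(lookup: Callable[[str], Optional[str]],
                selectors: Iterable[str]) -> Optional[str]:
    """Try each selector in order and return the first non-empty result."""
    for selector in selectors:
        value = lookup(selector)
        if value and value.strip():
            return value.strip()
    return None


# Stand-in for a rendered page: maps selectors to the text they would yield.
fake_page = {"span.a-size-large.product-title-word-break": "  Echo Dot  "}

title = first_match(
    fake_page.get,
    ["#productTitle", "span.a-size-large.product-title-word-break"],
)
print(title)  # Echo Dot
```

Keeping the selectors in an ordered list means a changed selector can be handled by adding one more entry instead of rewriting the extraction logic.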

Python Code: Scrape Amazon With Playwright

from playwright.sync_api import sync_playwright
import time

URL = "https://www.amazon.com/dp/B0CFC7Q4V3"  # Replace with any real product

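def is_amazon_block_page(page):
    """Return True if the current URL looks like a bot-check, sign-in, or error page."""
    # This checks only the URL. A bot-check page may also be served at the
    # product URL itself; the wait for #productTitle further down covers that case.
    url = page.url.lower()

    # Substring match: a keyword such as "503" also triggers if it appears
    # anywhere else in the URL (for example inside a product ID).
    block_keywords = [
        "captcha",
        "validatecaptcha",
        "/ap/cvf",
        "amazonbotcheck",
        "robot-check",
        "signin",
        "503",
        "sorry",
        "not-found"
    ]

    return any(k in url for k in block_keywords)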


def scrape_amazon(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            ),
            locale="en-US",
        )
        page = context.new_page()

        print("Navigating...")
        page.goto(url, timeout=60000, wait_until="domcontentloaded")

        # Detect anti-bot page
        if is_amazon_block_page(page):
            print("Amazon returned a bot-detection page. Trying to bypass...")

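            try:
                # Some interstitial pages only require clicking a "continue" button
                if page.locator("button[type='submit']").count() > 0:
                    page.click("button[type='submit']")
                    page.wait_for_load_state("networkidle")
                    time.sleep(3)
            except Exception:
                # Best effort: fall through to the re-check below
                pass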

            if is_amazon_block_page(page):
                print("Still blocked. You need proxies or a session cookie.")
                browser.close()
                return None

        # Wait for product page
        product_selectors = [
            "#productTitle",
            "span.a-size-large.product-title-word-break"
        ]

        got_content = False
        for sel in product_selectors:
            try:
                page.wait_for_selector(sel, timeout=5000)
                got_content = True
                break
            except Exception:
                # Selector did not appear in time; try the next one
                pass

        if not got_content:
            print("Product content never loaded. Amazon likely blocked scraping.")
            print("Final URL:", page.url)
            browser.close()
            return None

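        # Helpers: return None instead of raising when a field is missing
        def safe_text(selector):
            try:
                # .first avoids a strict-mode error when several elements match
                # (for example, span.a-offscreen appears many times on a page)
                return page.locator(selector).first.inner_text(timeout=2000).strip()
            except Exception:
                return None

        def safe_attr(selector, attr):
            try:
                return page.locator(selector).first.get_attribute(attr, timeout=2000)
            except Exception:
                return None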

        # Extract fields
        title = safe_text("#productTitle") or safe_text("span.a-size-large.product-title-word-break")
        price = safe_text("span.a-offscreen")
        rating = safe_text("span.a-icon-alt")
        image_url = safe_attr("#landingImage", "src") or safe_attr("img[data-old-hires]", "src")
        availability = safe_text("#availability")

        result = {
            "title": title,
            "price": price,
            "rating": rating,
            "image_url": image_url,
            "availability": availability,
        }

        browser.close()
        return result


# Run
data = scrape_amazon(URL)
print("\nResult:\n", data)
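Because `is_amazon_block_page` matches keywords anywhere in the URL, a product ID or query string that happens to contain "503" would be flagged as blocked. The sketch below shows one possible stricter variant, under stated assumptions: `is_block_response` is a hypothetical name, it inspects only the URL path, and it treats 503 as an HTTP status code. In Playwright, `page.goto()` returns a response object whose `status` attribute holds that code. The word markers can still match a product slug that contains them, so this narrows the problem without eliminating it.

```python
from typing import Optional
from urllib.parse import urlsplit

# Path fragments copied from the script's keyword list. "503" is left out
# here because it is handled as a status code instead of a URL substring.
BLOCK_PATH_MARKERS = (
    "captcha", "validatecaptcha", "/ap/cvf", "amazonbotcheck",
    "robot-check", "signin", "sorry", "not-found",
)


def is_block_response(url: str, status: Optional[int] = None) -> bool:
    """Return True if the HTTP status or the URL path suggests a block or error page."""
    if status is not None and status >= 400:
        return True
    # Only the path is inspected, so query strings cannot trigger a match.
    path = urlsplit(url).path.lower()
    return any(marker in path for marker in BLOCK_PATH_MARKERS)


print(is_block_response("https://www.amazon.com/dp/B0503XYZ12", 200))              # False
print(is_block_response("https://www.amazon.com/errors/validateCaptcha?x=1", 200))  # True
print(is_block_response("https://www.amazon.com/dp/B0CFC7Q4V3", 503))              # True
```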

Example Output

{
 'title': 'Amazon Echo Dot (newest model)...',
 'price': '$49.99',
 'rating': '4.6 out of 5 stars',
 'image_url': 'https://m.media-amazon.com/images/I/71vtuXXQdDL._AC_SY606_.jpg',
 'availability': 'In Stock'
}

This shows the scraper successfully:

passed bot checks
executed JavaScript
loaded the real product page
extracted structured data
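The price and rating come back as display strings, so most uses (price tracking, comparisons) need them converted to numbers first. The following is a minimal sketch, assuming US-style formatting like the example output above (the script sets `locale="en-US"`). `parse_price` and `parse_rating` are hypothetical helper names, and other locales would need different parsing rules.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Extract a decimal amount from a string such as '$49.99' or '$1,299.00'."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    # Remove thousands separators before converting
    return Decimal(match.group().replace(",", ""))


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Extract the leading number from a string such as '4.6 out of 5 stars'."""
    if not text:
        return None
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


print(parse_price("$49.99"))               # 49.99
print(parse_rating("4.6 out of 5 stars"))  # 4.6
```

`Decimal` is used for the price to avoid floating-point rounding when amounts are later summed or compared.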

Before You Continue: A Faster, Easier Way With MrScraper

Building your own Amazon scraper is possible, but you must deal with:

rotating proxies
browser fingerprinting
HTML changes
CAPTCHA loops
maintenance
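As an illustration of the first item, proxy rotation can be as simple as cycling through a list of servers and launching each browser session with the next one. This is a sketch under stated assumptions: `proxy_settings` is a hypothetical helper and the proxy addresses are placeholders. Playwright's `launch()` accepts a `proxy` argument in the form of a dictionary with a `server` key.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def proxy_settings(servers: List[str]) -> Iterator[Dict[str, str]]:
    """Return an endless iterator of Playwright-style proxy dicts."""
    if not servers:
        raise ValueError("at least one proxy server is required")
    return ({"server": server} for server in cycle(servers))


# Placeholder addresses; replace with proxies you are allowed to use.
rotation = proxy_settings([
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
])

first = next(rotation)
second = next(rotation)
third = next(rotation)  # wraps around to the first server again
print(first["server"], second["server"], third["server"])
```

Each dictionary could then be passed as `p.chromium.launch(proxy=next(rotation))` in the script above. Real deployments would also need health checks and credentials, which this sketch leaves out.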

If you want a zero-maintenance approach, MrScraper provides:

Automatic proxy rotation
Anti-bot bypass
Instant JSON output
No Playwright setup required

Conclusion

Scraping Amazon reliably today requires more than standard HTTP requests. It requires:

a real browser
JavaScript rendering
bot detection handling
fallback logic

Playwright handles these effectively, making it one of the best tools for scraping Amazon.

With the script above, you can extract product title, price, rating, images, and availability. And if you prefer an easier, automated solution, MrScraper’s Amazon scraper handles everything for you.
