Scraping Amazon Product Data With Python: A Step-by-Step Tutorial
Learn how to scrape Amazon product data using Python and Playwright. A step-by-step guide to bypass bot checks, extract product info, and improve scraping reliability.
Web scraping has become a key tool for developers, analysts, and businesses that rely on structured product information. Whether you're tracking competitor prices, monitoring product availability, or researching marketplace trends, being able to extract accurate data reliably is essential.
Scraping large e-commerce platforms like Amazon, however, is not as simple as scraping regular websites. Strong anti-bot systems, CAPTCHAs, dynamic HTML rendering, and bot-detection signals often prevent standard HTTP requests from accessing the real product page.
In this guide, you’ll learn how to scrape Amazon product data using Python and Playwright, a browser automation tool that loads pages in a real browser engine. We’ll walk through a complete working script and explain how it detects and responds to Amazon’s blocking mechanisms.
Why Traditional Requests + BeautifulSoup Fails on Amazon
If you've tried scraping Amazon using the typical approach:
- requests for fetching HTML
- BeautifulSoup for parsing
- custom headers / user-agent spoofing
…you’ve likely run into pages like:
- “To discuss automated access to Amazon data…”
- “Click the button below to continue shopping”
- CAPTCHA screens
- Empty HTML pages with no product content
This happens because Amazon uses:
- JavaScript-based DOM hydration
- Bot-detection that blocks non-browser traffic
- Redirect loops to CAPTCHA pages
- Dynamic HTML selectors that frequently change
Solution: use a real browser — and that’s exactly what Playwright provides.
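Before switching tools, it can help to confirm that a plain HTTP response really is one of the fallback pages listed above. The sketch below is a rough heuristic only: the function name `looks_like_block_page` and the phrase list are assumptions built from the messages quoted in this section, and a real product page could in principle contain one of these words.

```python
# Phrases taken from the fallback pages described above (lowercased).
BLOCK_PHRASES = (
    "to discuss automated access to amazon data",
    "click the button below to continue shopping",
    "captcha",
)


def looks_like_block_page(html: str) -> bool:
    """Return True if the HTML looks like a fallback page, not a product page."""
    text = html.lower()
    if not text.strip():
        return True  # empty body: no product content at all
    return any(phrase in text for phrase in BLOCK_PHRASES)
```

Passing the body of a requests response to this function gives a quick signal of whether the request was served real content.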
Why Playwright Is the Best Tool for Scraping Amazon
Playwright (by Microsoft) is a headless/full browser automation framework that:
- Loads websites like a real browser
- Executes JavaScript automatically
- Passes many checks that reject non-browser HTTP clients
- Gives full control of navigation and selectors
- Works with Chromium, Firefox, and WebKit
Because it drives a real browser, Amazon is far more likely to deliver the actual product page instead of a fallback bot-protection page.
Step-by-Step: Scrape an Amazon Product With Playwright
Below is the full working Python code. This script:
- Detects Amazon bot-check pages
- Handles CAPTCHA redirects
- Loads the real product page
- Extracts title, price, rating, image, and availability
- Falls back to alternative selectors for the title and image when the primary ones are missing
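The selector fallback in the last bullet follows a simple pattern: try selectors in order and keep the first one that yields text. Here is a minimal sketch of that pattern on its own, without Playwright. The helper name `first_text` and the `lookup` callable are hypothetical; in the real script the lookup would read from the page.

```python
from typing import Callable, Iterable, Optional


def first_text(selectors: Iterable[str],
               lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try each selector in order; return the first non-empty, stripped text."""
    for selector in selectors:
        value = lookup(selector)
        if value and value.strip():
            return value.strip()
    return None


# Stand-in for a page: the primary selector is missing, the fallback exists.
fake_page = {"span.a-size-large.product-title-word-break": "  Echo Dot  "}
title = first_text(
    ["#productTitle", "span.a-size-large.product-title-word-break"],
    fake_page.get,
)
print(title)
```

Keeping the ordered selector list in one place means a layout change only requires adding one more entry to it.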
Python Code: Scrape Amazon With Playwright
from playwright.sync_api import sync_playwright
import time

URL = "https://www.amazon.com/dp/B0CFC7Q4V3"  # Replace with any real product


def is_amazon_block_page(page):
    """Return True if the current URL looks like a bot-check or error page."""
    url = page.url.lower()
    # Substring match on the full URL: broad on purpose, so it may over-match
    block_keywords = [
        "captcha",
        "validatecaptcha",
        "/ap/cvf",
        "amazonbotcheck",
        "robot-check",
        "signin",
        "503",
        "sorry",
        "not-found",
    ]
    return any(k in url for k in block_keywords)


def scrape_amazon(url):
    with sync_playwright() as p:
        # headless=False opens a visible window; use headless=True to hide it
        browser = p.chromium.launch(headless=False)
        context = browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            ),
            locale="en-US",
        )
        page = context.new_page()
        print("Navigating...")
        page.goto(url, timeout=60000, wait_until="domcontentloaded")

        # Detect anti-bot page
        if is_amazon_block_page(page):
            print("Amazon returned a bot-detection page. Trying to bypass...")
            try:
                # Some interstitial pages only need a button click to continue
                if page.locator("button[type='submit']").count() > 0:
                    page.click("button[type='submit']")
                    page.wait_for_load_state("networkidle")
                    time.sleep(3)
            except Exception:
                pass
            if is_amazon_block_page(page):
                print("Still blocked. You need proxies or a session cookie.")
                browser.close()
                return None

        # Wait for product page
        product_selectors = [
            "#productTitle",
            "span.a-size-large.product-title-word-break",
        ]
        got_content = False
        for sel in product_selectors:
            try:
                page.wait_for_selector(sel, timeout=5000)
                got_content = True
                break
            except Exception:
                pass
        if not got_content:
            print("Product content never loaded. Amazon likely blocked scraping.")
            print("Final URL:", page.url)
            browser.close()
            return None

        # Helpers
        def safe_text(selector):
            try:
                locator = page.locator(selector)
                if locator.count() == 0:
                    return None  # skip the default 30 s wait for a missing element
                # .first avoids a strict-mode error when several elements match
                return locator.first.inner_text().strip()
            except Exception:
                return None

        def safe_attr(selector, attr):
            try:
                locator = page.locator(selector)
                if locator.count() == 0:
                    return None
                return locator.first.get_attribute(attr)
            except Exception:
                return None

        # Extract fields
        title = safe_text("#productTitle") or safe_text("span.a-size-large.product-title-word-break")
        price = safe_text("span.a-offscreen")
        rating = safe_text("span.a-icon-alt")
        image_url = safe_attr("#landingImage", "src") or safe_attr("img[data-old-hires]", "src")
        availability = safe_text("#availability")

        result = {
            "title": title,
            "price": price,
            "rating": rating,
            "image_url": image_url,
            "availability": availability,
        }
        browser.close()
        return result


# Run
if __name__ == "__main__":
    data = scrape_amazon(URL)
    print("\nResult:\n", data)
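One caveat with matching keywords against the whole URL: a short token such as "503" can also occur inside a product ID or a query string, which would flag a normal page as blocked. A hedged alternative, sketched below, matches markers against the URL path only and takes the HTTP status as a separate argument (Playwright's `page.goto` returns a response object with a `status` attribute). The function name `is_blocked`, the marker tuple, and the example product ID are illustrative assumptions, not part of the original script.

```python
from typing import Optional
from urllib.parse import urlparse

# URL markers reused from the script's keyword list, minus the generic tokens.
URL_MARKERS = (
    "captcha",
    "validatecaptcha",
    "/ap/cvf",
    "amazonbotcheck",
    "robot-check",
    "signin",
)


def is_blocked(url: str, status: Optional[int] = None) -> bool:
    """Heuristic: an error status, or a block marker in the URL path."""
    if status is not None and status >= 400:
        return True  # covers 503 and not-found responses without URL matching
    path = urlparse(url).path.lower()
    return any(marker in path for marker in URL_MARKERS)


# Hypothetical product ID that happens to contain "503".
sample = "https://www.amazon.com/dp/B0503EXAMP"
print("503" in sample.lower(), is_blocked(sample))
```

In the script, the status could be taken from the value returned by `page.goto(...)`, which may be None for some navigations, hence the optional argument.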
Example Output
{
'title': 'Amazon Echo Dot (newest model)...',
'price': '$49.99',
'rating': '4.6 out of 5 stars',
'image_url': 'https://m.media-amazon.com/images/I/71vtuXXQdDL._AC_SY606_.jpg',
'availability': 'In Stock'
}
This shows the scraper successfully:
- passed bot checks
- executed JavaScript
- loaded the real product page
- extracted structured data
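The price and rating come back as display strings, so price tracking or comparisons usually need them as numbers. Below is a minimal sketch of that conversion, assuming US-style formatting as in the example output (the script requests the en-US locale). The helper names `parse_price` and `parse_rating` are hypothetical.

```python
import re
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[float]:
    """Convert a string like '$49.99' to a float; None if no number is found."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Convert a string like '4.6 out of 5 stars' to a float; None otherwise."""
    if not text:
        return None
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5", text)
    return float(match.group(1)) if match else None
```

Both helpers return None for missing or unexpected input, matching how the script reports fields it could not extract.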
Before You Continue: A Faster, Easier Way With MrScraper
Building your own Amazon scraper is possible, but you must deal with:
- rotating proxies
- browser fingerprinting
- HTML changes
- CAPTCHA loops
- maintenance
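As an illustration of the first item, rotating proxies can be as simple as a round-robin over a pool of servers. This is a sketch only: the proxy addresses are placeholders, `proxy_settings` is a hypothetical helper, and the dictionaries it yields follow the shape Playwright accepts for the `proxy` argument of `launch`.

```python
from itertools import cycle

# Placeholder endpoints; replace with proxy servers you are allowed to use.
PROXY_POOL = [
    "http://proxy-1.example.com:8000",
    "http://proxy-2.example.com:8000",
]


def proxy_settings(pool):
    """Yield one proxy dict per attempt, looping over the pool indefinitely."""
    for server in cycle(pool):
        yield {"server": server}
```

Each scraping attempt would then take the next entry, for example `p.chromium.launch(proxy=next(rotation))` with `rotation = proxy_settings(PROXY_POOL)`.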
If you want a zero-maintenance approach, MrScraper provides:
- Automatic proxy rotation
- Anti-bot bypass
- Instant JSON output
- No Playwright setup required
Conclusion
Scraping Amazon reliably today requires more than standard HTTP requests — it requires:
- a real browser
- JavaScript rendering
- bot detection handling
- fallback logic
Playwright handles these effectively, making it one of the best tools for scraping Amazon.
With the script above, you can extract product title, price, rating, images, and availability. And if you prefer an easier, automated solution, MrScraper’s Amazon scraper handles everything for you.