How to Scrape Amazon with Node.js: A Beginner-Friendly Guide
The modern web thrives on data. Companies rely on it to monitor competition, understand customer behavior, study market trends, and build smarter internal tools. Amazon, being one of the largest e-commerce platforms in the world, holds an enormous amount of product information. The problem? Manually collecting that data is slow, repetitive, and unrealistic at scale.
This is where web scraping becomes valuable. Web scraping allows a program to visit a website, read its structure, and extract the exact information you need—automatically. Instead of scrolling through pages and copying product names or prices by hand, a scraper handles the work for you in seconds. In a workflow where speed and accuracy matter, automation isn’t just helpful—it’s necessary.
In this guide, we’ll walk through how to scrape Amazon using Node.js and Puppeteer, one of the most reliable tools for browser automation. To keep things realistic, we’ll use Amazon’s Health & Beauty category as our example. It’s a crowded section with diverse items—skincare, supplements, cosmetics—making it perfect for demonstrating how a scraper handles dynamic product listings and multiple layouts.
By the end, you’ll understand how to run the script, how it works, and why each step is necessary when extracting data from Amazon’s dynamic pages.
1. Preparing Your Node.js Environment
Before writing any scraping code, you’ll need to install Node.js.
Download Node.js from:
https://nodejs.org
After installing, confirm everything works:
node --version
npm --version
If both return a version number, you’re ready.
2. Setting Up Your Project
Create a new folder and initialize a Node project:
mkdir amazon-scraper
cd amazon-scraper
npm init -y
Then install Puppeteer Core:
npm install puppeteer-core
We use puppeteer-core so the scraper drives the Chrome or Chromium browser already installed on your system instead of downloading a separate one. The trade-off is that you must tell Puppeteer where that browser lives, as shown below.
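As a minimal sketch (the paths in the comments are typical install locations and may differ on your machine), pointing puppeteer-core at an installed browser looks like this:
const puppeteer = require("puppeteer-core");

(async () => {
  const browser = await puppeteer.launch({
    // Typical locations; adjust for your machine:
    //   Linux:   /usr/bin/google-chrome or /usr/bin/chromium-browser
    //   macOS:   /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
    //   Windows: C:\Program Files\Google\Chrome\Application\chrome.exe
    executablePath: "/usr/bin/google-chrome"
  });
  console.log("Browser version:", await browser.version());
  await browser.close();
})();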
3. Creating the Scraper File
Create a file:
touch main.js
This file will contain logic for:
- launching Chrome/Chromium
- navigating to Amazon
- scrolling to load dynamic content
- extracting product titles and prices
- paginating through search results
- saving results to JSON
Puppeteer is ideal because Amazon relies heavily on dynamic rendering.
4. Writing the Amazon Scraper (Complete Code)
Below is the full working script:
const puppeteer = require("puppeteer-core");
const fs = require("fs");

let browser = null;
let page = null;
let all = [];

// Save scraped data
function saveJSON() {
  fs.writeFileSync("results.json", JSON.stringify(all, null, 2));
  console.log("Saved results to results.json");
}

// Graceful exit: save results and close the browser before quitting
async function finishAndExit() {
  console.log("\nFinalizing scraper...");
  console.log("Total items collected:", all.length);
  saveJSON();
  try { if (browser) await browser.close(); } catch {}
  process.exit(0);
}

process.on("SIGINT", finishAndExit);

(async () => {
  browser = await puppeteer.launch({
    headless: false,
    // puppeteer-core does not download a browser, so point this at the
    // Chrome/Chromium installed on your machine (see section 2 for typical paths)
    executablePath: "/usr/bin/google-chrome",
    args: [
      "--disable-http2",
      "--disable-features=IsolateOrigins,site-per-process",
      "--no-sandbox",
      "--disable-setuid-sandbox"
    ]
  });

  page = await browser.newPage();
  await page.setUserAgent(
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36"
  );

  const startUrl = "https://www.amazon.com/s?k=health+and+beauty";
  console.log("Opening:", startUrl);
  await page.goto(startUrl, { waitUntil: "networkidle2", timeout: 90000 });

  const scrapePage = async () => {
    // Scroll down in steps so lazily loaded products have time to render
    await page.evaluate(async () => {
      await new Promise(resolve => {
        let total = 0;
        const distance = 400;
        const timer = setInterval(() => {
          window.scrollBy(0, distance);
          total += distance;
          if (total >= document.body.scrollHeight) {
            clearInterval(timer);
            resolve();
          }
        }, 200);
      });
    });

    // Extract title and price from each result card, covering several layouts
    return await page.$$eval(
      `
      div[data-asin][data-component-type="s-search-result"],
      div[data-asin].s-result-item,
      div.puis-card-container[data-asin]
      `,
      nodes => nodes.map(n => ({
        title:
          n.querySelector("h2 a span")?.textContent?.trim() ||
          n.querySelector("span.a-text-normal")?.textContent?.trim() ||
          null,
        price:
          n.querySelector(".a-price .a-offscreen")?.textContent?.trim() ||
          null
      }))
    );
  };

  // Safety limit: scrape at most 10 pages
  let limit = 10;
  while (limit-- > 0) {
    console.log("\nScraping current page...");
    const items = await scrapePage();
    all.push(...items);

    // Read the currently selected page number from the pagination bar
    const current = await page.$eval(
      "span.s-pagination-selected",
      el => parseInt(el.textContent.trim())
    ).catch(() => null);
    if (!current) break;

    // Build the link to the next page and navigate to it
    const next = current + 1;
    const nextHref = await page.$eval(
      `a.s-pagination-item.s-pagination-button[aria-label="Go to page ${next}"]`,
      el => el.href
    ).catch(() => null);
    if (!nextHref) break;

    await page.goto(nextHref, { waitUntil: "networkidle2", timeout: 90000 });
  }

  await finishAndExit();
})();
Step-by-Step Explanation
Below is a summary of what each block of the script does.
Top-Level Imports
Imports Puppeteer and file system utilities.
Save Helper
Writes all scraped results into results.json.
Graceful Shutdown
Ensures that even if you press CTRL+C, your results are saved.
Browser Launch
Opens your installed Chrome/Chromium with compatibility flags (HTTP/2 disabled, site isolation off, sandbox relaxed) that make automated sessions less likely to fail.
User-Agent Spoofing
Helps avoid basic bot-detection issues.
Scrolling Logic
Forces Amazon to render all products by scrolling the page slowly.
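If result cards still load slowly on your connection, one optional refinement (a sketch that reuses the first result-card selector from the extraction step) is to wait for at least one card before scrolling, by adding a line like this at the top of scrapePage:
// Optional: wait until at least one result card exists before scrolling
await page.waitForSelector('div[data-asin][data-component-type="s-search-result"]', { timeout: 30000 });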
Product Extraction
Extracts product titles and prices using multiple selectors to handle layout variations.
Pagination
Reads the current page number and moves to the next page by URL.
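If building the page URL ever fails, an alternative is to click Amazon's "Next" link directly. This is only a sketch: the s-pagination-next class is an assumption about Amazon's current markup and may change.
// Sketch: follow the "Next" link instead of constructing the page URL
// (a.s-pagination-next is assumed from current Amazon markup and may change)
const nextButton = await page.$("a.s-pagination-next:not(.s-pagination-disabled)");
if (nextButton) {
  await Promise.all([
    page.waitForNavigation({ waitUntil: "networkidle2", timeout: 90000 }),
    nextButton.click()
  ]);
}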
Safety Limit
Prevents infinite loops.
5. Running the Scraper
Inside your project folder:
node main.js
A browser will open and the scraper will scroll, load data, and save everything into:
results.json
You can stop anytime with CTRL+C — your progress is still saved.
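Each entry in results.json follows the title/price shape produced by the extraction step. The values below are placeholders rather than real scraped data; price is null when a listing shows no price:
[
  { "title": "Example Vitamin C Serum, 1 fl oz", "price": "$12.99" },
  { "title": "Example Bamboo Cotton Swabs, 500 Count", "price": null }
]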
Quick Tips & Notes
- Use headless: false while developing.
- Some sites block the default Puppeteer user agent; the custom user agent set in the script helps avoid that.
- For large-scale scraping, rotating proxies are recommended (see the sketch after this list).
- Amazon may return CAPTCHAs—handle responsibly.
- Always follow local laws and the website’s terms.
- Save partial progress regularly when scraping multiple pages.
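For the proxy tip above, a minimal sketch looks like the following. The proxy host, port, and credentials are placeholders to replace with your provider's details:
const puppeteer = require("puppeteer-core");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    executablePath: "/usr/bin/google-chrome", // adjust to your installed browser
    args: ["--proxy-server=http://proxy.example.com:8000"] // placeholder proxy address
  });
  const page = await browser.newPage();
  // If your proxy requires authentication (placeholder credentials):
  await page.authenticate({ username: "PROXY_USER", password: "PROXY_PASS" });
  await page.goto("https://www.amazon.com/s?k=health+and+beauty");
  await browser.close();
})();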
Final Thoughts
Scraping Amazon with Node.js gives you a powerful and flexible way to collect data at scale. With Puppeteer, you can handle dynamic pages, scroll-based content, and pagination with ease.
As your project grows, you can enhance the scraper with:
- rotating proxies
- CAPTCHA solving
- scheduling (see the sketch after this list)
- pipelines and dashboards
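For the scheduling idea, a small sketch using only Node's built-in child_process could look like this. It assumes you save it as a separate file (scheduler.js is a hypothetical name) and that main.js exits on its own when finished, which it does via finishAndExit:
// scheduler.js (hypothetical file name): re-run the scraper every 6 hours
const { spawn } = require("child_process");

function runScraper() {
  const run = spawn("node", ["main.js"], { stdio: "inherit" });
  run.on("close", code => console.log("Scraper exited with code", code));
}

runScraper();
setInterval(runScraper, 6 * 60 * 60 * 1000);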
But even with this beginner-friendly setup, you already have a robust scraping foundation.