How to Scrape Amazon with Node.js: A Beginner-Friendly Guide

Learn how to scrape Amazon using Node.js and Puppeteer in a simple, beginner-friendly guide. This tutorial covers setup, scrolling, pagination, code examples, and tips for extracting product data safely and efficiently.

The modern web thrives on data. Companies rely on it to monitor competition, understand customer behavior, study market trends, and build smarter internal tools. Amazon, being one of the largest e-commerce platforms in the world, holds an enormous amount of product information. The problem? Manually collecting that data is slow, repetitive, and unrealistic at scale.

This is where web scraping becomes valuable. Web scraping allows a program to visit a website, read its structure, and extract the exact information you need—automatically. Instead of scrolling through pages and copying product names or prices by hand, a scraper handles the work for you in seconds. In a workflow where speed and accuracy matter, automation isn’t just helpful—it’s necessary.

In this guide, we’ll walk through how to scrape Amazon using Node.js and Puppeteer, one of the most reliable tools for browser automation. To keep things realistic, we’ll use Amazon’s Health & Beauty category as our example. It’s a crowded section with diverse items—skincare, supplements, cosmetics—making it perfect for demonstrating how a scraper handles dynamic product listings and multiple layouts.

By the end, you’ll understand how to run the script, how it works, and why each step is necessary when extracting data from Amazon’s dynamic pages.

1. Preparing Your Node.js Environment

Before writing any scraping code, you’ll need to install Node.js.

Download Node.js from:

https://nodejs.org

After installing, confirm everything works:

node --version
npm --version

If both return a version number, you’re ready.

2. Setting Up Your Project

Create a new folder and initialize a Node project:

mkdir amazon-scraper
cd amazon-scraper
npm init -y

Then install Puppeteer Core:

npm install puppeteer-core

We use puppeteer-core so the script relies on your system’s Chrome/Chromium instead of downloading a bundled browser. The trade-off is that you must tell Puppeteer where that browser lives, using the executablePath launch option.
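
A minimal sketch of how that looks (the paths below are common install locations and may differ on your machine; CHROME_PATH is just an environment variable you can set yourself):

const puppeteer = require("puppeteer-core");

// puppeteer-core never downloads a browser, so we must supply the path
// to one that is already installed. Adjust for your OS, or export CHROME_PATH.
const chromePath =
  process.env.CHROME_PATH ||
  "/usr/bin/google-chrome";
  // macOS:   "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
  // Windows: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"

async function openBrowser() {
  // With puppeteer-core, launch() requires executablePath.
  return puppeteer.launch({ headless: false, executablePath: chromePath });
}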

3. Creating the Scraper File

Create a file:

touch main.js

This file will contain logic for:

  • launching Chrome/Chromium
  • navigating to Amazon
  • scrolling to load dynamic content
  • extracting product titles and prices
  • paginating through search results
  • saving results to JSON

Puppeteer is ideal because Amazon relies heavily on dynamic rendering.
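
In practice, “dynamic rendering” means the product cards may not exist in the DOM the instant the page loads. One common safeguard, easy to add to the script that follows, is waiting for a result selector to appear before extracting anything. A small sketch using the same selector the script relies on:

// Helper: resolve once Amazon has rendered at least one search result,
// or throw after `timeout` ms. Call it right after page.goto(...).
async function waitForResults(page, timeout = 30000) {
  await page.waitForSelector(
    'div[data-asin][data-component-type="s-search-result"]',
    { timeout }
  );
}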

4. Writing the Amazon Scraper (Complete Code)

Below is the full working script:

const puppeteer = require("puppeteer-core");
const fs = require("fs");

let browser = null;
let page = null;
let all = [];

// Save scraped data
function saveJSON() {
  fs.writeFileSync("results.json", JSON.stringify(all, null, 2));
  console.log("Saved results to results.json");
}

// Graceful exit
async function finishAndExit() {
  console.log("\nFinalizing scraper...");
  console.log("Total items collected:", all.length);
  saveJSON();

  try { if (browser) await browser.close(); } catch {}
  process.exit(0);
}

process.on("SIGINT", finishAndExit);

(async () => {
  browser = await puppeteer.launch({
    headless: false,
    // puppeteer-core does not download a browser, so point executablePath
    // at the Chrome/Chromium already installed on your system (adjust the
    // path, or set CHROME_PATH, to match your OS).
    executablePath: process.env.CHROME_PATH || "/usr/bin/google-chrome",
    args: [
      "--disable-http2",
      "--disable-features=IsolateOrigins,site-per-process",
      "--no-sandbox",
      "--disable-setuid-sandbox"
    ]
  });

  page = await browser.newPage();
  await page.setUserAgent(
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36"
  );

  const startUrl = "https://www.amazon.com/s?k=health+and+beauty";
  console.log("Opening:", startUrl);

  await page.goto(startUrl, { waitUntil: "networkidle2", timeout: 90000 });

  const scrapePage = async () => {
    await page.evaluate(async () => {
      await new Promise(resolve => {
        let total = 0;
        const distance = 400;
        const timer = setInterval(() => {
          window.scrollBy(0, distance);
          total += distance;
          if (total >= document.body.scrollHeight) {
            clearInterval(timer);
            resolve();
          }
        }, 200);
      });
    });

    return await page.$$eval(
      `
      div[data-asin][data-component-type="s-search-result"],
      div[data-asin].s-result-item,
      div.puis-card-container[data-asin]
      `,
      nodes => nodes.map(n => ({
        title:
          n.querySelector("h2 a span")?.textContent?.trim() ||
          n.querySelector("span.a-text-normal")?.textContent?.trim() ||
          null,
        price:
          n.querySelector(".a-price .a-offscreen")?.textContent?.trim() ||
          null
      }))
    );
  };

  let limit = 10;

  while (limit-- > 0) {
    console.log("\nScraping current page...");
    const items = await scrapePage();
    all.push(...items);

    const current = await page.$eval(
      "span.s-pagination-selected",
      el => parseInt(el.textContent.trim())
    ).catch(() => null);

    if (!current) break;

    const next = current + 1;

    const nextHref = await page.$eval(
      `a.s-pagination-item.s-pagination-button[aria-label="Go to page ${next}"]`,
      el => el.href
    ).catch(() => null);

    if (!nextHref) break;

    await page.goto(nextHref, { waitUntil: "networkidle2", timeout: 90000 });
  }

  await finishAndExit();
})();

Step-by-Step Explanation

Below is a summary of what each block does.

Top-Level Imports

Imports Puppeteer and file system utilities.

Save Helper

Writes all scraped results into results.json.

Graceful Shutdown

Ensures that even if you press CTRL+C, your results are saved.

Browser Launch

Launches your installed Chrome/Chromium (located via executablePath) with flags that work around HTTP/2 errors and sandbox restrictions.

User-Agent Spoofing

Helps avoid basic bot-detection issues.
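
The script sets a single realistic desktop user agent. If you scrape more frequently, you can rotate between a few strings instead of reusing the same one every run. A simple sketch (the strings are just examples of common desktop user agents):

// Pick a user agent at random before navigating.
const USER_AGENTS = [
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36"
];

async function applyRandomUserAgent(page) {
  const ua = USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
  await page.setUserAgent(ua);
}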

Scrolling Logic

Forces Amazon to render all products by scrolling the page slowly.

Product Extraction

Extracts product titles and prices using multiple selectors to handle layout variations.
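
Because the selector list is intentionally broad, the same product can occasionally be matched by more than one selector, and ad slots may come back with no title at all. A small post-processing sketch you could apply to the scraped items (it dedupes by title, since the script above does not collect ASINs):

// Drop entries with no title (ad slots, layout placeholders) and
// remove duplicates matched by more than one selector.
function cleanItems(items) {
  const seen = new Set();
  return items.filter(item => {
    if (!item.title) return false;          // nothing useful was extracted
    if (seen.has(item.title)) return false; // already collected
    seen.add(item.title);
    return true;
  });
}

// Usage inside the scraping loop: all.push(...cleanItems(items));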

Pagination

Reads the current page number and moves to the next page by URL.

Safety Limit

Prevents infinite loops.

5. Running the Scraper

Inside your project folder:

node main.js

A browser will open and the scraper will scroll, load data, and save everything into:

results.json

You can stop anytime with CTRL+C — your progress is still saved.
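
Each entry in results.json is a plain object with a title and a price (or null when one could not be found). An illustrative shape, with placeholder values rather than real scraped data:

[
  {
    "title": "Example Product Title",
    "price": "$19.99"
  },
  {
    "title": "Another Example Product",
    "price": null
  }
]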

Quick Tips & Notes

  • Use headless: false while developing.
  • Some sites block default headless browsers; your user agent helps avoid that.
  • For large-scale scraping, rotating proxies are recommended (see the sketch after this list).
  • Amazon may return CAPTCHAs—handle responsibly.
  • Always follow local laws and the website’s terms.
  • Save partial progress regularly when scraping multiple pages.
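
For the proxy tip above, Puppeteer can route traffic through a proxy with the --proxy-server launch flag and authenticate with page.authenticate(). A minimal sketch, where the proxy host, port, and credentials are placeholders you would replace with your provider’s values:

const puppeteer = require("puppeteer-core");

async function launchWithProxy() {
  const browser = await puppeteer.launch({
    headless: false,
    executablePath: process.env.CHROME_PATH || "/usr/bin/google-chrome",
    // Placeholder proxy endpoint -- replace with your provider's host:port.
    args: ["--proxy-server=http://proxy.example.com:8000"]
  });

  const page = await browser.newPage();
  // Most paid proxies require credentials; placeholders shown here.
  await page.authenticate({ username: "PROXY_USER", password: "PROXY_PASS" });
  return { browser, page };
}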

Final Thoughts

Scraping Amazon with Node.js gives you a powerful and flexible way to collect data at scale. With Puppeteer, you can handle dynamic pages, scroll-based content, and pagination with ease.

As your project grows, you can enhance the scraper with:

  • rotating proxies
  • CAPTCHA solving
  • scheduling (see the sketch after this list)
  • pipelines and dashboards
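
As one example of the scheduling idea, a library such as node-cron (npm install node-cron) can re-run the scraper on a fixed interval. A rough sketch, assuming you have refactored the script so that main.js exports a runScraper() function (it does not do this as written):

const cron = require("node-cron");
const { runScraper } = require("./main"); // hypothetical export of the scraping logic

// Run the scraper at 6:00 AM every day.
cron.schedule("0 6 * * *", async () => {
  console.log("Starting scheduled scrape:", new Date().toISOString());
  try {
    await runScraper();
  } catch (err) {
    console.error("Scheduled scrape failed:", err);
  }
});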

But even with this beginner-friendly setup, you already have a robust scraping foundation.
