Node-Unblocker for Web Scraping: What It Is and How It Works
When scraping websites in Node.js, defensive barriers like rate limits, IP blocks, geo-checks, and basic bot detection can quickly stop a simple HTTP request in its tracks. One tool developers sometimes use to help work around these limits, especially in lightweight or experimental scraping workflows, is Node-Unblocker. It’s an open-source proxy middleware library for Node.js and Express.
Node-Unblocker lets you build a proxy server that reroutes HTTP requests through your own application. Scrapers can then fetch content via this proxy instead of directly, which can help avoid some simple host-level restrictions. In this article we explore how to set up Node-Unblocker, how to use it in a scraping workflow, and how it can be combined with scraping tools like axios or headless browsers. Finally, we compare this approach with a managed proxy solution like MrScraper’s proxy feature for larger-scale scraping.
What Node-Unblocker Is and How It Works
Node-Unblocker is an npm package originally designed as a web proxy for circumventing blocks and censorship. In the context of scraping, it forwards incoming requests to a remote target and streams the response back to your client. Internally it handles things like relative URL rewriting and cookie path adjustments to keep proxied pages functional.
You pull it into an Express server and attach it as middleware so that requests made to a route prefix (such as /proxy/) get relayed to the target site.
Setting Up Node-Unblocker in Express
Here’s how to build a basic proxy server using Node-Unblocker:
```bash
# Initialize a new project
mkdir node-unblocker-proxy
cd node-unblocker-proxy
npm init -y

# Install dependencies
npm install express unblocker
```
Create proxy-server.js with the following:
```js
const express = require("express");
const Unblocker = require("unblocker");

const app = express();

// Unblocker will handle all routes under /proxy/
const unblocker = new Unblocker({ prefix: "/proxy/" });
app.use(unblocker);

// Start the proxy server
const PORT = process.env.PORT || 3000;
app
  .listen(PORT, () => {
    console.log(`Proxy server running at http://localhost:${PORT}/proxy/`);
  })
  .on("upgrade", unblocker.onUpgrade);
```
Now, if you start this server:
```bash
node proxy-server.js
```
You can fetch pages through your proxy like this in a browser:
```
http://localhost:3000/proxy/https://example.com
```
The server will forward the request for example.com and serve back the proxied content.
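You can run the same check from the command line. A quick test with curl, assuming the server above is running on port 3000:

```bash
curl "http://localhost:3000/proxy/https://example.com"
```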
Using the Proxy in a Web Scraper
Node-Unblocker itself is just a proxy; it doesn't scrape or parse data. You still need to fetch the proxied content in a scraper script. Here's how you might do that with axios in Node.js. First, install axios:

```bash
npm install axios
```
Then create scraper.js:
```js
const axios = require("axios");

// Base of your proxy server
const PROXY_BASE = "http://localhost:3000/proxy/";

// Target URL to scrape
const TARGET_URL = "https://www.example.com";

(async () => {
  try {
    // Fetch via the local Node-Unblocker proxy
    const response = await axios.get(PROXY_BASE + TARGET_URL, {
      headers: {
        // Optional; emulate a real browser
        "User-Agent": "Mozilla/5.0 (compatible; Node Scraper)"
      }
    });

    console.log("HTML length:", response.data.length);
    // You can now parse response.data with Cheerio or other tools
  } catch (err) {
    console.error("Error scraping through proxy:", err.message);
  }
})();
```
This makes your scraper treat the proxy as the origin. Node-Unblocker then forwards the request externally and streams back the HTML, which axios receives.
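As the comment in the script notes, the returned HTML can then be handed off to a parser. A minimal sketch using Cheerio (an extra dependency, installed with `npm install cheerio`):

```js
const cheerio = require("cheerio");

// Parse the HTML returned by the axios call above
function extractLinks(html) {
  const $ = cheerio.load(html);
  // Collect the href of every anchor tag on the page
  return $("a")
    .map((i, el) => $(el).attr("href"))
    .get();
}

// Usage inside the scraper: extractLinks(response.data)
```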
Advanced Request and Response Middleware
Node-Unblocker supports middleware hooks for modifying requests (before forwarding) and responses (before returning to your scraper). For example, you might inject or modify headers conditionally:
```js
function addAuthHeaders(data) {
  // Add a custom token to API requests only
  if (/^https?:\/\/api\.example\.com/.test(data.url)) {
    data.headers["x-scrape-token"] = "my_token_value";
  }
}

const unblockerConfig = {
  prefix: "/proxy/",
  requestMiddleware: [addAuthHeaders]
};

app.use(new Unblocker(unblockerConfig));
```
Or, to strip out specific scripts from a response:
```js
// Requires the through2 package: npm install through2
const through = require("through2");

function stripScripts(data) {
  if (data.contentType.includes("text/html")) {
    data.stream = data.stream.pipe(
      through(function (chunk, enc, next) {
        // Note: a naive regex; script tags split across chunk
        // boundaries won't be caught
        this.push(
          chunk.toString().replace(/<script[^>]*>[\s\S]*?<\/script>/g, "")
        );
        next();
      })
    );
  }
}

const unblockerWithMiddleware = new Unblocker({
  prefix: "/proxy/",
  responseMiddleware: [stripScripts]
});

app.use(unblockerWithMiddleware);
```
These patterns let you tailor the behavior of the proxy for scraping needs like authentication or cleanup, though they also add complexity to your server.
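In practice you would usually register both kinds of hooks on a single instance rather than creating one per middleware; a sketch combining the two functions defined above:

```js
// One Unblocker instance with both request and response hooks
const unblocker = new Unblocker({
  prefix: "/proxy/",
  requestMiddleware: [addAuthHeaders],
  responseMiddleware: [stripScripts]
});

app.use(unblocker);
```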
Integrating with Browser Automation (Puppeteer/Playwright)
Node-Unblocker’s proxy can also be used with headless browsers, although advanced sites with Cloudflare or heavy bot protections may still block straightforward proxies. A scraper using a headless browser might look like this:
```js
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Load the target site through the local Node-Unblocker proxy
  await page.goto("http://localhost:3000/proxy/https://example.com");

  const html = await page.content();
  console.log("Page HTML:", html.substring(0, 500));

  await browser.close();
})();
```
This setup loads the proxied page in headless Chrome, which may help with sites requiring some level of client-side execution.
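The Playwright equivalent is nearly identical; a minimal sketch, assuming `playwright` is installed and the proxy server is running locally:

```js
const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Same pattern as Puppeteer: navigate to the proxied URL
  await page.goto("http://localhost:3000/proxy/https://example.com");

  const html = await page.content();
  console.log("Page HTML:", html.substring(0, 500));

  await browser.close();
})();
```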
Limitations of Node-Unblocker in Scraping
Node-Unblocker is useful for basic content fetching and experiments, but it has limitations:
- No built-in proxy rotation or IP pool: If your proxy server is hosted on a single IP, you can still get blocked when scraping at scale.
- Struggles with anti-bot defenses: Sites behind Cloudflare, sophisticated rate limits, or dynamic bot detection may still block proxied requests.
- Not specialized for scraping: It acts as a generic relay rather than a scraping API with structured outputs.
This makes Node-Unblocker great for development, testing, or small-scale scrapes, but less suitable for large production scraping tasks without additional infrastructure.
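If you need a little more resilience before reaching for a managed service, one common workaround (not a built-in Node-Unblocker feature) is to deploy the same proxy server on several hosts with different IPs and rotate between them in the scraper. The hostnames below are placeholders:

```js
const axios = require("axios");

// Hypothetical deployments of the same Node-Unblocker server on different hosts
const PROXY_BASES = [
  "https://proxy-one.example.com/proxy/",
  "https://proxy-two.example.com/proxy/",
  "https://proxy-three.example.com/proxy/"
];

// Pick a base at random so requests spread across IPs
function randomProxyBase() {
  return PROXY_BASES[Math.floor(Math.random() * PROXY_BASES.length)];
}

async function fetchViaProxy(targetUrl) {
  const response = await axios.get(randomProxyBase() + targetUrl);
  return response.data;
}
```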
MrScraper’s Proxy Feature for Scalable Scraping
If you’re building larger or more resilient scraping systems, MrScraper’s proxy feature offers managed proxy handling integrated directly with its scraping API:
- Automated proxy rotation: MrScraper routes requests through a pool of proxies without manual middleware or server setup.
- Anti-blocking intelligence: Built-in techniques reduce the chances of IP bans, even on targets with moderate bot protection.
- Unified scraping and proxy API: Instead of setting up separate proxy servers and managing middleware, you make calls to MrScraper’s API and receive structured output.
This makes MrScraper particularly useful when you want to focus on data parsing and business logic rather than proxy infrastructure.
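For comparison, working with a managed scraping API typically reduces to a single authenticated HTTP request. The endpoint, parameters, and response shape below are illustrative placeholders rather than MrScraper's actual API; check the official documentation for the real interface:

```js
const axios = require("axios");

// Hypothetical request shape; endpoint and parameters are placeholders
async function scrapeWithManagedProxy(targetUrl) {
  const response = await axios.post(
    "https://api.mrscraper.example/v1/scrape", // placeholder endpoint
    { url: targetUrl, rotateProxy: true },     // placeholder parameters
    { headers: { Authorization: "Bearer YOUR_API_KEY" } }
  );
  return response.data; // structured output instead of raw HTML
}
```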
Conclusion
Using Node-Unblocker for web scraping lets you quickly spin up your own proxy server and fetch remote content through a Node.js-based middleware. It integrates tightly with Express, supports middleware hooks for transforming requests and responses, and can work with both HTTP clients like axios and headless browsers like Puppeteer.
For simple scraping tasks or internal projects, this approach gives you direct control over how requests are routed. But when your scraping demands grow, whether you need proxy rotation, anti-block handling, or a managed scaling solution, integrating a platform like MrScraper with integrated proxy support can help reduce maintenance overhead and improve reliability.