How MrScraper Stays Stealth
IP Rotation
The Internet is made of many connected computers. Some of them provide a service that we call a website. A website (or site) is accessed using a browser, and it might contain useful information. Computers need a way to identify each other so they can communicate, just as humans need names to refer to one another. On the internet, that identity is an IP address, usually assigned by an Internet Service Provider (ISP). When a computer repeatedly accesses a site to collect information, the site owner might become suspicious of that behaviour.
IP addresses are leased to ISPs and then distributed to their customers. This way, each IP address has ISP information registered to it, including its geographic region. There are two types of ISP customers:
- Residential: typical household
- Data center: rentable server
A website visit coming from a data center IP is more likely to be a bot than one from a residential IP. This is because a server can be rented cheaply and remotely, with both the machine and the space it occupies included, whereas a residential connection requires bringing your own machine. Site owners can determine with a fair degree of accuracy whether a request came from a residential or a data center customer.
To lower the probability of being detected as a bot, MrScraper uses a different IP address for each scraping session. This is done with proxy rotation: each request is relayed through another computer, so the site sees the proxy's IP address instead of the machine's original one. The proxy is selected randomly from a pool of residential IP addresses, and this is called IP rotation. The site owner sees a stream of different IP addresses, making it much harder to tell which requests come from a bot.
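As a rough illustration, here is how per-session proxy rotation might look with Playwright in TypeScript. The proxy endpoints and credentials are placeholders, not MrScraper's actual infrastructure:

```ts
import { chromium } from "playwright";

// Hypothetical residential proxy pool; these endpoints are placeholders.
const PROXY_POOL = [
  { server: "http://res-proxy-1.example.com:8000", username: "user", password: "pass" },
  { server: "http://res-proxy-2.example.com:8000", username: "user", password: "pass" },
  { server: "http://res-proxy-3.example.com:8000", username: "user", password: "pass" },
];

async function scrapeSession(url: string): Promise<string> {
  // Pick a fresh residential proxy for every session.
  const proxy = PROXY_POOL[Math.floor(Math.random() * PROXY_POOL.length)];
  const browser = await chromium.launch({ proxy });
  const page = await browser.newPage();
  await page.goto(url); // the site sees the proxy's IP, not ours
  const html = await page.content();
  await browser.close();
  return html;
}
```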
Some websites only target specific geographic regions and filter out requests from other regions. That is why MrScraper offers the option to choose the proxy's country. Even behind a proxy, however, the original IP address can be leaked through WebRTC. An inconsistency between the IP address seen in the request and the one reported by WebRTC raises the probability of being flagged as a bot.
WebRTC itself is a protocol that defines peer-to-peer communication for video conferencing. Instead of relying on a central server to distribute the video feed, browsers can connect directly to other clients, which requires each client to know the other's IP address. MrScraper solves this with a custom-built browser that rewrites the WebRTC IP information shared with the website's server.
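To see why this matters, here is a minimal sketch of how a page can recover a visitor's IP address from WebRTC ICE candidates, even when all HTTP traffic goes through a proxy (the STUN server below is a public Google endpoint commonly used for this demonstration):

```ts
// Runs in the page: gathering ICE candidates exposes the client's addresses.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});
pc.createDataChannel(""); // a channel is needed to start candidate gathering
pc.onicecandidate = (event) => {
  if (!event.candidate) return; // null signals the end of gathering
  // Candidate strings embed host and server-reflexive IP addresses.
  const ip = /([0-9]{1,3}(?:\.[0-9]{1,3}){3})/.exec(event.candidate.candidate);
  if (ip) console.log("IP exposed via WebRTC:", ip[1]);
};
pc.createOffer().then((offer) => pc.setLocalDescription(offer));
```

A stealth browser has to rewrite or suppress these candidates so that the addresses they carry agree with the proxy's.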
A page can reference multiple resources that all need to be fetched from the server, but not every resource's IP address is used for bot detection. Static resources like scripts and stylesheets are cached by the browser, so they are not fetched on every visit. This behaviour allows MrScraper to route requests classified as static around the proxy entirely, resulting in faster load times. Here is a diagram of MrScraper using a proxy.

[Diagram: MrScraper routing requests through the proxy]
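As an illustrative sketch of the idea (not MrScraper's actual implementation), Playwright's request interception can fulfil static assets from a local cache so they never travel through the slower proxy; the proxy endpoint and cache layout here are hypothetical:

```ts
import { chromium } from "playwright";
import { readFile } from "node:fs/promises";
import { createHash } from "node:crypto";

const STATIC = /\.(js|css|png|jpe?g|woff2?)(\?|$)/; // crude static classifier

const browser = await chromium.launch({
  proxy: { server: "http://res-proxy.example.com:8000" }, // placeholder
});
const context = await browser.newContext();

await context.route("**/*", async (route) => {
  const url = route.request().url();
  if (!STATIC.test(url)) return route.continue(); // dynamic: use the proxy
  const key = createHash("sha256").update(url).digest("hex");
  try {
    // Cache hit: answer locally, skipping both the proxy and the network.
    await route.fulfill({ body: await readFile(`cache/${key}`) });
  } catch {
    await route.continue(); // cache miss: fall back to a normal fetch
  }
});
```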
Even with all of this, site owners can still detect inconsistencies in the information collected from the browser, for example between the browser's timezone settings and the geolocation of the IP address. The browser still runs on the original machine rather than on the proxy, so these settings need to be changed to match the proxy's geolocation. Not all of these changes can be made trivially through browser settings; as before, a custom-built browser makes it possible to change them for each session.
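Some of these values can be aligned without a custom browser. For instance, Playwright can override the reported timezone, locale, and geolocation per context; the values below are example settings for a hypothetical German residential proxy:

```ts
import { chromium } from "playwright";

const browser = await chromium.launch({
  proxy: { server: "http://de.res-proxy.example.com:8000" }, // placeholder
});

// Make the browser's self-reported environment agree with the proxy's region.
const context = await browser.newContext({
  timezoneId: "Europe/Berlin",
  locale: "de-DE",
  geolocation: { latitude: 52.52, longitude: 13.405 },
  permissions: ["geolocation"],
});
```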
Browser Fingerprinting
Fixing all the inconsistencies caused by the proxy still does not change the facts about the original device. Information about the device can be collected by scripts running in the browser, the same mechanism originally used to add interactivity to websites. One way to uniquely identify a device is with cookies; this is not a problem for MrScraper, since it creates an empty profile for each new session. Using collected information to identify a user is called fingerprinting. A site owner who sees bulk requests coming from the same device can mark them as suspicious. Here is some of the information that can be used to do that:
1. Screen Size
Websites can read your screen's resolution, available screen space, color depth, and device pixel ratio. These values vary widely across devices and setups, and when combined they help form a unique signature: two devices may share the same resolution but differ in pixel ratio, or one may run a multi-monitor setup.
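A minimal sketch of the screen-related values a page can read:

```ts
// Combining screen properties into a single fingerprint component.
const screenSignal = [
  screen.width,
  screen.height,
  screen.availWidth,
  screen.availHeight,
  screen.colorDepth,
  window.devicePixelRatio,
].join("x");
console.log(screenSignal); // e.g. "2560x1440x2560x1400x24x2"
```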
2. Battery Status
This API reveals the device's battery level, charging status, and time-to-discharge estimates. It was originally intended for UI optimization, but researchers found that battery metrics can be used to track users across sessions, because battery drain patterns can act as a quasi-unique identifier.
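A sketch of reading the Battery Status API, which today is only available in Chromium-based browsers (the cast is needed because TypeScript's DOM typings omit it):

```ts
// navigator.getBattery() resolves to a BatteryManager in Chromium browsers.
const battery = await (navigator as any).getBattery();
console.log({
  level: battery.level,                     // 0.0 to 1.0
  charging: battery.charging,
  dischargingTime: battery.dischargingTime, // seconds, Infinity if unknown
});
```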
3. Navigator Object
The navigator object leaks numerous details about the browser and device, including the browser name and version, OS platform, language settings, and even a device memory estimate. Each of these values makes it easier to pinpoint a user's fingerprint.
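A sketch of a few of the navigator properties a page can combine:

```ts
const navSignal = {
  userAgent: navigator.userAgent,   // browser name and version
  platform: navigator.platform,     // OS platform
  language: navigator.language,
  hardwareConcurrency: navigator.hardwareConcurrency, // logical CPU cores
  deviceMemory: (navigator as any).deviceMemory,      // RAM estimate in GB (Chromium-only)
};
console.log(JSON.stringify(navSignal));
```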
4. Font List
Websites can detect which fonts are installed through CSS and canvas-based tests. Because font installations vary significantly between users (especially with custom or system-specific fonts), the resulting font profile can be highly distinctive.
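A minimal sketch of the canvas-based variant: a font counts as installed if text rendered with it measures differently from the generic fallback:

```ts
function isFontInstalled(font: string): boolean {
  const ctx = document.createElement("canvas").getContext("2d")!;
  const sample = "mmmmmmmmmmlli"; // wide and narrow glyphs amplify differences
  ctx.font = "72px monospace";
  const fallbackWidth = ctx.measureText(sample).width;
  ctx.font = `72px "${font}", monospace`; // falls back if the font is missing
  return ctx.measureText(sample).width !== fallbackWidth;
}

console.log(isFontInstalled("Comic Sans MS"));
```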
5. WebGL
WebGL is used to accelerate 2D and 3D graphics on the web. It reveals information about the GPU model, rendering precision, and per-pixel differences when drawing a scene. After drawing to a canvas, the value of each pixel can be read back and hashed for comparison with other devices.
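A sketch of both halves: reading GPU details through a debug extension, then reading back rendered pixels for hashing:

```ts
const canvas = document.createElement("canvas");
const gl = canvas.getContext("webgl")!;

// GPU vendor and model, via the WEBGL_debug_renderer_info extension.
const dbg = gl.getExtension("WEBGL_debug_renderer_info");
if (dbg) {
  console.log(gl.getParameter(dbg.UNMASKED_VENDOR_WEBGL));
  console.log(gl.getParameter(dbg.UNMASKED_RENDERER_WEBGL));
}

// After drawing a scene, the pixel buffer can be read back and hashed;
// tiny driver-level rendering differences make the hash device-specific.
const pixels = new Uint8Array(canvas.width * canvas.height * 4);
gl.readPixels(0, 0, canvas.width, canvas.height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
```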
All of this information is auto-rotated by the custom-built browser MrScraper uses. Not all of these values can be set freely, since a never-before-seen value, or an implausible combination of values, can itself be used to detect spoofing. Fortunately, MrScraper has collected thousands of valid fingerprints from real browsers, which it rotates the same way it rotates IP addresses.