
How MrScraper Stays Stealth

MrScraper is a service for collecting data from the web. It addresses problems that arise while scraping, such as bot detection, which makes data harder to extract. Rotating IP addresses and browser fingerprints helps solve this problem.

IP Rotation

The Internet is a network of connected computers. Some of them provide a service that we call a website. A website (or site) is accessed using a browser, and it might contain useful information. Computers need a way to identify each other so that they can communicate, just as humans need names to refer to each other. On the internet, that identity is an IP address, usually assigned by an Internet Service Provider (ISP). When a computer repeatedly accesses a site to collect information, the site owner might become suspicious of that behaviour.

IP addresses are leased to ISPs and then distributed to customers. As a result, each IP address has ISP information registered to it, including its geographic region. There are two types of ISP customers:

  • Residential: typical household
  • Data center: rentable server

A website visit coming from a data center is more likely to be a bot than one coming from a residential address. This is because a server can be rented cheaply and remotely, with both the machine and the space to house it included, while a residential connection requires you to bring your own machine. Site owners can determine with a degree of accuracy whether a request came from a residential or a data center customer.
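As a rough illustration of how a site might make that distinction, the sketch below checks an address against a list of known hosting ranges. The ranges are hypothetical stand-ins (reserved documentation blocks), and real detection typically relies on much larger ISP and network databases.

```python
import ipaddress

# Hypothetical data center ranges. These are reserved documentation
# blocks used as stand-ins, not real hosting networks.
DATA_CENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def classify_ip(address: str) -> str:
    """Return 'data center' if the address is in a listed range, else 'residential'."""
    ip = ipaddress.ip_address(address)
    if any(ip in network for network in DATA_CENTER_RANGES):
        return "data center"
    return "residential"
```

Calling `classify_ip("192.0.2.10")` returns `"data center"` here only because that address falls inside one of the stand-in ranges.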

To lower the probability of being detected as a bot, MrScraper uses a different IP address for each session when scraping. This is done through proxy rotation, where each request is relayed by another computer. The site sees the IP address of the proxy instead of the original IP address of the machine. The proxy’s IP address is selected randomly from a pool of residential IP addresses, and this is called IP rotation. The site owner sees different IP addresses, which makes it harder to tell which visits come from a bot.
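The random selection described above can be sketched in a few lines. This is a minimal illustration, not MrScraper's implementation: the `Proxy` type, the pool contents, and the function names are all hypothetical, and the addresses come from a reserved documentation range.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Proxy:
    host: str
    port: int

# Hypothetical residential pool (documentation addresses, not real proxies).
RESIDENTIAL_POOL = [
    Proxy("203.0.113.10", 8080),
    Proxy("203.0.113.11", 8080),
    Proxy("203.0.113.12", 8080),
]

def pick_proxy(pool, rng=random):
    """Pick one proxy at random; called once per scraping session."""
    if not pool:
        raise LookupError("proxy pool is empty")
    return rng.choice(pool)

def proxy_url(proxy: Proxy) -> str:
    """Format the proxy as a URL that an HTTP client could be configured with."""
    return f"http://{proxy.host}:{proxy.port}"
```

Each new session would call `pick_proxy(RESIDENTIAL_POOL)` and route its traffic through the returned address, so consecutive sessions usually appear to come from different IP addresses.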

Some websites only target specific geographic regions and filter out requests from other regions. That is why MrScraper has an option to specify the country of the proxy to use. Even when a proxy is used, the original IP address can be leaked through WebRTC. An inconsistency between the IP address seen on the request and the one reported by WebRTC raises the probability of being flagged as a bot.

WebRTC is a set of protocols and browser APIs for peer-to-peer communication, such as video conferencing. Instead of using a central server to distribute the video feed, browsers can connect directly to each other. This requires the clients to know each other’s IP addresses. MrScraper solves this problem with a custom-built browser that rewrites the WebRTC IP information shared with the website’s server.
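To make the inconsistency concrete, here is a sketch of the kind of check a site could run once it has both the request IP address and the addresses reported through WebRTC. The function name and the choice to ignore local network addresses are assumptions made for illustration.

```python
import ipaddress

# Local-only ranges that say nothing about the public origin of a request.
LOCAL_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")
]

def webrtc_leaks_origin(request_ip: str, webrtc_ips: list[str]) -> bool:
    """Return True if WebRTC reports a public address that differs from the request IP."""
    seen = ipaddress.ip_address(request_ip)
    for candidate in webrtc_ips:
        ip = ipaddress.ip_address(candidate)
        if any(ip in network for network in LOCAL_NETWORKS):
            continue  # skip LAN and loopback candidates
        if ip != seen:
            return True  # a second public address suggests a proxy in between
    return False
```

Rewriting the WebRTC information so that it agrees with the proxy address is what makes a check like this return `False`.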

A site can contain multiple resources that need to be fetched from the server. The IP address is not checked for bot detection on every one of these requests. Static resources such as scripts and stylesheets are cached by the browser, so they are not always fetched. This lets MrScraper send requests for resources it considers static directly, without the proxy, which results in faster load times. Here is the diagram of MrScraper using a proxy.

MrScraper Using Proxy Diagram
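One simple way to decide which requests skip the proxy is to look at the file extension in the URL. The sketch below assumes that approach. The extension list and the function name are hypothetical, and the actual rule MrScraper uses to classify a request as static is not described here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical list of extensions treated as static resources.
STATIC_EXTENSIONS = {".css", ".js", ".woff", ".woff2"}

def should_use_proxy(url: str) -> bool:
    """Return False for static resources, which can be fetched directly."""
    path = urlparse(url).path  # drops the query string and fragment
    suffix = PurePosixPath(path).suffix.lower()
    return suffix not in STATIC_EXTENSIONS
```

With this rule, a page URL such as `https://example.com/products?page=2` still goes through the proxy, while `https://example.com/assets/app.js?v=3` is fetched directly.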

Even with all of this, site owners can still detect other inconsistencies using information collected from the browser, for example a mismatch between the timezone setting and the geolocation of the IP address. The browser still runs on the original machine rather than on the proxy, so its settings need to be changed to match the proxy’s geolocation. Not all of these changes can be made through ordinary browser settings. As with WebRTC, the custom-built browser makes it possible to change them for each session.
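The timezone example can be expressed as a small consistency check. The country-to-timezone table below is a deliberately tiny, hypothetical sample. A real check would need a complete mapping and would likely compare more settings than the timezone.

```python
# Hypothetical sample mapping from country code to plausible IANA time zones.
COUNTRY_TIMEZONES = {
    "US": {"America/New_York", "America/Chicago", "America/Denver", "America/Los_Angeles"},
    "DE": {"Europe/Berlin"},
    "ID": {"Asia/Jakarta", "Asia/Makassar", "Asia/Jayapura"},
}

def timezone_matches_country(browser_timezone: str, proxy_country: str) -> bool:
    """Return True if the browser's time zone is plausible for the proxy's country."""
    return browser_timezone in COUNTRY_TIMEZONES.get(proxy_country, set())
```

A browser reporting `Asia/Jakarta` behind a proxy located in `DE` fails this check, which is the kind of mismatch that per-session settings are meant to remove.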


Browser Fingerprinting

Fixing the inconsistencies introduced by the proxy does not change the facts about the original device. Information about the device can be collected by a script that runs in the browser, the same kind of script that was originally meant to add interactivity to a website. One way to uniquely identify a device is by using cookies. This is not a problem for MrScraper, since it creates an empty profile for each new session. Using information about the device to identify a user is called fingerprinting. With it, the site owner can see bulk requests coming from the same device and mark them as suspicious. Here is some of the information that can be used to do that:

1. Screen Size

Websites can read your screen’s resolution, available screen space, color depth, and device pixel ratio. These values vary widely across devices and setups, and when combined they help form a unique signature. For example, two devices can share the same resolution but differ in pixel ratio, or one of them can have a multi-monitor setup.

2. Battery Status

The battery status reveals the device’s battery level, charging status, and estimated time to discharge. Originally intended for UI optimization, these metrics were found by researchers to be usable for tracking users across sessions, because battery drain patterns can act as a quasi-unique identifier.

3. Navigator Object

The navigator object exposes numerous details about the browser and device, including the browser name and version, OS platform, language settings, and even an estimate of device memory. Each of these values makes it easier to pinpoint a user’s fingerprint.

4. Font List

Websites can detect which fonts are installed through CSS and canvas-based tests. Because font installations vary significantly between users (especially with custom or system-specific fonts), the resulting font profile can be highly distinctive.

5. WebGL

WebGL is used to accelerate 2D and 3D graphics on the web. It reveals information about the GPU model, rendering precision, and pixel-level differences when drawing a scene. After a scene is drawn on a canvas, the resulting pixels can be read back and hashed for comparison with other devices.
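The signals listed above become a fingerprint once they are combined into a single identifier. The sketch below shows one common way to do that, by hashing a canonical serialization of the collected values. The attribute names and the choice of SHA-256 are assumptions for illustration, not a description of any particular detection vendor.

```python
import hashlib
import json

def fingerprint_id(attributes: dict) -> str:
    """Hash collected browser attributes into a stable identifier."""
    # Sorting keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical attribute set a script might collect.
device = {
    "screen": "1920x1080",
    "pixel_ratio": 1.0,
    "platform": "Win32",
    "fonts": ["Arial", "Calibri"],
}
```

Two visits that report the same values produce the same `fingerprint_id`, while changing a single value, such as the pixel ratio, produces a different one.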

All of this information is rotated automatically by the custom-built browser that MrScraper uses. These values cannot be set arbitrarily, since a value or combination of values that has never been seen before can itself be used to detect spoofing. Fortunately, MrScraper has already collected thousands of valid fingerprints from real browsers, which are rotated in the same way as IP addresses.
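The point about unseen combinations can be illustrated by rotating whole fingerprints rather than individual values. In this sketch the pool entries, field names, and GPU labels are all hypothetical, and the real pool is described as containing thousands of profiles.

```python
import random

# Hypothetical pool of complete profiles observed on real browsers.
OBSERVED_FINGERPRINTS = [
    {"platform": "Win32", "screen": "1920x1080", "pixel_ratio": 1.0, "gpu": "hypothetical-gpu-a"},
    {"platform": "MacIntel", "screen": "1440x900", "pixel_ratio": 2.0, "gpu": "hypothetical-gpu-b"},
    {"platform": "Linux x86_64", "screen": "2560x1440", "pixel_ratio": 1.0, "gpu": "hypothetical-gpu-c"},
]

def pick_fingerprint(pool, rng=random):
    """Pick one complete profile; values are never mixed across profiles."""
    return dict(rng.choice(pool))

def is_observed(fingerprint: dict, pool) -> bool:
    """Return True if this exact combination of values exists in the pool."""
    return fingerprint in pool
```

A profile returned by `pick_fingerprint` always passes `is_observed`, whereas a hand-assembled mix, such as the first profile with the second profile's pixel ratio, does not.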
