Web Scraping with Go: A Developer’s Guide

Learn how to build web scrapers using Go (Golang). This guide covers net/http, goquery, Colly, concurrency, and best practices for scalable scraping.

Go (also called Golang) is a modern programming language designed with performance, simplicity, and concurrency in mind. Because of its strong standard library, built-in concurrency support, and efficient execution, Go is a natural choice for building web scrapers—tools that fetch web pages and extract useful information.

In this article, we’ll introduce the core concepts of web scraping in Go, show how to build scrapers using both the standard library and popular third-party tools, and explain when each approach makes sense.

What Makes Go Suitable for Web Scraping?

Go has several characteristics that make it well suited for web scraping:

  • A fast compiler and lightweight runtime for efficient execution at scale
  • Strong concurrency support with goroutines and channels
  • A solid standard library for HTTP requests and HTML parsing
  • A growing ecosystem of third-party scraping libraries
  • Built-in tooling and dependency management with go mod

Building a Basic Scraper Using Go’s Standard Library

You don’t need external tools to start scraping. Go’s standard library lets you fetch and parse HTML with minimal dependencies.

Step 1 — Fetching a Web Page

Use Go’s net/http package to request HTML content:

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("https://example.com")
    if err != nil {
        fmt.Println("Request failed:", err)
        return
    }
    defer resp.Body.Close()

    // A non-2xx status usually means there is nothing useful to parse.
    if resp.StatusCode != http.StatusOK {
        fmt.Println("Unexpected status:", resp.Status)
        return
    }

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("Failed to read response body:", err)
        return
    }
    fmt.Println(string(body))
}

This code sends an HTTP GET request and prints the response body as raw HTML.

Step 2 — Parsing HTML for Links

To extract specific elements like links (<a href="...">), you can use the golang.org/x/net/html package (installed with go get golang.org/x/net/html):

package main

import (
    "fmt"
    "net/http"

    "golang.org/x/net/html"
)

// findLinks walks the HTML node tree recursively and collects every href attribute.
func findLinks(n *html.Node) []string {
    links := []string{}
    if n.Type == html.ElementNode && n.Data == "a" {
        for _, attr := range n.Attr {
            if attr.Key == "href" {
                links = append(links, attr.Val)
            }
        }
    }
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        links = append(links, findLinks(c)...)
    }
    return links
}

func main() {
    resp, err := http.Get("https://example.com")
    if err != nil {
        fmt.Println("Request failed:", err)
        return
    }
    defer resp.Body.Close()

    root, err := html.Parse(resp.Body)
    if err != nil {
        fmt.Println("Failed to parse HTML:", err)
        return
    }

    for _, link := range findLinks(root) {
        fmt.Println(link)
    }
}

This function recursively walks the HTML node tree and collects all href attributes.

Leveraging Third-Party Tools: goquery

While native parsing works, third-party libraries like goquery offer a jQuery-like API that makes extraction easier.

Installing goquery

Initialize your module and install goquery:

go mod init my-scraper
go get github.com/PuerkitoBio/goquery

Using goquery to Extract Data

This example prints all <h1> text from a page:

package main

import (
    "fmt"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    res, err := http.Get("https://example.com")
    if err != nil {
        fmt.Println("Request error:", err)
        return
    }
    defer res.Body.Close()

    if res.StatusCode != http.StatusOK {
        fmt.Println("Unexpected status:", res.Status)
        return
    }

    // Parse the response body into a goquery document.
    doc, err := goquery.NewDocumentFromReader(res.Body)
    if err != nil {
        fmt.Println("Parsing error:", err)
        return
    }

    // Select every <h1> element and print its text content.
    doc.Find("h1").Each(func(i int, s *goquery.Selection) {
        fmt.Println("Header:", s.Text())
    })
}

goquery wraps Go’s HTML parser with CSS selectors, making extraction more readable and maintainable.

Building Advanced Scrapers With Colly

For more complex tasks—such as crawling multiple pages, managing sessions, and controlling request behavior—Colly is a popular Go framework.

Installing Colly

go get github.com/gocolly/colly

Basic Colly Example

This script scrapes all links from the body of a Wikipedia article:

package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    // Called for every element that matches the CSS selector.
    c.OnHTML(".mw-parser-output", func(e *colly.HTMLElement) {
        // Collect the href attribute of every <a> inside the article body.
        links := e.ChildAttrs("a", "href")
        fmt.Println(links)
    })

    if err := c.Visit("https://en.wikipedia.org/wiki/Web_scraping"); err != nil {
        fmt.Println("Visit failed:", err)
    }
}

What this does:

  • Creates a new Colly collector
  • Registers an HTML callback using a CSS selector
  • Visits the page and extracts all matching links

Colly reduces boilerplate and supports rate limiting, cookies, retries, and more.
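
Rate limiting, for instance, takes only a few lines. The sketch below uses Colly's LimitRule together with an asynchronous collector; the domain glob, parallelism, and delay values are placeholders you would tune for the target site.

package main

import (
    "fmt"
    "time"

    "github.com/gocolly/colly"
)

func main() {
    // Async lets Colly issue requests concurrently; Limit then caps that concurrency.
    c := colly.NewCollector(colly.Async(true))

    // Placeholder values: at most 2 parallel requests to matching domains,
    // with a random delay of up to 2 seconds between them.
    if err := c.Limit(&colly.LimitRule{
        DomainGlob:  "*wikipedia.org*",
        Parallelism: 2,
        RandomDelay: 2 * time.Second,
    }); err != nil {
        fmt.Println("Failed to set limit rule:", err)
        return
    }

    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting:", r.URL)
    })

    if err := c.Visit("https://en.wikipedia.org/wiki/Web_scraping"); err != nil {
        fmt.Println("Visit failed:", err)
    }

    // Wait blocks until all queued asynchronous requests are finished.
    c.Wait()
}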

Performance and Concurrency

Go’s concurrency model is ideal for scraping large numbers of pages:

  • Goroutines enable parallel HTTP requests
  • Channels help coordinate data flow safely
  • No need to manage threads manually

With proper rate limiting and synchronization, Go scrapers can handle high-throughput workloads efficiently.
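
As a rough sketch of that pattern, the example below fetches a few placeholder URLs concurrently with goroutines, collects results over a channel, and uses a ticker as a simple rate limit. It is a minimal illustration rather than production code.

package main

import (
    "fmt"
    "io"
    "net/http"
    "sync"
    "time"
)

// result carries the outcome of one fetch back to the main goroutine.
type result struct {
    url  string
    size int
    err  error
}

func main() {
    urls := []string{
        "https://example.com",
        "https://example.org",
        "https://example.net",
    }

    results := make(chan result, len(urls))
    // limiter spaces requests out: one new request every 500ms (placeholder rate).
    limiter := time.Tick(500 * time.Millisecond)

    var wg sync.WaitGroup
    for _, u := range urls {
        <-limiter // wait for the next tick before launching another request
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            resp, err := http.Get(u)
            if err != nil {
                results <- result{url: u, err: err}
                return
            }
            defer resp.Body.Close()
            body, err := io.ReadAll(resp.Body)
            results <- result{url: u, size: len(body), err: err}
        }(u)
    }

    // Wait for all fetches, then close the channel so the range below terminates.
    wg.Wait()
    close(results)

    for r := range results {
        if r.err != nil {
            fmt.Println(r.url, "failed:", r.err)
            continue
        }
        fmt.Println(r.url, "returned", r.size, "bytes")
    }
}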

Handling Dynamic or Protected Content

Go-based scrapers work best for static content, but modern sites often rely on JavaScript or protected APIs. In such cases, you can:

  • Reverse-engineer underlying API requests instead of scraping HTML
  • Use browser automation tools like chromedp or headless Chrome bindings
  • Combine Go scrapers with rendering services for JavaScript-heavy pages

Each approach involves trade-offs between performance and complexity.
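
As a minimal illustration of the browser-automation route, the sketch below uses chromedp (installed with go get github.com/chromedp/chromedp) to load a page in headless Chrome and print the rendered HTML. It assumes Chrome is installed locally and uses example.com as a placeholder URL.

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/chromedp/chromedp"
)

func main() {
    // Create a browser context backed by a headless Chrome instance.
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    // Guard against pages that never finish loading.
    ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    var renderedHTML string
    err := chromedp.Run(ctx,
        chromedp.Navigate("https://example.com"),
        // Capture the fully rendered document, including JavaScript-generated markup.
        chromedp.OuterHTML("html", &renderedHTML),
    )
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(renderedHTML)
}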

MrScraper: A Managed Scraping Solution

For teams that want to avoid maintaining scraping infrastructure, managed solutions can simplify development.

MrScraper provides:

  • Automatic proxy rotation and anti-bot handling
  • JavaScript rendering support
  • Clean, structured JSON output
  • API-based scraping jobs that integrate easily with Go applications

Instead of managing retries, IPs, and browser automation yourself, you can call MrScraper’s API and focus on using the data.
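
As a rough illustration of what that integration can look like from Go, the snippet below POSTs a scraping job to a placeholder endpoint and decodes a JSON response. The URL, request fields, and response shape are hypothetical, not MrScraper's actual API; consult the service's documentation for the real contract.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// scrapeRequest and scrapeResult are hypothetical shapes used only for illustration;
// the real request and response formats are defined by the provider.
type scrapeRequest struct {
    URL string `json:"url"`
}

type scrapeResult struct {
    Title string   `json:"title"`
    Links []string `json:"links"`
}

func main() {
    payload, err := json.Marshal(scrapeRequest{URL: "https://example.com"})
    if err != nil {
        fmt.Println("Encoding error:", err)
        return
    }

    // Placeholder endpoint; replace with the provider's documented URL and authentication.
    resp, err := http.Post("https://api.example-scraper.test/v1/scrape",
        "application/json", bytes.NewReader(payload))
    if err != nil {
        fmt.Println("Request error:", err)
        return
    }
    defer resp.Body.Close()

    var result scrapeResult
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        fmt.Println("Decoding error:", err)
        return
    }

    fmt.Println("Title:", result.Title)
    fmt.Println("Found", len(result.Links), "links")
}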

Conclusion

Web scraping with Go offers a strong balance of performance and flexibility. You can start with Go’s standard libraries for simple scrapers, use goquery for easier DOM traversal, and scale up with Colly for advanced crawling tasks. Go’s concurrency model makes it well suited for high-throughput scraping, and when JavaScript rendering or anti-bot measures are required, browser automation or managed scraping services can fill the gap. With the right tools and patterns, Go enables reliable and scalable data extraction for modern applications.
