Web Scraping with Go: A Developer’s Guide
Learn how to build web scrapers using Go (Golang). This guide covers net/http, goquery, Colly, concurrency, and best practices for scalable scraping.
Go (also called Golang) is a modern programming language designed with performance, simplicity, and concurrency in mind. Because of its strong standard library, built-in concurrency support, and efficient execution, Go is a natural choice for building web scrapers—tools that fetch web pages and extract useful information.
In this article, we’ll introduce the core concepts of web scraping in Go, show how to build scrapers using both the standard library and popular third-party tools, and explain when each approach makes sense.
What Makes Go Suitable for Web Scraping?
Go has several characteristics that make it well suited for web scraping:
- A fast compiler and lightweight runtime for efficient execution at scale
- Strong concurrency support with goroutines and channels
- A solid standard library for HTTP requests and HTML parsing
- A growing ecosystem of third-party scraping libraries
- Built-in tooling and dependency management with go mod
Building a Basic Scraper Using Go’s Standard Library
You don’t need external tools to start scraping. Go’s standard library lets you fetch and parse HTML with minimal dependencies.
Step 1 — Fetching a Web Page
Use Go’s net/http package to request HTML content:
package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Request the page
    resp, err := http.Get("https://example.com")
    if err != nil {
        fmt.Println("Request failed:", err)
        return
    }
    defer resp.Body.Close()

    // Read and print the raw HTML
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("Read failed:", err)
        return
    }
    fmt.Println(string(body))
}
This code sends an HTTP GET request and prints the response body as raw HTML.
Step 2 — Parsing HTML for Links
To extract specific elements like links (<a href="...">), you can use the golang.org/x/net/html package:
package main

import (
    "fmt"
    "net/http"

    "golang.org/x/net/html"
)

// findLinks walks the HTML node tree and collects every href attribute it finds.
func findLinks(n *html.Node) []string {
    links := []string{}
    if n.Type == html.ElementNode && n.Data == "a" {
        for _, attr := range n.Attr {
            if attr.Key == "href" {
                links = append(links, attr.Val)
            }
        }
    }
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        links = append(links, findLinks(c)...)
    }
    return links
}

func main() {
    resp, err := http.Get("https://example.com")
    if err != nil {
        fmt.Println("Request failed:", err)
        return
    }
    defer resp.Body.Close()

    root, err := html.Parse(resp.Body)
    if err != nil {
        fmt.Println("Parse failed:", err)
        return
    }

    for _, link := range findLinks(root) {
        fmt.Println(link)
    }
}
This function recursively walks the HTML node tree and collects all href attributes.
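Links collected this way are often relative paths (such as /about). If you plan to request them later, you can resolve each one against the page's URL with the standard net/url package. The resolveLink helper below is a small illustrative sketch, not part of the original example:

package main

import (
    "fmt"
    "net/url"
)

// resolveLink turns a possibly relative href into an absolute URL,
// using the page it was found on as the base.
func resolveLink(base, href string) (string, error) {
    baseURL, err := url.Parse(base)
    if err != nil {
        return "", err
    }
    ref, err := url.Parse(href)
    if err != nil {
        return "", err
    }
    return baseURL.ResolveReference(ref).String(), nil
}

func main() {
    abs, err := resolveLink("https://example.com/docs/", "../about")
    if err != nil {
        fmt.Println("Resolve failed:", err)
        return
    }
    fmt.Println(abs) // prints https://example.com/about
}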
Leveraging Third-Party Tools: goquery
While native parsing works, third-party libraries like goquery offer a jQuery-like API that makes extraction easier.
Installing goquery
Initialize your module and install goquery:
go mod init my-scraper
go get github.com/PuerkitoBio/goquery
Using goquery to Extract Data
This example prints all <h1> text from a page:
package main

import (
    "fmt"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    res, err := http.Get("https://example.com")
    if err != nil {
        fmt.Println("Request error:", err)
        return
    }
    defer res.Body.Close()

    doc, err := goquery.NewDocumentFromReader(res.Body)
    if err != nil {
        fmt.Println("Parsing error:", err)
        return
    }

    // Print the text of every <h1> element on the page
    doc.Find("h1").Each(func(i int, s *goquery.Selection) {
        fmt.Println("Header:", s.Text())
    })
}
goquery wraps Go’s HTML parser with CSS selectors, making extraction more readable and maintainable.
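Beyond text content, goquery can also read attributes through Selection.Attr. As a rough sketch building on the same idea (the target URL is again just example.com), the snippet below prints the visible text and href of every link on the page:

package main

import (
    "fmt"
    "net/http"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    res, err := http.Get("https://example.com")
    if err != nil {
        fmt.Println("Request error:", err)
        return
    }
    defer res.Body.Close()

    doc, err := goquery.NewDocumentFromReader(res.Body)
    if err != nil {
        fmt.Println("Parsing error:", err)
        return
    }

    // Print the link text and href of every anchor element.
    doc.Find("a").Each(func(i int, s *goquery.Selection) {
        if href, ok := s.Attr("href"); ok {
            fmt.Println(strings.TrimSpace(s.Text()), "->", href)
        }
    })
}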
Building Advanced Scrapers With Colly
For more complex tasks—such as crawling multiple pages, managing sessions, and controlling request behavior—Colly is a popular Go framework.
Installing Colly
go get -u github.com/gocolly/colly/...
Basic Colly Example
This script collects all links from the body of a Wikipedia article:
package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    // Extract every link inside the article body
    c.OnHTML(".mw-parser-output", func(e *colly.HTMLElement) {
        links := e.ChildAttrs("a", "href")
        fmt.Println(links)
    })

    if err := c.Visit("https://en.wikipedia.org/wiki/Web_scraping"); err != nil {
        fmt.Println("Visit failed:", err)
    }
}
What this does:
- Creates a new Colly collector
- Registers an HTML callback using a CSS selector
- Visits the page and extracts all matching links
Colly reduces boilerplate and supports rate limiting, cookies, retries, and more.
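As an example of that, Colly's Limit method accepts a LimitRule that throttles and parallelizes requests per domain. The sketch below is one reasonable configuration, not the only one; the domain glob, delay, and parallelism values are placeholders you would tune for your target:

package main

import (
    "fmt"
    "time"

    "github.com/gocolly/colly"
)

func main() {
    // Async lets the collector issue requests concurrently.
    c := colly.NewCollector(colly.Async(true))

    // Throttle: at most 2 parallel requests per matching domain,
    // with a 1-second delay between them.
    err := c.Limit(&colly.LimitRule{
        DomainGlob:  "*wikipedia.org*",
        Parallelism: 2,
        Delay:       1 * time.Second,
    })
    if err != nil {
        fmt.Println("Limit error:", err)
        return
    }

    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        fmt.Println(e.Attr("href"))
    })

    c.Visit("https://en.wikipedia.org/wiki/Web_scraping")
    c.Wait() // wait for the async requests to finish
}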
Performance and Concurrency
Go’s concurrency model is ideal for scraping large numbers of pages:
- Goroutines enable parallel HTTP requests
- Channels help coordinate data flow safely
- No need to manage threads manually
With proper rate limiting and synchronization, Go scrapers can handle high-throughput workloads efficiently.
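As a rough illustration of that model (the URLs and the concurrency limit are placeholders), the sketch below fetches several pages in parallel with goroutines, uses a buffered channel as a semaphore to cap how many requests are in flight, and collects results over a channel:

package main

import (
    "fmt"
    "net/http"
    "sync"
)

func main() {
    urls := []string{
        "https://example.com",
        "https://example.org",
        "https://example.net",
    }

    sem := make(chan struct{}, 2)         // allow at most 2 requests in flight
    results := make(chan string, len(urls))
    var wg sync.WaitGroup

    for _, u := range urls {
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            sem <- struct{}{}        // acquire a slot
            defer func() { <-sem }() // release it when done

            resp, err := http.Get(u)
            if err != nil {
                results <- fmt.Sprintf("%s: error: %v", u, err)
                return
            }
            resp.Body.Close()
            results <- fmt.Sprintf("%s: %s", u, resp.Status)
        }(u)
    }

    wg.Wait()
    close(results)

    for r := range results {
        fmt.Println(r)
    }
}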
Handling Dynamic or Protected Content
Go-based scrapers work best for static content, but modern sites often rely on JavaScript or protected APIs. In such cases, you can:
- Reverse-engineer underlying API requests instead of scraping HTML
- Use browser automation tools like chromedp or other headless Chrome bindings (a sketch follows below)
- Combine Go scrapers with rendering services for JavaScript-heavy pages
Each approach involves trade-offs between performance and complexity.
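If you take the browser-automation route, chromedp drives a headless Chrome instance from Go. The following is a minimal sketch, assuming Chrome is installed locally and using example.com as a stand-in for a JavaScript-rendered page:

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/chromedp/chromedp"
)

func main() {
    // chromedp controls a headless Chrome instance, which must be installed locally.
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    // Give the whole task a deadline so a slow page cannot hang the scraper.
    ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    var rendered string
    err := chromedp.Run(ctx,
        chromedp.Navigate("https://example.com"),
        chromedp.OuterHTML("html", &rendered), // the HTML after JavaScript has run
    )
    if err != nil {
        fmt.Println("Render failed:", err)
        return
    }
    fmt.Println(rendered)
}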
MrScraper: A Managed Scraping Solution
For teams that want to avoid maintaining scraping infrastructure, managed solutions can simplify development.
MrScraper provides:
- Automatic proxy rotation and anti-bot handling
- JavaScript rendering support
- Clean, structured JSON output
- API-based scraping jobs that integrate easily with Go applications
Instead of managing retries, IPs, and browser automation yourself, you can call MrScraper’s API and focus on using the data.
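For a sense of what that integration can look like, here is a hypothetical sketch of submitting a scraping job to an HTTP API from Go. The endpoint, request fields, and authentication header are illustrative placeholders, not MrScraper's documented API; consult the provider's documentation for the real contract.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    // NOTE: the endpoint, payload fields, and auth header below are placeholders
    // for illustration only, not MrScraper's actual API.
    payload, _ := json.Marshal(map[string]string{
        "url": "https://example.com",
    })

    req, err := http.NewRequest("POST", "https://api.example-scraper.com/v1/scrape", bytes.NewBuffer(payload))
    if err != nil {
        fmt.Println("Request build failed:", err)
        return
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY") // placeholder credential

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("Request failed:", err)
        return
    }
    defer resp.Body.Close()

    // Decode the structured JSON result returned by the service.
    var result map[string]any
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        fmt.Println("Decode failed:", err)
        return
    }
    fmt.Println(result)
}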
Conclusion
Web scraping with Go offers a strong balance of performance and flexibility. You can start with Go’s standard libraries for simple scrapers, use goquery for easier DOM traversal, and scale up with Colly for advanced crawling tasks. Go’s concurrency model makes it well suited for high-throughput scraping, and when JavaScript rendering or anti-bot measures are required, browser automation or managed scraping services can fill the gap. With the right tools and patterns, Go enables reliable and scalable data extraction for modern applications.