How to Web Scrape a Table in Python: From Static HTML to Dynamic Pages
Web scraping tables is one of the most practical ways to collect structured data from websites—whether it’s financial statistics, sports results, academic records, or product lists. In this guide, we’ll explore how to web scrape a table in Python, using both simple and advanced methods, with examples tailored to real-world use cases.
1. The Quick Way: Using pandas.read_html()
The easiest method for scraping tables is with pandas.read_html(), which automatically detects and converts HTML tables into Pandas DataFrames.
```python
import pandas as pd

url = "https://en.wikipedia.org/wiki/Demographics_of_India"
tables = pd.read_html(url, match="Population distribution")
df = tables[0]
print(df.head())
```
- This method uses `BeautifulSoup` and `lxml` under the hood.
- The `match` parameter helps target a specific table.
Pros: Extremely fast and simple. Cons: Only works on static HTML tables.
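One practical wrinkle: if a site rejects the default Python user agent (a common cause of HTTP 403 errors when `read_html()` fetches the URL itself), you can download the page with `requests` and a browser-like `User-Agent` header, then hand the HTML to `read_html()`. A minimal sketch, with a placeholder User-Agent string:

```python
import requests
import pandas as pd
from io import StringIO

url = "https://en.wikipedia.org/wiki/Demographics_of_India"
# Browser-like User-Agent; adjust or identify your scraper honestly
headers = {"User-Agent": "Mozilla/5.0 (compatible; table-scraper-example)"}

resp = requests.get(url, headers=headers, timeout=30)
resp.raise_for_status()

# Newer pandas versions prefer a file-like object over a raw HTML string
tables = pd.read_html(StringIO(resp.text), match="Population distribution")
print(tables[0].head())
```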
2. More Control: BeautifulSoup + Requests
If you need finer control or want to clean the data during extraction, combining requests with BeautifulSoup is a reliable approach.
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://datatables.net/examples/styling/stripe.html"
resp = requests.get(url)
soup = BeautifulSoup(resp.text, "html.parser")

table = soup.find("table", class_="stripe")
rows = []
for tr in table.tbody.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append(cells)

df = pd.DataFrame(rows, columns=[th.get_text() for th in table.thead.find_all("th")])
print(df.head())
```
This is helpful when:
- The table is nested inside custom HTML structures.
- You want to customize how rows and columns are parsed.
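For instance, here is a small sketch of cleaning cell values while you parse. The helper names and the exact cleanup rules are illustrative, not part of any library:

```python
import re

def clean_cell(text: str) -> str:
    """Drop footnote markers like '[1]' and collapse whitespace."""
    text = re.sub(r"\[\d+\]", "", text)
    return " ".join(text.split())

def to_number(text: str):
    """Turn '1,234' or '56.7%' style strings into numbers; otherwise return the text."""
    stripped = text.replace(",", "").rstrip("%")
    try:
        return float(stripped) if "." in stripped else int(stripped)
    except ValueError:
        return text

# Used inside the row loop from the example above:
# cells = [to_number(clean_cell(td.get_text(strip=True))) for td in tr.find_all("td")]
```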
3. Scraping Dynamic Tables with Selenium
If the table is loaded dynamically using JavaScript (AJAX), then a static HTML parser won’t work. In this case, you can use Selenium to load and render the page as a browser would.
```python
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome()
driver.get("https://example.com/dynamic_table")

html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="myTable")
df = pd.read_html(str(table))[0]

driver.quit()
print(df.head())
```
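Dynamically rendered tables often appear a moment after the initial page load, so reading `page_source` immediately can return an empty or missing table. A sketch using Selenium's explicit waits, reusing the placeholder URL and the `myTable` id from the example above:

```python
from io import StringIO

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/dynamic_table")  # placeholder URL from the example above
    # Wait up to 15 seconds for the table to appear in the DOM
    WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.ID, "myTable"))
    )
    table_html = driver.find_element(By.ID, "myTable").get_attribute("outerHTML")
    df = pd.read_html(StringIO(table_html))[0]
finally:
    driver.quit()

print(df.head())
```

Recent Selenium releases (4.6+) can also download a matching ChromeDriver automatically via Selenium Manager, which softens the driver-setup burden mentioned below.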
Pros: Can handle JavaScript-heavy websites. Cons: Slower, requires browser drivers like ChromeDriver.
4. Accessing Hidden APIs Behind Tables
Sometimes the table content is not hardcoded into the HTML but fetched from an API in the background. This is actually a more efficient way to extract data:
- Open DevTools → Network → XHR/Fetch
- Locate the API URL used to load table data
- Use `requests.get()` to retrieve the JSON data
```python
import requests
import pandas as pd

api = "https://www.levantineceramics.org/vessels/datatable.json"
data = requests.get(api).json()
df = pd.DataFrame(data["data"])
print(df.head())
```
Pros: Fast and clean. Cons: Requires inspecting the site’s network calls.
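Many table-backing endpoints also accept paging or sorting parameters (DataTables-style APIs commonly use `start` and `length`). The endpoint, parameter names, and page sizes below are assumptions for illustration; confirm the real ones in the Network tab:

```python
import requests
import pandas as pd

# Placeholder endpoint; use the URL you found in the Network tab
api = "https://example.com/api/table.json"
headers = {"User-Agent": "Mozilla/5.0 (compatible; table-scraper-example)"}

frames = []
for start in range(0, 300, 100):
    # 'start' and 'length' are typical DataTables-style paging parameters; verify them in DevTools
    params = {"start": start, "length": 100}
    payload = requests.get(api, params=params, headers=headers, timeout=30).json()
    frames.append(pd.DataFrame(payload["data"]))  # the "data" key mirrors the example above

df = pd.concat(frames, ignore_index=True)
print(df.shape)
```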
5. Scalable Scraping with Scrapy
If you're building a large-scale scraper or need asynchronous performance, Scrapy is a powerful Python framework for crawling and extracting data.
```python
import scrapy

class TableSpider(scrapy.Spider):
    name = "table_spider"
    start_urls = ["https://example.com/page_with_table"]

    def parse(self, response):
        for row in response.xpath('//table//tr'):
            yield {
                'column1': row.xpath('td[1]/text()').get(),
                'column2': row.xpath('td[2]/text()').get(),
            }
```
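Assuming the class above lives in a file named `table_spider.py` (the filename is just an example), you can run it without creating a full Scrapy project using `scrapy runspider table_spider.py -o rows.csv`, which writes each yielded item to a CSV file. Note that the `//table//tr` XPath also matches header rows, whose cells are `<th>` rather than `<td>`, so those rows will yield `None` values; filter them out or target `//table/tbody/tr` instead.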
Pros: Great for multiple pages, built-in pipelines. Cons: More complex setup and learning curve.
Comparison Table
| Need | Method | Pros | Cons |
|---|---|---|---|
| Simple HTML tables | pandas.read_html() | Fast and beginner-friendly | Only works on static content |
| Custom structure | BeautifulSoup + requests | High control, clean data | More code required |
| JavaScript tables | Selenium | Can render dynamic content | Slower, heavier setup |
| Background API | Direct API request | Fast and efficient | Requires DevTools inspection |
| Large-scale scraping | Scrapy | Scalable and async | Advanced setup |
Responsible Scraping
Before scraping, always:
- Check `robots.txt` and the site’s Terms of Service
- Use rate limiting to avoid overloading the server (see the sketch below)
- Add headers like `User-Agent` to mimic a browser
- Use proxies or headless browsing to avoid blocks
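As a rough sketch of the rate-limiting and header points (the delay, URLs, and User-Agent string are placeholders):

```python
import time
import requests

headers = {"User-Agent": "Mozilla/5.0 (compatible; table-scraper-example)"}
urls = [
    "https://example.com/table?page=1",  # placeholder URLs
    "https://example.com/table?page=2",
]

for url in urls:
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    # ... parse the table here ...
    time.sleep(2)  # rate limit: pause between requests so you don't overload the server
```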
No-Code Scraping with MrScraper
If coding isn’t your thing—or you need to extract tables from difficult or protected websites—use MrScraper.
MrScraper is a visual, AI-powered web scraping tool that makes it easy to:
- Extract tables with just a few clicks
- Scrape JavaScript-rendered pages
- Export to CSV or JSON
- Use proxy rotation and CAPTCHA bypass automatically
Whether you're scraping product lists, public records, or movie data, MrScraper handles the hard part for you—no code required.
Conclusion
Learning how to web scrape a table in Python opens up a world of possibilities for data analysis, automation, and research. Whether you’re scraping a static table from Wikipedia or a dynamic one from an e-commerce site, Python offers flexible tools to make the job easier.
And for those who want the simplest, most efficient solution, MrScraper helps you collect structured data from any website—without touching a line of code.
Ready to scrape your first table? Try MrScraper today.