How to Scrape Twitter (X) Profiles with Python Using Playwright
Twitter — now known as X — has become one of the most valuable real-time data sources for tracking trends, analyzing conversations, doing market research, and understanding audience behavior. But manually copying tweet data is slow and impossible to scale when you need hundreds or thousands of entries.
This is where web scraping becomes essential.
In this guide, we’ll walk through how to scrape public X profiles using Python + Playwright, including how to authenticate using your own session cookies. Since X now locks most content behind a login, cookie-based authentication is the most reliable method.
We’ll cover everything from setting up your environment to extracting tweet text, timestamps, likes, reposts, views, and more.
Why Scrape Twitter Profiles?
X contains massive amounts of real-time public data, making it useful for:
- Social media analytics
- Sentiment analysis
- Market and product research
- Competitor monitoring
- Trend tracking
- Archiving public statements
- AI and machine learning datasets
Public profiles are especially valuable because they provide curated, chronological activity from individuals, brands, and public figures.
But to access that data programmatically, we need authentication — and we’ll do it properly.
How Authentication Works on X
Since X blocks unauthenticated access to tweets, Playwright must load your login session using cookies.txt.
This works because:
- Cookies represent your logged-in session
- Playwright loads them into the browser context
- X treats your scraper like a real user
A typical Netscape-style cookies.txt file looks like this (the fields are tab-separated):
.x.com TRUE / TRUE 1798777260 auth_token <value>
.x.com TRUE / TRUE 1798548372 ct0 <value>
Once these are loaded, Playwright opens X as if you logged in manually.
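To make this concrete, here is roughly what the auth_token line above becomes once parsed, using the dict shape that Playwright's context.add_cookies() expects. This is a minimal sketch; the full parser appears in the script later in this guide.

# One Netscape-format cookie line, converted to the dict shape
# accepted by Playwright's context.add_cookies().
auth_cookie = {
    "name": "auth_token",
    "value": "<value>",       # your real token from cookies.txt
    "domain": ".x.com",
    "path": "/",
    "expires": 1798777260.0,  # Unix expiry timestamp from the file
    "secure": True,
    "sameSite": "Lax",
}
# Later, inside an async context:
# await context.add_cookies([auth_cookie])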
Tools You Need
Install the required packages:
pip install playwright
playwright install
We will use:
- Python (async)
- Playwright for browser automation
- cookies.txt for session authentication
Everything else runs directly in the script.
How This Scraper Works
1. Load Your Cookies
We parse cookies from cookies.txt (you can export this file with a cookie-export browser extension while logged in to X).
This makes X believe we’re a normal logged-in user.
2. Launch Playwright
A Chromium browser is opened, either headless or visible (the script below runs with headless=False so you can watch it work).
3. Navigate to the Target Profile
For example:
https://x.com/MrScraper_
We wait for the first set of tweets to load.
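The script below handles this with a short fixed pause. If you prefer a deterministic wait, Playwright can block until the first tweet container actually exists in the DOM. A small sketch of that alternative, which would replace the fixed sleep inside scrape_profile:

# Wait until at least one tweet <article> is attached to the DOM;
# raises a TimeoutError if nothing renders within 30 seconds.
await page.wait_for_selector("article", timeout=30000)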
4. Auto-Scroll the Profile
X only loads a few tweets at first.
The script scrolls automatically to load more content.
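The full script below scrolls a fixed 20 times, which is simple and predictable. As an alternative sketch, you can stop as soon as the page height stops growing, which avoids wasted iterations on short profiles:

# Drop-in alternative to the fixed scroll loop in scrape_profile():
# keep scrolling until the document height stops increasing.
last_height = 0
while True:
    await page.evaluate("window.scrollBy(0, document.body.scrollHeight)")
    await asyncio.sleep(3)  # give new tweets time to load
    height = await page.evaluate("document.body.scrollHeight")
    if height == last_height:
        break  # no new content was loaded
    last_height = height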
5. Extract Tweets
For each <article> (tweet container), we extract:
- Text
- Timestamp
- Replies
- Reposts
- Likes
- Bookmarks
- Views
Newer versions of X store stats inside a single ARIA label, e.g.:
aria-label="1629 replies, 4089 reposts, 29401 likes, 1066 bookmarks, 2035963 views"
We capture these using regex.
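To see what those captures produce outside the browser, here is the same pattern applied in plain Python to the sample label above, plus a small helper (an illustration, not part of the scraper below) that normalizes "29.4K"-style counts into integers:

import re

label = "1629 replies, 4089 reposts, 29401 likes, 1066 bookmarks, 2035963 views"

def to_int(count):
    """Convert counts like '1,234', '29.4K', or '2.1M' to an integer."""
    count = count.replace(",", "")
    if count[-1] in "Kk":
        return int(float(count[:-1]) * 1_000)
    if count[-1] in "Mm":
        return int(float(count[:-1]) * 1_000_000)
    return int(count)

patterns = {
    "replies": r"(\d[\d,.KkM]*) repl",
    "reposts": r"(\d[\d,.KkM]*) repost",
    "likes": r"(\d[\d,.KkM]*) like",
    "bookmarks": r"(\d[\d,.KkM]*) bookmark",
    "views": r"(\d[\d,.KkM]*) view",
}
stats = {
    key: to_int(m.group(1))
    for key, pattern in patterns.items()
    if (m := re.search(pattern, label))
}
print(stats)
# {'replies': 1629, 'reposts': 4089, 'likes': 29401,
#  'bookmarks': 1066, 'views': 2035963}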
6. Save Everything to JSON
All output is stored in:
<username>_tweets.json
Perfect for analytics, dashboards, competitor research, and more.
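If you want to inspect the results right away, the JSON loads cleanly into pandas (assuming pandas is installed; this is a usage sketch, separate from the scraper):

import json
import pandas as pd

username = "MrScraper_"
with open(f"{username}_tweets.json") as f:
    tweets = json.load(f)

df = pd.DataFrame(tweets)
# Parse the ISO timestamps so you can sort or resample by date
df["date"] = pd.to_datetime(df["date"])
print(df[["date", "likes", "views"]].head())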
Python Code: Full Working X Profile Scraper
import asyncio
import json

from playwright.async_api import async_playwright


# ------------------------------------------------------
# Convert cookies.txt → Playwright cookies
# ------------------------------------------------------
def parse_cookies_txt(path):
    cookies = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            parts = line.split("\t")
            # Netscape format: domain, subdomain flag, path,
            # secure flag, expiry, name, value
            if len(parts) != 7:
                continue
            domain, include_sub, p, secure, expiry, name, value = parts
            cookies.append({
                "name": name,
                "value": value,
                "domain": domain.lstrip("."),
                "path": p,
                "expires": float(expiry),
                "secure": secure.upper() == "TRUE",
                "httpOnly": False,
                "sameSite": "Lax",
            })
    return cookies


# ------------------------------------------------------
# Scrape profile timeline
# ------------------------------------------------------
async def scrape_profile(username):
    url = f"https://x.com/{username}"
    cookies = parse_cookies_txt("cookies.txt")

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()

        # Load the user's session cookies
        await context.add_cookies(cookies)

        page = await context.new_page()
        print(f"Opening profile: {url}")
        await page.goto(url, wait_until="domcontentloaded", timeout=60000)

        # Give time for initial tweets to render
        await asyncio.sleep(3)

        print("Scrolling...")
        for _ in range(20):  # scroll deeper if needed
            await page.evaluate("window.scrollBy(0, document.body.scrollHeight)")
            await asyncio.sleep(4)

        print("Extracting tweets...")
        tweets = await page.evaluate(r"""
            () => {
                const out = [];
                document.querySelectorAll("article").forEach(a => {
                    try {
                        // ---- Extract tweet text ----
                        const text = Array.from(a.querySelectorAll("div[dir='auto'] span"))
                            .map(s => s.innerText)
                            .join(" ")
                            .trim();

                        // ---- Extract date ----
                        const date = a.querySelector("time")?.getAttribute("datetime") || null;

                        // ---- Extract stats from the combined ARIA label.
                        // The patterns match singular and plural forms
                        // ("1 reply" as well as "2 replies").
                        let replies = null, reposts = null, likes = null,
                            bookmarks = null, views = null;
                        const group = a.querySelector("div[role='group'][aria-label]");
                        if (group) {
                            const label = group.getAttribute("aria-label");
                            replies = label.match(/(\d[\d,.KkM]*) repl/)?.[1] || null;
                            reposts = label.match(/(\d[\d,.KkM]*) repost/)?.[1] || null;
                            likes = label.match(/(\d[\d,.KkM]*) like/)?.[1] || null;
                            bookmarks = label.match(/(\d[\d,.KkM]*) bookmark/)?.[1] || null;
                            views = label.match(/(\d[\d,.KkM]*) view/)?.[1] || null;
                        }

                        out.push({ text, date, replies, reposts, likes, bookmarks, views });
                    } catch (e) {}
                });
                return out;
            }
        """)

        print(f"Collected {len(tweets)} tweets.")

        # Save as JSON
        file = f"{username}_tweets.json"
        with open(file, "w") as f:
            json.dump(tweets, f, indent=2)
        print("Saved to:", file)

        await browser.close()


# ------------------------------------------------------
# Run
# ------------------------------------------------------
if __name__ == "__main__":
    username = "MrScraper_"  # change this
    asyncio.run(scrape_profile(username))
Conclusion
Scraping X profiles today requires:
- Authentication
- A real browser automation tool
- Stable selectors
- Smart handling of X’s dynamic DOM
By combining Playwright with your session cookies, you can reliably collect tweet text, engagement metrics, timestamps, and more.
This method can be extended to:
- Hashtag scraping
- Full timeline scraping
- Thread and reply extraction
- Image/video scraping
- Bookmark analytics
- DM automation (with caution)
This scraper is flexible, stable, and ideal for real-world research and analytics.