
How to Scrape Twitter (X) Profiles with Python Using Playwright

Learn how to scrape Twitter (X) profiles using Python and Playwright with cookie-based authentication. Extract tweets, timestamps, likes, reposts, views, and more using a reliable, fully working scraper.

Twitter — now known as X — has become one of the most valuable real-time data sources for tracking trends, analyzing conversations, doing market research, and understanding audience behavior. But manually copying tweet data is slow and impossible to scale when you need hundreds or thousands of entries.

This is where web scraping becomes essential.

In this guide, we’ll walk through how to scrape public X profiles using Python + Playwright, including how to authenticate using your own session cookies. Since X recently locked most content behind login, cookie-based authentication is the most reliable method.

We’ll cover everything from setting up your environment to extracting tweet text, timestamps, likes, reposts, views, and more.

Why Scrape Twitter Profiles?

X contains massive amounts of real-time public data, making it useful for:

  • Social media analytics
  • Sentiment analysis
  • Market and product research
  • Competitor monitoring
  • Trend tracking
  • Archiving public statements
  • AI and machine learning datasets

Public profiles are especially valuable because they provide curated, chronological activity from individuals, brands, and public figures.

But to access that data programmatically, we need authentication — and we’ll do it properly.

How Authentication Works on X

Since X blocks unauthenticated access to tweets, the scraper loads your logged-in session into Playwright from a cookies.txt export.

This works because:

  • Cookies represent your logged-in session
  • Playwright loads them into the browser context
  • X treats your scraper like a real user

A typical Netscape-style cookies.txt file looks like this:


.x.com    TRUE    /    TRUE    1798777260    auth_token    <value>
.x.com    TRUE    /    TRUE    1798548372    ct0           <value>

Once these are loaded, Playwright opens X as if you logged in manually.
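For reference, here is roughly how one of those lines maps onto the cookie dict that Playwright's add_cookies() expects (the value is a placeholder, and the httpOnly/sameSite fields are omitted for brevity):

cookie = {
    "name": "auth_token",
    "value": "<value>",      # placeholder: copy from your export
    "domain": ".x.com",      # leading dot: valid for subdomains too
    "path": "/",
    "expires": 1798777260,   # Unix timestamp from the export
    "secure": True,
}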

Tools You Need

Install the required packages:

pip install playwright
playwright install

We will use:

  • Python (async)
  • Playwright for browser automation
  • cookies.txt for session authentication

Everything else runs directly in the script.

How This Scraper Works

1. Load Your Cookies

We parse cookies from cookies.txt (you can export this with a cookies.txt exporter browser extension while logged in to X).

This makes X believe we’re a normal logged-in user.

2. Launch Playwright

A Chromium browser is opened — either headless or visible.

3. Navigate to the Target Profile

For example:

https://x.com/MrScraper_

We wait for the first set of tweets to load.
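In Playwright, that wait can be a single call that blocks until the first tweet container is attached to the DOM:

# Raises TimeoutError if no tweet renders within 30 seconds
await page.wait_for_selector("article", timeout=30000)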

4. Auto-Scroll the Profile

X only loads a few tweets at first, so the script scrolls automatically to load more content.

Because X also removes tweets that scroll far off-screen (the timeline is virtualized), the script extracts on every scroll pass and deduplicates, rather than scraping once at the end.
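The scraper below uses a fixed number of scroll passes, but if you prefer to scroll until the feed stops growing, a sketch like this works (the loop and names are illustrative):

# Keep scrolling until the page height stops increasing,
# i.e. no more tweets are being lazy-loaded
last_height = 0
while True:
    await page.evaluate("window.scrollBy(0, document.body.scrollHeight)")
    await asyncio.sleep(4)  # give the next batch time to load
    height = await page.evaluate("document.body.scrollHeight")
    if height == last_height:
        break
    last_height = height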

5. Extract Tweets

For each <article> (tweet container), we extract:

  • Text
  • Timestamp
  • Replies
  • Reposts
  • Likes
  • Bookmarks
  • Views

Newer versions of X store stats inside a single ARIA label, e.g.:

aria-label="1629 replies, 4089 reposts, 29401 likes, 1066 bookmarks, 2035963 views"

We capture these using regex.
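In the scraper this happens in browser-side JavaScript, but the same idea in plain Python looks like this, using the sample label above:

import re

label = "1629 replies, 4089 reposts, 29401 likes, 1066 bookmarks, 2035963 views"

# Matching the stem ("repl", "like", ...) covers singular and plural forms
stats = {}
for key, stem in [("replies", "repl"), ("reposts", "repost"),
                  ("likes", "like"), ("bookmarks", "bookmark"),
                  ("views", "view")]:
    m = re.search(rf"([\d,.]+)\s+{stem}", label)
    stats[key] = m.group(1) if m else None

print(stats)  # {'replies': '1629', 'reposts': '4089', 'likes': '29401', ...}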

6. Save Everything to JSON

All output is stored in:

<username>_tweets.json

Perfect for analytics, dashboards, competitor research, and more.
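Each record in that file has roughly this shape (the counts come from the sample label above; the text and date are illustrative):

{
  "text": "Example tweet text...",
  "date": "2025-01-15T10:30:00.000Z",
  "replies": "1629",
  "reposts": "4089",
  "likes": "29401",
  "bookmarks": "1066",
  "views": "2035963"
}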

Python Code: Full Working X Profile Scraper

import asyncio
import json

from playwright.async_api import async_playwright

# ------------------------------------------------------
# Convert cookies.txt → Playwright cookies
# ------------------------------------------------------
def parse_cookies_txt(path):
    """Parse a Netscape-format cookies.txt export into Playwright cookie dicts."""
    cookies = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()

            # Some exporters prefix HttpOnly cookies (like auth_token)
            # with "#HttpOnly_"; keep those, skip other comment lines.
            http_only = line.startswith("#HttpOnly_")
            if http_only:
                line = line[len("#HttpOnly_"):]
            elif not line or line.startswith("#"):
                continue

            parts = line.split("\t")
            if len(parts) != 7:
                continue

            domain, include_sub, p, secure, expiry, name, value = parts

            cookies.append({
                "name": name,
                "value": value,
                "domain": domain,  # keep the leading dot so subdomains match
                "path": p,
                # An expiry of 0 marks a session cookie; Playwright uses -1
                "expires": float(expiry) if expiry != "0" else -1,
                "secure": secure.upper() == "TRUE",
                "httpOnly": http_only,
                "sameSite": "Lax"
            })
    return cookies


# ------------------------------------------------------
# Scrape profile timeline
# ------------------------------------------------------

# Browser-side extraction: collect every tweet currently rendered.
# X virtualizes the timeline (off-screen tweets are removed from the
# DOM), so this runs on every scroll pass, not just once at the end.
EXTRACT_TWEETS_JS = r"""
    () => {
        const out = [];

        document.querySelectorAll("article").forEach(a => {
            try {
                // ---- Extract tweet text ----
                // data-testid="tweetText" is the tweet body container
                // in the current X DOM.
                const text = a.querySelector("div[data-testid='tweetText']")?.innerText.trim() || "";

                // ---- Extract date ----
                const date = a.querySelector("time")?.getAttribute("datetime") || null;

                // ---- Extract stats from the combined aria-label ----
                let replies = null, reposts = null, likes = null, bookmarks = null, views = null;

                const group = a.querySelector("div[role='group'][aria-label]");

                if (group) {
                    const label = group.getAttribute("aria-label");

                    // Match the stem ("repl", "like", ...) so both the
                    // singular and plural forms are covered.
                    replies   = label.match(/(\d[\d,.KkM]*)\s+repl/)?.[1]     || null;
                    reposts   = label.match(/(\d[\d,.KkM]*)\s+repost/)?.[1]   || null;
                    likes     = label.match(/(\d[\d,.KkM]*)\s+like/)?.[1]     || null;
                    bookmarks = label.match(/(\d[\d,.KkM]*)\s+bookmark/)?.[1] || null;
                    views     = label.match(/(\d[\d,.KkM]*)\s+view/)?.[1]     || null;
                }

                out.push({ text, date, replies, reposts, likes, bookmarks, views });

            } catch (e) {}
        });

        return out;
    }
"""


async def scrape_profile(username):
    url = f"https://x.com/{username}"

    cookies = parse_cookies_txt("cookies.txt")

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)  # set True to hide the window
        context = await browser.new_context()

        # Load the user's session cookies
        await context.add_cookies(cookies)

        page = await context.new_page()

        print(f"Opening profile: {url}")
        await page.goto(url, wait_until="domcontentloaded", timeout=60000)

        # Wait for the first tweets to render
        await page.wait_for_selector("article", timeout=30000)

        print("Scrolling and collecting tweets...")

        collected = {}
        for _ in range(20):  # scroll deeper if needed
            for tweet in await page.evaluate(EXTRACT_TWEETS_JS):
                # Deduplicate across scroll passes by (date, text)
                collected.setdefault((tweet["date"], tweet["text"]), tweet)

            await page.evaluate("window.scrollBy(0, document.body.scrollHeight)")
            await asyncio.sleep(4)

        tweets = list(collected.values())
        print(f"Collected {len(tweets)} tweets.")

        # Save as JSON
        file = f"{username}_tweets.json"
        with open(file, "w", encoding="utf-8") as f:
            json.dump(tweets, f, indent=2, ensure_ascii=False)

        print("Saved to:", file)
        await browser.close()


# ------------------------------------------------------
# Run
# ------------------------------------------------------
if __name__ == "__main__":
    username = "MrScraper_"  # change this
    asyncio.run(scrape_profile(username))

Conclusion

Scraping X profiles today requires:

  • Authentication
  • A real browser automation tool
  • Stable selectors
  • Smart handling of X’s dynamic DOM

By combining Playwright with your session cookies, you can reliably collect tweet text, engagement metrics, timestamps, and more.

This method can be extended to:

  • Hashtag scraping
  • Full timeline scraping
  • Thread and reply extraction
  • Image/video scraping
  • Bookmark analytics
  • DM automation (with caution)

This scraper is flexible, stable, and ideal for real-world research and analytics.
