Extracting Esports Brackets and Match Results with PyQuery and Pagination

PyQuery bracket scraping is the technique of using PyQuery’s jQuery-style CSS selectors to parse tournament bracket HTML, extract structured match data like team names and scores, and follow pagination across multiple rounds or pages. If you’ve tried scraping Liquipedia’s tournament brackets with generic tutorials, you’ve probably hit a wall — the nested table structure doesn’t behave like a typical article page, and single-page scrapers fall apart the moment a tournament spans multiple rounds. This guide walks you through a complete, working Python script that handles all of it.

TL;DR

  • Builds a Python script to extract bracket matchups, team names, and scores from Liquipedia
  • Uses PyQuery’s CSS selector chaining to target nested bracket table elements
  • Handles pagination across multi-round tournaments with a request loop
  • Outputs structured data as a list of dicts, ready for pandas or JSON export

Why PyQuery Wins for Bracket Scraping

Bracket HTML is deeply nested. A single match block on Liquipedia contains a wrapper div, inner table rows, team name spans, and score cells — all stacked inside a bracket container. BeautifulSoup handles this with chained find_all() calls that get verbose fast. PyQuery lets you collapse that traversal into a single CSS selector string.

Compare these two approaches for selecting bracket team names:

# BeautifulSoup: an outer loop plus a nested find_all() per match
teams = []
for match in soup.find_all('div', class_='bracket-game'):
    teams.extend(match.find_all('div', class_=lambda c: c and 'bracket-team' in c))

# PyQuery: the same two-level traversal as one chained expression
teams = doc('.bracket-game').find('div[class*="bracket-team"]')

The PyQuery version reads like a CSS selector you’d write in browser DevTools. If you’ve spent any time with jQuery, the syntax feels immediate. For bracket structures with 4-5 levels of nesting, that difference compounds quickly.

Setting Up Your Environment

Install the two required libraries before anything else. The DataFrame export at the end also uses pandas; add it to the command if you want that step:

pip install pyquery requests

Your import block for the full script looks like this:

import requests
import time
import json
from pyquery import PyQuery

HEADERS = {
    # Identify the scraper honestly; a contact URL or email in this string is good practice
    "User-Agent": "Mozilla/5.0 (compatible; esports-bracket-scraper/1.0)"
}

Liquipedia rate-limits aggressive scrapers. Always set a descriptive User-Agent header and add delays between requests. Check Liquipedia’s robots.txt and terms of service before running any scraper against their site — this is a practitioner habit, not a legal disclaimer. Liquipedia also offers an API for bulk historical data; use that when you need large datasets and use direct HTML parsing when you need custom or real-time extraction.
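Python's standard library can handle the robots.txt check. This is a minimal sketch using urllib.robotparser: robots_policy() is a hypothetical helper, and the robots.txt text in the example is made up for illustration, not Liquipedia's actual file. In a real run you would download https://liquipedia.net/robots.txt with requests and pass its .text in.

```python
from urllib.robotparser import RobotFileParser


def robots_policy(robots_txt, url, user_agent="esports-bracket-scraper"):
    """Return (allowed, crawl_delay) for url under the given robots.txt text.

    crawl_delay is None when the file sets no Crawl-delay for this user agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


# Made-up robots.txt for illustration only; fetch the real file before scraping
EXAMPLE_ROBOTS = "User-agent: *\nDisallow: /special/\nCrawl-delay: 5\n"

allowed, delay = robots_policy(EXAMPLE_ROBOTS, "https://liquipedia.net/counterstrike/Main_Page")
pause = max(2, delay or 0)  # never drop below the 2-second floor used in this guide
print(allowed, pause)  # True 5
```

Taking the max() means a stricter Crawl-delay in the real file overrides the two-second default instead of being ignored.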

Inspecting Liquipedia’s Bracket HTML Structure

Before writing a single selector, open the tournament page in Chrome DevTools and map the structure. Liquipedia uses consistent CSS classes across most game wikis, though CS2 and League of Legends bracket layouts have minor differences you’ll need to account for.

A stripped match block looks like this:

<div class="bracket-game">
  <div class="bracket-team-top">
    <span class="bracket-team-name">Team Liquid</span>
    <span class="bracket-score">2</span>
  </div>
  <div class="bracket-team-bottom">
    <span class="bracket-team-name">Natus Vincere</span>
    <span class="bracket-score">0</span>
  </div>
</div>

The key classes you’re targeting: .bracket-game wraps each individual match, .bracket-team-top and .bracket-team-bottom hold each competitor, and .bracket-score holds the score. Round labels typically sit in a .bracket-header element above the match group. Verify these in DevTools before running your script — Liquipedia occasionally updates markup between game wikis.

Fetching and Loading the Page

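def fetch_page(url):
    # timeout stops a stalled connection from hanging the scraper indefinitely
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx (e.g. 403, 429)
    return PyQuery(response.text)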

doc = fetch_page("https://liquipedia.net/counterstrike/Major/2024/Bracket")
print(doc('title').text())

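Passing response.text directly into PyQuery() initializes the document object. Calling doc('title').text() confirms you loaded the right page before running any extraction logic. A request that is blocked outright (typically 403 or 429) never gets this far, because raise_for_status() raises requests.HTTPError. If the request succeeds but the title is empty or names an error page, check your User-Agent header first.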

Extracting Bracket Matchups with PyQuery Selectors

Use the following PyQuery selector to extract all bracket match blocks from a Liquipedia page:

matches = doc('.bracket-game')

From there, loop through each match and pull both competitors:

def extract_matches(doc, round_label="Unknown Round"):
    results = []
    for match in doc('.bracket-game').items():
        team1_name = match.find('.bracket-team-top .bracket-team-name').text()
        team1_score = match.find('.bracket-team-top .bracket-score').text()
        team2_name = match.find('.bracket-team-bottom .bracket-team-name').text()
        team2_score = match.find('.bracket-team-bottom .bracket-score').text()

        if team1_name and team2_name:
            results.append({
                "round": round_label,
                "team1": team1_name,
                "team2": team2_name,
                "score1": team1_score,
                "score2": team2_score
            })
    return results

The .items() call iterates over each matched element as its own PyQuery object, which lets you chain .find() relative to that match block. This is where PyQuery’s .find() and .filter() differ: .find() searches descendants, while .filter() narrows the current selection. Use .find() here because you’re going deeper into the match block, not filtering the match list itself.

Round labels live in .bracket-header elements. Extract them with:

round_label = doc('.bracket-header').eq(0).text()

Handling Pagination Across Tournament Rounds

Multi-round tournaments on Liquipedia often split group stages and playoffs across separate pages. Pagination links typically appear as anchor tags with a class like .pagination-next or inside a navigation wrapper. Add a time.sleep() delay between every paginated request — two seconds is a reasonable minimum.

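from urllib.parse import urljoin

def scrape_all_rounds(start_url):
    all_matches = []
    current_url = start_url
    visited = set()  # stops the loop if a "next" link points back to a page already scraped

    while current_url and current_url not in visited:
        visited.add(current_url)
        doc = fetch_page(current_url)
        round_label = doc('.bracket-header').eq(0).text() or "Round"
        all_matches.extend(extract_matches(doc, round_label))

        next_link = doc('a.pagination-next')
        if next_link and next_link.attr('href'):
            # urljoin resolves relative hrefs like "/counterstrike/..." and leaves absolute ones intact
            current_url = urljoin(current_url, next_link.attr('href'))
            time.sleep(2)  # pause before the next request, not after the last one
        else:
            current_url = None

    return all_matches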

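The loop continues until doc('a.pagination-next') returns an empty PyQuery object, which evaluates as falsy because PyQuery subclasses Python's list. Each page's matches accumulate into all_matches. A 403 mid-loop does more than return bad data: raise_for_status() raises requests.HTTPError, which ends the run and discards the matches collected so far. If that happens, increase your sleep delay and verify your User-Agent string is still set.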

Structuring and Exporting the Extracted Data

matches = scrape_all_rounds("https://liquipedia.net/counterstrike/Major/2024/Bracket")

# Export to JSON
with open("bracket_results.json", "w", encoding="utf-8") as f:
    json.dump(matches, f, indent=2, ensure_ascii=False)  # keep non-ASCII team names readable

# Convert to pandas DataFrame
import pandas as pd
df = pd.DataFrame(matches)
print(df.head())

The list of dicts structure maps directly to a DataFrame without any reshaping. Each row represents one match with round, team, and score columns ready for filtering or aggregation. For bracket visualizations or stat pipelines, the JSON export gives you a portable format that any downstream tool can read.
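One step usually comes before aggregation: .text() returns every score as a string, and unplayed matches come back as an empty string or a placeholder such as "TBD". This sketch converts the score columns to numbers and derives a winner column. add_winners() is a hypothetical helper, the sample rows are placeholders rather than real results, and leaving ties and unplayed matches without a winner is a design choice you may want to change.

```python
import pandas as pd


def add_winners(matches):
    """Build a DataFrame from scraped match dicts with numeric scores and a winner column."""
    df = pd.DataFrame(matches)
    # Scores arrive as strings; "" or "TBD" become NaN instead of raising
    for col in ("score1", "score2"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
    df["winner"] = None  # ties and unplayed matches keep no winner
    # Comparisons involving NaN evaluate to False, so unplayed rows are skipped
    df.loc[df["score1"] > df["score2"], "winner"] = df["team1"]
    df.loc[df["score2"] > df["score1"], "winner"] = df["team2"]
    return df


sample = [
    {"round": "Round 1", "team1": "Team A", "team2": "Team B", "score1": "2", "score2": "0"},
    {"round": "Round 1", "team1": "Team C", "team2": "Team D", "score1": "1", "score2": "2"},
    {"round": "Round 2", "team1": "Team E", "team2": "Team F", "score1": "", "score2": "TBD"},
]
df = add_winners(sample)
print(df["winner"].value_counts())
```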

Common Errors and How to Fix Them

  • Empty .text() returns: Your selector missed. Run print(doc.html()) and compare the actual HTML to your selector string in DevTools. Class names sometimes vary between CS2 and LoL bracket pages.
  • 403 errors: Liquipedia blocked the request. Add or correct your User-Agent header and increase the time.sleep() delay between requests.
  • Missing scores for unplayed matches: Future bracket slots have empty score cells. Add a fallback: team1_score = match.find('.bracket-score').eq(0).text() or "TBD".
  • Pagination loop doesn’t terminate: Check that the next-page selector matches the actual anchor class. Use doc('a[class*="next"]') as a broader fallback if a.pagination-next returns nothing.

Frequently Asked Questions

Can I scrape Liquipedia with Python?

Yes. Liquipedia’s HTML is static and accessible with requests plus a proper User-Agent header. Check their robots.txt first, and use their API for bulk data requests.

What is the best library for scraping tournament brackets?

PyQuery is the strongest choice for bracket HTML because its CSS selector syntax handles deeply nested table structures in fewer lines than BeautifulSoup’s find_all() chains.

How do I handle multi-page tournament results in Python?

Build a while loop that fetches each page, extracts matches, then checks for a next-page link. Stop the loop when no next link exists.

Does Liquipedia have an API I should use instead?

Yes. Liquipedia provides an API suited for bulk historical data pulls. Use direct HTML scraping with PyQuery when you need custom extraction logic or real-time bracket data the API doesn’t expose.

Your next step is extending this script to capture group stage standings alongside bracket results. PyQuery’s .filter() method is what you’ll reach for there — it narrows an existing selection by an additional condition, which is exactly what you need when a page mixes bracket tables with standings tables in the same DOM.