PyQuery bracket scraping is the technique of using PyQuery’s jQuery-style CSS selectors to parse tournament bracket HTML, extract structured match data like team names and scores, and follow pagination across multiple rounds or pages. If you’ve tried scraping Liquipedia’s tournament brackets with generic tutorials, you’ve probably hit a wall — the nested table structure doesn’t behave like a typical article page, and single-page scrapers fall apart the moment a tournament spans multiple rounds. This guide walks you through a complete, working Python script that handles all of it.
- Builds a Python script to extract bracket matchups, team names, and scores from Liquipedia
- Uses PyQuery’s CSS selector chaining to target nested bracket table elements
- Handles pagination across multi-round tournaments with a request loop
- Outputs structured data as a list of dicts, ready for pandas or JSON export
Why PyQuery Wins for Bracket Scraping
Bracket HTML is deeply nested. A single match block on Liquipedia contains a wrapper div, inner table rows, team name spans, and score cells — all stacked inside a bracket container. BeautifulSoup handles this with chained find_all() calls that get verbose fast. PyQuery lets you collapse that traversal into a single CSS selector string.
Compare these two approaches for selecting bracket team names:
# BeautifulSoup
for match in soup.find_all('div', class_='bracket-game'):
    teams = match.find_all('div', class_=lambda c: c and 'bracket-team' in c)
# PyQuery
matches = doc('[class*="bracket-game"]')
teams = matches.find('[class*="bracket-team"]')
The PyQuery version reads like a CSS selector you’d write in browser DevTools. If you’ve spent any time with jQuery, the syntax feels immediate. For bracket structures with 4-5 levels of nesting, that difference compounds quickly.
Setting Up Your Environment
Install the two libraries you need before anything else:
pip install pyquery requests
Your import block for the full script looks like this:
import requests
import time
import json
from pyquery import PyQuery
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; esports-bracket-scraper/1.0)"
}
Liquipedia rate-limits aggressive scrapers. Always set a descriptive User-Agent header and add delays between requests. Check Liquipedia’s robots.txt and terms of service before running any scraper against their site — this is a practitioner habit, not a legal disclaimer. Liquipedia also offers an API for bulk historical data; use that when you need large datasets and use direct HTML parsing when you need custom or real-time extraction.
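One way to make the delay habit hard to forget is to wrap it in a small helper. The sketch below is only an illustration, not part of the original script: the `Throttle` class and its two-second default are assumptions chosen to match the delay this guide suggests for paginated requests.

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(self, min_delay=2.0):
        self.min_delay = min_delay  # seconds; assumed value, tune as needed
        self._last = None           # monotonic timestamp of the previous request

    def remaining(self, now):
        """Return how many seconds are still owed before the next request."""
        if self._last is None:
            return 0.0
        return max(0.0, self.min_delay - (now - self._last))

    def wait(self):
        """Sleep for whatever part of the delay has not elapsed yet."""
        time.sleep(self.remaining(time.monotonic()))
        self._last = time.monotonic()


throttle = Throttle(min_delay=2.0)
```

Calling `throttle.wait()` immediately before each `requests.get()` enforces the gap no matter where in the script the request happens.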
Inspecting Liquipedia’s Bracket HTML Structure
Before writing a single selector, open the tournament page in Chrome DevTools and map the structure. Liquipedia uses consistent CSS classes across most game wikis, though CS2 and League of Legends bracket layouts have minor differences you’ll need to account for.
A stripped match block looks like this:
<div class="bracket-game">
  <div class="bracket-team-top">
    <span class="bracket-team-name">Team Liquid</span>
    <span class="bracket-score">2</span>
  </div>
  <div class="bracket-team-bottom">
    <span class="bracket-team-name">Natus Vincere</span>
    <span class="bracket-score">0</span>
  </div>
</div>
The key classes you’re targeting: .bracket-game wraps each individual match, .bracket-team-top and .bracket-team-bottom hold each competitor, and .bracket-score holds the score. Round labels typically sit in a .bracket-header element above the match group. Verify these in DevTools before running your script — Liquipedia occasionally updates markup between game wikis.
Fetching and Loading the Page
def fetch_page(url):
    # timeout keeps a stalled connection from hanging the script
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return PyQuery(response.text)
doc = fetch_page("https://liquipedia.net/counterstrike/Major/2024/Bracket")
print(doc('title').text())
Passing response.text directly into PyQuery() initializes the document object. Calling doc('title').text() confirms you loaded the right page before running any extraction logic. If you see an empty string or an error page title, check your User-Agent header first.
Extracting Bracket Matchups with PyQuery Selectors
Use the following PyQuery selector to extract all bracket match blocks from a Liquipedia page:
matches = doc('.bracket-game')
From there, loop through each match and pull both competitors:
def extract_matches(doc, round_label="Unknown Round"):
    results = []
    for match in doc('.bracket-game').items():
        team1_name = match.find('.bracket-team-top .bracket-team-name').text()
        team1_score = match.find('.bracket-team-top .bracket-score').text()
        team2_name = match.find('.bracket-team-bottom .bracket-team-name').text()
        team2_score = match.find('.bracket-team-bottom .bracket-score').text()
        if team1_name and team2_name:
            results.append({
                "round": round_label,
                "team1": team1_name,
                "team2": team2_name,
                "score1": team1_score,
                "score2": team2_score
            })
    return results
The .items() call iterates over each matched element as its own PyQuery object, which lets you chain .find() relative to that match block. This is where PyQuery’s .find() and .filter() differ: .find() searches descendants, while .filter() narrows the current selection. Use .find() here because you’re going deeper into the match block, not filtering the match list itself.
Round labels live in .bracket-header elements. Extract them with:
round_label = doc('.bracket-header').eq(0).text()
Handling Pagination Across Tournament Rounds
Multi-round tournaments on Liquipedia often split group stages and playoffs across separate pages. Pagination links typically appear as anchor tags with a class like .pagination-next or inside a navigation wrapper. Add a time.sleep() delay between every paginated request — two seconds is a reasonable minimum.
from urllib.parse import urljoin
def scrape_all_rounds(start_url):
    all_matches = []
    current_url = start_url
    while current_url:
        doc = fetch_page(current_url)
        round_label = doc('.bracket-header').eq(0).text() or "Round"
        all_matches.extend(extract_matches(doc, round_label))
        next_link = doc('a.pagination-next')
        href = next_link.attr('href') if next_link else None
        # urljoin resolves both relative and absolute hrefs; stop when no link is found
        current_url = urljoin(current_url, href) if href else None
        time.sleep(2)  # pause between paginated requests
    return all_matches
The loop continues until doc('a.pagination-next') returns an empty PyQuery object, which evaluates as falsy. Each page’s matches accumulate into all_matches. If you hit a 403 response mid-loop, increase your sleep delay and verify your User-Agent string is still set.
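A pagination loop can also run forever if a "next" link ever points back to a page that was already fetched. The sketch below shows one way to guard against that with a set of visited URLs. `next_page_url` is a hypothetical helper, and the `Playoffs` path is an invented example URL.

```python
from urllib.parse import urljoin


def next_page_url(current_url, href, visited):
    """Resolve href against current_url, or return None when the loop should stop."""
    if not href:
        return None  # no next link on this page
    candidate = urljoin(current_url, href)  # handles relative and absolute hrefs
    if candidate in visited:
        return None  # already scraped; following it would loop forever
    return candidate


base = "https://liquipedia.net/counterstrike/Major/2024/Bracket"
visited = {base}

# Example path is hypothetical; real next-page links depend on the tournament.
following = next_page_url(base, "/counterstrike/Major/2024/Playoffs", visited)
print(following)
print(next_page_url(base, "/counterstrike/Major/2024/Bracket", visited))
```

Inside the scraping loop, add each fetched URL to `visited` before asking for the next one.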
Structuring and Exporting the Extracted Data
matches = scrape_all_rounds("https://liquipedia.net/counterstrike/Major/2024/Bracket")
# Export to JSON
with open("bracket_results.json", "w") as f:
    json.dump(matches, f, indent=2)
# Convert to pandas DataFrame (pandas is a separate install: pip install pandas)
import pandas as pd
df = pd.DataFrame(matches)
print(df.head())
The list of dicts structure maps directly to a DataFrame without any reshaping. Each row represents one match with round, team, and score columns ready for filtering or aggregation. For bracket visualizations or stat pipelines, the JSON export gives you a portable format that any downstream tool can read.
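Scores come out of `.text()` as strings, so numeric aggregation usually needs a conversion step first. The following is a sketch of one way to do that. `to_int` and `add_winner` are hypothetical helpers, and the `winner` field is an addition that the original script does not produce.

```python
def to_int(score):
    """Convert a scraped score string to int, or None if it is empty or non-numeric."""
    try:
        return int(score)
    except (TypeError, ValueError):
        return None


def add_winner(match):
    """Return a copy of the match dict with integer scores and a 'winner' field."""
    s1, s2 = to_int(match["score1"]), to_int(match["score2"])
    winner = None
    if s1 is not None and s2 is not None and s1 != s2:
        winner = match["team1"] if s1 > s2 else match["team2"]
    return {**match, "score1": s1, "score2": s2, "winner": winner}


played = add_winner({
    "round": "Quarterfinals", "team1": "Team Liquid", "team2": "Natus Vincere",
    "score1": "2", "score2": "0",
})
unplayed = add_winner({
    "round": "Final", "team1": "TBD", "team2": "TBD",
    "score1": "", "score2": "",
})
print(played["winner"], unplayed["winner"])
```

Applying it as `[add_winner(m) for m in matches]` before building the DataFrame gives numeric score columns, with `None` for matches that have not been played.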
Common Errors and How to Fix Them
- Empty .text() returns: Your selector missed. Run print(doc.html()) and compare the actual HTML to your selector string in DevTools. Class names sometimes vary between CS2 and LoL bracket pages.
- 403 errors: Liquipedia blocked the request. Add or correct your User-Agent header and increase the time.sleep() delay between requests.
- Missing scores for unplayed matches: Future bracket slots have empty score cells. Add a fallback: team1_score = match.find('.bracket-score').eq(0).text() or "TBD".
- Pagination loop doesn’t terminate: Check that the next-page selector matches the actual anchor class. Use doc('a[class*="next"]') as a broader fallback if a.pagination-next returns nothing.
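The advice to increase the delay after a 403 can be automated with a simple retry schedule. The sketch below is one possible shape for that, not a definitive implementation: `fetch_with_backoff`, its delay values, and the injectable `sleep` argument are all assumptions. In real use you would pass the `fetch_page` function defined earlier and `retry_on=(requests.HTTPError,)`.

```python
import time


def fetch_with_backoff(fetch, url, delays=(2, 4, 8), retry_on=(Exception,),
                       sleep=time.sleep):
    """Call fetch(url), sleeping for a longer delay after each failed attempt."""
    for delay in delays:
        try:
            return fetch(url)
        except retry_on:
            sleep(delay)  # wait longer each time before trying again
    return fetch(url)  # final attempt; any error now propagates to the caller


# Demo with a stand-in fetcher that fails twice and then succeeds.
attempts = []
slept = []


def flaky_fetch(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise RuntimeError("simulated 403")
    return "page content"


result = fetch_with_backoff(flaky_fetch, "https://example.invalid/bracket",
                            sleep=slept.append)
print(result, slept)
```

Passing `sleep` as a parameter keeps the demo instant and makes the helper easy to test. The default remains `time.sleep`.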
Frequently Asked Questions
Can I scrape Liquipedia with Python?
Yes. Liquipedia’s HTML is static and accessible with requests plus a proper User-Agent header. Check their robots.txt first, and use their API for bulk data requests.
What is the best library for scraping tournament brackets?
PyQuery is the strongest choice for bracket HTML because its CSS selector syntax handles deeply nested table structures in fewer lines than BeautifulSoup’s find_all() chains.
How do I handle multi-page tournament results in Python?
Build a while loop that fetches each page, extracts matches, then checks for a next-page link. Stop the loop when no next link exists.
Does Liquipedia have an API I should use instead?
Yes. Liquipedia provides an API suited for bulk historical data pulls. Use direct HTML scraping with PyQuery when you need custom extraction logic or real-time bracket data the API doesn’t expose.
Your next step is extending this script to capture group stage standings alongside bracket results. PyQuery’s .filter() method is what you’ll reach for there — it narrows an existing selection by an additional condition, which is exactly what you need when a page mixes bracket tables with standings tables in the same DOM.

Ryan French is the driving force behind PyQuery.org, a leading platform dedicated to the PyQuery ecosystem. As the founder and chief editor, Ryan combines his extensive experience in the developer arena with a passion for sharing knowledge about PyQuery, a third-party Python package designed for parsing and extracting data from XML and HTML pages. Inspired by the jQuery JavaScript library, PyQuery boasts a similar syntax, enabling developers to manipulate document trees with ease and efficiency.