Understanding Selectors
CSS selectors are the foundation of web scraping. Learn how to precisely target any element on a webpage.
Basic Selectors
| Selector | Example | Selects |
|---|---|---|
| element | h1 | All <h1> elements |
| .class | .product-title | Elements with class="product-title" |
| #id | #main-content | Element with id="main-content" |
| [attribute] | [data-price] | Elements with a data-price attribute |
| [attr=value] | [type="submit"] | Elements where type="submit" |
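To make the table concrete, here is a minimal Python sketch of what each basic selector checks on a single element. The function name `matches_simple` and the tag-plus-attribute-dict representation are assumptions made for illustration. This is not how Scrpy or a browser implements matching, and it handles only the five forms listed above.

```python
import re

# Hypothetical helper: matches [attribute] and [attr=value], quotes optional.
_ATTR = re.compile(r"""\[([\w-]+)(?:=["']?([^"'\]]*)["']?)?\]""")

def matches_simple(selector: str, tag: str, attrs: dict[str, str]) -> bool:
    """Return True if one element (tag + attributes) matches a basic selector."""
    if selector.startswith("."):               # .class: any one of the class tokens
        return selector[1:] in attrs.get("class", "").split()
    if selector.startswith("#"):               # #id: exact id match
        return attrs.get("id") == selector[1:]
    m = _ATTR.fullmatch(selector)
    if m:                                      # [attribute] or [attr=value]
        name, value = m.group(1), m.group(2)
        if name not in attrs:
            return False
        return value is None or attrs[name] == value
    return selector.lower() == tag.lower()     # element: tag name only

button = ("input", {"type": "submit", "class": "btn primary", "id": "go"})
print(matches_simple(".primary", *button))         # True
print(matches_simple('[type="submit"]', *button))  # True
print(matches_simple("[data-price]", *button))     # False
```

Note that `.class` compares whole class tokens, so `.prim` does not match an element whose class list contains `primary`.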
Combinators
| Combinator | Example | Selects |
|---|---|---|
| Descendant (space) | div p | All p inside div |
| Child (>) | ul > li | Direct li children of ul |
| Adjacent sibling (+) | h1 + p | A p that immediately follows an h1 (same parent) |
| General sibling (~) | h1 ~ p | All p siblings that come after an h1 |
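The difference between the four combinators is easiest to see on a small tree. The sketch below uses Python's standard `xml.etree.ElementTree` on a hypothetical, well-formed snippet and spells out each combinator by hand. It is a stand-in for illustration only, not a CSS engine and not how Scrpy evaluates selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the root element plays the role of the div.
div = ET.fromstring("""
<div>
  <h1>Title</h1>
  <p>intro</p>
  <section><p>nested</p></section>
  <p>outro</p>
</div>
""")

def texts(elements):
    return [el.text for el in elements]

# div p : every p anywhere below the div
descendant = list(div.iter("p"))

# div > p : only p elements that are direct children of the div
child = [el for el in div if el.tag == "p"]

# h1 + p and h1 ~ p : look at the siblings that follow each h1
siblings = list(div)
adjacent, general = [], []
for i, el in enumerate(siblings):
    if el.tag != "h1":
        continue
    following = siblings[i + 1:]
    if following and following[0].tag == "p":
        adjacent.append(following[0])          # only the very next sibling
    general.extend(s for s in following if s.tag == "p")

print(texts(descendant))  # ['intro', 'nested', 'outro']
print(texts(child))       # ['intro', 'outro']
print(texts(adjacent))    # ['intro']
print(texts(general))     # ['intro', 'outro']
```

The nested paragraph is matched only by the descendant combinator, because it is neither a direct child of the div nor a sibling of the h1.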
Pseudo-Selectors
| Pseudo | Example | Selects |
|---|---|---|
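| :first-child | li:first-child | An li that is the first child of its parent |
| :last-child | li:last-child | An li that is the last child of its parent |
| :nth-child(n) | tr:nth-child(2) | A tr that is the second child of its parent |
| :not() | p:not(.hidden) | p elements without the class hidden |
| :contains() | a:contains("Buy") | Links containing the text "Buy" (non-standard; not supported by document.querySelectorAll in browsers) |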
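Positional pseudo-classes count every element child of the parent, not only children with the same tag, which is a common source of off-by-one selectors. The following Python sketch illustrates this on a hypothetical snippet using the standard `xml.etree.ElementTree`. The helper `nth_child` is an illustrative name, not part of Scrpy.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the first child is an h2, so no p is a "first child".
box = ET.fromstring("<div><h2>Specs</h2><p>one</p><p>two</p></div>")

def nth_child(parent, tag, n):
    """Direct children matching `tag:nth-child(n)`; positions are 1-based."""
    return [kid for pos, kid in enumerate(parent, start=1)
            if pos == n and kid.tag == tag]

first = [el.text for el in nth_child(box, "p", 1)]         # p:first-child
second = [el.text for el in nth_child(box, "p", 2)]        # p:nth-child(2)
last = [el.text for el in nth_child(box, "p", len(box))]   # p:last-child

print(first)   # []
print(second)  # ['one']
print(last)    # ['two']
```

Here `p:nth-child(2)` selects the first paragraph, because the h2 occupies position 1.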
Scrpy Selector Extensions
Scrpy extends standard CSS selectors with extraction modifiers that control what is returned from each match:
| Extension | Example | Extracts |
|---|---|---|
| ::text | h1::text | Text content of h1 |
| ::attr(name) | a::attr(href) | href attribute value |
| ::html | .content::html | Inner HTML of element |
| ::all | li::all | Array of all matches |
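A selector with extensions can be read as a plain CSS part followed by one or more modifiers. The Python sketch below splits a selector string along those lines. The function `split_selector` and the shape of its result are assumptions for illustration. How Scrpy parses these strings internally, and what it returns when no extraction modifier is given, is not specified here.

```python
import re

# One trailing modifier from the table: ::text, ::html, ::all, or ::attr(name)
_MODIFIER = re.compile(r"::(text|html|all|attr\(([\w-]+)\))$")

def split_selector(selector: str) -> dict:
    """Split 'css::modifier[::modifier]' into its CSS part and modifiers."""
    result = {"css": selector, "extract": None, "attr": None, "all": False}
    # Peel modifiers off the end until only the CSS part is left.
    while (m := _MODIFIER.search(result["css"])):
        result["css"] = result["css"][: m.start()]
        if m.group(1) == "all":
            result["all"] = True
        elif m.group(2):                       # ::attr(name)
            result["extract"], result["attr"] = "attr", m.group(2)
        else:                                  # ::text or ::html
            result["extract"] = m.group(1)
    return result

print(split_selector(".gallery img::attr(src)::all"))
# {'css': '.gallery img', 'extract': 'attr', 'attr': 'src', 'all': True}
```

Separating the two parts this way also makes it clear which portion is standard CSS and can be checked outside Scrpy.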
Real-World Examples
E-commerce Product
{
  "selectors": {
    "title": "h1.product-name::text",
    "price": "[data-price]::attr(data-price)",
    "originalPrice": ".price-was::text",
    "images": ".gallery img::attr(src)::all",
    "description": ".description p::text",
    "rating": ".star-rating::attr(data-rating)",
    "reviewCount": ".review-count::text"
  }
}
Job Listing
{
  "selectors": {
    "title": "h1.job-title::text",
    "company": ".company-name a::text",
    "location": ".job-location::text",
    "salary": ".salary-range::text",
    "description": ".job-description::html",
    "requirements": ".requirements li::text::all",
    "postedDate": "time::attr(datetime)"
  }
}
News Article
{
  "selectors": {
    "headline": "article h1::text",
    "author": "[rel='author']::text",
    "publishDate": "article time::attr(datetime)",
    "content": "article .body p::text::all",
    "tags": ".tags a::text::all",
    "image": "article figure img::attr(src)"
  }
}
Tips for Better Selectors
Be Specific, Not Fragile
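Prefer semantic class names to long structural chains. .product-price keeps working when the surrounding layout changes, while div > div > span:nth-child(2) breaks as soon as a wrapper element is added or reordered.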
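One rough way to spot brittle selectors in a config is to count the parts that depend on page structure. The Python sketch below is a hypothetical heuristic, not a Scrpy feature: it adds one point per child combinator and one per positional pseudo-class, so a higher score suggests a selector that is more likely to break.

```python
import re

# Positional pseudo-classes tie a selector to the exact order of siblings.
_POSITIONAL = re.compile(r":(?:nth-child|nth-of-type|first-child|last-child)\b")

def fragility_score(selector: str) -> int:
    """Count structure-dependent parts of a selector (illustrative only)."""
    return selector.count(">") + len(_POSITIONAL.findall(selector))

print(fragility_score("div > div > span:nth-child(2)"))  # 3
print(fragility_score(".product-price"))                 # 0
```

The exact weights are arbitrary. The point is only that each structural step is another assumption about the page's layout.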
Use Data Attributes
Sites often store structured data in attributes such as data-price or data-sku. These values are usually more reliable than visible text, which may include currency symbols, separators, or localized formatting.
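As an illustration of why attribute values are easier to work with, the Python sketch below reads a data-price attribute from a hypothetical snippet using the standard `html.parser` module. The markup and the class name `DataAttrCollector` are assumptions. The result corresponds roughly to what a selector such as [data-price]::attr(data-price) is meant to extract.

```python
from html.parser import HTMLParser

class DataAttrCollector(HTMLParser):
    """Collect the value of one attribute from every tag that carries it."""

    def __init__(self, attr_name: str):
        super().__init__()
        self.attr_name = attr_name
        self.values: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == self.attr_name and value is not None:
                self.values.append(value)

# Hypothetical product markup: the visible text is formatted for humans.
html = '<span class="price" data-price="1299.00" data-sku="A-17">$1,299.00</span>'

collector = DataAttrCollector("data-price")
collector.feed(html)
price = float(collector.values[0])  # no currency symbol or separator to strip
print(price)  # 1299.0
```

Parsing the visible text "$1,299.00" instead would require removing the currency symbol and the thousands separator first.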
Test in DevTools First
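Run document.querySelectorAll('your-selector') in the browser console to check a selector before using it with Scrpy. Test only the standard CSS part: Scrpy extensions such as ::text or ::attr(href) are not valid CSS in the browser and must be removed first.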
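When a config contains many selectors, the console checks can be generated rather than typed by hand. The Python sketch below is a minimal example under the assumption that the only extensions in use are ::text, ::html, ::all, and ::attr(name). It strips them and prints a line to paste into the browser console. The helper name `console_snippet` is hypothetical.

```python
import json
import re

# Trailing extraction modifiers, possibly chained (e.g. ::attr(src)::all)
_EXTENSIONS = re.compile(r"(?:::(?:text|html|all|attr\([\w-]+\)))+$")

def console_snippet(scrpy_selector: str) -> str:
    """Return a querySelectorAll call for the plain CSS part of a selector."""
    css = _EXTENSIONS.sub("", scrpy_selector)
    # json.dumps produces a correctly quoted JavaScript string literal.
    return f"document.querySelectorAll({json.dumps(css)})"

selectors = {
    "title": "h1.product-name::text",
    "images": ".gallery img::attr(src)::all",
}
for field, selector in selectors.items():
    print(f"{field}: {console_snippet(selector)}")
# title: document.querySelectorAll("h1.product-name")
# images: document.querySelectorAll(".gallery img")
```

If the console returns an empty NodeList, the CSS part matches nothing on the page and the selector needs adjusting before it goes into the config.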