Understanding Selectors

CSS selectors are the foundation of web scraping. Learn how to precisely target any element on a webpage.

Basic Selectors

| Selector | Example | Selects |
| --- | --- | --- |
| element | h1 | All <h1> elements |
| .class | .product-title | Elements with class="product-title" |
| #id | #main-content | Element with id="main-content" |
| [attribute] | [data-price] | Elements with a data-price attribute |
| [attr=value] | [type="submit"] | Elements where type="submit" |
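To see what each basic selector matches without any third-party library, the sketch below maps the five selectors from the table to roughly equivalent path expressions in Python's standard `xml.etree.ElementTree`. The sample markup and the `EQUIVALENTS` mapping are made up for illustration; this is not how Scrpy evaluates selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup used only for illustration.
PAGE = """
<html>
  <body>
    <div id="main-content">
      <h1 class="product-title">Blue Mug</h1>
      <span data-price="12.50">$12.50</span>
      <button type="submit">Buy</button>
      <button type="button">Cancel</button>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# CSS selector -> rough ElementTree path equivalent.
# Caveat: [@class='x'] compares the whole attribute value, while CSS .x
# matches any one of several space-separated class names.
EQUIVALENTS = {
    "h1": ".//h1",
    ".product-title": ".//*[@class='product-title']",
    "#main-content": ".//*[@id='main-content']",
    "[data-price]": ".//*[@data-price]",
    '[type="submit"]': ".//*[@type='submit']",
}

# Collect the tag names each selector matches in the sample page.
matches = {
    css: [el.tag for el in root.findall(path)]
    for css, path in EQUIVALENTS.items()
}

for css, tags in matches.items():
    print(f"{css:18} -> {tags}")
```

Note that `[type="submit"]` matches only the first button, because the second one has `type="button"`.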

Combinators

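| Combinator | Example | Selects |
| --- | --- | --- |
| Descendant (space) | div p | All p elements anywhere inside a div |
| Child (>) | ul > li | li elements that are direct children of a ul |
| Adjacent (+) | h1 + p | A p that immediately follows an h1 (same parent) |
| Sibling (~) | h1 ~ p | All p siblings that come after an h1 |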
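The difference between the four combinators is easiest to see on a small tree. The following is a minimal sketch using only Python's standard library; the markup and the helper `later_siblings` are hypothetical and exist only to emulate the combinators by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup for illustration only.
DOC = ET.fromstring("""
<div>
  <h1>Title</h1>
  <p>intro</p>
  <ul><li>one</li><li>two<ul><li>nested</li></ul></li></ul>
  <p>outro</p>
  <section><p>deep</p></section>
</div>
""")


def texts(elements):
    """Return the stripped leading text of each element."""
    return [(el.text or "").strip() for el in elements]


def later_siblings(parent, tag):
    """Return the children of parent that come after its first <tag> child."""
    children = list(parent)
    for index, el in enumerate(children):
        if el.tag == tag:
            return children[index + 1:]
    return []


# Descendant (div p): p elements at any depth below the div.
descendant = texts(DOC.findall(".//p"))

# Child (ul > li): only direct li children of the outer ul, not the nested one.
child = texts(DOC.find("ul").findall("li"))

after_h1 = later_siblings(DOC, "h1")

# Adjacent (h1 + p): the element right after h1, only if it is a p.
adjacent = texts(after_h1[:1]) if after_h1 and after_h1[0].tag == "p" else []

# Sibling (h1 ~ p): every later sibling of h1 that is a p.
general = texts(el for el in after_h1 if el.tag == "p")

print(descendant, child, adjacent, general)
```

Here the descendant query also finds the p inside section, while the sibling query does not, because that p has a different parent than the h1.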

Pseudo-Selectors

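| Pseudo | Example | Selects |
| --- | --- | --- |
| :first-child | li:first-child | An li that is the first child of its parent |
| :last-child | li:last-child | An li that is the last child of its parent |
| :nth-child(n) | tr:nth-child(2) | A tr that is the second child of its parent |
| :not() | p:not(.hidden) | p elements without the class hidden |
| :contains() | a:contains("Buy") | Links whose text contains "Buy" (a non-standard extension, not part of the CSS specification) |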
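A common surprise is that li:first-child does not mean "the first li"; it means "an li that happens to be the first child". The sketch below emulates the pseudo-selectors from the table by hand with Python's standard library. The markup and the helper `nth_child` are hypothetical, and the :contains() emulation assumes a plain substring match on the element's text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup for illustration only.
LINKS = ET.fromstring("""
<div>
  <h2>Links</h2>
  <a class="hidden" href="/old">Buy old</a>
  <a href="/buy">Buy now</a>
  <a href="/help">Help</a>
</div>
""")


def nth_child(parent, tag, n):
    """Emulate tag:nth-child(n): the n-th child (1-based), if it has that tag."""
    kids = list(parent)
    if 1 <= n <= len(kids) and kids[n - 1].tag == tag:
        return kids[n - 1]
    return None


# a:first-child matches nothing here: the first child is the h2, not an a.
first_a = nth_child(LINKS, "a", 1)

# a:nth-child(2) and a:last-child.
second = nth_child(LINKS, "a", 2)
last = nth_child(LINKS, "a", len(list(LINKS)))

# a:not(.hidden): skip links whose class list contains "hidden".
visible = [
    a.get("href")
    for a in LINKS.findall("a")
    if "hidden" not in a.get("class", "").split()
]

# a:contains("Buy"): substring match on the link text.
buy_links = [
    a.get("href")
    for a in LINKS.findall("a")
    if "Buy" in "".join(a.itertext())
]

print(first_a, second.get("href"), last.get("href"), visible, buy_links)
```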

Scrpy Selector Extensions

Scrpy extends standard CSS selectors with extraction modifiers. A modifier is appended to the selector and controls what is returned from each match:

| Extension | Example | Extracts |
| --- | --- | --- |
| ::text | h1::text | Text content of h1 |
| ::attr(name) | a::attr(href) | Value of the href attribute |
| ::html | .content::html | Inner HTML of the element |
| ::all | li::all | Array of all matches |
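To make the structure of an extended selector concrete, here is a small sketch that splits a selector string into its plain CSS part and its modifiers. It is only an illustration of the syntax in the table: `split_selector` and `ParsedSelector` are hypothetical names, the "element" default is an assumption, and Scrpy's real parser may behave differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches one trailing modifier: ::text, ::html, ::all, or ::attr(name).
_MODIFIER = re.compile(r"::(text|html|all|attr\(([^)]+)\))$")


@dataclass
class ParsedSelector:
    css: str                   # plain CSS part, usable without Scrpy
    extract: str = "element"   # "text", "html", "attr", or "element" (assumed default)
    attr: Optional[str] = None # attribute name when extract == "attr"
    all_matches: bool = False  # True when ::all was present


def split_selector(selector: str) -> ParsedSelector:
    """Peel modifiers off the end of the selector, one at a time."""
    parsed = ParsedSelector(css=selector.strip())
    while True:
        match = _MODIFIER.search(parsed.css)
        if match is None:
            return parsed
        kind = match.group(1)
        if kind == "all":
            parsed.all_matches = True
        elif kind.startswith("attr("):
            parsed.extract, parsed.attr = "attr", match.group(2)
        else:
            parsed.extract = kind
        # Drop the modifier that was just handled and look for another.
        parsed.css = parsed.css[:match.start()]


print(split_selector(".gallery img::attr(src)::all"))
print(split_selector("h1.product-name::text"))
```

The `css` field is the part a browser or any standard CSS engine understands, which is useful when checking a selector outside Scrpy.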

Real-World Examples

E-commerce Product
{
  "selectors": {
    "title": "h1.product-name::text",
    "price": "[data-price]::attr(data-price)",
    "originalPrice": ".price-was::text",
    "images": ".gallery img::attr(src)::all",
    "description": ".description p::text",
    "rating": ".star-rating::attr(data-rating)",
    "reviewCount": ".review-count::text"
  }
}

Job Listing
{
  "selectors": {
    "title": "h1.job-title::text",
    "company": ".company-name a::text",
    "location": ".job-location::text",
    "salary": ".salary-range::text",
    "description": ".job-description::html",
    "requirements": ".requirements li::text::all",
    "postedDate": "time::attr(datetime)"
  }
}

News Article
{
  "selectors": {
    "headline": "article h1::text",
    "author": "[rel='author']::text",
    "publishDate": "article time::attr(datetime)",
    "content": "article .body p::text::all",
    "tags": ".tags a::text::all",
    "image": "article figure img::attr(src)"
  }
}

Tips for Better Selectors

Be Specific, Not Fragile

Prefer semantic class names over deeply nested, position-based selectors. .product-price is better than div > div > span:nth-child(2), because it keeps matching when the surrounding layout changes.
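The sketch below illustrates the point with two versions of a hypothetical product page: the second adds a badge before the price. The class-based lookup still finds the price, while the positional one silently returns the wrong element. It uses Python's standard `xml.etree.ElementTree` paths as stand-ins for the two CSS selectors; the markup is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup before a redesign.
BEFORE = ET.fromstring("""
<body>
  <div><div>
    <span>Blue Mug</span>
    <span class="product-price">$12.50</span>
  </div></div>
</body>
""")

# Same product after a hypothetical redesign adds a badge before the price.
AFTER = ET.fromstring("""
<body>
  <div><div>
    <span>Blue Mug</span>
    <span class="badge">Sale</span>
    <span class="product-price">$12.50</span>
  </div></div>
</body>
""")

SEMANTIC = ".//span[@class='product-price']"  # stands in for .product-price
POSITIONAL = "./div/div/span[2]"              # stands in for div > div > span:nth-child(2)


def first_text(root, path):
    """Return the text of the first element matching path, or None."""
    el = root.find(path)
    return el.text if el is not None else None


for label, page in (("before", BEFORE), ("after", AFTER)):
    print(label, first_text(page, SEMANTIC), first_text(page, POSITIONAL))
```

The positional selector does not fail loudly after the redesign; it returns the badge text, which is harder to notice than an empty result.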

Use Data Attributes

Sites often store structured data in attributes like data-price or data-sku. These are usually more reliable than text content, which can change with formatting, currency symbols, or marketing copy.
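As a rough illustration, the sketch below reads the same price twice from a hypothetical element: once from its data-price attribute and once by cleaning up the visible text. The markup, the attribute values, and the regular expression are assumptions chosen for the example.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical element carrying both a data attribute and display text.
PRODUCT = ET.fromstring(
    '<span class="price" data-price="1299.00" data-sku="MUG-001">'
    "Now only $1,299.00!</span>"
)

# From the attribute: already a clean, machine-readable value.
price_from_attr = Decimal(PRODUCT.get("data-price"))

# From the text: needs cleanup that depends on the exact wording and
# number format, and breaks if either changes.
match = re.search(r"[\d,]+\.\d{2}", PRODUCT.text)
price_from_text = Decimal(match.group().replace(",", ""))

print(price_from_attr, price_from_text, PRODUCT.get("data-sku"))
```

Both routes give the same number here, but only the attribute route survives a change such as a different currency symbol or thousands separator in the display text.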

Test in DevTools First

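Use document.querySelectorAll('your-selector') in the browser console to test selectors before using them with Scrpy. Test only the plain CSS part: the browser rejects Scrpy modifiers such as ::text and ::attr(), as well as the non-standard :contains().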

Next Steps