Data Transformations

Clean, format, and transform scraped data automatically. No post-processing required.

Built-in Transformations

Append transformations to a selector with the pipe (|) syntax:

javascript
{
  "url": "https://store.com/product",
  "selectors": {
    "price": ".price | parseNumber",
    "title": "h1 | trim | uppercase",
    "date": ".posted-date | parseDate",
    "description": ".desc | trim | truncate:200"
  }
}
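
To illustrate how a selector string breaks down, here is a minimal sketch that splits it into the CSS selector and its transformation steps. `parseSelector` is a hypothetical helper written for this example, not a documented function, and the splitting rules (pipes between steps, colons between a name and its arguments) are inferred from the examples on this page.

```javascript
// Hypothetical helper, not part of the documented API: splits
// "selector | transform:arg | ..." into a selector and its transform steps.
function parseSelector(spec) {
  const [selector, ...steps] = spec.split("|").map((part) => part.trim());
  const transforms = steps.map((step) => {
    // The text before the first colon is the name; the rest are arguments.
    const [name, ...args] = step.split(":");
    return { name, args };
  });
  return { selector, transforms };
}

const parsed = parseSelector(".desc | trim | truncate:200");
console.log(parsed.selector); // .desc
console.log(JSON.stringify(parsed.transforms));
// [{"name":"trim","args":[]},{"name":"truncate","args":["200"]}]
```

This naive split would mis-handle arguments that themselves contain `|` or `:` (for example, some regex patterns); how the service treats those cases is not stated here.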

Available Transformations

String Transformations

Transform          Input              Output
trim               " hello "          "hello"
uppercase          "Hello"            "HELLO"
lowercase          "Hello"            "hello"
truncate:N         "Long text..."     "Long..."
replace:old:new    "foo bar"          "baz bar" (with replace:foo:baz)
stripHtml          "<b>Hi</b>"        "Hi"
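
As a rough guide to the behavior in the table, the string transformations can be approximated in plain JavaScript. This is a sketch under assumptions, not the service's implementation: in particular, it assumes `truncate` appends "..." when text is cut, that `replace` substitutes every occurrence, and that `stripHtml` only needs to drop simple tags.

```javascript
// Illustrative approximations only; the real transformations may differ.
const stringTransforms = {
  trim: (text) => text.trim(),
  uppercase: (text) => text.toUpperCase(),
  lowercase: (text) => text.toLowerCase(),
  // Assumption: keep at most n characters and append "..." if text was cut.
  truncate: (text, n) =>
    text.length > Number(n) ? text.slice(0, Number(n)) + "..." : text,
  // Assumption: every occurrence of oldText is replaced.
  replace: (text, oldText, newText) => text.split(oldText).join(newText),
  // Naive tag removal; real HTML should go through a proper parser.
  stripHtml: (text) => text.replace(/<[^>]*>/g, ""),
};

console.log(stringTransforms.trim(" hello "));                  // hello
console.log(stringTransforms.replace("foo bar", "foo", "baz")); // baz bar
console.log(stringTransforms.stripHtml("<b>Hi</b>"));           // Hi
```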

Number Transformations

Transform      Input          Output
parseNumber    "$1,299.99"    1299.99
parseInt       "42 items"     42
round:N        3.14159        3.14
abs            -42            42
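
The number transformations can be sketched the same way. The versions below are assumptions that reproduce the table rows; they treat commas as thousands separators, so locale-specific formats such as "1.299,99" would need different handling.

```javascript
// Illustrative approximations only; the real transformations may differ.
const numberTransforms = {
  // Assumption: drop everything except digits, minus signs, and decimal points.
  parseNumber: (text) => Number.parseFloat(String(text).replace(/[^0-9.-]/g, "")),
  // Reads leading digits and ignores trailing text such as " items".
  parseInt: (text) => Number.parseInt(String(text), 10),
  // Rounds to n decimal places by scaling, rounding, and scaling back.
  round: (value, n) => {
    const factor = 10 ** Number(n);
    return Math.round(value * factor) / factor;
  },
  abs: (value) => Math.abs(value),
};

console.log(numberTransforms.parseNumber("$1,299.99")); // 1299.99
console.log(numberTransforms.parseInt("42 items"));     // 42
console.log(numberTransforms.round(3.14159, 2));        // 3.14
console.log(numberTransforms.abs(-42));                 // 42
```

Scaling by a power of ten is subject to binary floating-point error for some inputs (1.005 rounds to 1 rather than 1.01), so exact currency work usually calls for integer cents or a decimal library.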

Date Transformations

Transform                Input               Output
parseDate                "Jan 15, 2024"      "2024-01-15T00:00:00Z"
formatDate:YYYY-MM-DD    "2024-01-15T..."    "2024-01-15"
relativeDate             "2 days ago"        "2024-01-13T..."
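
The date rows can be reproduced with a short sketch. Everything here is an assumption made for illustration: dates are interpreted as UTC, `parseDate` accepts only the "Jan 15, 2024" shape, `formatDate` supports only the YYYY, MM, and DD tokens, and `relativeDate` handles only "N days ago" counted back from a supplied reference time.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Drop milliseconds so the result has the "2024-01-15T00:00:00Z" shape.
const toIso = (date) => date.toISOString().replace(/\.\d{3}Z$/, "Z");

// Assumption: input looks like "Jan 15, 2024" and means UTC midnight.
function parseDate(text) {
  const match = /^([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2}),\s*(\d{4})$/.exec(text.trim());
  if (!match) return null;
  const month = MONTHS.indexOf(match[1].toLowerCase());
  if (month === -1) return null;
  return toIso(new Date(Date.UTC(Number(match[3]), month, Number(match[2]))));
}

// Assumption: only the YYYY, MM, and DD tokens are supported.
function formatDate(iso, pattern) {
  const date = new Date(iso);
  const pad = (n) => String(n).padStart(2, "0");
  return pattern
    .replace("YYYY", String(date.getUTCFullYear()))
    .replace("MM", pad(date.getUTCMonth() + 1))
    .replace("DD", pad(date.getUTCDate()));
}

// Assumption: "<n> days ago" is counted back from `now`.
function relativeDate(text, now = new Date()) {
  const match = /^(\d+)\s+days?\s+ago$/i.exec(text.trim());
  if (!match) return null;
  return toIso(new Date(now.getTime() - Number(match[1]) * 24 * 60 * 60 * 1000));
}

const reference = new Date("2024-01-15T00:00:00Z");
console.log(parseDate("Jan 15, 2024"));                        // 2024-01-15T00:00:00Z
console.log(formatDate("2024-01-15T00:00:00Z", "YYYY-MM-DD")); // 2024-01-15
console.log(relativeDate("2 days ago", reference));            // 2024-01-13T00:00:00Z
```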

Chaining Transformations

Chain multiple transformations with pipes. They run left to right, each receiving the output of the previous one:

javascript
{
  "selectors": {
    // Extract price, parse to number, round to 2 decimals
    "price": ".price | parseNumber | round:2",
    
    // Clean up title: trim whitespace, limit length
    "title": "h1 | trim | truncate:100",
    
    // Parse date and format consistently
    "date": ".date | parseDate | formatDate:YYYY-MM-DD"
  }
}
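
Conceptually, a chain is a left-to-right fold over the extracted text. The sketch below shows that idea with a small, hypothetical registry of transformations; `applyChain` and the registry are assumptions made for this example, not the service's internals.

```javascript
// Hypothetical registry holding a few illustrative transformations.
const transforms = {
  trim: (value) => value.trim(),
  parseNumber: (value) => Number.parseFloat(value.replace(/[^0-9.-]/g, "")),
  round: (value, n) => {
    const factor = 10 ** Number(n);
    return Math.round(value * factor) / factor;
  },
};

// Apply each step in order, feeding every result into the next step.
function applyChain(value, chain) {
  return chain
    .split("|")
    .map((step) => step.trim())
    .filter(Boolean)
    .reduce((current, step) => {
      const [name, ...args] = step.split(":");
      const fn = transforms[name];
      if (!fn) throw new Error(`Unknown transform: ${name}`);
      return fn(current, ...args);
    }, value);
}

console.log(applyChain(" $1,299.994 ", "trim | parseNumber | round:2")); // 1299.99
```

Order matters: `round:2` works here only because `parseNumber` has already turned the text into a number.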

Regex Extraction

Extract specific patterns with regex:

javascript
{
  "selectors": {
    // Extract SKU from text like "Product SKU: ABC-123"
    // (backslashes are doubled because the pattern sits inside a string literal)
    "sku": ".product-info | regex:SKU:\\s*(\\w+-\\d+)",
    
    // Extract all numbers
    "numbers": ".stats | regexAll:\\d+",
    
    // Extract email addresses
    "email": ".contact | regex:[\\w.-]+@[\\w.-]+\\.\\w+"
  }
}
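
To try these patterns locally before putting them in a request, something like the following may help. It is a sketch under assumptions: it supposes `regex` returns the first capture group when the pattern has one (otherwise the whole match) and that `regexAll` returns every full match. The helper names and sample strings are made up for illustration.

```javascript
// Assumption: return the first capture group if present, else the whole match.
function regexExtract(text, pattern) {
  const match = new RegExp(pattern).exec(text);
  if (!match) return null;
  return match[1] !== undefined ? match[1] : match[0];
}

// Assumption: return every full match as an array of strings.
function regexAll(text, pattern) {
  return Array.from(text.matchAll(new RegExp(pattern, "g")), (m) => m[0]);
}

// Inside a string literal each backslash must be doubled,
// so the regex \d+ is written as "\\d+".
console.log(regexExtract("Product SKU: ABC-123", "SKU:\\s*(\\w+-\\d+)")); // ABC-123
console.log(regexAll("12 views, 3 likes", "\\d+").join(","));              // 12,3
console.log(regexExtract("Write to sales@store.com today",
                         "[\\w.-]+@[\\w.-]+\\.\\w+"));                     // sales@store.com
```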

Conditional Transforms

Supply a fallback for empty values, convert text to a boolean, or map values to replacements:

javascript
{
  "selectors": {
    // Default value if empty
    "stock": ".stock | default:Out of Stock",
    
    // Boolean conversion
    "inStock": ".stock | contains:In Stock",
    
    // Conditional mapping
    "status": ".status | map:active:true,inactive:false"
  }
}
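
The three conditional transformations can be approximated as below. This is a sketch under assumptions: `default` is taken to apply when the extracted text is missing or blank, `contains` to do a case-sensitive substring check, and `map` to return the mapped value as a string while passing unmatched input through. The function names are hypothetical.

```javascript
// Assumption: the fallback is used for null, undefined, or blank text.
function withDefault(value, fallback) {
  return value == null || String(value).trim() === "" ? fallback : value;
}

// Assumption: case-sensitive substring check that yields a boolean.
function containsText(value, needle) {
  return String(value ?? "").includes(needle);
}

// Assumption: spec is "from:to,from:to"; unmatched values pass through,
// and mapped values are returned as strings.
function mapValue(value, spec) {
  const table = new Map(
    spec.split(",").map((pair) => {
      const [from, to] = pair.split(":");
      return [from.trim(), to.trim()];
    })
  );
  const key = String(value).trim();
  return table.has(key) ? table.get(key) : value;
}

console.log(withDefault("", "Out of Stock"));                   // Out of Stock
console.log(containsText("In Stock (3 left)", "In Stock"));     // true
console.log(mapValue("active", "active:true,inactive:false"));  // true
```

Whether the service returns `map` results as strings or converts "true" and "false" to booleans is not stated on this page, so check an actual response before relying on either.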
