Data Transformations
Clean, format, and transform scraped data automatically. No post-processing required.
Built-in Transformations
Apply transformations using the pipe syntax in selectors:
```javascript
{
  "url": "https://store.com/product",
  "selectors": {
    "price": ".price | parseNumber",
    "title": "h1 | trim | uppercase",
    "date": ".posted-date | parseDate",
    "description": ".desc | trim | truncate:200"
  }
}
```
Available Transformations
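Every transformation listed in this section is one step in a pipe expression. As a rough mental model, an expression can be read as a selector followed by named steps, each with an optional argument after the colon. The sketch below illustrates that reading; `parseSelectorExpression` is a hypothetical helper, not part of the API, and it makes the simplifying assumption that neither the selector nor any argument contains a `|` character.

```javascript
// Hypothetical helper, shown only to illustrate how a pipe expression reads.
// Assumption: "|" never appears inside the selector or inside an argument.
function parseSelectorExpression(expression) {
  const [selector, ...rawSteps] = expression
    .split("|")
    .map((part) => part.trim());

  const steps = rawSteps.map((step) => {
    // Split on the first colon only, so "replace:foo:baz" keeps
    // "foo:baz" as its raw argument string.
    const colon = step.indexOf(":");
    return colon === -1
      ? { name: step, arg: null }
      : { name: step.slice(0, colon), arg: step.slice(colon + 1) };
  });

  return { selector, steps };
}

const parsed = parseSelectorExpression(".desc | trim | truncate:200");
console.log(parsed.selector); // prints: .desc
console.log(parsed.steps.map((step) => step.name)); // prints: [ 'trim', 'truncate' ]
```

Steps run left to right, so the output of `trim` becomes the input of `truncate:200`.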
String Transformations
| Transform | Input | Output |
|---|---|---|
| trim | " hello " | "hello" |
| uppercase | "Hello" | "HELLO" |
| lowercase | "Hello" | "hello" |
| truncate:N | "Long text..." | "Long..." |
| replace:foo:baz | "foo bar" | "baz bar" |
| stripHtml | "<b>Hi</b>" | "Hi" |
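To preview what these steps do before sending a request, the rows above can be approximated locally. This is a minimal sketch, not the service's implementation: the `stringTransforms` object is a hypothetical name, `truncate` is assumed to append `...` when it cuts text, and `replace` is assumed to replace every occurrence.

```javascript
// Local approximations of the string transformations in the table above.
const stringTransforms = {
  trim: (s) => s.trim(),
  uppercase: (s) => s.toUpperCase(),
  lowercase: (s) => s.toLowerCase(),
  // Assumption: keep the first n characters and append "..." if text was cut.
  truncate: (s, n) => (s.length > n ? s.slice(0, n) + "..." : s),
  // Assumption: every literal occurrence of `from` is replaced.
  replace: (s, from, to) => s.split(from).join(to),
  // Naive tag stripper for illustration; not a real HTML parser.
  stripHtml: (s) => s.replace(/<[^>]*>/g, ""),
};

console.log(stringTransforms.trim("  hello  ")); // prints: hello
console.log(stringTransforms.replace("foo bar", "foo", "baz")); // prints: baz bar
console.log(stringTransforms.stripHtml("<b>Hi</b>")); // prints: Hi
```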
Number Transformations
| Transform | Input | Output |
|---|---|---|
| parseNumber | "$1,299.99" | 1299.99 |
| parseInt | "42 items" | 42 |
| round:N | 3.14159 | 3.14 |
| abs | -42 | 42 |
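The number rows can be approximated the same way. The sketch below assumes that `.` is the decimal separator and `,` only groups thousands, which matches the `"$1,299.99"` example but may not hold for other locales; `numberTransforms` is a hypothetical name used for illustration.

```javascript
// Local approximations of the number transformations in the table above.
const numberTransforms = {
  // Drop everything except digits, ".", and "-", then parse as a float.
  // Assumption: "." is the decimal separator, "," is a thousands separator.
  parseNumber: (s) => Number.parseFloat(s.replace(/[^0-9.-]/g, "")),
  // Take the first run of digits (with an optional sign) found in the text.
  parseInt: (s) => {
    const match = s.match(/-?\d+/);
    return match ? Number.parseInt(match[0], 10) : NaN;
  },
  // Round to n decimal places.
  round: (x, n) => {
    const factor = 10 ** n;
    return Math.round(x * factor) / factor;
  },
  abs: (x) => Math.abs(x),
};

console.log(numberTransforms.parseNumber("$1,299.99")); // prints: 1299.99
console.log(numberTransforms.parseInt("42 items")); // prints: 42
console.log(numberTransforms.round(3.14159, 2)); // prints: 3.14
```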
Date Transformations
| Transform | Input | Output |
|---|---|---|
| parseDate | "Jan 15, 2024" | "2024-01-15T00:00:00Z" |
| formatDate:YYYY-MM-DD | "2024-01-15T..." | "2024-01-15" |
| relativeDate | "2 days ago" | "2024-01-13T..." |
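The date rows imply that results are normalized to UTC and that `relativeDate` is resolved against the time of the scrape. The sketch below reproduces the three examples under those assumptions. It only handles input shaped like `"Jan 15, 2024"` and `"2 days ago"`, only the `YYYY-MM-DD` pattern, and all function names are hypothetical; the service itself may accept many more formats.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];
const UNIT_MS = { minute: 60e3, hour: 3600e3, day: 86400e3, week: 604800e3 };

// ISO 8601 in UTC without milliseconds, matching the table's output style.
const toIso = (date) => date.toISOString().replace(/\.\d{3}Z$/, "Z");

// Parses "Jan 15, 2024" style dates; returns null for anything else.
function parseDate(text) {
  const match = text.trim().match(/^([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2}),\s*(\d{4})$/);
  if (!match) return null;
  const month = MONTHS.indexOf(match[1].toLowerCase());
  if (month === -1) return null;
  return toIso(new Date(Date.UTC(Number(match[3]), month, Number(match[2]))));
}

// Only the YYYY-MM-DD pattern is sketched: the first 10 characters of an ISO string.
function formatDateYmd(isoString) {
  return isoString.slice(0, 10);
}

// Resolves "N units ago" against `now` (assumed to be the time of the scrape).
function relativeDate(text, now = new Date()) {
  const match = text.trim().match(/^(\d+)\s+(minute|hour|day|week)s?\s+ago$/i);
  if (!match) return null;
  const offset = Number(match[1]) * UNIT_MS[match[2].toLowerCase()];
  return toIso(new Date(now.getTime() - offset));
}

const scrapedAt = new Date("2024-01-15T00:00:00Z");
console.log(parseDate("Jan 15, 2024")); // prints: 2024-01-15T00:00:00Z
console.log(formatDateYmd("2024-01-15T00:00:00Z")); // prints: 2024-01-15
console.log(relativeDate("2 days ago", scrapedAt)); // prints: 2024-01-13T00:00:00Z
```

Because `relativeDate` depends on when the page is scraped, the same input yields a different timestamp on each run.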
Chaining Transformations
Chain multiple transformations with pipes:
```javascript
{
  "selectors": {
    // Extract price, parse to number, round to 2 decimals
    "price": ".price | parseNumber | round:2",
    // Clean up title: trim whitespace, limit length
    "title": "h1 | trim | truncate:100",
    // Parse date and format consistently
    "date": ".date | parseDate | formatDate:YYYY-MM-DD"
  }
}
```
Regex Extraction
Extract specific patterns with regex:
```javascript
{
  "selectors": {
    // Backslashes are doubled because each pattern sits inside a string literal
    // Extract SKU from text like "Product SKU: ABC-123"
    "sku": ".product-info | regex:SKU:\\s*(\\w+-\\d+)",
    // Extract all numbers
    "numbers": ".stats | regexAll:\\d+",
    // Extract email addresses
    "email": ".contact | regex:[\\w.-]+@[\\w.-]+\\.\\w+"
  }
}
```
Conditional Transforms
```javascript
{
  "selectors": {
    // Default value if empty
    "stock": ".stock | default:Out of Stock",
    // Boolean conversion
    "inStock": ".stock | contains:In Stock",
    // Conditional mapping
    "status": ".status | map:active:true,inactive:false"
  }
}
```
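The three conditional steps above can be read as follows. This is a sketch of plausible semantics, not the service's implementation. `conditionalTransforms` is a hypothetical name, mapped values are left as strings, and input that matches no key in `map` is assumed to pass through unchanged.

```javascript
// Local approximations of default, contains, and map.
const conditionalTransforms = {
  // Use the fallback when the extracted text is missing or blank.
  default: (value, fallback) =>
    value == null || value.trim() === "" ? fallback : value,
  // True when the extracted text includes the given substring.
  contains: (value, needle) => (value ?? "").includes(needle),
  // "active:true,inactive:false" becomes a lookup table of key -> value.
  // Assumption: unmatched input is returned unchanged.
  map: (value, spec) => {
    const table = new Map(
      spec.split(",").map((pair) => {
        const colon = pair.indexOf(":");
        return [pair.slice(0, colon), pair.slice(colon + 1)];
      })
    );
    return table.has(value) ? table.get(value) : value;
  },
};

console.log(conditionalTransforms.default("", "Out of Stock")); // prints: Out of Stock
console.log(conditionalTransforms.contains("In Stock (3 left)", "In Stock")); // prints: true
console.log(conditionalTransforms.map("active", "active:true,inactive:false")); // prints: true
```

Note that the last call prints the string `"true"`; whether the service converts mapped values to real booleans is not stated in this section.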