Creating Custom Extractors
This guide explains how to extend SpotifyScraper with custom extractors for specialized data extraction needs.
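Understanding the Extractor Architecture
SpotifyScraper uses a modular architecture: the client delegates fetching, parsing, and data transformation to separate components, so a custom extractor only has to replace the final transformation step: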
SpotifyClient
├── Browser (handles HTTP requests)
├── Parser (extracts JSON data)
└── Extractors (transform raw data)
    ├── TrackExtractor
    ├── AlbumExtractor
    ├── ArtistExtractor
    └── PlaylistExtractor
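The layering above can be read as a three-stage pipeline: fetch, parse, transform. The following is a minimal sketch of that flow only. `StubBrowser`, `StubParser`, `TitleExtractor`, and the `get_page_content` method are hypothetical stand-ins, not SpotifyScraper's actual classes, and the sketch assumes the page embeds its data in a `<script type="application/json">` tag, which may not match the real page markup.

```python
import json
import re
from typing import Any, Dict


class StubBrowser:
    """Hypothetical stand-in for the Browser layer: returns canned HTML."""

    def get_page_content(self, url: str) -> str:
        payload = {"name": "Example Track", "type": "track"}
        return (
            "<html><body>"
            f'<script type="application/json">{json.dumps(payload)}</script>'
            "</body></html>"
        )


class StubParser:
    """Hypothetical stand-in for the Parser layer: pulls embedded JSON out of HTML."""

    _SCRIPT_RE = re.compile(
        r'<script type="application/json">(.*?)</script>', re.DOTALL
    )

    def extract_json(self, html: str) -> Dict[str, Any]:
        match = self._SCRIPT_RE.search(html)
        if match is None:
            raise ValueError("no embedded JSON found")
        return json.loads(match.group(1))


class TitleExtractor:
    """Hypothetical Extractor layer: reshapes raw JSON into a result dict."""

    def __init__(self, parser: StubParser) -> None:
        self.parser = parser

    def extract(self, html: str, url: str) -> Dict[str, Any]:
        raw = self.parser.extract_json(html)
        return {"title": raw["name"], "kind": raw["type"], "url": url}


def scrape(url: str) -> Dict[str, Any]:
    """Wire the three layers together in the order shown in the diagram."""
    html = StubBrowser().get_page_content(url)
    return TitleExtractor(StubParser()).extract(html, url)
```

Because each layer only depends on the output of the previous one, an extractor can be exercised in isolation by handing it a stub parser, without any network access.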
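Base Extractor Pattern
All extractors follow a common pattern: `extract` obtains the page's embedded JSON through the parser, then hands it to `_transform_data`, which each subclass implements: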
from typing import Dict, Any, Optional

from spotify_scraper.extractors.base import BaseExtractor


class CustomExtractor(BaseExtractor):
    """Extract custom data from Spotify pages"""

    def __init__(self, parser):
        self.parser = parser

    def extract(self, html: str, url: str) -> Dict[str, Any]:
        """Extract data from HTML content"""
        # Parse JSON data from HTML
        json_data = self.parser.extract_json(html)
        # Transform data to desired format
        result = self._transform_data(json_data, url)
        return result

    def _transform_data(self, data: Dict[str, Any], url: str) -> Dict[str, Any]:
        """Transform raw data to desired format"""
        raise NotImplementedError
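To make the pattern concrete, here is a minimal sketch of a subclass that fills in `_transform_data`. So that it runs without the library installed, it defines a local `BaseExtractor` that mirrors the pattern above. `DictParser`, `TrackSummaryExtractor`, and the payload field names (`name`, `duration_ms`, `artists`) are assumptions made for illustration, not a documented schema.

```python
from typing import Any, Dict, List


class BaseExtractor:
    """Local stand-in mirroring the base pattern shown above (assumed interface)."""

    def __init__(self, parser: Any) -> None:
        self.parser = parser

    def extract(self, html: str, url: str) -> Dict[str, Any]:
        json_data = self.parser.extract_json(html)
        return self._transform_data(json_data, url)

    def _transform_data(self, data: Dict[str, Any], url: str) -> Dict[str, Any]:
        raise NotImplementedError


class DictParser:
    """Hypothetical parser stub that returns a fixed payload, ignoring the HTML."""

    def __init__(self, payload: Dict[str, Any]) -> None:
        self._payload = payload

    def extract_json(self, html: str) -> Dict[str, Any]:
        return self._payload


class TrackSummaryExtractor(BaseExtractor):
    """Hypothetical extractor that keeps only a few track fields."""

    def _transform_data(self, data: Dict[str, Any], url: str) -> Dict[str, Any]:
        # Field names here are assumptions about the raw payload.
        artists: List[str] = [a.get("name", "") for a in data.get("artists", [])]
        return {
            "url": url,
            "title": data.get("name"),
            # Convert milliseconds to whole seconds (floor division).
            "duration_seconds": data.get("duration_ms", 0) // 1000,
            "artists": artists,
        }


payload = {
    "name": "Example Track",
    "duration_ms": 215500,
    "artists": [{"name": "Artist A"}, {"name": "Artist B"}],
}
extractor = TrackSummaryExtractor(DictParser(payload))
summary = extractor.extract("<html></html>", "https://open.spotify.com/track/example")
```

Using `dict.get` with defaults keeps the transform tolerant of missing keys, which is a reasonable default when the shape of the scraped payload is not guaranteed.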