Extractors Module¶
The extractors module contains specialized classes for extracting data from different Spotify entities.
Overview¶
Extractors transform the raw JSON data embedded in Spotify pages into structured, consistent dictionaries.
```python
from spotify_scraper.extractors import (
    TrackExtractor,
    AlbumExtractor,
    ArtistExtractor,
    PlaylistExtractor,
)
```
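With one extractor per entity type, calling code usually has to decide which class applies to a given page. A minimal sketch of that decision, assuming the usual `open.spotify.com/<type>/<id>` URL layout; `entity_type_from_url` and `EXTRACTOR_NAMES` are hypothetical helpers, not part of spotify_scraper:

```python
from urllib.parse import urlparse

# Hypothetical mapping from the URL path segment to the extractor class name
EXTRACTOR_NAMES = {
    "track": "TrackExtractor",
    "album": "AlbumExtractor",
    "artist": "ArtistExtractor",
    "playlist": "PlaylistExtractor",
}


def entity_type_from_url(url: str) -> str:
    """Return 'track', 'album', 'artist' or 'playlist' for a Spotify URL."""
    # Scan all path segments so a leading prefix segment does not break detection
    segments = [s for s in urlparse(url).path.split("/") if s]
    for segment in segments:
        if segment in EXTRACTOR_NAMES:
            return segment
    raise ValueError(f"Unsupported Spotify URL: {url}")


url = "https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh"
print(EXTRACTOR_NAMES[entity_type_from_url(url)])  # TrackExtractor
```

The sketch returns only the class name, so it runs without the library installed; real code would map to the imported classes instead.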
TrackExtractor¶
Extracts track information from Spotify HTML pages.
Methods¶
extract¶
extract(html: str, url: str) -> Dict[str, Any]
Extract track data from HTML content.
Parameters:
- html (str): Raw HTML content from Spotify
- url (str): URL of the track page
Returns:
- Dict containing track metadata
Example:
```python
from spotify_scraper.extractors import TrackExtractor
from spotify_scraper.parsers import JSONParser

parser = JSONParser()
extractor = TrackExtractor(parser)

# html_content and track_url are assumed to be defined already:
# the raw HTML of a track page and the URL it was fetched from
track_data = extractor.extract(html_content, track_url)
```
Data Structure¶
```python
{
    "id": "4iV5W9uYEdYUVa79Axb7Rh",
    "name": "Hotel California",
    "duration_ms": 391376,
    "explicit": False,
    "artists": [
        {
            "id": "0OdUWJ0sBjDrqHygGUXeCF",
            "name": "Eagles",
            "type": "artist"
        }
    ],
    "album": {
        "id": "4aawyAB9vmqN3uQ7FjRGTy",
        "name": "Hotel California",
        "images": [
            {
                "url": "https://i.scdn.co/image/...",
                "width": 640,
                "height": 640
            }
        ]
    },
    "track_number": 1,
    "disc_number": 1,
    "popularity": 89,
    "preview_url": "https://p.scdn.co/mp3-preview/...",
    "external_urls": {
        "spotify": "https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh"
    },
    "uri": "spotify:track:4iV5W9uYEdYUVa79Axb7Rh"
}
```
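The returned dictionary can be consumed with ordinary dict access. A small sketch of formatting a track for display, using a trimmed copy of the documented fields; `format_duration` and `describe_track` are illustrative helpers, not part of the library:

```python
from typing import Any, Dict


def format_duration(duration_ms: int) -> str:
    """Convert milliseconds to an m:ss string, dropping fractional seconds."""
    minutes, seconds = divmod(duration_ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


def describe_track(track: Dict[str, Any]) -> str:
    """Build a one-line summary from the name, artists and duration fields."""
    artists = ", ".join(artist["name"] for artist in track["artists"])
    return f'{track["name"]} by {artists} ({format_duration(track["duration_ms"])})'


# Trimmed copy of the track structure, keeping only the fields used here
track = {
    "name": "Hotel California",
    "duration_ms": 391376,
    "artists": [
        {"id": "0OdUWJ0sBjDrqHygGUXeCF", "name": "Eagles", "type": "artist"}
    ],
}
print(describe_track(track))  # Hotel California by Eagles (6:31)
```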
AlbumExtractor¶
Extracts album information including track listings.
Methods¶
extract¶
extract(html: str, url: str) -> Dict[str, Any]
Extract album data from HTML content.
Parameters:
- html (str): Raw HTML content from Spotify
- url (str): URL of the album page
Returns:
- Dict containing album metadata and tracks
Data Structure¶
```python
{
    "id": "4aawyAB9vmqN3uQ7FjRGTy",
    "name": "Hotel California",
    "album_type": "album",
    "release_date": "1976-12-08",
    "total_tracks": 9,
    "artists": [
        {
            "id": "0OdUWJ0sBjDrqHygGUXeCF",
            "name": "Eagles"
        }
    ],
    "tracks": [
        {
            "id": "4iV5W9uYEdYUVa79Axb7Rh",
            "name": "Hotel California",
            "duration_ms": 391376,
            "track_number": 1
        }
        # ... more tracks
    ],
    "images": [...],
    "copyrights": [...],
    "label": "Asylum Records"
}
```
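Each entry in `tracks` carries its own `duration_ms`, so derived values can be computed directly from the result. The sketch below sums the running time of the listed tracks and compares the listing against `total_tracks`. Both helper names are hypothetical, and whether the extractor can ever return a partial listing is an assumption; the check is purely defensive.

```python
from typing import Any, Dict


def album_runtime_ms(album: Dict[str, Any]) -> int:
    """Sum duration_ms over the tracks actually present in the result."""
    return sum(track["duration_ms"] for track in album["tracks"])


def missing_track_count(album: Dict[str, Any]) -> int:
    """Number of tracks promised by total_tracks but absent from the listing."""
    return max(album["total_tracks"] - len(album["tracks"]), 0)


# Trimmed copy of the album structure with only the one track shown in full
album = {
    "name": "Hotel California",
    "total_tracks": 9,
    "tracks": [
        {
            "id": "4iV5W9uYEdYUVa79Axb7Rh",
            "name": "Hotel California",
            "duration_ms": 391376,
            "track_number": 1,
        }
    ],
}
print(album_runtime_ms(album))     # 391376
print(missing_track_count(album))  # 8
```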
ArtistExtractor¶
Extracts artist profile information.
Methods¶
extract¶
extract(html: str, url: str) -> Dict[str, Any]
Extract artist data from HTML content.
Parameters:
- html (str): Raw HTML content from Spotify
- url (str): URL of the artist page
Returns:
- Dict containing artist metadata
Data Structure¶
```python
{
    "id": "0OdUWJ0sBjDrqHygGUXeCF",
    "name": "Eagles",
    "genres": ["classic rock", "rock", "soft rock"],
    "popularity": 83,
    "followers": {
        "total": 28543211
    },
    "images": [...],
    "type": "artist",
    "uri": "spotify:artist:0OdUWJ0sBjDrqHygGUXeCF"
}
```
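The `uri` field follows the `spotify:<type>:<id>` pattern seen in the track and artist examples, so it can be split to cross-check a result against its `id` and `type` fields. A sketch under that assumption; `parse_spotify_uri` is a hypothetical helper, not a library function:

```python
from typing import Tuple


def parse_spotify_uri(uri: str) -> Tuple[str, str]:
    """Split 'spotify:<type>:<id>' into (type, id), rejecting other shapes."""
    parts = uri.split(":")
    if len(parts) != 3 or parts[0] != "spotify":
        raise ValueError(f"Not a Spotify URI: {uri}")
    return parts[1], parts[2]


# Trimmed copy of the artist structure
artist = {
    "id": "0OdUWJ0sBjDrqHygGUXeCF",
    "followers": {"total": 28543211},
    "type": "artist",
    "uri": "spotify:artist:0OdUWJ0sBjDrqHygGUXeCF",
}

entity_type, entity_id = parse_spotify_uri(artist["uri"])
print(entity_type == artist["type"] and entity_id == artist["id"])  # True
print(f'{artist["followers"]["total"]:,} followers')  # 28,543,211 followers
```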
PlaylistExtractor¶
Extracts playlist information including all tracks.
Methods¶
extract¶
extract(html: str, url: str) -> Dict[str, Any]
Extract playlist data from HTML content.
Parameters:
- html (str): Raw HTML content from Spotify
- url (str): URL of the playlist page
Returns:
- Dict containing playlist metadata and tracks
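All four extractors document the same `extract(html, url)` signature, so calling code can be written against that shape and exercised with a stand-in rather than live pages. A minimal sketch: the `Extractor` protocol, `run_extractor`, and `FakePlaylistExtractor` are all hypothetical, and the fake's return value is made up for testing and does not reflect the real playlist structure.

```python
from typing import Any, Dict, Protocol


class Extractor(Protocol):
    """Structural type matching the documented extract(html, url) signature."""

    def extract(self, html: str, url: str) -> Dict[str, Any]: ...


class FakePlaylistExtractor:
    """Test stand-in; its result keys are invented, not the library's."""

    def extract(self, html: str, url: str) -> Dict[str, Any]:
        return {"url": url, "html_length": len(html)}


def run_extractor(extractor: Extractor, html: str, url: str) -> Dict[str, Any]:
    """Call any extractor-shaped object and verify that it returned a dict."""
    data = extractor.extract(html, url)
    if not isinstance(data, dict):
        raise TypeError(f"Expected dict, got {type(data).__name__}")
    return data


result = run_extractor(
    FakePlaylistExtractor(),
    "<html></html>",
    "https://open.spotify.com/playlist/<playlist_id>",  # placeholder URL
)
print(result["html_length"])  # 13
```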
See Also¶
- Client - Main client interface
- Parser - JSON parsing utilities
- Custom Extractors - Creating custom extractors