Extraction
The Extractor class and Extraction result object are the main interfaces for parsing URLs.
Quick Start
import socials
url = socials.parse("https://github.com/lorey/socials")
print(url.platform)
# "github"
print(url.entity_type)
# "repo"
print(url.owner)
# "lorey"
urls = [
    "https://github.com/lorey",
    "https://twitter.com/karllorey",
    "https://example.com",
]
extraction = socials.parse_all(urls)
print(extraction.all())
# [GitHubProfileURL(...), TwitterProfileURL(...)]
The Extractor Class
Extractor is the engine that parses URLs. The module-level socials.parse() and socials.parse_all() functions use a default Extractor internally.
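The delegation described above is a common pattern: module-level functions forward to one shared default instance. The following is a minimal sketch of that pattern only, not the library's actual code. `ToyExtractor`, `_DEFAULT`, and the `_HOSTS` table are hypothetical stand-ins, and the sketch returns plain platform names rather than URL objects.

```python
from urllib.parse import urlparse

# Hypothetical host-to-platform table, for illustration only.
_HOSTS = {"github.com": "github", "twitter.com": "twitter"}


class ToyExtractor:
    """Stand-in for an extractor; NOT the socials implementation."""

    def parse(self, url):
        # Return the platform name for a known host, else None.
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        return _HOSTS.get(host)


# One shared instance backs the module-level convenience functions.
_DEFAULT = ToyExtractor()


def parse(url):
    return _DEFAULT.parse(url)


def parse_all(urls):
    # Drop unrecognized URLs, as the Quick Start output omits example.com.
    results = (parse(u) for u in urls)
    return [r for r in results if r is not None]


print(parse("https://github.com/lorey"))
# github
print(parse_all(["https://twitter.com/karllorey", "https://example.com"]))
# ['twitter']
```

Keeping the default instance private lets callers who need custom settings build their own extractor without affecting the module-level functions.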
Creating an Extractor
from socials import Extractor
# Default: all platforms
extractor = Extractor()
# Only specific platforms
extractor = Extractor(platforms=["github", "twitter"])
# Strict mode: raise error for unrecognized URLs
extractor = Extractor(strict=True)
Extractor Methods
| Method | Returns | Description |
|---|---|---|
| parse(url) | SocialsURL \| None | Parse single URL |
| extract(urls) | Extraction | Parse multiple URLs |
Strict Mode
By default, unrecognized URLs return None. With strict=True, they raise ParseError:
from socials import Extractor
from socials.protocols import ParseError
ext = Extractor()
print(ext.parse("https://example.com"))
# None
ext = Extractor(strict=True)
try:
    ext.parse("https://example.com")
except ParseError:
    print("Parse error raised")
# Parse error raised
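When many URLs go through a strict parser, it can help to collect the failures instead of stopping at the first one. The sketch below shows one way to do that with a generic helper. `partition`, `toy_strict_parse`, and `ToyParseError` are hypothetical names used so the example runs without the library; in real use you would presumably pass a strict extractor's `parse` method and `ParseError` instead.

```python
class ToyParseError(Exception):
    """Hypothetical stand-in for socials.protocols.ParseError."""


def toy_strict_parse(url):
    # Stand-in for a strict parse call: raises on unrecognized URLs.
    if "github.com" in url:
        return "github"
    raise ToyParseError(url)


def partition(parse, urls, error_type):
    """Split urls into (parsed results, urls that raised error_type)."""
    parsed, failed = [], []
    for url in urls:
        try:
            parsed.append(parse(url))
        except error_type:
            failed.append(url)
    return parsed, failed


parsed, failed = partition(
    toy_strict_parse,
    ["https://github.com/lorey", "https://example.com"],
    ToyParseError,
)
print(parsed)
# ['github']
print(failed)
# ['https://example.com']
```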
Platform Filtering
Limit which platforms are recognized:
from socials import Extractor
ext = Extractor(platforms=["github", "twitter"])
print(ext.parse("https://linkedin.com/in/karllorey"))
# None
print(ext.parse("https://github.com/lorey"))
# GitHubProfileURL(...)
Available platforms: github, twitter, linkedin, facebook, instagram, youtube, email, phone
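This page does not say how the library reacts to a platform name outside that list, so a caller may want to check names up front. The following is a small sketch of such a check. `validate_platforms` and `AVAILABLE_PLATFORMS` are hypothetical helpers, not part of socials; the names in the set are copied from the list above.

```python
# Platform names as listed in this document.
AVAILABLE_PLATFORMS = frozenset(
    ["github", "twitter", "linkedin", "facebook",
     "instagram", "youtube", "email", "phone"]
)


def validate_platforms(platforms):
    """Return platforms as a list, or raise ValueError naming unknown entries."""
    unknown = sorted(set(platforms) - AVAILABLE_PLATFORMS)
    if unknown:
        raise ValueError(f"unknown platforms: {', '.join(unknown)}")
    return list(platforms)


print(validate_platforms(["github", "twitter"]))
# ['github', 'twitter']

try:
    validate_platforms(["github", "mastodon"])
except ValueError as exc:
    print(exc)
# unknown platforms: mastodon
```

The validated list could then be passed as the `platforms` argument when creating an extractor.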
The Extraction Class
Extraction is a container for parsed results with helper methods for grouping and filtering.
Methods
| Method | Returns | Description |
|---|---|---|
| all() | list[SocialsURL] | All parsed URL objects |
| by_platform() | dict[str, list[SocialsURL]] | Group by platform |
| by_type() | dict[str, list[SocialsURL]] | Group by entity type |
Grouping Results
import socials
urls = [
    "https://github.com/lorey",
    "https://github.com/lorey/socials",
    "https://twitter.com/karllorey",
]
extraction = socials.parse_all(urls)
print(extraction.by_platform())
# {"github": [GitHubProfileURL(...), GitHubRepoURL(...)], "twitter": [...]}
print(extraction.by_type())
# {"profile": [GitHubProfileURL(...), TwitterProfileURL(...)], "repo": [...]}
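To make the shape of these dictionaries concrete, here is a sketch of how grouping by an attribute might work in general. It is not the library's implementation. `ToyURL` and `group_by` are hypothetical; only the attribute names `platform` and `entity_type` come from the Quick Start.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyURL:
    """Hypothetical stand-in for a parsed URL object."""
    platform: str
    entity_type: str


def group_by(items, attr):
    """Group items into a plain dict keyed by the given attribute."""
    groups = defaultdict(list)
    for item in items:
        groups[getattr(item, attr)].append(item)
    return dict(groups)


items = [
    ToyURL("github", "profile"),
    ToyURL("github", "repo"),
    ToyURL("twitter", "profile"),
]
by_platform = group_by(items, "platform")
by_type = group_by(items, "entity_type")
print(sorted(by_platform))
# ['github', 'twitter']
print(len(by_type["profile"]))
# 2
```

Converting the `defaultdict` to a plain `dict` before returning means a lookup for a missing key raises `KeyError` or can use `.get(key, [])`, rather than silently creating an empty group.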
Filtering
import socials
urls = ["https://github.com/lorey", "https://twitter.com/karllorey"]
extraction = socials.parse_all(urls)
github_urls = extraction.by_platform().get("github", [])
print(len(github_urls))
# 1
profiles = extraction.by_type().get("profile", [])
print(len(profiles))
# 2
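The grouping methods each filter on one dimension. To filter on platform and entity type together, a list comprehension over the parsed objects is one option. The sketch below uses `SimpleNamespace` objects as hypothetical stand-ins for the items returned by `extraction.all()`, and `select` is an illustrative helper rather than a library function.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the objects returned by extraction.all().
results = [
    SimpleNamespace(platform="github", entity_type="profile"),
    SimpleNamespace(platform="github", entity_type="repo"),
    SimpleNamespace(platform="twitter", entity_type="profile"),
]


def select(items, platform=None, entity_type=None):
    """Keep items matching every criterion that is not None."""
    return [
        item for item in items
        if (platform is None or item.platform == platform)
        and (entity_type is None or item.entity_type == entity_type)
    ]


github_profiles = select(results, platform="github", entity_type="profile")
print(len(github_profiles))
# 1
```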
Module-Level Functions
For convenience, socials provides module-level functions that use a default Extractor:
import socials
# Parse a single URL
socials.parse("https://github.com/lorey")
# Parse multiple URLs
socials.parse_all(["https://github.com/lorey"])
Legacy API
For migrating from 0.x, see the Migration Guide.