Contributing
Guide for contributing to socials development.
Development Setup
Clone the repository and install dev dependencies:
git clone https://github.com/lorey/socials.git
cd socials
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
Or with uv:
uv sync --extra dev
Running Tests
pytest
pytest --cov=socials --cov-report=html
Linting and Formatting
We use ruff for linting and formatting:
ruff check .
ruff check --fix .
ruff format .
Type Checking
mypy socials
Adding a New Platform
- Create URL classes in
socials/platforms/yourplatform.py:
from typing import ClassVar, Literal
from pydantic import BaseModel
class YourPlatformProfileURL(BaseModel, frozen=True):
url: str
platform: Literal["yourplatform"] = "yourplatform"
entity_type: Literal["profile"] = "profile"
username: str
def __hash__(self) -> int:
return hash(self.url)
def get_parent(self) -> None:
return None
def get_root(self) -> YourPlatformProfileURL:
return self
def get_ancestors(self) -> list:
return []
- Create the parser:
import re
PROFILE_REGEX = re.compile(
r"^https?://(?:www\.)?yourplatform\.com/(?P<username>[A-Za-z0-9_]+)/?$"
)
class YourPlatformParser:
platform = "yourplatform"
schemes: ClassVar[set[str]] = {"http", "https"}
def handles_hostname(self, hostname: str) -> bool:
return hostname in {"yourplatform.com", "www.yourplatform.com"}
def parse(self, url: str) -> YourPlatformProfileURL | None:
if match := PROFILE_REGEX.match(url):
return YourPlatformProfileURL(url=url, **match.groupdict())
return None
- Register the parser in
socials/platforms/__init__.py:
from socials.platforms.yourplatform import YourPlatformParser
DEFAULT_PARSERS: dict[str, PlatformParser] = {
# ... existing parsers ...
"yourplatform": YourPlatformParser(),
}
- Add tests in
tests/test_yourplatform.py:
import socials
def test_yourplatform_profile():
url = socials.parse("https://yourplatform.com/username")
assert url is not None
assert url.platform == "yourplatform"
assert url.entity_type == "profile"
assert url.username == "username"
- Run tests and linting to verify everything works.
Regex Conventions
- Use
^and$anchors for exact matching - Use non-capturing groups
(?:...)for optional parts - Use named capture groups
(?P<name>...)for extracted values - Character classes should use
[A-Za-z0-9_-](avoiding chars between Z and a) - Optional trailing slash:
/?$ - Optional https:
https?:// - Optional www:
(?:www\.)?
Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Typed classes vs generic dict | Typed classes | Type safety, IDE support, clearer API |
| Protocol vs ABC | Protocol | Loose coupling, custom platforms don't inherit from core |
| URL evolution | Lenient parsing | URLs change, typed class represents entity not format |
| Error handling | Return None by default |
Lenient parsing suits scraping use cases |
| Immutability | frozen=True Pydantic models |
Hashable, safe to use in sets/dicts |
| Backwards compat | Keep Extraction wrapper |
0.x users can migrate gradually |
Pull Request Guidelines
- Fork the repository
- Create a feature branch
- Write tests for new functionality
- Ensure all tests pass and linting is clean
- Submit a pull request with a clear description
Releasing
Releases are managed by maintainers:
git tag v1.2.3
git push --tags
The CI/CD pipeline handles PyPI publishing automatically on tagged commits.