readability-lxml : Given an HTML document, it pulls out the main body text and cleans it up. It can also clean up the title, based on the latest readability.js code.
uv : An extremely fast Python package and project manager, written in Rust.
A single tool to replace pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv...
NLTK : The Natural Language Toolkit (NLTK) is a Python package for natural language processing.
It provides easy-to-use interfaces to over 50 corpora and lex...
docopt : Command-line interface description language. docopt helps you define the interface for your command-line app and automatically generates a parser for it.
SpaCy.io : spaCy is a library for industrial-strength natural language processing in Python and Cython. It features state-of-the-art speed and accuracy, a concis...
sh : sh (previously pbs) is a full-fledged subprocess interface for Python that allows you to call any program as if it were a function.
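
The idea in the last entry, calling a program as if it were a function, can be illustrated with a rough sketch that uses only the standard library's subprocess module. This is not how sh is implemented and it does not use sh's API; the `ProgramFunction` class is a hypothetical name introduced here only for illustration.

```python
import subprocess
import sys


class ProgramFunction:
    """Hypothetical helper: wraps an executable so it can be called like a function."""

    def __init__(self, program):
        self.program = program

    def __call__(self, *args):
        # Run the program with the given arguments and return its standard output.
        # check=True raises CalledProcessError if the program exits with a non-zero status.
        completed = subprocess.run(
            [self.program, *map(str, args)],
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout


# Wrap the running Python interpreter so the example does not depend on
# any particular external program being installed.
python = ProgramFunction(sys.executable)
greeting = python("-c", "print('hello')")
print(greeting.strip())
```

A dedicated library adds much more on top of this pattern; the sketch only shows the basic mapping from positional arguments to a command line.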