HTML Purifier : HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS)...
cliff : Command Line Interface Formulation Framework. cliff is a framework for building command line programs. It uses plugins to define sub-commands, output ...
pyo : pyo is a Python module written in C to help with writing digital signal processing scripts.
pydantic : Data validation and settings management using Python type annotations.
ScrapeGraphAI : ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML...
In a few words: given an HTML document, it pulls out the main body text and cleans it up. It can also clean up the title, based on the latest readability.js code.