HTML Purifier : HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS)...
NLTK : The Natural Language Toolkit (NLTK) is a Python package for natural language processing.
It provides easy-to-use interfaces to over 50 corpora and lex...
Cliquet : Cliquet is a toolkit to ease the implementation of HTTP microservices, such as data-driven REST APIs.
ScrapeGraphAI : ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML...
impress.js : It's a presentation framework based on the power of CSS3 transforms and transitions in modern browsers and inspired by the idea behind prezi.com
In few words, Given a html document, it pulls out the main body text and cleans it up. It also can clean up title based on latest readability.js code.