souvenir
readability-lxml http://pypi.python.org/pypi/readability-lxml
11/07/2012
  • Scrapy : Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be...
  • NLTK : The Natural Language Toolkit (NLTK) is a Python package for natural language processing. It provides easy-to-use interfaces to over 50 corpora and lex...
  • HTML_QuickForm2 : This PHP package provides methods to create, validate and render HTML forms.
  • ScrapeGraphAI : ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML...
  • Bespoke.js : Bespoke.js is a super minimal, modular presentation library for modern browsers, designed to foster a rich plugin ecosystem.

In a few words: given an HTML document, it pulls out the main body text and cleans it up. It can also clean up the title, based on the latest readability.js code.
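As a rough illustration of that description, here is a minimal sketch that uses the library's `Document` class to get a cleaned-up title and main body from an HTML string. The helper name `extract_readable` and the `SAMPLE_HTML` page are made up for this example, and the import is guarded so the snippet still loads when the package is not installed.

```python
# Assumes: pip install readability-lxml
try:
    from readability import Document
except ImportError:
    # Package (or one of its dependencies) is missing; the sketch then only reports that.
    Document = None

# Hypothetical sample page, used only for this illustration.
SAMPLE_HTML = """
<html>
  <head><title>Example article | Example Site</title></head>
  <body>
    <div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>
    <div id="content">
      <h1>Example article</h1>
      <p>This is the first paragraph of the main text, long enough to look like
      real article content to a readability-style scoring heuristic.</p>
      <p>This is a second paragraph, which adds a bit more body text, with commas,
      so that the content block stands out from the navigation links.</p>
    </div>
    <div id="footer">Copyright notice</div>
  </body>
</html>
"""


def extract_readable(html):
    """Return (short_title, cleaned_body_html) for an HTML string.

    Hypothetical helper: a thin wrapper around readability's Document.
    """
    if Document is None:
        raise RuntimeError("readability-lxml is not installed")
    doc = Document(html)
    # short_title() is the cleaned-up title; summary() is the main body as HTML.
    return doc.short_title(), doc.summary()


if Document is not None:
    title, body = extract_readable(SAMPLE_HTML)
    print(title)
    print(body)
else:
    print("readability-lxml is not installed; nothing to extract")
```

`summary()` returns an HTML fragment, not plain text, so a further pass (for example with lxml's `text_content()`) would be needed to get bare text. The exact output depends on the library version and on the page, so none is shown here.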

python html readable library