ScrapeGraphAI : ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML...
readability-lxml : In few words, Given a html document, it pulls out the main body text and cleans it up. It also can clean up title based on latest readability.js code.
Cliquet : Cliquet is a toolkit to ease the implementation of HTTP microservices, such as data-driven REST APIs.
Goutte : Goutte is a screen scraping and web crawling library for PHP.
Goutte provides a nice API to crawl websites and extract data from the HTML/XML response...
Essentia : Open-source library and tools for audio and music analysis, description and synthesis
Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.