Search: [scraper]

ScrapeGraphAI https://github.com/ScrapeGraphAI/Scrapegraph-ai

10/12/2024

ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).

Colly http://go-colly.org/

24/05/2021

Colly is a Go framework that provides a clean interface for writing any kind of crawler, scraper, or spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Scrapoxy https://scrapoxy.io/

23/06/2020

Scrapoxy hides your web scraper behind a cloud.

It starts a pool of proxies to relay your requests.

Now, you can crawl without thinking about blacklisting!

Cool QL Cool https://coolql.cool/

30/12/2018

CoolQLCool (CQC) is an open-source GraphQL server that lets you turn websites into GraphQL APIs.

Portia https://github.com/scrapinghub/portia

21/02/2016

Portia is a tool that allows you to visually scrape websites without any programming knowledge required. With Portia you can annotate a web page to identify the data you wish to extract, and Portia will understand based on these annotations how to scrape data from similar pages.

Upton https://github.com/propublica/upton

04/02/2014

Upton is a framework for easy web-scraping with a useful debug mode that doesn't hammer your target's servers. It does the repetitive parts of writing scrapers, so you only have to write the unique parts for each site.

Scrapy http://scrapy.org

27/09/2013

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Goutte https://github.com/fabpot/Goutte

23/11/2011

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.
