Scrapy

Developer(s)	Zyte (formerly Scrapinghub)
Initial release	26 June 2008
Stable release	2.4.1 / 17 November 2020[1]
Repository	github.com/scrapy/scrapy
Written in	Python
Operating system	Windows, macOS, Linux
Type	Web crawler
License	BSD License
Website	scrapy.org

Scrapy (/ˈskreɪpaɪ/ SKRAY-peye) is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data through APIs or as a general-purpose web crawler.[2] It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.

Scrapy's project architecture is built around "spiders": self-contained crawlers that are given a set of instructions. Following the spirit of other "don't repeat yourself" frameworks, such as Django,[3] it makes it easier to build and scale large crawling projects by allowing developers to reuse their code. Scrapy also provides a web-crawling shell, which developers can use to test their assumptions about a site's behavior.[4]

Well-known companies and products using Scrapy include Lyst,[5][6] Parse.ly,[7] Sayone Technologies,[8] Sciences Po Medialab,[9] and Data.gov.uk's World Government Data site.[10]

History

Scrapy originated at the London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, and the milestone 1.0 release followed in June 2015.[11] In 2011, Zyte (formerly Scrapinghub) became the new official maintainer.[12][13]

References

[1] "Release notes — Scrapy documentation". doc.scrapy.org. Retrieved 18 November 2020.
[2] Scrapy at a glance.
[3] "Frequently Asked Questions". Retrieved 28 July 2015.
[4] "Scrapy shell". Retrieved 28 July 2015.
[5] Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning". Retrieved 28 July 2015.
[6] Scrapy | Companies using Scrapy
[7] Montalenti, Andrew. "Web Crawling & Metadata Extraction in Python".
[8] "Scrapy Companies". Scrapy website.
[9] Hyphe v0.0.0: the first release of our new webcrawler is out!
[10] Ben Firshman [@bfirsh] (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords bit.ly/5jU3La #opendata #datastore" (Tweet) – via Twitter.
[11] Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!". scrapy-users (Mailing list).
[12] Pablo Hoffman (2013). List of the primary authors & contributors. Retrieved 18 November 2013.
[13] Interview Scraping Hub.

External links

Official website

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
