Scrapy

From Wikipedia, the free encyclopedia
Developer(s): Zyte (formerly Scrapinghub)
Initial release: 26 June 2008
Stable release: 2.12.0[1] / 18 November 2024
Written in: Python
Operating system: Windows, macOS, Linux
Type: Web crawler
License: BSD License
Website: scrapy.org

Scrapy (/ˈskreɪpaɪ/[2] SKRAY-peye) is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.[3] It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.

Scrapy's project architecture is built around "spiders", self-contained crawlers that are given a set of instructions. Following the spirit of other don't repeat yourself (DRY) frameworks, such as Django,[4] it makes it easier to build and scale large crawling projects by allowing developers to reuse their code.

Companies and products that use Scrapy include Lyst,[5][6] Parse.ly,[7] Sayone Technologies,[8] Sciences Po Medialab,[9] and Data.gov.uk's World Government Data site.[10]

History

Scrapy originated at the London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, and the milestone 1.0 release followed in June 2015.[11] In 2011, Zyte (formerly Scrapinghub) became the new official maintainer.[12][13]

References

  1. ^ "Release 2.12.0". 18 November 2024. Retrieved 29 November 2024.
  2. ^ "Commit 975f150". GitHub. Archived from the original on 2021-10-18. Retrieved 2021-10-18.
  3. ^ "Scrapy at a glance". Archived from the original on 2018-09-17.
  4. ^ "Frequently Asked Questions". Scrapy 2.8.0 documentation. Archived from the original on 11 November 2020. Retrieved 28 July 2015.
  5. ^ Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning". Archived from the original on 4 June 2016. Retrieved 28 July 2015.
  6. ^ "Scrapy | Companies using Scrapy". Archived from the original on 2020-11-12. Retrieved 2015-07-28.
  7. ^ Montalenti, Andrew (October 27, 2012). "Web Crawling & Metadata Extraction in Python". Speaker Deck. Archived from the original on September 19, 2020. Retrieved May 11, 2015.
  8. ^ "Scrapy | Companies using Scrapy". Archived from the original on 2020-11-12. Retrieved 2017-11-09.
  9. ^ "Hyphe v0.0.0: the first release of our new webcrawler is out!". 17 November 2013. Archived from the original on 2016-06-13. Retrieved 2015-07-28.
  10. ^ Ben Firshman [@bfirsh] (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords http://bit.ly/5jU3La #opendata #datastore" (Tweet) – via Twitter.
  11. ^ Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!". scrapy-users (Mailing list). Archived from the original on 25 January 2010. Retrieved 28 July 2015.
  12. ^ Hoffman, Pablo (2013). List of the primary authors & contributors. Archived from the original on 29 May 2017. Retrieved 18 November 2013.
  13. ^ "Interview Scraping Hub". Archived from the original on 2020-10-29.