crawlee
Crawlee: A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

crawlee-python
Crawlee: A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

fingerprint-suite
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

proxy-chain
Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.

got-scraping
HTTP client made for scraping based on got.

actor-page-analyzer
Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.

apify-cli
Apify command-line interface helps you create, develop, build and run Apify actors, and manage the Apify cloud platform.

apify-sdk-js
Apify SDK monorepo

apify-sdk-python
The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.

actor-scraper
House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.

browser-pool
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

fingerprint-generator
Generates realistic browser fingerprints

apify-actor-docker
Base Docker images for Apify actors.

apify-client-js
Apify API client for JavaScript / Node.js.

fingerprint-injector
Home of fingerprint injector.

header-generator
Node.js package for generating browser-like headers.

covid-19
Open APIs with statistics about Covid-19

apify-client-python
Apify API client for Python

apify-docs
This project is the home of Apify's documentation.

actor-templates
This project is the 🏠 home of Apify actor template projects to help users quickly get started.

xlsx-stream
JavaScript / Node.js library to stream data into an XLSX file

apify-ts
Crawlee dev repo

got-cjs
An action to release a CommonJS version of the popular library got, which is soon to be available only in an ESM format.

actor-web-automation-agent
This is the experimental version of Web Automation Agent. The agent uses natural language instructions to browse the web and extract data.

actor-content-checker
You can use this actor to monitor any page's content and get a notification when the content changes.

super-scraper
Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!

devtools-server
Runs a simple server that allows you to connect to Chrome DevTools running on dynamic hosts, not only localhost.

actor-quick-start
Contains boilerplate for an Apify actor to help you quickly get started building your own actors.

apify-shared-js
Utilities and constants shared across Apify projects.

better-sqlite3-with-prebuilds
Better SQLite prebuild & publish action

chat-with-a-website
A simple app that lets you chat with a given website.

actor-scrapy-executor
Apify actor to run web spiders written in Python with the Scrapy library

apify-zapier-integration
Apify integration for Zapier

idcac
I Don't Care About Cookies extension compiled for use with Playwright/Puppeteer

homebrew-tap
A Homebrew tap for Apify tools

workflows
Apify's reusable GitHub workflows

actor-legacy-phantomjs-crawler
The actor implements the legacy Apify Crawler product. It uses the PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of JavaScript code.

act-crawler-results-to-s3
Apify actor to upload crawler results to AWS S3.

actor-example-python
Example Apify Actor written in Python

browser-headers-generator
Package generating randomized browser-like headers.

input-schema-editor-react
Apify input schema editor written in React.js

crawlee-parallel-scraping-example
An example repository showcasing how you can scrape in parallel using one request queue

act-crawl-url-list
Apify actor to crawl a list of URLs

actor-imagediff
Returns an image containing the difference between two given images.

apify-web-covid-19
A list of public COVID-19 APIs to be rendered on https://apify.com/covid-19

actor-example-proxy-intercept-request
Example: Intercept requests from an HTTPS connection using a "man in the middle" proxy solution.

apify-storage-local-js
Local emulation of the apify-client NPM package, which enables local use of Apify SDK.

actor-vector-database-integrations
Transfer data from Apify Actors to vector databases (Chroma, Milvus, Pinecone, PostgreSQL (PG-Vector), Qdrant, and Weaviate)

aidevworld2023
Slides and supporting materials for "How to get clean web data for chatbots and LLMs".

actor-example-php
Example of Apify actor using PHP

apify-php-tutorial

apify-eslint-config
Apify ESLint preset to be shared between projects

http-request
An HTTP request library for Node.js, with a common-sense API, support for Brotli compression and without the bugs of the "request" NPM package

slack-messages-action
Wraps the sending of messages from Apify GitHub workflows to Slack.

scraping-tools-js
A library of utility functions that make scraping, data extraction and usage of headless browsers easier and faster.

actor-beautifulsoup-scraper

apify-tsconfig
TypeScript configuration shared across projects in Apify.

generative-bayesian-network

playwright-test-actor
Source code for the Playwright Test public actor.

apify-sdk-v2
Snapshot of Apify SDK v2 + sdk.apify.com website. This project is no longer maintained. See the https://github.com/apify/apify-sdk-js repo instead!

actor-algolia-website-indexer
Apify actor that crawls a website and indexes selected web pages to an Algolia index. It's used to power the search on https://help.apify.com

apify-eslint-config-ts
TypeScript ESLint configuration shared across projects in Apify.

actor-proxy-test

appmixer-components
Home of all the future Appmixer components on the Apify platform.

actor-example-secret-input
Example actor showcasing the secret input fields

actor-scrapy-books-example
Example of a Python Scrapy project. It scrapes book data from https://books.toscrape.com/.

komparz
Special, yet insignificant actors

actor-crawler-cheerio
DEPRECATED: An actor that crawls websites and parses HTML pages using Cheerio library. Supports recursive crawling as well as URL lists.

actor-crawler-puppeteer
DEPRECATED: An Apify actor that enables crawling of websites using headless Chrome and Puppeteer. The actor is highly customizable and supports recursive crawling of websites as well as lists of URLs.

actor-monorepo-example
An example repository with multiple Apify Actors sharing code between each other.

apify-haystack
The official integration for Apify and Haystack 2.0

openapi
An OpenAPI specification for the Apify API.

scrapy-migrator
A standalone POC script for wrapping Scrapy projects with Apify middleware.