• Stars: 119
• Rank 296,472 (Top 6 %)
• Language: TypeScript
• Created over 6 years ago
• Updated 29 days ago



Apify command-line interface (Apify CLI)


Apify command-line interface (Apify CLI) helps you create, develop, build and run Apify actors, and manage the Apify cloud platform from any computer.

Apify actors are cloud programs that can perform arbitrary web scraping, automation or data processing jobs. They accept input, perform their job and generate output. While you can develop actors in an online IDE directly in the Apify web application, for complex projects it is more convenient to develop actors locally on your computer using the Apify SDK and push them to the Apify cloud only when deploying. This is where the Apify CLI comes in.

Note that actors running on the Apify platform are executed in Docker containers, so with an appropriate Dockerfile you can build your actors in any programming language. However, we recommend using JavaScript / Node.js, for which we provide most libraries and support.

Installation

Via Homebrew

On macOS (or Linux), you can install the Apify CLI via the Homebrew package manager.

brew install apify-cli

Via NPM

First, make sure you have Node.js version 16 or higher with NPM installed on your computer:

node --version
npm --version

Install or upgrade Apify CLI by running:

npm -g install apify-cli

If you receive an EACCES error, you might need to run the command as root:

sudo npm -g install apify-cli

Alternatively, you can use Node Version Manager (nvm) to install Apify CLI into a selected user-level Node.js version without requiring root privileges:

nvm install 16
nvm use 16
npm -g install apify-cli

Finally, verify that Apify CLI was installed correctly by running:

apify --version

which should print something like:

apify-cli/0.10.0 darwin-x64 node-v16.14.2

You can also skip the manual global installation altogether and run npx apify-cli in place of apify in all the following commands.

Basic usage

The following examples demonstrate the basic usage of Apify CLI.

Create a new actor from scratch

apify create my-hello-world

First, you will be prompted to select a template with the boilerplate for the actor, to help you get started quickly. The command will create a directory called my-hello-world that contains a Node.js project for the actor and a few configuration files.

If you decided to skip the installation and go with npx, the command will be npx apify-cli create my-hello-world.

Create a new actor from an existing project

cd ./my/awesome/project
apify init

This command only sets up a local actor development environment in an existing directory, i.e. it creates the .actor/actor.json file and the storage directory.

Before you can run your project locally using apify run, you have to set up the right start command in package.json under scripts.start. For example:

{
    ...
    "scripts": {
        "start": "node your_main_file.js"
    },
    ...
}

You can find more information by running apify help run.

Create a new Actor from a Scrapy project

If you want to run a Scrapy project on the Apify platform, follow the Scrapy integration guide.

Run the actor locally

cd my-hello-world
apify run

This command runs the actor on your local machine. Now's your chance to develop the logic - or magic 😏

Log in with your Apify account

apify login

Before you can interact with the Apify cloud, you need to create an Apify account and log in to it using the above command. You will be prompted for your Apify API token. Note that the command will store the API token and other sensitive information in ~/.apify.

Push the actor to the Apify cloud

apify push

This command uploads your project to the Apify cloud and builds an actor from it. On the platform, an actor needs to be built before it can be run.

Run an actor on the Apify cloud

apify call

Runs the actor corresponding to the current directory on the Apify platform.

This command can also be used to run other actors, for example:

apify call apify/hello-world

So what's in this .actor/actor.json file?

This file associates your local development project with an actor on the Apify platform. It contains information such as actor name, version, build tag and environment variables. Make sure you commit this file to the Git repository.

For example, .actor/actor.json file can look as follows:

{
  "actorSpecification": 1,
  "name": "name-of-my-scraper",
  "version": "0.0",
  "buildTag": "latest",
  "environmentVariables": {
      "MYSQL_USER": "my_username",
      "MYSQL_PASSWORD": "@mySecretPassword"
  },
  "dockerfile": "./Dockerfile",
  "readme": "./ACTOR.md",
  "input": "./input_schema.json",
  "storages": {
    "dataset": "./dataset_schema.json"
  }
}
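To make the shape of this file explicit, here is a minimal TypeScript sketch that models the fields from the example above. The ActorJson interface and parseActorJson function are hypothetical names for illustration, not part of the Apify CLI. Which fields the platform actually requires is an assumption here, so the check only covers actorSpecification, name and version; input and storages.dataset are typed as either a path or an embedded object, as the field descriptions in this section explain.

```typescript
// Hypothetical type mirroring the fields shown in the example actor.json.
// Which fields are required by the platform is an assumption of this sketch.
interface ActorJson {
  actorSpecification: number;
  name: string;
  version: string;
  buildTag?: string;
  environmentVariables?: Record<string, string>;
  dockerfile?: string;
  readme?: string;
  // Either a path to an input schema file or an embedded schema object.
  input?: string | Record<string, unknown>;
  // The dataset schema can also be a path or an embedded object.
  storages?: { dataset?: string | Record<string, unknown> };
}

// Parses the file contents and checks only the basic fields.
function parseActorJson(text: string): ActorJson {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("actor.json must contain a JSON object");
  }
  const obj = data as Record<string, unknown>;
  if (typeof obj.actorSpecification !== "number") {
    throw new Error('"actorSpecification" must be a number');
  }
  for (const key of ["name", "version"]) {
    if (typeof obj[key] !== "string") {
      throw new Error(`"${key}" must be a string`);
    }
  }
  return obj as unknown as ActorJson;
}
```

The check is deliberately shallow: it rejects obviously malformed files without guessing at validation rules the document does not state.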

Dockerfile field
If you specify the path to your Dockerfile under the dockerfile field, this file will be used for actor builds on the platform. If not specified, the system will look for a Dockerfile at .actor/Dockerfile and then at Dockerfile, in this order of preference.

Readme field
If you specify the path to your readme file under the readme field, the readme at this path will be used on the platform. If not specified, the readme at .actor/README.md or README.md will be used, in this order of preference.
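The fallback order for the dockerfile and readme fields can be sketched as follows. This is an illustrative TypeScript model, not the CLI's implementation: resolveWithFallback and the exists callback are hypothetical, and how relative paths are resolved against the project directory is left out.

```typescript
// Fallback locations in order of preference, as listed above.
const DOCKERFILE_FALLBACKS = [".actor/Dockerfile", "Dockerfile"];
const README_FALLBACKS = [".actor/README.md", "README.md"];

// Hypothetical helper: an explicitly configured path always wins;
// otherwise the first fallback for which `exists` returns true is used.
function resolveWithFallback(
  configured: string | undefined,
  fallbacks: string[],
  exists: (path: string) => boolean,
): string | undefined {
  if (configured !== undefined) {
    return configured;
  }
  return fallbacks.find(exists);
}

// Example: a project that has only a root Dockerfile and both readme files.
const projectFiles = new Set(["Dockerfile", ".actor/README.md", "README.md"]);
const fileExists = (path: string): boolean => projectFiles.has(path);

console.log(resolveWithFallback(undefined, DOCKERFILE_FALLBACKS, fileExists)); // Dockerfile
console.log(resolveWithFallback(undefined, README_FALLBACKS, fileExists)); // .actor/README.md
console.log(resolveWithFallback("./Dockerfile", DOCKERFILE_FALLBACKS, fileExists)); // ./Dockerfile
```

Passing the existence check in as a callback keeps the sketch free of file-system access, so the preference logic can be read (and tested) on its own.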

Input field
You can embed your input schema object directly in actor.json under the input field. Alternatively, you can provide a path to a custom input schema. If neither is provided, the input schema at .actor/INPUT_SCHEMA.json or INPUT_SCHEMA.json is used, in this order of preference.
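The input field accepts either an embedded object or a path, and the fallback files are used only when the field is missing. A minimal sketch of that decision, where resolveInputSchema, InputSchemaSource and the exists callback are hypothetical names chosen for this example (the real CLI's internals may look quite different):

```typescript
type InputSchema = Record<string, unknown>;

// Where the input schema comes from, following the order of preference above.
type InputSchemaSource =
  | { kind: "embedded"; schema: InputSchema }
  | { kind: "file"; path: string };

const INPUT_SCHEMA_FALLBACKS = [".actor/INPUT_SCHEMA.json", "INPUT_SCHEMA.json"];

// Hypothetical helper: `input` is the value of the "input" field in actor.json.
function resolveInputSchema(
  input: string | InputSchema | undefined,
  exists: (path: string) => boolean,
): InputSchemaSource | undefined {
  if (typeof input === "object") {
    // 1. Schema object embedded directly in actor.json.
    return { kind: "embedded", schema: input };
  }
  if (typeof input === "string") {
    // 2. Path to a custom schema file.
    return { kind: "file", path: input };
  }
  // 3./4. Default locations, first existing file wins.
  const path = INPUT_SCHEMA_FALLBACKS.find(exists);
  return path === undefined ? undefined : { kind: "file", path };
}
```

The same order is what apify vis describes in the command reference, so a helper like this could serve both building and validation.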

Storages.dataset field
You can define the schema of the items in your dataset under the storages.dataset field. This can be either an embedded object or a path to a JSON schema file. You can read more about the schema of your actor output in the Apify documentation.

Note on migration from deprecated config "apify.json"
Previously, the actor configuration was stored in the apify.json file, which is now deprecated. You can find the (very slight) differences and migration info in the migration guidelines.

Environment variables

There are two ways to set up environment variables for actors.

Set up environment variables in .actor/actor.json

All keys from environmentVariables will be set as environment variables on the Apify platform after you push the actor to Apify. Current values on the platform will be overridden.

{
    "actorSpecification": 1,
    "name": "dataset-to-mysql",
    "version": "0.1",
    "buildTag": "latest",
    "environmentVariables": {
      "MYSQL_USER": "my_username",
      "MYSQL_PASSWORD": "@mySecretPassword"
    }
}

Set up environment variables in Apify Console

In Apify Console, select your actor and set up the variables in the Source tab. After setting up variables in the app, remove environmentVariables from .actor/actor.json. Otherwise, the variables from .actor/actor.json will override the variables in the app.

{
    "actorSpecification": 1,
    "name": "dataset-to-mysql",
    "version": "0.1",
    "buildTag": "latest"
}

How to set secret environment variables in .actor/actor.json

The CLI provides commands to manage secret environment variables. Secrets are stored in the ~/.apify directory. You can add a new secret using the command:

apify secrets:add mySecretPassword pwd1234

After adding a new secret, you can use it in .actor/actor.json by prefixing its name with @:

{
    "actorSpecification": 1,
    "name": "dataset-to-mysql",
    ...
    "environmentVariables": {
      "MYSQL_PASSWORD": "@mySecretPassword"
    },
    ...
}
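Conceptually, every value starting with @ is replaced by the secret of that name when the actor is pushed. The TypeScript sketch below illustrates that substitution under stated assumptions: resolveSecretEnvVars is a hypothetical function, the secrets are passed in as a plain object instead of being read from ~/.apify, a missing secret is treated as an error (the real CLI may behave differently), and marking the variable as secret on the platform is not modeled.

```typescript
// Hypothetical sketch: replaces "@name" values with secrets previously
// stored via "apify secrets:add". Secrets are passed in explicitly here.
function resolveSecretEnvVars(
  envVars: Record<string, string>,
  secrets: Record<string, string>,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const [name, value] of Object.entries(envVars)) {
    if (value.startsWith("@")) {
      const secretName = value.slice(1);
      // Assumption: an undefined secret is an error in this sketch.
      if (!Object.prototype.hasOwnProperty.call(secrets, secretName)) {
        throw new Error(`Secret "${secretName}" is not defined`);
      }
      resolved[name] = secrets[secretName];
    } else {
      // Plain values are passed through unchanged.
      resolved[name] = value;
    }
  }
  return resolved;
}

// Using the values from the examples above:
const env = resolveSecretEnvVars(
  { MYSQL_USER: "my_username", MYSQL_PASSWORD: "@mySecretPassword" },
  { mySecretPassword: "pwd1234" },
);
console.log(env.MYSQL_PASSWORD); // pwd1234
```

The point of the @ indirection is that actor.json can be committed to Git while the actual password only lives in ~/.apify on your machine.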

Need help?

To see all CLI commands simply run:

apify help

To get information about a specific command run:

apify help COMMAND

Still haven't found what you were looking for? Please go to the Apify Help Center or contact us.

Command reference

This section contains printouts of apify help for all commands.

apify actor

These commands are designed to be used in actor runs. All commands are in a proof-of-concept (PoC) state; do not use them in production environments.

USAGE
  $ apify actor

See code: src/commands/actor/index.js

apify actor:get-input

Gets the actor input value from the default key-value store associated with the actor run.

USAGE
  $ apify actor:get-input

See code: src/commands/actor/get-input.js

apify actor:get-value KEY

Gets a value from the default key-value store associated with the actor run.

USAGE
  $ apify actor:get-value KEY

ARGUMENTS
  KEY  Key of the record in key-value store

See code: src/commands/actor/get-value.js

apify actor:push-data [ITEM]

Stores an object or an array of objects to the default dataset of the actor run.

USAGE
  $ apify actor:push-data [ITEM]

ARGUMENTS
  ITEM  JSON string with one object or array of objects containing data to be stored in the default dataset.

DESCRIPTION
  It is possible to pass data using item argument or stdin.
  Passing data using argument:
  $ apify actor:push-data '{"foo": "bar"}'
  Passing data using stdin with pipe:
  $ cat ./test.json | apify actor:push-data

See code: src/commands/actor/push-data.js
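Since ITEM may be either a single object or an array of objects, a consumer has to normalize it before storing. A small TypeScript sketch of that normalization; parsePushDataItem is a hypothetical helper, and rejecting anything other than JSON objects is an assumption of this sketch rather than documented CLI behavior.

```typescript
// Hypothetical helper: parses the ITEM argument (or stdin contents) and
// normalizes it to an array of objects.
function parsePushDataItem(raw: string): Record<string, unknown>[] {
  const parsed: unknown = JSON.parse(raw);
  // A single object becomes a one-element array.
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  for (const item of items) {
    if (typeof item !== "object" || item === null || Array.isArray(item)) {
      throw new Error("Each item must be a JSON object");
    }
  }
  return items as Record<string, unknown>[];
}

console.log(parsePushDataItem('{"foo": "bar"}').length); // 1
console.log(parsePushDataItem('[{"a": 1}, {"b": 2}]').length); // 2
```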

apify actor:set-value KEY [VALUE]

Sets or removes a record in the default key-value store associated with the actor run.

USAGE
  $ apify actor:set-value KEY [VALUE]

ARGUMENTS
  KEY    Key of the record in key-value store.

  VALUE  Record data, which can be one of the following values:
         - If empty, the record in the key-value store is deleted.
         - If no `contentType` flag is specified, value is expected to be any JSON string value.
         - If the contentType flag is set, value is taken as is.

OPTIONS
  -c, --contentType=contentType  Specifies a custom MIME content type of the record. By default "application/json" is
                                 used.

DESCRIPTION
  It is possible to pass data using argument or stdin.
  Passing data using argument:
  $ apify actor:set-value KEY my-value
  Passing data using stdin with pipe:
  $ cat ./my-text-file.txt | apify actor:set-value KEY --contentType text/plain

See code: src/commands/actor/set-value.js
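The three cases for VALUE can be summarized as a small decision function. This TypeScript sketch is illustrative only: interpretSetValue and SetValueAction are hypothetical names, and it merely classifies the input without touching any key-value store.

```typescript
// Hypothetical model of how the VALUE argument is interpreted.
type SetValueAction =
  | { kind: "delete" }
  | { kind: "set"; value: unknown; contentType: string };

function interpretSetValue(value: string | undefined, contentType?: string): SetValueAction {
  // Empty value: the record is deleted.
  if (value === undefined || value === "") {
    return { kind: "delete" };
  }
  // No contentType flag: the value must be valid JSON (JSON.parse throws otherwise).
  if (contentType === undefined) {
    return { kind: "set", value: JSON.parse(value), contentType: "application/json" };
  }
  // contentType flag given: the value is stored as is.
  return { kind: "set", value, contentType };
}

console.log(interpretSetValue(undefined).kind); // delete
console.log(interpretSetValue("hello", "text/plain")); // { kind: 'set', value: 'hello', contentType: 'text/plain' }
```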

apify call [ACTID]

Runs a specific actor remotely on the Apify cloud platform.

USAGE
  $ apify call [ACTID]

ARGUMENTS
  ACTID  Name or ID of the actor to run (e.g. "apify/hello-world" or "E2jjCZBezvAZnX8Rb"). If not provided, the command
         runs the remote actor specified in the ".actor/actor.json" file.

OPTIONS
  -b, --build=build                      Tag or number of the build to run (e.g. "latest" or "1.2.34").
  -m, --memory=memory                    Amount of memory allocated for the actor run, in megabytes.
  -t, --timeout=timeout                  Timeout for the actor run in seconds. Zero value means there is no timeout.
  -w, --wait-for-finish=wait-for-finish  Seconds to wait for the run to finish. If no value is passed, it waits forever.

DESCRIPTION
  The Actor is run under your current Apify account. Therefore you need to be logged in by calling "apify login". It
  takes input for the Actor from the default local key-value store.

See code: src/commands/call.js

apify create [ACTORNAME]

Creates a new actor project directory from a selected boilerplate template.

USAGE
  $ apify create [ACTORNAME]

ARGUMENTS
  ACTORNAME  Name of the actor and its directory

OPTIONS
  -t, --template=template    Template for the actor. If not provided, the command will prompt for it.
                             Visit
                             https://raw.githubusercontent.com/apify/actor-templates/master/templates/manifest.json to
                             find available template names.

  --skip-dependency-install  Skip installing actor dependencies.

See code: src/commands/create.js

apify info

Displays information about the currently active Apify account.

USAGE
  $ apify info

DESCRIPTION
  The information is printed to the console.

See code: src/commands/info.js

apify init [ACTORNAME]

Initializes a new actor project in an existing directory.

USAGE
  $ apify init [ACTORNAME]

ARGUMENTS
  ACTORNAME  Name of the actor. If not provided, you will be prompted for it.

DESCRIPTION
  The command only creates the ".actor/actor.json" file and the "storage" directory in the current directory, but will
  not touch anything else.

  WARNING: The directory at "storage" will be overwritten if it already exists.

See code: src/commands/init.js

apify login

Logs in to your Apify account using a provided API token.

USAGE
  $ apify login

OPTIONS
  -t, --token=token  [Optional] Apify API token

DESCRIPTION
  The API token and other account information is stored in the ~/.apify directory, from where it is read by all other
  "apify" commands. To log out, call "apify logout".

See code: src/commands/login.js

apify logout

Logs out of your Apify account.

USAGE
  $ apify logout

DESCRIPTION
  The command deletes the API token and all other account information stored in the ~/.apify directory. To log in again,
   call "apify login".

See code: src/commands/logout.js

apify pull [ACTORID]

Pulls an Actor from the Apify platform to the current directory. If it is defined as a Git repository, it will be cloned. If it is defined in the Web IDE, its files will be fetched.

USAGE
  $ apify pull [ACTORID]

ARGUMENTS
  ACTORID  Name or ID of the actor to pull (e.g. "apify/hello-world" or "E2jjCZBezvAZnX8Rb"). If not provided, the
           command will update the Actor in the current directory based on its name in the ".actor/actor.json" file.

OPTIONS
  -v, --version=version  Actor version number which will be pulled, e.g. 1.2. Default: the highest version

See code: src/commands/pull.js

apify push [ACTORID]

Uploads the actor to the Apify platform and builds it there.

USAGE
  $ apify push [ACTORID]

ARGUMENTS
  ACTORID  Name or ID of the Actor to push (e.g. "apify/hello-world" or "E2jjCZBezvAZnX8Rb"). If not provided, the
           command will create or modify the actor with the name specified in ".actor/actor.json" file.

OPTIONS
  -b, --build-tag=build-tag              Build tag to be applied to the successful Actor build. By default, it is taken
                                         from the ".actor/actor.json" file

  -v, --version=version                  Actor version number to which the files should be pushed. By default, it is
                                         taken from the ".actor/actor.json" file.

  -w, --wait-for-finish=wait-for-finish  Seconds to wait for the build to finish. If no value is passed, it waits forever.

  --no-prompt                            Do not prompt for opening the actor details in a browser. This will also not
                                         open the browser automatically.

  --version-number=version-number        DEPRECATED: Use flag version instead. Actor version number to which the files
                                         should be pushed. By default, it is taken from the ".actor/actor.json" file.

DESCRIPTION
  The Actor settings are read from the ".actor/actor.json" file in the current directory, but they can be overridden
  using command-line options.
  NOTE: If the source files are smaller than 3 MB then they are uploaded as
  "Multiple source files", otherwise they are uploaded as "Zip file".

  WARNING: If the target Actor already exists in your Apify account, it will be overwritten!

See code: src/commands/push.js
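The NOTE in the apify push description picks the upload format from the total size of the source files. A minimal TypeScript sketch of that rule; chooseUploadMode is a hypothetical function, and treating "3 MB" as 3 * 1024 * 1024 bytes is an assumption, since the help text does not define the unit precisely.

```typescript
// Assumption: "3 MB" is interpreted as 3 * 1024 * 1024 bytes.
const MULTIPLE_FILES_LIMIT_BYTES = 3 * 1024 * 1024;

type UploadMode = "Multiple source files" | "Zip file";

// Hypothetical helper: sums the file sizes and compares against the limit.
function chooseUploadMode(fileSizesInBytes: number[]): UploadMode {
  const total = fileSizesInBytes.reduce((sum, size) => sum + size, 0);
  // "Smaller than 3 MB" means a total exactly at the limit goes to the zip branch.
  return total < MULTIPLE_FILES_LIMIT_BYTES ? "Multiple source files" : "Zip file";
}

console.log(chooseUploadMode([1000, 2000])); // Multiple source files
console.log(chooseUploadMode([4 * 1024 * 1024])); // Zip file
```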

apify run

Runs the actor locally in the current directory.

USAGE
  $ apify run

OPTIONS
  -p, --purge              Shortcut that combines the --purge-queue, --purge-dataset and --purge-key-value-store
                           options.

  --purge-dataset          Deletes the local directory containing the default dataset before the run starts.

  --purge-key-value-store  Deletes all records from the default key-value store in the local directory before the run
                           starts, except for the "INPUT" key.

  --purge-queue            Deletes the local directory containing the default request queue before the run starts.

DESCRIPTION
  It sets various APIFY_XYZ environment variables in order to provide a working execution environment for the actor. For
   example, this causes the actor input, as well as all other data in key-value stores, datasets or request queues to be
   stored in the "storage" directory, rather than on the Apify platform.

  NOTE: You can override the command's default behavior for Node.js actors by overriding the "start" script in the
  package.json file. You can set up your own main file or environment variables by changing it.

See code: src/commands/run.js
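The purge options of apify run can be modeled as a small mapping from flags to actions. This TypeScript sketch is illustrative only: RunFlags, planPurge and keysToDelete are hypothetical names, and the actual deletion of local storage directories is not shown.

```typescript
// Hypothetical representation of the parsed purge flags.
interface RunFlags {
  purge?: boolean; // -p, --purge
  purgeDataset?: boolean; // --purge-dataset
  purgeKeyValueStore?: boolean; // --purge-key-value-store
  purgeQueue?: boolean; // --purge-queue
}

interface PurgePlan {
  dataset: boolean;
  keyValueStore: boolean;
  queue: boolean;
}

// --purge is a shortcut that enables all three individual options.
function planPurge(flags: RunFlags): PurgePlan {
  const all = flags.purge === true;
  return {
    dataset: all || flags.purgeDataset === true,
    keyValueStore: all || flags.purgeKeyValueStore === true,
    queue: all || flags.purgeQueue === true,
  };
}

// Purging the key-value store keeps the "INPUT" record.
function keysToDelete(keys: string[]): string[] {
  return keys.filter((key) => key !== "INPUT");
}

console.log(planPurge({ purgeQueue: true })); // { dataset: false, keyValueStore: false, queue: true }
console.log(keysToDelete(["INPUT", "OUTPUT"])); // [ 'OUTPUT' ]
```

Keeping INPUT is what lets you rerun an actor locally with a purged store without re-creating its input each time.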

apify secrets

Manages secret values for actor environment variables.

USAGE
  $ apify secrets

DESCRIPTION
  Example:
  $ apify secrets:add mySecret TopSecretValue123

  Now the "mySecret" value can be used in an environment variable defined in ".actor/actor.json" file by adding the "@"
  prefix:

  {
    "actorSpecification": 1,
    "name": "my_actor",
    "environmentVariables": { "SECRET_ENV_VAR": "@mySecret" },
    "version": "0.1"
  }

  When the actor is pushed to Apify cloud, "SECRET_ENV_VAR" and its value are stored as a secret environment variable
  of the actor.

See code: src/commands/secrets/index.js

apify secrets:add NAME VALUE

Adds a new secret value.

USAGE
  $ apify secrets:add NAME VALUE

ARGUMENTS
  NAME   Name of the secret
  VALUE  Value of the secret

DESCRIPTION
  The secrets are stored in a file in the ~/.apify directory.

See code: src/commands/secrets/add.js

apify secrets:rm NAME

Removes the secret.

USAGE
  $ apify secrets:rm NAME

ARGUMENTS
  NAME  Name of the secret

See code: src/commands/secrets/rm.js

apify vis [PATH]

Validates input schema and prints errors found.

USAGE
  $ apify vis [PATH]

ARGUMENTS
  PATH  Optional path to your INPUT_SCHEMA.json file. If not provided ./INPUT_SCHEMA.json is used.

DESCRIPTION
  The input schema for the actor is used from these locations in order of preference.
  The first one found is validated, as it would be the one used on the Apify platform.
  1. Directly embedded object in ".actor/actor.json" under 'input' key
  2. Path to JSON file referenced in ".actor/actor.json" under 'input' key
  3. JSON file at .actor/INPUT_SCHEMA.json
  4. JSON file at INPUT_SCHEMA.json

  You can also pass any custom path to your input schema to have it validated instead.

See code: src/commands/vis.js
