The W3C WebDriver Spec: A Simplified Guide

This document is essentially a cheat sheet for the official WebDriver spec (which has in-progress drafts available on GitHub). The official spec is designed for implementers to have very detailed information about processing algorithms and so on. Much of the information in the spec is not targeted towards those who are simply writing client libraries, or even users who want a closer look at the API. It also uses language which is so exact it can sometimes obfuscate an intuitive understanding of a section.

The approach used here is to simply look at the supported endpoints along with their inputs and outputs, without worrying too much about how the implementation is supposed to work. This should be beneficial to client library implementers as well as remote end implementers looking for some quick highlights. In most cases there are examples to illustrate.

DISCLAIMER: This is not the official spec. It is my interpretation of it and an attempt to present the most salient bits of it in a more digestible fashion. You should always consult the official spec before beginning work on a client or server implementation!

Introduction

What is WebDriver? From the spec:

WebDriver is a remote control interface that enables introspection and control of user agents. It provides a platform- and language-neutral wire protocol as a way for out-of-process programs to remotely instruct the behavior of web browsers.

Essentially, it's a client-server protocol that allows you to automate web browsers. Clients send well-formed requests, the server interprets them according to the protocol, and then performs the automation behaviors as defined by the implementation steps in the spec. The most common use of this technology is for automated testing.

The WebDriver spec grew out of the Selenium project, and that is still the community of users pushing forward the associated browser automation technology and using it every day to write and run automated tests. Browser vendors now also support the WebDriver spec natively.

WebDriver has gone beyond the web, with implementations for mobile and desktop app automation. The Appium project is a set of WebDriver-compliant servers that allow automation of these non-web-browser platforms.

Basic Architecture

Automation is organized around WebDriver sessions, whose state is maintained across requests via a 'session id' token shared by the server and client. Creating a new session involves sending parameters in the form of capabilities, which tell the server what you want to automate and under what conditions. The server prepares the appropriate browser with any modifications as specified in the capabilities, and the session is then underway. Automation commands and responses are sent back and forth (keyed on the session id), until the client sends a request to delete the session, at which point the browser and other resources are quit or cleaned up, and the session id is discarded.
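
The session lifecycle described above can be sketched with a toy in-memory remote end. This is only an illustration of how state is keyed on the session id: the class `FakeRemoteEnd` and its method names are hypothetical, no browser is started, and no HTTP is involved.

```python
import uuid


class FakeRemoteEnd:
    """Hypothetical stand-in for a remote end; it only tracks session state."""

    def __init__(self):
        # Maps session id -> the capabilities the session was created with.
        self.sessions = {}

    def new_session(self, capabilities):
        # A real remote end would launch and configure a browser here.
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = capabilities
        return session_id

    def run_command(self, session_id, command):
        # Every command after New Session is keyed on the session id.
        if session_id not in self.sessions:
            raise LookupError("invalid session id")
        return f"ran {command} in session {session_id}"

    def delete_session(self, session_id):
        # A real remote end would quit the browser and clean up resources.
        self.sessions.pop(session_id, None)


remote = FakeRemoteEnd()
session_id = remote.new_session({"browserName": "firefox"})
print(remote.run_command(session_id, "Get Title"))
remote.delete_session(session_id)
```

Once the session is deleted, any further command using the same session id fails, which mirrors the "session id is discarded" step.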

Requests and Responses

Overview

When the client (called a local end) sends a request to the server (called a remote end), this is known in the spec as a 'command'. Since this is an HTTP protocol, commands have several components:

An HTTP verb
A path

The remote end at this point looks up the command based on the HTTP verb and path. The spec defines a list of endpoints that map verb + path to a command name. The path portion is actually a list of "URI Templates" that show how path components should be extracted as parameters for the command. For example, in:

/session/{session id}/element

The {session id} bit is saying that this component of the path is a "url variable" called session id whose value will be sent to the command (in this case Find Element). Once a command is matched to the request, other data is potentially parsed from the request body (these are the "parameters"), the command is executed (having been passed any url variables and request parameters), and a response is returned.
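
As a rough sketch of the lookup just described, the following matches a verb and path against a few URI templates and extracts the url variables. The `ENDPOINTS` subset and the helper names `template_to_regex` and `match_command` are illustrative assumptions, not something the spec prescribes.

```python
import re

# A small, illustrative subset of the endpoint table.
ENDPOINTS = [
    ("POST", "/session", "New Session"),
    ("DELETE", "/session/{session id}", "Delete Session"),
    ("POST", "/session/{session id}/element", "Find Element"),
    ("GET", "/session/{session id}/element/{element id}/text", "Get Element Text"),
]


def template_to_regex(template):
    """Turn a URI template into a compiled regex plus its variable names."""
    # re.split with a capture group alternates literal text and variable names.
    parts = re.split(r"\{([^}]+)\}", template)
    pattern, names = "", []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)   # literal path text
        else:
            names.append(part)           # url variable name
            pattern += "([^/]+)"         # one path component
    return re.compile("^" + pattern + "$"), names


def match_command(verb, path):
    """Return (command name, url variables), or None if nothing matches."""
    for method, template, command in ENDPOINTS:
        regex, names = template_to_regex(template)
        match = regex.match(path)
        if match and method == verb:
            return command, dict(zip(names, match.groups()))
    return None


print(match_command("POST", "/session/abc123/element"))
```

A fuller implementation would also distinguish "no path matched" from "path matched but verb did not", since those produce different errors.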

Request format and handling

A request from the local end to the remote end is a valid HTTP request, with a verb, path, and potentially a body. As mentioned above, the remote end validates the request and attempts to map it to a command. If no URI template matches the path, an unknown command error is returned; if the path matches but the HTTP verb does not, an unknown method error is returned (see below for what it means to return an error).

Two commands (New Session and Status) do not require a session id url variable. Every other command requires this variable, since every other command is executed in the context of an existing session. For those commands, the remote end immediately validates the session id against the list of active sessions. If it's not found, an invalid session id error is returned.

In the case of a POST request, the local end might have sent data in the request body. This data must always be JSON data. The remote end first parses it as JSON (if this fails, an invalid argument error is returned). If the result of the parse is not a JSON object (i.e., if it's a string or array or number or what have you), an invalid argument error is likewise returned. Otherwise, the result of parsing is the set of "parameters" which is passed to the command.

(In the case of a POST request without a request body, the "parameters" value is null.)
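
The body-handling rules above might look something like this in a remote end. It is a minimal sketch: `RequestError` and `parse_parameters` are hypothetical names, and returning `None` for non-POST requests is an assumption made here for simplicity.

```python
import json


class RequestError(Exception):
    """Hypothetical exception carrying a WebDriver error code."""

    def __init__(self, error, message):
        super().__init__(message)
        self.error = error


def parse_parameters(verb, body):
    """Return the command "parameters" for a request, or None if there are none."""
    if verb != "POST" or not body:
        # Assumption: only POST requests carry parameters; an empty body means null.
        return None
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError as exc:
        raise RequestError("invalid argument", "request body is not valid JSON") from exc
    if not isinstance(parsed, dict):
        # Arrays, strings, numbers and so on are valid JSON but not JSON objects.
        raise RequestError("invalid argument", "request body must be a JSON object")
    return parsed


print(parse_parameters("POST", '{"url": "https://example.com"}'))
```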

Response format

When a remote end sends an HTTP response, it first of all uses an appropriate HTTP status code and message (for example, 404 and no such element), based on the command that was attempted and the result. The spec defines status codes and messages for various responses, including success and error conditions (see below).

It then sets the following headers:

Content-Type: "application/json; charset=utf-8"
Cache-Control: "no-cache"

If any data needs to be returned with the response, it is serialized into a JSON object with the key value, e.g.:

{"value": null}

And this becomes the body of the HTTP response.

Normal responses

When an error has not occurred, the HTTP status is 200, and the response body is the appropriate JSON object with the response data in the value property of the JSON object.
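
For illustration, a remote end could assemble a successful response along these lines. The function name `success_response` and the `(status, headers, body)` tuple shape are assumptions of this sketch, not part of the protocol.

```python
import json


def success_response(value=None):
    """Build the (status, headers, body) triple for a successful command."""
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "Cache-Control": "no-cache",
    }
    # The response data always lives under the "value" key, even when it is null.
    body = json.dumps({"value": value})
    return 200, headers, body


status, headers, body = success_response("Example Domain")
print(status, body)
```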

Error handling

When an error occurs, the remote end first of all determines the appropriate error code and corresponding HTTP status code (see below for the full list). For example, if an element could not be found, the error code is no such element and the corresponding HTTP status code is 404. The remote end then constructs a data JSON object with the properties error, message, and stacktrace. Here error is just the JSON code for the error (see table below; usually the same as the error code itself). message is whatever implementation-specific error message is appropriate. And likewise stacktrace is an implementation-specific stacktrace useful to implementation maintainers in diagnosing any issues.

An example error JSON object could look like:

{
  "error": "no such element",
  "message": "My fake implementation couldn't find your element",
  "stacktrace": "Fake:21> Not a real stacktrace"
}

Since this JSON object becomes the data for the response, the full response from the remote end would be an HTTP status code of 404, the headers listed above, and finally the following JSON string as the response body:

{
  "value": {
    "error": "no such element",
    "message": "My fake implementation couldn't find your element",
    "stacktrace": "Fake:21> Not a real stacktrace"
  }
}
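
On the local end, a client library might unwrap such a response roughly as follows. `RemoteEndError` and `unwrap_response` are hypothetical names; the sketch assumes the body always parses as a JSON object with a `value` key.

```python
import json


class RemoteEndError(Exception):
    """Hypothetical client-side exception built from an error response."""

    def __init__(self, status, error, message, stacktrace):
        super().__init__(f"{error}: {message}")
        self.status = status
        self.error = error
        self.stacktrace = stacktrace


def unwrap_response(status, body):
    """Return the response value on success, or raise on an error response."""
    value = json.loads(body)["value"]
    if status == 200:
        return value
    # On error, "value" holds the object with error, message, and stacktrace.
    raise RemoteEndError(
        status, value.get("error"), value.get("message"), value.get("stacktrace")
    )


error_body = json.dumps({
    "value": {
        "error": "no such element",
        "message": "My fake implementation couldn't find your element",
        "stacktrace": "Fake:21> Not a real stacktrace",
    }
})
try:
    unwrap_response(404, error_body)
except RemoteEndError as exc:
    print(exc.status, exc.error)
```

Keying client behavior on the `error` string rather than on the HTTP status alone is usually the more precise choice, since several errors share a status code.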

Error codes

The following is a list of all the possible errors, their HTTP status codes, and their JSON error codes:

Error code	HTTP Status	JSON code	Description
element click intercepted	400	element click intercepted	The Element Click command could not be completed because the element receiving the events is obscuring the element that was requested clicked.
element not selectable	400	element not selectable	An attempt was made to select an element that cannot be selected.
element not interactable	400	element not interactable	A command could not be completed because the element is not pointer- or keyboard interactable.
insecure certificate	400	insecure certificate	Navigation caused the user agent to hit a certificate warning, which is usually the result of an expired or invalid TLS certificate.
invalid argument	400	invalid argument	The arguments passed to a command are either invalid or malformed.
invalid cookie domain	400	invalid cookie domain	An illegal attempt was made to set a cookie under a different domain than the current page.
invalid coordinates	400	invalid coordinates	The coordinates provided to an interactions operation are invalid.
invalid element state	400	invalid element state	A command could not be completed because the element is in an invalid state, e.g. attempting to click an element that is no longer attached to the document.
invalid selector	400	invalid selector	Argument was an invalid selector.
invalid session id	404	invalid session id	Occurs if the given session id is not in the list of active sessions, meaning the session either does not exist or that it’s not active.
javascript error	500	javascript error	An error occurred while executing JavaScript supplied by the user.
move target out of bounds	500	move target out of bounds	The target for mouse interaction is not in the browser’s viewport and cannot be brought into that viewport.
no such alert	404	no such alert	An attempt was made to operate on a modal dialog when one was not open.
no such cookie	404	no such cookie	No cookie matching the given path name was found amongst the associated cookies of the current browsing context’s active document.
no such element	404	no such element	An element could not be located on the page using the given search parameters.
no such frame	404	no such frame	A command to switch to a frame could not be satisfied because the frame could not be found.
no such window	404	no such window	A command to switch to a window could not be satisfied because the window could not be found.
script timeout	408	script timeout	A script did not complete before its timeout expired.
session not created	500	session not created	A new session could not be created.
stale element reference	404	stale element reference	A command failed because the referenced element is no longer attached to the DOM.
timeout	408	timeout	An operation did not complete before its timeout expired.
unable to set cookie	500	unable to set cookie	A command to set a cookie’s value could not be satisfied.
unable to capture screen	500	unable to capture screen	A screen capture was made impossible.
unexpected alert open	500	unexpected alert open	A modal dialog was open, blocking this operation.
unknown command	404	unknown command	A command could not be executed because the remote end is not aware of it.
unknown error	500	unknown error	An unknown error occurred in the remote end while processing the command.
unknown method	405	unknown method	The requested command matched a known URL but did not match a method for that URL.
unsupported operation	500	unsupported operation	Indicates that a command that should have executed properly cannot be supported for some reason.

The Endpoints

In this section, we go through each endpoint and examine its inputs and outputs and potential errors. The conventions I use are:

"URL variables": variable strings slotted into URI templates
"Request parameters": properties of the JSON object in the request body. Could be "None", which means no body
"Response value": the value of the value property of the response body, when that is a single, non-object value.
"Response properties": properties of a JSON object which is the value of the value property of the response body. For example, in this JSON response body:
```
{"value": {"foo": "bar"}}
```
I'm calling foo a "response property" with a value of "bar".
"Possible errors": errors and codes it's possible for the command to return in case of an error specific to that command. Note that regardless of what's in this list, it's always possible for some errors to occur (e.g., invalid session id or unknown error). As another example, most endpoints attempt to handle user prompts in the course of operation, which might result in unexpected alert open (see Handling User Prompts for more information). A value of "None" here means "no particularly relevant errors", not that it's not possible for an error to occur!

List of all endpoints

Method	URI Template	Command
POST	/session	New Session
DELETE	/session/{session id}	Delete Session
GET	/status	Status
GET	/session/{session id}/timeouts	Get Timeouts
POST	/session/{session id}/timeouts	Set Timeouts
POST	/session/{session id}/url	Go
GET	/session/{session id}/url	Get Current URL
POST	/session/{session id}/back	Back
POST	/session/{session id}/forward	Forward
POST	/session/{session id}/refresh	Refresh
GET	/session/{session id}/title	Get Title
GET	/session/{session id}/window	Get Window Handle
DELETE	/session/{session id}/window	Close Window
POST	/session/{session id}/window	Switch To Window
GET	/session/{session id}/window/handles	Get Window Handles
POST	/session/{session id}/frame	Switch To Frame
POST	/session/{session id}/frame/parent	Switch To Parent Frame
GET	/session/{session id}/window/rect	Get Window Rect
POST	/session/{session id}/window/rect	Set Window Rect
POST	/session/{session id}/window/maximize	Maximize Window
POST	/session/{session id}/window/minimize	Minimize Window
POST	/session/{session id}/window/fullscreen	Fullscreen Window
POST	/session/{session id}/element	Find Element
POST	/session/{session id}/elements	Find Elements
POST	/session/{session id}/element/{element id}/element	Find Element From Element
POST	/session/{session id}/element/{element id}/elements	Find Elements From Element
GET	/session/{session id}/element/active	Get Active Element
GET	/session/{session id}/element/{element id}/selected	Is Element Selected
GET	/session/{session id}/element/{element id}/attribute/{name}	Get Element Attribute
GET	/session/{session id}/element/{element id}/property/{name}	Get Element Property
GET	/session/{session id}/element/{element id}/css/{property name}	Get Element CSS Value
GET	/session/{session id}/element/{element id}/text	Get Element Text
GET	/session/{session id}/element/{element id}/name	Get Element Tag Name
GET	/session/{session id}/element/{element id}/rect	Get Element Rect
GET	/session/{session id}/element/{element id}/enabled	Is Element Enabled
POST	/session/{session id}/element/{element id}/click	Element Click
POST	/session/{session id}/element/{element id}/clear	Element Clear
POST	/session/{session id}/element/{element id}/value	Element Send Keys
GET	/session/{session id}/source	Get Page Source
POST	/session/{session id}/execute/sync	Execute Script
POST	/session/{session id}/execute/async	Execute Async Script
GET	/session/{session id}/cookie	Get All Cookies
GET	/session/{session id}/cookie/{name}	Get Named Cookie
POST	/session/{session id}/cookie	Add Cookie
DELETE	/session/{session id}/cookie/{name}	Delete Cookie
DELETE	/session/{session id}/cookie	Delete All Cookies
POST	/session/{session id}/actions	Perform Actions
DELETE	/session/{session id}/actions	Release Actions
POST	/session/{session id}/alert/dismiss	Dismiss Alert
POST	/session/{session id}/alert/accept	Accept Alert
GET	/session/{session id}/alert/text	Get Alert Text
POST	/session/{session id}/alert/text	Send Alert Text
GET	/session/{session id}/screenshot	Take Screenshot
GET	/session/{session id}/element/{element id}/screenshot	Take Element Screenshot
POST	/session/{session id}/print	Print Page

New Session

HTTP Method	Path Template
POST	/session

Capability	Key	Value Type	Description
Browser name	`browserName`	string	Identifies the user agent.
Browser version	`browserVersion`	string	Identifies the version of the user agent.
Platform name	`platformName`	string	Identifies the operating system of the endpoint node.
Accept insecure TLS certificates	`acceptInsecureCerts`	boolean	Indicates whether untrusted and self-signed TLS certificates are implicitly trusted on navigation for the duration of the session.
Page load strategy	`pageLoadStrategy`	string	Defines the current session’s page load strategy. Can be `none` (doesn't wait for readiness), `eager` (waits for document `interactive` state), or `normal` (waits for document `complete` state).
Proxy configuration	`proxy`	JSON Object	Defines the current session’s proxy configuration. This is a potentially complex object: see the spec for more info.
Window dimensioning/positioning	`setWindowRect`	boolean	Indicates whether the remote end supports all of the commands in Resizing and Positioning Windows.
Session timeouts configuration	`timeouts`	JSON Object	Describes the timeouts imposed on certain session operations, as described in the Set Timeouts command.
Unhandled prompt behavior	`unhandledPromptBehavior`	string	Describes the current session’s user prompt handler.
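
As a hedged sketch of how a client might send some of these capabilities, the helper below builds a New Session request body. The `capabilities` / `alwaysMatch` / `firstMatch` envelope comes from the official spec rather than from the table above, so check it there before relying on it; `new_session_body` is a hypothetical helper name.

```python
import json


def new_session_body(always_match, first_match=None):
    """Serialize a New Session request body from capability dictionaries."""
    capabilities = {"alwaysMatch": always_match}
    if first_match:
        # Optional list of alternative capability sets the remote end may choose from.
        capabilities["firstMatch"] = first_match
    return json.dumps({"capabilities": capabilities})


body = new_session_body({
    "browserName": "firefox",
    "acceptInsecureCerts": True,
    "pageLoadStrategy": "eager",
})
print(body)
```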

Strategy	Keyword
CSS selector	`css selector`
Link text selector	`link text`
Partial link text selector	`partial link text`
Tag name	`tag name`
XPath selector	`xpath`
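
To show where these keywords end up, here is a small sketch that builds the request body for Find Element. The `using` and `value` parameter names are taken from the official spec, not from the table above; `find_element_body` and `LOCATION_STRATEGIES` are names made up for this example.

```python
import json

# The strategy keywords from the table above.
LOCATION_STRATEGIES = {
    "css selector",
    "link text",
    "partial link text",
    "tag name",
    "xpath",
}


def find_element_body(using, value):
    """Serialize the request body for a Find Element style command."""
    if using not in LOCATION_STRATEGIES:
        # Failing early on the client; a remote end would report invalid argument.
        raise ValueError(f"unknown location strategy: {using!r}")
    return json.dumps({"using": using, "value": value})


print(find_element_body("css selector", "#login button"))
```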

jlipps/simple-wd-spec

jlipps