Workflow-Guided Exploration: sample-efficient RL agent for web tasks

wge

Authors: Evan Zheran Liu*, Kelvin Guu*, Panupong (Ice) Pasupat*, Tianlin Shi, Percy Liang (* equal contribution)

Source code accompanying our ICLR 2018 paper:
Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration

Reproducible experiments using this code are located on our Codalab worksheet.

Purpose

The goal of this project is to train machine learning models (agents) to carry out tasks in a browser that are specified in natural language, e.g. "Book a flight from San Francisco to New York for Dec 23rd."

Setup

General setup

  • Python dependencies

    pip install -r requirements.txt
    
    • If this gives you problems, try again and add pip's --ignore-installed flag.
  • Node and npm

    • Make sure Node and npm are installed (for example, with Homebrew: brew install node). If they are, node -v and npm -v should print version numbers.
  • PyTorch

    • Install PyTorch v0.1.12. Newer versions of PyTorch are not backwards compatible with this code.
  • Selenium

    • Outside this repository, download ChromeDriver. Unzip it, then add the directory containing the chromedriver executable to the PATH environment variable:
      export PATH=$PATH:/path/to/chromedriver
      
    • If instead you're using Anaconda, use conda install -c conda-forge selenium.

Data directory setup

  • This code depends on the environment variable $RL_DATA being set and pointing to a configured data directory.

  • Create a data directory with mkdir -p /path/to/data, then run export RL_DATA=/path/to/data so that $RL_DATA points at it.

  • Next, set up the data directory:

    cd $RL_DATA
    # Download glove from https://nlp.stanford.edu/data/glove.6B.zip and place
    # in current directory however you want
    # Suggested: wget https://nlp.stanford.edu/data/glove.6B.zip
    unzip glove.6B.zip
    mv glove.6B glove
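
The layout produced by the steps above can be checked from Python before a long training run. This is a minimal sketch, assuming only what the setup describes: $RL_DATA names an existing directory that contains a glove subdirectory. The function name check_data_dir is hypothetical and is not part of this repository.

```python
import os
from pathlib import Path


def check_data_dir(data_dir):
    """Return a list of human-readable problems found in the data directory.

    An empty list means the layout matches the setup steps above:
    the directory exists and contains a `glove` subdirectory.
    (Hypothetical helper, not part of the repository.)
    """
    if not data_dir:
        return ["RL_DATA is not set"]
    problems = []
    root = Path(data_dir)
    if not root.is_dir():
        problems.append(f"{root} is not a directory")
    elif not (root / "glove").is_dir():
        problems.append(f"{root / 'glove'} is missing (unzip and rename GloVe first)")
    return problems


if __name__ == "__main__":
    # Print every problem found; prints nothing when the layout looks right.
    for problem in check_data_dir(os.environ.get("RL_DATA")):
        print("problem:", problem)
```

The check only looks at directory names. It does not verify the contents of the GloVe files.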
    

Demonstration directory setup

# Where $REPO_DIR is the path to the root of this Git repository.
git clone https://github.com/stanfordnlp/miniwob-plusplus-demos.git $REPO_DIR/third-party/miniwob-demos
export RL_DEMO_DIR=$REPO_DIR/third-party/miniwob-demos/

MiniWoB setup

  • There are 2 ways to access MiniWoB tasks:
    1. Use the file:// protocol (Recommended): Open miniwob-sandbox/html/ in the browser, and then export the URL to the MINIWOB_BASE_URL environment variable:
    export MINIWOB_BASE_URL='file:///path/to/miniwob-sandbox/html/'
    
    2. Run a simple server: go to miniwob-sandbox/html/ and run the supplied http-serve.
    • The tasks should now be accessible at http://localhost:8080/miniwob/
    • To use a different port (say 8765), run http-serve 8765, and then export the following to the MINIWOB_BASE_URL environment variable:
    export MINIWOB_BASE_URL='http://localhost:8765/'
    
  • Once you've followed one of the steps above, test MiniWoBEnvironment by running
    pytest wge/tests/miniwob/test_environment.py -s
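
For the file:// option, the base URL can also be derived from the directory path instead of being copied out of the browser. The sketch below assumes a POSIX-style absolute path and uses only the standard library; miniwob_file_url is a hypothetical helper name, and the trailing slash mirrors the example values shown above.

```python
from pathlib import Path


def miniwob_file_url(html_dir):
    """Turn an absolute path to miniwob-sandbox/html into a file:// base URL.

    Hypothetical helper, not part of the repository.
    """
    # as_uri() raises ValueError for relative paths, which catches a common mistake.
    url = Path(html_dir).as_uri()
    # The example values above end with a slash, so do the same here.
    return url if url.endswith("/") else url + "/"


# Placeholder path; substitute the real location of miniwob-sandbox/html.
print(miniwob_file_url("/path/to/miniwob-sandbox/html"))
```

The printed value can then be exported as MINIWOB_BASE_URL.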
    

MiniWoB versions of FormWoB

Follow the "Run a simple server" instruction in the MiniWoB setup section above.

Launching an Experiment

To train a model on a task, run:

python main.py configs/default-base.txt --task click-tab-2
  • This executes the main entrypoint script, main.py, passing it a base config file in HOCON format and the task click-tab-2.
  • Additional configs from configs/config-mixins can be merged in by passing them as extra command-line arguments.
  • Make sure that the following environment variables are set: MINIWOB_BASE_URL, RL_DEMO_DIR, REPO_DIR.
  • You may also want to set PYTHONPATH to the same directory as REPO_DIR so that imports resolve correctly.
  • You can also run this via Docker by first running python run_docker.py to launch Docker and then running the above command. Unfortunately, you will not be able to see the model train in the Docker container.
  • The available tasks can be found in the subdirectories of third-party/miniwob-sandbox/html.

If the script is working, you should see several Chrome windows pop up (operated by Selenium) and a training progress bar in the terminal.
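
If the script fails at startup, an unset environment variable is one thing worth ruling out. The following is a small sketch of such a check, not something shipped with the repository: the variable names come from the setup sections above, while missing_env_vars and REQUIRED_VARS are hypothetical names.

```python
import os

# Variable names taken from the setup sections above.
REQUIRED_VARS = ("MINIWOB_BASE_URL", "RL_DEMO_DIR", "REPO_DIR", "RL_DATA")


def missing_env_vars(environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Set these variables before running main.py:", ", ".join(missing))
```

Passing the mapping in as an argument, rather than reading os.environ inside the function, keeps the check easy to test.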

Experiment management

All training runs are managed by the MiniWoBTrainingRuns object. For example, to get training run #141, do this:

runs = MiniWoBTrainingRuns()
run = runs[141]  # a MiniWoBTrainingRun object

A TrainingRun is responsible for constructing a model, training it, saving it and reloading it (see superclasses gtd.ml.TrainingRun and gtd.ml.TorchTrainingRun for details).

The most important methods on MiniWoBTrainingRun are:

  • __init__: the policy, the environment, demonstrations, etc. are all loaded here.
  • train: the actual training of the policy happens here.

Model architecture

During training, there are several key systems involved:

  • the environment
  • policies
    • the model policy
    • the exploration policy
  • episode generators
    • basic episode generator
    • best first episode generator
  • the replay buffer

Environment

All environments implement the Environment interface. A policy interacts with the environment by calling the environment's step method and passing in actions.

Note that an environment object is batched. It actually represents a batch of environments, each running in parallel (so that we can train faster).

We mostly use MiniWoBEnvironment and FormWoBEnvironment.
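
To make the batching idea concrete, here is a toy sketch of an environment whose step method takes one action per sub-environment. It is not the repository's Environment interface: the class ToyBatchedEnv, the counter task, and the (states, rewards, dones) return value are all assumptions chosen for illustration.

```python
class ToyBatchedEnv:
    """Hypothetical stand-in for a batched environment (not the repository's class).

    Each of the `batch_size` sub-environments holds a counter. An action is added
    to the counter, and the episode ends with reward 1.0 once it reaches `target`.
    """

    def __init__(self, batch_size, target=3):
        self.target = target
        self.counters = [0] * batch_size

    def reset(self):
        """Reset every sub-environment and return the initial states."""
        self.counters = [0] * len(self.counters)
        return list(self.counters)

    def step(self, actions):
        """Apply one action per sub-environment and return per-environment results."""
        if len(actions) != len(self.counters):
            raise ValueError("expected exactly one action per sub-environment")
        self.counters = [c + a for c, a in zip(self.counters, actions)]
        dones = [c >= self.target for c in self.counters]
        rewards = [1.0 if done else 0.0 for done in dones]
        return list(self.counters), rewards, dones


env = ToyBatchedEnv(batch_size=2)
env.reset()
states, rewards, dones = env.step([1, 3])
print(states, rewards, dones)  # [1, 3] [0.0, 1.0] [False, True]
```

The point of the sketch is the shape of the call: one list of actions in, one list per result out, with index i always referring to the same sub-environment.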

Policies

See the Policy interface. The most important methods are act, update_from_episodes and update_from_replay_buffer.

Note that all of these methods are also batched (i.e. they operate on multiple episodes in parallel).

The model policy is the main one that we are trying to train. See MiniWoBPolicy as an example.

Episode generators

See the EpisodeGenerator interface. An EpisodeGenerator runs a Policy on an Environment to produce an Episode.
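
The loop at the heart of an episode generator can be sketched in a few lines. This is a simplified, unbatched illustration under stated assumptions: act and step are hypothetical callables standing in for a policy and an environment, and an episode is represented as a plain list of (state, action, reward) tuples, which may differ from the repository's Episode type.

```python
def generate_episode(act, step, initial_state, max_steps=10):
    """Roll out one episode as a list of (state, action, reward) tuples.

    `act(state) -> action` plays the role of the policy, and
    `step(state, action) -> (next_state, reward, done)` the role of the environment.
    Both are hypothetical callables, not the repository's interfaces.
    """
    episode, state = [], initial_state
    for _ in range(max_steps):
        action = act(state)
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        state = next_state
        if done:
            break
    return episode


def toy_step(state, action):
    """Toy task: count up from 0; reaching 3 ends the episode with reward 1.0."""
    next_state = state + action
    done = next_state >= 3
    return next_state, (1.0 if done else 0.0), done


episode = generate_episode(act=lambda state: 1, step=toy_step, initial_state=0)
print(episode)  # [(0, 1, 0.0), (1, 1, 0.0), (2, 1, 1.0)]
```

The max_steps cap keeps the rollout finite even when the policy never reaches a terminal state.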

Replay buffer

See the ReplayBuffer interface. A ReplayBuffer stores episodes produced by the exploration policy. The final model policy is trained on episodes sampled from the replay buffer.
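
As a rough illustration of the store-then-sample pattern, the sketch below keeps a bounded number of episodes and samples from them uniformly. The class ToyReplayBuffer, the oldest-first eviction, and the uniform sampling are assumptions made for the example; the repository's ReplayBuffer implementations may store and prioritize episodes differently.

```python
import random
from collections import deque


class ToyReplayBuffer:
    """Hypothetical fixed-capacity buffer (not the repository's ReplayBuffer)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest episode when full.
        self._episodes = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._episodes)

    def extend(self, episodes):
        """Add episodes, evicting the oldest ones if capacity is exceeded."""
        self._episodes.extend(episodes)

    def sample(self, num_episodes):
        """Sample without replacement, capped at the number of stored episodes."""
        k = min(num_episodes, len(self._episodes))
        return self._rng.sample(list(self._episodes), k)


buffer = ToyReplayBuffer(capacity=2, seed=0)
buffer.extend(["episode-a", "episode-b", "episode-c"])  # "episode-a" is evicted
batch = buffer.sample(5)
print(len(buffer), sorted(batch))  # 2 ['episode-b', 'episode-c']
```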

Configuration

All configs are in the configs folder. They are specified in HOCON format. The arguments to main.py should be a list of paths to config files. main.py then merges these config files according to the rules explained here.
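
The general idea behind merging HOCON objects is that later values override earlier ones, while nested objects are combined key by key. The sketch below imitates that with plain Python dicts. It is a simplified illustration, not what main.py runs: it ignores HOCON features such as substitutions, and the config keys shown are hypothetical.

```python
def merge_configs(base, override):
    """Return a new dict where `override` is layered on top of `base`.

    When both sides hold a dict under the same key, the two are merged
    recursively; otherwise the value from `override` wins.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical config contents, used only to show the layering.
base = {"policy": {"type": "base", "lr": 0.001}, "seed": 0}
mixin = {"policy": {"lr": 0.01}, "log": {"every": 10}}
print(merge_configs(base, mixin))
# {'policy': {'type': 'base', 'lr': 0.01}, 'seed': 0, 'log': {'every': 10}}
```

Because the function returns a new dict at every level it touches, the base config is left unmodified and can be reused with other mixins.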
