• Stars
    star
    201
  • Rank 188,340 (Top 4 %)
  • Language
    Python
  • License
    MIT License
  • Created almost 2 years ago
  • Updated 9 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

[NeurIPS 2022] πŸ›’WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

πŸ›’ WebShop

Python version License PyPI version Pytest workflow

Implementation of the WebShop environment and search agents for the paper:

WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao*, Howard Chen*, John Yang, Karthik Narasimhan

This repository contains code for reproducing results. If you find this work useful in your research, please cite:

@inproceedings{yao2022webshop,
  bibtex_show = {true},
  title = {WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents},
  author = {Yao, Shunyu and Chen, Howard and Yang, John and Narasimhan, Karthik},
  booktitle = {ArXiv},
  year = {preprint},
  html = {https://arxiv.org/abs/2207.01206},
  tag = {NLP}
}

πŸ“– Table of Contents

πŸ‘‹ Overview

WebShop is a simulated e-commerce website environment with 1.18 million real-world products and 12,087 crowd-sourced text instructions. In this environment, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase a product given an instruction. WebShop provides several challenges including understanding compositional instructions, query (re-)formulation, dealing with noisy text in webpages, and performing strategic exploration.

Hugging Face Demo: Devise your own natural language query for a product and ask for an agent trained with WebShop to find it on Amazon or eBay, deployed as a πŸ€— Hugging Face space here!

Python Package: If you would like to interact with the WebShop environment interface instead of installing and setting up the source code, you can install the webshop python package directly. The project page contains (WIP) documentation on how to use the library, and the library can be installed via pip install webshop.

πŸš€ Setup

Our code is implemented in Python. To setup, do the following:

  1. Install Python 3.8.13
  2. Install Java
  3. Download the source code:
> git clone https://github.com/princeton-nlp/webshop.git webshop
  1. Create a virtual environment using Anaconda and activate it
> conda create -n webshop python=3.8.13
> conda activate webshop
  1. Install requirements into the webshop virtual environment via the setup.sh script
> ./setup.sh [-d small|all]

The setup script performs several actions in the following order:

  • Installs Python dependencies listed in requirements.txt
  • Downloads product and instruction data for populating WebShop
  • Downloads spaCy en_core_web_lg model
  • Construct search engine index from product, instruction data
  • Downloads 50 randomly chosen trajectories generated by MTurk workers The -d flag argument allows you to specify whether you would like to pull the entire product + instruction data set (-d all) or a subset of 1000 random products (-d small).
  1. By default the WebShop only loads 1,000 products for a faster environment preview. To load all products, change web_agent_site/utils.py:
# DEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2_1000.json')
# DEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle_1000.json')
DEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2.json')
DEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle.json')
  1. (Optional) Download ResNet image feature files here and put into data/ for running models that require image features.

  2. (Optional) Human demonstration data and be downloaded here.

πŸ› οΈ Usage

The WebShop environment can be rendered in two modes - html and simple - each of which offer a different observation space. The simple mode strips away the extraneous meta-data that the html mode includes to make model training and evaluation easier.

Webpage Environment (html mode)

Launch the WebShop webpage:

> ./run_dev.sh

The site should then be viewable in the browser. Go to http://localhost:3000/ABC, where you should land on the search home page with a random instruction.

Navigating the website will automatically generate a corresponding trajectory file in the user_session_logs/mturk folder. Each file corresponds to a single instruction/web session, and each step of the file corresponds to a single action (i.e. search[...], click[...]).

The current WebShop build comes with two flags:

  • --log: Include this flag to create a trajectory .jsonl log file of actions on WebShop
  • --attrs: Include this flag to display an Attributes tab on the item_page of WebShop

Text Environment (simple mode)

The simple mode of the WebShop environment is packaged and readily available as an OpenAI environment. The OpenAI gym definitions of the text environment can be found in the web_agent_site/envs folder.

To start using the gym and building agents that interact with the WebShop environment, include the following statements in your Python file:

import gym
from web_agent_site.envs import WebAgentTextEnv

env = gym.make('WebAgentTextEnv-v0', observation_mode='text', num_products=...)

Now, you can write your own agent that interacts with the environment via the standard OpenAI gym interface.

Examples of a RandomPolicy agent interacting with the WebShop environment in both html and simple mode can be found in the run_envs folder. To run these examples locally, run the run_web_agent_text_env.sh or run_web_agent_site_env.sh script:

> ./run_web_agent_text_env.sh
Products loaded.
Keys Cleaned.
Attributes Loaded.
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1000/1000
Loaded 6910 goals.
Amazon Shopping Game [SEP] Instruction: [SEP] Find me slim f...
Available actions: {'has_search_bar': True, 'clickables': ['search']}
Taking action "search[shoes]" -> Reward = 0.0
...

In order to run the run_web_agent_site_env.sh script, you must download a version of ChromeDriver compatible with your Chrome browser version. Once you have downloaded and unzipped the executable, rename it chromedriver and place it in the webshop/envs folder.

Baseline Models

To run baseline models (rule, IL, RL, IL+RL) from the paper, please refer to the README.md in the baseline_models folder.

Sim-to-real Transfer

To read more about how the sim-to-real transfer of agents trained on WebShop to other environments works, please refer to the README.md in the transfer folder.

πŸ’« Contributions

We would love to hear from the broader NLP and Machine Learning community, and we welcome any contributions, pull requests, or issues! To do so, please either file a new pull request or issue and fill in the corresponding templates accordingly. We'll be sure to follow up shortly!

πŸͺͺ License

Check LICENSE.md

More Repositories

1

SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.29% of bugs in the SWE-bench evaluation set and takes just 1.5 minutes to run.
Python
10,387
star
2

tree-of-thought-llm

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Python
4,170
star
3

SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
Python
3,238
star
4

SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
Python
1,228
star
5

MeZO

[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
Python
975
star
6

PURE

[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812
Python
763
star
7

LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723
Python
707
star
8

DensePhrases

[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624
Python
593
star
9

LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Python
439
star
10

ALCE

[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
Python
380
star
11

AutoCompressors

[EMNLP 2023] Adapting Language Models to Compress Long Contexts
Python
227
star
12

LESS

Preprint: Less: Selecting Influential Data for Targeted Instruction Tuning
Jupyter Notebook
208
star
13

TRIME

[EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674
Python
185
star
14

CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
Python
180
star
15

intercode

[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
Python
168
star
16

OptiPrompt

[NAACL 2021] Factual Probing Is [MASK]: Learning vs. Learning to Recall https://arxiv.org/abs/2104.05240
Python
167
star
17

TransformerPrograms

[NeurIPS 2023] Learning Transformer Programs
Python
146
star
18

EntityQuestions

EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers https://arxiv.org/abs/2109.08535
Python
124
star
19

DinkyTrain

Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration πŸšƒ
Python
108
star
20

CEPE

Preprint: Long-Context Language Modeling with Parallel Encodings
Python
99
star
21

QuRating

Selecting High-Quality Data for Training Language Models
Python
85
star
22

NLProofS

EMNLP 2022: Generating Natural Language Proofs with Verifier-Guided Search https://arxiv.org/abs/2205.12443
Python
79
star
23

LLMBar

[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
Python
74
star
24

MQuAKE

[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
Jupyter Notebook
73
star
25

MADE

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering
Python
71
star
26

LM-Kernel-FT

A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
Python
68
star
27

USACO

Can Language Models Solve Olympiad Programming?
Python
66
star
28

calm-textgame

[EMNLP 2020] Keep CALM and Explore: Language Models for Action Generation in Text-based Games
Python
62
star
29

c-sts

[EMNLP 2023] C-STS: Conditional Semantic Textual Similarity
Python
59
star
30

ShortcutGrammar

EMNLP 2022: Finding Dataset Shortcuts with Grammar Induction https://arxiv.org/abs/2210.11560
Jupyter Notebook
57
star
31

DataMUX

[NeurIPS 2022] DataMUX: Data Multiplexing for Neural Networks
Jupyter Notebook
57
star
32

EvalConvQA

[ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Python
44
star
33

Collie

[ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks
Jupyter Notebook
44
star
34

MABEL

EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975
Python
36
star
35

rationale-robustness

NAACL 2022: Can Rationalization Improve Robustness? https://arxiv.org/abs/2204.11790
Python
26
star
36

InstructEval

Evaluation suite for the systematic evaluation of instruction selection methods.
Jupyter Notebook
23
star
37

LM-Science-Tutor

Python
22
star
38

WhatICLLearns

[ACL 2023 Findings] What In-Context Learning β€œLearns” In-Context: Disentangling Task Recognition and Task Learning
Python
21
star
39

Cognac

Repo for paper: Controllable Text Generation with Language Constraints
Python
19
star
40

PTP

Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
Python
18
star
41

semsup

Semantic Supervision: Enabling Generalization over Output Spaces
Python
16
star
42

datamux-pretraining

MUX-PLMs: Pretraining LMs with Data Multiplexing
Python
14
star
43

corpus-poisoning

[EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156
Python
14
star
44

XTX

[ICLR 2022 Spotlight] Multi-Stage Episodic Control for Strategic Exploration in Text Games
Python
13
star
45

SRL-NLC

Safe Reinforcement Learning with Natural Language Constraints
13
star
46

MultilingualAnalysis

Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"
Python
13
star
47

blindfold-textgame

[NAACL 2021] Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents
Python
12
star
48

dyck-transformer

[ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages
Python
11
star
49

metric-wsd

NAACL'2021: Non-Parametric Few-Shot Learning for Word Sense Disambiguation
Python
10
star
50

align-mlm

Python
10
star
51

semsup-xc

SemSup-XC: Semantic Supervision for Extreme Classification
Jupyter Notebook
10
star
52

lwm

We develop world models that can be adapted with natural language. Intergrating these models into artificial agents allows humans to effectively control these agents through verbal communication.
Python
7
star
53

CARETS

Python
6
star
54

Heuristic-Core

The code accompanying the paper "The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models" - https://arxiv.org/abs/2403.03942
Python
5
star
55

SPARTAN

SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
Python
5
star
56

attribute-tagging

[LaReL 2022] Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment
Python
4
star
57

NegotiationToM

Code release for Improving Dialog Systems for Negotiation with Personality Modeling.
Python
3
star
58

MoQA

Python
3
star
59

il-scaling-in-games

Official code repo of "Scaling Laws for Imitation Learning in NetHack"
Python
3
star