These are the scripts used for the getting the dataset shown here: https://rb.gy/lfqkx1
All the prerequisite files are in requirements.txt Also, install selenium webdriver firefox.
- Download this repository
- Install the requirements via $ pip install -r requirements.txt
scrape_benzinga.py is for getting scrape data for individual stocks. scrape_benzinga_full.py is for scraping the entire Benzinga news database.
To use scrape_benzinga.py,
- Put your script in this directory
- import scrape_benzinga
scrape_benzinga has two functions:
- get_benzinga_data
- get_benzinga_data_with_lookback
get_benzinga_data_with_lookback takes 1 argument: stock It returns all analyst ratings, not partner headlines because analyst ratings are the only type of article that can be scraped for their exact dates. I might add partner headline support in the future. get_benzinga_data takes 2 arguments: stock, days_to_lookback This gets the entire history of stock news for the designated stock whose release dates are within the days_to_lookback range: e.g. 7 day lookback returns all articles published earlier than 7 days ago. This function returns both analyst_ratings and partner_headlines
To use scraper_benzinga_full.py, simply call the script via python scraper_benzinga_full.py and follow the prompts shown on the command line.