  • Stars: 130
  • Rank: 277,575 (Top 6 %)
  • Language: Python
  • Created over 7 years ago
  • Updated almost 5 years ago

Repository Details

Code for the second edition Web Scraping with Python book by Packt Publications

Web Scraping with Python

Welcome to the code repository for Web Scraping with Python, Second Edition! I hope you find the code and data here useful. If you have any questions, reach out to @kjam on Twitter or GitHub.

Code Structure

All of the code samples are in folders separated by chapter. Scripts are intended to be run from the code folder, allowing you to easily import from the chapters.

Code Examples

I have not included every code sample found in the book, but I have included most of the finished scripts. Even so, I encourage you to write out each code sample on your own and use these only as a reference.

Firefox Issues

Depending on your version of Firefox and Selenium, you may run into JavaScript errors. Here are some fixes:

  • Use an older version of Firefox
  • Upgrade Selenium to >=3.0.2 and download the geckodriver. Make sure the geckodriver can be found via your PATH variable, for example by adding its folder to PATH in your .bashrc or .bash_profile. (Wondering what these are? Please read Appendix C on learning the command line.)
  • Use PhantomJS with Selenium (change your browser line to webdriver.PhantomJS('path/to/your/phantomjs/installation'))
  • Use Chrome, InternetExplorer or any other supported browser
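If you are unsure whether the geckodriver is visible on your PATH, a quick check from Python can help. The following is a minimal sketch using only the standard library; the helper names driver_on_path and path_with_dir, and the folder /opt/drivers, are hypothetical and not part of the book's code.

```python
import os
import shutil


def driver_on_path(name="geckodriver", path=None):
    """Return the full path to `name` if it is found on PATH, else None.

    `path` is an optional PATH-style string; when None, the PATH
    environment variable of the current process is searched.
    """
    return shutil.which(name, path=path)


def path_with_dir(directory, current=None):
    """Return a PATH-style string with `directory` appended once.

    This only builds the string. It does not change your shell
    configuration or the environment of the current process.
    """
    if current is None:
        current = os.environ.get("PATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if directory not in parts:
        parts.append(directory)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    found = driver_on_path()
    if found:
        print("geckodriver found at:", found)
    else:
        # /opt/drivers is a placeholder; use the folder where you
        # actually saved the geckodriver.
        print("geckodriver not found. A PATH that includes it could look like:")
        print(path_with_dir("/opt/drivers"))
```

If the check prints a location, Selenium should be able to start Firefox without extra configuration. If not, add the driver's folder to PATH and open a new terminal before trying again.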

Feel free to reach out if you have any questions!

Issues with Module Import

Seeing chp1 ModuleNotFound errors? Try adding this snippet to the file:

import os
import sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir)))

This appends the main code folder (the parent of the script's own folder) to your system path, which is where Python looks for imports. On some installations, I have noticed the current directory is not added automatically (although that is common practice), so this code explicitly adds the folder to your path.
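To see which folder the snippet resolves to, it can be wrapped in a small function. This is only a sketch: the name add_parent_to_path and the example file location are hypothetical, and the duplicate check is an addition that the original snippet does not have.

```python
import os
import sys


def add_parent_to_path(file_path):
    """Append the parent of file_path's folder to sys.path and return it.

    For a script at code/chp2/script.py this resolves to the code folder,
    which is what makes imports such as `chp1` importable.
    """
    parent = os.path.abspath(
        os.path.join(os.path.dirname(file_path), os.pardir)
    )
    # Skip the append when the folder is already listed, so repeated
    # calls do not grow sys.path.
    if parent not in sys.path:
        sys.path.append(parent)
    return parent


if __name__ == "__main__":
    # Hypothetical location, used only to show the resolved folder.
    example = os.path.join(os.sep, "tmp", "code", "chp2", "script.py")
    print(add_parent_to_path(example))
```

In a real script you would pass `__file__` instead of the example path, which matches what the three-line snippet does.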

Corrections?

If you find any issues in these code examples, feel free to submit an Issue or Pull Request. I appreciate your input!

First edition repository

If you are looking for the first edition's repository, you can find it here: Web Scraping with Python, First Edition

Questions?

Reach out to @kjam on Twitter or GitHub. @kjam is also often on freenode. :)

More Repositories

  1. data-cleaning-101: Data Cleaning Libraries with Python (Jupyter Notebook, 279 stars)
  2. python-web-scraping-tutorial: A Python-based web and data scraping tutorial (Python, 210 stars)
  3. data-pipelines-course: Course materials for my data pipeline video course with O'Reilly (Jupyter Notebook, 194 stars)
  4. data-wrangling-pycon: An Introduction to Data Wrangling with Python (Jupyter Notebook, 81 stars)
  5. practical-data-privacy: Practical Data Privacy (Jupyter Notebook, 70 stars)
  6. python_flight_search: Using Python to search for flights. (Python, 54 stars)
  7. datafuzz: A data science Python library aimed at adding fuzz, noise and other issues to your data for testing purposes. (Python, 30 stars)
  8. data-wrangling-video: Code and examples for O'Reilly's Data Wrangling with Python video course (Jupyter Notebook, 28 stars)
  9. intro-to-ml: A basic introduction to machine learning (one day training). (Jupyter Notebook, 16 stars)
  10. random_hackery: Just little bits. (Jupyter Notebook, 10 stars)
  11. europarl_scraper: European Parliament website Python scraper (Jupyter Notebook, 9 stars)
  12. uf-data-mining-and-analysis: University of Florida Data Mining and Analysis (Jupyter Notebook, 8 stars)
  13. web-scraping-speed-comparison: A Python web scraping speed comparison (Python, 6 stars)
  14. uf-intro-to-programming: University of Florida Audience Analytics Introduction to Programming with Data course (HTML, 6 stars)
  15. cherrypy-poll: Polling with cherrypy: A beginner's project guide to python programming (Python, 6 stars)
  16. kjam-datalab-notebooks: Some Example Jupyter Notebooks using Google's DataLab (4 stars)
  17. cron-parser: Python script that allows you to easily update a server cron that has many different projects without overwriting other crons. (Python, 1 star)
  18. chatbot_scraper: Python scraper(s) for chatbot logs. Currently supports botbot.me logs. (Python, 1 star)