• Stars: 210
• Rank: 187,585 (Top 4 %)
• Language: Python
• Created: over 10 years ago
• Updated: about 4 years ago

Repository Details

A Python-based web and data scraping tutorial

PyCon Introduction to Web and Data Scraping Tutorial

A tutorial-based introduction to web scraping with Python.

Virtual Env

If you'd like to use a virtual environment, follow the instructions below. It is not required for the tutorial, but it may be helpful.

For more details on virtual environments

If you don't have virtualenvwrapper and/or pip:

$ easy_install pip
$ pip install virtualenvwrapper

and read the additional instructions here

$ mkvirtualenv scraper_tutorial
$ pip install -r requirements.txt
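
If you are unsure whether the environment is active before you install the requirements, a quick check from Python can help. This is a minimal sketch and not part of the tutorial itself; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints `False` right after `mkvirtualenv`, the environment is probably not activated in the current shell.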

LXML and Selenium

You will need both LXML and Selenium to follow this tutorial in its entirety.
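
To confirm that both packages are installed, something like the following could be run. It is a sketch under assumptions: the helper name `missing_packages` is hypothetical, and it only checks that the `lxml` and `selenium` packages can be found for import, not that they work correctly.

```python
import importlib.util

# Top-level import names of the two packages the tutorial relies on.
REQUIRED = ("lxml", "selenium")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("lxml and selenium are both available.")
```

Anything reported as not installed can usually be added with `pip install`.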

If you are using a Mac, I highly recommend Homebrew. It makes installing the system dependencies much easier, so that pip install runs smoothly.

If you are using Windows, it may be worth running the tutorial inside a Linux virtual machine. If you are a Windows + Python guru, please follow these installation instructions. I can help as needed, but I have not programmed on Windows in more than 5 years.

Please reach out to me if you have any questions on getting the initial requirements set up. Thanks!

Firefox Web Browser

Firefox comes as the default web driver for Selenium. To use Selenium easily, please download and install Firefox.
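
Once Firefox and Selenium are installed, a short script can confirm that the two work together. The following is a minimal sketch, not code from the tutorial: the function name `page_title` and the example URL are assumptions, and recent Selenium releases may also need a matching geckodriver to drive Firefox.

```python
def page_title(url):
    """Open `url` in Firefox through Selenium and return the page title."""
    # Imported here so the function can be defined even if Selenium is missing.
    from selenium import webdriver

    driver = webdriver.Firefox()  # launches a Firefox window
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()


if __name__ == "__main__":
    # Hypothetical test page; any reachable URL would do.
    print(page_title("https://www.python.org"))
```

If a Firefox window opens briefly and a title is printed, the setup is ready for the Selenium parts of the tutorial.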

Using PIP

If you have never used pip before, you will need to run sudo easy_install pip or brew install pip. pip is a Python package manager, and it's really super, so I highly advise using it!

Questions?

/msg kjam on Freenode or @kjam on Twitter

More Repositories

1. data-cleaning-101: Data Cleaning Libraries with Python (Jupyter Notebook, 279 stars)
2. data-pipelines-course: Course materials for my data pipeline video course with O'Reilly (Jupyter Notebook, 194 stars)
3. wswp: Code for the second edition Web Scraping with Python book by Packt Publications (Python, 130 stars)
4. data-wrangling-pycon: An Introduction to Data Wrangling with Python (Jupyter Notebook, 81 stars)
5. practical-data-privacy: Practical Data Privacy (Jupyter Notebook, 70 stars)
6. python_flight_search: Using Python to search for flights. (Python, 54 stars)
7. datafuzz: A data science Python library aimed at adding fuzz, noise and other issues to your data for testing purposes. (Python, 30 stars)
8. data-wrangling-video: Code and examples for O'Reilly's Data Wrangling with Python video course (Jupyter Notebook, 28 stars)
9. intro-to-ml: A basic introduction to machine learning (one day training). (Jupyter Notebook, 16 stars)
10. random_hackery: Just little bits. (Jupyter Notebook, 10 stars)
11. europarl_scraper: European Parliament website Python scraper (Jupyter Notebook, 9 stars)
12. uf-data-mining-and-analysis: University of Florida Data Mining and Analysis (Jupyter Notebook, 8 stars)
13. web-scraping-speed-comparison: A Python web scraping speed comparison (Python, 6 stars)
14. uf-intro-to-programming: University of Florida Audience Analytics Introduction to Programming with Data course (HTML, 6 stars)
15. cherrypy-poll: Polling with cherrypy: A beginner's project guide to Python programming (Python, 6 stars)
16. kjam-datalab-notebooks: Some Example Jupyter Notebooks using Google's DataLab (4 stars)
17. cron-parser: Python script that allows you to easily update a server cron that has many different projects without overwriting other crons. (Python, 1 star)
18. chatbot_scraper: Python scraper(s) for chatbot logs. Currently supports botbot.me logs. (Python, 1 star)