niklasb/dryscrape

This repository has been archived on 10/Dec/2018
Stars: 532
Rank: 83,377 (Top 2 %)
Language: Python
License: MIT License
Created almost 13 years ago
Updated about 7 years ago

[not actively maintained] A lightweight Python library that uses WebKit to enable easy scraping of dynamic, JavaScript-heavy web pages

NOTE: This package is not actively maintained. It uses QtWebkit, which is end-of-life and probably doesn't get security fixes backported. Consider using a similar package like Spynner instead.

Overview

Author: Niklas Baumstark

dryscrape is a lightweight web scraping library for Python. It uses a headless WebKit instance to evaluate JavaScript on the visited pages. This enables painless scraping of plain web pages as well as JavaScript-heavy “Web 2.0” applications like Facebook.

It is built on the shoulders of capybara-webkit's webkit-server. A big thanks goes to thoughtbot, inc. for building this excellent piece of software!
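As a rough illustration of the workflow described above, the following sketch opens a session, visits a page, and returns the DOM after scripts have run. The helper name fetch_rendered_html and its session_factory parameter are assumptions made for this example (they are not part of dryscrape); the parameter only exists so the helper can be exercised without a WebKit installation. It assumes dryscrape and Xvfb are installed when the default factory is used.

```python
import sys


def fetch_rendered_html(base_url, path="/", session_factory=None):
    """Visit base_url + path and return the page body after scripts have run.

    session_factory is a hypothetical hook for this sketch: any callable that
    accepts base_url and returns an object with visit() and body().
    """
    if session_factory is None:
        # Imported lazily so the helper can be loaded without dryscrape present.
        import dryscrape

        if "linux" in sys.platform:
            # Start Xvfb in case no X server is running (requires xvfb).
            dryscrape.start_xvfb()
        session_factory = dryscrape.Session

    session = session_factory(base_url=base_url)
    session.visit(path)
    return session.body()
```

Calling fetch_rendered_html("http://example.com") would then go through dryscrape itself; passing a stand-in factory is useful for checking surrounding code offline.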

Changelog

1.0: Added Python 3 support, small performance fixes, header names are now properly normalized. Also added the function dryscrape.start_xvfb() to easily start Xvfb.
0.9.1: Changed semantics of the headers function in a backwards-incompatible way: It now returns a list of (key, value) pairs instead of a dictionary.
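Code written against the older dictionary-style result may want to turn the list of (key, value) pairs back into a mapping. The following is a minimal sketch in plain Python; the helper name headers_to_dict and the sample data are assumptions for illustration, not part of dryscrape. It keeps every value per header name, since a list of pairs can contain the same name more than once.

```python
from collections import defaultdict


def headers_to_dict(pairs):
    """Group (name, value) header pairs into {lowercased name: [values]}."""
    grouped = defaultdict(list)
    for name, value in pairs:
        # Header names are case-insensitive, so normalize the key.
        grouped[name.lower()].append(value)
    return dict(grouped)


# Hypothetical sample data in the shape described by the 0.9.1 changelog entry.
sample = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2"),
]
print(headers_to_dict(sample))
```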

Supported Platforms

The library has been confirmed to work on the following platforms:

Mac OS X 10.9 Mavericks and 10.10 Yosemite
Ubuntu Linux
Arch Linux

Other unixoid systems should work just fine.

Windows is not officially supported, although dryscrape should work with Cygwin.

A word about Qt 5.6

The 5.6 version of Qt removes the Qt WebKit module in favor of the new module Qt WebEngine. So far webkit-server has not been ported to WebEngine (and likely won't be in the near future), so Qt <= 5.5 is a requirement.
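The version constraint above can be expressed as a small check. This is only a sketch: the function name qt_supports_webkit is made up for this example, and it expects the Qt version to be supplied as a dotted string obtained elsewhere (for instance from the output of your Qt tooling).

```python
def qt_supports_webkit(version):
    """Return True if a dotted Qt version string is 5.5 or older.

    Only the major and minor components are compared; a missing minor
    component is treated as 0.
    """
    parts = tuple(int(p) for p in version.split(".")[:2])
    major, minor = (parts + (0,))[:2]
    return (major, minor) <= (5, 5)


for candidate in ("4.8.7", "5.5.1", "5.6.0"):
    print(candidate, qt_supports_webkit(candidate))
```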

Installation, Usage, API Docs

Documentation can be found at dryscrape's ReadTheDocs page.

Quick installation instructions for Ubuntu:

# apt-get install qt5-default libqt5webkit5-dev build-essential python-lxml python-pip xvfb
# pip install dryscrape

Contact, Bugs, Contributions

If you have any problems with this software, don't hesitate to open an issue on GitHub, open a pull request, or write an email to niklas baumstark at Gmail.

libc-database

Build a database of libc offsets to simplify exploitation

3dpwn

VirtualBox 3D exploits & PoCs

sploits

hack2win-chrome

This is collaborative work of Ned Williamson and Niklas Baumstark

contest-algos

bspfuzz

35c3ctf-challs

elgoog

elgoog/searchme challenge from 34C3 CTF / WCTF 2018: sources & exploit

webkit-server

[not actively maintained] The C++ webkit-server from capybara-webkit with useful extensions and Python bindings

memfuzzing

Memory fuzzing based on sinn3r's In Memory Fuzzer

ruby-dynamic-binding

Implements a flexible form of dynamic binding for Ruby that allows running a Proc inside a custom name lookup context

bingrep

A small utility to grep for pointers & binary data in memory dumps / live process memory

34c3ctf-sols

Solutions for my 34C3CTF challenges

dump-seccomp

GDB plugin to dump SECCOMP rules set via prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER)

tcr

ICPC team contest reference of German team hacKIT

ctf-tools

rpi-qemu

haskell-brainfuck

BF interpreter written in Haskell as a small exercise

ub-to-rce

kitbot

Yet another, minimalistic IRC bot

rubyfun

33c3ctf-mario

Source for mario challenge from 33C3 CTF

apache-ssl-key-extract

Modification of passe-partout utility (http://www.hsc.fr/ressources/outils/passe-partout/) to read memory from files instead of relying on ptrace

codingpad-ideone

A modification of the excellent Codingpad Chrome extension by Felix Kling that uses ideone.com as a backend instead of codepad.org.

33c3ctf-coercive

code and exploit for 33C3 CTF task 'coercive'

pbbs-maxflow

gdbinit

mona

Corelan Repository for mona.py

save-the-robot

boxes

Stuff to manage virtual machines

arch-initramfs-dropbear-decrypt

mkinitcpio hooks for Arch Linux to unlock encrypted partitions on boot via remote login

chrome-builds

linux-syscalls

Create tables to get an overview of system call numbers and signatures for x86 and x86-64

haskell-soy

A Haskell implementation of Google's Closure Templates

vvz-ssh

An SSH tunnel for VVZ

niklasb.github.com

My github pages

sudoku-pdf

A set of scripts to generate Sudoku puzzles and write them to a PDF

lz-index

Implementation of an LZ index based on the SDSL library

sslutils

Some helpful(?) stuff for working with CAs

winhook

crhash

A customizable hash brute forcer

ctf-glicko2

Source code for Glicko-2 rating app for CTF teams 2016.

ida-colors

gpuc-rainbow

test

webgdb

dotfiles

linux-config

xkcd-hash

faustctf-vpn-gateway

VPN setup used for FaustCTF

vimrc

My vimrc (loosely based on https://github.com/nvie/vimrc)

contest-tasks-webapp

Hosted at http://dtun.de/tasks/

linux-notes

Notes for Linux stuff

random-scripts