  • Stars: 264
  • Rank 154,230 (Top 4%)
  • Language: Python
  • License: MIT License
  • Created over 9 years ago
  • Updated over 6 years ago

Repository Details

Easier wrangling of web data.

Soupy

Soupy is a wrapper around BeautifulSoup that makes it easier to build complex queries when wrangling web data.

Here's an example of a Soupy query.

from soupy import Soupy, Q

html = """
<div id="main">
  <div>The web is messy</div>
  and full of traps
  <div>but Soupy loves you</div>
</div>"""

print(Soupy(html).find(id='main').children
      .each(Q.text.strip()) # extract text from each node, trim whitespace
      .filter(len)          # remove empty strings
      .val())               # dump out of Soupy

# ['The web is messy', 'and full of traps', 'but Soupy loves you']

The same query using BeautifulSoup:

from bs4 import BeautifulSoup, NavigableString

html = """
<div id="main">
  <div>The web is messy</div>
  and full of traps
  <div>but Soupy loves you</div>
</div>"""

result = []
# name the parser explicitly; bs4 warns when it has to guess one
for node in BeautifulSoup(html, 'html.parser').find(id='main').children:
    if isinstance(node, NavigableString):
        text = node.strip()
    else:
        text = node.text.strip()
    if len(text):
        result.append(text)

print(result)
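
The chained style of the Soupy query can be illustrated without any dependencies. The sketch below is not Soupy's implementation: `Chain` is a hypothetical toy class, and the `raw` list is made-up input that stands in for the text of the child nodes. It only shows why `each`, `filter`, and `val` compose into a single expression, where the BeautifulSoup version needs a loop and a temporary list.

```python
class Chain:
    """Toy fluent wrapper around a list (hypothetical; not part of Soupy)."""

    def __init__(self, items):
        self._items = list(items)

    def each(self, func):
        # Apply func to every item and wrap the results in a new Chain.
        return Chain(func(item) for item in self._items)

    def filter(self, predicate):
        # Keep only the items for which predicate returns a truthy value.
        return Chain(item for item in self._items if predicate(item))

    def val(self):
        # Unwrap: hand back a plain Python list.
        return list(self._items)


# Assumed stand-in for the raw text of each child node.
raw = ["  The web is messy ", "\n", " and full of traps\n", "but Soupy loves you"]

cleaned = Chain(raw).each(str.strip).filter(len).val()
print(cleaned)
```

Each method returns a new wrapper, so the steps read top to bottom in the order they run, and `val()` is the only point where a plain Python value leaves the chain.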

For more information, see the Soupy Documentation.

Installation

pip install soupy

Dependencies

six and BeautifulSoup4

Soupy is supported on Python 2.6+ and 3.3+.

More Repositories

1. holdme (C, 55 stars): A fast Cython Texas Hold'em odds library
2. smother (Python, 43 stars): coverage, just moreso
3. mpl-modest-image (Python, 15 stars): Friendlier matplotlib interaction with large images
4. crime (CSS, 15 stars): Demo exploration of FBI crime statistics using Glue and plotly
5. mplfacet (Python, 11 stars): Faceted plots in matplotlib. Because nature abhors boilerplate code.
6. brut (Python, 11 stars): Machine Learning on the Milky Way Project DR1
7. oni (Python, 9 stars): Balance Calculator for Oxygen Not Included
8. mplgl (Python, 9 stars): Investigating an OpenGL backend for Matplotlib
9. beaumont-idl-library (IDL, 8 stars): Collection of IDL astrophysically-oriented routines. Written and maintained by Chris Beaumont
10. plotornot (Python, 7 stars): So plot. right now.
11. toasty (Python, 6 stars): Library to build WorldWide Telescope TOAST tiles
12. DendroDocs (Python, 3 stars): Documentation for Making, Visualizing, and Analyzing Dendrograms
13. IDLdep (Prolog, 3 stars): Find dependencies in IDL code
14. Euler (Python, 3 stars): Python Solutions to the Project Euler Problems
15. Dendro (Shell, 2 stars): Fast Dendrograms in C++
16. mpl-decompile (Python, 2 stars): The Matplotlib Decompiler
17. joblog (Python, 2 stars): Job management for scikit learn classifiers
18. pydata_talk (1 star)
19. dendro_idl (Prolog, 1 star): Fork of Erik Rosolowsky's IDL dendrogram code
20. glue-druid (Python, 1 star): Demo of druid data access in glue
21. The-Debris-Problem (Python, 1 star): Python code to solve the 2012 IACS computational challenge
22. glue_datacon_talk (Python, 1 star): Material for Glue talk at Boston DataCon
23. scidbpy-aflgen (Python, 1 star): Generate data for SciDBpy AFL bindings
24. retire (Jupyter Notebook, 1 star): Self-optimizing retirement strategies using utility theory
25. conda-pyside-tools (Shell, 1 star): Conda build recipe for pyside-tools
26. thesis (1 star): We're done here, school