

Blogbar, an aggregator for personal blogs.

Blogbar

http://www.blogbar.cc

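The death of the personal blog is also its rebirth.

Leave the rapid delivery of information to newer media, and let the personal blog return to its original place: a tool for carving and distilling information.

The world is too noisy; here there are only personal frequencies.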


## Tech stack

## Development environment setup

git clone https://github.com/blogbar/blogbar.git
cd blogbar
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
bower install

Import db/blogbar.sql into your local database.

Save config/development_sample.py as config/development.py and update the configuration values as needed.

python manage.py run

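## Extending

If a blog does not provide a feed but its content is highly valuable (for example Livid, Wang Yin (王垠), Lifesinger, and so on), you can add support for it by subclassing BaseSpider, the base class for blog crawlers (located in spiders/base.py). The steps are as follows: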

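#### Assign class variables

Reassign the following class variables in the subclass:

url = ""  # blog URL
posts_url = ""  # URL of the page listing the posts (optional; only needed when it differs from the blog URL)
title = ""  # blog title
subtitle = ""  # blog subtitle (optional)
author = ""  # blog author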
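
As a sketch of how these variables fit together, the snippet below declares two hypothetical subclasses. The `BaseSpider` here is a simplified stand-in so the snippet runs on its own (the real class lives in spiders/base.py and may differ), and `listing_url` is an assumed helper, not part of Blogbar. It only illustrates the rule stated in the comments above: `posts_url` is needed only when the post list lives at a different address than the blog itself.

```python
class BaseSpider(object):
    """Simplified stand-in for the real BaseSpider in spiders/base.py (assumption)."""
    url = ""
    posts_url = ""
    title = ""
    subtitle = ""
    author = ""

    @classmethod
    def listing_url(cls):
        # Hypothetical helper: use posts_url when set, otherwise fall back to url.
        return cls.posts_url or cls.url


class ArchiveSpider(BaseSpider):
    # The post list lives on a separate archive page, so posts_url is set.
    url = "http://blog.example.com"
    posts_url = "http://blog.example.com/archive"
    title = "Example Blog"
    author = "Example Author"


class HomepageSpider(BaseSpider):
    # The post list is on the homepage, so posts_url is left empty.
    url = "http://another.example.com"
    title = "Another Blog"
    author = "Another Author"


print(ArchiveSpider.listing_url())
print(HomepageSpider.listing_url())
```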

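#### Override methods

Override the following 2 methods:

  • get_posts: fetch the list of posts
  • get_post: fetch the content of a single post

For details, see the BaseSpider class and the lxml library, which is used to extract content from the pages.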
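
The return shape of get_posts is not spelled out here. Judging from the Livid example in this README, each post is a dict with the keys `url`, `title`, and `published_at` (a `datetime`). Under that assumption, a small check like the following could catch missing fields before a spider is submitted. `check_posts` and `REQUIRED_KEYS` are hypothetical names, not part of Blogbar.

```python
from datetime import datetime

# Keys taken from the Livid example in this README (assumed to be the expected shape).
REQUIRED_KEYS = ("url", "title", "published_at")


def check_posts(posts):
    """Return a list of problems found; an empty list means the posts look fine."""
    problems = []
    for index, post in enumerate(posts):
        for key in REQUIRED_KEYS:
            if key not in post:
                problems.append("post %d: missing '%s'" % (index, key))
        published_at = post.get("published_at")
        if published_at is not None and not isinstance(published_at, datetime):
            problems.append("post %d: 'published_at' is not a datetime" % index)
    return problems


sample = [
    {"url": "/post/1", "title": "Hello", "published_at": datetime(2014, 1, 5)},
    {"url": "/post/2", "title": "No date"},
]
print(check_posts(sample))
```

The check works on plain dicts, so it can run on whatever a spider's get_posts returns without touching the network.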

#### Debugging

To inspect the scraped results while writing a spider, use the test methods provided by test_spider.py:

  • $ python test_spider.py get_posts
  • $ python test_spider.py get_post
  • $ python test_spider.py all

See test_spider.py for details.

#### Submitting

Once the tests pass, open a pull request.

#### Example

The following example code scrapes Livid's blog:

# coding: utf-8
from .base import BaseSpider, get_inner_html
from datetime import datetime


class LividSpider(BaseSpider):
    url = "http://livid.v2ex.com"
    title = "Livid"
    author = "Livid"

    @staticmethod
    def get_posts(tree):
        # Collect the URL, title, and publish date of every entry in the post list.
        posts = []
        for li in tree.cssselect('.posts li'):
            date_element = li.cssselect('span')[0]
            # The format is day, abbreviated month name, four-digit year (e.g. "05 Jan 2014").
            published_at = datetime.strptime(date_element.text_content(), "%d %b %Y")
            link = li.cssselect('a')[0]
            posts.append({
                'url': link.get('href'),
                'title': link.text_content(),
                'published_at': published_at
            })
        return posts

    @staticmethod
    def get_post(tree):
        # Return the inner HTML of the element that holds the post body.
        content_element = tree.cssselect('div.span10')[0]
        return get_inner_html(content_element)
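
The date handling in the example depends on the page text matching "%d %b %Y" exactly. As a minimal sketch using only the standard library, the snippet below parses the same format and strips surrounding whitespace first, since trailing whitespace in the extracted text would otherwise make `strptime` raise `ValueError`. `parse_published_at` is a hypothetical helper, not part of Blogbar, and the sample date is made up for illustration.

```python
from datetime import datetime

DATE_FORMAT = "%d %b %Y"  # same format string as the Livid example


def parse_published_at(raw_text):
    """Parse a date such as '05 Jan 2014', ignoring surrounding whitespace."""
    return datetime.strptime(raw_text.strip(), DATE_FORMAT)


published_at = parse_published_at("  05 Jan 2014\n")
print(published_at.isoformat())
```

Note that `%b` matches month abbreviations in the current locale, so the sketch assumes English abbreviations such as "Jan".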
