  • Stars: 445
  • Rank 98,085 (Top 2 %)
  • Language: Python
  • License: Apache License 2.0
  • Created almost 5 years ago
  • Updated 7 months ago

Repository Details

RSS feed aggregator with collections and NLP article summarization

Infomate.club

Infomate is a small web service that shows multiple RSS sources on one page, parses the linked articles and summarizes them using the TextRank algorithm.

It helps to keep track of news from different areas without subscribing to hundreds of media accounts and getting annoying notifications.

Thematic and people-based collections do a really good job of helping you discover new sources of information. Since we are all biased, such compilations can really help us get out of our information bubbles.

Live URL: infomate.club

🐶 This is a pet-project

Which means you really shouldn't expect much from it. I wrote the MVP over a weekend to solve my own pain. No state-of-the-art Kubernetes bullshit, no architecture patterns, not even any tests. It's here just to show people what a pet-project might look like.

This code has been written for fun, not for business. There is usually a big difference.

🤔 How it works

It's basically a Django web app with a bunch of scripts for RSS parsing. It stores the parsed data in a PostgreSQL database.

The web app is only used to show the data (with heavy caching). Parsing and feed updates are performed by three scripts run from cron. Like poor people do.

Feedparser and BeautifulSoup are used to find, download and parse RSS.
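To illustrate the "find" step, here is a minimal sketch of feed discovery: it scans a page's HTML for `<link rel="alternate">` tags that advertise an RSS or Atom feed. It deliberately uses only the standard library's `html.parser` instead of BeautifulSoup so it runs without dependencies. The names `FeedLinkFinder` and `find_feeds` are made up for this example and are not taken from the project's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds in <link rel="alternate"> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised in a page's <link> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None for valueless attributes.
        attrs = {name: (value or "") for name, value in attrs}
        rel = attrs.get("rel", "").lower().split()
        if "alternate" in rel and attrs.get("type", "").lower() in FEED_TYPES:
            href = attrs.get("href")
            if href:
                # Resolve relative hrefs against the page URL.
                self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url):
    """Return the list of feed URLs found in the given HTML string."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```

Downloading the page and parsing the discovered feed are left out here; in the project those jobs are handled by the libraries named above.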

Text summarization is done via newspaper3k, with some additional protection against problematic content such as podcasts and overly large pages, which can eat all your memory. Anything can happen in the RSS world :)
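The kind of protection described above could look roughly like the sketch below: before downloading and summarizing a page, check its HTTP response headers and skip anything that is not HTML or that declares a very large body. This is an assumption about the approach, not the project's actual code; `is_safe_to_summarize`, `ALLOWED_TYPES` and the 1 MB limit are hypothetical.

```python
# Hypothetical limit: skip pages that declare more than ~1 MB of content.
MAX_PAGE_BYTES = 1_000_000

# Only HTML-like responses are worth passing to a text summarizer.
ALLOWED_TYPES = ("text/html", "application/xhtml+xml")


def is_safe_to_summarize(headers, max_bytes=MAX_PAGE_BYTES):
    """Decide from HTTP response headers whether a page should be parsed."""
    # Header names are case-insensitive, so normalize them first.
    headers = {key.lower(): value for key, value in headers.items()}

    # Drop parameters such as "; charset=utf-8" before comparing.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in ALLOWED_TYPES:
        return False  # audio (podcasts), video, PDFs and so on

    length = headers.get("content-length")
    if length is not None:
        try:
            if int(length) > max_bytes:
                return False
        except ValueError:
            return False  # malformed header: be conservative
    return True
```

Servers do not always send `Content-Length`, so a real implementation would still need to cap how many bytes it actually reads.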

▶️ Running it locally

The easy way. Install docker on your machine. Then:

git clone git@github.com:vas3k/infomate.club.git
cd infomate.club
docker-compose up --build

On the first run you might need to wait until the "migrate_and_init" container finishes populating your database. After that you can open localhost:8000 in your favorite browser and enjoy.

If something gets stuck or you want to shut it down completely, run this command in another terminal:

docker-compose down --remove-orphans

⚙️ boards.yml format

All collections and feeds are stored in one file — boards.yml. This is your main and only entry point to add new stuff.

boards:
- name: Tech            # board title
  slug: tech            # board url
  is_visible: true      # visibility on the main page
  is_private: false     # private boards require logging in
  curator:              # board author profile
    name: John Wick 
    title: Main news
    avatar: https://i.vas3k.ru/fhr.png 
    bio: Major technology media in English and Russian
    footer: >
      This is a general selection of popular technology media.
      The page is updated once per hour.
  blocks:               # list of logical feed blocks
  - name: English       # block title
    slug: en            # unique block id
    feeds:         
      - name: Hacker News
        url: https://news.ycombinator.com
        rss: https://news.ycombinator.com/rss
      - name: dev.to
        url: https://dev.to
        rss: https://dev.to/feed
      - name: TechCrunch
        rss: http://feeds.feedburner.com/TechCrunch/
        url: https://techcrunch.com
        is_parsable: false  # do not try to parse pages, show RSS content only
        conditions:
          - type: not_in
            field: title
            word: Trump   # exclude articles with the word "Trump" in the title
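As a rough illustration of how a `conditions` list like the one above might be applied, here is a minimal sketch that checks an article against each condition. Only the `not_in` type appears in this README, so that is the only one handled; the function name `passes_conditions`, the dict-shaped article and the plain case-sensitive substring match are all assumptions, not the project's actual implementation.

```python
def passes_conditions(article, conditions):
    """Return True if the article satisfies every condition (hypothetical helper).

    `article` is assumed to be a dict such as {"title": "..."};
    `conditions` mirrors the list under `conditions:` in boards.yml.
    """
    for condition in conditions or []:
        value = str(article.get(condition["field"], ""))
        word = condition["word"]
        # "not_in": reject the article if the word occurs in the field.
        # Assumption: a simple case-sensitive substring match.
        if condition["type"] == "not_in" and word in value:
            return False
        # Other condition types are not documented here, so they are ignored.
    return True


# Usage with the condition from the TechCrunch feed above.
conditions = [{"type": "not_in", "field": "title", "word": "Trump"}]
articles = [{"title": "Trump signs bill"}, {"title": "New Python release"}]
kept = [a for a in articles if passes_conditions(a, conditions)]
```

With these inputs, `kept` holds only the second article.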

💎 Running in production

Deployment is done by a simple GitHub Action which builds a Docker image, pushes it to the GitHub Container Registry, logs into your server via SSH and pulls it there. The pipeline is triggered on every push to the master branch. If you want to set up your own fork, add these values to your repo's Secrets:

APP_HOST — e.g. "https://your.host.com"
GHCR_TOKEN — your personal GitHub access token with permissions to read from and write to the GitHub Container Registry
SECRET_KEY — random string for django stuff (not really used)
SENTRY_DSN — if you want to use Sentry
PRODUCTION_SSH_HOST — hostname or IP of your server
PRODUCTION_SSH_USERNAME — user which can deploy to your server
PRODUCTION_SSH_KEY — private key for this user

After you add them all and push a commit to master, the action should run and deploy the app to your server on port 8816.

Don't forget to set up nginx as a proxy for the app (add SSL and everything else there). Here's an example config: etc/nginx/infomate.club.conf

If something doesn't work, check the action itself: .github/workflows/deploy.yml

🎉 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

You can help us with open issues too. There's always something to work on.

We don't have any strict rules on formatting. Just explain your motivation and the changes you've made in the PR description so that others understand what's going on.

👩‍💼 License

Apache 2.0 © Vasily Zubarev

TL;DR: you can modify, distribute and use it commercially, but you MUST credit the original author or link to the service.

More Repositories

1

btt-touchbar-presets

BetterTouchTool Touch Bar Presets
AppleScript
1,837
star
2

vas3k.club

No bullshit IT community with private membership
Python
809
star
3

GodMode2

Semi-automatic admin site generator for any SQL database
CSS
272
star
4

python-glr-parser

An attempt to build my own GLR parser for the Russian language in Python
Python
141
star
5

vas3k.blog

My blog codebase
Python
124
star
6

home-assistant-berlin-transport

Berlin (BVG) and Brandenburg (VBB) transport widget for Home Assistant
Python
114
star
7

pepic

Image and video proxy for my pet-projects
Go
90
star
8

stuff

Jupyter Notebook
44
star
9

i.vas3k.ru

One-page script for easy uploading, resizing and inserting pictures to blog
Python
17
star
10

poor-python-yandex-tomita-parser

A simple Python wrapper for Yandex's Tomita Parser (no longer needed, since Yandex has open-sourced it)
Python
17
star
11

player.vas3k.ru

Open online music player with last.fm scrobbling
JavaScript
17
star
12

GodMode

Automatic admin interface for MySQL, PostgreSQL, MongoDB, etc
CSS
15
star
13

valyrics

Spotify and iTunes lyrics widget for Notification Center
Swift
13
star
14

lovelace-berlin-transport-card

Lovelace card for https://github.com/vas3k/home-assistant-berlin-transport
11
star
15

NGTMap

iOS app for monitoring public transport in Novosibirsk
Objective-C
7
star
16

geektool-gismeteo-parser

Parse gismeteo.ru and display the current weather in your GeekTool app
Python
5
star
17

vas3k

Jupyter Notebook
4
star
18

FormValidator

Pure client-side javascript form validator
JavaScript
4
star
19

l100n

Unusual JavaScript library for pure client-side localization by css-selectors
JavaScript
3
star
20

loggy

A failed attempt to build my own Sentry. It died.
Python
1
star