community search engine

Lieu

an alternative search engine

Created in response to the environs of apathy concerning the use of hypertext search and discovery. In Lieu, the internet is not what is made searchable, but instead one's own neighbourhood. Put differently, Lieu is a neighbourhood search engine, a way for personal webrings to increase serendipitous connexions.

Goals

  • Enable serendipitous discovery
  • Support personal communities
  • Be reusable, easily

Usage

How to search

For the full search syntax (including how to use site: and -site:), see the search syntax and API documentation. For more tips, read the appendix.
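To give a feel for the `site:` and `-site:` operators, here is a minimal sketch of how a query could be split into plain terms and domain filters. The `Query` type and `parseQuery` function are hypothetical names made up for this example, and the splitting rules are an assumption, not Lieu's actual parser; the search syntax documentation remains the reference.

```go
package main

import (
	"fmt"
	"strings"
)

// Query is a hypothetical container for a parsed search query.
type Query struct {
	Terms        []string // plain search words
	IncludeSites []string // domains given with site:
	ExcludeSites []string // domains given with -site:
}

// parseQuery splits a raw query on whitespace and sorts each token into
// plain terms, site: filters, or -site: filters. Operators without a
// domain after the colon are dropped.
func parseQuery(raw string) Query {
	var q Query
	for _, tok := range strings.Fields(raw) {
		switch {
		case strings.HasPrefix(tok, "-site:"):
			if d := strings.TrimPrefix(tok, "-site:"); d != "" {
				q.ExcludeSites = append(q.ExcludeSites, d)
			}
		case strings.HasPrefix(tok, "site:"):
			if d := strings.TrimPrefix(tok, "site:"); d != "" {
				q.IncludeSites = append(q.IncludeSites, d)
			}
		default:
			q.Terms = append(q.Terms, tok)
		}
	}
	return q
}

func main() {
	q := parseQuery("gardening site:example.org -site:example.com")
	fmt.Printf("%+v\n", q)
}
```

The `-site:` case is checked first so that an exclusion is never mistaken for a plain term.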

Getting Lieu running

$ lieu help
Lieu: neighbourhood search engine

Commands
- precrawl  (scrapes config's general.url for a list of links: <li> elements containing an anchor <a> tag)
- crawl     (start crawler, crawls all urls in config's crawler.webring file)
- ingest    (ingest crawled data, generates database)
- search    (interactive cli for searching the database)
- host      (hosts search engine over http)

Example:
    lieu precrawl > data/webring.txt
    lieu crawl > data/crawled.txt
    lieu ingest
    lieu host

Lieu's crawl & precrawl commands output to standard output, for easy inspection of the data. You typically want to redirect their output to the files Lieu reads from, as defined in the config file. See below for a typical workflow.

Workflow

  • Edit the config
  • Add domains to crawl in config.crawler.webring
    • If you have a webpage with links you want to crawl:
    • Set the config's url field to that page
    • Populate the list of domains to crawl with precrawl: lieu precrawl > data/webring.txt
  • Crawl: lieu crawl > data/crawled.txt
  • Create database: lieu ingest
  • Host engine: lieu host

After ingesting the data with lieu ingest, you can also use lieu to search the corpus in the terminal with lieu search.
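The precrawl step above collects links from `<li>` elements that contain an `<a>` tag. The sketch below shows that idea using only Go's standard library: it runs `encoding/xml` in its lenient, HTML-tolerant mode and prints one link per line. This is an illustration under assumptions, not how Lieu does it: `extractListLinks` is a hypothetical name, the match is looser than the config's `webringSelector`, the one-link-per-line output is assumed to suit the webring file, and messy real-world HTML may need a dedicated HTML parser.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// extractListLinks returns the href of every <a> found inside an <li>.
func extractListLinks(page string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(page))
	// These three settings make the XML decoder tolerate typical HTML.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var links []string
	liDepth := 0 // how many <li> elements are currently open
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return links, nil
		}
		if err != nil {
			return links, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			switch strings.ToLower(t.Name.Local) {
			case "li":
				liDepth++
			case "a":
				if liDepth == 0 {
					continue // anchor outside of any list item
				}
				for _, attr := range t.Attr {
					if strings.ToLower(attr.Name.Local) == "href" {
						links = append(links, attr.Value)
					}
				}
			}
		case xml.EndElement:
			if strings.ToLower(t.Name.Local) == "li" && liDepth > 0 {
				liDepth--
			}
		}
	}
}

func main() {
	page := `<ul>
  <li><a href="https://a.example">A</a></li>
  <li><a href="https://b.example">B</a></li>
</ul>`
	links, err := extractListLinks(page)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// One link per line, suitable for redirecting into a file.
	for _, l := range links {
		fmt.Println(l)
	}
}
```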

Theming

Tweak the theme values of the config, specified below.

Config

The config file is written in TOML.

[general]
name = "Merveilles Webring"
# used by the precrawl command and linked to in /about route
url = "https://webring.xxiivv.com"
# used by the precrawl command to populate the crawler.webring file;
# takes simple html selectors. might be a bit wonky :)
webringSelector = "li > a[href]:first-of-type"
port = 10001

[theme]
# colors specified in hex (or valid css names) which determine the theme of the lieu instance
# NOTE: If (and only if) all three values are set, lieu uses them to generate the file html/assets/theme.css at startup.
# You can also write directly to that file instead of adding this section to your configuration file
foreground = "#ffffff"
background = "#000000"
links = "#ffffff"

[data]
# the source file should contain the crawl command's output 
source = "data/crawled.txt"
# location & name of the sqlite database
database = "data/searchengine.db"
# contains words and phrases disqualifying scraped paragraphs from being presented in search results
heuristics = "data/heuristics.txt"
# aka stopwords, in the search engine biz: https://en.wikipedia.org/wiki/Stop_word
wordlist = "data/wordlist.txt"

[crawler]
# manually curated list of domains, or the output of the precrawl command
webring = "data/webring.txt"
# domains that are banned from being crawled but might originally be part of the webring
bannedDomains = "data/banned-domains.txt"
# file suffixes that are banned from being crawled
bannedSuffixes = "data/banned-suffixes.txt"
# phrases and words which won't be scraped (e.g. if contained in a link)
boringWords = "data/boring-words.txt"
# domains that won't be output as outgoing links
boringDomains = "data/boring-domains.txt"
# queries to search for finding preview text
previewQueryList = "data/preview-query-list.txt"
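The note in the `[theme]` section says a stylesheet is generated only when all three colour values are set. The sketch below illustrates that all-or-nothing rule. The `Theme` type, the `themeCSS` function, and the CSS custom property names are assumptions made up for this example; the real contents of `html/assets/theme.css` may look different.

```go
package main

import "fmt"

// Theme mirrors the three values of the [theme] config section.
type Theme struct {
	Foreground string
	Background string
	Links      string
}

// complete reports whether all three theme values are set.
func (t Theme) complete() bool {
	return t.Foreground != "" && t.Background != "" && t.Links != ""
}

// themeCSS returns stylesheet text and true when the theme is complete,
// or an empty string and false otherwise. The custom property names are
// hypothetical.
func themeCSS(t Theme) (string, bool) {
	if !t.complete() {
		return "", false
	}
	css := fmt.Sprintf(
		":root {\n  --foreground: %s;\n  --background: %s;\n  --links: %s;\n}\n",
		t.Foreground, t.Background, t.Links,
	)
	return css, true
}

func main() {
	t := Theme{Foreground: "#ffffff", Background: "#000000", Links: "#ffffff"}
	if css, ok := themeCSS(t); ok {
		fmt.Print(css)
	}
	// A partial theme produces nothing, leaving any existing file untouched.
	if _, ok := themeCSS(Theme{Foreground: "#ffffff"}); !ok {
		fmt.Println("partial theme: no stylesheet generated")
	}
}
```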

For your own use, the following config fields should be customized:

  • name
  • url
  • port
  • source
  • webring
  • bannedDomains

The following config-defined files can stay as-is unless you have specific requirements:

  • database
  • heuristics
  • wordlist
  • bannedSuffixes
  • previewQueryList

For a full rundown of the files and their various jobs, see the files description.
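As a rough illustration of what the `bannedDomains` and `bannedSuffixes` files are for, the sketch below decides whether a URL should be crawled. It is not Lieu's crawler code: `shouldCrawl` is a hypothetical name, and the matching rules (case-insensitive comparison, subdomains of a banned domain also being skipped, one entry per list item) are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shouldCrawl reports whether rawURL passes the banned-domain and
// banned-suffix checks. URLs without a host are rejected.
func shouldCrawl(rawURL string, bannedDomains, bannedSuffixes []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil || u.Hostname() == "" {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, d := range bannedDomains {
		d = strings.ToLower(strings.TrimSpace(d))
		// Assumption: banning a domain also bans its subdomains.
		if d != "" && (host == d || strings.HasSuffix(host, "."+d)) {
			return false
		}
	}
	path := strings.ToLower(u.Path)
	for _, s := range bannedSuffixes {
		s = strings.ToLower(strings.TrimSpace(s))
		if s != "" && strings.HasSuffix(path, s) {
			return false
		}
	}
	return true
}

func main() {
	bannedDomains := []string{"spam.example"}
	bannedSuffixes := []string{".pdf", ".zip"}
	for _, u := range []string{
		"https://good.example/notes.html",
		"https://spam.example/index.html",
		"https://good.example/paper.pdf",
	} {
		fmt.Println(u, shouldCrawl(u, bannedDomains, bannedSuffixes))
	}
}
```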

Developing

Build a binary:

# this project has an experimental fulltext-search feature, so we need to include sqlite's fts engine (fts5)
go build --tags fts5
# or using go run
go run --tags fts5 . 

Create new release binaries:

./release.sh

License

The source code is licensed under AGPL-3.0-or-later. Inter is available under the SIL Open Font License, Version 1.1. Noto Serif is licensed under the Apache License, Version 2.0.
