• Stars
    star
    620
  • Rank 72,140 (Top 2 %)
  • Language
    Java
  • Created over 7 years ago
  • Updated over 1 year ago

Repository Details

A headless browser as loader microservice in the YaCy Grid

YaCy Grid Component: Loader

The YaCy Grid is the second-generation implementation of YaCy, a peer-to-peer search engine. A YaCy Grid installation consists of a set of microservices which communicate with each other using the MCP (Master Connect Program); see https://github.com/yacy/yacy_grid_mcp

Purpose

The Loader is a microservice which can be deployed, e.g., using Docker. Each search engine needs a file loader, and this component does that work. The special feature of this loader is its embedded headless browser, which makes it possible to load rich content and provide that content to a search engine.

What it does

When the Loader component is started, it searches for an MCP and connects to it. By default the local host is searched for an MCP, but you can configure one yourself.

The Loader then waits for client requests and performs web loading on request. It also has an MCP queue listener that reacts to loading requests in the working queues. After loading content, the Loader pushes the results back to the MCP storage and puts another message on the MCP message queue so that the loaded content gets processed.
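
The queue-driven flow described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the actual yacy_grid_loader code: the class name LoaderSketch, the method processOne, the "asset:" key prefix, and the use of plain in-memory collections in place of the MCP queues and MCP storage are all hypothetical, and the fetch function stands in for the headless browser.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from yacy_grid_loader or the MCP API.
class LoaderSketch {

    // Takes one URL from the loading queue, loads it, stores the result,
    // and announces the stored asset on a follow-up queue.
    // Returns false when the loading queue is empty.
    static boolean processOne(Queue<String> loadQueue,
                              Function<String, String> fetch,
                              Map<String, String> storage,
                              Queue<String> nextQueue) {
        String url = loadQueue.poll();
        if (url == null) {
            return false; // nothing to load right now
        }
        String content = fetch.apply(url);  // stand-in for the headless browser
        String assetKey = "asset:" + url;   // assumed key scheme, for illustration only
        storage.put(assetKey, content);     // stand-in for pushing to the MCP storage
        nextQueue.add(assetKey);            // stand-in for the follow-up MCP message
        return true;
    }

    public static void main(String[] args) {
        Queue<String> loadQueue = new ArrayDeque<>();
        Queue<String> nextQueue = new ArrayDeque<>();
        Map<String, String> storage = new HashMap<>();
        loadQueue.add("https://example.org/");

        // A fake fetcher so the sketch runs without network access.
        Function<String, String> fakeFetch = url -> "<html>content of " + url + "</html>";

        // Drain the loading queue, one request at a time.
        while (processOne(loadQueue, fakeFetch, storage, nextQueue)) {
            // each iteration handles exactly one loading request
        }
        System.out.println("stored assets: " + storage.keySet());
        System.out.println("follow-up messages: " + nextQueue);
    }
}
```

Passing the queues, the storage, and the fetcher in as parameters keeps the sketch independent of any real broker or browser; a real deployment would talk to the MCP instead.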

Installation: Download, Build, Run

At this time, yacy_grid_loader is not provided in compiled form, but you can easily build it yourself. It's not difficult and takes about a minute. The source code is hosted at https://github.com/yacy/yacy_grid_loader; you can download it with:

> git clone --recursive https://github.com/yacy/yacy_grid_loader.git

If you just want to update an existing checkout, do the following:

> git pull origin master
> git submodule foreach git pull origin master

To build and start the loader, run:

> cd yacy_grid_loader
> gradle run

Please also read https://github.com/yacy/yacy_grid_mcp/blob/master/README.md for further details.

Contribute

This is a community project and your contribution is welcome!

  1. Check for open issues or open a fresh one to start a discussion around a feature idea or a bug.
  2. Fork the repository on GitHub to start making your changes (branch off of the master branch).
  3. Write a test that shows the bug was fixed or the feature works as expected.
  4. Send a pull request and bug us on Gitter until it gets merged and published. :)

What is the software license?

LGPL 2.1

Have fun!

@0rb1t3r

More Repositories

1

yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
Java
3,077
star
2

yacy_grid_mcp

The YaCy Grid Master Connect Program
Java
654
star
3

cider

"Content Integration Framework: Document Extraction and Retrieval" - A document parser framework that stores parsed entities into Jena ( http://jena.sourceforge.net/ ) RDF vocabularies and provides knowledge-base enhanced semantic analysis of content. Annotated content can be used by search engines to present content navigation, which will be implemented in the YaCy Search Engine
Java
650
star
4

yacy_webclient_bootstrap

YaCy Search Client using bootstrapcss
HTML
639
star
5

yacy_grid_crawler

Crawler Microservice for the YaCy Grid
Java
636
star
6

yacy_grid_parser

Parser Microservice for the YaCy Grid
Java
632
star
7

yacy_search_androidclient

An Android App which searches on a YaCy search server
Java
625
star
8

YaCyBar

A Firefox toolbar for YaCy
JavaScript
617
star
9

yacy_webclient_yaml4

A web client for a YaCy search server based on yaml4 css
CSS
615
star
10

yacy_webclient_authentication

Authentication layer for a YaCy webclient
PHP
613
star
11

yacy_artwork

YaCy Artwork
610
star
12

yacy_docs

Documentation Project for YaCy
603
star
13

yacy_grid_framework

Starting Point for a local YaCy Grid
Shell
433
star
14

yacy_forum_archive

A backup of the YaCy forum, which was active until February 2018
HTML
425
star
15

yacy_grid_cluster

Management tools for a Raspberry Pi Demonstration Cluster of the YaCy Grid
JavaScript
287
star
16

yacy_grid_search

Search API and Search Aggregation module for the YaCy Grid
Java
92
star
17

searchlab

Portal for YaCy Grid and Data Science Applications
Java
19
star
18

searchlab_apps

Search Apps for the Searchlab
CSS
9
star