• This repository has been archived on 29/Jun/2022

Stream data into ES (Wikipedia, Twitter, stdin, or other ESes)

stream2es

Standalone utility to stream different inputs into Elasticsearch.

Read This First

If you've just wandered here, first check out Logstash. It's a much more general tool, and one of our featured products. If it doesn't do something that's important to you, create an issue there. stream2es is a dev tool that originated before the author knew much about Logstash.

That said, stream2es has some important differences that are specific to Elasticsearch. It builds bulk requests by byte length (--bulk-bytes) instead of doc count, which is crucial when docs vary in size. It also supports exporting raw bulks via --tee-bulk to a hashed dir on the filesystem, and you can make the incoming stream finite with --max-docs.
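To see why byte-length batching matters when docs vary in size, here is a minimal Python sketch of the idea. It is not taken from the stream2es source: the helper name batch_by_bytes and its max_bytes parameter are hypothetical, and the size of each doc is approximated by its serialized JSON length.

```python
import json
from typing import Iterable, Iterator, List


def batch_by_bytes(docs: Iterable[dict], max_bytes: int) -> Iterator[List[dict]]:
    """Group docs into batches by serialized size rather than by count.

    Hypothetical helper: a batch is flushed when adding the next doc
    would push it past max_bytes. A single doc larger than max_bytes
    still gets a batch of its own.
    """
    batch: List[dict] = []
    size = 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if batch and size + doc_bytes > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_bytes
    if batch:
        yield batch


if __name__ == "__main__":
    # Two small docs, one large doc, one small doc.
    sample = [{"f": "x" * n} for n in (10, 10, 500, 10)]
    for number, bulk in enumerate(batch_by_bytes(sample, max_bytes=100)):
        print(number, len(bulk))
```

With a fixed doc count per bulk, the large doc would travel in the same request as its small neighbours. Batching by bytes keeps request sizes predictable instead.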

Install

You'll need Java 8+. Run java -version to make sure.

Unix

Download stream2es and make it executable:

% curl -O download.elasticsearch.org/stream2es/stream2es; chmod +x stream2es

Windows

> curl -O download.elasticsearch.org/stream2es/stream2es
> java -jar stream2es help

Usage

stdin

By default, stream2es reads JSON documents from stdin.

% echo '{"f":1}' | stream2es
2014-10-08T12:29:56.318-0500 INFO  00:00.116 8.6d/s 0.4K/s (0.0mb) indexed 1 streamed 1 errors 0
%

If you want more logging, set --log debug. If you don't want any output, set --log warn.
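To feed more than one document over stdin, something has to produce the JSON. The Python sketch below writes one compact JSON document per line. The one-document-per-line layout is an assumption rather than something this README states, and write_docs is a hypothetical helper name.

```python
import json
import sys
from typing import Iterable, TextIO


def write_docs(docs: Iterable[dict], out: TextIO = sys.stdout) -> int:
    """Write each document as compact JSON on its own line; return the count."""
    count = 0
    for doc in docs:
        out.write(json.dumps(doc, separators=(",", ":")) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Emits {"f":1}, {"f":2}, {"f":3}, one per line.
    write_docs({"f": i} for i in range(1, 4))
```

Saved as a hypothetical make_docs.py, its output could be piped in with python make_docs.py | stream2es.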

Wikipedia

Index the latest Wikipedia article dump.

% stream2es wiki --target http://localhost:9200/tmp --log debug
create index http://localhost:9200/tmp
stream wiki from http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
^Cstreamed 1158 docs 1082 bytes xfer 15906901 errors 0

If you're at a café or want to use a local copy of the dump, supply --source:

% ./stream2es wiki --max-docs 5 --source /d/data/enwiki-20121201-pages-articles.xml.bz2

Note that if you live-stream the WMF-hosted dump, it will cut off after a while. Grab a torrent and index it locally if you need more than a few thousand docs.

Generator

stream2es can fuzz data for you. It can create blank documents, or documents with integer fields, or documents with string fields if you supply a dictionary.

Blank documents are easy:

stream2es generator

Int fields need an upper bound. This template gives you a single field with values between 0 and 127, inclusive.

stream2es generator --fields f1:int:128

To add a string, we need to add a template for it, and a file of newline-separated lines of text. Given a field template of NAME:str:N, stream2es will select N random words from the dictionary for each field.

# zsh
% stream2es generator --fields f1:int:128,f2:str:2 --dictionary <(/bin/echo -e "foo\nbar\nbaz")
#### same as:
% stream2es generator --fields f1:int:128,f2:str:2 --dictionary /dev/stdin --max-docs 5 <<EOF
foo
bar
baz
EOF
% curl -s localhost:9200/foo/_search\?format=yaml | fgrep -A2 _source
    _source:
      f1: 28
      f2: "foo baz"
--
    _source:
      f1: 88
      f2: "baz foo"
--
    _source:
      f1: 26
      f2: "baz baz"
--
    _source:
      f1: 68
      f2: "bar baz"
--
    _source:
      f1: 64
      f2: "foo foo"
%

Fortunately, most *nix systems come with /usr/share/dict/words (Ubuntu package wamerican-small, for example), which is a great choice if you just need some (English) text. Install word lists for other languages if you prefer.
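The field templates described above can be read as NAME:TYPE:N triples. The Python sketch below shows one way to interpret them, following only what this section says: ints fall between 0 and N-1, and strings are N random dictionary words joined by spaces. The function names are hypothetical and this is not the generator's actual code.

```python
import random
from typing import Dict, List, Tuple

Field = Tuple[str, str, int]


def parse_fields(spec: str) -> List[Field]:
    """Split a spec such as 'f1:int:128,f2:str:2' into (name, type, n) triples."""
    fields: List[Field] = []
    for part in spec.split(","):
        name, kind, n = part.split(":")
        if kind not in ("int", "str"):
            raise ValueError(f"unknown field type: {kind}")
        fields.append((name, kind, int(n)))
    return fields


def make_doc(fields: List[Field], words: List[str], rng: random.Random) -> Dict[str, object]:
    """Build one document from parsed field templates."""
    doc: Dict[str, object] = {}
    for name, kind, n in fields:
        if kind == "int":
            # Values from 0 to n-1, so n=128 gives 0..127 inclusive.
            doc[name] = rng.randrange(n)
        else:
            # n words drawn at random, with repeats allowed.
            doc[name] = " ".join(rng.choice(words) for _ in range(n))
    return doc


if __name__ == "__main__":
    fields = parse_fields("f1:int:128,f2:str:2")
    print(make_doc(fields, ["foo", "bar", "baz"], random.Random()))
```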

Elasticsearch

Note: ES 2.3 added a reindex API that completely obviates this feature of stream2es. Also, Logstash 1.5.0 has an Elasticsearch input.

If you use the es stream, you can copy indices from one Elasticsearch to another. Example:

% stream2es es \
     --source http://foo.local:9200/wiki \
     --target http://bar.local:9200/wiki2

This is a convenient way to reindex data if you need to change the number of shards or update your mapping.
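For comparison, the reindex API mentioned in the note takes a JSON body naming a source and a destination index, sent to the _reindex endpoint. The Python sketch below only builds that body and sends nothing. The helper name reindex_body is hypothetical. The optional remote block is how later Elasticsearch versions copy from another cluster, so check the documentation for your version before relying on it.

```python
import json
from typing import Optional


def reindex_body(source_index: str, dest_index: str,
                 remote_host: Optional[str] = None) -> dict:
    """Build a request body for POST _reindex.

    remote_host is optional and only meaningful on Elasticsearch
    versions that support reindexing from a remote cluster.
    """
    source: dict = {"index": source_index}
    if remote_host is not None:
        source["remote"] = {"host": remote_host}
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    # Mirrors the stream2es example above: wiki on foo.local into wiki2.
    body = reindex_body("wiki", "wiki2", remote_host="http://foo.local:9200")
    print(json.dumps(body, indent=2))
```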

Twitter

In order to stream Twitter, you have to create an app and authorize it.

Create app

Visit https://dev.twitter.com/apps/new and create an app. Call it stream2es. Note the Consumer key and Consumer secret.

Authorize app

Now run stream2es twitter --authorize --key CONSUMER_KEY --secret CONSUMER_SECRET and complete the dialog.

Run with new creds

You should now be able to stream Twitter with simply stream2es twitter. stream2es will grab the most recent cached credentials from ~/.authinfo.stream2es.

Tracking keywords

By default, stream2es streams a random sample of all public tweets. You can instead configure it to track specific keywords as follows:

	stream2es twitter --track "Linux%%New York%%March Madness"
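Judging from the example, %% separates the tracked phrases, so a phrase such as New York can contain spaces. The Python sketch below shows that reading of the argument. It is an interpretation of the example, not code from stream2es, and split_track is a hypothetical name.

```python
from typing import List


def split_track(track: str, sep: str = "%%") -> List[str]:
    """Split a --track style value into phrases, dropping empty pieces."""
    return [piece.strip() for piece in track.split(sep) if piece.strip()]


if __name__ == "__main__":
    for phrase in split_track("Linux%%New York%%March Madness"):
        print(phrase)
```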

Options

% stream2es --help
Copyright 2013 Elasticsearch

Usage: stream2es [CMD] [OPTS]

..........

You can change index settings by supplying --settings:

% echo '{"name":"alfredo"}' | ./stream2es stdin --settings '
{
    "settings": {
        "refresh_interval": "2m"
    }
}'

Contributing

stream2es is written in Clojure. You'll need Leiningen 2.0+ to build.

% lein bin
% target/stream2es

You'll also need this little git alias if you want to run make:

[alias]
ver = "!git log --pretty=format:'%ai %h' -1 | perl -pe 's,(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) (\\d\\d):(\\d\\d):(\\d\\d) [^ ]+ ([a-z0-9]+),\\1\\2\\3\\7,'"
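The alias turns the date and short hash of the latest commit into a compact version string. The Python sketch below applies the same substitution as the perl expression, to make the transformation easier to follow. The sample input line is made up for illustration, and version_from_log is a hypothetical name.

```python
import re

# Same pattern as the perl expression in the alias: date, time,
# timezone offset, then the abbreviated commit hash.
_VER = re.compile(
    r"(\d\d\d\d)-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d) [^ ]+ ([a-z0-9]+)"
)


def version_from_log(line: str) -> str:
    """Keep year, month, day and hash; drop the time and timezone."""
    return _VER.sub(r"\1\2\3\7", line)


if __name__ == "__main__":
    # Shaped like the output of: git log --pretty=format:'%ai %h' -1
    print(version_from_log("2013-01-15 10:20:30 -0600 abc1234"))
```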

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2009-2013 Elasticsearch <http://www.elasticsearch.org>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
