  • Stars: 126
  • Rank: 283,145 (Top 6%)
  • Language: Java
  • License: Apache License 2.0
  • Created about 8 years ago
  • Updated almost 2 years ago

Repository Details

Kafka Connect connector to stream data in real time from Twitter.

Introduction

This connector uses the Twitter Streaming API to listen for status update messages and converts each one to a Kafka Connect struct on the fly. The goal is to match as much of the Twitter Status object as possible.

Configuration

TwitterSourceConnector

The Twitter source connector pulls data from Twitter in real time.

name=connector1
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector

# Set these required values
twitter.oauth.accessTokenSecret=
process.deletes=
filter.keywords=
kafka.status.topic=
kafka.delete.topic=
twitter.oauth.consumerSecret=
twitter.oauth.accessToken=
twitter.oauth.consumerKey=

Name                            | Description                                       | Type     | Default | Valid Values | Importance
--------------------------------|---------------------------------------------------|----------|---------|--------------|-----------
filter.keywords                 | Twitter keywords to filter for.                   | list     |         |              | high
filter.userIds                  | Twitter user IDs to follow.                       | list     | ""      |              | low
kafka.delete.topic              | Kafka topic to write delete events to.            | string   |         |              | high
kafka.status.topic              | Kafka topic to write the statuses to.             | string   |         |              | high
process.deletes                 | Should this connector process deletes.            | boolean  |         |              | high
twitter.oauth.accessToken       | OAuth access token                                | password |         |              | high
twitter.oauth.accessTokenSecret | OAuth access token secret                         | password |         |              | high
twitter.oauth.consumerKey       | OAuth consumer key                                | password |         |              | high
twitter.oauth.consumerSecret    | OAuth consumer secret                             | password |         |              | high
twitter.debug                   | Flag to enable debug logging for the Twitter API. | boolean  | false   |              | low
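
As a rough illustration of the settings above, the following sketch checks a configuration map for the keys that are marked high importance and have no default. The class name `TwitterConnectorConfigCheck`, the `missingKeys` helper, and all sample values are hypothetical and not part of the connector; this is only a pre-flight check one might run before handing the properties to Kafka Connect.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the connector.
class TwitterConnectorConfigCheck {
    // Settings listed above with importance "high" and no default value.
    static final List<String> REQUIRED_KEYS = List.of(
        "filter.keywords",
        "kafka.delete.topic",
        "kafka.status.topic",
        "process.deletes",
        "twitter.oauth.accessToken",
        "twitter.oauth.accessTokenSecret",
        "twitter.oauth.consumerKey",
        "twitter.oauth.consumerSecret");

    /** Returns the required keys that are absent or blank in the given config. */
    static List<String> missingKeys(Map<String, String> config) {
        return REQUIRED_KEYS.stream()
            .filter(key -> {
                String value = config.get(key);
                return value == null || value.trim().isEmpty();
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("name", "connector1");
        config.put("tasks.max", "1");
        config.put("connector.class",
            "com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector");
        // Placeholder values chosen for illustration only.
        config.put("filter.keywords", "kafka,connect");
        config.put("kafka.status.topic", "twitter.status");
        config.put("kafka.delete.topic", "twitter.delete");
        config.put("process.deletes", "true");
        config.put("twitter.oauth.accessToken", "<access-token>");
        config.put("twitter.oauth.accessTokenSecret", "<access-token-secret>");
        config.put("twitter.oauth.consumerKey", "<consumer-key>");
        // twitter.oauth.consumerSecret is left out on purpose.

        System.out.println("Missing: " + missingKeys(config));
        // prints "Missing: [twitter.oauth.consumerSecret]"
    }
}
```

An empty result means every required setting from the table has a non-blank value; it says nothing about whether the credentials themselves are valid.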

Schemas

com.github.jcustenborder.kafka.connect.twitter.Place

Returns the place attached to this status

Name Optional Schema Default Value Documentation
Name true String
StreetAddress true String
CountryCode true String
Id true String
Country true String
PlaceType true String
URL true String
FullName true String

com.github.jcustenborder.kafka.connect.twitter.GeoLocation

Returns the location that this tweet refers to, if available.

Name Optional Schema Default Value Documentation
Latitude false Float64 returns the latitude of the geo location
Longitude false Float64 returns the longitude of the geo location

com.github.jcustenborder.kafka.connect.twitter.StatusDeletionNotice

Message that is received when a status is deleted from Twitter.

Name Optional Schema Default Value Documentation
StatusId false Int64
UserId false Int64

com.github.jcustenborder.kafka.connect.twitter.StatusDeletionNoticeKey

Key for a message that is received when a status is deleted from Twitter.

Name Optional Schema Default Value Documentation
StatusId false Int64
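
To show how a downstream application might use the deletion schemas, here is a minimal sketch that keeps a local view of statuses and drops an entry when a deletion notice arrives. The `StatusStore` class, its methods, and the sample IDs are hypothetical; the plain Java classes stand in for the `StatusDeletionNotice` struct above, and nothing here reflects how the connector itself is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer-side sketch, not part of the connector.
class DeleteNoticeExample {
    /** Stand-in for the StatusDeletionNotice value: two non-optional Int64 fields. */
    static final class StatusDeletionNotice {
        final long statusId;
        final long userId;

        StatusDeletionNotice(long statusId, long userId) {
            this.statusId = statusId;
            this.userId = userId;
        }
    }

    /** Local view of status text keyed by status Id. */
    static final class StatusStore {
        private final Map<Long, String> textById = new HashMap<>();

        /** Records or overwrites a status read from the status topic. */
        void onStatus(long id, String text) {
            textById.put(id, text);
        }

        /** Applies a notice from the delete topic; returns true if a status was removed. */
        boolean onDelete(StatusDeletionNotice notice) {
            return textById.remove(notice.statusId) != null;
        }

        int size() {
            return textById.size();
        }
    }

    public static void main(String[] args) {
        StatusStore store = new StatusStore();
        store.onStatus(1001L, "first status");
        store.onStatus(1002L, "second status");

        boolean removed = store.onDelete(new StatusDeletionNotice(1001L, 42L));
        System.out.println(removed + " " + store.size()); // prints "true 1"
    }
}
```

A notice for a status the store has never seen is simply ignored, which matters because the status and delete topics are written independently.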

com.github.jcustenborder.kafka.connect.twitter.StatusKey

Key for a twitter status.

Name Optional Schema Default Value Documentation
Id true Int64

com.github.jcustenborder.kafka.connect.twitter.Status

Twitter status message.

Name Optional Schema Default Value Documentation
CreatedAt true Timestamp Returns the created_at
Id true Int64 Returns the id of the status
Text true String Returns the text of the status
Source true String Returns the source
Truncated true Boolean Test if the status is truncated
InReplyToStatusId true Int64 Returns the in_reply_to_status_id
InReplyToUserId true Int64 Returns the in_reply_to_user_id
InReplyToScreenName true String Returns the in_reply_to_screen_name
GeoLocation true com.github.jcustenborder.kafka.connect.twitter.GeoLocation Returns the location that this tweet refers to, if available.
Place true com.github.jcustenborder.kafka.connect.twitter.Place Returns the place attached to this status
Favorited true Boolean Test if the status is favorited
Retweeted true Boolean Test if the status is retweeted
FavoriteCount true Int32 Indicates approximately how many times this Tweet has been "favorited" by Twitter users.
User false com.github.jcustenborder.kafka.connect.twitter.User Returns the user associated with the status. This can be null if the instance is from User.getStatus().
Retweet true Boolean
Contributors false Array of Int64 Returns an array of contributors, or null if no contributor is associated with this status.
RetweetCount true Int32 Returns the number of times this tweet has been retweeted, or -1 when the tweet was created before this feature was enabled.
RetweetedByMe true Boolean
CurrentUserRetweetId true Int64 Returns the authenticating user's retweet's id of this tweet, or -1L when the tweet was created before this feature was enabled.
PossiblySensitive true Boolean
Lang true String Returns the lang of the status text if available.
WithheldInCountries false Array of String Returns the list of country codes where the tweet is withheld
HashtagEntities true Array of com.github.jcustenborder.kafka.connect.twitter.HashtagEntity Returns an array of hashtags mentioned in the tweet.
UserMentionEntities true Array of com.github.jcustenborder.kafka.connect.twitter.UserMentionEntity Returns an array of user mentions in the tweet.
MediaEntities true Array of com.github.jcustenborder.kafka.connect.twitter.MediaEntity Returns an array of MediaEntities if media are available in the tweet.
SymbolEntities true Array of com.github.jcustenborder.kafka.connect.twitter.SymbolEntity Returns an array of SymbolEntities if symbols are available in the tweet.
URLEntities true Array of com.github.jcustenborder.kafka.connect.twitter.URLEntity Returns an array of URLEntity objects mentioned in the tweet.

com.github.jcustenborder.kafka.connect.twitter.User

Return the user associated with the status. This can be null if the instance is from User.getStatus().

Name Optional Schema Default Value Documentation
Id true Int64 Returns the id of the user
Name true String Returns the name of the user
ScreenName true String Returns the screen name of the user
Location true String Returns the location of the user
Description true String Returns the description of the user
ContributorsEnabled true Boolean Tests if the user is enabling contributors
ProfileImageURL true String Returns the profile image url of the user
BiggerProfileImageURL true String
MiniProfileImageURL true String
OriginalProfileImageURL true String
ProfileImageURLHttps true String
BiggerProfileImageURLHttps true String
MiniProfileImageURLHttps true String
OriginalProfileImageURLHttps true String
DefaultProfileImage true Boolean Tests if the user has not uploaded their own avatar
URL true String Returns the url of the user
Protected true Boolean Test if the user status is protected
FollowersCount true Int32 Returns the number of followers
ProfileBackgroundColor true String
ProfileTextColor true String
ProfileLinkColor true String
ProfileSidebarFillColor true String
ProfileSidebarBorderColor true String
ProfileUseBackgroundImage true Boolean
DefaultProfile true Boolean Tests if the user has not altered the theme or background
ShowAllInlineMedia true Boolean
FriendsCount true Int32 Returns the number of users the user follows (AKA "followings")
CreatedAt true Timestamp
FavouritesCount true Int32
UtcOffset true Int32
TimeZone true String
ProfileBackgroundImageURL true String
ProfileBackgroundImageUrlHttps true String
ProfileBannerURL true String
ProfileBannerRetinaURL true String
ProfileBannerIPadURL true String
ProfileBannerIPadRetinaURL true String
ProfileBannerMobileURL true String
ProfileBannerMobileRetinaURL true String
ProfileBackgroundTiled true Boolean
Lang true String Returns the preferred language of the user
StatusesCount true Int32
GeoEnabled true Boolean
Verified true Boolean
Translator true Boolean
ListedCount true Int32 Returns the number of public lists the user is listed on, or -1 if the count is unavailable.
FollowRequestSent true Boolean Returns true if the authenticating user has requested to follow this user, otherwise false.
WithheldInCountries false Array of String Returns the list of country codes where the user is withheld

com.github.jcustenborder.kafka.connect.twitter.ExtendedMediaEntity.Variant

Name Optional Schema Default Value Documentation
Url true String
Bitrate true Int32
ContentType true String

com.github.jcustenborder.kafka.connect.twitter.MediaEntity.Size

Name Optional Schema Default Value Documentation
Resize true Int32
Width true Int32
Height true Int32

com.github.jcustenborder.kafka.connect.twitter.ExtendedMediaEntity

Name Optional Schema Default Value Documentation
VideoAspectRatioWidth true Int32
VideoAspectRatioHeight true Int32
VideoDurationMillis true Int64
VideoVariants true Array of com.github.jcustenborder.kafka.connect.twitter.ExtendedMediaEntity.Variant
ExtAltText true String
Id true Int64 Returns the id of the media.
Type true String Returns the media type: photo, video, or animated_gif.
MediaURL true String Returns the media URL.
Sizes false Map of <Int32, com.github.jcustenborder.kafka.connect.twitter.MediaEntity.Size> Returns size variations of the media.
MediaURLHttps true String Returns the media secure URL.
URL true String Returns the URL mentioned in the tweet.
Text true String Returns the URL mentioned in the tweet.
ExpandedURL true String Returns the expanded URL if the mentioned URL is shortened.
Start true Int32 Returns the index of the start character of the URL mentioned in the tweet.
End true Int32 Returns the index of the end character of the URL mentioned in the tweet.
DisplayURL true String Returns the display URL if the mentioned URL is shortened.

com.github.jcustenborder.kafka.connect.twitter.HashtagEntity

Name Optional Schema Default Value Documentation
Text true String Returns the text of the hashtag without #.
Start true Int32 Returns the index of the start character of the hashtag.
End true Int32 Returns the index of the end character of the hashtag.
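
The Start and End fields can be used to locate an entity inside the status text. The sketch below is an illustration only, under two stated assumptions: that the indices count Unicode code points, and that End points just past the last character of the entity. The `HashtagSpanExample` class and its `span` method are hypothetical names, not part of the connector. If the indices turn out to be UTF-16 offsets, a plain `substring(start, end)` would be enough.

```java
// Hypothetical helper, not part of the connector.
class HashtagSpanExample {
    /**
     * Returns the slice of statusText covered by an entity.
     * Assumes start and end are code point offsets and end is exclusive.
     */
    static String span(String statusText, int start, int end) {
        // Translate code point offsets into UTF-16 indices before slicing.
        int from = statusText.offsetByCodePoints(0, start);
        int to = statusText.offsetByCodePoints(from, end - start);
        return statusText.substring(from, to);
    }

    public static void main(String[] args) {
        String text = "Streaming with #kafka today";
        // The '#' sits at offset 15 and the hashtag ends before offset 21.
        System.out.println(span(text, 15, 21)); // prints "#kafka"
    }
}
```

Note that the slice includes the leading `#`, whereas the Text field of the entity holds the hashtag without it.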

com.github.jcustenborder.kafka.connect.twitter.MediaEntity

Name Optional Schema Default Value Documentation
Id true Int64 Returns the id of the media.
Type true String Returns the media type: photo, video, or animated_gif.
MediaURL true String Returns the media URL.
Sizes false Map of <Int32, com.github.jcustenborder.kafka.connect.twitter.MediaEntity.Size> Returns size variations of the media.
MediaURLHttps true String Returns the media secure URL.
VideoAspectRatioWidth true Int32
VideoAspectRatioHeight true Int32
VideoDurationMillis true Int64
VideoVariants true Array of com.github.jcustenborder.kafka.connect.twitter.ExtendedMediaEntity.Variant
ExtAltText true String
URL true String Returns the URL mentioned in the tweet.
Text true String Returns the URL mentioned in the tweet.
ExpandedURL true String Returns the expanded URL if the mentioned URL is shortened.
Start true Int32 Returns the index of the start character of the URL mentioned in the tweet.
End true Int32 Returns the index of the end character of the URL mentioned in the tweet.
DisplayURL true String Returns the display URL if the mentioned URL is shortened.

com.github.jcustenborder.kafka.connect.twitter.SymbolEntity

Name Optional Schema Default Value Documentation
Start true Int32 Returns the index of the start character of the symbol.
End true Int32 Returns the index of the end character of the symbol.
Text true String Returns the text of the entity

com.github.jcustenborder.kafka.connect.twitter.URLEntity

Name Optional Schema Default Value Documentation
URL true String Returns the URL mentioned in the tweet.
Text true String Returns the URL mentioned in the tweet.
ExpandedURL true String Returns the expanded URL if the mentioned URL is shortened.
Start true Int32 Returns the index of the start character of the URL mentioned in the tweet.
End true Int32 Returns the index of the end character of the URL mentioned in the tweet.
DisplayURL true String Returns the display URL if the mentioned URL is shortened.

com.github.jcustenborder.kafka.connect.twitter.UserMentionEntity

Name Optional Schema Default Value Documentation
Name true String Returns the name mentioned in the status.
Id true Int64 Returns the user id mentioned in the status.
Text true String Returns the screen name mentioned in the status.
ScreenName true String Returns the screen name mentioned in the status.
Start true Int32 Returns the index of the start character of the user mention.
End true Int32 Returns the index of the end character of the user mention.

Running in development

mvn clean package
export CLASSPATH="$(find target/ -type f -name '*.jar'| grep '\-package' | tr '\n' ':')"
$CONFLUENT_HOME/bin/connect-standalone connect/connect-avro-docker.properties config/TwitterSourceConnector.properties
