  • Stars: 168
  • Rank 224,731 (Top 5 %)
  • Language: Java
  • License: Apache License 2.0
  • Created over 10 years ago
  • Updated over 5 years ago

Repository Details

Distributed Graph Analytics (DGA) is a compendium of graph analytics written for Bulk-Synchronous-Parallel (BSP) processing frameworks such as Giraph and GraphX. The analytics included are High Betweenness Set Extraction, Weakly Connected Components, Page Rank, Leaf Compression, and Louvain Modularity.

distributed-graph-analytics

Currently, DGA supports the following analytics:

Giraph
  • Weakly Connected Components
  • Leaf Compression
  • Page Rank
  • High Betweenness Set Extraction
  • Louvain
GraphX
  • Louvain Modularity (initial stage)
  • Weakly Connected Components
  • High Betweenness Set Extraction
  • Leaf Compression
  • Page Rank
  • Neighboring Communities
dga-giraph

dga-giraph is the project that contains our Giraph implementation of DGA. For more information, see the dga-giraph README.md.

documentation

http://sotera.github.io/distributed-graph-analytics

Steps to run Louvain GraphX

Download The CentOS VM

https://github.com/Sotera/seam-team-6-vm

Required Scala Version

Scala version: 2.11.1
Scala installation location: /opt/scala

wget http://downloads.typesafe.com/scala/2.11.1/scala-2.11.1.tgz
tar xvf scala-2.11.1.tgz
sudo mv scala-2.11.1 /opt/scala

Required Spark Core and GraphX

Spark Core and GraphX version: 1.3.0
Spark installation location: /opt/spark

Start Spark with one master and four worker instances

cd /usr/lib/spark/sbin
./start-master.sh
echo "export SPARK_WORKER_INSTANCES=4" >> spark-env.sh
./start-slaves.sh
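Because `echo ... >> spark-env.sh` appends a new copy of the export line every time the step is repeated, a guarded append can keep the file clean. The following is a minimal sketch; `ensure_line` is a hypothetical helper name, not part of Spark or DGA, and the file you pass it is whichever spark-env.sh your installation actually reads.

```shell
# ensure_line LINE FILE
# Hypothetical helper: append LINE to FILE only if an identical line
# is not already there, so re-running the setup does not duplicate it.
ensure_line() {
  line=$1
  file=$2
  # Create the file if it does not exist yet.
  touch "$file"
  # -x matches whole lines, -F treats LINE as a fixed string, -q is silent.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, running `ensure_line 'export SPARK_WORKER_INSTANCES=4' spark-env.sh` twice should leave a single copy of the line in the file.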

Spark Web UI: Browse to the master web UI to make sure the master and all the workers are started correctly.

http://localhost:8080/
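If you are scripting the setup, it can help to wait for the web UI to respond before continuing. The sketch below assumes `curl` is installed; `wait_for_url` is a hypothetical helper, and it only confirms that the page answers. You still need to look at the page itself to confirm that the expected workers are listed.

```shell
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: poll URL until it answers successfully or the
# attempts run out. Returns 0 on success, 1 otherwise.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP error responses.
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A typical call would be `wait_for_url http://localhost:8080/ && echo "master UI is up"`.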

Copy the example data to HDFS:

wget http://sotera.github.io/distributed-graph-analytics/data/example.csv
hdfs dfs -put example.csv hdfs://localhost:8020/tmp/dga/louvain/input/
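The upload can fail if the target directory does not exist yet. As a hedged sketch, the helper below creates the directory first and then lists it to confirm the upload; `stage_input` is a hypothetical name, and it only wraps the standard `hdfs dfs -mkdir -p`, `-put`, and `-ls` subcommands.

```shell
# stage_input LOCAL_FILE HDFS_DIR
# Hypothetical helper: make sure HDFS_DIR exists, upload LOCAL_FILE
# into it, then list the directory. Stops at the first failing step.
stage_input() {
  src=$1
  dest=$2
  # Create the target directory (and parents) if it is missing.
  hdfs dfs -mkdir -p "$dest" || return 1
  # Upload the local file.
  hdfs dfs -put "$src" "$dest" || return 1
  # List the directory so the upload can be checked by eye.
  hdfs dfs -ls "$dest"
}
```

With the paths from this guide, the call would be `stage_input example.csv hdfs://localhost:8020/tmp/dga/louvain/input/`.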

Clone the repository and build the code

cd /vagrant
git clone https://github.com/Sotera/distributed-graph-analytics
cd distributed-graph-analytics
gradle clean dist
cd /vagrant/distributed-graph-analytics/dga-graphx/build/dist
./run.sh

You can find the output in:

hdfs://localhost:8020/tmp/dga/louvain/output
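To see what the job wrote, you can list that directory recursively. This is a small sketch; `show_output` is a hypothetical helper around the standard `hdfs dfs -ls -R` subcommand, and the names of the files inside the output directory are not documented here, so the listing is the safest first step.

```shell
# show_output [HDFS_DIR]
# Hypothetical helper: recursively list the job output directory.
# Defaults to the output path used in this guide.
show_output() {
  out=${1:-hdfs://localhost:8020/tmp/dga/louvain/output}
  hdfs dfs -ls -R "$out"
}
```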
