  • Stars: 106
  • Rank 325,871 (Top 7 %)
  • Language: Java
  • License: Apache License 2.0
  • Created about 8 years ago
  • Updated almost 2 years ago

Repository Details

Halyard is an extremely horizontally scalable triple store with support for named graphs, designed for integration of extremely large semantic data models and for storage and SPARQL 1.1 querying of snapshots of the whole Linked Data universe.

Halyard

Halyard is an extremely horizontally scalable triple store with support for named graphs, designed for integration of extremely large semantic data models and for storage and SPARQL 1.1 querying of complete Linked Data universe snapshots. The Halyard implementation is based on the Eclipse RDF4J framework and the Apache HBase database, and it is written entirely in Java.

Author: Adam Sotona

Discussion group: https://groups.google.com/d/forum/halyard-users

Documentation: https://merck.github.io/Halyard

Get started

Download and unzip the latest halyard-sdk-<version>.zip bundle to an Apache Hadoop cluster node with a configured Apache HBase client.

Halyard is expected to run on an Apache Hadoop cluster node with a configured Apache HBase client. Apache Hadoop and Apache HBase components are not bundled with Halyard. The runtime requirements are:

  • Apache Hadoop version 2.5.1 or higher
  • Apache HBase version 1.1.2 or higher
  • Java 8 Runtime

Note: The recommended Apache Hadoop distribution is the latest version of Hortonworks Data Platform (HDP) or Amazon Elastic MapReduce (EMR).
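
The minimum versions listed above can be checked with a small version comparison before installing the bundle. The following is a minimal sketch, not part of Halyard: the class `RuntimeRequirements` and its methods are hypothetical names, the sample version strings in `main` are made up for illustration, and treating Java 8 as a minimum rather than an exact requirement is an assumption. On a real cluster node the version strings could be taken from the output of `hadoop version` and `hbase version`.

```java
public class RuntimeRequirements {

    // Minimum versions taken from the runtime requirements list.
    public static final String MIN_HADOOP = "2.5.1";
    public static final String MIN_HBASE = "1.1.2";

    /**
     * Parses the leading dotted-numeric part of a version string,
     * e.g. "2.7.3" or "1.1.2-SNAPSHOT". The string must start with a digit;
     * otherwise Integer.parseInt throws a NumberFormatException.
     */
    public static int[] parse(String version) {
        // Keep only the part before the first character that is not a digit or a dot.
        String numeric = version.trim().split("[^0-9.]", 2)[0];
        String[] parts = numeric.split("\\.");
        int[] result = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Integer.parseInt(parts[i]);
        }
        return result;
    }

    /** Returns true if 'actual' is equal to or newer than 'minimum', compared component by component. */
    public static boolean meetsMinimum(String actual, String minimum) {
        int[] a = parse(actual);
        int[] m = parse(minimum);
        int n = Math.max(a.length, m.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "2.5" is treated like "2.5.0".
            int x = i < a.length ? a[i] : 0;
            int y = i < m.length ? m[i] : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true;
    }

    /** Maps a java.specification.version value ("1.8", "11", "17") to its major version number. */
    public static int javaMajor(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        return Integer.parseInt(v);
    }

    public static void main(String[] args) {
        // Hypothetical sample values; replace them with the versions reported on the cluster node.
        String hadoopVersion = args.length > 0 ? args[0] : "2.7.3";
        String hbaseVersion = args.length > 1 ? args[1] : "1.1.2";

        System.out.println("Hadoop " + hadoopVersion + " ok: " + meetsMinimum(hadoopVersion, MIN_HADOOP));
        System.out.println("HBase " + hbaseVersion + " ok: " + meetsMinimum(hbaseVersion, MIN_HBASE));

        // Assumption: any Java runtime at version 8 or later is acceptable.
        int java = javaMajor(System.getProperty("java.specification.version"));
        System.out.println("Java " + java + " ok: " + (java >= 8));
    }
}
```

Comparing numeric components rather than raw strings avoids the common mistake of ranking "2.10.0" below "2.5.1", and vendor suffixes such as "-SNAPSHOT" are ignored.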

See Documentation for usage examples, architecture information, and more.

Repository contents

  • common - a library for direct mapping between an RDF data model and Apache HBase
  • strategy - a generic parallel asynchronous implementation of RDF4J Evaluation Strategy
  • sail - an implementation of the RDF4J Storage and Inference Layer on top of Apache HBase
  • tools - a set of command line and Apache Hadoop MapReduce tools for loading, updating, querying, and exporting the data with maximum performance
  • sdk - a distributable bundle of Eclipse RDF4J and Halyard for command line use on an Apache Hadoop cluster with a configured Apache HBase client
  • webapps - a re-distribution of Eclipse RDF4J Web Applications (RDF4J-Server and RDF4J-Workbench), patched and enhanced to include Halyard as another RDF repository option
