Cloudera (@cloudera)

Top repositories

1

hue

Open source SQL Query Assistant service for Databases/Warehouses
JavaScript
1,164
star
2

livy

Livy is an open source REST interface for interacting with Apache Spark from anywhere
Scala
996
star
3

flume

WE HAVE MOVED to Apache Incubator. https://cwiki.apache.org/FLUME/ . Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
Java
944
star
4

impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Python
730
star
5

cm_api

Cloudera Manager API Client
Java
298
star
6

cdh-twitter-example

Example application for analyzing Twitter data using CDH - Flume, Oozie, Hive
Java
286
star
7

cloudera-playbook

Cloudera deployment automation with Ansible
HTML
198
star
8

cm_ext

Cloudera Manager Extensibility Tools and Documentation.
Java
183
star
9

flink-tutorials

Java
182
star
10

impala-tpcds-kit

TPC-DS Kit for Impala
Smarty
164
star
11

kitten

The fast and fun way to write YARN applications.
Java
136
star
12

cloudera-scripts-for-log4j

Scripts for addressing log4j zero day security issue
Shell
86
star
13

kudu-examples

Example code for Kudu
78
star
14

python-ngrams

Python
75
star
15

clusterdock

Python
70
star
16

hs2client

C++ native client for Impala and Hive, with Python / pandas bindings
Thrift
69
star
17

impala-udf-samples

Sample UDFs and UDAs for Impala.
C++
63
star
18

director-scripts

Cloudera Director sample code
Shell
61
star
19

cm_csds

A collection of Custom Service Descriptors
Shell
54
star
20

bigtop

Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc.) developed by a community with a focus on the system as a whole, rather than individual projects.
Groovy
50
star
21

CML_AMP_LLM_Chatbot_Augmented_with_Enterprise_Data

Python
49
star
22

cdh-package

Groovy
48
star
23

ades

An analysis of adverse drug event data using Hadoop, R, and Gephi
Java
44
star
24

kafka-examples

Kafka Examples repository.
Scala
43
star
25

mapreduce-tutorial

Java
37
star
26

llama

Llama - Low Latency Application MAster
Java
33
star
27

seismichadoop

System for performing seismic data processing on a Hadoop cluster.
Java
32
star
28

CML_AMP_Anomaly_Detection

Apply modern, deep learning techniques for anomaly detection to identify network intrusions.
Python
30
star
29

mahout

Java
30
star
30

parquet-examples

Example programs and scripts for accessing parquet files
Java
30
star
31

dist_test

HTML
29
star
32

Impala

Real-time Query for Hadoop; mirror of Apache Impala
C++
29
star
33

native-toolchain

Shell
27
star
34

emailarchive

Hadoop for archiving email
Java
24
star
35

dbt-impala

A dbt adapter for Apache Impala & Cloudera Data Platform
Python
24
star
36

cdsw-training

Example Python and R code for Cloudera Data Science Workbench training
Python
23
star
37

navigator-sdk

Navigator SDK
Java
22
star
38

dbt-hive

The dbt-hive adapter allows you to use dbt with Apache Hive and Cloudera Data Platform.
Python
22
star
39

director-sdk

Cloudera Director API clients
Java
17
star
40

thrift_sasl

Thrift SASL module that implements TSaslClientTransport
Python
17
star
41

tutorial-assets

Assets used in Cloudera Tutorials
Python
16
star
42

community-ml-runtimes

Dockerfile
16
star
43

squeasel

C
16
star
44

python-sasl

Python wrapper for Cyrus SASL
C++
16
star
45

cod-examples

Java
16
star
46

sqoop2

Java
15
star
47

CML_AMP_Explainability_LIME_SHAP

Learn how to explain ML models using LIME and SHAP.
Jupyter Notebook
14
star
48

CML_AMP_Few-Shot_Text_Classification

Perform topic classification on news articles in several limited-labeled data regimes.
Jupyter Notebook
14
star
49

earthquake

Java
14
star
50

cmlextensions

Added functionality to the cml Python package
Python
14
star
51

ml-runtimes

Dockerfile
13
star
52

CML_AMP_Image_Analysis

Build a semantic search application with deep learning models.
Jupyter Notebook
12
star
53

cloudera-airflow-plugins

Python
12
star
54

CML_AMP_Continuous_Model_Monitoring

Demonstration of how to perform continuous model monitoring on CML using Model Metrics and Evidently.ai dashboards
CSS
12
star
55

strata-tutorial-2016-nyc

Scala
11
star
56

cdp-sdk-java

Cloudera CDP SDK for Java
Java
11
star
57

director-aws-plugin

Cloudera Director - Amazon Web Services integration
Java
11
star
58

logredactor

Java
11
star
59

CML_AMP_Churn_Prediction

Build a scikit-learn model to predict churn using customer telco data.
Jupyter Notebook
11
star
60

phoenix

Java
11
star
61

dbt-impala-example

A demo project for dbt-impala adapter for dbt
Python
10
star
62

poisson_sampling

R
10
star
63

cml-training

Example Python and R code for Cloudera Machine Learning (CML) training
R
9
star
64

Applied-ML-Prototypes

9
star
65

director-google-plugin

Cloudera Director - Google Cloud Platform integration
Java
9
star
66

cdpcli

CDP command line interface (CLI)
Python
9
star
67

cdp-dev-docs

HTML
8
star
68

CML_AMP_Canceled_Flight_Prediction

Perform analytics on a large airline dataset with Spark and build an XGBoost model to predict flight cancellations.
Jupyter Notebook
8
star
69

CML_AMP_Structural_Time_Series

Applying a structural time series approach to California hourly electricity demand data.
Python
8
star
70

director-spi

Cloudera Director Service Provider Interface
Java
8
star
71

CML_AMP_Question_Answering

Explore an emerging NLP capability with WikiQA, an automated question answering system built on top of Wikipedia.
Python
8
star
72

CML_AMP_Intelligent-QA-Chatbot-with-NiFi-Pinecone-and-Llama2

The prototype deploys an Application in CML using a Llama2 model from Hugging Face to answer questions augmented with knowledge extracted from the website. This prototype introduces Pinecone as a database for storing vectors for semantic search.
Python
8
star
73

dbt-hive-example

A sample project for dbt-hive adapter with Cloudera Data Platform
Python
7
star
74

terraform-provider-cdp

Go
7
star
75

cmlutils

Python
7
star
76

crcutil

C++
6
star
77

datafu

Java
6
star
78

flink-basic-auth-handler

Java
6
star
79

partner-engineering

Cloudera Partner Engineering Tools
Shell
6
star
80

cybersec

Java
6
star
81

cdpcurl

Curl-like tool with CDP request signing.
Python
5
star
82

CML_AMP_MLFlow_Tracking

Experiment tracking with MLflow.
Python
5
star
83

hcatalog-examples

Sample code for reading and writing tables with hcatalog
Java
5
star
84

CML_AMP_Dask_on_CML

Jupyter Notebook
5
star
85

CML_AMP_Streamlit_on_CML

Demonstration of how to use Streamlit as a CML Application.
Python
5
star
86

CML_AMP_Video_Classification

Demonstration of how to perform video classification using pre-trained TensorFlow models.
Jupyter Notebook
5
star
87

opdb-docker

Shell
4
star
88

github-jira-gateway

A Grails app to serve as a gateway between an internal GitHub Enterprise server and an external JIRA server
Groovy
4
star
89

blog-eclipse

Perl
4
star
90

CML_llm-hol

Jupyter Notebook
4
star
91

CML_AMP_SpaCy_Entity_Extraction

A Jupyter notebook demonstrating entity extraction on headlines with spaCy.
Jupyter Notebook
4
star
92

flink-kerberos-auth-handler

Java
3
star
93

CML_AMP_Object_Detection_Inference

Interact with a blog-style Streamlit application to visually unpack the inference workflow of a modern, single-stage object detector.
Python
3
star
94

dbt-spark-cde-example

Python
3
star
95

CML_AMP_Intelligent_Writing_Assistance

Python
3
star
96

dbt-spark-livy-example

Python
3
star
97

CML_AMP_LLM_Fine_Tuning_Studio

Python
3
star
98

CML_AMP_APIv2

Demonstration of how to use the CML API to interact with CML.
Jupyter Notebook
3
star
99

director-azure-plugin

Cloudera Director - Microsoft Azure Integration
Java
2
star
100

observability

Cloudera Observability related artifacts including Grafana charts and Alert definitions
Shell
2
star