Databricks Labs (@databrickslabs)
  • Stars: 14,303
  • Global Org. Rank 1,627 (Top 0.6 %)
  • Registered over 5 years ago
  • Most used languages
    Python: 54.1 %
    Scala: 21.6 %
    HTML: 5.4 %
    R: 5.4 %
    Java: 2.7 %
    Rich Text Format: 2.7 %
    Go: 2.7 %

Top repositories

1

dolly

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
Python
10,811
star
2

pyspark-ai

English SDK for Apache Spark
Python
739
star
3

dbx

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.
Python
440
star
4

dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Python
313
star
5

tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation
Jupyter Notebook
306
star
6

mosaic

An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.
Jupyter Notebook
270
star
7

overwatch

Capture deep metrics on one or all assets within a Databricks workspace
Scala
226
star
8

ucx

Automated migrations to Unity Catalog
Python
220
star
9

cicd-templates

Manage your Databricks deployments and CI with code.
Python
202
star
10

automl-toolkit

Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, model interpretability.
HTML
191
star
11

migrate

Old scripts for one-off ST-to-E2 migrations. Use "terraform exporter" linked in the readme.
Python
186
star
12

dlt-meta

Metadata driven Databricks Delta Live Tables framework for bronze/silver pipelines
Python
147
star
13

dataframe-rules-engine

Extensible Rules Engine for custom Dataframe / Dataset validation
Scala
134
star
14

discoverx

A Swiss-Army-knife for your Data Intelligence platform administration.
Python
105
star
15

geoscan

Geospatial clustering at massive scale
Scala
94
star
16

jupyterlab-integration

DEPRECATED: Integrating Jupyter with Databricks via SSH
HTML
71
star
17

smolder

HL7 Apache Spark Datasource
Scala
61
star
18

feature-factory

Accelerator to rapidly deploy customized features for your business
Python
55
star
19

databricks-sync

An experimental tool to synchronize source Databricks deployment with a target Databricks deployment.
Python
46
star
20

doc-qa

Python
45
star
21

transpiler

SIEM-to-Spark Transpiler
Scala
42
star
22

brickster

R Toolkit for Databricks
R
40
star
23

delta-oms

DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
Scala
38
star
24

pytester

Python Testing for Databricks
Python
35
star
25

remorph

Cross-compiler and Data Reconciler into Databricks Lakehouse
Scala
33
star
26

splunk-integration

Databricks Add-on for Splunk
Python
26
star
27

dbignite

Python
24
star
28

arcuate

Delta Sharing + MLflow for ML model & experiment exchange (arcuate delta - a fan shaped river delta)
Python
22
star
29

databricks-sdk-r

Databricks SDK for R (Experimental)
R
19
star
30

tika-ocr

Rich Text Format
17
star
31

sandbox

Experimental or low-maturity things
Go
16
star
32

blueprint

Baseline for Databricks Labs projects written in Python
Python
16
star
33

delta-sharing-java-connector

A Java connector for delta.io/sharing/ that allows you to easily ingest data on any JVM.
Java
13
star
34

partner-connect-api

Scala
12
star
35

pylint-plugin

Databricks Plugin for PyLint
Python
10
star
36

lsql

Lightweight SQL execution wrapper built only on top of the Databricks SDK
Python
9
star
37

waterbear

Automated provisioning of an industry Lakehouse with enterprise data model
Python
8
star