learning-spark
Example code from Learning Spark book

koalas
Koalas: pandas API on Apache Spark

Spark-The-Definitive-Guide
Spark: The Definitive Guide's Code Repository

scala-style-guide
Databricks Scala Coding Style Guide

spark-deep-learning
Deep Learning Pipelines for Apache Spark

click
The "Command Line Interactive Controller for Kubernetes"

LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

megablocks

spark-sklearn
(Deprecated) Scikit-learn integration package for Apache Spark

spark-csv
CSV Data Source for Apache Spark 1.x

tensorframes
[DEPRECATED] TensorFlow wrapper for DataFrames on Apache Spark

devrel
This repository contains the notebooks and presentations we use for our Databricks Tech Talks

reference-apps
Spark reference applications

spark-redshift
Redshift data source for Apache Spark

spark-sql-perf

spark-avro
Avro Data Source for Apache Spark

spark-xml
XML data source for Spark SQL and DataFrames

spark-corenlp
Stanford CoreNLP wrapper for Apache Spark

mlops-stacks
This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.

spark-training
Apache Spark training material

databricks-cli
(Legacy) Command Line Interface for Databricks

spark-perf
Performance tests for Apache Spark

delta-live-tables-notebooks

terraform-provider-databricks
Databricks Terraform Provider

spark-knowledgebase
Spark Knowledge Base

databricks-ml-examples

sjsonnet

jsonnet-style-guide
Databricks Jsonnet Coding Style Guide

dbt-databricks
A dbt adapter for Databricks.

databricks-sdk-py
Databricks SDK for Python (Beta)

containers
Sample base images for Databricks Container Services

databricks-sql-python
Databricks SQL Connector for Python

sbt-spark-package
sbt plugin for Spark packages

notebook-best-practices
An example showing how to apply software engineering best practices to Databricks notebooks.

databricks-vscode
VS Code extension for Databricks

benchmarks
A place in which we publish scripts for reproducible benchmarks.

terraform-databricks-examples
Examples of using Terraform to deploy Databricks resources

spark-tfocs
A Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)

intellij-jsonnet
IntelliJ Jsonnet Plugin

sbt-databricks
An sbt plugin for deploying code to Databricks Cloud

terraform-databricks-lakehouse-blueprints
Set of Terraform automation templates and quickstart demos to jumpstart the design of a Lakehouse on Databricks. This project has incorporated best practices across the industries we work with to deliver composable modules to build a workspace to comply with the highest platform security and governance standards.

spark-integration-tests
Integration tests for Spark

genai-cookbook

spark-pr-dashboard
Dashboard to aid in Spark pull request reviews

run-notebook

ide-best-practices
Best practices for working with Databricks from an IDE

unity-catalog-setup
Notebooks, Terraform, tools to enable setting up Unity Catalog

simr
Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure

devbox

databricks-sql-go
Golang database/sql driver for Databricks SQL.

diviner
Grouped time series forecasting engine

cli
Databricks CLI

tmm

security-bucket-brigade

databricks-sdk-go
Databricks SDK for Go

pig-on-spark
Proof-of-concept implementation of Pig-on-Spark integrated at the logical node level

databricks-sql-cli
CLI for querying Databricks SQL

automl

databricks-sql-nodejs
Databricks SQL Connector for Node.js

tpch-dbgen
Patched version of dbgen

als-benchmark-scripts
Scripts to benchmark distributed Alternating Least Squares (ALS)

spark-package-cmd-tool
A command line tool for Spark packages

congruity
The goal of this library is to provide a compatibility layer that makes it easier to adopt Spark Connect. The library is designed to be simply imported in your application and will then monkey-patch the existing API to provide the legacy functionality.

python-interview
Databricks Python interview setup instructions

xgb-regressor
MLflow XGBoost Regressor

databricks-accelerators
Accelerate the use of Databricks for customers [public repo]

tableau-connector

files_in_repos

upload-dbfs-temp

spark-sklearn-docs

sqltools-databricks-driver
SQLTools driver for Databricks SQL

genomics-pipelines
Secondary analysis pipelines parallelized with Apache Spark

workflows-examples

databricks-sdk-java
Databricks SDK for Java

dais-cow-bff
Code for the "Path to Production" DAIS 2024 and 2023 talks

xgboost-linux64
Databricks Private xgboost Linux64 fork

mlflow-example-sklearn-elasticnet-wine

databricks-ttyd

setup-cli
Sets up the Databricks CLI in your GitHub Actions workflow.

terraform-databricks-mlops-aws-project
This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Databricks AWS staging and prod workspaces.

jenkins-job-builder
Fork of https://docs.openstack.org/infra/jenkins-job-builder/ to include unmerged patches

terraform-databricks-mlops-azure-project-with-sp-creation
This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Azure Databricks staging and prod workspaces. It also creates the relevant Azure Active Directory (AAD) applications for the service principals.

terraform-databricks-sra
The Security Reference Architecture (SRA) implements typical security features as Terraform Templates that are deployed by most high-security organizations, and enforces controls for the largest risks that customers ask about most often.

databricks-empty-ide-project
Empty IDE project used by the VS Code extension for Databricks

databricks-repos-proxy

databricks-asset-bundles-dais2023

pex
Fork of pantsbuild/pex with a few Databricks-specific changes

SnpEff
Databricks snpeff fork

notebook_gallery

terraform-databricks-mlops-aws-infrastructure
This module sets up multi-workspace model registry between a Databricks AWS development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging & prod model registries.

expectations

homebrew-tap
Homebrew Tap for the Databricks CLI

terraform-databricks-mlops-azure-infrastructure-with-sp-creation
This module sets up multi-workspace model registry between an Azure Databricks development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging & prod model registries. It also creates the relevant Azure Active Directory (AAD) applications for the service principals.

mfg_dlt_workshop
DLT Manufacturing Workshop

databricks-dbutils-scala
The Scala SDK for Databricks.

kdd24-forecasting-anomaly-detection

terraform-databricks-mlops-azure-project-with-sp-linking
This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Azure Databricks staging and prod workspaces. It also links pre-existing Azure Active Directory (AAD) applications to the service principals.

terraform-databricks-mlops-azure-infrastructure-with-sp-linking
This module sets up multi-workspace model registry between an Azure Databricks development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging & prod model registries. It also links pre-existing Azure Active Directory (AAD) applications to the service principals.