Databricks Spark Knowledge Base
The contents contained here are also published in GitBook format.
This content is covered by the license specified here.
learning-spark: Example code from the Learning Spark book
koalas: Koalas: pandas API on Apache Spark
Spark-The-Definitive-Guide: Spark: The Definitive Guide's Code Repository
scala-style-guide: Databricks Scala Coding Style Guide
spark-deep-learning: Deep Learning Pipelines for Apache Spark
click: The "Command Line Interactive Controller for Kubernetes"
LearningSparkV2: This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
megablocks
spark-sklearn: (Deprecated) Scikit-learn integration package for Apache Spark
spark-csv: CSV Data Source for Apache Spark 1.x
tensorframes: [DEPRECATED] TensorFlow wrapper for DataFrames on Apache Spark
devrel: This repository contains the notebooks and presentations we use for our Databricks Tech Talks
reference-apps: Spark reference applications
spark-redshift: Redshift data source for Apache Spark
spark-sql-perf
spark-avro: Avro Data Source for Apache Spark
spark-xml: XML data source for Spark SQL and DataFrames
spark-corenlp: Stanford CoreNLP wrapper for Apache Spark
mlops-stacks: This repo provides a customizable stack for starting new ML projects on Databricks that follow production best practices out of the box.
spark-training: Apache Spark training material
databricks-cli: (Legacy) Command Line Interface for Databricks
spark-perf: Performance tests for Apache Spark
delta-live-tables-notebooks
terraform-provider-databricks: Databricks Terraform Provider
databricks-ml-examples
sjsonnet
jsonnet-style-guide: Databricks Jsonnet Coding Style Guide
dbt-databricks: A dbt adapter for Databricks.
databricks-sdk-py: Databricks SDK for Python (Beta)
containers: Sample base images for Databricks Container Services
databricks-sql-python: Databricks SQL Connector for Python
sbt-spark-package: sbt plugin for Spark packages
notebook-best-practices: An example showing how to apply software engineering best practices to Databricks notebooks.
databricks-vscode: VS Code extension for Databricks
benchmarks: A place in which we publish scripts for reproducible benchmarks.
terraform-databricks-examples: Examples of using Terraform to deploy Databricks resources
spark-tfocs: A Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)
intellij-jsonnet: IntelliJ Jsonnet Plugin
sbt-databricks: An sbt plugin for deploying code to Databricks Cloud
terraform-databricks-lakehouse-blueprints: Set of Terraform automation templates and quickstart demos to jumpstart the design of a Lakehouse on Databricks. This project has incorporated best practices across the industries we work with to deliver composable modules to build a workspace that complies with the highest platform security and governance standards.
spark-integration-tests: Integration tests for Spark
genai-cookbook
spark-pr-dashboard: Dashboard to aid in Spark pull request reviews
run-notebook
ide-best-practices: Best practices for working with Databricks from an IDE
unity-catalog-setup: Notebooks, Terraform, and tools to enable setting up Unity Catalog
simr: Spark In MapReduce (SIMR): launching Spark applications on existing Hadoop MapReduce infrastructure
devbox
databricks-sql-go: Golang database/sql driver for Databricks SQL.
diviner: Grouped time series forecasting engine
cli: Databricks CLI
tmm
security-bucket-brigade
databricks-sdk-go: Databricks SDK for Go
pig-on-spark: Proof-of-concept implementation of Pig-on-Spark integrated at the logical node level
databricks-sql-cli: CLI for querying Databricks SQL
automl
databricks-sql-nodejs: Databricks SQL Connector for Node.js
tpch-dbgen: Patched version of dbgen
als-benchmark-scripts: Scripts to benchmark distributed Alternating Least Squares (ALS)
spark-package-cmd-tool: A command line tool for Spark packages
congruity: The goal of this library is to provide a compatibility layer that makes it easier to adopt Spark Connect. The library is designed to be imported in your application and then monkey-patches the existing API to provide the legacy functionality.
python-interview: Databricks Python interview setup instructions
xgb-regressor: MLflow XGBoost Regressor
databricks-accelerators: Accelerate the use of Databricks for customers [public repo]
tableau-connector
files_in_repos
upload-dbfs-temp
spark-sklearn-docs
sqltools-databricks-driver: SQLTools driver for Databricks SQL
genomics-pipelines: Secondary analysis pipelines parallelized with Apache Spark
workflows-examples
databricks-sdk-java: Databricks SDK for Java
dais-cow-bff: Code for the "Path to Production" DAIS 2024 and 2023 talks
xgboost-linux64: Databricks private xgboost Linux64 fork
mlflow-example-sklearn-elasticnet-wine
databricks-ttyd
setup-cli: Sets up the Databricks CLI in your GitHub Actions workflow.
terraform-databricks-mlops-aws-project: This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Databricks AWS staging and prod workspaces.
jenkins-job-builder: Fork of https://docs.openstack.org/infra/jenkins-job-builder/ to include unmerged patches
terraform-databricks-mlops-azure-project-with-sp-creation: This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Azure Databricks staging and prod workspaces. It also creates the relevant Azure Active Directory (AAD) applications for the service principals.
terraform-databricks-sra: The Security Reference Architecture (SRA) implements, as Terraform templates, the typical security features deployed by most high-security organizations, and enforces controls for the largest risks that customers ask about most often.
databricks-empty-ide-project: Empty IDE project used by the VS Code extension for Databricks
databricks-repos-proxy
databricks-asset-bundles-dais2023
pex: Fork of pantsbuild/pex with a few Databricks-specific changes
SnpEff: Databricks SnpEff fork
notebook_gallery
terraform-databricks-mlops-aws-infrastructure: This module sets up a multi-workspace model registry between a Databricks AWS development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging and prod model registries.
expectations
homebrew-tap: Homebrew Tap for the Databricks CLI
terraform-databricks-mlops-azure-infrastructure-with-sp-creation: This module sets up a multi-workspace model registry between an Azure Databricks development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging and prod model registries. It also creates the relevant Azure Active Directory (AAD) applications for the service principals.
mfg_dlt_workshop: DLT Manufacturing Workshop
databricks-dbutils-scala: The Scala SDK for Databricks.
kdd24-forecasting-anomaly-detection
terraform-databricks-mlops-azure-project-with-sp-linking: This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Azure Databricks staging and prod workspaces. It also links pre-existing Azure Active Directory (AAD) applications to the service principals.
terraform-databricks-mlops-azure-infrastructure-with-sp-linking: This module sets up a multi-workspace model registry between an Azure Databricks development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging and prod model registries. It also links pre-existing Azure Active Directory (AAD) applications to the service principals.