


Notebooks for Large Language Models (LLMs) Specialization

Large Language Models

This repo contains the notebooks and slides for the Large Language Models: Application through Production course on edX & Databricks Academy.

Notebooks

How to Import the Repo into Databricks?

  1. You first need to add Git credentials to Databricks. Refer to the Databricks documentation for details.

  2. Click Repos in the sidebar. Click Add Repo on the top right.

  3. Copy the HTTPS clone URL from GitHub, or copy https://github.com/databricks-academy/large-language-models.git, and paste it into the Git repository URL box. The remaining fields, Git provider and Repository name, are populated automatically. Click Create Repo on the bottom right.

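The same import can in principle be scripted rather than clicked through. The following is a minimal sketch, assuming the Databricks Repos REST endpoint `POST /api/2.0/repos` with `url`, `provider`, and `path` fields; the workspace host and user email are hypothetical placeholders, and the function only builds the request description without sending it.

```python
import json

# Clone URL taken from the import steps above.
REPO_URL = "https://github.com/databricks-academy/large-language-models.git"


def build_create_repo_request(workspace_host: str, user_email: str) -> dict:
    """Describe the HTTP request that would clone the course repo.

    workspace_host and user_email are placeholders (assumptions), and the
    endpoint and field names are assumed from the Databricks Repos REST API.
    """
    repo_name = REPO_URL.rsplit("/", 1)[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return {
        "method": "POST",
        "url": f"{workspace_host.rstrip('/')}/api/2.0/repos",
        "json": {
            "url": REPO_URL,
            "provider": "gitHub",
            "path": f"/Repos/{user_email}/{repo_name}",
        },
    }


# Hypothetical workspace and user, for illustration only.
request = build_create_repo_request(
    "https://example.cloud.databricks.com/", "user@example.com"
)
print(json.dumps(request, indent=2))
```

Sending the request would additionally need an authenticated HTTP client and a personal access token, which are left out here on purpose.
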

How to Import the files from .dbc releases on GitHub

  1. You can download the notebooks from a release by navigating to the releases section on the GitHub page.

  2. From the releases page, download the .dbc file. This contains all of the course notebooks, with their structure and metadata.

  3. In your Databricks workspace, navigate to the Workspace menu, click on Home and select Import.

  4. Using the import tool, navigate to the location on your computer where the .dbc file was downloaded in Step 2. Once you select the file, click Import, and the files will be loaded and extracted to your workspace.

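For readers who prefer scripting the import, here is a minimal sketch of the request body, assuming the Databricks Workspace API endpoint `POST /api/2.0/workspace/import`, which takes a target `path`, a `format` of `DBC`, and base64-encoded `content`. The target path and the archive bytes below are hypothetical placeholders; nothing is sent over the network.

```python
import base64


def build_dbc_import_payload(dbc_bytes: bytes, target_path: str) -> dict:
    """Build the JSON body for a .dbc import.

    Field names are assumed from the Databricks Workspace API;
    target_path is a placeholder chosen for illustration.
    """
    return {
        "path": target_path,
        "format": "DBC",
        # The API expects the archive as base64 text, not raw bytes.
        "content": base64.b64encode(dbc_bytes).decode("ascii"),
    }


# Placeholder bytes stand in for the real downloaded .dbc file.
fake_archive = b"placeholder bytes, not a real .dbc archive"
payload = build_dbc_import_payload(
    fake_archive, "/Users/user@example.com/large-language-models"
)
print(payload["format"], len(payload["content"]))
```

In real use, `dbc_bytes` would come from reading the downloaded file in binary mode, and the payload would be posted with an authenticated client.
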
Cluster settings

Which Databricks cluster should I use?

  1. First, select Single Node.

  2. This courseware has been tested on Databricks Runtime 13.1 for Machine Learning. If you do not have access to a 13.1 ML Runtime cluster, you will need to install many additional libraries (as the ML Runtime pre-installs many commonly used machine learning packages), and this courseware is not guaranteed to run.


    All of the notebooks except LLM 04a - Fine-tuning LLMs and LLM04L - Fine-tuning LLMs Lab run fine on a CPU cluster. We recommend either i3.xlarge or i3.2xlarge (i3.2xlarge is slightly faster).


    The LLM 04a - Fine-tuning LLMs and LLM04L - Fine-tuning LLMs Lab notebooks require Databricks Runtime 13.1 for Machine Learning with GPU.


    Select the g5.2xlarge GPU instance type.

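The cluster choices above can be summarized as a small lookup. This is a sketch only: the field names follow the Databricks Clusters API as commonly documented, while the `spark_version` strings, the single-node `spark_conf` and tag values, and the cluster names are assumptions that should be checked against the runtime versions your workspace actually lists.

```python
# Notebooks that need a GPU, as named in the cluster guidance above.
GPU_NOTEBOOKS = {
    "LLM 04a - Fine-tuning LLMs",
    "LLM04L - Fine-tuning LLMs Lab",
}


def cluster_spec_for(notebook_name: str) -> dict:
    """Return an illustrative single-node cluster spec for a notebook."""
    needs_gpu = notebook_name in GPU_NOTEBOOKS
    return {
        # Hypothetical cluster names.
        "cluster_name": "llm-course-gpu" if needs_gpu else "llm-course-cpu",
        # Assumed version keys for Databricks Runtime 13.1 ML.
        "spark_version": (
            "13.1.x-gpu-ml-scala2.12" if needs_gpu else "13.1.x-cpu-ml-scala2.12"
        ),
        "node_type_id": "g5.2xlarge" if needs_gpu else "i3.xlarge",
        # Single Node: a driver with no workers (assumed settings).
        "num_workers": 0,
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


gpu_spec = cluster_spec_for("LLM 04a - Fine-tuning LLMs")
cpu_spec = cluster_spec_for("LLM 00a - Install Datasets")
print(gpu_spec["node_type_id"], cpu_spec["node_type_id"])
```

Swapping `i3.xlarge` for `i3.2xlarge` in the CPU branch matches the faster option mentioned above.
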
Install datasets and models

How do I install the datasets and models locally?

  1. To improve performance of the code, we highly recommend pre-installing the datasets and models by running the LLM 00a - Install Datasets notebook.

  2. Run this notebook before running any of the other notebooks. It can take up to 25 minutes to complete.

Slides

Where do I download course slides?

Open the latest version under the Releases section, where the slides are available to download as a PDF.
