  • Stars: 215
  • Rank: 182,850 (Top 4%)
  • Language: Python
  • License: Other
  • Created over 1 year ago
  • Updated 8 months ago

Repository Details

Large Language Models: Foundation Models from the Ground Up

This repo contains the notebooks and slides for the Large Language Models: Foundation Models from the Ground Up course on edX & Databricks Academy.

Note: this is the second course in the two-part series. For the first installment please see the course on edX & Databricks Academy as well as the supporting repo.

Notebooks

How to Import the Repo into Databricks?

  1. First, add your Git credentials to Databricks. Refer to the Databricks documentation on Git credentials.

  2. Click Repos in the sidebar. Click Add Repo on the top right.

  3. Copy the HTTPS clone URL from GitHub, https://github.com/databricks-academy/llm-foundation-models.git, and paste it into the Git repository URL box. The remaining fields, Git provider and Repository name, are populated automatically. Click Create Repo on the bottom right.
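
The same repo can probably also be added without the UI. The sketch below is a minimal illustration, assuming the Databricks Repos REST API (`POST /api/2.0/repos`) and a personal access token; the host, token, and workspace path are hypothetical placeholders, and the helper name `build_create_repo_request` is ours, not part of the course. It only builds the request, so it is safe to run as-is.

```python
import json
import urllib.request

# Clone URL taken from the step above.
REPO_URL = "https://github.com/databricks-academy/llm-foundation-models.git"


def build_create_repo_request(host, token, workspace_path):
    """Build (but do not send) a POST request for the Repos API.

    host, token and workspace_path are placeholders supplied by the caller.
    """
    payload = {
        "url": REPO_URL,          # Git repository URL
        "provider": "gitHub",     # Git provider (assumed API spelling)
        "path": workspace_path,   # e.g. /Repos/<user>/<repository name>
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_repo_request(
        host="https://example.cloud.databricks.com",   # hypothetical workspace
        token="dapi-REPLACE-ME",                        # hypothetical token
        workspace_path="/Repos/someone@example.com/llm-foundation-models",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually create the repo, send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect before anything touches a real workspace.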

How to Import the files from .dbc releases on GitHub

  1. You can download the notebooks from a release by navigating to the Releases section on the GitHub page.

  2. From the Releases page, download the .dbc file. It contains all of the course notebooks, along with their structure and metadata.

  3. In your Databricks workspace, navigate to the Workspace menu, click Home, and select Import.

  4. Using the import tool, navigate to the location on your computer where the .dbc file was downloaded in Step 2. Once you select the file, click Import, and the files will be loaded and extracted to your workspace.

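
The import step can likely be scripted as well. This is a rough sketch, assuming the Databricks Workspace REST API (`POST /api/2.0/workspace/import`) accepts a base64-encoded archive with `format` set to `DBC`; the host, token, target path, and the helper name `build_dbc_import_request` are illustrative assumptions. The example uses stand-in bytes rather than a real archive so that it runs without any downloaded file.

```python
import base64
import json
import urllib.request


def build_dbc_import_request(host, token, dbc_bytes, target_path):
    """Build (but do not send) a workspace import request for a .dbc archive."""
    payload = {
        "path": target_path,   # workspace folder that will receive the notebooks
        "format": "DBC",       # archive format of the release asset
        "content": base64.b64encode(dbc_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Stand-in bytes; in practice read the downloaded file, for example
    # with open(path_to_dbc, "rb") and pass the result of .read().
    fake_archive = b"not a real dbc archive"
    request = build_dbc_import_request(
        host="https://example.cloud.databricks.com",        # hypothetical
        token="dapi-REPLACE-ME",                             # hypothetical
        dbc_bytes=fake_archive,
        target_path="/Users/someone@example.com/llm-course",  # hypothetical
    )
    print(request.get_method(), request.full_url)
    # To perform the import, send it: urllib.request.urlopen(request)
```
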
Cluster settings

Which Databricks cluster should I use?

  1. First, select Single Node.

  2. This courseware has been tested on Databricks Runtime 13.3 LTS for Machine Learning. If you do not have access to a 13.3 LTS ML Runtime cluster, you will need to install many additional libraries (as the ML Runtime pre-installs many commonly used machine learning packages), and this courseware is not guaranteed to run.

    The Module 1 and 3 notebooks run fine on i3.xlarge. We recommend i3.2xlarge for the Module 2 and 4 notebooks.
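
The cluster choices above can be summarized as a small configuration helper. This is only a sketch: the runtime key `13.3.x-cpu-ml-scala2.12` and the single-node fields (`num_workers` of 0, the `singleNode` profile, and the `ResourceClass` tag) are assumptions based on the usual shape of a Databricks Clusters API payload, and the cluster name and function name are hypothetical. Check the values against your own workspace before using them.

```python
import json

# Assumed runtime key for Databricks Runtime 13.3 LTS for Machine Learning.
RUNTIME_KEY = "13.3.x-cpu-ml-scala2.12"

# Instance types from the guidance above: Modules 1 and 3 on i3.xlarge,
# Modules 2 and 4 on i3.2xlarge.
NODE_TYPE_BY_MODULE = {
    1: "i3.xlarge",
    2: "i3.2xlarge",
    3: "i3.xlarge",
    4: "i3.2xlarge",
}


def single_node_cluster_spec(module):
    """Return an illustrative single-node cluster spec for a course module."""
    if module not in NODE_TYPE_BY_MODULE:
        raise ValueError(f"unknown module: {module!r}")
    return {
        "cluster_name": f"llm-foundation-models-module-{module}",  # hypothetical
        "spark_version": RUNTIME_KEY,
        "node_type_id": NODE_TYPE_BY_MODULE[module],
        "num_workers": 0,  # single node: driver only, no workers
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


if __name__ == "__main__":
    print(json.dumps(single_node_cluster_spec(2), indent=2))
```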

Slides

Where do I download course slides?

Click the latest version under the Releases section. You can download the slides there as a PDF.

More Repositories

  1. data-engineering-with-databricks-english (Python, 1,089 stars)
  2. large-language-models (Python, 726 stars): Notebooks for Large Language Models (LLMs) Specialization
  3. apache-spark-programming-with-databricks-english (Python, 242 stars)
  4. advanced-data-engineering-with-databricks (Python, 232 stars)
  5. data-analysis-with-databricks-sql (Python, 114 stars)
  6. ml-in-production-english (Python, 99 stars): Machine Learning in Production
  7. scalable-machine-learning-with-apache-spark-english (Python, 79 stars)
  8. INT-JEPFS-V2-IL (Python, 73 stars): This is the legacy version of the course that pairs with the self-paced version and its recordings which reference this repo.
  9. dbacademy (Python, 64 stars): Internal library used to develop and test Databricks Academy courseware
  10. data-engineer-learning-path (Python, 63 stars)
  11. just-enough-python-for-spark (Python, 45 stars): The published version of the IL course Just Enough Python for Spark
  12. deep-learning-with-databricks (Python, 29 stars): Owner: Jacob Parr
  13. please-see-databricks-academy (17 stars): The materials for this course are no longer available through GitHub. For more information, please see this repo's README file.
  14. databricks-project (Python, 13 stars)
  15. streaming-lakehouse (Python, 13 stars)
  16. scaling-machine-learning-pipelines (Python, 12 stars)
  17. data-engineering-with-databricks-japanese (Python, 12 stars)
  18. get-started-with-data-engineering-on-databricks-repo-example (Python, 12 stars)
  19. introduction-to-python-for-data-science-and-data-engineering-english (Python, 11 stars): Introduction to Python for Data Science & Data Engineering
  20. developer-foundations-capstone (Python, 8 stars): The published version of the partner's Developer Foundation Capstone Project.
  21. advanced-data-engineering-with-databricks-demo-test-data-setup (Python, 8 stars): This simple repo stores a notebook used to create a test dataset for CI/CD.
  22. cli-demo (Python, 7 stars): Public resources for Databricks CLI demo
  23. intro-to-repos (Python, 5 stars)
  24. INT-JESFS-V1-IL (Scala, 5 stars)
  25. optimizing-apache-spark-on-databricks (5 stars)
  26. just-enough-scala-for-spark (Scala, 4 stars)
  27. natural-language-processing (Python, 4 stars)
  28. elt-with-spark-sql (Python, 4 stars)
  29. experiment-tracking-with-mlflow-source (Python, 3 stars)
  30. spark-local-execution (Python, 3 stars)
  31. new-capability-overview-automl-source (Python, 3 stars): Owner: Mark Roepke | Repository for the "Introduction to AutoML" self-paced course
  32. workspace-setup (Python, 3 stars)
  33. template-course (Python, 3 stars): The student repository template for a new Databricks Academy course. (go/da/template)
  34. example-course (Python, 3 stars): Owner: Jacob Parr | This is the student-facing, public repo to which the sample-courseware-repo-source repo will publish. Note: this repo is meant to remain private in that it is here for demonstration purposes; in normal cases, this repo would be public.
  35. apache-spark-mooc-course-1 (2 stars): Owner: Jacob Parr & Kevin Coyle | The public version of the Apache Spark MOOC
  36. dbacademy-gems (Python, 2 stars): This project has been replaced by https://github.com/databricks-academy/dbacademy
  37. developer-advanced-capstone (Python, 2 stars): The published version of the partner's Developer Advanced Capstone Project.
  38. intro-to-files-in-repos (Python, 1 star)
  39. developer-essentials-capstone (Python, 1 star)
  40. python-package (Python, 1 star)
  41. new-capability-overview-time-series-forecasting-in-automl-source (Python, 1 star)
  42. ml-in-production-japanese (Python, 1 star)
  43. dbacademy-courseware (Python, 1 star): This project has been replaced by https://github.com/databricks-academy/dbacademy