  • Stars
    9,774
  • Rank 3,613 (Top 0.08 %)
  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created over 6 years ago
  • Updated 2 months ago

Repository Details

YSDA course in Natural Language Processing

YSDA Natural Language Processing course

  • This is the 2021 version. For previous years' course materials, go to this branch
  • Lecture and seminar materials for each week are in ./week* folders, see README.md for materials and instructions
  • YSDA homework deadlines will be listed in Anytask (read more).
  • For technical issues, bugs in course materials, or contribution ideas, add an issue
  • Installing libraries and troubleshooting: this thread.

Syllabus

  • week01 Word Embeddings

    • Lecture: Word embeddings. Distributional semantics. Count-based (pre-neural) methods. Word2Vec: learn vectors. GloVe: count, then learn. Evaluation: intrinsic vs extrinsic. Analysis and Interpretability. Interactive lecture materials and more.
    • Seminar: Playing with word and sentence embeddings
    • Homework: Embedding-based machine translation system
  • week02 Text Classification

    • Lecture: Text classification: introduction and datasets. General framework: feature extractor + classifier. Classical approaches: Naive Bayes, MaxEnt (Logistic Regression), SVM. Neural Networks: General View, Convolutional Models, Recurrent Models. Practical Tips: Data Augmentation. Analysis and Interpretability. Interactive lecture materials and more.
    • Seminar: Text classification with convolutional NNs.
    • Homework: Statistical & neural text classification.
  • week03 Language Modeling

    • Lecture: Language Modeling: what does it mean? Left-to-right framework. N-gram language models. Neural Language Models: General View, Recurrent Models, Convolutional Models. Evaluation. Practical Tips: Weight Tying. Analysis and Interpretability. Interactive lecture materials and more.
    • Seminar: Build an N-gram language model from scratch
    • Homework: Neural LMs & smoothing in count-based models.
  • week04 Seq2seq and Attention

    • Lecture: Seq2seq Basics: Encoder-Decoder framework, Training, Simple Models, Inference (e.g., beam search). Attention: general, score functions, models. Transformer: self-attention, masked self-attention, multi-head attention; model architecture. Subword Segmentation (BPE). Analysis and Interpretability: functions of attention heads; probing for linguistic structure. Interactive lecture materials and more.
    • Seminar: Basic sequence to sequence model
    • Homework: Machine translation with attention
  • week05 Transfer Learning

    • Lecture: What is Transfer Learning? Great idea 1: From Words to Words-in-Context (CoVe, ELMo). Great idea 2: From Replacing Embeddings to Replacing Models (GPT, BERT). (A Bit of) Adaptors. Analysis and Interpretability. Interactive lecture materials and more.
  • week06 Domain Adaptation

    • Lecture: General theory. Instance weighting. Proxy-labels methods. Feature matching methods. Distillation-like methods.
    • Seminar+Homework: BERT-based NER domain adaptation
  • week07 Model deployment, compression & acceleration

    • Lecture: How NLP models get deployed; how (and why) to make your model faster and/or smaller
    • No assignment this time; instead, we showcase running a simple ML model in your browser
  • week08 Large Language Models & Their Implications

    • Lecture: more BERTology; Large language models: GPT-3, OPT, BLOOM; in-context learning, prompt engineering, parameter-efficient fine-tuning
    • Practice: prompt engineering and LoRA on a 6.7B model in Colab

More TBA

Contributors & course staff

Course materials and teaching performed by

More Repositories

1

Practical_RL

A course in reinforcement learning in the wild
Jupyter Notebook
5,900
star
2

Practical_DL

DL course co-developed by YSDA, HSE and Skoltech
Jupyter Notebook
1,559
star
3

AgentNet

Deep Reinforcement Learning library for humans
Python
301
star
4

deep_vision_and_graphics

Course about deep learning for computer vision and graphics co-developed by YSDA and Skoltech.
Jupyter Notebook
300
star
5

speech_course

YSDA course in Speech Processing.
Jupyter Notebook
200
star
6

sdc_course

Short course about self-driving cars
JavaScript
157
star
7

roc_comparison

The fast version of DeLong's method for computing the covariance of unadjusted AUC.
Python
144
star
8

YSDA_deeplearning17

Yandex SDA classes on deep learning. Version of year 2017
Jupyter Notebook
116
star
9

MLatImperial2017

Materials for the course of machine learning at Imperial College organized by Yandex SDA
Jupyter Notebook
81
star
10

mlhep2016

Machine Learning in High Energy Physics 2016
Jupyter Notebook
75
star
11

sklearn-deeprl

Deep reinforcement learning. In scikit-learn. In less than 50 effective lines.
Jupyter Notebook
52
star
12

MLatGradDays

Course of Machine Learning in Science and Industry at Heidelberg university
Jupyter Notebook
47
star
13

mlhep2019

MLHEP'19 slides and notebooks
Jupyter Notebook
45
star
14

ml-training-website

ML Training website
HTML
43
star
15

python_public

Open materials for the Python course
Jupyter Notebook
42
star
16

flavours-of-physics-start

Starter kit for "Flavours of Physics" challenge at Kaggle
Python
41
star
17

gumbel_lstm

Experiments with binary LSTM using gumbel-sigmoid
Jupyter Notebook
30
star
18

satellite-collision-avoidance

RL for optimal satellite collision avoidance maneuvers
Python
26
star
19

mlhep2018

MLHEP-18 slides and stuff
Jupyter Notebook
26
star
20

mlhep2017

MLHEP 2017 slides & seminars
Jupyter Notebook
26
star
21

CSC_deeplearning

3-day dive into deep learning at csc
Jupyter Notebook
25
star
22

manytask

The auto solution checking system for YSDA; server, storing grades and managing deadlines
Python
24
star
23

IDAO-2019-muon-id

Problem for IDAO 2019 on LHCb Muon Identification
Jupyter Notebook
24
star
24

MLatImperial2016

Materials for the course of machine learning at Imperial College organized by YSDA
Jupyter Notebook
23
star
25

ML-Handbook-materials

Notebooks and other media for ML Handbook
Jupyter Notebook
19
star
26

mlhep2015

MLHEP 2015 materials (http://hse.ru/mlhep2015)
Shell
19
star
27

modelgym

Gym for predictive models
Jupyter Notebook
17
star
28

mlhep2020-assignments

Jupyter Notebook
17
star
29

algorithms

Home of the algorithms course page at yandexdataschool.ru
CSS
15
star
30

students_projects

Student project topics
Makefile
13
star
31

cms-dqm

CMS data quality monitoring
Jupyter Notebook
11
star
32

gumbel_dpg

Blog post: how to do deterministic policy gradient with gumbel softmax and why you should do it.
Jupyter Notebook
11
star
33

MLatImperial2020

Jupyter Notebook
11
star
34

tinyverse

Universe RL trainer platform. Simple. Supple. Scalable.
Jupyter Notebook
10
star
35

MLatImperial2018

ML at Imperial College
Jupyter Notebook
9
star
36

neurohack-2016-starterkit

neurohack starter kit
Jupyter Notebook
8
star
37

asml

Jupyter Notebook
7
star
38

dlatscale_draft

This is an early version of Deep Learning at Scale course for Yandex School of Data Analysis
Jupyter Notebook
7
star
39

inverse-problem-intensive

A short course on simulation-based inference for physics at YSDA in April 2021
Jupyter Notebook
7
star
40

QuantileTransformerTF

Tensorflow implementation of sklearn.preprocessing.QuantileTransformer
Python
7
star
41

dqn_binder

a deep reinforcement learning tutorial
Jupyter Notebook
6
star
42

PreciseGAN

A research repo for studying different techniques towards making more precise GANs
Python
6
star
43

reproducible_analysis_course

A course on tools for collaborative and reproducible machine learning
Jupyter Notebook
6
star
44

MLHEP-2020-muon-id

Muon identification challenge for MLHEP-2020
Jupyter Notebook
5
star
45

MLatImperial2022

Jupyter Notebook
5
star
46

mlhep-course-2016

materials for course on machine learning for HEP at YSDA
Jupyter Notebook
5
star
47

crayimage

A toolkit for image manipulation. Not for humans.
Python
5
star
48

ML-Handbook

JavaScript
5
star
49

dt

Python
5
star
50

mlhep-course-2017

materials for course on machine learning for HEP at YSDA
Jupyter Notebook
5
star
51

neurohack-2016-winners

Winners of neuroscience hackathon
Jupyter Notebook
5
star
52

mlhep2018-starterkit

Starter kit for MLHEP-18 challenge
Jupyter Notebook
5
star
53

cuda_course

Cuda
4
star
54

DataPopularity

Storage optimization for LHCb experiment.
Jupyter Notebook
4
star
55

dl-course-tensorflow

[in progress] Translating all our materials from https://github.com/ddtm/dl-course to tensorflow
4
star
56

darkmatter-2017

Jupyter Notebook
4
star
57

ship_tracks_recognition

Jupyter Notebook
3
star
58

checker

The auto solution checking system for YSDA; client, checking solutions and sending grades
Python
3
star
59

lilbert

Jupyter Notebook
3
star
60

mlda

Machine Learning, Data Analysis course materials
Makefile
3
star
61

MLatMISiS2018

Machine Learning track for Physics at MISiS
Jupyter Notebook
3
star
62

manchester-cp-asymmetry-tutorial

Manchester CP asymmetry tutorial
Jupyter Notebook
2
star
63

datanight2015-starterkit

Data Analysis Night StarterKit (https://academy.yandex.ru/events/data-analysis-night/2015/)
Python
2
star
64

REP_tutorial

Examples of using yandex/rep framework
Jupyter Notebook
2
star
65

cern_summer_school_2017

CERN openlab Summer School 2017 Machine Learning - Parts 3 & 4
Jupyter Notebook
2
star
66

sentinels

sentinels data analysis
Jupyter Notebook
2
star
67

cern-higgsml-baseline

baseline solution for HiggsML challenge using data from CERN open data portal
Jupyter Notebook
2
star
68

crowd_course

Jupyter Notebook
1
star
69

cpp0_course

Python
1
star
70

KSfinder

Jupyter Notebook
1
star
71

HSE-DataNight-StarterKit

Everware version of HSE data night starter kit
Jupyter Notebook
1
star
72

aleph2015

Applying (machine) Learning to Experimental Physics (ALEPH) and «Flavours of Physics» challenge
HTML
1
star
73

everware-base-image

Python
1
star
74

mamontov-lhc-display

The Large Hadron Collider status display in Yandex HQ
JavaScript
1
star
75

pyretina

Python
1
star
76

reproducible_analysis_course_py3

Jupyter Notebook
1
star
77

eScience-2016-everware

The presentation for https://www.esciencecenter.nl/event/4th-national-escience-symposium
CSS
1
star
78

datanight2015-advanced-starterkit

Data Analysis Night Advanced StarterKit (https://academy.yandex.ru/events/data-analysis-night/2015/)
Python
1
star
79

moseskit

Train phrase-based machine translation in one bash command with decent defaults. Docker-powered.
Ruby
1
star
80

MLHEP2020-black-box

A competition for the MLHEP 2020 summer school.
Jupyter Notebook
1
star