add mask to matmul kernel in lecture 14 by noklam · Pull Request #52 · gpu-mode/lectures

noklam · 2025-04-16T15:49:24Z

lancerts and others added 30 commits January 27, 2024 17:50

Update pmpp.ipynb

d4006ef

fix the indices typo. If tpb.x = tpb.y and blocks.x = blocks.y, the original code gives the correct result; otherwise it does not.
```
tpb = ns(x=16,y=16)
blocks = ns(x=math.ceil(w/tpb.x), y=math.ceil(h/tpb.y))
```
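
The diff for this commit is not shown on this page, so the following is only a rough sketch of why an x/y mix-up can go unnoticed on square launches. It assumes the typo amounted to swapping the x and y roles when computing row and column indices; `covered_cells` and its `swapped` flag are hypothetical names introduced for illustration.

```python
import math
from types import SimpleNamespace as ns

def covered_cells(h, w, tpb, blocks, swapped=False):
    """Return the set of (row, col) cells of an h x w matrix that some thread maps to."""
    cells = set()
    for by in range(blocks.y):
        for bx in range(blocks.x):
            for ty in range(tpb.y):
                for tx in range(tpb.x):
                    if swapped:
                        # Assumed buggy variant: x indexes rows, y indexes columns.
                        r, c = bx * tpb.x + tx, by * tpb.y + ty
                    else:
                        # Usual convention: y indexes rows, x indexes columns.
                        r, c = by * tpb.y + ty, bx * tpb.x + tx
                    if r < h and c < w:  # bounds guard
                        cells.add((r, c))
    return cells

h, w = 4, 40  # a non-square matrix
tpb = ns(x=16, y=16)
blocks = ns(x=math.ceil(w / tpb.x), y=math.ceil(h / tpb.y))
print(len(covered_cells(h, w, tpb, blocks)))                # all 160 cells
print(len(covered_cells(h, w, tpb, blocks, swapped=True)))  # only 64 cells
```

With a square matrix and equal block dimensions both variants cover every cell, which matches the commit's remark that the original code only gives the correct result in that case.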

Merge pull request gpu-mode#1 from lancerts/patch-1

0429593

Tiny index fix

Fix readme typo

18ff2bf

Merge pull request gpu-mode#2 from erjanmx/fix-readme-typo

af554d4

Fix readme typo

add session 4

5cdf20b

fix: session 4 env setup

7d5a5c2

Merge pull request gpu-mode#4 from NichitaDiaconu/fix-session_4_env_setup

a8c82b1

fix: update environment setup for session 4

amend README for lecture 4

cde1cbf

Add Lecture 1

f1ee018

Update README.md

bebb72c

move lecture2 content into separate lecture2 folder

a18ae1a

Merge branch 'main' of github.com:cuda-mode/lectures

37132b6

.gitignore

e076997

incomplete matmul_l5.ipynb

3d15b4f

rectangle matmul with shared memory python

23c965b

mimic __syncthreads with threading.Barrier

002c70e
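
The commit's code is not reproduced on this page. As a minimal sketch of the idea named in its title, the snippet below uses `threading.Barrier` so that every simulated thread finishes writing to a shared list before any thread reads from it, which is roughly the role `__syncthreads()` plays inside a CUDA block. `run_block` and `worker` are hypothetical names, not taken from the notebook.

```python
import threading

def run_block(n_threads):
    """Simulate one thread block: write to shared memory, sync, then read a neighbour's slot."""
    shared = [0] * n_threads   # stands in for shared memory
    result = [0] * n_threads   # stands in for the output buffer
    barrier = threading.Barrier(n_threads)

    def worker(tid):
        shared[tid] = tid * tid                      # phase 1: each thread fills its own slot
        barrier.wait()                               # wait until all threads have written
        result[tid] = shared[(tid + 1) % n_threads]  # phase 2: read the next thread's slot

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

print(run_block(4))  # [1, 4, 9, 0]
```

Without the barrier, a thread could read its neighbour's slot before it was written, which is the race that `__syncthreads()` prevents in the CUDA version.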

Merge pull request gpu-mode#5 from KeremTurgutlu/rectangle-matmul

6e7958f

rectangle matmul with shared memory python

working python threads

fc59bb6

shared mem matmul cuda

c82bef2

implemented dynamic shared mem

aea4ac8

Merge pull request gpu-mode#6 from KeremTurgutlu/cuda-matmul

63e733f

shared mem matmul cuda

lecture5 updates

9859c1e

numba

6ae4348

lecture5

b9de413

templated kernel

30ac07b

add lecture 5 & video links

1925780

merge titles

f9f27da

Include the cmd for the interpret mode in triton_square.py

629feb2

Merge pull request gpu-mode#11 from lancerts/jit-interpret

9d99cc8

add lecture8 content

ce7fd92

andreaskoepf and others added 29 commits September 29, 2024 02:35

Merge pull request gpu-mode#31 from kapilsh/main

a4e9f1b

Triton Internals Slides and Code

Fix bug in triton_util.py

ffcc35a

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

Merge pull request gpu-mode#32 from emmanuel-ferdman/main

313c296

Fix bug in `triton_util.py`

add lecture 30

b02cb17

Merge pull request gpu-mode#34 from gau-nernst/lecture_030

8028c6b

Add lecture 30

Fixed typo in image

a3ca796

Merge branch 'gpu-mode:main' into main

c5b76ef

Merge pull request gpu-mode#35 from UmerHA/main

cd5691a

Fixed typo in image

Merge pull request gpu-mode#29 from debashishc/l18-dev

7a6fc7d

[refactor] replace hardcoded conda env, remove deprecated FindCUDA.cmake, added logic regarding libtorch download

Slides/materials for Lecture 31 (gpu-mode#36)

518d021

* Slides/materials for Lecture 31
* Update README.md

Co-authored-by: Mark Saroufim <marksaroufim@meta.com>

Add Lecture 32 - Unsloth

9bc9de8

Merge pull request gpu-mode#37 from danielhanchen/main

63a6f67

Add Lecture 32 - Unsloth

Bitblas (gpu-mode#38)

5223aa5

* add lecture 33 slides and tutorial links
* add README for folder_033

Add mobicham slides

2b49652

docs: add SGLang Performance Optimization GPU MODE talk slide (gpu-mode#39)

0a49c3a

* docs: add SGLang Performance Optimization GPU MODE talk slide
* upd

Update README.md

4f941f5

Lecture 36

0cf90d6

Lecture 37 - Introduction to SASS & GPU Microarchitecture (gpu-mode#40)

e3c47e3

add lec38 (gpu-mode#43)

b2c62ab

add slide_001 (gpu-mode#41)

cf846da

add the slide_001 offered in google doc

Create Qs.md

8cad72c

Fix format

e031509

Merge pull request gpu-mode#45 from zinccat/patch-1

052ae18

Fix format

Merge pull request gpu-mode#44 from UmerHA/patch-1

32d3664

Create Qs.md

Adding warmup steps to coarsening.cu (gpu-mode#46)

7314692

Update README.md

2a94cf9

erik lecture

12f35f5

Update README.md

45e5dee

add mask to matmul kernel

9735597
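
The kernel change itself is not visible on this page. The sketch below is a pure-Python stand-in (not the lecture's Triton code) for what masking means in a tiled matmul: loads outside the matrix bounds are replaced by zero, and stores outside the output are skipped, so the result stays correct when the dimensions are not multiples of the block size. `load_tile` and `tiled_matmul` are hypothetical helper names.

```python
def load_tile(mat, r0, c0, bs):
    """Read a bs x bs tile starting at (r0, c0); out-of-range positions are masked to 0.0."""
    n_rows, n_cols = len(mat), len(mat[0])
    return [[mat[r][c] if (r < n_rows and c < n_cols) else 0.0
             for c in range(c0, c0 + bs)]
            for r in range(r0, r0 + bs)]

def tiled_matmul(a, b, bs=4):
    """Multiply a (m x k) by b (k x n) tile by tile, masking loads and stores."""
    m, k, n = len(a), len(a[0]), len(b[0])
    assert len(b) == k, "inner dimensions must match"
    out = [[0.0] * n for _ in range(m)]
    for r0 in range(0, m, bs):
        for c0 in range(0, n, bs):
            acc = [[0.0] * bs for _ in range(bs)]
            for k0 in range(0, k, bs):
                ta = load_tile(a, r0, k0, bs)  # masked load from a
                tb = load_tile(b, k0, c0, bs)  # masked load from b
                for i in range(bs):
                    for j in range(bs):
                        acc[i][j] += sum(ta[i][p] * tb[p][j] for p in range(bs))
            for i in range(bs):
                for j in range(bs):
                    if r0 + i < m and c0 + j < n:  # masked store
                        out[r0 + i][c0 + j] = acc[i][j]
    return out

a = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
b = [[1, 0], [0, 1], [2, 2]]  # 3 x 2
print(tiled_matmul(a, b, bs=2))  # [[7.0, 8.0], [16.0, 17.0]]
```

Here the inner dimension 3 is not a multiple of the block size 2, so the last tile along k reads one masked column and row of zeros, which contribute nothing to the sum.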

msaroufim force-pushed the main branch from c41f9d0 to b4df16e on June 15, 2026 04:57

noklam wants to merge 110 commits into gpu-mode:main from noklam:patch-1

20 participants
