add mask to matmul kernel in lecture 14 #52

Open
noklam wants to merge 110 commits into gpu-mode:main from noklam:patch-1
Conversation

noklam commented Apr 16, 2025

fix #51

@UmerHA

lancerts and others added 30 commits January 27, 2024 17:50
Fix the indices typo.
If tpb.x == tpb.y and blocks.x == blocks.y, the original indexing gives the correct result. Otherwise it does not.
```
    tpb = ns(x=16,y=16)
    blocks = ns(x=math.ceil(w/tpb.x), y=math.ceil(h/tpb.y))
```
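The commit above does not show the lines it changed, so the following is only a sketch of how an x/y mix-up in a simulated 2D launch can go unnoticed on square configurations. The `covered` helper and its `swapped` flag are hypothetical and are not taken from the lecture code; `ns` is assumed to be `types.SimpleNamespace`, matching the snippet above.

```python
import math
from types import SimpleNamespace as ns

def covered(h, w, tpb, swapped=False):
    """Count the distinct (row, col) cells of an h x w matrix visited by a
    simulated 2D launch. swapped=True exchanges the x and y loop extents,
    a hypothetical version of an index typo."""
    blocks = ns(x=math.ceil(w / tpb.x), y=math.ceil(h / tpb.y))
    # Loop extents for the y (row) and x (column) directions.
    by, bx = (blocks.x, blocks.y) if swapped else (blocks.y, blocks.x)
    ty, tx = (tpb.x, tpb.y) if swapped else (tpb.y, tpb.x)
    visited = set()
    for i0 in range(by):                  # block index along y
        for i1 in range(bx):              # block index along x
            for j0 in range(ty):          # thread index along y
                for j1 in range(tx):      # thread index along x
                    r = i0 * tpb.y + j0
                    c = i1 * tpb.x + j1
                    if r < h and c < w:   # bounds guard
                        visited.add((r, c))
    return len(visited)

tpb = ns(x=16, y=16)
print(covered(20, 40, tpb))                # correct extents, rectangular matrix
print(covered(20, 40, tpb, swapped=True))  # swapped extents miss some columns
print(covered(32, 32, tpb, swapped=True))  # square case hides the swap
```

With a 20 x 40 matrix the swapped loops visit fewer than the 800 cells, while the square 32 x 32 case still covers every cell, which is consistent with the commit's remark that the original version is only correct when the x and y sizes match.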
…etup

fix: update environment setup for session 4
rectangle matmul with shared memory python
andreaskoepf and others added 29 commits September 29, 2024 02:35
Triton Internals Slides and Code
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
[refactor] replace hardcoded conda env, remove deprecated FindCUDA.cmake, added logic regarding libtorch download
* Slides/materials for Lecture 31

* Update README.md

---------

Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
* add lecture 33 slides and tutorial links

* add README for folder_033
…de#39)

* docs: add SGLang Performance Optimization GPU MODE talk slide

* upd
add the slide_001 offered in google doc
Development

Successfully merging this pull request may close these issues.

Lecture 14 weird result with TRITON_INTERPRET=1