Skip to content

CSCfi/llm-quantization-scripts

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

197 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Quantization

This repository explores quantization methods for Large Language Models (LLMs).
Quantization is a key technique for reducing model size and inference cost, enabling LLMs to run efficiently on consumer hardware or limited GPU memory.

We provide examples and experiments for:


πŸ“– What to Expect in This Repo

  1. Implementation Examples

    • Scripts for loading, quantizing, and saving models.
    • Examples include small models (for example; facebook/opt-125m) so you can try things quickly, and notes for scaling to larger models.
  2. Benchmarks

    • Inference time comparisons before and after quantization.
    • Model size reduction (disk footprint in MB).
  3. Guides & Utilities

    • Helper functions for measuring model size, timing inference, and testing the quantized model.
    • Notes on how to run the examples on Puhti, Mahti and LUMI.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors