Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy

Razvan Gabriel Dumitru, Paul Ioan Clotan, Vikas Yadav, Darius Peteleaza, Mihai Surdeanu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper introduces a novel model compression approach through dynamic, layer-specific pruning in Large Language Models (LLMs), enhancing the traditional methodology established by SliceGPT. By transitioning from constant to dynamic slicing, our method leverages the newly proposed Layer Redundancy (LR) score, which assesses how much each layer changes its input by measuring the cosine similarity between the layer's input and its output. We use this score to prune parts of individual layers based on redundancy, such that the average pruned percentage across all layers is a fixed value. We conducted extensive experiments using models such as Llama3-8B and Mistral-7B on multiple datasets, evaluating different slicing bases and percentages to determine optimal configurations that balance efficiency and performance. Our findings show that our dynamic slicing approach not only maintains but, in many cases, enhances model performance compared to the baseline established by constant slicing methods. For instance, in several settings, we observe performance improvements of up to 5% over the SliceGPT baseline. Additionally, a decrease in perplexity of as much as 7% was observed across multiple benchmarks, validating the effectiveness of our method. The code, model weights, and datasets are open-sourced at https://github.com/RazvanDu/DynamicSlicing.
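To make the LR-based slicing idea concrete, here is a minimal sketch, assuming a Hugging Face-style causal LM that exposes per-layer hidden states. The helper names (layer_redundancy_scores, dynamic_slice_percentages) and the proportional allocation rule are illustrative assumptions, not the paper's exact implementation; see the repository linked above for the authors' code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def layer_redundancy_scores(model, input_ids):
    """LR score per layer: cosine similarity between a layer's input and
    output hidden states. Higher similarity means the layer changes its
    input less, i.e., the layer is more redundant."""
    outputs = model(input_ids, output_hidden_states=True)
    hidden = outputs.hidden_states  # tuple: embedding output + one state per layer
    scores = []
    for i in range(len(hidden) - 1):
        inp, out = hidden[i], hidden[i + 1]          # input/output of layer i
        sim = F.cosine_similarity(inp, out, dim=-1)  # per-token similarity
        scores.append(sim.mean().item())             # average over batch and tokens
    return torch.tensor(scores)

def dynamic_slice_percentages(lr_scores, target_avg=0.25):
    """Allocate pruning proportionally to redundancy so that the average
    pruned fraction across layers stays near target_avg (illustrative rule)."""
    weights = lr_scores / lr_scores.mean()  # redundancy relative to the mean
    return (weights * target_avg).clamp(0.0, 1.0)
```

For a 32-layer model with target_avg=0.25, this yields a per-layer vector whose pruned fractions average roughly 25%, with the most redundant layers (LR score closest to 1) sliced the most.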

Original language: English (US)
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics (ACL)
Pages: 9912-9920
Number of pages: 9
ISBN (Electronic): 9798891761681
DOIs
State: Published - 2024
Event: 2024 Findings of the Association for Computational Linguistics, EMNLP 2024 - Hybrid, Miami, United States
Duration: Nov 12, 2024 – Nov 16, 2024

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024

Conference

Conference: 2024 Findings of the Association for Computational Linguistics, EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 11/12/24 – 11/16/24

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Linguistics and Language
