LoRA: Low-Rank Adaptation of Large Language Models

Link
Abstract

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by a factor of 10,000 and the GPU memory requirement by a factor of 3. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.

Synth

Problem:: The compute and storage cost of fully fine-tuning large language models / inference latency and quality degradation in existing adaptation techniques

Solution:: Freeze the pre-trained weights and inject trainable low-rank decomposition matrices / drastically reduces per-task trainable parameters and GPU memory requirements

Novelty:: Identifies the low intrinsic rank of weight updates during model adaptation / matches full fine-tuning quality with no added inference latency / effective even with very small ranks (r≤4) in high-dimensional models / enables fast switching between tasks / orthogonal to, and combinable with, existing methods

Note:: The motivation and method are well connected, and the experimental results back them up. Highly practical as well.
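To make the parameter savings in the Solution above concrete, here is a back-of-the-envelope calculation. The dimensions use GPT-3's hidden size (d = 12288); the rank r = 4 reflects the small ranks the paper reports as sufficient. This is an illustration per weight matrix, not a figure from the paper:

```python
d = k = 12288   # GPT-3 175B Transformer hidden size
r = 4           # LoRA rank; the paper finds r <= 4 often suffices

full = d * k         # trainable params to fully fine-tune one d x k weight matrix
lora = r * (d + k)   # trainable params for its LoRA factors B (d x r) and A (r x k)

print(full, lora, full // lora)  # 150994944 98304 1536
```

So a single attention projection shrinks from ~151M to ~98K trainable parameters, roughly a 1,500x reduction per matrix; the paper's overall 10,000x figure additionally reflects adapting only a subset of matrices at very low rank.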

Summary

Motivation

Method

![[file-20250402224605415.png|389]]

Low-Rank-Parametrized Update Matrices
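A minimal NumPy sketch of the low-rank update from this section: the frozen weight W0 is augmented with a trainable product BA, giving h = W0·x + (α/r)·B·A·x. Per the paper, A gets a random Gaussian init and B is zero-initialized, so ΔW = BA is zero at the start of training. Dimensions and α here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 48, 4  # hypothetical layer dimensions; rank r << min(d, k)
alpha = 8            # LoRA scaling hyperparameter; delta W is scaled by alpha / r

W0 = rng.standard_normal((d, k))        # frozen pre-trained weight (never updated)
A = rng.standard_normal((r, k)) * 0.01  # trainable, random Gaussian init
B = np.zeros((d, r))                    # trainable, zero init, so BA = 0 initially

def lora_forward(x):
    """h = W0 x + (alpha / r) B A x; only A and B would receive gradients."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(k)
# Because B starts at zero, the adapted model initially reproduces the base model.
assert np.allclose(lora_forward(x), W0 @ x)
```

Training then updates only A and B (r·(d+k) parameters) while W0 stays fixed, which is where the memory and optimizer-state savings come from.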

Applying LoRA to the Transformer

Practical Benefits and Limitations

Method Validation

Experimental Setup

RoBERTa and DeBERTa Results

GPT-2 Results

GPT-3 175B Results

Inference Latency Analysis
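The reason LoRA adds no inference latency (unlike adapters) is that the low-rank update can be folded into the base weight before deployment: W = W0 + (α/r)·BA is a single matrix, so serving needs one matmul, and subtracting BA recovers W0 for fast task switching. A sketch with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r, alpha = 32, 32, 2, 4

W0 = rng.standard_normal((d, k))  # frozen base weight
A = rng.standard_normal((r, k))   # trained LoRA factors for some task
B = rng.standard_normal((d, r))

# Merge the adapter into the base weight for deployment: one matmul at inference.
W_merged = W0 + (alpha / r) * (B @ A)

x = rng.standard_normal(k)
# Merged path equals frozen path + low-rank path, so merging changes nothing.
assert np.allclose(W_merged @ x, W0 @ x + (alpha / r) * (B @ (A @ x)))

# Task switching: subtract BA to recover W0, then merge a different adapter.
W_restored = W_merged - (alpha / r) * (B @ A)
assert np.allclose(W_restored, W0)
```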