MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

Link
Abstract

LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks. Since ChatGPT demonstrated superior performance on various tasks, there has been a growing desire to adapt one model for all tasks. However, the explicit low rank of LoRA limits the adaptation performance in complex multi-task scenarios. LoRA is dominated by a small number of top singular vectors, while fine-tuning decomposes into a set of less important unitary transforms. In this paper, we propose MultiLoRA for better multi-task adaptation by reducing the dominance of top singular vectors observed in LoRA. MultiLoRA scales LoRA modules horizontally and changes the parameter initialization of the adaptation matrices to reduce parameter dependency, thus yielding more balanced unitary subspaces. We construct specialized training data by mixing datasets of instruction following, natural language understanding, and world knowledge, to cover semantically and syntactically different samples. With only 2.5% additional parameters, MultiLoRA outperforms single-LoRA counterparts and fine-tuning on multiple benchmarks and model scales. Further investigation into the weight update matrices of MultiLoRA exhibits reduced dependency on top singular vectors and more democratic unitary transform contributions.

Synth

Problem:: LoRA's dependence on a few top singular values degrades multi-task learning (MTL) performance / existing MTL approaches are limited to NLU

Solution:: Horizontal scaling mitigates the top-singular-value dependency with the same parameter count / a training dataset composed of more diverse tasks

Novelty:: Identifying the dependency phenomenon through singular value analysis

Note:: Mathematically, parallelizing the A matrix is equivalent to simply increasing the rank, whereas parallelizing the B matrix greatly improves expressiveness
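To make the note concrete, here is a minimal NumPy sketch (dimensions `d`, `r`, `n` are hypothetical, not from the paper) of horizontally scaled LoRA modules. It shows that the summed update of n parallel rank-r modules can be rewritten as a single stacked rank-(n·r) update, so the difference lies in parameter initialization and learning dynamics rather than a strictly larger expressible set:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 64, 4, 3  # hypothetical: model dim, per-module rank, number of parallel modules

# n parallel LoRA modules: delta_W = sum_i B_i @ A_i
A = [rng.standard_normal((r, d)) for _ in range(n)]
B = [rng.standard_normal((d, r)) for _ in range(n)]
delta_W = sum(b @ a for a, b in zip(A, B))

# The same update is expressible as one rank-(n*r) LoRA:
# stack the A_i vertically and the B_i horizontally.
A_cat = np.vstack(A)   # shape (n*r, d)
B_cat = np.hstack(B)   # shape (d, n*r)
assert np.allclose(delta_W, B_cat @ A_cat)
```

Each module here gets its own independent initialization, which is what MultiLoRA exploits to obtain more balanced subspace contributions than a single wide LoRA.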

Summary

Motivation

file-20250407082351124.png

(a) shows the full view; (b) zooms in on the large singular values → unlike FT, LoRA has only a few large singular values
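The dominance contrast in the figure can be quantified by how much singular-value "energy" the top few directions carry. A hedged sketch with synthetic stand-ins (a rank-4 product for a LoRA-style update, a dense Gaussian for an FT-style update; not the paper's actual weights):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Synthetic stand-ins for the two kinds of weight updates.
B, A = rng.standard_normal((d, 4)), rng.standard_normal((4, d))
delta_lora = B @ A                          # low-rank, LoRA-style
delta_ft = 0.02 * rng.standard_normal((d, d))  # dense, FT-style

def top_k_energy(delta, k=4):
    """Fraction of squared singular-value mass in the top-k singular values."""
    s = np.linalg.svd(delta, compute_uv=False)
    return (s[:k] ** 2).sum() / (s ** 2).sum()

print(top_k_energy(delta_lora))  # 1.0: a rank-4 update puts all mass in the top 4
print(top_k_energy(delta_ft))    # far below 1: mass spread across many directions
```

A high top-k energy ratio is exactly the "few large singular values" pattern the figure shows for LoRA.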

Method

MultiLoRA design

file-20250407082029461.png|402

New experimental setup

Method validation

Multi-task performance results

Performance stability

Resource efficiency

Understanding MultiLoRA

SVD-based subspace comparison

file-20250407082917678.png

Singular value distribution analysis

file-20250407082943048.png

Subspace similarity variability across MultiLoRA modules

file-20250407083011670.png
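The subspace comparisons above are typically computed with the normalized subspace similarity from the original LoRA paper; a sketch of that metric (assuming square matrices for simplicity; the paper may apply it to non-square update matrices as well):

```python
import numpy as np

def subspace_similarity(W1, W2, i, j):
    """Normalized subspace similarity phi(W1, W2, i, j) in [0, 1]:
    ||U1[:, :i].T @ U2[:, :j]||_F^2 / min(i, j),
    where U1, U2 hold the left-singular vectors of W1, W2."""
    U1, _, _ = np.linalg.svd(W1)
    U2, _, _ = np.linalg.svd(W2)
    overlap = U1[:, :i].T @ U2[:, :j]
    return np.linalg.norm(overlap, 'fro') ** 2 / min(i, j)

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 32))
# A matrix is perfectly similar to its own top singular subspace:
assert np.isclose(subspace_similarity(W, W, 4, 4), 1.0)
# Two independent random matrices give a value strictly between 0 and 1.
print(subspace_similarity(W, rng.standard_normal((32, 32)), 4, 4))
```

Values near 1 indicate the two updates share their dominant directions; low, fluctuating values across MultiLoRA modules are consistent with the modules learning distinct subspaces.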