Editing Models with Task Arithmetic

Link
Abstract

Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems. In this work, we propose a new paradigm for steering the behavior of neural networks, centered around *task vectors*. A task vector specifies a direction in the weight space of a pre-trained model, such that movement in that direction improves performance on the task. We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task. We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition, and the behavior of the resulting model is steered accordingly. Negating a task vector decreases performance on the target task, with little change in model behavior on control tasks. Moreover, adding task vectors together can improve performance on multiple tasks at once. Finally, when tasks are linked by an analogy relationship of the form "A is to B as C is to D", combining task vectors from three of the tasks can improve performance on the fourth, even when no data from the fourth task is used for training. Overall, our experiments with several models, modalities and tasks show that task arithmetic is a simple, efficient and effective way of editing models.

Synth

Problem:: For a single pre-trained model, a very large number of fine-tuned weight sets are available in the world

Solution:: Improve a single model by combining these weight sets with simple arithmetic operations

Novelty:: A way to effectively exploit the many publicly available weights through simple arithmetic

Note:: Shows that task vectors from similar tasks have high weight similarity → vectors from different tasks are nearly orthogonal → since they are orthogonal, adding them has little effect on inner products → but if features from different branches are similar for the same input feature, do they conflict with each other? / Figure 7 seems somewhat suspicious → high performance is achieved by adding a task vector trained for only a few steps? (This would only make sense if performance on that task was already high after very little training; in that case the claim should be that the time to reach final performance is shortened, but the writing is ambiguous on this point.)
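The near-orthogonality claim in the note above can be checked directly by measuring cosine similarity between flattened task vectors; a minimal sketch with pure-Python lists (the function name is illustrative, not from the paper's code):

```python
import math

def cosine_similarity(tau_a, tau_b):
    """Cosine similarity between two flattened task vectors.
    Close to 1 for vectors from similar tasks; close to 0 when the
    vectors are (near-)orthogonal, as reported for unrelated tasks."""
    dot = sum(a * b for a, b in zip(tau_a, tau_b))
    norm_a = math.sqrt(sum(a * a for a in tau_a))
    norm_b = math.sqrt(sum(b * b for b in tau_b))
    return dot / (norm_a * norm_b)
```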

Summary

Motivation

Method


Task Arithmetic Operations

  1. Negation: τ_new = −τ
    • Used to remove unwanted behaviors or to forget a specific task
  2. Addition: τ_new = Σ_i τ_i
    • Builds a single model that handles multiple tasks
    • In some cases, can even outperform the individually fine-tuned models
  3. Analogy: τ_D = τ_C + (τ_B − τ_A)
    • Exploits relationships of the form "A is to B as C is to D"
    • Can improve performance on task D even with little or no data for it
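The three operations above can be sketched over model weights stored as name → value mappings. This uses plain floats for readability; the same element-wise operations apply to torch tensors in a `state_dict`. Function names are illustrative, not from the paper's released code:

```python
# Minimal sketch of task-vector arithmetic (assumed helper names).

def task_vector(pretrained, finetuned):
    """tau = theta_finetuned - theta_pretrained, computed per parameter."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_vector(pretrained, tau, scale=1.0):
    """theta_new = theta_pretrained + lambda * tau.
    scale < 0 implements Negation (forgetting the task)."""
    return {k: pretrained[k] + scale * tau[k] for k in pretrained}

def add_vectors(*taus):
    """Element-wise sum of several task vectors (the Addition operation)."""
    return {k: sum(t[k] for t in taus) for k in taus[0]}

pre = {"w": 1.0}
ft_a = {"w": 3.0}   # weights after fine-tuning on task A
ft_b = {"w": 0.5}   # weights after fine-tuning on task B
tau_a = task_vector(pre, ft_a)                    # {"w": 2.0}
tau_b = task_vector(pre, ft_b)                    # {"w": -0.5}
multi = apply_vector(pre, add_vectors(tau_a, tau_b))  # multi-task model
forget = apply_vector(pre, tau_a, scale=-1.0)         # negate task A
```

The Analogy operation is just another combination of the same primitives: `apply_vector(pre, add_vectors(tau_c, task_vector(ft_a, ft_b)))` moves along τ_C + (τ_B − τ_A).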

Method Validation

Forgetting via Negation

Learning via Addition

Task Analogies

Ablation

Limitation