# Notes on LoRA

## Introduction

Inspiration: the change in weights during model adaptation has a low “intrinsic rank”.

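Description: Freeze the pretrained weight `W0` (d × k). During fine-tuning, train only two small matrices `B` (d × r) and `A` (r × k). Their product `B A` is the weight update `ΔW` and is added to `W0`. Because `r << min(d, k)`, this significantly reduces the number of trainable parameters, from `d * k` to `r * (d + k)`.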
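A minimal sketch of the forward pass `h = W0 x + (alpha / r) * B A x` in plain Python. The `alpha / r` scaling and the initialization (Gaussian `A`, zero `B`, so that `B A = 0` when training starts) follow the LoRA paper but are not in these notes; `LoRALinear`, `matvec`, and the default `alpha=1.0` are hypothetical names and values for illustration only.

```python
import random

def matvec(M, x):
    """Multiply matrix M (rows x cols, list of lists) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

class LoRALinear:
    """Sketch of a LoRA-adapted linear layer: h = W0 x + (alpha / r) * B A x.

    W0 (d x k) stays frozen; only A (r x k) and B (d x r) would be trained.
    """

    def __init__(self, W0, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.W0 = W0                      # frozen pretrained weight, d x k
        d, k = len(W0), len(W0[0])
        self.r = r
        self.scale = alpha / r            # scaling factor (assumed, as in the paper)
        # A: small random Gaussian values; B: zeros, so B A = 0 at initialization
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(k)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d)]

    def forward(self, x):
        base = matvec(self.W0, x)                 # W0 x
        low = matvec(self.B, matvec(self.A, x))   # B (A x): the d x k product is never formed
        return [b + self.scale * l for b, l in zip(base, low)]

    def trainable_params(self):
        # r * k entries in A plus d * r entries in B
        return self.r * len(self.A[0]) + len(self.B) * self.r

d, k, r = 256, 512, 4
W0 = [[0.01 * ((i + j) % 7) for j in range(k)] for i in range(d)]
layer = LoRALinear(W0, r=r)
x = [1.0] * k
print(d * k, layer.trainable_params())  # 131072 3072
```

Computing `B (A x)` instead of `(B A) x` keeps the extra cost at `r * (d + k)` multiplications per input during training.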

Novelty: Reduces the cost of fine-tuning while, at the same time, adding no inference latency, causing no loss in model quality, and not changing how the input is constructed.
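The "no additional latency" claim comes from merging: after training, `B A` can be folded into the weight once, so inference is a single matrix-vector product with a matrix of the same shape as `W0`. A small plain-Python sketch, assuming random toy matrices; `merge`, `matmul`, and `matvec` are hypothetical helpers:

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of X (n x m) and Y (m x p)."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def matvec(M, x):
    """Multiply matrix M by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def merge(W0, B, A, scale=1.0):
    """Fold the low-rank update into the weight: W = W0 + scale * B A."""
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(wr, ur)] for wr, ur in zip(W0, BA)]

rng = random.Random(0)
d, k, r = 6, 5, 2
W0 = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]  # d x k
A = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(r)]   # r x k
B = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d)]   # d x r
x = [rng.uniform(-1, 1) for _ in range(k)]

# Unmerged: two paths (W0 x and B A x) evaluated at inference time.
unmerged = [p + q for p, q in zip(matvec(W0, x), matvec(B, matvec(A, x)))]
# Merged once after training: one product with a d x k matrix, like the original model.
W = merge(W0, B, A)
merged = matvec(W, x)
print(max(abs(u - m) for u, m in zip(unmerged, merged)))  # close to 0 (rounding only)
```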

## Benefits

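1) Efficient task switching: one shared pretrained model can serve many tasks by swapping in the small task-specific LoRA matrices.

2) Less training time and lower hardware requirements during fine-tuning, since no gradients or optimizer states are needed for the frozen weights.

3) No additional inference latency.

4) Orthogonal to many other adaptation methods, so it can be combined with them.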
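Task switching can be sketched as subtracting one task's `B A` from the merged weight and adding another's. A minimal plain-Python sketch, assuming two hypothetical task adapters with random toy values; `add_update` and `adapters` are illustrative names, not a library API:

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of X (n x m) and Y (m x p)."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def add_update(W, B, A, sign=1.0):
    """Return W + sign * (B A); sign=+1 merges an adapter, sign=-1 removes it."""
    BA = matmul(B, A)
    return [[w + sign * u for w, u in zip(wr, ur)] for wr, ur in zip(W, BA)]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def max_abs_diff(X, Y):
    return max(abs(x - y) for xr, yr in zip(X, Y) for x, y in zip(xr, yr))

rng = random.Random(42)
d, k, r = 8, 6, 2
W0 = rand_matrix(rng, d, k)  # shared pretrained weight (frozen)
# Hypothetical per-task adapters: each task only stores r * (d + k) numbers.
adapters = {
    "task_1": (rand_matrix(rng, d, r), rand_matrix(rng, r, k)),
    "task_2": (rand_matrix(rng, d, r), rand_matrix(rng, r, k)),
}

B1, A1 = adapters["task_1"]
W = add_update(W0, B1, A1)            # deploy task 1: W = W0 + B1 A1
W = add_update(W, B1, A1, sign=-1.0)  # remove task 1, recovering W0
B2, A2 = adapters["task_2"]
W = add_update(W, B2, A2)             # deploy task 2: W = W0 + B2 A2

print(max_abs_diff(W, add_update(W0, B2, A2)))  # close to 0 (rounding only)
```

Only the small `(B, A)` pairs differ between tasks, so switching never requires a second copy of `W0`.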

## Further information

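1) LoRA scales better (its performance keeps improving as the number of trainable parameters grows, unlike prefix-based methods) and performs on par with or better than full fine-tuning and other adaptation methods.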

2) Under a fixed budget of trainable parameters, adapting more types of weight matrices with a small rank is preferable to adapting a single type of weight with a larger rank; a rank of r = 2 or 4 is already a good choice.
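The budget trade-off in item 2 is simple arithmetic: each adapted matrix costs `r * (d + k)` parameters, so halving `r` lets twice as many matrix types fit in the same budget. A sketch assuming GPT-3 175B's dimensions (`d_model = 12288`, 96 layers) and square attention projections `W_q`, `W_k`, `W_v`, `W_o`, which match the roughly 18M-parameter setting of the paper's GPT-3 ablation; `lora_params` is a hypothetical helper:

```python
def lora_params(d_model, n_layers, n_types, r):
    """Trainable LoRA parameters when n_types square (d_model x d_model) matrices
    per layer are adapted with rank r: each one adds A (r x d) and B (d x r)."""
    return n_layers * n_types * r * (d_model + d_model)

# Same budget, spread over more matrix types with a smaller rank.
configs = {"W_q": (1, 8), "W_q, W_v": (2, 4), "W_q, W_k, W_v, W_o": (4, 2)}
for name, (n_types, r) in configs.items():
    print(f"{name:22s} r={r}: {lora_params(12288, 96, n_types, r):,}")
# Each line prints 18,874,368
```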

3) The subspaces learned with different ranks `r` are highly similar in their top few singular directions, which suggests that (1) these few low-dimensional directions contain most of the useful information and (2) a small `r` is enough.
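The similarity in item 3 is measured in the paper with a normalized subspace similarity between the `A` matrices learned with two ranks (here 8 and 64). Roughly, with $U^{i}_{A_{r=8}}$ the top-$i$ right-singular vectors of $A_{r=8}$ and $U^{j}_{A_{r=64}}$ the top-$j$ right-singular vectors of $A_{r=64}$:

$$
\phi(A_{r=8}, A_{r=64}, i, j) = \frac{\left\| {U^{i}_{A_{r=8}}}^{\top} U^{j}_{A_{r=64}} \right\|_F^2}{\min(i, j)} \in [0, 1]
$$

A value of 1 means the smaller subspace is contained in the larger one; 0 means they are orthogonal.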

4) `ΔW` is more strongly correlated with the original weight matrix `W` than a random matrix is: LoRA amplifies features that are already present in `W` but not emphasized by it.
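The "amplification" in item 4 can be written as a ratio of Frobenius norms. A sketch of the measure, assuming the thin SVD $\Delta W = U \Sigma V^{\top}$ with $U \in \mathbb{R}^{d \times r}$ and $V \in \mathbb{R}^{k \times r}$, so that $U^{\top} W V$ is $W$ projected onto the subspace of $\Delta W$:

$$
\text{amplification factor} = \frac{\left\| \Delta W \right\|_F}{\left\| U^{\top} W V \right\|_F}
$$

A large factor means `ΔW` strongly boosts directions that `W` itself only weakly uses.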

## Summary

1) Purpose of the authors? Reduce the cost of fine-tuning without any loss in quality.

2) Key idea of the new method? The weight change during adaptation has a low “intrinsic rank”.

3) What is useful for me? Do large matrices in LLMs have a low “intrinsic rank”? A new fine-tuning method.

4) What references are necessary to read? Where does the idea of “intrinsic rank” come from?

- Measuring the Intrinsic Dimension of Objective Landscapes
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

Questions 1-4 follow Andrew Ng's advice on reading research papers.

5) New idea

The rank-deficiency of `ΔW` suggests that `W` could be rank-deficient as well, which could inspire future work.