# Acceleration of LLM - Matrix Multiplication

## Background

After reading “Manual Autograd” on Unsloth’s blog, I tried to parse a model myself to find more places where we can optimize.

torchview turned out to be a great tool for this kind of exploration.

## torchview

### What torchview can do

After experimenting with it, here is what torchview can do:

- Model: torchview can parse models during both inference and training; it supports MLP, BERT, Gemma, and Llama 3.2.
- Node: the smallest nodes are tensors, modules (like attention), and functions (like `nn.functional` ops).
- Shape: it shows the input and output shape for every basic node.
- Edge: it shows the input/output relations between basic nodes.

Showing the nodes and their related information:


```python
from transformers import AutoModel, AutoTokenizer
from torchview import draw_graph

model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello world!", return_tensors="pt")
model_graph = draw_graph(model, input_data=inputs,
                         save_graph=True,
                         filename='output')
print(len(model_graph.edge_list))
for a, b in model_graph.edge_list:
    print(a, b, type(a), type(b))
```

### What torchview can't do so far

Attention: a general model interleaves many softmax and activation functions between its matrix multiplications, so the only three consecutive matrix multiplications are the attention-score product `(matrix_input @ W_q) @ (matrix_input @ W_k).T`. This chain cannot be optimized by reassociation, though, because there is not much difference between $d_{\text{input}}$ and $d_{\text{hidden}}$.
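To make the cost argument concrete, here is a rough multiply-add count for the two association orders of the score product. This is only a sketch; the dimensions (`n=512`, `d_in`, `d_h`) are illustrative assumptions, not taken from any specific model:

```python
def flops_standard(n, d_in, d_h):
    # Q = X @ W_q, K = X @ W_k, scores = Q @ K.T
    return 2 * n * d_in * d_h + n * n * d_h

def flops_reassociated(n, d_in, d_h):
    # M = W_q @ W_k.T (computed once), scores = (X @ M) @ X.T
    return d_in * d_h * d_in + n * d_in * d_in + n * n * d_in

# When d_input ~ d_hidden, reassociation does not pay off:
print(flops_standard(512, 768, 768))      # 805_306_368
print(flops_reassociated(512, 768, 768))  # 956_301_312

# It would only help if the two dimensions differed a lot:
print(flops_standard(512, 64, 768))       # 251_658_240
print(flops_reassociated(512, 64, 768))   # 22_020_096
```

With $d_{\text{input}} = d_{\text{hidden}}$, the standard order is already the cheaper one, which matches the observation above.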

Parsing modules: torchview cannot parse arbitrary module internals statically; there are too many special cases, such as `LlamaAttention`. However, given concrete input data, execution follows one specific code path, and torchview seems to work this way, since it requires input data or an input size. I didn't research this much further.
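The "follow one specific path given concrete inputs" behavior can be illustrated with `torch.jit.trace` (a sketch in plain PyTorch, not torchview's actual internals; the `Gate` module here is a made-up example):

```python
import torch

class Gate(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:        # data-dependent branch
            return torch.relu(x)
        return -x

m = Gate()
pos = torch.ones(3)
# Tracing runs one real forward pass with `pos` and records
# only the branch that this particular input takes.
traced = torch.jit.trace(m, (pos,))

neg = -torch.ones(3)
print(m(neg))       # takes the else-branch: tensor([1., 1., 1.])
print(traced(neg))  # replays the relu path: tensor([0., 0., 0.])
```

This is why a concrete input (or at least an input size) is necessary: the graph is recorded from one actual execution, not recovered from the source code.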

## Things worth exploring

Optimization of matrix multiplication can still be applied in other modules, such as:

- LoRA, as described in Unsloth's blog
- autograd in the backward pass, maybe
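For LoRA, the associativity argument actually does pay off, because the rank is tiny compared with the hidden dimension. A minimal sketch (the dimensions `n=512`, `d=768` and rank `r=16` are illustrative assumptions, not taken from Unsloth's code):

```python
import numpy as np

n, d, r = 512, 768, 16            # sequence length, hidden dim, LoRA rank
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
B = rng.standard_normal((d, r))   # LoRA factor (d x r)
A = rng.standard_normal((r, d))   # LoRA factor (r x d)

slow = X @ (B @ A)                # d*r*d + n*d*d ~ 3.1e8 multiply-adds
fast = (X @ B) @ A                # n*d*r + n*r*d ~ 1.3e7 multiply-adds

assert np.allclose(slow, fast)    # same result, roughly 25x fewer FLOPs
```

Unlike the attention-score chain, here one dimension (`r`) is much smaller than the others, so choosing the right multiplication order gives a large win.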

## Conclusion

Failing at this shows that I tend to think too much and read too little. A simple idea does not work in most situations.