Outline of LLM acceleration
Summary Methods There are two main methods to accelerate LLMs, plus a set of tricks. Low-rank: reduce the dimension of matrices. Block: compute matrices in blocks. Trick: update model structure o...
Product LocalPictureCompress Spent one whole day building LocalPictureCompress and really enjoyed the moment I published it. Try AI code assistants continue: open source product, supports OpenA...
Product New ideas: computer use by local models; polish anything by local models. YouTube Upload five videos this week. Reading Google build-AI challenge, OpenAI ask me anything, anthro...
Product LLM acceleration: read one paper. Flash-attention: compute attention by blocks. YouTube Upload five videos this week and start trying Codeforces problems. Codeforces problems always c...
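As a rough illustration of "compute attention by blocks", here is a minimal NumPy sketch. It is not the Flash-attention kernel itself: it only tiles over keys and values on the CPU, and the function names and the `block` parameter are made up for this example. The point is that a running row maximum and row sum let the softmax be accumulated one block at a time without ever building the full score matrix.

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference version: materializes the full (n_q, n_k) score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def blockwise_attention(q, k, v, block=16):
    # Hypothetical helper: processes keys/values in chunks of `block` rows.
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, v.shape[1]))        # unnormalized output accumulator
    row_max = np.full((n, 1), -np.inf)     # running max of scores per query
    row_sum = np.zeros((n, 1))             # running softmax denominator
    for start in range(0, k.shape[0], block):
        kb = k[start:start + block]
        vb = v[start:start + block]
        s = (q @ kb.T) * scale             # scores for this block only
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max)  # rescale earlier partial results
        p = np.exp(s - new_max)
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum
```

Both functions should agree up to floating-point error for any block size, including one that does not divide the number of keys evenly.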
Background There are two common kinds of bound that limit training speed in deep learning. Memory-bound: time spent on memory access is the bottleneck. Computation-bound: time spent o...
Collected, summarized, and adapted from the multiple sources listed in Reference to produce the following tutorial. How to learn Very quickly identify what the foundational knowledge is. Build a personal curriculum to...
Product LLM acceleration Matrix Multiplication: read more. LoRA: start reading paper. YouTube Upload four videos this week and receive 5 subscribers. Blog Update current blog to jekyll-theme-chi...
Introduction Inspiration: the change in weights during model adaptation has a low “intrinsic rank”. Description: Update only the small matrices A and B when fine-tuning, adding A * B to weight W, which sign...
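A minimal NumPy sketch of the low-rank update described above, keeping the excerpt's naming (A * B added to W). The shapes, the rank `r`, and the zero initialization of one factor are assumptions chosen for illustration, not details taken from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration; r is much smaller than d_out and d_in.
d_out, d_in, r = 64, 32, 4

W = rng.standard_normal((d_out, d_in))       # pretrained weight, kept frozen
A = rng.standard_normal((d_out, r)) * 0.01   # small trainable factor
B = np.zeros((r, d_in))                      # zero init: A @ B starts at 0

def forward(x):
    # x has shape (batch, d_in). The low-rank path is computed separately,
    # so the full d_out x d_in update A @ B is never formed during training.
    return x @ W.T + (x @ B.T) @ A.T

x = rng.standard_normal((8, d_in))

full_params = W.size            # parameters touched by full fine-tuning
lora_params = A.size + B.size   # parameters touched by the low-rank update

# After training, the update can be folded back into a single matrix.
merged = W + A @ B
```

With these sizes the low-rank factors hold 384 values against 2048 in W, and the saving grows as the matrices get larger while r stays small.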
Background After reading “Manual Autograd” in unsloth’s blog, I tried parsing the model and found more related points where we can optimize. torchview is a similar tool that works well for this. torchview what torc...
Product Voice correction: failed; the technology is not mature enough to meet this demand. PopTranslate: first available product. The cost of a product with a server is higher than expected. unsloth:...