LLM fine-tuning experience
Personal experience
- Watch how the training loss/reward and the test-dataset performance change together and make sure they move with the same trend; if the reward keeps improving while test performance stalls or drops, suspect reward hacking or an invalid loss function (see the trend-check sketch after this list).
- Adjust the learning rate and the regularization penalty based on how the loss changes over training steps.
- Verify each idea with a proper controlled comparison experiment (scientific control) that changes only one variable at a time.
- Rerun the same experiment to rule out the influence of randomness (see the seeding sketch after this list).
- Only change hyperparameters when the training process is clearly not working well.
- Have the LLM output its reasoning process before it outputs the final answer (see the prompting sketch after this list).
- Enforce output rules and constraints in code rather than in the prompt as much as possible (see the validation sketch after this list).
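
A minimal sketch of the trend check from the first point, assuming the training reward and a held-out metric are logged at matching intervals. The function name `same_trend`, the smoothing window, and the 0.7 agreement threshold are illustrative choices, not part of the original notes.

```python
import numpy as np

def same_trend(train_rewards, eval_scores, window=5):
    """Return True if smoothed training reward and eval score move in the same direction."""
    r = np.convolve(train_rewards, np.ones(window) / window, mode="valid")
    e = np.convolve(eval_scores, np.ones(window) / window, mode="valid")
    # Compare the sign of step-to-step changes; high agreement means the trends match.
    agreement = np.mean(np.sign(np.diff(r)) == np.sign(np.diff(e)))
    return bool(agreement > 0.7)  # threshold is a rough heuristic

# A diverging pair (reward rises while eval accuracy falls) fails the check,
# which is the classic signature of reward hacking or a mis-specified loss.
rewards  = [0.10, 0.20, 0.35, 0.50, 0.60, 0.70, 0.80, 0.90]
accuracy = [0.50, 0.52, 0.53, 0.51, 0.49, 0.47, 0.44, 0.40]
print(same_trend(rewards, accuracy))  # expected: False
```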
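
For the rerun-to-exclude-randomness point, a sketch of fixing seeds and repeating the same configuration, assuming a PyTorch setup; `train_and_evaluate` is a hypothetical entry point and the seed values are arbitrary.

```python
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Fix all the common sources of randomness for a reproducible run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Run the same configuration under a few seeds and compare the spread of results;
# an improvement that disappears across seeds was probably noise.
for seed in (0, 1, 2):
    set_seed(seed)
    # train_and_evaluate(config)  # hypothetical training entry point
```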
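
A sketch of prompting the model to show its reasoning before the final answer and then extracting only the answer. The tag names, prompt wording, and `extract_answer` helper are illustrative, not the author's exact setup.

```python
import re

PROMPT_TEMPLATE = (
    "Solve the problem step by step inside <reasoning>...</reasoning>, "
    "then give only the result inside <answer>...</answer>.\n\nProblem: {question}"
)

def extract_answer(completion: str):
    """Pull the final answer out of the completion; return None if the tag is missing."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None

prompt = PROMPT_TEMPLATE.format(question="How many apples is 3 + 4?")
completion = "<reasoning>3 apples plus 4 apples is 7 apples.</reasoning><answer>7</answer>"
print(extract_answer(completion))  # "7"
```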
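
Finally, a sketch of enforcing an output constraint in code instead of piling more formatting instructions into the prompt. The constraint here ("valid JSON with an integer `score` in [1, 5]") and the reject-and-resample usage are hypothetical examples.

```python
import json

def validate_score_output(raw: str) -> int:
    """Raise ValueError unless the model output meets the constraint."""
    data = json.loads(raw)          # raises ValueError on malformed JSON
    score = data.get("score")       # the required field
    if not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"invalid score: {score!r}")
    return score

# Typical usage: call the model, validate in code, and re-sample on failure
# rather than trusting the prompt to guarantee the format.
try:
    print(validate_score_output('{"score": 4}'))   # 4
    validate_score_output('{"score": "great"}')    # raises ValueError
except ValueError as err:
    print("rejected:", err)
```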