Tag: stability
- Small-scale proxies for large-scale Transformer training instabilities (09 Oct 2023)
This is my reading note for Small-scale proxies for large-scale Transformer training instabilities. The paper discusses methods for improving model training stability with respect to hyperparameters.