Tag: depth
- The Impact of Depth and Width on Transformer Language Model Generalization (31 Oct 2023)
These are my reading notes on The Impact of Depth and Width on Transformer Language Model Generalization. The paper shows that depth matters: a deeper transformer is needed for good performance, and 4 to 6 layers is usually a good choice.
- 3D Reconstruction (15 Sep 2019)