Scaling Law Scaling law is one of the most important findings in LLMs (and neural networks in general) 1. You can make almost all important decisions about training of models with scaling law. For example you can choose model size, number of training steps 2, hyperparameters such as learning rate and batch size 3, learning rate schedules 4, mixture of training datasets 5, etc. So if you are serious about …
In the field of large language models, the most important recipes to cook the model is not opened to publics. Model architecture itself is quite well-known because many state-of-the-art models are now open weights, and in many cases we find it is a boringly simple vanilla transformers. But for datasets and training objectives it is not well known, and many LLM builders deliberately obfuscates the details of these two. And, …
https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web LM을 다음 단어를 예측할 뿐이라거나 학습 데이터를 기억할 뿐이라는 식으로 묘사하는 것은 폄하를 위한 언어이지 LM의 실체나 실제 한계에 대해서 논하기에 적절한 방 …
https://sinews.siam.org/Details-Page/deep-deep-trouble 뉴럴넷 연구를 하던 사람들이 오랜 겨울을 지나왔던 것처럼 이미지 처리에서, 이젠 전통적인 방법이라고 불리는 방법들을 연구하던 사람들의 고민이 깊은 모양이다. 뉴 …