Scaling Law Scaling law is one of the most important findings in LLMs (and neural networks in general) 1. You can make almost all important decisions about training of models with scaling law. For example you can choose model size, number of training steps 2, hyperparameters such as learning rate and batch size 3, learning rate schedules 4, mixture of training datasets 5, etc. So if you are serious about …
In the field of large language models, the most important recipes to cook the model is not opened to publics. Model architecture itself is quite well-known because many state-of-the-art models are now open weights, and in many cases we find it is a boringly simple vanilla transformers. But for datasets and training objectives it is not well known, and many LLM builders deliberately obfuscates the details of these two. And, …
https://sinews.siam.org/Details-Page/deep-deep-trouble 뉴럴넷 연구를 하던 사람들이 오랜 겨울을 지나왔던 것처럼 이미지 처리에서, 이젠 전통적인 방법이라고 불리는 방법들을 연구하던 사람들의 고민이 깊은 모양이다. 뉴 …
https://deepmind.com/blog/understanding-agent-cooperation/ 최근에 인공지능에 승부욕이 있다느니 혹은 공격성을 보였다느니 하는 식으로 소개된 딥마인드의 연구다. 사실 연구의 핵심은 두 행위자들을 강화학습으로 훈련시켜서 …