Scaling Law

Scaling laws are among the most important findings about LLMs (and neural networks in general) [1]. Almost every major decision about training a model can be made with scaling laws: for example, you can choose the model size, the number of training steps [2], hyperparameters such as the learning rate and batch size [3], the learning rate schedule [4], the mixture of training datasets [5], and so on. So if you are serious about training LLMs, you should understand scaling laws.
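To make this concrete, here is a minimal sketch (not the post's own method) of fitting a Chinchilla-style parametric scaling law, L(N, D) = E + A/N^alpha + B/D^beta, to a handful of pilot runs and then reading off a compute-optimal model size for a given FLOPs budget. All measurements, starting values, and the budget below are hypothetical placeholders.

```python
# A minimal sketch, assuming a Chinchilla-style loss surface
#   L(N, D) = E + A/N**alpha + B/D**beta
# fitted to small pilot runs. Every number here is a made-up placeholder.
import numpy as np
from scipy.optimize import curve_fit

def predicted_loss(x, E, A, alpha, B, beta):
    """Parametric loss surface over model size N and training tokens D."""
    N, D = x
    return E + A / N**alpha + B / D**beta

# Hypothetical measurements from small pilot runs: (params, tokens, loss).
N = np.array([1e8, 1e8, 3e8, 1e9, 1e9, 3e9])
D = np.array([2e9, 8e9, 6e9, 2e10, 6e10, 6e10])
L = np.array([3.49, 3.16, 2.97, 2.58, 2.44, 2.33])

popt, _ = curve_fit(predicted_loss, (N, D), L,
                    p0=[1.7, 400.0, 0.34, 400.0, 0.28], maxfev=50_000)
E, A, alpha, B, beta = popt

def compute_optimal(C):
    """Minimize L(N, D) under the budget C ~= 6*N*D (the standard
    transformer FLOPs approximation): substitute D = C/(6N) and
    solve dL/dN = 0 in closed form."""
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    N_opt = G * (C / 6.0) ** (beta / (alpha + beta))
    D_opt = (C / 6.0) / N_opt
    return N_opt, D_opt

n_star, d_star = compute_optimal(C=1e23)  # e.g. a 1e23-FLOP budget
print(f"optimal params ~ {n_star:.2e}, optimal tokens ~ {d_star:.2e}")
```

The fitted exponents alpha and beta are what drive the "bigger model vs. more tokens" trade-off, which is exactly the kind of decision the paragraph above says scaling laws let you make before committing to an expensive run.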
In the field of large language models, the most important recipes for cooking a model are not open to the public. The model architecture itself is quite well known, because many state-of-the-art models now have open weights, and in many cases it turns out to be a boringly simple vanilla Transformer. But the datasets and training objectives are not well known, and many LLM builders deliberately obfuscate the details of these two. And…
https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web Describing an LM as something that merely predicts the next word, or merely memorizes its training data, is the language of disparagement; it is not an appropriate way to discuss what LMs actually are or what their real limitations might be.
Title cover image source: https://www.behance.net/gallery/6146939/OCR-A-Poster/modules/152114859 As I wrap up OCR, the topic I was immersed in for four years, I am leaving a retrospective, as I usually do. Rather than small musings like "what if I had done it this way"…