Scaling Law, Architecture for Stability and Layer Stacking

September 11, 2024 sorta-informative

Scaling Law Scaling law is one of the most important findings in LLMs (and neural networks in general) 1. You can make almost all important decisions about training of models with scaling law. For example you can choose model size, number of training steps 2, hyperparameters such as learning rate and batch size 3, learning rate schedules 4, mixture of training datasets 5, etc. So if you are serious about …

Preliminary Explorations on UL2 and Second-order Optimizers

June 4, 2024 sorta-informative

In the field of large language models, the most important recipes to cook the model is not opened to publics. Model architecture itself is quite well-known because many state-of-the-art models are now open weights, and in many cases we find it is a boringly simple vanilla transformers. But for datasets and training objectives it is not well known, and many LLM builders deliberately obfuscates the details of these two. And, …

Constitutional AI

July 31, 2023 sorta informative

Helpful & Harmless Agent AI 모델의 정렬(Alignment)이라고 이야기할 때 흔히 나오는 Helpfulness와 Harmlessness는 어떤 의미인가? 이는 정의 …

이미지와 텍스트 생성 모델에 대해

February 15, 2023 sorta informative

이미지 생성 하면 Style GAN이었던 시절에도 일러스트 생성 등은 오타쿠적 인기가 있는 주제였다. 문제의 Danbooru 데이터셋 같은 경우에도 그 시점에 이미 만들어진 데이터셋이었 …

언어의 손실 압축에 대하여

February 15, 2023 sorta informative

https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web LM을 다음 단어를 예측할 뿐이라거나 학습 데이터를 기억할 뿐이라는 식으로 묘사하는 것은 폄하를 위한 언어이지 LM의 실체나 실제 한계에 대해서 논하기에 적절한 방 …