Cost - Unless you are deploying to the edge, you will not beat OpenAI on cost per generated token. If you are at a scale where your OpenAI or Anthropic cloud bills are so large that running your own models would save money, you know more about this than I do, or at least you have the funding to hire someone who does.
Architecture overview
Training process (pretraining) - Predict the next token, trained on ~1 trillion+ tokens (roughly 700 billion words) at a compute cost of ~5 million USD (GPT-3). Damn!
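To make the objective concrete, here is a minimal sketch of next-token prediction as a cross-entropy loss. `model` is a hypothetical stand-in for any decoder that maps token ids to next-token logits, and the tensor shapes are assumptions for illustration, not a specific implementation.

```python
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    # token_ids: (batch, seq_len) integer tensor of token ids
    inputs = token_ids[:, :-1]   # every position except the last
    targets = token_ids[:, 1:]   # the same sequence shifted left by one
    logits = model(inputs)       # assumed shape: (batch, seq_len - 1, vocab_size)
    # Cross-entropy between the predicted distribution at each position
    # and the token that actually came next in the training text.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```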
Autoregressive - for a given sequence of tokens, predicting token t+1 depends on all tokens at positions < t+1.
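As a sketch of what "autoregressive" means in practice, the greedy decoding loop below appends one token at a time, feeding the growing sequence back into the same hypothetical `model` at every step.

```python
import torch

def generate(model, prompt_ids, max_new_tokens=20):
    # prompt_ids: (1, prompt_len) tensor of token ids for the prompt
    ids = prompt_ids
    for _ in range(max_new_tokens):
        logits = model(ids)                        # (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1)  # token t+1 depends only on tokens <= t
        ids = torch.cat([ids, next_id.unsqueeze(-1)], dim=-1)
    return ids
```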
Model quality is determined by model size and training duration (assuming a large and diverse dataset).
Bigger is better for output quality, if compute resources are not limited.
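For a rough sense of how model size and training duration translate into compute and cost, here is a back-of-the-envelope sketch using the common approximation of ~6 FLOPs per parameter per training token. The GPU throughput and price-per-hour values are placeholder assumptions, not vendor figures.

```python
def training_cost_usd(n_params, n_tokens, flops_per_gpu_hour=1e17, usd_per_gpu_hour=2.0):
    # Rule of thumb: total training FLOPs ~= 6 * N (parameters) * D (tokens).
    # flops_per_gpu_hour and usd_per_gpu_hour are illustrative assumptions;
    # swap in real numbers for your hardware and cloud pricing.
    total_flops = 6 * n_params * n_tokens
    gpu_hours = total_flops / flops_per_gpu_hour
    return gpu_hours * usd_per_gpu_hour

# e.g. a 175B-parameter model trained on 300B tokens (illustrative numbers)
print(f"${training_cost_usd(175e9, 300e9):,.0f}")
```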