Cost - Unless you are deploying to the edge, you will not beat OpenAI on cost per generated token. If you are at a scale where your OpenAI or Anthropic cloud bills are so large that running your own models would save money, you know more about this than I do, or at least you have the funding to hire someone who does.
Architecture overview
Training process (pretraining) - Predict the next token, trained on ~1 trillion+ tokens (roughly 700 billion words) at a compute cost of ~5 million USD (GPT-3). Damn!
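To make the objective concrete, here is a minimal sketch of next-token prediction as a cross-entropy loss. `model` is a hypothetical stand-in for any decoder that maps token ids to next-token logits, and the tensor shapes are assumptions for illustration, not a specific implementation.

```python
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    # token_ids: (batch, seq_len) integer tensor of token ids
    inputs = token_ids[:, :-1]   # every position except the last
    targets = token_ids[:, 1:]   # the same sequence shifted left by one
    logits = model(inputs)       # assumed shape: (batch, seq_len - 1, vocab_size)
    # Cross-entropy between the predicted distribution at each position
    # and the token that actually came next in the training text.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```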
Autoregressive - for a given sequence of tokens, predicting token t+1 depends on all tokens at positions < t+1.
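As a sketch of what "autoregressive" means in practice, the greedy decoding loop below appends one token at a time, feeding the growing sequence back into the same hypothetical `model` at every step.

```python
import torch

def generate(model, prompt_ids, max_new_tokens=20):
    # prompt_ids: (1, prompt_len) tensor of token ids for the prompt
    ids = prompt_ids
    for _ in range(max_new_tokens):
        logits = model(ids)                        # (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1)  # token t+1 depends only on tokens <= t
        ids = torch.cat([ids, next_id.unsqueeze(-1)], dim=-1)
    return ids
```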
Model quality is determined by model size and training duration (assuming a large and diverse dataset).
Bigger is better for output quality, if compute resources are not limited.
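For a rough sense of how model size and training duration translate into compute and cost, here is a back-of-the-envelope sketch using the common approximation of ~6 FLOPs per parameter per training token. The GPU throughput and price-per-hour values are placeholder assumptions, not vendor figures.

```python
def training_cost_usd(n_params, n_tokens, flops_per_gpu_hour=1e17, usd_per_gpu_hour=2.0):
    # Rule of thumb: total training FLOPs ~= 6 * N (parameters) * D (tokens).
    # flops_per_gpu_hour and usd_per_gpu_hour are illustrative assumptions;
    # swap in real numbers for your hardware and cloud pricing.
    total_flops = 6 * n_params * n_tokens
    gpu_hours = total_flops / flops_per_gpu_hour
    return gpu_hours * usd_per_gpu_hour

# e.g. a 175B-parameter model trained on 300B tokens (illustrative numbers)
print(f"${training_cost_usd(175e9, 300e9):,.0f}")
```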