This is the 6th article in a series on using large language models (LLMs) in practice. Previous articles explored how to leverage pre-trained LLMs via prompt engineering and fine-tuning. While these approaches can handle the overwhelming majority of LLM use cases, it may make sense to build an LLM from scratch in some situations. In this article, we will review key aspects of developing a foundation LLM, drawing on the development of models such as GPT-3, Llama, and Falcon.

Photo by Frames For Your Heart on Unsplash

Historically (i.e., less than a year ago), training large-scale language models (10B+ parameters) was an esoteric activity reserved for AI researchers. However, with all the AI and LLM excitement post-ChatGPT, we now have an environment in which businesses and other organizations are interested in developing their own custom LLMs from scratch [1]. Although this is not necessary (IMO) for >99% of LLM applications, it is still beneficial to understand what it takes to develop these large-scale models and when it makes sense to build them.