The Chasm Between GPT-2 and GPT-3
If you've used GPT-2 and then GPT-3, it's shocking how much better GPT-3 is across the board. Going from 1.5 billion parameters to 175 billion, roughly a 117x increase, might not sound like that much of a difference (relatively speaking, of course), but if you have used both models you can just feel the difference GPT-3 makes across so many domains, and that difference holds up under close scrutiny.