What Went into Training DeepSeek-R1? – Epoch AI

If we could train a model that exceeds GPT-4 in training compute by the same margin that GPT-4 exceeds GPT-2, would that model be capable of most programming work?

GATE: AI and Automation Scenario Explorer

Most AI value will come from broad automation, not from R&D

How has DeepSeek improved the Transformer architecture?