

This post is the first half of a series about my attempt to understand Anthropic’s current strategy, and to lay out the facts to consider in terms of whether Anthropic’s work is likely to be net positive and whether, as a given individual, you should consider applying. (The impetus for looking into this was to answer the question of whether I should join Anthropic’s ops team.) As part of my research, I read a number of Anthropic’s published papers, and spoke to people within and outside of Anthropic. This post contains “observations” only, which I wanted to write up as a reference for anyone considering similar questions. I will make a separate post about the inferences and conclusions I’ve reached personally about working at Anthropic, based on the info I’m sharing here.

Anthropic is planning to grow. They’re aiming to be one of the “top players”, competitive with OpenAI and DeepMind, working with a similar level of advanced models. They have received outside investment, because keeping up with the state of the art is expensive, and going to get more so. They’ve recently been hiring for a product team, in order to get more red-teaming of models and eventually to have more independent revenue streams.

I think Anthropic believes that this is the most promising route to making AGI turn out well for humanity, so it’s worth taking the risk of being part of the competition and perhaps contributing to accelerating capabilities. Alternatively stated, Anthropic leadership believes that you can’t solve the problem of aligning AGI independently from developing AGI. My current sense is that this strategy makes sense under a particular set of premises:

- There is not, currently, an obviously better plan or route to solving alignment that doesn’t involve keeping up with state-of-the-art large models. Yes, it’s a plan with some risks, but we don’t have any better ideas yet.
- We don’t understand deep learning systems, and we don’t have a theoretical approach; we’re at the point where actually just running experiments on current models and observing the results is the best way to get information. This could at some point lead to a more general theory or theories of alignment. Or there may just be practical/empirical evidence of something like an “alignment attractor basin” and knowledge of how to practically stay in it.
- There’s a high enough probability that whatever method ends up getting us to AGI will be, basically, an extension and further exploration of current deep learning, rather than a completely new kind of architecture that doesn’t even share the same basic building blocks. (Note: there’s an argument that in worlds where Anthropic’s research is less useful, Anthropic is also contributing much less to actually-dangerous race dynamics, since faster progress in LLMs won’t necessarily lead to shorter timelines if LLMs aren’t a route to AGI.)
