What happens inside artificial neural networks? This is the question scientists at Anthropic have been asking themselves. Chris Olah is an AI researcher who has spent the past decade analyzing the behavior of artificial neural networks. His experience includes work at Google Brain and OpenAI, and he now works at Anthropic, where he is also a co-founder.
Chris Olah put the question plainly: “What's going on inside of them?” adding, “We have these systems, we don't know what's going on. It seems crazy.” Despite all of AI's advances across so many domains, we still don't really know what happens inside these neural networks. Think about ChatGPT, Gemini, Claude, or the other AI systems we use every day: we have little idea how they mimic human behavior or how they really work. That is because they are not fully programmed by human engineers the way standard computer programs are.
AI systems are built with machine learning, which lets them learn on their own from large amounts of data. They identify patterns, learn relationships in language, and use them to anticipate situations, respond accordingly, and provide relevant answers. But because they are built this way, engineers don’t really know what happens inside these “black boxes”, which makes diagnosing and fixing errors much harder.
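To make the idea of “learning patterns from data rather than being programmed” concrete, here is a deliberately tiny sketch (my own toy example, not any production system): a bigram model that learns which word tends to follow another purely from example text, then uses those learned statistics to predict the next word.

```python
# Toy illustration: "learn" word-to-word relationships from data,
# then use them to predict what comes next. No rules are hand-coded.
from collections import Counter, defaultdict

training_text = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
)

# "Training": count how often each word follows each other word.
follow_counts = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follow_counts[current_word][next_word] += 1

def predict_next(word):
    """Predict the most likely next word from the learned counts."""
    candidates = follow_counts.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

print(predict_next("sat"))  # 'on' -- learned from the data, not programmed
```

Real language models are vastly larger and learn far richer relationships, which is exactly why their internal workings are so hard to inspect.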
Chris Olah believes that as long as we don’t know what happens inside these models, we can’t know how to make them safer. That is why he and other researchers at Anthropic decided to lead research into what goes on inside them. The team studies how Claude, Anthropic’s AI system, reacts to certain stimuli, in much the same way neuroscientists analyze and interpret MRI scans.
Chris Olah explained that he and his team treat the artificial neurons like letters of the Western alphabet: on their own they carry no meaning, but grouped together, like letters forming words, they do. One of the researchers, Tom Henighan, recalled, “We tried a whole bunch of stuff, and nothing was working. It looked like a bunch of random garbage,” until the team started grouping neurons and linking those groups to features. The team explains how the process works in their article “Mapping the Mind of a Large Language Model”.
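The idea of decomposing groups of neurons into meaningful features can be illustrated with dictionary learning, a standard technique related to the sparse-coding approach described in that article. The sketch below is a simplified illustration of the general concept, not Anthropic’s code; the synthetic “activations” and the choice of parameters are assumptions made purely for demonstration.

```python
# Minimal sketch of dictionary learning: express each activation vector
# as a sparse combination of learned feature directions, so that groups
# of neurons -- like letters forming words -- take on meaning.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Synthetic stand-in for model activations: 200 samples from a 16-neuron
# layer, secretly generated from 8 underlying "features" (hypothetical).
true_features = rng.normal(size=(8, 16))
sparse_codes = rng.random((200, 8)) * (rng.random((200, 8)) < 0.2)
activations = sparse_codes @ true_features + 0.01 * rng.normal(size=(200, 16))

# Learn candidate feature directions from the activations alone.
learner = DictionaryLearning(
    n_components=8,
    transform_algorithm="lasso_lars",
    transform_alpha=0.1,
    random_state=0,
)
codes = learner.fit_transform(activations)

# Each sample is now described by a few active features instead of
# 16 opaque neuron values.
print("average active features per sample:",
      (np.abs(codes) > 1e-6).sum(axis=1).mean())
```

The appeal of this kind of decomposition is that the recovered features, unlike individual neurons, can line up with human-interpretable concepts.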
Their work continues, digging deeper into how concepts are represented across these networks’ neurons and, step by step, cracking open the black box.