work better than ever before and deliver better results than its previous models. With these capabilities, Google claimed that Gemini can scan and analyze long-form content and summarize documents that run to hundreds of pages. Another feature of the latest version is the ability to search through footage and scenes from films and other videos and answer our questions about them.
These features should have been a true lifesaver. Think about finding the perfect movie scene, or about getting a summary of the book you were longing to read but never had the time for.
However, recent studies that looked deeper into Gemini’s capabilities suggest that the artificial intelligence software does not work as presented. One of the studies indicates that long content is not really Gemini’s main strength. On the contrary, it found that the software is more likely to struggle with queries that involve long texts.
The second study found that, on questions involving large datasets and long documents, the model gave the right answer only about 40% to 50% of the time. In other words, it answered correctly roughly half of the time.
One of the authors of the studies, Marzena Karpinska, reports that even though artificial intelligence programs can process large amounts of data, they do not really understand the information they are viewing. This leads to misinterpretations and to the wrong responses that the generative artificial intelligence software reportedly produces.
At the center of these claims is Gemini’s context window: the model can take in up to 2 million tokens as context. For a better understanding, tokens are subdivided bits of raw data, such as the syllables “a,” “maz” and “ing” that make up the word “amazing.” Those 2 million tokens are equivalent to roughly 1.4 million words.
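To make that conversion concrete, here is a minimal Python sketch of the arithmetic. It assumes only the two figures quoted above (2 million tokens, about 1.4 million words); the helper names `estimate_tokens` and `fits_in_context` are hypothetical and are not part of any Gemini tool. Real token counts depend on the tokenizer and the text, so treat the result as a rough estimate.

```python
# Figures quoted in the article: a 2-million-token context window
# is roughly equivalent to 1.4 million words.
CONTEXT_TOKENS = 2_000_000
EQUIVALENT_WORDS = 1_400_000

# Implied average: about 0.7 words per token (an approximation, not a spec).
WORDS_PER_TOKEN = EQUIVALENT_WORDS / CONTEXT_TOKENS


def estimate_tokens(word_count: int) -> int:
    """Hypothetical helper: rough token estimate for a text of `word_count` words."""
    return round(word_count * CONTEXT_TOKENS / EQUIVALENT_WORDS)


def fits_in_context(word_count: int, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Hypothetical helper: would a text of this length fit in the context window?"""
    return estimate_tokens(word_count) <= context_tokens


if __name__ == "__main__":
    book_words = 100_000  # assumed length of a long novel, for illustration only
    print(estimate_tokens(book_words), fits_in_context(book_words))
```

By this estimate, a text would have to run past about 1.4 million words before it no longer fit in the window, which is why the marketing focused on very long documents and videos.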
So, before generating an answer, the model considers the information it has available, which serves as context. A movie script, a show, or a video clip can all serve as that context.
When the model was launched, Google DeepMind's VP of research said that “[1.5 Pro] performs these sorts of reasoning tasks across every single page, every single word,” a claim that the studies appear to contradict.