⏰🚀 Racing Against Time: AI Surpasses Predictions.

Date

July 26, 2023

Author

Matias Hoyl

In January 2021, Michael Wooldridge, PhD in artificial intelligence and professor of computer science at the University of Oxford, published this book:

[Image: the book's cover]

On page 10, one can find a list of tasks that AI has solved, accompanied by the approximate year when researchers managed to get AI to perform that task.

The tasks range from easy ones, such as basic arithmetic, which was solved in 1945, to slightly more complex ones, like playing chess or Go, to tasks that are close to being solved today, like autonomous driving.

[Image: the list of tasks from page 10 of the book]

If you look closely, at the end of the list, the researcher enumerates a series of tasks that “are not even close to being solved.” And further down, in the footnote, he states that at present “we have no idea how to make computers perform the tasks at the bottom of the list.”

These tasks are:

  • Understanding a story and answering questions about it

  • Human-level automated translations

  • Interpreting what is happening in a photo

  • Writing interesting stories

  • Interpreting a work of art

  • Human-level general intelligence

One of the most important researchers in AI said that two years ago.

Today, AI can solve all but one.

Let’s go point by point.

## Understanding a Story

We already know that ChatGPT is very good at understanding and writing text.

But I want to share two less common examples that illustrate the power of LLMs (large language models) like ChatGPT.

First, I used the latest model of the chatbot Claude. It is less well known than ChatGPT or Bing Chat, but it has an interesting feature: while ChatGPT accepts a maximum of 3,000 words in the chat box, Claude accepts up to 75,000 words.

You can try this chatbot, with a maximum of 3 messages, here.

Taking advantage of that feature, I copied and pasted the entire text of The Little Prince and asked it about the story.

In less than 15 seconds, it responded with this:

[Image: Claude's response]

While doing this experiment, I wondered whether Claude generated the response because it was reading the book I provided or because it already had prior knowledge about this book in its “memory.” By “memory,” I mean the thousands of books and texts that these models are trained on.

So I took the experiment a step further. In the middle of the book, among the 13,000 words of the story, I added a new chapter:

[Image: the added chapter]

And I asked it to describe all the characters the little prince encounters in the story:

[Image: Claude's response]

Therefore, the AI reads the story word by word and understands it perfectly.

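A minimal Python sketch of this kind of test, in case you want to repeat it. It is only one possible setup, not necessarily the exact procedure described above: it splits a text into paragraphs, plants an extra chapter at the middle paragraph break, and checks whether an answer names a character that exists only in the planted chapter. The function names, the placeholder texts, and the character name are all made up for illustration, and no chatbot API is called, since the prompt can simply be pasted into the chat box.

```python
def insert_in_middle(book_text: str, planted_chapter: str) -> str:
    """Insert planted_chapter at the middle paragraph break (by paragraph count)."""
    paragraphs = book_text.split("\n\n")
    middle = len(paragraphs) // 2
    return "\n\n".join(paragraphs[:middle] + [planted_chapter] + paragraphs[middle:])


def build_prompt(book_text: str, question: str) -> str:
    """Put the full text first and the question last, as you would in a chat box."""
    return f"{book_text}\n\n{question}"


def mentions_planted_character(answer: str, planted_name: str) -> bool:
    """True if the answer names the character that exists only in the planted chapter."""
    return planted_name.lower() in answer.lower()


# Placeholder texts: replace them with the real book and your own invented chapter.
book = "Chapter 1 ...\n\nChapter 2 ...\n\nChapter 3 ...\n\nChapter 4 ..."
planted = "Chapter X: the little prince meets a clockmaker ..."  # hypothetical chapter

prompt = build_prompt(
    insert_in_middle(book, planted),
    "Describe all the characters the little prince encounters in the story.",
)

# Paste `prompt` into the chatbot, copy its answer back, and check it:
answer = "He meets a fox, a rose, and a clockmaker."  # hypothetical answer
print(mentions_planted_character(answer, "clockmaker"))
```

If the answer mentions the planted character, the model must have taken it from the text you pasted, because that character cannot be in its training data.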

## Translations

Computers have been able to translate simple sentences for a long time.

From its debut in 2006 until 2016, Google Translate used a simple statistical approach. It searched for the phrase to be translated in its extensive reference library and translated it word by word or phrase by phrase.

As you can guess, its translations were not very good.

Starting in 2017, Google began using more complex neural networks, similar to those used by ChatGPT today, and the quality of its translations improved substantially.

PBMT (phrase-based machine translation) was the simple statistical method. Its successor, GNMT (Google Neural Machine Translation), is the neural network system that Google Translate uses today.

But as any student who has tried to translate a complete essay in Google Translate knows, the tool is still not perfect.

The translation feels robotic and lacks fluency. For this reason, Wooldridge put “human-level” translation among the things that we are still “not even close to solving.”

But today, the LLMs behind ChatGPT or Bard (from Google) are better than Google Translate for that same task.

In the case of Bard, the model behind it is called PaLM 2, and in several tests, it performs better than Google Translate.

[Image: comparison table. Rows: PaLM (Google's old model), Google Translate, and PaLM 2 (the new one). Columns: translation from Chinese to English and from English to German on two metrics, BLEURT (higher is better) and MQM (lower is better).]

Those who have used ChatGPT or Bard enough to translate things know that we are at the level of human translation.

[Image: a poem translated into English, Korean, and Farsi.]

Another point for AI.

## Interpreting a Photo

Recently, I posted this on LinkedIn:

[Image: the photo from the LinkedIn post]

Did you also see an egg or not?

I passed that photo to Bing Chat, Microsoft's free chatbot, which comes with the Edge browser installed on all Windows computers.

The level of precision of the vision system is impressive.

Over the years, image recognition systems have improved to reach 91.1% accuracy. Much better than several humans I know (🙋‍♂️🤓).

[Image: evolution of the accuracy of vision models.] Source.

I think we can say this one is solved.

## Stories

This is the specialty of LLMs.

And within the data used to train them, a good portion consists of books.

[Image: data used to train GPT-3.] Source.

There is no public data on the training set of GPT-3.5 (free ChatGPT) or GPT-4 (ChatGPT Plus), but we can assume a similar proportion and a (much) larger number of tokens.

67 billion tokens (12 + 55) are approximately equivalent to one million books. And that’s in the case of GPT-3, which is no longer even used. For ChatGPT, it’s probably 2x or 3x that amount.
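
The back-of-the-envelope math behind that estimate can be sketched in a few lines of Python. The two conversion factors are assumptions, not figures from the GPT-3 data: roughly 0.75 words per token (a common rule of thumb for English text) and roughly 50,000 words per book.

```python
# Token counts of the two book datasets mentioned above (12 + 55 billion).
BOOK_SET_1_TOKENS = 12_000_000_000
BOOK_SET_2_TOKENS = 55_000_000_000

WORDS_PER_TOKEN = 0.75   # assumption: rule of thumb for English text
WORDS_PER_BOOK = 50_000  # assumption: a short-to-average book


def estimate_books(tokens: int,
                   words_per_token: float = WORDS_PER_TOKEN,
                   words_per_book: int = WORDS_PER_BOOK) -> float:
    """Convert a token count into an approximate number of books."""
    return tokens * words_per_token / words_per_book


total_tokens = BOOK_SET_1_TOKENS + BOOK_SET_2_TOKENS
print(f"{total_tokens:,} tokens is roughly {estimate_books(total_tokens):,.0f} books")
# prints "67,000,000,000 tokens is roughly 1,005,000 books"
```

With longer books or a different words-per-token ratio the number shifts, but it stays in the range of hundreds of thousands to a million books.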

In short, many books. Many stories.

Can you imagine the quality of a story that a person could tell if they had read a million of them?

Exactly.

And ChatGPT (like its cousins Bard, Claude, and Bing Chat) is very good at writing stories.

This is what someone on the internet discovered when they used ChatGPT to write a complete book.

[Image: the book's cover]

If you notice, the author is “GPT-4.” The photo was created using another AI tool: Midjourney.

You can read the complete book in English for free, and you can also see the process the “meta-author” followed to make ChatGPT write it.

The book won’t win any awards, but it’s interesting enough.

## Interpreting Art

This is the task that sounds the most complex.

And, moreover, due to its level of subjectivity, it is the hardest to measure.

How can we know if an artistic interpretation is “good” or “bad”?

By nature, an interpretation is subjective. What a work of art conveys to me may be different from what it conveys to you. Neither of the two perceptions is "wrong."

But let’s let everyone judge for themselves. Here’s an example:

[Image: the first artwork]

Another example:

[Image: the second artwork]

Bing Chat: “One possible interpretation is that it represents the loss of individual identity in a society that imposes roles and expectations on people. The two characters have their heads covered with a white cloth, suggesting that they cannot express their true selves. The contrast between the red dress and the black suit indicates that there is a difference in gender and class between them, but that both are subjected to social norms.”

Last one 🙂:

[Image: the third artwork]

Bing Chat: “One way to see it is as a critique of modern society and its contradictions. The bright colors and geometric shapes may suggest an appearance of order and harmony, but they may also hide a more chaotic and conflicting reality. The faces and figures may represent people living in a fragmented and alienated world, where the sense of unity and communication is lost.”

This brings us to the last point.

Related posts

August 30, 2023

🤖🏆 The Ultimate Battle: Chatbots in the Ring.

August 1, 2023

🤖🚦Braking the Race: The Dangerous Potential of Superintelligence.

Matias Hoyl · mhoyl@stanford.edu

© 2024 Matías Hoyl. All Rights Reserved.
