Q* (Quiet-STaR)

Teaching a machine to think

Turing Times
March 29, 2024

You may remember the board drama at OpenAI a few months ago. One thing that surfaced during that period was a model called Q*, which some people at OpenAI reportedly flagged as getting close to real reasoning capabilities. They were allegedly alarmed by how impressive the model's logic abilities were and thought it needed more oversight. A new paper, whose authors include xAI employee Eric Zelikman, describes what is likely a similar approach to machine reasoning.

Here are a few key points about the Quiet-STaR approach described in the paper, in the simplest terms.

Picture the language model as a robot that reads text and tries to guess the next word. To help the robot get better at understanding and predicting tricky concepts, the researchers taught it a new trick called "thinking to itself." Here's how it works:

Think: As the robot reads each word, it pauses to generate its own little thoughts or explanations about what might come next in the text.
Talk: The robot then combines its original prediction for the next word with the prediction it made after generating its thought. It learns to balance between the two.
Learn: The robot gets a reward when its thoughts help it make better predictions. This encourages it to come up with helpful explanations.
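
The Talk and Learn steps above can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the function names `talk` and `learn`, the simple linear blend, and all the probabilities are assumptions made up for the example, and the real method works on a neural network's outputs and differs in detail.

```python
import math

def talk(p_base, p_thought, w):
    """Blend next-word probabilities predicted without and with a thought.

    w is a mixing weight in [0, 1]; w = 0 ignores the thought entirely.
    (Assumption: a plain linear blend, chosen for readability.)
    """
    words = set(p_base) | set(p_thought)
    return {word: (1 - w) * p_base.get(word, 0.0) + w * p_thought.get(word, 0.0)
            for word in words}

def learn(logp_true_next):
    """Score each sampled thought against the average of its siblings.

    logp_true_next[i] is the log-probability of the word that actually came
    next, given thought i. A positive reward means "more helpful than average".
    """
    baseline = sum(logp_true_next) / len(logp_true_next)
    return [lp - baseline for lp in logp_true_next]

# Text so far: "2 + 2 =". Hypothetical predictions for the next word.
p_base = {"4": 0.2, "5": 0.8}      # without thinking
p_thought = {"4": 0.9, "5": 0.1}   # after a thought such as "two plus two is four"
mixed = talk(p_base, p_thought, w=0.5)

# Three sampled thoughts, each giving the true word "4" a different probability.
rewards = learn([math.log(0.9), math.log(0.2), math.log(0.4)])
```

With these made-up numbers, the blended prediction gives "4" a probability of about 0.55, and only the first thought, the one that made the correct word most likely, earns a positive reward. Thoughts that help less than average are discouraged.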

The researchers also made some clever adjustments to make this process faster and smoother for the robot. They added special "start thinking" and "stop thinking" signals to help the robot know when to begin and end its thoughts.
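
To make the "start thinking" and "stop thinking" signals concrete, here is a minimal sketch of how a thought could be fenced off inside a sequence of words. The marker strings and the helper names `insert_thought` and `strip_thoughts` are illustrative assumptions, not code from the paper.

```python
START = "<|startofthought|>"  # illustrative "start thinking" marker
END = "<|endofthought|>"      # illustrative "stop thinking" marker

def insert_thought(tokens, position, thought):
    """Return a copy of tokens with a delimited thought placed after `position`."""
    return tokens[:position + 1] + [START, *thought, END] + tokens[position + 1:]

def strip_thoughts(tokens):
    """Drop everything between the markers, recovering the visible text."""
    visible, inside = [], False
    for tok in tokens:
        if tok == START:
            inside = True
        elif tok == END:
            inside = False
        elif not inside:
            visible.append(tok)
    return visible

text = ["2", "+", "2", "=", "4"]
with_thought = insert_thought(text, 3, ["two", "plus", "two", "is", "four"])
```

Because the thought sits between two dedicated markers, it can always be removed again: `strip_thoughts(with_thought)` gives back the original five words, which is what keeps the thinking "quiet".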

To test whether this new thinking trick was working, they gave the robot some challenging commonsense question-answering and math problem-solving tasks. Impressively, the robot did much better on these tests after learning to think to itself, without any extra training on those specific tasks: the paper reports accuracy rising from 36.3% to 47.2% on CommonsenseQA and from 5.9% to 10.9% on the GSM8K math benchmark. The longer the thoughts the robot was trained to produce, the better it performed.

When the researchers looked at the robot's thoughts, they found that it was coming up with helpful ideas and recalling relevant information to better predict tricky words and concepts.

Overall, this thinking trick, called Quiet-STaR, helped the robot to reason better and understand more complex ideas by learning from a wide variety of texts, making it a more well-rounded and capable language model.

The full paper is available here: https://arxiv.org/abs/2403.09629