People explore options, then selectively represent good options to make difficult decisions
Posted on May 21, 2019
UNIVERSITY PARK, Pa. — In a world that offers a seemingly unending number of options and opportunities, people may rely on the overall complexity of alternative options to help them make choices in uncertain environments, according to researchers.
In a study recently published in the journal Cognition, the researchers found that when participants faced complex choices, they often showed a burst of exploration before settling into preferred options of higher value. Instead of trying to represent the values of all of the alternatives, adaptive decision-making was supported by selectively maintaining high-value options while forgetting the rest. This strategy may be one way that people can conserve their cognitive resources and solve problems that exceed their working memory capacity.
It might, for example, explain why people have their go-to meals when they visit restaurants, said Michael Hallquist, assistant professor of psychology at Penn State and Institute for Computational and Data Sciences co-hire.
“There is a set of neural circuits — and cognitive processes that these circuits instantiate — that help you remember the value of different actions, so if you go to a restaurant and try the steak and it was fantastic, the next time you’ll usually remember that,” said Hallquist. “The difficulty, though, is that at any given moment, you’re faced with so many possibilities that you can’t possibly evaluate all of the alternatives in detail. In the decision-making literature, this has been called the exploration-exploitation dilemma. Keeping this in the context of the restaurant example, exploration would be ordering something you haven’t tried before and exploitation would be going back to the steak you know is going to be good. By comparison, if you had previously tried the lasagna and it was unremarkable, would you remember this as clearly as the steak?”
To study the exploration-exploitation dilemma, the researchers recruited 76 participants to complete a timed task that was divided into eight sessions, or runs. Each run consisted of 50 trials. During a trial, a clock hand revolved around an image, such as a face with a happy or unhappy expression or an abstract picture. The subjects could stop the revolving hand and, depending on when they stopped it, received a reward of between 0 and 150 points. To create an uncertain environment, the payoff for choices was inconsistent and varied as a function of time. In some runs, the contingency rewarded subjects who waited, whereas in other runs, it rewarded subjects who responded more quickly.
“They didn’t know any of this going into the test, they had to learn it as they went,” said Hallquist. “They’re learning whether they should wait, or whether they should act quickly. It sounds easy, but it can be tricky because the payouts are probabilistic. So, you may choose to respond in two seconds and receive 100 points. And then you may hit that mark again and get no points, so people have to integrate the long-run outcomes.”
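To see why probabilistic payouts force people to integrate outcomes over many trials, consider the toy simulation below. It is only an illustrative sketch: the function names, the 4-second trial length, the linear payoff rule and the 50 percent payout chance are all assumptions made for this example, not the contingencies used in the study. Only the 0-to-150-point range and the 50 trials per run come from the description above.

```python
import random

def toy_reward(response_time, rng, max_time=4.0, rewards_waiting=True):
    """Hypothetical payoff rule; NOT the study's actual contingency."""
    # Fraction of the (assumed) trial length that has elapsed, clamped to 0..1.
    frac = min(max(response_time / max_time, 0.0), 1.0)
    # Assumed: payoff size grows with waiting in some runs, shrinks in others.
    magnitude = 150 * (frac if rewards_waiting else 1.0 - frac)
    # Assumed: any single response pays out only half of the time.
    return round(magnitude) if rng.random() < 0.5 else 0

def average_payoff(response_time, n_trials=50, seed=0, **kwargs):
    """Average points over one run of repeated responses at the same time."""
    rng = random.Random(seed)
    total = sum(toy_reward(response_time, rng, **kwargs) for _ in range(n_trials))
    return total / n_trials

# Compare responding early and late in a run that (by assumption) rewards waiting.
early = average_payoff(0.5)
late = average_payoff(3.5)
```

Under these assumptions, a single trial can return zero points even for a good response time, so only the average across many trials reveals which response times are worth repeating.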
Using mathematical models of decision-making, Hallquist and Alexandre Y. Dombrovski, associate professor of psychiatry, University of Pittsburgh, found that subjects’ decisions were consistent with a strategy of selectively maintaining high-value response times. Alternative models that represented the values of all response times, or that promoted or discouraged responses based on uncertainty, were not supported. Altogether, these results suggest that people solve the exploration-exploitation dilemma in part by sampling many different options, then compressing the information that they need to track, according to Hallquist.
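The idea of selectively maintaining high-value options can be illustrated with a short sketch. This is a simplified toy, not the researchers' model: it assumes response times are grouped into a handful of bins, that the chosen bin is updated with a standard delta rule, that the values of unchosen bins decay toward zero, and that choices follow a softmax rule. The function names and parameter values (`learning_rate`, `decay`, `temperature`) are hypothetical.

```python
import math
import random

def choose_bin(values, rng, temperature=10.0):
    """Softmax choice (assumed rule): higher-valued bins are picked more often."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]

def update_values(values, chosen, reward, learning_rate=0.3, decay=0.1):
    """Learn about the chosen bin; let every other bin fade toward zero."""
    updated = []
    for i, v in enumerate(values):
        if i == chosen:
            # Delta rule: move the estimate toward the reward just received.
            updated.append(v + learning_rate * (reward - v))
        else:
            # Assumed forgetting: unchosen options lose a fraction of their value.
            updated.append(v * (1.0 - decay))
    return updated

# Example: eight response-time bins, one choice followed by one update.
rng = random.Random(0)
values = [0.0] * 8
picked = choose_bin(values, rng)
values = update_values(values, picked, reward=100)
```

In this sketch, bins that are revisited keep accurate, up-to-date values, while bins that are abandoned fade, which mirrors the compression of information described above in a very rough way.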
“You are assigning and holding onto things that are especially valuable, and you devote cognitive horsepower to representing those things with high fidelity,” said Hallquist. “This helps you solve this really hard problem because you can’t represent all of it, so there has to be some way of compressing this information.”
Better representing real-life decisions
The experiment was designed to better represent decisions in real life, according to Hallquist.
“In most decision-making experiments, you’re only choosing among a few things, but in real life, you’re faced with many, many options,” he said. “We saw a timed task with varying outcomes as a more realistic test of how people make these exploratory versus exploitative choices in a complex environment.”
In the future, the researchers plan to analyze functional magnetic resonance imaging (fMRI) data to better understand how the brain represents decision-relevant signals during the task.
Computations for this research were performed on Penn State’s Institute for CyberScience Advanced CyberInfrastructure (ICS-ACI).