The Curty-Marsili forecasting game
The classical setting
The seminal model of Curty & Marsili^{11} considers a population of agents making a binary forecast for the next time step. Table 1 gathers the notation used for the variables and parameters throughout the paper. Agents have two possible strategies: looking for information (I) or being a “follower” (F) and relying on others. These two strategies correspond to what are usually called, respectively, individual and social learning in cultural evolution. An agent playing the I strategy has a probability \(p > 1/2\) of being right (a quantity that we will henceforth call the “accuracy” of the strategy). The follower strategy consists in an iterative imitation process, specified as follows:

(i)
All agents playing F randomly initialize their opinion, with a 1/2 probability of being right: they have no private information, and thus a neutral prior.

(ii)
One of them, randomly chosen, observes a random sample of m individuals, and endorses the majority opinion among them (for simplicity, m is assumed to be odd to avoid ties).

(iii)
Step (ii) is repeated a large number of times until the proportion of followers holding the right opinion converges to a value \(\hat{q}\).
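Steps (i) to (iii) can be sketched as a small Monte Carlo simulation. This is only an illustration under assumptions that the text does not specify: the sample of m individuals is drawn with replacement from the whole population (informed agents and followers alike), informed opinions stay fixed during the iteration, and the function name and default values are hypothetical.

```python
import random

def simulate_followers(n_agents=2000, z=0.5, p=0.52, m=5, sweeps=50, seed=0):
    """Return the final fraction of followers holding the right opinion."""
    rng = random.Random(seed)
    n_f = int(z * n_agents)          # number of followers
    n_i = n_agents - n_f             # number of informed agents
    # True means "holds the right opinion".
    informed = [rng.random() < p for _ in range(n_i)]
    followers = [rng.random() < 0.5 for _ in range(n_f)]     # step (i)
    for _ in range(sweeps * n_f):                              # step (iii)
        f = rng.randrange(n_f)                                 # step (ii)
        votes = 0
        for _ in range(m):
            # Assumption: sampling with replacement in the whole population.
            k = rng.randrange(n_agents)
            votes += informed[k] if k < n_i else followers[k - n_i]
        followers[f] = votes > m // 2                          # majority rule
    return sum(followers) / n_f
```

Calling `simulate_followers` for several seeds at a fixed z gives a rough picture of the distribution of the end state.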
Unlike the I strategy, the F accuracy depends on the fraction z of followers in a nontrivial way, see Fig. 2. For a small fraction z, there is only one attractive fixed point \(q_>\), larger than p, which means that herding is efficient. This is simply the Condorcet theorem mechanism at play: aggregating information yields more accurate predictions. Furthermore, the performance of followers increases at first with their number, as the probability of sampling an accurate follower increases, thereby improving sample quality. However, above a critical value \(z_c\), the system becomes bistable: \(\hat{q}\) has two attractive fixed points (solid red line), and an unstable one in the middle (dashed red line), which marks the frontier between the two basins of attraction. Thus, the population has two possible end states, one where most people are right, and another where most people are wrong.
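The fixed-point structure described here can be explored with a mean-field sketch. It assumes an infinite population in which each sampled individual is right independently with probability \((1-z)p + zq\), so that a follower is right with the probability that a majority of m such draws is right. The function names, the grid search and the default m are illustrative choices, not taken from the paper.

```python
from math import comb

def majority_right(pi, m):
    """P(majority of m independent draws is right), each right w.p. pi (m odd)."""
    return sum(comb(m, k) * pi**k * (1 - pi)**(m - k)
               for k in range(m // 2 + 1, m + 1))

def fixed_points(z, p=0.52, m=11, grid=20000):
    """Approximate the solutions of q = G(q) on [0, 1] via sign changes."""
    def g(q):
        # Mean-field assumption: a sampled agent is right w.p. (1-z)p + zq.
        return majority_right((1 - z) * p + z * q, m) - q
    roots = []
    prev_q, prev = 0.0, g(0.0)
    for i in range(1, grid + 1):
        q = i / grid
        cur = g(q)
        if prev == 0.0 or prev * cur < 0:
            roots.append(prev_q if prev == 0.0 else 0.5 * (prev_q + q))
        prev_q, prev = q, cur
    return roots
```

With these defaults, a small z yields a single root above p, while a z close to 1 yields three roots (two stable ones separated by an unstable one).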
More intuitively, for \(z > z_c\) there is a substantial probability (increasing with z) that, due to initial conditions, followers trap themselves in the wrong self-reinforcing forecast. When z becomes very large, these two fixed points are close to 0 and 1: the followers basically reach a consensus. This is a herding phenomenon: when imitation is too frequent, followers are mostly imitating one another and no longer aggregating genuine information. It is possible to compute \(q = \mathbb{E}[\hat{q}]\) analytically^{11}, which gives the bell-shaped blue dotted curve shown in Fig. 2.
Under the hood, this is due to the fact that the more numerous the followers, the less independent their opinions. As we mentioned in the Introduction, the super-accuracy of followers relies on independence: if all individuals agree, aggregating opinions has no effect. Worse still: as followers have no information of their own but rely on others to make up their mind, a population made of 100% followers would decide purely by chance, which means that q has to fall towards 1/2 as \(z \rightarrow 1\). Furthermore, there is a point \(z^\dagger \lesssim 1\) (i.e. the vast majority of agents are followers) where \(q = p\). If the agents are rational accuracy-maximizers, this point is the only Nash equilibrium.
The message of the model is twofold. First, the equilibrium of the game is the point where, even though most people learn from others, social learning no longer offers any gain in accuracy (\(p = q\), reminiscent of Rogers’ paradox^{10}; see Sect. 3.1 in the Discussion for more details). In other words, imitation behaviour spreads until the snake bites its own tail. Second, there is “consensus”: nearly all the population adopts either the right or the wrong forecast. This model can thus explain, e.g., herding effects among financial analysts^{38}.
This model shares the basic features and conclusions of the conformist transmission literature. In Fig. 1, the dotted blue curve represents the probability that an imitator acquires the right opinion by using the F strategy, depending on its prevalence in the population. As we already noted, this shape matches Boyd & Richerson’s definition^{6} of a conformist bias (the probability to adopt the belief held by the majority is greater than the frequency of the majority opinion in the population).
An adaptive social network
Assumptions
One of the limitations of the Curty-Marsili framework is the assumption that followers randomly sample their consociates to make up their mind, thereby neglecting all possible effects of reputation, trust, and epistemic links between agents. In order to represent the fact that agents often rely on a network of people they know and trust, we embed the game in a dynamic directed graph, where each agent i is a node. A link \(i \rightarrow j\) signifies that i listens to j when making up his/her mind. The network evolves according to the following rules.

Each agent i initializes its network \(\Omega_i\) of nearest neighbours by picking m agents uniformly at random in the population.

Each member \(j \in \Omega_i\) is endowed with a personal score \(S_i^j\), initialized at 0. It is updated following:
$$\begin{aligned} \forall j \in \Omega_i,\quad S_i^j(t+1) = (1 - \gamma) S_i^j(t) + R^j(t), \end{aligned}$$
(1)
with \(0 < \gamma \le 1\) a memory decay rate, and a reward \(R^j(t) = 1\) if j was right at time t, 0 otherwise. The score thus reflects the accuracy record of an agent, with an emphasis on the recent past. \(\gamma = 0\) corresponds to infinite memory, whereas \(\gamma = 1\) is such that only the latest reward matters.

At each time step, with probability \(\gamma\) (which controls the speed of the network evolution), i drops the lowest performer \(j^\star = \mathrm{argmin}\,(S_i^j,\ j \in \Omega_i)\) and picks a substitute. To reduce the number of parameters, we constrain the memory decay rate and the probability to update the network to have the same value \(\gamma\). The probability for a “target” agent \(j^{\text{target}}\) to be chosen as a substitute is proportional to \(a + k(j^{\text{target}})\), with \(a > 0\) a small incompressible weight that avoids diversity depletion (when \(a = 0\), an agent who has lost all its audience can never re-enter the network), and \(k(j^{\text{target}})\) the indegree of \(j^{\text{target}}\) (measuring his/her “reputation”). This picking process excludes oneself and all agents already present in \(\Omega_i\).
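The score update of Eq. (1) and the rewiring rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data layout (a dictionary of scores per agent, a list of indegrees) and the function names are hypothetical, the dropped neighbour is assumed to be excluded from the candidate substitutes, ties for the lowest score are broken by dictionary order, and the newcomer's score is assumed to start at 0.

```python
import random

def update_scores(scores, rewards, gamma):
    """Eq. (1): scores maps neighbour j -> S_i^j, rewards maps j -> 0 or 1."""
    return {j: (1 - gamma) * s + rewards[j] for j, s in scores.items()}

def rewire(i, scores, indegree, gamma, a, rng):
    """With probability gamma, replace i's lowest-scoring neighbour (in place).

    Returns (dropped, substitute), or None if nothing changed."""
    if rng.random() >= gamma:
        return None
    worst = min(scores, key=scores.get)
    # Exclude i itself and everyone currently in Omega_i (including `worst`).
    candidates = [j for j in range(len(indegree)) if j != i and j not in scores]
    if not candidates:
        return None
    # Substitute chosen with probability proportional to a + indegree.
    weights = [a + indegree[j] for j in candidates]
    new = rng.choices(candidates, weights=weights, k=1)[0]
    del scores[worst]
    indegree[worst] -= 1
    scores[new] = 0.0            # assumption: fresh score for the newcomer
    indegree[new] += 1
    return worst, new
```

Because a > 0, every candidate keeps a nonzero chance of being picked, which is the role the text assigns to this parameter.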
For the moment, we keep the strategies (I and F) fixed and exogenous, and observe the phenomenology of the model; we shall in a second stage let the strategies evolve depending on their respective payoffs. Henceforth, we choose z (the fraction of agents playing F) close to 1, and (p=0.52).
Collapse of the informed audience share
With this modification of the Curty-Marsili model we find that, perhaps surprisingly, q now drops below p, and settles around 1/2 (Fig. 3c).
This is caused by a subtle phenomenon. In the network, the scores \(S_i^j\) are supposed to estimate the accuracy of an agent. So, we should expect that when \(q < p\), the followers tend to fill their network with I agents, who are on average more accurate. Paradoxically, the opposite happens. Even though the scores of the informed agents are on average higher than those of the followers, the followers’ scores have a much lower variance (see Fig. 3a). Indeed, as followers tend to make the same prediction, their scores fluctuate less than those of I agents, who make independent predictions. As the network is updated by withdrawing the worst-performing agents, it is often, counterintuitively, an I agent who is jettisoned. Hence a collapse of the informed audience share, defined as the proportion of links directed towards an I agent.
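A toy computation makes this mechanism concrete. It rests on a deliberately extreme assumption that is not in the paper: the m - 1 followers in a neighbourhood all make the same forecast, so they share a single score, while the one informed neighbour forecasts independently. The function name and defaults are hypothetical. The sketch measures how often the informed neighbour is the unique lowest scorer.

```python
import random

def informed_lowest_rate(p=0.52, q=0.5, gamma=0.05, steps=20000, seed=0):
    """Fraction of time steps at which the informed score is strictly lowest."""
    rng = random.Random(seed)
    s_inf = 0.0   # score of the single informed neighbour
    s_fol = 0.0   # common score of the perfectly herding followers (assumption)
    lowest = 0
    for _ in range(steps):
        s_inf = (1 - gamma) * s_inf + (rng.random() < p)
        s_fol = (1 - gamma) * s_fol + (rng.random() < q)
        if s_inf < s_fol:
            lowest += 1
    return lowest / steps
```

If the neighbour to drop were chosen uniformly among m = 11 neighbours, the informed one would be hit with probability 1/11. In this toy setting it is the strict minimum far more often than that, despite its higher average score, because the followers never fall below one another.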
Therefore, the system ends up in a situation where the tiny minority of agents who have genuine information are not heeded. In our model, as often in real life, being the only one to be wrong has much stronger consequences than being the only one to be right. Being wrong means being wiped out of the network, while being right does not provide any particular advantage in terms of audience share. The informed agents suffer most from this asymmetry. This echoes some empirical results: Yaniv & Kleinberger^{24} found that it was easier for advisors to lose a good reputation than to gain one. This is also reminiscent of Kahneman & Tversky’s prospect theory^{39}: if average performance is the point of reference, then being the only one to be wrong is a loss, which is psychologically overweighted compared with being the only one to be right. Anecdotal evidence suggests that this is also true for asset managers, who get badly punished when they are alone in a drawdown.
Broad indegree distribution
In our network, the outdegrees (the number of agents a given agent is listening to) are all fixed to m. The indegrees (the number of agents listening to someone), by contrast, evolve dynamically. Quite interestingly, we find that the indegree distribution develops a power-law tail, see Fig. 3b. This can be interpreted as the spontaneous emergence of “opinion leaders”, i.e. agents whose opinion has a systemic impact on the population. This scale-free topology is due to the assumption that the probability to pick a given node is an affine function of its indegree. The indegree of nodes with \(k \gg a\) therefore grows exponentially with time. On the other hand, the probability that the score of an “opinion leader” remains by chance above a certain threshold that shields him/her from losing some followers decays exponentially with time. This “battle of exponentials” naturally leads to a power-law distribution for k, with an exponent that depends on the parameters of the model (see e.g.^{40}).
The value of originality
The results of the previous section are somewhat unsatisfying. In the situation where q is close to 1/2, any follower would benefit from listening to an informed agent instead of a follower. This is an assortment failure, and the situation described above is only an equilibrium because our assumptions prevent efficient assortment.
While it makes sense to assume that the strategies themselves (I or F) are not directly observable, opinions are public and may be used to infer the underlying strategies. But as we repeatedly pointed out, at equilibrium informed agents cannot be identified by their accuracy (in Curty & Marsili’s equilibrium, \(p=q\)).
Instead, the most distinctive feature of I agents is their originality. Since their forecast is not influenced by anyone, the correlation between their opinion and the majority opinion is zero (or slightly above zero if the agent has a substantial influence in the network), whereas this quantity is close to one for followers. This distinctive feature can plausibly be observed by followers, and be used to identify potentially informed sources and listen preferentially to them. More mundanely, agents with original opinions are often deemed competent (they are said to “think outside the box”, or hold “bold” opinions^{35}).
This intuition can be formalized by introducing a proxy of originality, which we define to be the distance observed by agent i between the forecast of an agent j in \(\Omega_i\) and the average forecast in \(\Omega_i\). Formally, the originality of an agent is \(\big\vert R^j(t) - \overline{R}_i(t) \big\vert\), with \(\overline{R}_i(t) := \textstyle \frac{1}{m} \sum_{k=1}^{m} R^k(t)\). Such distance quantifies how deviant a given forecaster is, from the point of view of a follower. Including it in the scores update rule in the simplest way amounts to writing:
$$\begin{aligned} S_i^j(t+1) &= (1 - \gamma) S_i^j(t) + R^j(t) + \alpha \big\vert R^j(t) - \overline{R}_i(t) \big\vert \end{aligned}$$
(2)
where \(\alpha\) is a weighting parameter, measuring by how much agents value originality over accuracy.
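As a minimal sketch, the update rule of Eq. (2) for one agent i can be written as below. The function name and the dictionary layout (neighbour index to score, neighbour index to reward) are illustrative assumptions.

```python
def update_scores_with_originality(scores, rewards, gamma, alpha):
    """Eq. (2): decayed score + reward + alpha * |reward - mean reward in Omega_i|.

    scores maps each neighbour j in Omega_i to S_i^j; rewards maps j to 0 or 1."""
    m = len(scores)
    mean_reward = sum(rewards[j] for j in scores) / m
    return {
        j: (1 - gamma) * s + rewards[j] + alpha * abs(rewards[j] - mean_reward)
        for j, s in scores.items()
    }
```

For example, with three neighbours whose rewards are 1, 1 and 0, the lone dissenter has originality 2/3 against 1/3 for the others. With \(\alpha = 3\) and zero initial scores, this bonus exactly offsets its missing reward, so all three end up with a score of 2.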
In information-theoretic terms, given the opinion of an original agent, the opinion of another randomly drawn unoriginal one has a null Shannon entropy. Conversely, the opinion of an original agent has a constant positive Shannon entropy, irrespective of the opinions that have been already voiced, since the random variable representing his/her opinion is independent from all other opinions. In other words, listening to an original agent is a way to diversify information sources, whereas listening to a consensus-follower is a waste of time. Hence, listening to original agents can actually be seen as a form of protection against groupthink, which agents are arguably wary of. We start by fixing \(\alpha\) as an exogenous parameter before letting it evolve endogenously to an equilibrium value.
Figure 4 reveals that the effect of the originality term in Eq. (2) is twofold. First, as \(\alpha\) increases, predictably, the taste for originality increases the audience of informed agents in the network (red markers, left axis). This means that our modified score update rule is successful in implementing the idea of beneficial originality. Perhaps surprisingly though, the effect of \(\alpha\) on q is non-monotonic. As \(\alpha\) increases further, q reaches a maximum beyond which it decays.
In order to understand why this is the case, recall that in Fig. 2, z was found to have a bell-shaped effect on q. In the present framework, the informed audience share s in the network (black markers, right axis) somehow plays the role of z in Curty & Marsili’s model. There is an optimal fraction of informed agents in the network that maximizes the wisdom of crowds. This is due to the fact that when \(q > p\), followers are better forecasters than informed agents, leading to a mean-variance trade-off between diversifying information sources and listening to good forecasters.
This result is striking. Intuitively, social learners should preferentially seek to copy individual learners, who supposedly possess high quality information. Here, at some point, social learners are so good at detecting individual learners that listening to social learners instead becomes a better option. Analogously, reading a meta-analysis instead of primary literature is a good idea as long as the meta-analysis is filled with primary literature.
From the group point of view, the optimal situation should correspond to a “pyramidal” organisation with some individual learners at its base, originality-oriented social learners in the middle (who aggregate the information produced by the base) and pure social learners at the top. But if agents aimed to be right as often as possible, such a pyramid would be doubly unstable: the base would have an incentive to switch to social learning, and the middle would gain from listening to social learners instead. However, this organisation can be stable when agents do not have accuracy as their only goal. This is what we explore in the following section.
Finally, increasing \(\alpha\) concentrates the audience on the very small number of informed agents. Thus, one might expect some I agents to have massive indegrees, but the numerical results are, again, counterintuitive: the higher the \(\alpha\), the lower the maximum indegrees (not shown). This makes sense if one accounts for the fact that informed agents draw their high audience share from their originality. Hence, as soon as an informed agent is in the network of a significant share of the population, it acquires a systemic impact which undermines its very originality. The effect of originality on scores therefore leads, self-consistently, to a limitation of the audience share of opinion leaders. Individual learners are listened to much more than average, without reaching the extremes allowed by the fat-tailed indegree distribution obtained with \(\alpha = 0\) (which becomes Gaussian as \(\alpha\) increases).
Preliminary summary and open issues
Our exploratory results so far have shown that there exists an optimal positive value of the taste for originality \(\alpha\). One would however prefer that individuals endogenously develop such a feature. But this feature actually generates externalities: in a population with a large \(\alpha\) (say, where followers listen only to informed agents), an agent who sets its personal \(\alpha\) to 0 in order to listen mostly to followers would have a better accuracy than others. We might thus still be in a producer-scrounger problem, where free-riding would destroy the additional accuracy brought by the taste for originality.
In addition, we have so far worked with an exogenous proportion of informed agents. But if agents try to maximize their accuracy by changing strategies, this proportion should spontaneously decrease as soon as (q > p). The foregone conclusion is that imitation produces no accuracy gain ((p = q) is a necessary condition to have a Nash equilibrium) or, worse, that informed agents go extinct (which implies (q=1/2)). This is precisely what happens in a naive evolutionary simulation where fitness depends only on accuracy.
Within our framework, the only way to produce a situation where \(q > p\) is to modify the objective of our agents. (Note that, in some contexts like financial markets, another realistic possibility is that informed agents can benefit from time priority. Suppose that the profit of an informed agent is given by \(\beta p - c\), where \(\beta > 1\) measures time priority and c is the information cost, and that the profit of followers is simply q. Then one can have \(q > p\) at equilibrium provided \((\beta - 1) p > c\).) Concretely, if agents compete not only for accuracy, but also for indegrees (i.e. for audience), then we might establish an efficient “division of labour”. The lower accuracy of the informed would be compensated by their large indegrees, due to the taste for originality that followers develop.
Thus, neither the taste for originality nor the survival of informed agents can be taken for granted at this point. We need to investigate these issues in a dynamical setting, i.e. by making both z and \(\alpha\) endogenous. The next section introduces an evolutionary framework precisely to address this issue.
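The time-priority condition quoted in this paragraph follows in one line, assuming that equilibrium means that both strategies earn the same profit (an indifference condition that the text does not state explicitly):

$$\begin{aligned} \beta p - c &= q \quad \text{(equal profits at equilibrium)}, \\ q > p \;&\Longleftrightarrow\; \beta p - c > p \;\Longleftrightarrow\; (\beta - 1)\, p > c. \end{aligned}$$

In words, followers can be more accurate than informed agents at equilibrium only if the gain from time priority on the informed accuracy, \((\beta - 1)p\), exceeds the information cost c.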
An evolutionary framework
At this point, analytical computations of an evolutionary equilibrium appear out of reach. We thus choose a dynamical mutation-selection algorithm, which allows us to find the equilibrium numerically and provides a possible description of how the psychological traits we study evolved biologically. To do so, we need two ingredients: genes (the endogenous part of the model) and a fitness formula (maximized by evolution).
The genotype
We assume that agents have two genes:

(i)
The strategy gene, which has two alleles: informed (I) and follower (F)

(ii)
The originality gene, also with two alleles: \(\alpha = 0\) or \(\alpha = \alpha^\star > 0\).
Note that henceforth the taste for originality is no longer continuous, but instead binary (\(\alpha = 0\) or \(\alpha^\star\)), which helps for interpreting the results. Strictly speaking, what is endogenous here is not the value of \(\alpha\) but rather the fraction of agents having \(\alpha = \alpha^\star > 0\).
Three phenotypes can then be distinguished:

(i)
Followers preferring originality, hereafter called “dandies”. (We chose this word to signify that these agents also imitate, but do so in a way that makes them deviant.)

(ii)
Followers without this taste, or “conformists”

(iii)
Informed (for them, the second gene makes no difference, as they do not use their social network)
The natural selection dynamics is implemented in a Wright-Fisher fashion^{41}. At each time step, a uniformly drawn individual is killed, and replaced by a clone of some agent chosen with a probability proportional to fitness, defined below. At each time step, each gene also has a small probability \(\sigma\) to mutate to the alternative allele (typically, \(\sigma = 10^{-8}\) in the simulations).
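One time step of this death, replacement and mutation dynamics can be sketched as follows. Several details are assumptions made for illustration only: genotypes are stored as tuples of booleans (one entry per binary gene), negative fitness values are clipped to zero before sampling, the parent may coincide with the killed individual, and mutation is applied independently to every gene of every agent. The function name is hypothetical.

```python
import random

def selection_step(genotypes, fitness, sigma, rng):
    """One death/replacement/mutation step, modifying `genotypes` in place.

    genotypes: list of tuples of booleans, one boolean per binary gene.
    fitness:   list of floats, one per agent.
    Returns (index killed, index of the cloned parent)."""
    n = len(genotypes)
    dead = rng.randrange(n)                       # uniformly drawn victim
    weights = [max(f, 0.0) for f in fitness]      # assumption: clip at zero
    if sum(weights) > 0:
        parent = rng.choices(range(n), weights=weights, k=1)[0]
    else:
        parent = rng.randrange(n)                 # fallback: uniform choice
    genotypes[dead] = genotypes[parent]           # clone replaces the victim
    for idx in range(n):
        # Each gene flips to the alternative allele with probability sigma.
        genotypes[idx] = tuple(
            (not g) if rng.random() < sigma else g for g in genotypes[idx]
        )
    return dead, parent
```

In a full simulation, this step would be interleaved with the forecasting and network updates that determine the fitness values.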
Costs and benefits
In addition to the obvious aim of agents, which is to make the right forecast as often as possible, we add two further sources of costs and benefits (otherwise, the informed inevitably go extinct and the model becomes trivial). First, information seeking has a cost. Second, we assume that being listened to has benefits. It can represent an analyst fee, university wage or simply the prestige or the power reaped from being influential.
Formally, we introduce the fitness of an individual \(F_i(t)\), which is a weighted sum of three ingredients:

(i)
A measure of an agent’s forecast performance, \(S_i(t)\), recorded with an exponential decay memory kernel (i.e. \(S_i(t) = (1-\gamma) S_i(t-1) + R_i(t)\), with \(\gamma\) the memory decay rate and \(R_i(t)=1\) if i was right at time t and 0 otherwise).

(ii)
The cost of information (\(c \geqslant 0\)), incurred by the informed

(iii)
A measure of an agent’s audience/prestige, proxied by its indegree \(k_i := \#(j \,/\, i \in \Omega_j)\), weighted by a coefficient \(\omega\) quantifying how rewarding it is to be listened to, relative to being right.
The fitness formula, in its simplest linear form, writes:
$$\begin{aligned} F_i(t) = S_i(t) - c \cdot \mathbb{1}_{i \in I} + \omega \cdot k_i. \end{aligned}$$
(3)
The simpler case where fitness depends only on accuracy corresponds to \(c=0\) and \(\omega=0\).
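Eq. (3) translates directly into code. The function and argument names below are illustrative.

```python
def fitness(score, is_informed, indegree, c, omega):
    """Eq. (3): F_i = S_i - c * 1[i is informed] + omega * k_i."""
    cost = c if is_informed else 0.0
    return score - cost + omega * indegree
```

Setting `c=0` and `omega=0` recovers the accuracy-only case, in which the fitness reduces to the score \(S_i(t)\).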
A mutually beneficial division of labour
Figure 5a shows the average accuracy attained by followers as a function of \(\omega\). Quite remarkably, we find that q increases rapidly with \(\omega\) for small \(\omega\), reaches a maximum and then decreases. For intermediate values of \(\omega\) the system appears to be bistable, with two equilibria: \(q_1 \approx 1/2\) and \(q_2 > 1/2\), the latter becoming gradually unreachable as \(\omega\) increases.
In order to explain this result, we plot the frequencies of types in the population (Fig. 5b). We see that the higher the \(\omega\), the rarer “dandies” are, whereas the effect on the fraction of informed agents is bell-shaped. The first observation comes from the phenomenon discussed in Sect. 2.2 and illustrated in Fig. 3a. Like the informed (but to a lesser extent), dandies decouple their opinion from the rest of the population. Their scores in the network are therefore more dispersed. Thus, exactly as for the informed, they tend to have lower indegrees than conformists. Hence, the higher the \(\omega\), the more natural selection will disfavour the dandy allele.
The bell-shaped effect on informed agents then follows naturally. We suggested in Sect. 2.4 that the survival of informed agents was contingent on two features: the reward brought by indegrees (that compensates their lower accuracy) and the presence of dandies (who constitute the lion’s share of the audience of informed agents). Here, these conditions translate into an \(\omega\) value high enough for agents to have an incentive to be followers, but not too high: otherwise, for the reason stated in the previous paragraph, dandies go extinct, preventing informed agents from having an audience.
To sum up, we find three distinct regimes:

(i)
Small \(\omega\): informed go extinct as they have no reason not to; they are less accurate than dandies, and accuracy is the main asset. Their extinction causes q to fall to 1/2, as no one in the network has any information any longer.

(ii)
High \(\omega\): dandies cannot spread, as they lack indegrees (which is here crucial). Their absence causes informed agents to go extinct, as the high reward for indegrees does not favour them anymore.

(iii)
Intermediate \(\omega\) (from 1 to 5 in Fig. 5a): the three types can coexist, with increased accuracy at the global level (\(q > p\)).
The last regime is especially interesting and is the one we are looking for. We can describe it as a mutually beneficial “division of labour”: informed agents do the hard work and collect reliable information, dandies listen to them to form more accurate opinions (meta-analysis), whereas conformists listen to a mix of conformists and dandies (so dandies achieve an indirect transmission of information between informed and conformists, otherwise impossible). The taste for originality is now stable, as the lower indegrees of dandies are offset by their higher accuracy. The presence of an informed minority is also stable, as their lower accuracy is offset by their high indegrees.
Opinion diversity
One can also analyse these results in terms of opinion diversity. Such diversity can be thought of as the average size of the minority group. One finds (not shown) that it follows very closely the prevalence of dandies. Thus, allowing for originality somehow solves the paradox we started from: the wisdom of crowds can work while maintaining opinion diversity.
In the labour division equilibrium, the population is neither a “smear of ideas”^{8} nor homogeneous. Instead, there is a clear majority, but also a substantial residual heterogeneity, which is endogenous and evolutionarily stable.
So, how can we position the agents in terms of conformism? Similarly to Fig. 1, we can use our model (with \(\omega = 1\), so that we fall in the labour division equilibrium) to plot the probability for a random individual to acquire an opinion by imitation, as a function of the opinion’s prevalence in the population (see Fig. 1). We see that individuals are generally “conformists” in the sense of Boyd & Richerson, unless the population is very close to full consensus, in which case the residual heterogeneity makes the curve saturate to a value \(< 1\). Thus, the population is bound to stabilize in one of the two stable fixed points (around 0.1 and 0.9), which means some opinion diversity is stable. Interestingly, Fig. 1 is very close to the type of curve postulated by Schelling or Granovetter^{42,43} in their models of social imitation, which posit some inherent heterogeneities (for a review, see^{44}).
Intuitively, agents are only cautiously influenced by the majority: they aggregate information while remaining wary of herding. Also, we can note that q (the followers’ accuracy, not shown) is a strictly increasing function of opinion diversity, in line with Page’s “Diversity Prediction Theorem”^{27}: the amount by which the crowd outpredicts its average member increases as the crowd becomes more diverse. As noted in^{44,45}, such a situation also prevents large opinion swings (or “crashes”) when external conditions slowly evolve.
Note however that although the average value of q is larger than p, we are still in the bistable region of Fig. 2: in some instances, followers self-trap in the wrong belief. In our scenario, the wisdom of crowds indeed exists but can sometimes badly fail. When neighbourhoods are not drawn at random in the population but according to static criteria (location, political opinion, social class, etc.) one should expect the formation of “echo chambers” with some clusters adopting the wrong belief, while others fully benefit from information aggregation.
What about anticonformism?
Is originality per se a credible signal?
The line of arguments developed above still has a loophole. If originality in itself is used by agents to detect the sources of genuine information, then original agents benefit from this signal by getting prestige in return. Thus, a social learner could benefit from being artificially original: he/she would be considered by others as a good source without paying the price of information search—just like in Batesian mimicry, where the palatable species gains protection from predators without paying the cost of being toxic^{46}. Furthermore, originality is simple enough to produce: an agent can choose opinions randomly, or sample around and adopt the minority opinion. Therefore, the above scenario is possibly vulnerable to an invasion by parasitic behaviour mimicking originality, thus scrounging the prestige of genuinely informed agents.
It thus makes sense to include a fourth strategy: anticonformism, which consists in sampling around and endorsing the minority opinion. If such a behaviour spreads in the population, it challenges the very credibility of our scenario, as an original agent is no longer necessarily a source of reliable information. To test this, we include a third binary gene: an agent carrying its positive allele together with the follower allele is an anticonformist. Like the other genes, it evolves by mutation and selection. Now, if we rerun the simulations of Fig. 5a allowing for this new strategy (see orange curve in Fig. 6a), the \(q>p\) phase disappears (in other words, the wisdom of crowds does not work anymore). Originality is at first used by dandies to hedge against herding, but is then exploited by anticonformists, who become parasitic on the prestige of the informed.
The accuracy/prestige sweet spot
At this point, one could conclude that our scenario producing (q>p) is flawed, and that it can only be saved if there exists an authentication method allowing truly informed agents to stand out. The situation is in fact more complex, in an interesting way.
Actually, anticonformism yields originality at the expense of accuracy: if the population is better than random (\(q>1/2\)), then an anticonformist mutant would be worse than random (\(1-q<1/2\)). More mundanely, an anticonformist’s opinion is most certainly original, but is often balderdash. Making wrong forecasts decreases fitness in two ways: directly (Eq. 3) and through prestige, as supplying good information is also key to obtaining prestige (recall the update rule, Eq. 2).
The magnitude of this effect is controlled by the probability p that an informed agent makes the right forecast (which is, unlike q, exogenous). We can think of p as the ease of forecasting the topic in question. For instance, p would be higher in meteorology than in finance, as the predictive power is (nowadays) stronger in the former case. So, it would make sense that the lower the p, the more anticonformism can spread in the population. Put differently, the more unpredictable a subject is, the harder it is to detect con artists. Thus, the virtuous equilibrium described in Sect. 2.5 is reachable when two conditions are met: (i) p is sufficiently high to detect counterfeits; (ii) \(\omega\) takes intermediate values, as in the previous section.
This scenario is indeed confirmed numerically (cf. Fig. 6a): the \(q>p\) phase reappears for higher values of p. As shown in Fig. 6b, anticonformists only spread for low p and intermediate \(\omega\), in which case the division of labour fails.
In terms of opinion diversity (not shown), the phase where anticonformists are present not only produces heterogeneity, but also polarisation: the average size of the minority group is close to 50%, i.e. the population is evenly split, as would indeed be expected for an anticonformist bias (see Fig. 1): if the probability to follow the minority is greater than the size of such minority, then the only stable fixed point corresponds to 50% of the population holding one opinion, and 50% holding the other.
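The fixed-point argument for an anticonformist bias can be checked on a toy mean-field dynamics. The sketch assumes a population made only of anticonformists, each of whom samples m individuals independently and adopts the minority opinion of the sample. The relaxation rate, the function names and the default m are illustrative, and the full model (which mixes several types of agents) is not reproduced here.

```python
from math import comb

def majority_share(x, m):
    """P(opinion A is the majority among m independent draws), m odd,
    when a fraction x of the population holds opinion A."""
    return sum(comb(m, k) * x**k * (1 - x)**(m - k)
               for k in range(m // 2 + 1, m + 1))

def relax_anticonformists(x0, m=11, rate=0.1, steps=500):
    """Iterate x <- x + rate * (P(adopt A) - x) for pure anticonformists."""
    x = x0
    for _ in range(steps):
        # An anticonformist adopts A when A is the minority of its sample.
        adopt_a = 1.0 - majority_share(x, m)
        x += rate * (adopt_a - x)
    return x
```

Starting from either a large majority or a small minority for opinion A, the iteration relaxes to an even 50/50 split.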