First, few data sources cover a fully representative range of the world’s countries, and thus without combining indicators it would be impossible to obtain scores for more than a small sub-sample of nations.
Second, we assume that every indicator has some amount of error, which can arise from several causes. Observational error may exist because of unreliability in the instrument used to record a particular phenomenon: surveys, for example, may be subject to reporting biases or sampling error, while official statistics may have been compiled using different methodologies.
There is also error that is attributable to the use of indicators with low concept validity, that is, when the selected indicator, however reliably gathered, only imperfectly corresponds to the latent variable under consideration. One way to reduce error is to employ greater scrutiny in the selection and consideration of indicators. Yet this presumes a high degree of knowledge on the part of the analyst: it can be difficult to assess the reliability of any given measure in isolation, especially in the absence of familiarity with the method used to generate those values. Validity is easier to determine, though here again we often have to rely on complex assumptions regarding the causal relationship between what we are measuring and what we seek to measure.
For example, it may be open to contention whether civic capacity is best measured by features of the institutional environment (the number of media organizations, freedom of information), features of citizen behavior (engagement in local civic groups, participation in voting, petitions and demonstrations), or some other feature of that society (e.g. the number of international NGOs).
Moreover, with social phenomena we often face a trade-off between reliability, validity, and representativeness: a given indicator, such as the income ratio between different ethnic groups, may be a valid and reliable measure of social exclusion, but available for very few countries; a survey item on attitudes toward other ethnic groups is certainly valid and may be widely available, but subject to survey response bias. There is, in short, rarely a single indicator that adequately measures the concept we are trying to quantify.
Combining multiple indicators is another means of reducing aggregate error. If one assumes that errors are uncorrelated between data sources and that the size of the error is constant across items, then the error of the combined estimate falls progressively as the number of indicators increases. We supplement these estimates with the calculated margin of error for each country, which is based on how many sources there were and the extent to which these sources agreed.
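
The reasoning behind this claim can be illustrated with a minimal sketch. Under the two assumptions stated above (uncorrelated errors of equal size), the standard error of an average of n indicators is the single-indicator error divided by the square root of n. The function names below are hypothetical, and the simple mean and standard-error formula are illustrative assumptions only; they are not a description of the exact aggregation or margin-of-error procedure used for the indices.

```python
import math
import statistics


def theoretical_error(sigma, n):
    """Standard error of the mean of n indicators, each with error sigma.

    Assumes errors are uncorrelated and equally sized across indicators.
    """
    return sigma / math.sqrt(n)


def combine_indicators(values):
    """Return (estimate, standard_error) for one country's indicator values.

    Illustrative assumption: the estimate is a plain average, and the
    standard error reflects both the number of sources (n) and how
    closely they agree (their sample standard deviation).
    """
    n = len(values)
    estimate = statistics.fmean(values)
    if n < 2:
        # Agreement between sources cannot be assessed from a single source.
        return estimate, math.nan
    spread = statistics.stdev(values)
    return estimate, spread / math.sqrt(n)


if __name__ == "__main__":
    # Error shrinks as more indicators are combined.
    for n in (1, 4, 16):
        print(n, theoretical_error(1.0, n))
    # Two sources that disagree by 0.2 on a common scale.
    print(combine_indicators([0.4, 0.6]))
```

In this sketch, quadrupling the number of indicators halves the expected error, and a country covered by few or widely disagreeing sources receives a correspondingly larger standard error.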