Ranking rules. The case of identical ranks

When studying public health and healthcare for scientific and practical purposes, the researcher often has to analyze statistically the relationship between factor and resultant characteristics of a statistical population (a causal relationship), or to determine how parallel changes in several characteristics of this population depend on some third value (their common cause). It is necessary to be able to study the features of this relationship, determine its strength and direction, and evaluate its reliability. Correlation methods are used for this purpose.

  1. Types of manifestation of quantitative relationships between characteristics
    • functional connection
    • correlation connection
  2. Definitions of functional and correlational connection

    A functional connection is a relationship between two characteristics in which each value of one corresponds to a strictly defined value of the other (the area of a circle depends on its radius, etc.). Functional connections are characteristic of physical and mathematical processes.

    A correlation is a relationship in which each specific value of one characteristic corresponds to several values of another, interrelated characteristic (the relationship between a person's height and weight, between body temperature and pulse rate, etc.). Correlation is typical of medical and biological processes.

  3. The practical significance of establishing a correlation connection. Identification of cause-and-effect relationships between factor and resultant characteristics (when assessing physical development, to determine the relationship between working conditions, living conditions and health status, when determining the dependence of the frequency of illnesses on age, length of service, the presence of occupational hazards, etc.)

    Dependence of parallel changes in several characteristics on some third value. For example, under the influence of high temperature in the workshop, changes in blood pressure, blood viscosity, pulse rate, etc. occur.

  4. A value characterizing the direction and strength of the relationship between characteristics. The correlation coefficient expresses in a single number the direction and strength of the relationship between characteristics (phenomena); it ranges from 0 to ±1.
  5. Methods of presenting correlations
    • graph (scatter plot)
    • correlation coefficient
  6. Direction of correlation
    • direct
    • inverse
  7. Strength of correlation
    • strong: ±0.7 to ±1
    • average: ±0.3 to ±0.699
    • weak: 0 to ±0.299
  8. Methods for determining the correlation coefficient and formulas
    • method of squares (Pearson method)
    • rank method (Spearman method)
  9. Methodological requirements for using the correlation coefficient
    • measuring the relationship is only possible in qualitatively homogeneous populations (for example, measuring the relationship between height and weight in populations that are homogeneous by gender and age)
    • calculation can be made using absolute or derived values
    • to calculate the correlation coefficient, ungrouped variation series are used (this requirement applies only when calculating the correlation coefficient using the method of squares)
    • number of observations at least 30
  10. Recommendations for using the rank correlation method (Spearman's method)
    • when there is no need to accurately establish the strength of the connection, but approximate data is sufficient
    • when characteristics are represented not only by quantitative, but also by attributive values
    • when the distribution series of a characteristic have open options (for example, work experience up to 1 year, etc.)
  11. Recommendations for using the method of squares (Pearson's method)
    • when an accurate determination of the strength of connection between characteristics is required
    • when the characteristics are expressed only quantitatively
  12. Methodology and procedure for calculating the correlation coefficient

    1) Method of squares

    2) Rank method

  13. Scheme for assessing the correlation relationship using the correlation coefficient
  14. Calculation of correlation coefficient error
  15. Estimation of the reliability of the correlation coefficient obtained by the rank correlation method and the method of squares

    Method 1
    Reliability is determined by the formula:

    $$m_r = \sqrt{\frac{1 - r^2}{n - 2}}, \qquad t = \frac{r}{m_r}$$

    The t criterion is evaluated using a table of t values, taking into account the number of degrees of freedom (n - 2), where n is the number of paired options. The calculated t must be equal to or greater than the tabulated value corresponding to a probability p ≥ 99%.

    Method 2
    Reliability is assessed using a special table of standard correlation coefficients. A correlation coefficient is considered reliable when, for the given number of degrees of freedom (n - 2), it is equal to or greater than the tabulated value corresponding to a probability of error-free prediction p ≥ 95%.

Example of using the method of squares

Exercise: calculate the correlation coefficient, determine the direction and strength of the relationship between the amount of calcium in water and water hardness, if the following data are known (Table 1). Assess the reliability of the relationship. Draw a conclusion.

Table 1

Justification for the choice of method. The method of squares (Pearson) was chosen to solve the problem because each of the characteristics (water hardness and amount of calcium) is expressed numerically and there are no open options.

Solution.
The sequence of calculations is described in the text, the results are presented in the table. Having constructed series of paired comparable characteristics, denote them by x (water hardness in degrees) and by y (amount of calcium in water in mg/l).

Water hardness    Amount of calcium
(in degrees), x   in water (mg/l), y    d_x    d_y     d_x × d_y    d_x²       d_y²
4                 28                    -16    -114    1824         256        12996
8                 56                    -12    -86     1032         144        7396
11                76                    -9     -66     594          81         4356
27                190                   +7     +48     336          49         2304
34                240                   +14    +98     1372         196        9604
36                262                   +16    +120    1920         256        14400
Σx = 120          Σy = 852                             Σ = 7078     Σ = 982    Σ = 51056

M_x = Σx/n = 120/6 = 20; M_y = Σy/n = 852/6 = 142
  1. Determine the mean values M_x of the “x” series and M_y of the “y” series using the formulas:
    M_x = Σx/n (column 1) and
    M_y = Σy/n (column 2)
  2. Find the deviations (d_x and d_y) of each option from the calculated mean of the “x” series and of the “y” series:
    d_x = x - M_x (column 3) and d_y = y - M_y (column 4).
  3. Find the products of the deviations d_x × d_y and sum them: Σ(d_x × d_y) = 7078 (column 5).
  4. Square each deviation d_x and d_y and sum the squares over the “x” series and the “y” series: Σd_x² = 982 (column 6) and Σd_y² = 51056 (column 7).
  5. Determine the product Σd_x² × Σd_y² and take the square root of this product.
  6. Substitute the resulting values Σ(d_x × d_y) and √(Σd_x² × Σd_y²) into the formula for calculating the correlation coefficient:

    $$r_{xy} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} = \frac{7078}{\sqrt{982 \cdot 51056}}$$

  7. Determine the reliability of the correlation coefficient:
    1st method. Find the error of the correlation coefficient (m_r) and the t criterion using the formulas:

    $$m_{r_{xy}} = \sqrt{\frac{1 - r_{xy}^2}{n - 2}}, \qquad t = \frac{r_{xy}}{m_{r_{xy}}}$$

    The criterion t = 14.1, which corresponds to a probability of an error-free forecast p > 99.9%.

    2nd method. The reliability of the correlation coefficient is assessed using the table “Standard correlation coefficients” (see Appendix 1). With the number of degrees of freedom (n - 2) = 6 - 2 = 4, our calculated correlation coefficient r_xy = +0.99 is greater than the tabulated value (r_table = +0.917 at p = 99%).

    Conclusion. The more calcium there is in water, the harder the water (the relationship is direct, strong and reliable: r_xy = +0.99, p > 99.9%).
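
    The calculation above can be retraced in a few lines of Python. This is a minimal sketch rather than part of the original solution: it starts from the deviations d_x and d_y in columns 3 and 4 of the table, the function name `t_criterion` is hypothetical, and the error formula m_r = √((1 - r²)/(n - 2)) is assumed to be the small-sample formula the text refers to.

```python
import math

# Deviations from the means, taken from columns 3 and 4 of the table.
d_x = [-16, -12, -9, 7, 14, 16]
d_y = [-114, -86, -66, 48, 98, 120]

sum_products = sum(a * b for a, b in zip(d_x, d_y))   # column 5 total
sum_dx2 = sum(a * a for a in d_x)                     # column 6 total
sum_dy2 = sum(b * b for b in d_y)                     # column 7 total

# Method of squares (Pearson): r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
r_xy = sum_products / math.sqrt(sum_dx2 * sum_dy2)


def t_criterion(r, n):
    """Return (m_r, t) for a correlation coefficient r from n pairs.

    Assumes the small-sample error formula m_r = sqrt((1 - r^2) / (n - 2)).
    """
    m_r = math.sqrt((1 - r * r) / (n - 2))
    return m_r, r / m_r


# The text works with the value r = 0.99 and n = 6 pairs.
m_r, t = t_criterion(0.99, 6)
print(sum_products, sum_dx2, sum_dy2)
print(round(r_xy, 4), round(m_r, 3), round(t, 1))
```

    With the unrounded error the sketch gives t of about 14.0; rounding m_r to 0.07 before dividing reproduces the 14.1 quoted in the text. The unrounded coefficient is slightly closer to +1 than the reported +0.99; either value is well above the tabulated 0.917.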

    Example of using the rank method

    Exercise: using the rank method, establish the direction and strength of the relationship between years of work experience and the frequency of injuries if the following data are obtained:

    Justification for choosing the method. Only the rank correlation method can be used to solve this problem, because the first series, “work experience in years”, has open options (work experience up to 1 year and 7 or more years), which rules out the more accurate method of squares for establishing the relationship between the compared characteristics.

    Solution. The sequence of calculations is described in the text; the results are presented in Table 2.

    Table 2

    Work experience   Number of    Ordinal numbers (ranks)   Rank difference   Squared difference
    in years          injuries     X          Y              d = x - y         of ranks d²
    Up to 1 year      24           1          5              -4                16
    1-2               16           2          4              -2                4
    3-4               12           3          2.5            +0.5              0.25
    5-6               12           4          2.5            +1.5              2.25
    7 or more         6            5          1              +4                16
                                                                               Σd² = 38.5
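
    Table 2 stops at Σd² = 38.5; the remaining step is to substitute this sum into the Spearman formula ρ = 1 - 6Σd² / (n(n² - 1)). A minimal Python sketch of that step, assuming this standard formula (the function name `spearman_from_ranks` is illustrative, not taken from the source):

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman coefficient from two lists of ranks of equal length."""
    n = len(rank_x)
    # Sum of squared rank differences (last column of Table 2).
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, rho


# Ranks from Table 2: work experience (X) and number of injuries (Y).
rank_x = [1, 2, 3, 4, 5]
rank_y = [5, 4, 2.5, 2.5, 1]

sum_d2, rho = spearman_from_ranks(rank_x, rank_y)
print(sum_d2, round(rho, 3))
```

    For these ranks the coefficient is negative and close to -1, i.e. the relationship is inverse and strong: the longer the work experience, the fewer the injuries.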

    Standard correlation coefficients that are considered reliable (according to L.S. Kaminsky)

    Number of degrees     Probability level p (%)
    of freedom (n - 2)    95%      98%      99%
    1                     0.997    0.999    0.999
    2                     0.950    0.980    0.990
    3                     0.878    0.934    0.959
    4                     0.811    0.882    0.917
    5                     0.754    0.833    0.874
    6                     0.707    0.789    0.834
    7                     0.666    0.750    0.798
    8                     0.632    0.716    0.765
    9                     0.602    0.685    0.735
    10                    0.576    0.658    0.708
    11                    0.553    0.634    0.684
    12                    0.532    0.612    0.661
    13                    0.514    0.592    0.641
    14                    0.497    0.574    0.623
    15                    0.482    0.558    0.606
    16                    0.468    0.542    0.590
    17                    0.456    0.528    0.575
    18                    0.444    0.516    0.561
    19                    0.433    0.503    0.549
    20                    0.423    0.492    0.537
    25                    0.381    0.445    0.487
    30                    0.349    0.409    0.449


Rank correlation coefficients are less accurate but simpler to calculate nonparametric measures of the closeness of the relationship between two correlated characteristics. They include the Spearman (ρ) and Kendall (τ) coefficients, which are based on the correlation not of the values of the characteristics themselves but of their ranks: the serial numbers assigned to each individual value of x and y (separately) in a ranked series. Both characteristics must be ranked (numbered) in the same order: either from lower to higher values or vice versa. If a value of x (or y) occurs several times, each occurrence is assigned a rank equal to the sum of the ranks (places in the series) that fall to these values divided by the number of equal values. The ranks of x and y are denoted by R_x and R_y (sometimes N_x and N_y).

The relationship between changes in x and y is judged by comparing how the ranks of the two characteristics behave in parallel. If the ranks coincide for every pair of x and y, the relationship is the closest possible direct one. If the ranks are completely opposite, i.e. in one series they increase from 1 to n and in the other they decrease from n to 1, the relationship is the strongest possible inverse one. The Spearman and Kendall approaches to assessing the closeness of a relationship differ somewhat.

To calculate the Spearman coefficient, the values of x and y are numbered (separately) in ascending order from 1 to n, i.e. each is assigned a rank (R_x and R_y), its serial number in the ranked series. Then, for each pair of ranks, their difference is found (denoted d = R_x - R_y), and the squares of these differences are summed:

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}, \qquad (147)$$

where d is the difference between the ranks of x and y;

n is the number of observed pairs of values of x and y.

The coefficient ρ can take values from 0 to ±1. It should be borne in mind that, since the Spearman coefficient takes into account only the difference in ranks and not in the values of x and y themselves, it is less accurate than the linear coefficient. Therefore, its extreme values (1 or 0) cannot be unconditionally regarded as evidence of a functional relationship or of a complete absence of dependence between x and y. In all other cases, i.e. when ρ does not take extreme values, it is quite close to r.

Strictly speaking, formula (147) is applicable only when the individual values of x (and y), and therefore their ranks, are not repeated. For repeated (tied) ranks there is another, more complex formula, adjusted for the number of repeated ranks. However, experience shows that results calculated with the formula adjusted for tied ranks differ little from those obtained with the formula for non-repeating ranks. Therefore, in practice formula (147) is used successfully for both non-repeating and repeating ranks.
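
To see how much tied ranks matter, the simple formula can be compared with the Pearson coefficient computed on the ranks themselves, which is one common way of allowing for ties (an assumption here, since the adjusted formula is not reproduced in the text). A minimal Python sketch with illustrative ranks containing one tied pair; both function names are hypothetical:

```python
import math


def spearman_simple(rank_x, rank_y):
    """Formula (147): rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson_on_ranks(rank_x, rank_y):
    """Pearson correlation of the ranks; equals (147) when no ranks are tied."""
    n = len(rank_x)
    mean_x = sum(rank_x) / n
    mean_y = sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    return cov / math.sqrt(var_x * var_y)


rank_x = [1, 2, 3, 4, 5]
rank_y = [5, 4, 2.5, 2.5, 1]   # two objects share places 2 and 3

rho_simple = spearman_simple(rank_x, rank_y)
rho_adjusted = pearson_on_ranks(rank_x, rank_y)
print(round(rho_simple, 3), round(rho_adjusted, 3))
```

For this example the two values differ by about 0.05 and lead to the same conclusion about the direction and strength of the relationship.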

The Kendall rank correlation coefficient τ is constructed somewhat differently, although its calculation also begins with ranking the values of x and y. The ranks of x (R_x) are arranged strictly in ascending order, and the corresponding value of R_y is written alongside each R_x. Since the R_x are in strictly ascending order, the task is to determine how well the sequence of R_y agrees with the “correct” order of R_x. For each R_y in turn, count the number of ranks following it that exceed it and the number that are smaller. The former (“correct” order) are counted as points with a “+” sign, and their sum is denoted by P. The latter (“incorrect” order) are counted as points with a “-” sign, and their sum is denoted by Q. Obviously, P reaches its maximum if the ranks of y (R_y) coincide with the ranks of x (R_x) and each series is the sequence of natural numbers from 1 to n. Then, after the first pair R_x = 1 and R_y = 1, the number of larger ranks is (n - 1); after the second pair, R_x = 2 and R_y = 2, it is (n - 2), and so on. Thus, if the ranks of x and y coincide and the number of rank pairs is n, then

$$P_{\max} = (n - 1) + (n - 2) + \dots + 1 = \frac{n(n - 1)}{2}.$$

If the sequence of ranks of y runs in the opposite direction to the sequence of ranks of x, then Q has the same maximum value in absolute terms:

$$|Q_{\max}| = \frac{n(n - 1)}{2}.$$

If the ranks of y do not coincide with the ranks of x, all positive and negative points are summed (S = P + Q); the ratio of this sum S to the maximum value of one of the terms is the Kendall rank correlation coefficient τ, i.e.:

$$\tau = \frac{S}{\frac{1}{2}n(n - 1)} = \frac{2S}{n(n - 1)}. \qquad (148)$$

Formula (148) is used when the individual values of each characteristic (both x and y) are not repeated and, therefore, their ranks are not tied. If there are several identical values of x (or y), i.e. the ranks are repeated (tied), the Kendall rank correlation coefficient is determined by the formula:

$$\tau = \frac{S}{\sqrt{\left[\frac{1}{2}n(n - 1) - V_x\right]\left[\frac{1}{2}n(n - 1) - V_y\right]}}, \qquad (149)$$

where S is the actual total score, with +1 counted for each pair of ranks in the same order and -1 for each pair of ranks in the opposite order;

$V_x$ and $V_y$, each of the form $V = \frac{1}{2}\sum t(t - 1)$, are the numbers of points that correct (reduce) the maximum score because of repetitions (ties) of t ranks in each series.

Note that cases of identical repeating ranks (in any row) are scored 0, i.e. they are not taken into account in the calculation either with the “+” sign or with the “–” sign.
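
The point-counting procedure for the case without ties, formula (148), can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `kendall_tau` and the five pairs of ranks are made up for the example, and tied ranks are not handled.

```python
def kendall_tau(rank_x, rank_y):
    """Kendall tau by counting points, formula (148); assumes no tied ranks."""
    # Arrange the pairs so that the ranks of x are in ascending order.
    ordered_y = [ry for _, ry in sorted(zip(rank_x, rank_y))]
    n = len(ordered_y)
    p = 0  # points with a "+" sign
    q = 0  # points with a "-" sign (kept as a negative number)
    for i in range(n):
        for j in range(i + 1, n):
            if ordered_y[j] > ordered_y[i]:
                p += 1      # a following rank exceeds the current one
            elif ordered_y[j] < ordered_y[i]:
                q -= 1      # a following rank is smaller than the current one
    s = p + q
    return p, q, s, 2 * s / (n * (n - 1))


p, q, s, tau = kendall_tau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(p, q, s, tau)
```

Here two of the ten possible pairs are in the “incorrect” order, so P = 8, Q = -2 and S = 6. Identical rankings give τ = 1 and fully reversed rankings give τ = -1.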

The advantages of the Spearman and Kendall rank correlation coefficients are that they are easy to calculate and can be used to study and measure the relationship not only between quantitative characteristics but also between qualitative (descriptive) ones that have been ranked in some way. In addition, using rank correlation coefficients does not require knowing the form of the relationship between the phenomena being studied.

If the number of ranked characteristics (factors) is more than two, the closeness of the relationship between them can be measured with the concordance coefficient (multiple rank correlation coefficient) proposed by M. Kendall and B. Smith:

$$W = \frac{12S}{m^2(n^3 - n)}, \qquad (150)$$

where S is the sum of squared deviations of the rank sums over the m characteristics from their mean value;

m is the number of ranked characteristics;

n is the number of ranked units (number of observations).

Formula (150) is used when the ranks for each characteristic are not repeated. If there are tied ranks, the concordance coefficient is calculated taking into account the number of such repeated (tied) ranks for each factor:

$$W = \frac{12S}{m^2(n^3 - n) - m\sum (t^3 - t)}, \qquad (151)$$

where t is the number of identical ranks for each characteristic.

The concordance coefficient W can take values from 0 to 1. However, its significance must be checked using the χ² criterion: by formula (152) in the absence of tied ranks and by formula (153) if they are present:

$$\chi^2 = \frac{12S}{mn(n + 1)}, \quad (152) \qquad \chi^2 = \frac{12S}{mn(n + 1) - \frac{1}{n - 1}\sum (t^3 - t)}. \quad (153)$$

The actual value of χ² is compared with the tabulated value corresponding to the accepted significance level α (0.05 or 0.01) and the number of degrees of freedom v = n - 1. If χ²_fact > χ²_table, then W is significant.

The concordance coefficient is used especially often in expert assessment, for example to determine the degree of agreement between experts' opinions about the importance of an indicator being assessed, or to rank individual units by some criterion. In these cases, m in formula (150) denotes the number of experts and n the number of ranked units (or characteristics).
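
A minimal Python sketch of the concordance coefficient with the correction for tied ranks follows. It assumes the standard forms of formulas (150) to (153), with the tie correction Σ(t³ - t) summed over every group of equal ranks; the function name `concordance` and the three rankings of five units are invented for illustration.

```python
def concordance(rank_rows):
    """Kendall's W for m rankings (rows) of n units (columns).

    Returns (S, W, chi2). Tied ranks within a row are handled with the
    correction sum(t^3 - t); without ties the correction is zero.
    """
    m = len(rank_rows)
    n = len(rank_rows[0])
    # Sum of the ranks received by each unit across the m rankings.
    totals = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_total = sum(totals) / n
    s = sum((total - mean_total) ** 2 for total in totals)

    # Tie correction: every group of t equal ranks in a row adds t^3 - t.
    tie_sum = 0
    for row in rank_rows:
        for value in set(row):
            t = row.count(value)
            tie_sum += t ** 3 - t

    w = 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_sum)
    chi2 = 12 * s / (m * n * (n + 1) - tie_sum / (n - 1))
    return s, w, chi2


ranks = [
    [1, 2.5, 2.5, 4, 5],   # first characteristic, one tied pair
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
]
s, w, chi2 = concordance(ranks)
print(s, round(w, 3), round(chi2, 2))
```

The χ² value returned equals m(n - 1)W, so it can be compared directly with the tabulated value for n - 1 = 4 degrees of freedom (9.49 at α = 0.05).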

Approximates the rank statistic T quite well, and the difference is negligible when . If the hypothesis H_0 is true, according to which the components X_1, ..., X_n of the random vector X are independent random variables, the projection of the rank statistic is determined by the formula

where (see).

There is an intrinsic connection between the rank statistics ρ and τ. If the hypothesis H_0 is true, the projection $\hat{\tau}$ of the Kendall correlation coefficient τ onto the family of linear rank statistics coincides, up to a constant factor, with the Spearman rank correlation coefficient ρ, namely:

$$\hat{\tau} = \frac{2}{3}\left(1 + \frac{1}{n}\right)\rho.$$

From this equality it follows that the correlation coefficient between ρ and τ is equal to

$$\operatorname{corr}(\rho, \tau) = \frac{2(n + 1)}{\sqrt{2n(2n + 5)}},$$

i.e. for large n the rank statistics ρ and τ are asymptotically equivalent.

Lit.: Hájek J., Šidák Z., Theory of Rank Tests, translated from English, Moscow, 1971; Kendall M. G., Rank Correlation Methods, 4th ed., London, 1970. M. S. Nikulin.


Mathematical Encyclopedia. Moscow: Soviet Encyclopedia. I. M. Vinogradov. 1977-1985.


An ordinal scale makes it possible to assign ranks to objects according to any criterion. Metric values are thereby converted into rank values, while differences in the degree to which a property is expressed are still recorded. Two rules must be followed during ranking.

Ranking order rule. It is necessary to decide which object receives the first rank: the one in which the quality is expressed most strongly or the one in which it is expressed least. Most often this makes no difference and does not affect the final result. Traditionally, the first rank is assigned to the object in which the quality is expressed most strongly (a higher value means a lower rank); for example, the champion is awarded first place, not last. Even here, adopting the reverse order would not change the results, so each researcher is free to choose the ranking order. E.V. Sidorenko, for example, recommends assigning a lower rank to a smaller value. In some cases this is more convenient, though less familiar.

For example, suppose the data of an unordered sample need to be ranked: (2, 7, 6, 8, 11, 15, 9). After ordering the sample, we rank it.

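Metric data (higher value, lower rank):
Value   2   6   7   8   9   11   15
Rank    7   6   5   4   3   2    1

Alternative (lower value, lower rank):
Value   2   6   7   8   9   11   15
Rank    1   2   3   4   5   6    7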

The following should be said separately. There is a group of rarely used nonparametric tests (Wilcoxon T-test, Mann-Whitney U-test, Rosenbaum Q-test, etc.), when working with which you should always assign a lower rank to a smaller value.

Rule of tied ranks. Objects in which the property is expressed equally are assigned the same rank. This rank is the mean of the ranks they would have received had they not been equal. For example, suppose a sample containing several identical metric values needs to be ranked: (4, 5, 9, 2, 6, 5, 9, 7, 5, 12). After ordering the sample, each value first receives a preliminary rank (its position in the ordered series), and the final rank of tied values is the arithmetic mean of their preliminary ranks.

Table rows: metric data; preliminary ranking; final ranking.
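
Both ranking rules can be sketched in a few lines of Python. This is an illustration only: the function name `average_ranks` and its `descending` flag are hypothetical, and the ascending convention (lower value, lower rank) is assumed for the sample with ties, since the text does not say which order was used.

```python
def average_ranks(values, descending=False):
    """Rank values, giving tied values the mean of the ranks they occupy.

    descending=False: lower value gets lower rank.
    descending=True:  higher value gets lower rank.
    """
    order = sorted(values, reverse=descending)
    # Preliminary ranks are the positions 1..n in the ordered sample.
    positions = {}
    for position, value in enumerate(order, start=1):
        positions.setdefault(value, []).append(position)
    # The final rank of a value is the mean of its preliminary ranks.
    final = {value: sum(p) / len(p) for value, p in positions.items()}
    return [final[value] for value in values]


sample = [4, 5, 9, 2, 6, 5, 9, 7, 5, 12]
ranks = average_ranks(sample)
print(ranks)
print(sum(ranks))
```

The three values equal to 5 occupy places 3, 4 and 5 and each receive rank 4; the two values equal to 9 occupy places 8 and 9 and each receive rank 8.5. The sum of the final ranks equals 1 + 2 + ... + 10 = 55, the same as for a sample without ties, which is a convenient check on the ranking.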

Assignments for independent work.

    Rank the sample according to the rule “higher value - lower rank”: (111, 104, 115, 107, 95, 104, 104).

    Rank the sample according to the rule “lower value - lower rank”: (20, 25, 8, 7, 20, 14, 27).

    Combine the two previous samples and rank them according to the rule “higher value - lower rank”.

    Which characteristics in Table I are nominative and which are metric?

    Convert the awareness indicators from Table I to a rank scale. Identify the levels of expression of the indicators by converting them to a nominative scale.

      Table I. Data for processing

      Columns: students; university profile; awareness; hidden figures; missed; arithmetic; understanding; exception; images; analogies; number series; inferences; geometric addition; learning words; average IQ; extroversion-introversion; neuroticism; average mark.

      University profile: 0 - student's choice of a humanities profile; 1 - student's choice of a mathematical or natural science profile.

1 A brief history of the emergence of correlation analysis

The use of mathematical and statistical techniques to study correlation dependencies dates back to the 1870s. Many historians and statisticians, however, trace the history of correlation back to the 1840s, when the French mathematician A. Bravais proposed a formula for the distribution of two random variables that satisfy the requirements of the normal distribution law.

However, the English mathematician and statistician K. Pearson, who created the theory of correlation in the late nineteenth and early twentieth centuries, is considered its true founder. In this theory, correlation acts as a form of dialectical connection in which many different causes operate, both necessary and random, both common to the two correlated quantities and particular, affecting only one of them. Moreover, not all regular connections are causal.

The theory continued to be developed through other studies after its main provisions had already been established. At the same time, in the study of correlations practice diverged sharply from theory, placing researchers in conditions that did not satisfy its requirements.

The methods for studying correlations and regressions were originally built on data describing quantitatively expressed characteristics. From the very first steps, therefore, researchers encountered the problem of correlating qualitative characteristics, for example the relationship between eye color in fathers and sons. The general principle underlying the construction of correlation measures for qualitative characteristics was that two qualitative characteristics can be considered unrelated if characteristic A occurs as often in the presence of characteristic B as in the presence of not-B. Building on this principle, various measures were proposed, such as Pearson's mean square contingency coefficient and Chuprov's mutual contingency coefficient.

The study of the correlation of qualitative characteristics gave rise, within the general theory of correlation, to the so-called theory of ranks and the theory of rank correlation based on it. The English mathematician and statistician M. Kendall, the author of a monograph on rank correlation, pointed out that the theory of ranks first arose as an offshoot of the theory of random processes. At the initial stage, ranks were most often seen simply as a convenient device that made it possible to do without measuring the absolute values of variables and thereby save time and effort. Later, rank statistics gained recognition on their own merits. Kendall constructed a measure that is also applicable to studying partial correlation between ranks. The modern theory of rank correlation is hard to imagine without M. Kendall's comprehensive studies.

Thus, by the beginning of the twentieth century, mathematical and statistical methods for measuring correlations and regressions had generally developed into a fairly coherent integrated system, including methods of nonparametric statistics and nonparametric rank methods.

2 Nonparametric rank methods

Nonparametric rank methods are a rapidly developing area of mathematical statistics. The history of modern nonparametric rank-based methods is quite short, only about 40 years. Rank methods have emerged as a special area of nonparametric statistics not only because of the nature of the source material, but also because of the ideas behind its further use. Today these methods are used to solve many problems in the analysis of economic, statistical, engineering, natural science, sociological, and medical data.

Ranking is a procedure for ordering the objects of study on the basis of preference. A rank is the serial number of a value of a characteristic when the values are arranged in ascending or descending order. As statistical studies conducted over the past 10-15 years have shown, ranking methods largely avoid a number of the drawbacks that arise when working with small samples whose distribution is unknown. The transition from the observations themselves to their ranks is known to involve some loss of information, but these losses are not too great. Unfortunately, specialized literature on this issue is still lacking.

Recently, expert assessments have come to be widely used in forecasting and in solving a number of other problems. In this area, rank correlation methods are perhaps the only way to generalize expert assessments.

Rank theory first emerged as an offshoot of the theory of random processes. At the initial stage, ranks were most often seen simply as a convenient device that made it possible to do without measuring the absolute values of variables and thereby save time and effort. The use of ranks made it possible to avoid the difficulties of constructing an objective scale of absolute values. Later, rank statistics gained recognition on their own merits.

Below we consider the most common ways of ordering the objects being studied:

The task may simply be to order objects according to the place they occupy in space or time. For example, the cards in a deck were arranged in some order and then shuffled. The new arrangement of the cards is also characterized by a certain order, a ranking. Comparing it with the old one shows how thoroughly the cards were shuffled. In this task only the overall arrangement of the cards in the deck is of interest, and there is no need to arrange the objects by the “increase” or “decrease” of some characteristic common to all of them;

Objects can also be ordered according to some quality for which there is no objective absolute scale. For example, rock samples can be ranked by hardness using the following simple criterion: A is harder than B if A leaves a scratch on B when they touch. If A leaves a scratch on B, and B leaves a scratch on C, then A will leave a scratch on C. Thus, by a series of comparisons, the objects in question can be ordered with reasonable accuracy (unless the set includes two objects of the same hardness). However, this method does not measure the absolute hardness of a rock. It can always be established that A is harder than B, but until some scale for measuring absolute values has been constructed, it cannot be said that A is, say, twice as hard as B;

The ordering can be based on the measured (or theoretically calculated) value of some characteristic. For example, people can be arranged by height and cities by population. In this case it is not always necessary to carry out the measurement itself: a group of students can be lined up by height “by eye”; in such cases, however, the criterion by which the ranking is made must allow direct comparisons.

Objects can be ordered according to a characteristic whose value can in principle be measured, but which for one reason or another cannot be measured in practice (or even in theory). For example, a group of people might be ordered according to their intellectual abilities, on the assumption that such a quality actually exists and that people can be arranged in order of the intensity of this characteristic.

In practical applications of ranking-based methods, there are sometimes cases where two or more objects are so similar that it is impossible to prefer one of them. When an expert ranks objects on the basis of subjective judgments, this lack of preference reflects either their true indistinguishability or the researcher's inability to find significant differences. Such objects are said to be tied.

For example, suppose students were ranked according to their merits or exam scores. The accepted method of assigning numerical ranks to tied objects is to average the ranks they would have had if they were distinguishable. For example, if the third and fourth objects are tied, each is assigned a rank of 3.5, and if the objects from the second to the seventh are tied, each receives a rank of 4.5.

This approach is sometimes called the “average rank method.” When there is no basis for choosing between objects, it is clear that they must all be assigned equal ranks. The advantage of this method is that the sum of the ranks of all objects remains exactly the same as in a ranking without ties.

In the analysis of socio-economic phenomena, it is often necessary to resort to various conditional estimates using ranks, and the relationship between individual characteristics is measured using nonparametric coefficients of association.

3 Kendall's rank concordance coefficient

To determine the closeness of the relationship between an arbitrary number of ranked features, a multiple correlation coefficient (concordance coefficient) is used.

In the practice of statistical research there are cases when a set of objects is characterized not by two but by several sequences of ranks, and a statistical relationship has to be established between several variables. The measure used in such cases is Kendall's multiple rank correlation coefficient (concordance coefficient), determined by the following formula:

$$W = \frac{12D}{m^2(n^3 - n)}, \qquad (1)$$

where W is the concordance coefficient;

D is the sum of squared deviations of the rank sums, calculated according to formula (2);

n is the number of ranked objects;

m is the number of analyzed ordinal variables (for example, the number of experts).

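In a sense, W serves as a measure of the commonality of the rankings.

$$D = \sum_{i=1}^{n}\left(\sum_{j=1}^{m} r_{ij} - \frac{m(n + 1)}{2}\right)^2, \qquad (2)$$

where r_ij is the rank given to object i in ranking j (the ranked judgments of the group of experts);

n is the number of objects.

The values of the concordance coefficient lie in the interval [0, 1].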

An increase in the coefficient from 0 to 1 means greater consistency of judgments. If all the judgments coincide, then W = 1.

Testing the significance of the coefficient is based on the fact that, if the null hypothesis of no correlation is true, then for n > 7 the statistic m(n - 1)W has approximately a χ² distribution with k = n - 1 degrees of freedom. Therefore, the concordance coefficient is significant at the level α = 0.05 if m(n - 1)W exceeds the tabulated value of χ² for α = 0.05 and k = n - 1 degrees of freedom.
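
The significance test can be sketched in Python for a case with n = 8 objects, so that the condition n > 7 holds. The sketch assumes that D is the sum of squared deviations of the rank sums from their mean m(n + 1)/2 and that there are no tied ranks; the function name `concordance_test`, the three rankings and the constant name are illustrative, while 14.067 is the tabulated χ² value for α = 0.05 and 7 degrees of freedom.

```python
# Tabulated chi-square value for alpha = 0.05 and 7 degrees of freedom.
CHI2_CRITICAL_DF7 = 14.067


def concordance_test(rankings):
    """Return (D, W, statistic) for m rankings of the same n objects (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    expected = m * (n + 1) / 2   # mean rank sum per object
    # D: squared deviations of each object's rank sum from the mean rank sum.
    d = sum((sum(r[i] for r in rankings) - expected) ** 2 for i in range(n))
    w = 12 * d / (m ** 2 * (n ** 3 - n))
    return d, w, m * (n - 1) * w


rankings = [
    [1, 2, 3, 4, 5, 6, 7, 8],   # expert 1
    [2, 1, 3, 5, 4, 6, 8, 7],   # expert 2
    [1, 3, 2, 4, 6, 5, 7, 8],   # expert 3
]
d, w, statistic = concordance_test(rankings)
print(d, round(w, 3), round(statistic, 2), statistic > CHI2_CRITICAL_DF7)
```

For these rankings the statistic exceeds the tabulated value, so the agreement between the three experts would be regarded as significant at the 0.05 level.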
