What is correlation? What does the concept of correlation mean in simple words

06/06/2018 Igor


Everything in the world is interconnected. Intuitively, each of us tries to find relationships between phenomena in order to influence and control them. The concept that describes such a relationship is called correlation. What does it mean in simple words?


The concept of correlation

Correlation (from the Latin "correlatio": ratio, relationship) is a statistical term for a measure of the probabilistic dependence between random variables.



Example: consider two types of relationship:

  1. The first is a pen in a person's hand. The pen moves in whatever direction the hand moves. If the hand is at rest, the pen does not write. If the person presses a little harder, the mark on the paper becomes darker. This relationship is a rigid dependence, not a correlation; it is functional.
  2. The second is the relationship between a person's level of education and how much they read. It is not known in advance which particular person reads more: one with higher education or one without. This relationship is random, or stochastic; it is studied by statistics, which deals exclusively with mass phenomena. If a statistical calculation demonstrates a correlation between level of education and reading, it becomes possible to make forecasts and to predict events probabilistically. In this example we can say with a high degree of probability that people with higher education read more books. But since the relationship is not functional, we may be mistaken. The probability of such an error can always be calculated; it is expected to be small and is called the level of statistical significance (p).

Examples of relationships in nature include food chains and the human body, which consists of interconnected organ systems functioning as a whole.

Every day we encounter correlations in everyday life: between the weather and a good mood, between correctly formulated goals and their achievement, between a positive attitude and luck, between a sense of happiness and financial well-being. But we look for these connections not through mathematical calculation but through myths, intuition, superstition and idle conjecture. Such phenomena are very difficult to translate into mathematical language, to express in numbers, to measure. It is another matter when we analyze phenomena that can be counted and presented as numbers. In that case we can measure the correlation using the correlation coefficient (r), which reflects the strength (closeness) and direction of the correlation between random variables.

A strong correlation between random variables is evidence of a statistical relationship between these particular phenomena, but the relationship cannot automatically be transferred to the same phenomena in a different situation. Researchers who have calculated a significant correlation between two variables are often misled by the simplicity of correlation analysis into assuming a causal relationship between the features, forgetting that the correlation coefficient is probabilistic.

Example: the number of people injured in icy conditions and the number of road accidents. These quantities correlate with each other even though neither causes the other; both are linked only to a common cause, ice. Conversely, if the analysis reveals no correlation between phenomena, this is not yet evidence that they are unrelated: the relationship may be complex and nonlinear, and therefore not revealed by correlation calculations.
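The ice example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the "iciness" score, the multipliers 2 and 3 and the noise level are made up for illustration and are not taken from any real data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


rng = random.Random(0)  # fixed seed so that the run is repeatable

# Hypothetical model: a daily "iciness" score drives both counts.
ice = [rng.uniform(0, 10) for _ in range(200)]
noise_injuries = [rng.gauss(0, 1) for _ in ice]
noise_accidents = [rng.gauss(0, 1) for _ in ice]
injuries = [2 * i + e for i, e in zip(ice, noise_injuries)]
accidents = [3 * i + e for i, e in zip(ice, noise_accidents)]

# The two counts never influence each other, yet they correlate strongly.
r_raw = pearson(injuries, accidents)
# With the contribution of ice removed, only independent noise is left.
r_without_ice = pearson(noise_injuries, noise_accidents)

print(f"injuries vs accidents:          r = {r_raw:.2f}")
print(f"same, with ice effect removed:  r = {r_without_ice:.2f}")
```

The first coefficient comes out close to 1 and the second close to 0: the whole relationship between the two counts is inherited from the common cause.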




The concept of correlation was first introduced into science by the French paleontologist Georges Cuvier. In the 18th century he formulated the law of correlation of the parts and organs of living organisms, which made it possible to reconstruct the appearance of an entire fossil animal from the remains found. In statistics, the term was first used in 1886 by the English scientist Francis Galton. He was unable to derive an exact formula for the correlation coefficient; that was done by his student, the famous mathematician and biologist Karl Pearson.

Types of correlation

By significance: highly significant, significant and insignificant.

  • highly significant: r corresponds to the level of statistical significance p ≤ 0.01;
  • significant: r corresponds to p ≤ 0.05;
  • insignificant: r does not reach statistical significance (p > 0.1).

By direction: negative (an increase in one variable is accompanied by a decrease in the other: the more phobias a person has, the less likely they are to hold a leadership position) and positive (an increase in one variable is accompanied by an increase in the other: the more nervous you are, the more likely you are to get sick). If there is no relationship between the variables, the correlation is called zero.

By form: linear (when one value increases or decreases, the second changes proportionally) and non-linear (when the change in the second value cannot be described by a linear dependence, and other mathematical laws are applied, such as a polynomial or hyperbolic dependence).

By strength.

Correlation coefficients




Depending on the measurement scale of the variables studied, different types of correlation coefficient are calculated:

  1. Pearson's correlation coefficient (the pairwise linear, or product-moment, correlation coefficient) is calculated for variables measured on quantitative (interval and ratio) scales.
  2. Spearman's or Kendall's rank correlation coefficient is used when at least one of the variables is measured on an ordinal scale or is not normally distributed.
  3. The point-biserial correlation coefficient is used if one of the two variables is dichotomous.
  4. The four-field (phi) correlation coefficient is used if both variables are dichotomous.

Pearson's coefficient refers to parametric indicators of correlation, all the rest - to non-parametric ones.

The value of the correlation coefficient is in the range from -1 to +1. With a complete positive correlation, r = +1, with a complete negative correlation, r = -1.

Formula and calculation





Examples

It is necessary to determine the relationship between two variables among schoolchildren: the level of intellectual development (according to test results) and the number of late arrivals per month (according to entries in the class register).

The initial data are presented in the table:

[Table: IQ data (x) and data on the number of late arrivals (y). Only the summary cells survive: sum 1122, average 112.2.]

To interpret the resulting value correctly, it is necessary to analyze the sign of the correlation coefficient (+ or -) and its absolute value (modulus).

According to the classification table of the correlation coefficient by strength, rxy = -0.827 is a strong negative correlation. Thus, the number of late arrivals is strongly related to the students' level of intellectual development: in this sample, students with a high IQ are late to class less often than students with a low IQ.



The correlation coefficient can be used by scientists to confirm or refute an assumption about the dependence of two quantities or phenomena and to measure its strength and significance, and by students to conduct empirical and statistical research in various subjects. It must be remembered that this indicator is not an ideal tool: it measures only the strength of a linear relationship and is always a probabilistic value with a certain error.
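A minimal sketch of that limitation, using invented numbers: below, y is completely determined by x (y = x²), yet the Pearson coefficient is zero because the dependence is not linear.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Illustrative data: x is symmetric around zero, y is an exact function of x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value * value for value in x]

r = pearson(x, y)
print(f"r = {r:.3f}")  # the positive and negative products cancel out
```

A zero coefficient here says only that there is no linear trend, not that the two quantities are unrelated.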

Correlation analysis is applied in the following areas:

  • economic science;
  • astrophysics;
  • social sciences (sociology, psychology, pedagogy);
  • agrochemistry;
  • metal science;
  • industry (for quality control);
  • hydrobiology;
  • biometrics, etc.

Reasons for the popularity of the correlation analysis method:

  1. The correlation coefficients are relatively simple to calculate; no special mathematical education is required.
  2. The method makes it possible to quantify the relationship between mass random variables, which are the subject of statistical analysis. For this reason it has become widespread in statistical research.

Hopefully, you can now distinguish a functional relationship from a correlational one, and you will know that when television or the press mentions a correlation, they usually mean a positive and fairly significant relationship between two phenomena.

Publication date: 09/03/2017 13:01

The term "correlation" is widely used in the humanities and in medicine, and it appears frequently in the media. Correlations play a key role in psychology. In particular, calculating correlations is an important step in the empirical research behind a dissertation in psychology.

Most material about correlation on the web is too technical, and the formulas are difficult for a non-specialist to understand. At the same time, understanding the meaning of correlations is necessary for marketers, sociologists, physicians and psychologists: everyone who conducts research on people.

In this article we explain in simple terms the essence of correlation, the types of correlation, the methods of calculation, and the features of using correlation in psychological research and in psychology theses.


What is correlation

Correlation is a relationship, but not just any relationship. What makes it special? Let's look at an example.

Imagine that you are driving a car. You press the gas pedal and the car goes faster. You ease off the gas and the car slows down. Even a person unfamiliar with how a car works will say: “There is a direct relationship between the gas pedal and the speed of the car: the harder the pedal is pressed, the higher the speed.”

This dependence is functional: the speed is a direct function of the gas pedal. A specialist will explain that the pedal controls the supply of fuel to the cylinders, where the mixture burns, which increases the power delivered to the shaft, and so on. This connection is rigid and deterministic and allows no exceptions (provided that the car is in working order).

Now imagine that you are the director of a company whose employees sell goods. You decide to increase sales by raising employees' salaries. You raise salaries by 10%, and the company's average sales go up. After a while you raise them by another 10%, and sales grow again. Then by another 5%, and again there is an effect. The conclusion suggests itself: there is a direct relationship between the company's sales and employees' salaries; the higher the salaries, the higher the sales. Is this the same kind of connection as between the gas pedal and the speed of the car? What is the key difference?

That's right, the relationship between salary and sales is not rigid. For some employees, sales may even have declined despite the raise; for others, they stayed the same. But on average, sales in the company have grown, so we say that there is a relationship between sales and employee salaries, and that it is a correlation.

The functional connection (gas pedal - speed) is based on a physical law. The basis of the correlation (sales - salary) is a simple consistency of changes in two indicators. There is no law (in the physical sense of the word) behind correlation. There is only a probabilistic (stochastic) regularity.

Numerical expression of correlation dependence

So, the correlation reflects the dependence between phenomena. If these phenomena can be measured, then it receives a numerical expression.

For example, suppose the role of reading in people's lives is being studied. The researchers took a group of 40 people and measured two indicators for each subject: 1) how much time they spend reading per week; 2) how successful they consider themselves (on a scale from 1 to 10). The researchers arranged the data in two columns and used a statistical program to calculate the correlation between reading and well-being. Suppose they got the result -0.76. What does this number mean? How should it be interpreted? Let's figure it out.

The resulting number is called the correlation coefficient. For its correct interpretation, it is important to consider the following:

  1. The sign "+" or "-" reflects the direction of dependence.
  2. The absolute value of the coefficient reflects the strength of the dependence.

Direct and inverse

The plus sign in front of the coefficient indicates that the relationship between phenomena or indicators is direct. That is, the greater one indicator, the greater the other. Higher salary means higher sales. Such a correlation is called direct, or positive.

If the coefficient has a minus sign, then the correlation is inverse, or negative. In this case, the higher one indicator, the lower the other. In the reading and well-being example, we got -0.76, which means that the more people read, the lower their level of well-being.

Strong and weak

In numerical terms, correlation is a number in the range from -1 to +1, denoted by the letter "r". The higher the number (ignoring the sign), the stronger the correlation.

The lower the absolute value of the coefficient, the weaker the relationship between the phenomena or indicators.

The maximum possible strength of dependence is 1 or -1. How can we picture this?

Consider an example. Ten students had their intelligence (IQ) and academic performance for the semester measured, and the data were arranged in columns.

[Table columns: test subject; IQ; academic performance (points)]

Look carefully at the data in the table. From subject 1 to subject 10, the IQ level increases. But the level of academic performance rises as well. Of any two students, the one with the higher IQ performs better, and there are no exceptions to this rule.

This is an example of a complete, 100% coordinated change in two indicators in a group, and of the maximum possible positive relationship. That is, the correlation between intelligence and performance is 1 (strictly speaking, the rank correlation is exactly 1; Pearson's r equals 1 only if the points also lie on a straight line).

Let's consider another example. The same 10 students were surveyed on how successful they feel in communicating with the opposite sex (on a scale from 1 to 10).

[Table columns: test subject; IQ; success in communicating with the opposite sex (points)]

Look closely at the data in the table. From subject 1 to subject 10, the IQ level increases, while the level of success in communicating with the opposite sex in the last column consistently decreases. Of any two students, the one with the lower IQ is more successful in communicating with the opposite sex, and there are no exceptions to this rule.

This is an example of complete consistency in the change of two indicators in a group: the maximum possible negative relationship. The correlation between IQ and success in communicating with the opposite sex is -1.

And how should a correlation equal to zero (0) be understood? It means that there is no relationship between the indicators. Let's return once more to our students and consider another indicator measured for them: the length of a standing long jump.

[Table columns: test subject; IQ; standing jump length (m)]

There is no consistency between person-to-person variation in IQ and in jump length. This indicates a lack of correlation. The correlation coefficient between IQ and jump length for these students is 0.
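The three extreme cases can be reproduced with a short sketch. All numbers below are made up for illustration (they are not the article's data); they are chosen so that the coefficients come out at the extremes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Hypothetical data for 10 students, ordered by IQ.
iq = [95, 100, 105, 110, 115, 120, 125, 130, 135, 140]
performance = [50, 52, 54, 56, 58, 60, 62, 64, 66, 68]   # rises in step with IQ
communication = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]          # falls in step with IQ
jump_m = [2.0, 2.3, 2.1, 2.4, 2.2, 2.2, 2.4, 2.1, 2.3, 2.0]  # no trend at all

r_performance = pearson(iq, performance)
r_communication = pearson(iq, communication)
r_jump = pearson(iq, jump_m)

print(f"IQ vs performance:   {r_performance:+.2f}")
print(f"IQ vs communication: {r_communication:+.2f}")
print(f"IQ vs jump length:   {r_jump:+.2f}")
```

The first two columns change in strict lockstep with IQ, giving +1 and -1; the jump lengths were picked so that deviations above and below the mean cancel, giving 0.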

We've looked at extreme cases. In real measurements, the coefficients are rarely equal to exactly 1 or 0. In this case, the following scale is adopted:

  • if the coefficient is greater than 0.70 - the relationship between the indicators is strong;
  • from 0.30 to 0.70 - the connection is moderate,
  • less than 0.30 - the connection is weak.
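The sign rule and the rough strength scale above can be combined into one small helper. The function name `describe` and the returned labels are illustrative choices; the thresholds are the ones listed above, with the boundary values 0.30 and 0.70 treated as "moderate".

```python
def describe(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")

    if r > 0:
        direction = "direct (positive)"
    elif r < 0:
        direction = "inverse (negative)"
    else:
        direction = "none"

    size = abs(r)  # strength ignores the sign
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    return direction, strength


print(describe(-0.76))  # the reading and well-being example
print(describe(0.45))
print(describe(0.10))
```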

Evaluated on this scale, the correlation obtained above between reading and well-being (-0.76) is strong and negative. That is, there is a strong negative relationship between erudition and well-being, which once again confirms the biblical wisdom about the link between wisdom and sorrow.

The given gradation gives very rough estimates and is rarely used in research in this form.

Gradations by significance level are used more often. Here the coefficient obtained is either significant or not significant. This is determined by comparing its value with the critical value of the correlation coefficient taken from a special table. These critical values depend on the sample size (the larger the sample, the lower the critical value).

Correlation analysis in psychology

The correlation method is one of the main methods in psychological research. This is no accident, because psychology strives to be an exact science. Does it succeed?

What is special about laws in the exact sciences? For example, the law of gravity in physics operates without exception: the greater the mass of a body, the more strongly it attracts other bodies. This physical law reflects the relationship between the mass of a body and gravity.

In psychology, the situation is different. For example, psychologists publish data on the relationship between the warmth of childhood relationships with parents and the level of creativity in adulthood. Does this mean that every subject who had a very warm relationship with their parents in childhood will be highly creative? The answer is unequivocal: no. There is no law here like a physical one. The correlation reveals no mechanism by which childhood experience influences adult creativity; any such mechanism is our own conjecture. The data are consistent (relationships and creativity vary together), but there is no law behind them, only a correlation. Psychologists often call the relationships they identify psychological patterns, emphasizing that they are probabilistic rather than rigid.

The student study example from the previous section illustrates well the use of correlations in psychology:

  1. Analysis of the relationship between psychological indicators. In our example, IQ and the success of communication with the opposite sex are psychological parameters. Identification of the correlation between them expands the understanding of the mental organization of a person, of the relationship between various aspects of his personality - in this case, between the intellect and the sphere of communication.
  2. Analysis of the relationship of IQ with academic performance and jumping is an example of the relationship of a psychological parameter with non-psychological ones. The results obtained reveal the features of the influence of intelligence on educational and sports activities.

Here's what a summary of the results of a fictional study on students could look like:

  1. A significant positive relationship between the intelligence of students and their academic performance was revealed.
  2. There is a negative significant relationship between IQ and successful communication with the opposite sex.
  3. There was no connection between the IQ of students and the ability to jump from a place.

Thus, the level of intelligence of students acts as a positive factor in their academic performance, while at the same time having a negative impact on relationships with the opposite sex and not having a significant impact on sports success, in particular, the ability to jump from a place.

As you can see, the intellect helps students to learn, but prevents them from building relationships with the opposite sex. This does not affect their athletic performance.

The ambiguous influence of intelligence on the personality and activity of students reflects the complexity of this phenomenon in the structure of personality traits and the importance of continuing research in this direction. In particular, it seems important to analyze the relationship of intelligence with the psychological characteristics and activities of students, taking into account their gender.

Pearson and Spearman coefficients

Let's consider two calculation methods.

The Pearson coefficient measures the linear relationship between two numerical indicators measured in one group. Very simplified, the calculation boils down to this:

  1. The values of two parameters in the group of subjects are taken (for example, aggression and perfectionism).
  2. The average value of each parameter in the group is found.
  3. For each subject, the differences between the parameter values and the corresponding averages are found.
  4. These differences are substituted into the formula for the Pearson coefficient.
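The four steps above can be sketched in a few lines of Python. The scores are hypothetical and serve only to show the mechanics; the formula used in step 4 is the standard one (sum of products of deviations divided by the square root of the product of the sums of squared deviations).

```python
import math

# Step 1: values of two parameters in a group (hypothetical test scores).
aggression = [12, 15, 9, 20, 14, 11, 18, 16]
perfectionism = [30, 34, 25, 41, 29, 28, 38, 33]

# Step 2: the average value of each parameter.
mean_a = sum(aggression) / len(aggression)
mean_p = sum(perfectionism) / len(perfectionism)

# Step 3: each subject's deviation from the average.
dev_a = [a - mean_a for a in aggression]
dev_p = [p - mean_p for p in perfectionism]

# Step 4: substitute the deviations into the Pearson formula.
numerator = sum(da * dp for da, dp in zip(dev_a, dev_p))
denominator = math.sqrt(sum(da * da for da in dev_a) * sum(dp * dp for dp in dev_p))
r = numerator / denominator

print(f"Pearson r = {r:.3f}")
```

Because every subject who scores above average on one scale also scores above average on the other, all the products in the numerator are positive and the coefficient comes out positive.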

Spearman's rank correlation coefficient is calculated in a similar way:

  1. The values of two indicators in the group of subjects are taken.
  2. The rank of each value within the group is found, that is, its place in the list sorted in ascending order.
  3. The rank differences are found, squared and summed.
  4. The sum of squared rank differences is substituted into the formula for the Spearman coefficient.
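A minimal sketch of these steps, assuming there are no tied values (with ties, average ranks and a corrected formula are needed). The formula in step 4 is the standard one for untied ranks, 1 - 6·Σd² / (n·(n² - 1)); the data are invented.

```python
def ranks(values):
    """Place of each value in ascending order (1 = smallest); no ties allowed."""
    ordered = sorted(values)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tied values need average ranks; not handled in this sketch")
    place = {value: index + 1 for index, value in enumerate(ordered)}
    return [place[value] for value in values]


def spearman(xs, ys):
    """Spearman rank correlation from squared rank differences (untied data)."""
    n = len(xs)
    # Steps 2 and 3: rank both indicators, then square and sum the differences.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    # Step 4: substitute the sum into the formula.
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Step 1: hypothetical values of two indicators for five subjects.
x = [3, 1, 4, 2, 5]
y = [30, 10, 50, 20, 40]

rho = spearman(x, y)
print(f"Spearman rho = {rho:.2f}")
```

Only the last two subjects swap places between the two rankings, so the sum of squared rank differences is 2 and the coefficient stays close to 1.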

In Pearson's case, the calculation was based on the average value. Therefore, random data outliers (significant difference from the mean), for example, due to processing error or unreliable answers, can significantly distort the result.

In the case of Spearman, the absolute values of the data play no role, since only their position relative to each other (their ranks) is taken into account. As a result, outliers or other inaccuracies do not seriously affect the final result.

If the data are clean, the differences between the Pearson and Spearman coefficients are usually small, and the Pearson coefficient gives the more accurate estimate of the relationship, because it uses the actual values rather than only their ranks.
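The difference in sensitivity to outliers can be seen in a small experiment. The data are invented: ten perfectly consistent pairs, and then the same pairs with one typing error (100 entered instead of 10). Here the Spearman coefficient is computed as the Pearson coefficient of the ranks, which is equivalent to the rank-difference formula when there are no ties.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def ranks(values):
    """Place of each value in ascending order (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman coefficient as the Pearson coefficient of the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = list(range(1, 11))
y_clean = list(range(1, 11))
y_typo = y_clean[:-1] + [100]  # hypothetical data-entry error in the last value

pearson_clean, spearman_clean = pearson(x, y_clean), spearman(x, y_clean)
pearson_typo, spearman_typo = pearson(x, y_typo), spearman(x, y_typo)

print(f"clean data:   Pearson {pearson_clean:.2f}, Spearman {spearman_clean:.2f}")
print(f"with outlier: Pearson {pearson_typo:.2f}, Spearman {spearman_typo:.2f}")
```

The single outlier pulls the Pearson coefficient from 1 down to about 0.59, while the Spearman coefficient does not move, because the mistyped value is still the largest in its column and keeps its rank.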

How to Calculate the Correlation Coefficient

The Pearson and Spearman coefficients can be calculated manually. This may be necessary for an in-depth study of statistical methods.

However, in most cases, when solving applied problems, including in psychology, it is possible to carry out calculations using special programs.

Calculation using Microsoft Excel spreadsheets

Let's go back to the students example and look at the data on their level of intelligence and the length of the jump from a place. Let's enter this data (two columns) into an Excel spreadsheet.

With the cursor in an empty cell, click "Insert Function" and select "CORREL" from the "Statistical" category.

The function takes two data arrays: CORREL(array1; array2). We select the IQ column and the jump-length column, respectively.

Excel's CORREL implements only the Pearson coefficient.

Calculation with the program STATISTICA

We enter the data on intelligence and jump length into the data field. Next, we select "Nonparametric criteria", then "Spearman", choose the parameters for the calculation and obtain the result.


As you can see, the calculation gave a result of 0.024, which differs from the Pearson result - 0.038, obtained above using Excel. However, the differences are minor.

Using correlation analysis in psychology theses (example)

Most topics of final qualification works in psychology (diploma theses, term papers, master's theses) involve a correlation study (the rest deal with identifying differences in psychological indicators between groups).

The term "correlation" itself rarely appears in topic titles; it is hidden behind wording such as:

  • "The relationship between subjective feelings of loneliness and self-actualization in women of mature age";
  • “Peculiarities of the influence of the resilience of managers on the success of their interaction with clients in conflict situations”;
  • "Personal factors of stress resistance of employees of the Ministry of Emergency Situations."

Thus, the words "relationship", "influence" and "factors" are sure signs that the method of data analysis in empirical research should be correlation analysis.

Let us briefly consider the stages of its implementation when writing a thesis in psychology on the topic: "The relationship of personal anxiety and aggressiveness in adolescents."

1. The calculation requires raw data, usually the subjects' test results. They are entered into a summary table, which is placed in the appendix. The table is structured as follows:

  • each line contains data for one subject;
  • each column contains scores on one scale for all subjects.

[Table columns: subject number; personal anxiety; aggressiveness]

2. It is necessary to decide which of the two coefficients, Pearson or Spearman, will be used. Recall that Pearson gives a more accurate result but is sensitive to outliers in the data. Spearman coefficients can be used with any data except those on a nominal scale, which is why they are used most often in psychology theses.

3. We enter the table of raw data into the statistical program.

4. Calculate the value.



5. The next step is to determine if the relationship is significant. The statistical program highlighted the results in red, which means that the correlations are statistically significant at a significance level of 0.05 (indicated above).

However, it is useful to know how to determine significance manually. For this you need a table of critical values of the Spearman coefficient.

Table of critical values of the Spearman coefficient

                           Level of statistical significance
Number of test subjects    p=0.05    p=0.01    p=0.001
5                          0.88      0.96      0.99
6                          0.81      0.92      0.97
7                          0.75      0.88      0.95
8                          0.71      0.83      0.93
9                          0.67      ...       ...
10                         0.63      0.77      0.87
11                         ...       0.74      0.85
12                         0.58      0.71      0.82
13                         0.55      0.68      ...
14                         0.53      0.66      0.78
15                         0.51      0.64      0.76

We are interested in the significance level of 0.05 and our sample size of 10 people. At the intersection of this row and column we find the critical value of the Spearman coefficient: Rcr=0.63.

The rule is this: if the Spearman empirical value obtained is greater than or equal to the critical value, then it is statistically significant. In our case: Remp (0.66) > Rcr (0.63), therefore, the relationship between aggressiveness and anxiety in the adolescent group is statistically significant.
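The rule can be written as a one-line check. The function name is an illustrative choice, and comparing the absolute value (so that negative coefficients are handled too) is an assumption added here; the numbers are the ones from the text.

```python
def is_significant(r_empirical, r_critical):
    """True if the empirical coefficient reaches the critical value.

    The absolute value is compared so that a strong negative correlation
    is treated the same way as a strong positive one (an assumption of
    this sketch; the rule in the text is stated for a positive value).
    """
    return abs(r_empirical) >= r_critical


R_CRITICAL = 0.63   # critical value for 10 subjects at p = 0.05 (from the text)
R_EMPIRICAL = 0.66  # anxiety vs aggressiveness in the adolescent group

print(is_significant(R_EMPIRICAL, R_CRITICAL))  # True: 0.66 >= 0.63
print(is_significant(0.50, R_CRITICAL))         # False: below the critical value
```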

6. In the text of the thesis, the data should be presented in a table in Word format rather than as a table copied from the statistical program. Below the table, we describe the result obtained and interpret it.

Table 1

Spearman's coefficients of aggressiveness and anxiety in a group of adolescents

                    Aggressiveness
Personal anxiety    0.665*

* statistically significant (p ≤ 0.05)

Analysis of the data presented in Table 1 shows that there is a statistically significant positive relationship between the aggressiveness and anxiety of adolescents. This means that the higher the personal anxiety of adolescents, the higher their level of aggressiveness. This result suggests that aggression is, for adolescents, one of the ways to relieve anxiety. Experiencing self-doubt and anxiety caused by threats to self-esteem, to which adolescents are especially sensitive, a teenager often resorts to aggressive behavior as an unproductive way of reducing anxiety.

7. Is it possible to talk about influence when interpreting relationships? Can we say that anxiety affects aggressiveness? Strictly speaking, no. We have shown above that the correlation between phenomena is probabilistic in nature and reflects only the consistency of changes in traits within a group. We cannot say that this consistency arises because one of the phenomena causes or affects the other. That is, the presence of a correlation between psychological parameters gives no grounds to speak of a causal relationship between them. However, practice shows that the term "influence" is often used when discussing the results of correlation analysis.

In our world everything is interconnected: in some places this can be seen with the naked eye, in others people do not even suspect that a relationship exists. In statistics, mutual dependence is often described by the term "correlation", which is also common in the economic literature. Let's try to figure out together what this concept means, what coefficients exist and how to interpret the values obtained.

So what is correlation? As a rule, the term refers to a statistical relationship between two or more parameters: a change in the value of one or more of them is accompanied by a systematic change in the values of the others. To determine the strength of this interdependence mathematically, it is customary to use various coefficients. Note that when a change in one parameter does not lead to a regular change in another, but affects some statistical characteristic of that parameter, the relationship is not a correlation but simply a statistical relationship.

History of the term

To better understand what correlation is, let's dive into history a bit. The term appeared in the 18th century thanks to the French paleontologist Georges Cuvier. He developed the so-called “correlation law” of the organs and parts of living beings, which made it possible to reconstruct the appearance of an ancient fossil animal from only some of its remains. In statistics, the word came into use in 1886 thanks to the English statistician and biologist Francis Galton. The very name of the term already contains its meaning: not just a connection (“relation”), but relations that have something in common with each other (“co-relation”). However, only Galton's student, the biologist and mathematician K. Pearson (1857 - 1936), was able to explain clearly in mathematical terms what a correlation is. It was he who first derived the exact formula for calculating the corresponding coefficients.

Pair correlation

This is the name for the relationship between two specific quantities. For example, annual advertising spending in the United States has been shown to be very closely related to gross domestic product: the correlation coefficient between these values for the period from 1956 to 1977 is estimated at 0.9699. Another example is the number of visits to an online store and its sales volume. A close connection has also been found between beer sales and air temperature, between the average monthly temperature in a particular place in the current and previous year, and so on. How should the pair correlation coefficient be interpreted? It takes values from -1 to 1; a negative number denotes an inverse relationship and a positive number a direct one. The greater the absolute value of the result, the more closely the quantities are related. A zero value indicates no dependence, an absolute value below 0.5 a weak relationship, and anything higher a pronounced one.

Pearson correlation

Depending on the scale on which the variables are measured, one coefficient or another (Fechner, Spearman, Kendall, etc.) is used for the calculation. For interval values, the indicator devised by Karl Pearson is used most often.

This coefficient shows the degree of linear relationship between two parameters. When people talk about correlation, this is most often what they mean. The indicator has become so popular that its formula is built into Excel, so you can find out what correlation is in practice without going into the intricacies of the formula. The syntax of the function is PEARSON(array1, array2), where the first and second arrays are usually the corresponding ranges of numbers.

A correlation between two quantities is a statistical relationship in which a change in one of the quantities is accompanied by a systematic change in the other. A quantitative measure of correlation is the linear correlation coefficient (also called the Pearson correlation coefficient), calculated by the formula:

$$r_{xy} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}}$$

  • r_xy is the correlation coefficient of the values of x and y;
  • d_x is the deviation of some value of the x series from the average value of this series;
  • d_y is the deviation of some value of the y series from the average value of this series.
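The coefficient described above, built from the deviations d_x and d_y, can be computed in a few lines. The following is only a minimal sketch: the function name `pearson` and the temperature and beer-sales figures are invented for illustration and do not come from any real data set.

```python
import math

def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations d_x from the average of x
    dy = [y - mean_y for y in ys]  # deviations d_y from the average of y
    # r = sum(d_x * d_y) / sqrt(sum(d_x^2) * sum(d_y^2))
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("a constant series has no defined correlation")
    return numerator / denominator

# Hypothetical observations: air temperature and beer sales
temperature = [12, 15, 19, 23, 27, 30]
beer_sales = [110, 118, 135, 150, 171, 180]
r = pearson(temperature, beer_sales)
print(round(r, 3))
```

For these made-up numbers the coefficient comes out close to +1, i.e. a pronounced direct relationship.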

The range of possible values of the correlation coefficient is between +1 and -1. In this case, the following options are possible:

  • +1 - exact direct relationship between the quantities;
  • |r_xy| > 0.7 - pronounced dependence between the quantities;
  • 0.4 < |r_xy| < 0.7 - moderately pronounced dependence between the quantities;
  • |r_xy| < 0.4 - weakly pronounced dependence between the quantities;
  • -1 - exact inverse relationship between the quantities.
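The verbal scale above is easy to turn into a small helper. This is a sketch only: the function name `describe_strength` is hypothetical, and because the list does not say where the exact values 0.4 and 0.7 belong, the code assumes they count as "moderate".

```python
def describe_strength(r):
    """Map a correlation coefficient to the verbal scale from the list above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    direction = "direct" if r > 0 else "inverse" if r < 0 else "none"
    size = abs(r)
    if size > 0.7:
        strength = "pronounced"
    elif size >= 0.4:  # assumption: the boundary values 0.4 and 0.7 are "moderate"
        strength = "moderate"
    else:
        strength = "weak"
    return strength, direction

print(describe_strength(0.9699))
print(describe_strength(-0.5))
```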

It is important to note that the larger the sample, the lower the modulus of the correlation coefficient at which we can say that there is a relationship between x and y.

Unfortunately, there is a trap in the formula which, when applied to financial instruments, can play a cruel joke on the investor. In the numerator, the deviations of the quantities can have either the same or different signs, so their product can be either positive or negative. In the denominator, the deviations are squared, which guarantees that the denominator is positive. For now, we will just take note of this, and later we will return to what can come of it.
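The remark about sample size can be illustrated numerically. The sketch below uses the Fisher z-transformation, a standard normal approximation that is not mentioned in the article and is swapped in here for simplicity, to estimate the smallest |r| that differs significantly from zero for a sample of n pairs. The function name `critical_r` and the 5% significance level are assumptions.

```python
import math
from statistics import NormalDist

def critical_r(n, alpha=0.05):
    """Approximate smallest |r| that is significant at level alpha (two-sided)
    for a sample of n pairs, based on the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("the approximation needs more than 3 observations")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    return math.tanh(z_crit / math.sqrt(n - 3))

for n in (10, 30, 100, 1000):
    print(n, round(critical_r(n), 3))
```

The threshold shrinks as n grows, so with a large sample even a modest coefficient can indicate a real relationship, while with a dozen points only a high one can.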

The practical meaning of calculating the correlation between financial instruments is to obtain important fundamental data needed for making trading decisions. Markets react to the release of important economic news in stages: first the prices of the main assets (gold, oil, futures on industrial indices), and sometimes yields, start to move. As a result, exchange rates and stock quotes change. By tracking the relationships between individual instruments, as well as the cause-and-effect links between price changes, you can quickly revise your trading and investment plans. In addition, correlation analysis is used in management as an obligatory component.

You can visualize the correlation of two quantities in the form of a graph in time-amplitude coordinates. For example, with a negative correlation, we get a similar picture:

Knowledge of asset correlation reduces portfolio risk


Suppose, for example, that there are 2 assets. For simplicity, let's imagine that their prices follow a sinusoid over time. Then, with a correlation of +1, the waves overlap completely, and opening deals on both assets is equivalent to doubling the position on one of them. A correlation of -1, on the contrary, means that the gains and losses of the assets compensate each other. Of course, well-matched assets do not simply oscillate around the same level but tend to rise over time. In addition, when some assets fall, growth in others makes it possible to minimize the total risk of the portfolio.

A process called portfolio rebalancing allows you to generate income by changing the proportion of assets in a portfolio. This is most easily achieved with a pronounced negative correlation. Suppose that the portfolio initially contained assets A and B with an inverse correlation in a 1:1 ratio, for a total of 1 million rubles. Within six months, asset A fell in price by 20%, and its value went from the initial 500 thousand rubles to 400 thousand rubles. Asset B, on the contrary, grew by 20%, and its value rose to 600 thousand rubles. The total value of the portfolio has not changed and still amounts to 1 million rubles. Now we transfer 50% of asset B (300 thousand) to A, so that A is worth 700 thousand and B is worth 300 thousand.
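The arithmetic of this example, including a second half-year in which the price moves reverse, can be traced with a short sketch. The function name `year_end_values` is hypothetical. Two readings of "the reverse move" are shown, because they give different results: prices returning exactly to their starting levels (A rises by 25%, B falls by one sixth), or symmetric percentage moves (A +20%, B -20%).

```python
def year_end_values(second_a, second_b, rebalance=True):
    """Values of assets A and B (thousand rubles) after two half-year periods.

    second_a, second_b: price changes in the second half-year, as fractions.
    """
    a, b = 500.0, 500.0      # initial 1:1 portfolio, 1 million rubles in total
    a *= 1 - 0.20            # first half-year: A falls by 20%
    b *= 1 + 0.20            # first half-year: B rises by 20%
    if rebalance:
        moved = b / 2        # transfer half of B into A
        a, b = a + moved, b - moved
    return a * (1 + second_a), b * (1 + second_b)

# Reading 1: prices return to their starting levels (0.8 -> 1.0 and 1.2 -> 1.0)
back_a, back_b = 1 / 0.8 - 1, 1 / 1.2 - 1
with_rebalancing = sum(year_end_values(back_a, back_b))
without_rebalancing = sum(year_end_values(back_a, back_b, rebalance=False))

# Reading 2: symmetric percentage moves, A +20% and B -20%
symmetric = sum(year_end_values(0.20, -0.20))
symmetric_no_rebalancing = sum(year_end_values(0.20, -0.20, rebalance=False))

print(round(with_rebalancing), round(without_rebalancing))
print(round(symmetric), round(symmetric_no_rebalancing))
```

Under the first reading the rebalanced portfolio ends the year at 1,125 thousand rubles against 1,000 without rebalancing; under the second it ends at 1,080 against 960. In both cases rebalancing adds value, but the size of the gain depends on which reading is meant.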

In the next six months, the opposite process occurs: the assets return to their original prices. For asset A this means a rise of 25% (from 0.8 back to 1.0 of the initial price), and for asset B a fall of about 16.7% (from 1.2 back to 1.0). Asset A is now worth 875 thousand instead of 700 thousand, and asset B is worth 250 thousand instead of 300 thousand. The total value of the portfolio is therefore 1 million 125 thousand rubles, i.e. its return due to rebalancing is 12.5% per annum. Without rebalancing, the portfolio return would have been 0%. Real situations are much more complicated, because the correlations of most instruments lie between +0.5 and -0.5. If we consider the risk-return chart for different ratios of two instruments with different correlation values, we get the following picture:

As can be seen, the lower the value of the correlation coefficient of instruments, the greater the possible return of the portfolio for the same value of risk, or the lower the risk for the same value of return.
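The link between correlation and portfolio risk follows from the standard two-asset variance formula, σ_p² = w_a²σ_a² + w_b²σ_b² + 2·w_a·w_b·ρ·σ_a·σ_b. The sketch below evaluates it for several values of ρ; the volatilities of 20% and 10% and the equal weights are hypothetical.

```python
import math

def portfolio_volatility(w_a, sigma_a, sigma_b, rho):
    """Standard deviation of a two-asset portfolio with weights w_a and 1 - w_a."""
    w_b = 1.0 - w_a
    variance = (
        (w_a * sigma_a) ** 2
        + (w_b * sigma_b) ** 2
        + 2 * w_a * w_b * rho * sigma_a * sigma_b
    )
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding errors

# Hypothetical assets with 20% and 10% annual volatility, held in equal weights
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    print(rho, round(portfolio_volatility(0.5, 0.20, 0.10, rho), 4))
```

With ρ = +1 the portfolio volatility is simply the weighted average of the two (15%), and it falls steadily as ρ decreases, reaching 5% at ρ = -1 for these inputs.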

Forex correlation

A common strategy based on the correlation of currency pairs is as follows: when the correlation coefficient deviates sharply from its usual value, transactions are opened in the direction that restores this value. For example, if the pairs EURUSD and GBPUSD have moved in the same direction for a long time, then after a strong divergence a convergence can be expected, provided that the divergence is not caused by long-term factors (for example, a change in the discount rate).

In addition, the correlation of currency pairs is used in a comprehensive assessment of the market. For example, on the eve of the 2008-2009 mortgage crisis, when the Australian and New Zealand dollars, as well as the British pound, had high key rates, a trading strategy called the carry trade became widespread. It relied on the fact that during events favorable for the stock markets, pairs of these currencies with the yen, which traditionally has a very low rate, grew especially actively, and during adverse events they fell just as actively.

No correlation holds over absolutely all time intervals, and multidirectional movements of currencies are possible; nevertheless, a pronounced unidirectional movement, as a rule, indicates the presence of a common fundamental “driver”. This makes it easier to plan deals. In particular, it makes no sense to look for pullbacks and trade intraday if all clearly correlated pairs are going in the same direction.

You can view the real-time correlation table of currency pairs and some other instruments at myfxbook.com/forex-market/correlation. This table shows that the EURUSD and AUDCAD pairs practically do not correlate with each other. When deals on these pairs are opened simultaneously, there is little reason to fear either that losses will add up or that a profit on one pair will be cancelled by a loss on the other.

This chart shows how the Australian and New Zealand dollars, which are inversely correlated with the safe-haven currencies, the yen and the Swiss franc, rose strongly during the period of the largest key rate differential. This trend reversed once a period of rate cuts began as the mortgage crisis deepened.

There are no effects without a cause

Asset price correlation is somewhat similar to trends: the longer the time interval over which it is calculated, the more slowly it changes. But there is something that distinguishes correlation from many other methods. It can be calculated for pairs of assets that are not traded as a pair on any exchange (oil-gas, oil-gold), which adds valuable information to the analyst's arsenal and allows you to "read the market between charts".

A stable correlation between two or more quantities usually has a causal basis. One of the quantities is decisive, and the other (or others) depends on it. Correlation in the stock market is no exception. For example, in the oil-gas pair, oil quotes were decisive for a long time. In the chart below, you can see that the widening of the spread between oil and gas caused by a sharp relative rise in gas was followed by an equally sharp return to relative equilibrium:

At the same time, in another pair of assets, gold-oil, gold is the decisive one. When the spread widens significantly (a sharp rise or fall in oil while gold is more stable), it is oil that restores the disturbed equilibrium:

By tracking this behavior of the dependent ("slave") assets, you can open deals in the direction of restoring the balance. By the way, a correlation is often based on the link between certain currencies and commodity assets. Such currencies are called "commodity currencies". For example, the Canadian dollar and the ruble are highly dependent on oil. In both cases the correlation is direct: the more expensive oil is, the higher the rate of these currencies against the US dollar.

In the case of the ruble, the correlation on the chart is so clear that it can be used in a trading strategy. Consider early 2014. Oil is trading around $110 per barrel and then rises slightly higher for a while. The ruble at this time, on the contrary, briefly weakens from 33 to 36 per US dollar. At some point the correlation becomes almost inverse, but the balance is quickly restored and the ruble returns to the rate of 33 per dollar, obediently following oil. We see an even clearer example at the end of 2014, when the ruble weakened sharply against the backdrop of a much more smoothly declining oil price. This time too, the disturbed balance was soon restored thanks to the strengthening of the ruble. Over time, a correlation can change considerably and even go from direct to inverse. This was especially evident in the case of the correlation between the Dow Jones Industrial Average and the RTS.

At the end of 2007, when the first signs of the US mortgage crisis began to appear, the DJ index turned down, but the RTS index, thanks to the active growth of oil quotes, was just approaching its historical maximum. Later, however, the sharp collapse of all the world's stock indices affected oil as well. As a result, the RTS index fell almost twice as fast as the DJ. In addition to oil, the overall outflow of capital from emerging markets also contributed to the rate of decline of the RTS index.

However, the crisis was short-lived and already at the beginning of 2009 gave way to economic growth. A high correlation between DJ and RTS was observed until April 2012, which was marked by the exhaustion of the possibilities of the raw material model of development of the Russian economy. From that year on, even expensive oil no longer provided economic growth. Later, the economic recession in Russia only deepened against the backdrop of cheaper oil, while the American economy received an additional stimulus for growth. The correlation between DJ and RTS became inverse.

In itself, the presence of a correlation between assets does not mean that a trading or investment strategy can be built on it. Let's say we are interested in the correlation of IBM stock with other assets over the past 12 months (see impactopia.com/correlation). In 4th place in terms of correlation is Banco Santander (about 0.43). Most likely, this is just a coincidence or a systemic flaw in the method of calculating correlations.

Math trap

As I mentioned above, the formula for calculating the correlation coefficient is very sensitive to the signs of the deviations of values from their averages. If these deviations often have the same signs, the result is a high value of the correlation coefficient. But will this value make sense? The answer is not at all obvious. Consider a practical example. Suppose the charts of two quantities show a gap at the same moment.

Then the new values of these quantities will systematically appear on the same side of their average values. This will result in a high positive correlation. Unfortunately, this information is of no use, because apart from the presence of the gap, the charts have nothing in common. This is simply a good illustration of the fact that only stationary series of values, i.e. series without a trend component, may be used when calculating a correlation. This means that calculating correlations in the world of financial assets inevitably leads to an overestimation of the significance of factors that in reality are random in nature. To be clear: the point is not to seek out these factors and introduce special corrections for them, but to show the very essence of the phenomenon and not to look for another Grail where it does not exist.

However, not everything is so bad. There is a way to get around the influence of trends: calculate the correlation not of the prices themselves but of their increments. Then the gap mentioned above turns out to be a single statistical outlier whose effect on the result is much smaller. It remains only to wait until such an approach prevails.

It is not always possible to find fresh data on the correlation of assets. In such cases, it can be calculated using Microsoft Excel. To do this, the quotes are written as two ranges of cells, and then a function of the following kind is entered in one of the free cells: =CORREL(array1; array2). An array might look like this, for example: A1:A100. For calculating the correlation of price increments this program is doubly useful, because the increments themselves must first be calculated from the closing prices.

Summary

The correlation between asset prices is an important tool for both data analysis and risk management in portfolio investment. But, like all statistical approaches, it is not without serious drawbacks:

  • the presence of a pronounced correlation between data in the past cannot guarantee it in the future;
  • the mathematical model used has large errors during trend periods.

The correlation approach brings maximum benefit when used in addition to other methods of analysis and money management. In the comments, I propose to discuss how you can earn on the correlation of specific assets. I gave my examples in the article; I am waiting for yours.

All profit!

Scientific concepts are always popular. The verb "correlate" is widely used by journalists and politicians, sometimes out of place. Usually the term "correlation" refers to any connection.

People have long noticed that all the phenomena occurring on our planet, to some extent, influence each other. It is not always easy to find connections between them, but, nevertheless, they exist. Speaking of the interdependence of events, the word "correlation" is often used. Most often it is used by economists and analysts.

Let's figure out what this concept actually means.

Correlation: definition

Perhaps the first in the scientific world to speak about correlation was the paleontologist Georges Cuvier. At the turn of the 18th and 19th centuries, he made a number of discoveries in the field of comparative anatomy. As a result of these discoveries, Cuvier formulated the law of the correlation of parts, according to which changes in the structure of one of an animal's organs lead to changes in the structure of other organs. Based on this knowledge, Cuvier learned to restore the appearance of fossil animals from individual surviving fragments.

As for statistics, the concept of correlation became established in this science later, at the end of the 19th century, thanks to the English biologist Francis Galton.

Correlation is not just a relation, but rather a relationship or co-relation.

The formula for obtaining the correlation coefficient was derived by Galton's student, mathematician and biologist K. Pearson.

Correlation coefficient

Correlation is a statistical relationship between two or more random quantities: a change in the value of one of them is accompanied by a systematic change in the value of the other. If a change in one quantity does not produce such a systematic change in the other but only alters some other statistical characteristic of it, the relationship is still considered statistical, but it is not a correlation.

The correlation coefficient is used to express the degree of interdependence. The range of coefficient values is from -1 to +1.

  • If the correlation is absolute and positive (+1), then when one security rises in price, the other will rise in price to the same extent.
  • Speaking of absolute negative correlation, we mean that if the value of one security increases, then the value of a negatively correlated one falls.
  • If the correlation coefficient is zero, then there is no interdependence between the movements of the securities: they are random.

The higher the absolute value of the coefficient, the stronger the interdependence. If the absolute value of the coefficient is greater than 0.5, the relationship is pronounced.

It should be clarified that the absolute correlation of securities exists only in an ideal world. In real life, stocks are only correlated to some extent.

Pair correlation

This term is used to refer to the relationship between two specific quantities. It is known that advertising spending in the United States is closely related to the volume of the country's GDP. The correlation coefficient between these values, based on observations covering roughly 20 years, is 0.9699.

A more “down to earth” example is the relationship between the traffic of an online store page and the volume of its sales.

And, of course, hardly anyone will deny the existence of a relationship between air temperature and sales of beer or ice cream.

Correlation is the interdependence of two quantities; the correlation coefficient is an objective indicator that determines the degree of this interdependence. The correlation coefficient can be both positive and negative. As for securities, they are extremely rarely absolutely correlated.
