The standard values of the alpha significance level. The concept of the level of statistical significance

When justifying a statistical inference, one must decide where to draw the line between accepting and rejecting the null hypothesis. Because of random influences in the experiment, this boundary cannot be drawn with absolute precision. It is based on the concept of a significance level. The significance level is the probability of incorrectly rejecting the null hypothesis; in other words, it is the probability of a Type I error in decision making. To denote this probability, either the Greek letter α or the Latin letter p is usually used. In what follows, we will use the letter p.

Historically, in the applied sciences that use statistics, and in psychology in particular, the lowest level of statistical significance is taken to be p = 0.05, a sufficient level is p = 0.01, and the highest level is p = 0.001. Therefore, the statistical tables given in the appendices of statistics textbooks usually list tabled values for the levels p = 0.05, p = 0.01 and p = 0.001. Sometimes tabled values are also given for the levels p = 0.025 and p = 0.005.

The values 0.05, 0.01 and 0.001 are the so-called standard levels of statistical significance. In the statistical analysis of experimental data, the psychologist, depending on the objectives and hypotheses of the study, must choose the required level of significance. As you can see, the largest value here, or the lower bound of the level of statistical significance, is 0.05: this means allowing five errors in a sample of one hundred elements (cases, subjects), or one error in twenty. It is held that we cannot allow ourselves to err six, seven, or more times out of a hundred; the cost of such mistakes would be too high.

Note that modern statistical software packages do not use the standard significance levels but levels calculated directly in the course of applying the corresponding statistical method. These levels, denoted by the letter p, can take any value in the range from 0 to 1, for example p = 0.7, p = 0.23 or p = 0.012. In the first two cases the significance levels obtained are clearly too high, and it is impossible to say that the result is significant. In the latter case, the result is significant at the level of 0.012, which is an acceptable level.
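For illustration only, here is a minimal Python sketch (with invented data; the use of scipy is an assumption, not something prescribed by the text) of how such an exact significance level is obtained directly from a statistical routine rather than read from a table:

```python
# A minimal sketch (hypothetical data) of how a statistics package reports an
# exact significance level instead of a standard tabled one.
from scipy import stats

group_a = [12, 15, 14, 10, 13, 16, 11, 14]   # invented sample values
group_b = [18, 17, 20, 15, 19, 16, 21, 18]

t_stat, p_value = stats.ttest_ind(group_a, group_b)  # two-sample t-test
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# The printed p is an exact level such as 0.012 or 0.23, not a tabled 0.05 / 0.01.
```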

The rule for accepting a statistical conclusion is as follows: on the basis of the experimental data obtained, the psychologist calculates the so-called empirical statistic, or empirical value, using the chosen statistical method. It is convenient to denote this value as H_emp. The empirical statistic H_emp is then compared with two critical values, which correspond to the 5% and 1% significance levels for the selected statistical method and which are denoted as H_cr. The quantities H_cr are found for a given statistical method from the corresponding tables given in the appendix of any statistics textbook. These quantities are, as a rule, different, and for convenience they will be referred to below as H_cr1 and H_cr2. It is convenient to represent the critical values H_cr1 and H_cr2 found from the tables in the following standard notation:

We emphasize, however, that the notation H_emp and H_cr is used here merely as an abbreviation of the word "number". In every statistical method, its own symbolic designations are adopted for all these quantities: both for the empirical value calculated by the corresponding statistical method and for the critical values found from the corresponding tables. For example, when calculating Spearman's rank correlation coefficient, the following critical values were found from the table of critical values of this coefficient, which for this method is denoted by the Greek letter ρ ("rho"): for p = 0.05 the tabled value is ρ_cr1 = 0.61, and for p = 0.01 it is ρ_cr2 = 0.76.

In the standard notation adopted below, it looks like this:

Now we need to compare our empirical value with the two critical values found in the tables. This is best done by placing all three numbers on the so-called "significance axis". The "significance axis" is a straight line whose left end corresponds to 0 (although, as a rule, the 0 is not marked on the line itself) and along which the numbers increase from left to right. In essence, this is the usual Ox axis of the Cartesian coordinate system familiar from school. The peculiarity of this axis, however, is that three sections, or "zones", are distinguished on it. One extreme zone is called the zone of insignificance, the other extreme zone the zone of significance, and the intermediate zone the zone of uncertainty. The boundaries of the three zones are H_cr1 for p = 0.05 and H_cr2 for p = 0.01, as shown in the figure.

Depending on the decision rule (inference rule) prescribed in this statistical method, two options are possible.

Option 1: the alternative hypothesis is accepted if H_emp ≥ H_cr.

Option 2: the alternative hypothesis is accepted if H_emp ≤ H_cr.

The value H_emp calculated by a given statistical method must necessarily fall into one of the three zones.

If the empirical value falls into the zone of insignificance, the hypothesis H0 of no differences is accepted.

If H_emp falls into the zone of significance, the alternative hypothesis H1 of the presence of differences is accepted, and the hypothesis H0 is rejected.

If H_emp falls into the zone of uncertainty, the researcher faces a dilemma. Depending on the importance of the problem being solved, he may consider the obtained statistical estimate reliable at the 5% level, thereby accepting hypothesis H1 and rejecting hypothesis H0, or unreliable at the 1% level, thereby accepting hypothesis H0. We emphasize, however, that this is exactly the case in which a psychologist can make an error of the first or second kind. As discussed above, in these circumstances it is best to increase the sample size.

We also emphasize that the value H_emp may coincide exactly with either H_cr1 or H_cr2. In the first case, one can consider the estimate reliable exactly at the 5% level and accept hypothesis H1, or, conversely, accept hypothesis H0. In the second case, as a rule, the alternative hypothesis H1 of the presence of differences is accepted, and hypothesis H0 is rejected.
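For illustration only, a minimal Python sketch of the three-zone decision rule described above (the function name is invented; the numbers reuse the Spearman ρ example from the text, and the rule assumes, as in option 1, that larger empirical values mean greater significance):

```python
# A sketch of the three-zone decision rule from the "significance axis",
# assuming (as in option 1) that larger empirical values mean greater significance.
def classify(h_emp: float, h_cr_05: float, h_cr_01: float) -> str:
    """Return the zone into which the empirical statistic falls."""
    if h_emp < h_cr_05:
        return "zone of insignificance: accept H0"
    if h_emp >= h_cr_01:
        return "zone of significance: reject H0, accept H1"
    return "zone of uncertainty: significant at 5% but not at 1%"

# Spearman example from the text: rho_cr(0.05) = 0.61, rho_cr(0.01) = 0.76
print(classify(0.70, 0.61, 0.76))   # -> zone of uncertainty
```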

The p-value is a quantity used when testing statistical hypotheses. In effect, it is the probability of error when rejecting the null hypothesis (a Type I error). Hypothesis testing using the p-value is an alternative to the classical testing procedure based on the critical value of the distribution.

Usually, the p-value is equal to the probability that a random variable with a given distribution (the distribution of the test statistic under the null hypothesis) takes a value no less than the actual value of the test statistic (Wikipedia).

In other words, the p-value is the smallest significance level (i.e., the probability of rejecting a true null hypothesis) at which the computed test statistic leads to rejection of the null hypothesis. Typically, the p-value is compared with the generally accepted standard significance levels of 0.05 or 0.01.

For example, if the value of the test statistic calculated from the sample corresponds to p = 0.005, this means that, were the null hypothesis true, data at least this extreme would be obtained with a probability of only 0.5%. Thus, the smaller the p-value, the better, since it strengthens the grounds for rejecting the null hypothesis and increases the expected significance of the result.
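As a small illustration of this tail-probability definition, here is a Python/scipy sketch (the observed z value is invented; assuming, purely for illustration, a standard normal test statistic under the null hypothesis):

```python
# A sketch of the definition: the p-value is the probability, under H0, that the
# test statistic is at least as extreme as the value actually observed.
from scipy import stats

z_observed = 2.58                                  # hypothetical observed z statistic
p_one_sided = stats.norm.sf(z_observed)            # P(Z >= z_observed)
p_two_sided = 2 * stats.norm.sf(abs(z_observed))   # both tails
print(round(p_one_sided, 4), round(p_two_sided, 4))  # ~0.0049 and ~0.0099
```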

An interesting explanation of this is on Habré.

Statistical analysis is starting to look like a black box: the input is data, the output is a table of main results and a p-value.

What does the p-value tell us?

Suppose we decided to find out whether there is a relationship between a fondness for violent computer games and aggressiveness in real life. To this end, two groups of schoolchildren of 100 people each were formed at random (group 1 consisting of shooter fans, group 2 of those who do not play computer games). The number of fights with peers serves as the indicator of aggressiveness. In our imaginary study, it turned out that the group of gamer schoolchildren did indeed conflict with their peers noticeably more often. But how do we find out how statistically significant the resulting differences are? Maybe we obtained the observed difference purely by chance? To answer these questions, the p-value is used: it is the probability of obtaining such or more pronounced differences provided that there are actually no differences in the general population. In other words, it is the probability of obtaining such or even stronger differences between our groups, provided that computer games in fact do not affect aggressiveness in any way. It doesn't sound that difficult. However, this particular statistic is very often misinterpreted.
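To make the imaginary study concrete, here is a hypothetical Python sketch (the fight counts are simulated from invented Poisson rates; numpy and scipy are assumptions, not part of the original example):

```python
# A hypothetical sketch of the imaginary study: two groups of 100 schoolchildren,
# the number of fights as the aggressiveness indicator, compared with a t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gamers     = rng.poisson(lam=3.0, size=100)   # invented fight counts, group 1
non_gamers = rng.poisson(lam=2.5, size=100)   # invented fight counts, group 2

t_stat, p_value = stats.ttest_ind(gamers, non_gamers)
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
# The p-value: the chance of differences this large (or larger) if, in the
# population, games had no effect on aggressiveness at all.
```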

p-value examples

So, we compared the two groups of schoolchildren with each other in terms of level of aggressiveness using a standard t-test (or a nonparametric chi-square test, whichever is more appropriate in this situation) and found that the coveted p-level of significance is less than 0.05 (say, 0.04). But what does the obtained p-value actually tell us? If the p-value is the probability of obtaining such or more pronounced differences provided that there are actually no differences in the general population, then which of the following do you think is the correct statement:

1. Computer games are the cause of aggressive behavior with a 96% probability.
2. The probability that aggressiveness and computer games are not related is 0.04.
3. If we got a p-level of significance greater than 0.05, this would mean that aggressiveness and computer games are not related in any way.
4. The probability of getting such differences by chance is 0.04.
5. All statements are wrong.

If you chose the fifth option, then you are absolutely right! But, as numerous studies show, even people with significant experience in data analysis often misinterpret p-values.

Let's take each answer in order:

The first statement is an example of the correlation-causation error: the fact that two variables are significantly related tells us nothing about cause and effect. Perhaps it is more aggressive people who prefer to spend time playing computer games, and it is not computer games at all that make people more aggressive.

The second statement is more interesting. The point is that we initially take it as given that there really are no differences. And, keeping this in mind as a fact, we calculate the p-value. Therefore, the correct interpretation is: "Assuming that aggressiveness and computer games are not related in any way, the probability of obtaining such or even more pronounced differences was 0.04."

As for the third statement: what if we obtained insignificant differences? Does this mean that there is no relationship between the studied variables? No, it only means that differences may exist, but our results did not allow us to detect them.

The fourth statement is directly related to the definition of the p-value itself. 0.04 is the probability of obtaining these or even more extreme differences. Estimating the probability of obtaining exactly the differences seen in our experiment is, in principle, impossible!

These are the pitfalls that can lurk in the interpretation of an indicator such as the p-value. It is therefore very important to understand the mechanisms underlying the methods of analysis and the calculation of the main statistical indicators.

How to find p-value?

1. Determine the expected results of your experiment

Usually, when scientists conduct an experiment, they already have an idea of what results to consider "normal" or "typical." This may be based on the results of past experiments, on reliable data sets, on data from the scientific literature, or on some other sources. For your experiment, define the expected results and express them as numbers.

Example: earlier studies have shown that, nationwide, red cars receive speeding tickets more often than blue cars; say, the averages show a 2:1 ratio of red to blue. We want to determine whether the police in our city show the same bias toward car color. To do this, we will analyze the speeding tickets they have issued. If we take a random set of 150 speeding tickets issued to either red or blue cars, we would expect 100 tickets to be issued to red cars and 50 to blue ones if the police in our city are as biased about car color as is observed throughout the country.

2. Determine the observed results of your experiment

Now that you have determined the expected results, you need to run the experiment and find the actual (or "observed") values. You again need to express these results as numbers. If we create experimental conditions and the observed results differ from the expected ones, then there are two possibilities: either this happened by chance, or it was caused precisely by our experimental manipulation. The purpose of finding the p-value is precisely to determine whether the observed results differ from the expected ones enough for us to reject the "null hypothesis", the hypothesis that there is no relationship between the experimental variables and the observed results.

Example: in our city, we randomly selected 150 speeding tickets issued to either red or blue cars. We found that 90 tickets were issued to red cars and 60 to blue ones. This differs from the expected results of 100 and 50, respectively. Did our experiment (in this case, changing the data source from the whole country to our city) produce this change in results, or are our city police biased in exactly the same way as the national average, with the difference we see being just random variation? The p-value will help us determine this.

3. Determine the number of degrees of freedom of your experiment

The number of degrees of freedom is the degree of variability in your experiment, which is determined by the number of categories you are exploring. The equation for the number of degrees of freedom is Number of degrees of freedom = n-1, where "n" is the number of categories or variables you are analyzing in your experiment.

Example: In our experiment, there are two categories of results: one category for red cars, and one for blue cars. Therefore, in our experiment, we have 2-1 = 1 degree of freedom. If we were comparing red, blue and green cars, we would have 2 degrees of freedom, and so on.

4. Compare expected and observed results using the chi-square test

Chi-square (written "χ²") is a numerical value that measures the difference between the expected and observed values of an experiment. The equation for chi-square is χ² = Σ((o - e)²/e), where "o" is the observed value and "e" is the expected value. Sum the results of this equation over all possible outcomes (see below).

Note that this equation includes the summation operator Σ (sigma). In other words, you need to calculate ((o - e)²/e) for each possible outcome and add the numbers together to get the chi-square value. In our example we have two possible outcomes: the car that received the ticket is either red or blue. So we compute ((o - e)²/e) twice, once for the red cars and once for the blue cars.

Example: let's substitute our expected and observed values into the equation χ² = Σ((o - e)²/e). Remember that, because of the summation operator, we need to compute ((o - e)²/e) twice, once for the red cars and once for the blue cars. The calculation goes as follows:
χ² = ((90 - 100)²/100) + ((60 - 50)²/50)
χ² = ((-10)²/100) + ((10)²/50)
χ² = (100/100) + (100/50) = 1 + 2 = 3.
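As a cross-check of this hand calculation, the same statistic can be computed in Python with scipy (an illustrative sketch; the library choice is an assumption, not part of the original walkthrough):

```python
# A sketch of step 4: the chi-square statistic for the car-ticket example.
from scipy import stats

observed = [90, 60]    # red, blue tickets actually observed
expected = [100, 50]   # counts expected under the 2:1 national ratio

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2)   # 3.0, matching the hand calculation above (p is the subject of step 6)
```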

5. Choose a Significance Level

Now that we know the number of degrees of freedom in our experiment and the value of the chi-square statistic, we need to do one more thing before we can find our p-value: we need to choose a significance level. In simple terms, the significance level indicates how confident we want to be in our results. A low significance value corresponds to a low probability that the experimental results arose by chance, and vice versa. Significance levels are written as decimal fractions (such as 0.01), corresponding to the probability that the experimental results were obtained by chance (in this case, 1%).

By convention, scientists typically set the significance level of their experiments to 0.05, or 5%. This means that experimental results that meet such a criterion of significance could only be obtained with a probability of 5% purely by chance. In other words, there is a 95% chance that the results were caused by how the scientist manipulated the experimental variables, and not by chance. For most experiments, 95% confidence that there is a relationship between two variables is enough to consider that they are “really” related to each other.

Example: for our example with red and blue cars, let's follow the convention among scientists and set the significance level to 0.05.

6. Use a chi-squared distribution table to find your p-value

Scientists and statisticians use large tables of values to find the p-value of their experiments. These tables usually have a vertical axis on the left corresponding to the number of degrees of freedom and a horizontal axis on top corresponding to p-values. Use the table to first find your number of degrees of freedom, then scan that row from left to right until you find the first value greater than your chi-square value. Look at the corresponding p-value at the top of that column: your p-value lies between this number and the next one to its left.

Chi-squared distribution tables can be obtained from many sources; they can be found online or in statistics textbooks.

Example: our chi-square value was 3. Since we know that our experiment has only 1 degree of freedom, we select the very first row. We move from left to right along this row until we encounter a value greater than 3, our chi-square value. The first one we find is 3.84. Looking at the top of its column, we see that the corresponding p-value is 0.05. This means that our p-value is between 0.05 and 0.1 (the next larger p-value in the table).
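The same lookup can be reproduced without a printed table by querying the chi-square distribution directly; a small Python/scipy sketch (the library is an assumption, not part of the original step):

```python
# Instead of a printed table, the same lookup can be done with the chi-square
# survival function; a sketch for chi-square = 3 with 1 degree of freedom.
from scipy import stats

p_value = stats.chi2.sf(3.0, df=1)   # P(X^2 >= 3) under H0
print(round(p_value, 3))             # ~0.083, i.e. between 0.05 and 0.1
```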

7. Decide whether to reject or keep your null hypothesis

Since you have determined an approximate p-value for your experiment, you need to decide whether or not to reject the null hypothesis (recall, this is the hypothesis that the experimental variables you manipulated did not affect the results you observed). If your p-value is less than your significance level, congratulations: you have shown that a relationship between the variables you manipulated and the observed results is very likely. If your p-value is greater than your significance level, you cannot be sure whether the observed results were due to pure chance or to the manipulation of your variables.

Example: our p-value is between 0.05 and 0.1, which is clearly not less than 0.05, so unfortunately we cannot reject our null hypothesis. This means that we have not reached the minimum 95% confidence needed to say that the police in our city issue tickets to red and blue cars at rates that differ from the national average.

In other words, there is a 5-10% chance that the results we observe are not a consequence of the change in location (analyzing the city rather than the whole country) but simply an accident. Since we required an error of less than 5%, we cannot say we are sure that the police in our city are less biased toward red cars: there remains a small (but not negligible) chance that this is not the case.

Fundamentals of the theory of testing statistical hypotheses.

The concept of statistical hypothesis

A statistical hypothesis is an assumption about the type of distribution or about the values of unknown parameters of the general population that can be verified on the basis of sample indicators.

Examples of statistical hypotheses:

The general population is distributed according to the Gauss law (normal law).

The variances of two normal populations are equal.

To estimate the values of general parameters from sample indicators in biology, the so-called null hypothesis is used, i.e., the assumption that the general parameters judged from the sample data do not differ from each other, and that the difference observed between sample indicators is not systematic but purely random.

Together with the hypothesis put forward, a hypothesis that contradicts it is also considered. If the hypothesis put forward is rejected, the alternative hypothesis holds. It is useful to distinguish between them.

The null hypothesis (H0) is the hypothesis put forward.

The alternative hypothesis (H1) is a hypothesis that contradicts the null one.

A hypothesis that contains only one assumption is called simple, and a hypothesis that consists of a finite or infinite number of simple hypotheses is called composite.

The statistical nature of the described method for testing the null hypothesis should be emphasized; it is expressed, in particular, in the fact that the statement about the validity of the null hypothesis is accepted not absolutely, but only at a certain level of significance.

THE LEVEL OF SIGNIFICANCE is the percentage of unlikely cases that contradict the accepted hypothesis and call it into question.

In biological studies, a significance level of 5% is usually taken, which corresponds to a probability of P=0.05.

In more critical cases, when the conclusions must be especially strict, the significance level is taken as 1% (P = 0.01) or 0.1% (P = 0.001).

Thus, the accepted level of significance expresses the probability that we have decided to neglect when estimating the general parameters from the data of sample observations.

The probability of the opposite cases, when the hypothesis is credible, is called CONFIDENCE PROBABILITY.

Usually in research practice, three confidence thresholds are used:

P1 = 0.95; P2 = 0.99; P3 = 0.999.

The probability P1 = 0.95 corresponds to t = 1.96,

P2 = 0.99 corresponds to t = 2.58,

P3 = 0.999 corresponds to t = 3.29.
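These correspondences can be checked against the quantiles of the normal distribution; a small Python/scipy sketch (the library is an assumption, introduced only for illustration):

```python
# A sketch checking the three confidence thresholds against the normal
# distribution: P = 0.95, 0.99, 0.999 correspond to t = 1.96, 2.58, 3.29.
from scipy import stats

for P in (0.95, 0.99, 0.999):
    t = stats.norm.ppf((1 + P) / 2)   # two-sided critical value
    print(f"P = {P}: t = {t:.2f}")
# -> 1.96, 2.58, 3.29
```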

The value of the confidence probability, or the significance level, used when testing hypotheses is set by the researcher himself, depending on the degree of accuracy with which the study is carried out and the responsibility of the conclusions arising from it.

If the significance probability is P ≥ 0.05 (i.e., the confidence probability is below 0.95), there are no grounds to reject the null hypothesis.

If P < 0.05 (i.e., the confidence probability is at least 0.95), the null hypothesis is rejected.

Errors of the first and second kind. Significance criterion.

Significance level. Critical area

The decision to reject or accept a statistical hypothesis is made on the basis of sample data. Therefore, one has to take into account the possibility of an erroneous decision. Distinguish between Type I and Type II errors.

A Type I error consists in rejecting a correct hypothesis (i.e., the null hypothesis is rejected when it is in fact true).

A Type II error consists in accepting a wrong hypothesis (i.e., the null hypothesis is accepted when it is in fact not true).

When the null hypothesis is rejected, there is a probability that it is nevertheless true (i.e., that we commit a Type I error); this probability is denoted by α. The probability α is called the significance level.

The significance level α is the probability of committing a Type I error.

The probability of a Type II error is denoted by β, and the value 1 - β is called the power of the test.

The higher the power, the lower the probability of a Type II error.
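To make the interplay of α and power concrete, here is a hypothetical Python/scipy sketch (the effect size, sample size, and z-test setting are all invented for illustration) showing that lowering α also lowers the power 1 - β:

```python
# A sketch (with invented numbers) of how power 1 - beta depends on alpha:
# a two-sided z-test for a mean, assumed true effect of 0.5 sigma, n = 30.
from math import sqrt
from scipy import stats

def power(alpha: float, effect: float = 0.5, n: int = 30) -> float:
    z_crit = stats.norm.ppf(1 - alpha / 2)   # critical z for the two-sided test
    shift = effect * sqrt(n)                 # mean of the statistic under H1
    # probability of landing in the rejection region when H1 is true
    return stats.norm.sf(z_crit - shift) + stats.norm.cdf(-z_crit - shift)

print(round(power(0.05), 2), round(power(0.01), 2))  # smaller alpha -> lower power
```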


The permissible percentage of possible errors of the first kind is a matter of mutual agreement; among other things, the possible consequences of making an erroneous decision should be taken into account here. Wrong decisions in, say, an expert examination can have more serious consequences than an erroneously declared purity of a chemical reagent. Therefore, in the first case a higher certainty, and consequently a smaller number of possible Type I errors, should be provided than in the second case.

The following rules are usually followed.

The hypothesis being tested is rejected if a Type I error can occur in less than 100α = 1% of all cases (i.e., α ≤ 0.01). The observed difference is then considered significant.

The hypothesis being tested is accepted if a Type I error is possible in more than 100α = 5% of all cases (α ≥ 0.05). The observed difference is then considered insignificant.

The hypothesis under consideration should be examined further if the number of possible Type I errors lies between 5% and 1% (0.01 ≤ α ≤ 0.05). The detected difference is then interpreted as disputable. Additional measurements can often clarify the situation. If for some reason additional measurements are not possible, the data obtained should be interpreted on the basis of the worst case.

The choice of α is a matter of agreement: sometimes it is enough to choose 100α = 10%, while in other practical cases the possibility of an erroneous decision must be excluded as far as possible (for example, when assessing the toxic effect of a pharmaceutical preparation). The tested hypothesis is then rejected as soon as the number of possible Type I errors reaches a negligible level such as 100α = 0.1%.

Errors of the first and second kind depend on each other: the smaller α is, the larger β becomes (and vice versa). Therefore, there is no point in choosing too small a value of α for the significance test, since the unknown β then grows very large. The choice of α belongs to the planning phase of the experiment!

After the significance level is set, a rule is found according to which the given hypothesis is accepted or rejected. Such a rule is called a statistical test (criterion).

A statistical test is the rule according to which the null hypothesis is accepted or rejected.

The construction of the test consists in choosing an appropriate function T = T(X1, ..., Xn) of the observations X1, ..., Xn, which serves as a measure of the discrepancy between the experimental and hypothetical values.


This function, being a random variable, is called the test statistic.

The test statistic is a specially constructed random variable whose distribution function is known.

It is assumed that the probability distribution of T = T(X1, ..., Xn) can be calculated under the assumption that the hypothesis being tested is true, and that this distribution does not depend on the characteristics of the hypothetical distribution.

After a particular test is chosen, the set of all its possible values is divided into two non-overlapping subsets: one contains the values of the test statistic at which the null hypothesis is rejected, the other those at which it is accepted; that is, into the critical region and the region of acceptance of the hypothesis.

The critical region is the set of values of the test statistic at which the null hypothesis is rejected.

The region of acceptance of the hypothesis is the set of values of the test statistic at which the null hypothesis is accepted.

The basic principle of hypothesis testing can be formulated as follows: if the observed value of the test statistic belongs to the critical region, the hypothesis is rejected; if it belongs to the region of acceptance, the hypothesis is accepted.

Since the statistic T = T(X1, ..., Xn) is a one-dimensional random variable, all its possible values belong to a certain interval. Therefore, the critical region and the hypothesis acceptance region are also intervals, and hence there are points that separate them. Such points are called critical.

The critical values of the test are the points separating the critical region from the hypothesis acceptance region.

The critical value T_cr is found from the distribution of the statistic T as the value for which, if the hypothesis is true, the probability of the event "T falls into the critical region" equals α, a predetermined significance level; that is, T_cr is the value of the statistic T for which P(T ∈ critical region) = α.

One distinguishes one-sided (right-sided or left-sided) and two-sided critical regions. They are determined from the following expressions:

right-sided: P(T > T_cr) = α;

left-sided: P(T < T_cr) = α;

two-sided: P(T < T_cr1) + P(T > T_cr2) = α, where T_cr1 < T_cr2.

If the distribution of the statistic is symmetric about zero, then P(T < -T_cr) = P(T > T_cr), and hence P(T > T_cr) = α/2.
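A small Python/scipy sketch (assuming, purely for illustration, that the statistic T is standard normal under H0 and that α = 0.05) of how the critical points of the three kinds of regions are found:

```python
# A sketch of the three kinds of critical regions for a statistic T that is
# standard normal under H0, with alpha = 0.05.
from scipy import stats

alpha = 0.05
t_right = stats.norm.ppf(1 - alpha)       # right-sided: P(T >  t_right) = alpha
t_left  = stats.norm.ppf(alpha)           # left-sided:  P(T <  t_left)  = alpha
t_two   = stats.norm.ppf(1 - alpha / 2)   # two-sided:   P(|T| > t_two)  = alpha
print(round(t_right, 2), round(t_left, 2), round(t_two, 2))  # 1.64 -1.64 1.96
```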

Fig. 37. Critical regions: left-sided, right-sided, two-sided.

Critical points are found from tables corresponding to the distribution of the criterion.

Significance tests are divided into parametric and nonparametric.

The former are built on the basis of the parameters of the sample and represent functions of these parameters;

the latter are functions of the variants of the given set together with their frequencies.

Parametric criteria are applicable only when the population from which the sample is taken is normally distributed.

Nonparametric tests are applicable to distributions of various shapes. They have certain advantages over parametric ones owing to the lower requirements for their application, a greater range of possibilities and, often, greater ease of implementation. Of course, one must also take into account the often lower accuracy of these tests compared with parametric ones.

The results of statistical testing methods are often inconvenient for analysts. In many cases they yield insignificant (α > 0.05) or disputable differences, even though on the basis of subjective experience a "true" difference has already been assumed. In such cases additional measurements often help: the more results obtained, the smaller the differences that can be reliably detected. In no case should one be tempted to replace exact data with dubious ones based on subjective assessment.

The significance level is the probability that we considered the differences significant when they are actually random.

When we state that differences are significant at the 5% significance level, or at p < 0.05, we mean that the probability that they are nevertheless unreliable is 0.05.

When we state that differences are significant at the 1% significance level, or at p < 0.01, we mean that the probability that they are nevertheless unreliable is 0.01.

If we translate all this into more formal language, the significance level is the probability of rejecting the null hypothesis when it is true.

The error consisting in rejecting the null hypothesis when it is true is called a Type I error (see Table 1).

Tab. 1. Null and alternative hypotheses and possible outcomes of the test.

The probability of such an error is usually denoted by α. Strictly speaking, we should write in parentheses not p < 0.05 or p < 0.01 but α < 0.05 or α < 0.01.

If the error probability is α, then the probability of a correct decision is 1 - α. The smaller α is, the greater the probability of a correct decision.

Historically, in psychology it is customary to consider the 5% level (p ≤ 0.05) as the lowest level of statistical significance, the 1% level (p ≤ 0.01) as sufficient, and the 0.1% level (p ≤ 0.001) as the highest. Therefore, tables of critical values usually give the values of the criteria corresponding to the levels of statistical significance p ≤ 0.05 and p ≤ 0.01, and sometimes p ≤ 0.001. For some criteria, the tables indicate the exact significance level of their various empirical values; for example, for φ* = 1.56, p = 0.06.

However, until the level of statistical significance reaches p = 0.05, we are not yet entitled to reject the null hypothesis. We will adhere to the following rule for rejecting the hypothesis of no differences (H0) and accepting the hypothesis of the statistical significance of differences (H1).

Rule for rejecting H0 and accepting H1

If the empirical value of the criterion equals or exceeds the critical value corresponding to p ≤ 0.05, then H0 is rejected, but we cannot yet definitely accept H1.

If the empirical value of the criterion equals or exceeds the critical value corresponding to p ≤ 0.01, then H0 is rejected and H1 is accepted.

Exceptions: the sign test G, Wilcoxon's T test, and the Mann-Whitney U test. For these, the relation is inverse: the differences are significant when the empirical value does not exceed the critical value.

Fig. 4. An example of the "significance axis" for Rosenbaum's Q test.

The critical values of the criterion are designated as Q0.05 and Q0.01, and the empirical value of the criterion as Q_emp. In the figure it is enclosed in an ellipse.

To the right of the critical value Q0.01 extends the "zone of significance": empirical values that exceed Q0.01 fall here and are therefore certainly significant.

To the left of the critical value Q0.05 extends the "zone of insignificance": empirical values of Q below Q0.05 fall here and are therefore certainly insignificant.

We see that Q0.05 = 6, Q0.01 = 9, and Q_emp = 8.

The empirical value of the criterion falls in the interval between Q0.05 and Q0.01. This is the zone of "uncertainty": we can already reject the hypothesis that the differences are unreliable (H0), but we cannot yet accept the hypothesis that they are reliable (H1).

In practice, however, the researcher may already consider significant those differences that do not fall into the zone of insignificance, declaring them significant at p < 0.05 or indicating the exact significance level of the obtained empirical value of the criterion, for example p = 0.02. Using the standard tables found in all textbooks on mathematical methods, this can be done for the Kruskal-Wallis H test, Friedman's χ²r test, Page's L test, and Fisher's φ* test.

The level of statistical significance or the critical values of the criteria are determined differently when testing directional and non-directional statistical hypotheses.

With a directional statistical hypothesis, a one-tailed test is used; with a non-directional hypothesis, a two-tailed test. The two-tailed test is more stringent because it tests for differences in both directions, so the empirical value of the test that previously corresponded to the significance level p < 0.05 now corresponds only to the level p < 0.10.

We do not have to decide each time whether to use a one-tailed or a two-tailed test. The tables of critical values of the criteria are compiled so that directional hypotheses correspond to a one-tailed test and non-directional hypotheses to a two-tailed test, and the values given satisfy the requirements that apply to each of them. The researcher only needs to make sure that his hypotheses coincide in meaning and form with the hypotheses proposed in the description of each criterion.
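For illustration, a small Python/scipy sketch (hypothetical t value and degrees of freedom; the library is an assumption) of how the one-tailed and two-tailed levels relate for a symmetric statistic:

```python
# A sketch of the one-tailed / two-tailed relation for a symmetric statistic:
# the same t value that gives p ~ 0.05 one-tailed gives p ~ 0.10 two-tailed.
from scipy import stats

t_value, df = 1.70, 30                      # hypothetical empirical t and df
p_one = stats.t.sf(t_value, df)             # directional (one-tailed) hypothesis
p_two = 2 * stats.t.sf(abs(t_value), df)    # non-directional (two-tailed)
print(round(p_one, 3), round(p_two, 3))     # p_two is twice p_one
```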
