In scientific research, there is often a need to find a relationship between resultant and factor variables (crop yield and amount of precipitation, height and weight of people in groups homogeneous by sex and age, pulse rate and body temperature, and so on).
Factor variables are the attributes that contribute to changes in the resultant variables associated with them.
There are many definitions of the term. Based on the foregoing, we can say that correlation analysis is a method used to test the hypothesis that the relationship between two or more variables is statistically significant, when the researcher can measure the variables but cannot change them.
There are other definitions of the concept as well. Correlation analysis is a method of processing statistical data that consists in studying the correlation coefficients between variables. The correlation coefficients of one pair or of multiple pairs of attributes are compared in order to establish statistical relationships between them. Correlation analysis is also defined as a method for studying a statistical relationship between random variables that need not be strictly functional, in which a change in one random variable leads to a change in the mathematical expectation of another.
When conducting correlation analysis, keep in mind that it can be applied to any combination of attributes, including ones that are absurd in relation to each other and have no causal connection.
In this case, one speaks of a false (spurious) correlation.
Based on the above definitions, the following tasks of the method can be formulated: to obtain information about one of the variables of interest from the other, and to determine the closeness of the relationship between the studied variables.
Correlation analysis involves determining the relationship between the studied characteristics, and therefore the task of correlation analysis can be supplemented with the following:
A resultant indicator may depend on one or on several factors. Correlation analysis can be applied when a large number of observations of the resultant and factor indicators is available; the factors under study must be quantitative and recorded in specific sources. If the data follow a normal distribution, the result of the correlation analysis is the Pearson correlation coefficient; if they do not, Spearman's rank correlation coefficient is used.
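The difference between the two coefficients can be illustrated with a minimal sketch. It assumes plain Python lists as input; the helper names `pearson`, `average_ranks` and `spearman` and the sample data are hypothetical and are not taken from the study described here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson coefficient of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

On this sample the rank coefficient is exactly 1 because the ordering of the two variables coincides, while the Pearson coefficient is slightly lower because the relationship is not linear.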
When applying this method, it is necessary to determine the factors that influence the resultant indicators. They are selected on the basis that there must be a causal relationship between the indicators. When building a multifactor correlation model, the factors that have a significant impact on the resultant indicator are selected. It is preferable to exclude interdependent factors whose pair correlation coefficient exceeds 0.85, as well as factors whose relationship with the resultant indicator is non-linear or functional.
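The 0.85 rule for interdependent factors can be sketched as a simple filter. This is only one possible reading: the text does not say which factor of an interdependent pair to drop, so the greedy choice below (keep the factor that appears first) is an assumption, and the function `drop_collinear` and the sample factors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_collinear(factors, threshold=0.85):
    """Keep factors in order, skipping any whose |r| with an already kept
    factor exceeds the threshold (assumed greedy rule, see lead-in)."""
    kept = {}
    for name, values in factors.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical factor observations.
factors = {
    "rainfall_mm": [10, 20, 30, 40, 50],
    "rainfall_in": [0.39, 0.79, 1.18, 1.57, 1.97],  # almost a rescaling of rainfall_mm
    "fertilizer_kg": [5, 1, 4, 2, 3],
}
print(list(drop_collinear(factors)))  # ['rainfall_mm', 'fertilizer_kg']
```

In practice the choice of which factor to keep is usually made on subject-matter grounds rather than by input order.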
The results of correlation analysis can be presented in text and graphic form. In the first case they are given as a correlation coefficient, in the second as a scatter diagram.
When there is no correlation between the parameters, the points on the diagram are scattered randomly. A medium degree of relationship shows more order: the points lie at a more or less uniform distance from a central line. A strong relationship tends toward a straight line, and at r = ±1 the points lie exactly on a straight line. An inverse correlation runs from the upper left to the lower right corner, a direct one from the lower left to the upper right.
In addition to the traditional 2D scatter diagram, 3D graphical displays of correlation analysis are now also used.
A scatterplot matrix is also used; it displays all the pairwise plots in one picture in matrix form. For n variables, the matrix contains n rows and n columns. The diagram at the intersection of the i-th row and the j-th column is a plot of variable Xi against variable Xj. Thus each row and each column corresponds to one dimension, and a single cell displays a scatter diagram of two dimensions.
The closeness of the correlation is determined by the correlation coefficient (r): strong for r from ±0.7 to ±1, medium for r from ±0.3 to ±0.699, weak for r from 0 to ±0.299. This classification is not strict. The figure shows a slightly different scheme.
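The thresholds above can be written as a small helper. This is a sketch of the classification given in the text only; the function name `correlation_strength` is hypothetical, and since the classification is not strict, the cut-off values are a convention rather than a rule.

```python
def correlation_strength(r):
    """Classify a correlation coefficient by its absolute value."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.3:
        return "medium"
    return "weak"


print(correlation_strength(0.716))  # strong
print(correlation_strength(-0.45))  # medium
print(correlation_strength(0.1))    # weak
```

The sign of r is ignored here because it describes the direction of the relationship (direct or inverse), not its closeness.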
A curious study was undertaken in the UK. It examined the connection between smoking and lung cancer and was carried out using correlation analysis. The observations are presented below.
Occupational group | Smoking | Mortality |
Farmers, foresters and fishermen | 77 | 84 |
Miners and quarry workers | 137 | 116 |
Manufacturers of gas, coke and chemicals | 117 | 123 |
Glass and Ceramics Manufacturers | 94 | 128 |
Workers for furnaces, forging, casting and rolling mills | 116 | 155 |
Electrical and Electronics Workers | 102 | 101 |
Engineering and related professions | 111 | 118 |
Woodworking production | 93 | 113 |
Tanners | 88 | 104 |
Textile workers | 102 | 88 |
Workwear Manufacturers | 91 | 104 |
Employees of the food, drinking and tobacco industry | 104 | 129 |
Paper and printing manufacturers | 107 | 86 |
Manufacturers of other products | 112 | 96 |
Builders | 113 | 144 |
Artists and Decorators | 110 | 139 |
Drivers of stationary engines, cranes, etc. | 125 | 113 |
Workers not included elsewhere | 133 | 146 |
Transport and communications workers | 115 | 128 |
Warehouse workers, storekeepers, packers and workers filling machines | 105 | 115 |
Office workers | 87 | 79 |
Sellers | 91 | 85 |
Sports and recreation workers | 100 | 120 |
Administrators and Managers | 76 | 60 |
Professionals, technicians and artists | 66 | 51 |
We begin the correlation analysis. For clarity, it is better to start with the graphical method, for which we construct a scatter diagram.
It shows a direct relationship. However, it is difficult to draw a definitive conclusion from the graphical method alone, so we continue the correlation analysis. An example of calculating the correlation coefficient is presented below.
Using software tools (MS Excel, for example, is described below), we determine the correlation coefficient. It equals 0.716, which indicates a strong relationship between the studied parameters. To assess the statistical reliability of this value, we use the corresponding table of critical values: subtracting 2 from the 25 pairs gives 23 degrees of freedom, and in that row of the table we find the critical r for p = 0.01 (a stricter level because these are medical data; in other cases p = 0.05 is sufficient), which is 0.51 for this analysis. Since the calculated r is greater than the critical r, the correlation coefficient is considered statistically significant.
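The same calculation can be reproduced outside a spreadsheet. The sketch below takes the 25 pairs from the table above and the critical value 0.51 quoted in the text; the variable and function names are hypothetical.

```python
from math import sqrt

# Smoking and mortality columns of the table, in row order.
smoking = [77, 137, 117, 94, 116, 102, 111, 93, 88, 102, 91, 104, 107,
           112, 113, 110, 125, 133, 115, 105, 87, 91, 100, 76, 66]
mortality = [84, 116, 123, 128, 155, 101, 118, 113, 104, 88, 104, 129, 86,
             96, 144, 139, 113, 146, 128, 115, 79, 85, 120, 60, 51]


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson(smoking, mortality)
degrees_of_freedom = len(smoking) - 2  # 25 pairs minus 2
R_CRITICAL = 0.51  # critical r for p = 0.01 and 23 degrees of freedom, as quoted in the text

print(round(r, 3))          # 0.716
print(degrees_of_freedom)   # 23
print(r > R_CRITICAL)       # True: the coefficient is statistically significant
```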
The described type of statistical data processing can be carried out using software, in particular MS Excel. Correlation analysis in Excel involves calculating the following parameters using functions:
1. The correlation coefficient is determined using the function CORREL(array1; array2), where array1 and array2 are the cell ranges containing the values of the resultant and factor variables.
The linear correlation coefficient is also called the Pearson correlation coefficient, and therefore, starting with Excel 2007, you can use the PEARSON function with the same arrays.
Graphic display of the correlation analysis in Excel is performed using the "Charts" panel with the choice of "Scatter chart".
After specifying the source data we get the graph.
2. Estimating the significance of the pair correlation coefficient using Student's t-test. The calculated t-value is compared with the tabular (critical) value of this statistic, taken from the corresponding table for the given significance level and number of degrees of freedom. The critical value is obtained using the function TINV(probability; degrees_freedom).
3. Matrix of pair correlation coefficients. The analysis is performed with the “Data Analysis” tool, in which “Correlation” is selected. Pair correlation coefficients are evaluated statistically by comparing their absolute values with the tabular (critical) value. If the calculated pair correlation coefficient exceeds the critical one, then, at the given probability level, the null hypothesis of no linear relationship is rejected and the relationship is considered significant.
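Steps 2 and 3 can also be sketched in code. The example below is an illustration under stated assumptions, not a reproduction of what Excel does internally: the helper names and the three sample columns are hypothetical, the t-statistic uses the standard formula t = r·√(n − 2) / √(1 − r²), and the critical value 2.807 is the standard two-tailed table value for p = 0.01 and 23 degrees of freedom, entered by hand.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """Student's t for a pair correlation coefficient r from n pairs.
    Not defined for r = +1 or r = -1 (division by zero)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def correlation_matrix(columns):
    """Matrix of pair correlation coefficients as a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical observations of two factors and one resultant variable.
columns = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "y": [1, 3, 2, 5, 4, 6],
}
matrix = correlation_matrix(columns)
for name in columns:
    print(name, [round(matrix[name][other], 2) for other in columns])

# Significance of the coefficient from the worked example: r = 0.716, n = 25.
T_CRITICAL = 2.807  # two-tailed, p = 0.01, 23 degrees of freedom (table value)
t = t_statistic(0.716, 25)
print(round(t, 2), t > T_CRITICAL)
```

Comparing t with the critical t and comparing r with the critical r are two forms of the same test, so both lead to the same conclusion for a given significance level.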
Using correlation analysis in scientific research makes it possible to determine the relationship between various factors and resultant indicators. It should be borne in mind that a high correlation coefficient can also be obtained from an absurd pair or set of data, and therefore this type of analysis should be carried out on a sufficiently large data set.
After obtaining the calculated value of r, it is advisable to compare it with the critical r to confirm the statistical reliability of the value. Correlation analysis can be carried out manually using formulas or with software tools, in particular MS Excel. There you can also build a scatter diagram to visualize the relationship between the studied factors and the resultant variable.