Spurious Correlation and the Closure Property of Compositional Data in Geological Sciences
1 - University of Kashan
Keywords: Closure property, Compositional data, Log-ratio transformations, Robust statistical method, Spurious correlation
Abstract:
In the earth sciences, measurements typically yield compositional data, which are subject to the closure property: the parts of each sample sum to a constant (for example, 100%). Applying common statistical methods directly to such data introduces spurious correlations, so the findings do not represent the underlying data. This article presents a set of transformations that open closed compositional data: the additive log-ratio (alr), the centered log-ratio (clr), and the isometric log-ratio (ilr). All of these transformations are defined in terms of logarithms of ratios. The clr transformation is then applied to a soil chemical data set, and cluster analysis is performed on the clr-transformed data using the Spearman correlation coefficient matrix as the distance measure. Finally, the effect of the clr transformation on spurious correlations, skewness, and outliers in the data is evaluated using the R statistical software.
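The three transformations and the closure effect described above can be illustrated with a minimal sketch. The article itself works in R; the Python below uses only the standard library, and every function name (`closure`, `alr`, `clr`, `ilr`, `pearson`) as well as the simulated data are assumptions made for illustration, not the authors' code or data. The ilr shown uses one common choice of orthonormal basis (pivot-style coordinates); other bases are equally valid.

```python
import math
import random


def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


def alr(comp):
    """Additive log-ratio: log of each part over the last part (D-1 values)."""
    ref = comp[-1]
    return [math.log(p / ref) for p in comp[:-1]]


def clr(comp):
    """Centered log-ratio: log of each part over the geometric mean (D values)."""
    logs = [math.log(p) for p in comp]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def ilr(comp):
    """Isometric log-ratio in pivot-style coordinates (D-1 values).

    Coordinate i contrasts part i with the geometric mean of the parts after it.
    """
    logs = [math.log(p) for p in comp]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]
        k = len(rest)
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return coords


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated (not real) data: three independent positive variables per sample.
rng = random.Random(0)
raw = [[rng.lognormvariate(0.0, 0.5) for _ in range(3)] for _ in range(5000)]
closed = [closure(row) for row in raw]

# The first two variables are independent before closure ...
r_raw = pearson([r[0] for r in raw], [r[1] for r in raw])
# ... but the constant-sum constraint forces a negative correlation afterwards.
r_closed = pearson([r[0] for r in closed], [r[1] for r in closed])

print("correlation before closure:", round(r_raw, 3))
print("correlation after closure: ", round(r_closed, 3))
print("clr of first sample:", [round(v, 3) for v in clr(closed[0])])
```

In this sketch the negative correlation after closure arises purely from the constant-sum constraint, since the simulated variables are independent by construction. The clr values of a sample always sum to zero, and the ilr coordinates have the same Euclidean norm as the clr values, which is the isometry that gives the transformation its name.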