Discrimination of populations under covariance matrix heterogeneity and non-normal random vectors in genetic diversity studies

Authors

  • Vitor Prado de Carvalho, Universidade Federal de Viçosa
  • Ithalo Coelho de Sousa, Universidade Federal de Viçosa
  • Moysés Nascimento, Universidade Federal de Viçosa
  • Ana Carolina C. Nascimento, Universidade Federal de Viçosa
  • Cosme Damião Cruz, Universidade Federal de Viçosa

DOI:

https://doi.org/10.15361/1984-5529.2018v46n4p344-352

Abstract

Genetic diversity analysis guides the choice of appropriate parents in breeding programs. Multivariate statistical methods such as discriminant analysis are commonly used in these studies. However, reliable results require that assumptions such as covariance matrix homogeneity and multivariate normality of the observation vector be met. Artificial Neural Networks (ANN), Support Vector Machines (SVM), Decision Trees (DT) and their refinements do not rely on these assumptions and may be used in the choice of appropriate parents. This study evaluates the robustness of Fisher's discriminant function under covariance matrix heterogeneity and multivariate non-normal random vectors. The results were compared with those obtained from Quadratic Discriminant Analysis (QDA), ANN, SVM and DT. Scenarios characterized by heterogeneous covariance matrices and multivariate non-normal random vectors were simulated. Considering the apparent error rate (APER), the SVM method (APER-Normal = 0.07; APER-Poisson = 0.13) and the quadratic discriminant method (APER-Normal = 0.09; APER-Poisson = 0.09) presented better results for scenarios simulated with covariance matrix heteroscedasticity. For scenarios with multivariate normality and covariance matrix homoscedasticity, the SVM (APER = 0.15) and ANN (APER = 0.06) presented the best results. For situations in which the data had a multivariate Poisson distribution and covariance matrix homogeneity, the SVM (APER = 0.15), Fisher's discriminant function (APER = 0.19) and ANN (APER = 0.19) presented better performances. Finally, the DT refinements (Bagging, Random Forest and Boosting) presented APER values below 0.25 and are shown to be viable alternatives.
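To make the comparison criterion concrete, the following is a minimal sketch of how an apparent error rate could be computed for a linear rule (pooled covariance) and a quadratic rule (separate covariances) on simulated heteroscedastic data. It is not the authors' simulation: the means, covariance matrices, sample size, seed, equal priors, and the function names `simulate`, `classify`, and `aper` are all assumptions chosen for illustration. APER is taken here as the proportion of training observations misclassified by the rule fitted to those same observations.

```python
import numpy as np

def simulate(rng, n=200):
    # Two bivariate normal populations with unequal covariance matrices
    # (hypothetical parameter values, chosen only for illustration).
    mean0, mean1 = np.zeros(2), np.array([1.5, 1.5])
    cov0 = np.array([[1.0, 0.3], [0.3, 1.0]])
    cov1 = np.array([[4.0, -1.2], [-1.2, 3.0]])
    X = np.vstack([rng.multivariate_normal(mean0, cov0, n),
                   rng.multivariate_normal(mean1, cov1, n)])
    y = np.repeat([0, 1], n)
    return X, y

def classify(X, y, pooled):
    # Gaussian discriminant scores with equal priors (an assumption).
    # pooled=True  -> one common covariance matrix, giving a linear rule.
    # pooled=False -> one covariance matrix per population, giving a quadratic rule.
    classes = np.unique(y)
    means = [X[y == k].mean(axis=0) for k in classes]
    covs = [np.cov(X[y == k], rowvar=False) for k in classes]
    if pooled:
        sizes = [np.sum(y == k) for k in classes]
        s = sum((n_k - 1) * c for n_k, c in zip(sizes, covs))
        covs = [s / (len(y) - len(classes))] * len(classes)
    scores = []
    for m, c in zip(means, covs):
        d = X - m
        maha = np.einsum("ij,jk,ik->i", d, np.linalg.inv(c), d)
        scores.append(-0.5 * maha - 0.5 * np.linalg.slogdet(c)[1])
    return classes[np.argmax(np.column_stack(scores), axis=1)]

def aper(X, y, pooled):
    # Apparent error rate: share of training observations misclassified
    # by the rule estimated from the same observations.
    return float(np.mean(classify(X, y, pooled) != y))

rng = np.random.default_rng(0)
X, y = simulate(rng)
aper_linear = aper(X, y, pooled=True)
aper_quadratic = aper(X, y, pooled=False)
print(f"APER linear rule:    {aper_linear:.3f}")
print(f"APER quadratic rule: {aper_quadratic:.3f}")
```

Because APER is evaluated on the data used to fit the rule, it tends to be optimistic; it serves here only as the comparison measure named in the abstract.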

Author Biographies

Vitor Prado de Carvalho, Universidade Federal de Viçosa

Doctoral student in Applied Statistics and Biometrics

Ithalo Coelho de Sousa, Universidade Federal de Viçosa

Doctoral student in Applied Statistics and Biometrics

Moysés Nascimento, Universidade Federal de Viçosa

Adjunct Professor, Department of Statistics.

Ana Carolina C. Nascimento, Universidade Federal de Viçosa

Adjunct Professor, Department of Statistics.

Cosme Damião Cruz, Universidade Federal de Viçosa

Adjunct Professor, Department of General Biology.

Published

28/11/2018

Section

Statistics