International Journal of Intelligent Information Systems

| Peer-Reviewed |

Discriminant Analysis for the Eigenvalues of Variance Covariance Matrix of FFT Scaling of DNA Sequences: An Empirical Study of Some Organisms

Received: Jan. 19, 2019    Accepted: Mar. 11, 2019    Published: Mar. 27, 2019


Abstract

Many studies have discussed different numerical representations of DNA sequences. One naive approach to exploring the nature of a DNA sequence is to assign numerical values (or scales) to the nucleotides and then proceed with standard time series methods. The analysis then depends on the particular assignment of numerical values. Discriminant analysis examines the dependence of one qualitative (classification) variable on several quantitative variables; the number of categories of the qualitative variable determines the number of groups to be distinguished, so discriminant analysis can be carried out for two or more groups. Its essential task is to find the optimal assignment rules that minimize the probability of misclassifying elements. In this paper, we discuss the discriminant analysis of the first, second, third and fourth eigenvalues of the variance-covariance matrix of the Fast Fourier Transform (FFT) of numerical representations of DNA sequences of five organisms: Human, E. coli, Rat, Wheat and Grasshopper. The analysis is based on three methods of discrimination (All Variables, Forward Selection and Backward Selection). Discriminant functions are obtained that discriminate among the organisms under consideration. Empirical studies are conducted to show the value of our point of view and of the applications based on it. We therefore recommend that further empirical studies be carried out for other organisms and other statistical methods using the point of view adopted here, and that the aspects stated here be applied in practice to the discrimination of DNA sequences.
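As a rough illustration of the pipeline described in the abstract, the Python sketch below turns a DNA sequence into four eigenvalue features and fits a two-group Fisher discriminant on such features. The abstract does not spell out how the variance-covariance matrix is formed, so the construction used here (four 0/1 indicator series, one per nucleotide, and the 4 x 4 covariance matrix of their FFT magnitude spectra) is an assumption, as are the function names `fft_eigenvalues` and `fisher_direction`. The paper itself works with five groups and with stepwise variable selection; the two-group rule is shown only to keep the example short.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def fft_eigenvalues(seq):
    """Return the four eigenvalues (largest first) of the covariance
    matrix of the FFT magnitudes of the nucleotide indicator series.

    Assumed construction, not taken from the paper.
    """
    seq = seq.upper()
    # One 0/1 indicator series per nucleotide: shape (4, len(seq)).
    indicators = np.array(
        [[1.0 if s == n else 0.0 for s in seq] for n in NUCLEOTIDES]
    )
    # Magnitude spectrum of each series; frequency 0 is dropped because
    # it only carries the nucleotide counts.
    spectra = np.abs(np.fft.fft(indicators, axis=1))[:, 1:]
    # np.cov treats rows as variables, so this is a 4 x 4 matrix.
    cov = np.cov(spectra)
    # eigvalsh returns ascending order for symmetric matrices; reverse it.
    return np.linalg.eigvalsh(cov)[::-1]


def fisher_direction(X1, X2):
    """Fisher's linear discriminant for two groups.

    X1 and X2 hold one observation per row. Returns the direction w and
    the midpoint threshold; x is assigned to group 1 if x @ w > threshold.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-group scatter matrix (must be non-singular).
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(Sw, m1 - m2)
    threshold = w @ (m1 + m2) / 2.0
    return w, threshold
```

With feature matrices `X1` and `X2` built by stacking `fft_eigenvalues` outputs for the sequences of two organisms (one row per sequence), a new sequence with feature vector `x` would be assigned to the first group when `x @ w > threshold`.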

DOI 10.11648/j.ijiis.20190801.15
Published in International Journal of Intelligent Information Systems (Volume 8, Issue 1, February 2019)
Page(s) 26-42
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

FFT Scaling, DNA, Classification, Discriminant Analysis (DA), All Variables, Forward Selection, Backward Selection, Wilks-Lambda, Eigenvalue

Cite This Article
  • APA Style

    Salah Hamza Abid, Jinan Hamza Farhood. (2019). Discriminant Analysis for the Eigenvalues of Variance Covariance Matrix of FFT Scaling of DNA Sequences: An Empirical Study of Some Organisms. International Journal of Intelligent Information Systems, 8(1), 26-42. https://doi.org/10.11648/j.ijiis.20190801.15


    ACS Style

    Salah Hamza Abid; Jinan Hamza Farhood. Discriminant Analysis for the Eigenvalues of Variance Covariance Matrix of FFT Scaling of DNA Sequences: An Empirical Study of Some Organisms. Int. J. Intell. Inf. Syst. 2019, 8(1), 26-42. doi: 10.11648/j.ijiis.20190801.15


    AMA Style

    Salah Hamza Abid, Jinan Hamza Farhood. Discriminant Analysis for the Eigenvalues of Variance Covariance Matrix of FFT Scaling of DNA Sequences: An Empirical Study of Some Organisms. Int J Intell Inf Syst. 2019;8(1):26-42. doi: 10.11648/j.ijiis.20190801.15


  • BibTeX

    @article{10.11648/j.ijiis.20190801.15,
      author = {Salah Hamza Abid and Jinan Hamza Farhood},
      title = {Discriminant Analysis for the Eigenvalues of Variance Covariance Matrix of FFT Scaling of DNA Sequences: An Empirical Study of Some Organisms},
      journal = {International Journal of Intelligent Information Systems},
      volume = {8},
      number = {1},
      pages = {26-42},
      doi = {10.11648/j.ijiis.20190801.15},
      url = {https://doi.org/10.11648/j.ijiis.20190801.15},
      eprint = {https://download.sciencepg.com/pdf/10.11648.j.ijiis.20190801.15},
     year = {2019}
    }
    


  • RIS

    TY  - JOUR
    T1  - Discriminant Analysis for the Eigenvalues of Variance Covariance Matrix of FFT Scaling of DNA Sequences: An Empirical Study of Some Organisms
    AU  - Salah Hamza Abid
    AU  - Jinan Hamza Farhood
    Y1  - 2019/03/27
    PY  - 2019
    N1  - https://doi.org/10.11648/j.ijiis.20190801.15
    DO  - 10.11648/j.ijiis.20190801.15
    T2  - International Journal of Intelligent Information Systems
    JF  - International Journal of Intelligent Information Systems
    JO  - International Journal of Intelligent Information Systems
    SP  - 26
    EP  - 42
    PB  - Science Publishing Group
    SN  - 2328-7683
    UR  - https://doi.org/10.11648/j.ijiis.20190801.15
    VL  - 8
    IS  - 1
    ER  - 


Author Information
  • Mathematics Department, Education College, Al-Mustansiriya University, Baghdad, Iraq

  • Mathematics Department, Education College, Babylon University, Babil, Iraq
