Gene Expression Prediction Using Deep Neural Networks
        
In molecular biology, gene expression is the process by which the information
encoded in an organism's genome is converted into functional products. Although
researchers have developed several clinical techniques to measure gene expression
quantitatively, these techniques are too costly for extensive use. The NIH LINCS
program revealed that human gene expressions are highly correlated. Further research
at the University of California, Irvine (UCI) led to the development of D-GEX, a
Multilayer Perceptron (MLP) model trained to predict unknown target expressions from
previously identified landmark expressions. However, owing to hardware limitations,
its authors split the target genes into different sets and built a separate model
for each set in order to profile the whole genome. This paper proposes an alternative
solution that combines a deep autoencoder with an MLP to overcome this bottleneck and
improve prediction performance. The microarray-based Gene Expression Omnibus (GEO)
dataset was used to train the neural networks. Experimental results show that the new
model, abbreviated as E-GEX, outperforms D-GEX by 16.64% in overall prediction
accuracy on the GEO dataset. The models were further tested on an RNA-Seq-based 1000G
dataset, where E-GEX was found to be 49.23% more accurate than D-GEX.
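The percentages quoted above compare the errors of two models on the same data. The abstract does not name the error metric or state how the percentage is computed, so the following is only a minimal sketch under two assumptions: that the metric is mean absolute error, and that "more accurate" means a relative reduction in that error. The function names and the expression values are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
def mean_absolute_error(predicted, measured):
    """Average absolute difference between predicted and measured values."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    if not predicted:
        raise ValueError("inputs must be non-empty")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


def relative_improvement(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up expression values for four target genes (illustration only).
measured = [2.0, 4.0, 6.0, 8.0]
baseline_pred = [2.5, 3.5, 6.5, 7.5]      # hypothetical baseline model output
improved_pred = [2.25, 3.75, 6.25, 7.75]  # hypothetical improved model output

baseline_mae = mean_absolute_error(baseline_pred, measured)
improved_mae = mean_absolute_error(improved_pred, measured)
gain = relative_improvement(baseline_mae, improved_mae)
print(f"baseline MAE={baseline_mae}, improved MAE={improved_mae}, gain={gain}%")
```

With these toy numbers the baseline error is 0.5 and the improved error is 0.25, a relative improvement of 50%. Under the same assumptions, a figure such as 16.64% would mean that the new model's error is 16.64% lower than the baseline's.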
Raju Bhukya received his B.Tech in Computer Science and Engineering from Nagarjuna University in 2003, his M.Tech in Computer Science and Engineering from Andhra University in 2005, and his Ph.D. in Computer Science and Engineering from the National Institute of Technology (NIT) Warangal in 2014. He is currently an Assistant Professor in the Department of Computer Science and Engineering at the National Institute of Technology, Warangal, Telangana, India. He works in the areas of bioinformatics and data mining.

Achyuth Ashok is an M.Tech student in the CSE department at NIT Warangal. He is interested in analyzing the information contained in genome sequences using deep learning to predict DNA sequences and the time involved in DNA sequence analysis.
