The International Arab Journal of Information Technology (IAJIT)

..............................
..............................
..............................


A Fault-Tolerant Routing Algorithm for 3-D Torus Interconnection Networks

This paper describes a new fault-tolerant routing algorithm for 3-D tori using the concept of “probability vectors”. To compute these vectors, a node determines first its faulty set, which represents the set of all its neighbouring nodes that are faulty or unreachable due to faulty links. Each node then calculates a probability vector, where the lth element represents the probability that a destination node at distance l cannot be reached through a minimal path due to a faulty node or link. The probability vectors are used by all the nodes to achieve an efficient fault-tolerant routing in the network. An extensive performance evaluation conducted in this study reveals that the proposed algorithm exhibits good fault-tolerance properties in terms of the achieved percentage of reachability and routing distances.

 


[1] Al-Sadi J., Day K., and Ould-Khaoua M., “Probability-based fault-tolerant routing in hypercubes,” in Proceedings of (Europar’2000), in Lecture Notes in Computer Science, Springer- Verlag, Munich, pp. 935-938, 2000.

[2] Al-Sadi J., Day K., and Ould-Khaoua M., “Fault- tolerant routing in hypercubes using probability vectors,” Parallel Computing, vol. 27, no. 10, pp. 1381-1399, 2001.

[3] Anderson E., Brooks J., Grassl C., and Scott S., “Performance of the Cray T3E multiprocessor,” in Proceedings of Supercomputing Conference, San Jose, California, 1997.

[4] Chen M. S. and Shin K. G., “Adaptive fault- tolerant routing in hypercube multicomputers,” IEEE Trans. Computers, vol. 39, no. 12, pp. 1406-1416, 1990.

[5] Chen M. S. and Shin K. G., “On hypercube fault- tolerant routing using global information,” in Proceedings of 4th Conf. Hypercube Concurrent Computers & Applications, pp. 83-86, 1989.

[6] Chen M. S. and Shin K. G., “Depth-first search approach for fault-tolerant routing in hypercube multicomputers,” IEEE Trans. Parallel & Distributed Systems, vol. 1, no. 2, pp. 152-159, 1990.

[7] Chiu G. M. and Chen K. S., “Fault-tolerant routing strategy using routing capability in hypercube multicomputers,” in Proceedings of Int. Conf. Parallel and Distributed Systems, pp. 396-403, 1996.

[8] Day K. and Al-Ayyoub A., “Fault Diameter of K- ary n-cube Networks,” IEEE Trans. Parallel & Distributed Systems, vol. 8, no. 9, pp. 903-907, 1997.

[9] Duato J., Yalamanchili S., and Ni L., Interconnection networks: An engineering approach, IEEE Computer Society Press, 1997.

[10] Gaughan P. T. and Yalamanchili S., “Adaptive routing protocols for hypercube interconnection networks,” IEEE Computer, vol. 26, no. 5, pp. 12-24, 1993.

[11] Graham S. and Seidel S., “The cost of broadcasting on star graphs and k-ary hypercubes,” IEEE Trans. Computers, vol. 42, no. 6, pp. 756-759, 1993.

[12] Gravano L. and Pifarre G., Berman P., and Sanz J., “Adaptive deadlock- and livelock-free routing with all minimal paths in torus networks,” IEEE Trans. Parallel & Distributed Systems, vol.5, no.12, pp. 1233-1251, 1994.

[13] Gordon J. M. and Stout Q. F., “Hypercube message routing in the presence of faults,” in Proceedings of 3rd Conf. Hypercube Concurrent Computers and Applications, Pasadena, California, pp. 251-263, 1988.

[14] Kessler R. E. and Schwarzmeier J. L., “CRAY T3D: A new dimension for Cray research,” in CompCon, Houston, Texas, pp. 176-182, 1993.

[15] Lan Y., “A fault-tolerant routing algorithm in hypercubes,” in Proceedings of Int. Conf. Parallel Processing, pp. 163-166, 1994.

[16] Lee T. C. and Hayes J. P., “A fault-tolerant communication scheme for hypercube computers,” IEEE Trans. Computers, vol. 41, no. 10, pp. 1242-1256, 1992.

[17] Linder D. and Harden J., “An adaptive and fault tolerant wormhole routing strategy for k-ary hypecubes,” IEEE Trans. Computers , vol. 40, no. 1, pp. 2-12, 1991.

[18] Ni L. M. and McKinley P. K., “A survey of routing techniques in wormhole networks,” IEEE Computer, vol. 26, no. 2, pp. 62-76, 1993.

[19] Noakes M. et al, “The J-machine multicomputer: An architectural evaluation,” in Proceedings of 20th Int. Symp, Computer Architecture, 1993.

[20] Nugent S. F., “The iPSC/2 Direct-Connect communication technology,” in Proceedings of Conf. on Hypercube Concurrent Computers & Applications, vol. 1, pp. 51-60, 1988.

[21] Raghavendra C. S., Yang P. J., and Tien S. B., “Free dimension: an effective approach to achieving fault tolerance in hypercubes,” in Proceedings of 22nd Int. Symp. on Fault-Tolerant Comp., pp. 170- 177, 1992.

[22] Saad Y. and Schultz M. H., Data communication in hypercubes, Technical Report YALEU/DCS/RR-428, Computer Science Dept., Yale Univ., 1985.

[23] Sheu J. P. and Su M.Y., “A multicast algorithm for hypercube multiprocessors,” in Proceedings of Int. Conf. Parallel Processing, pp. 18-22, 1992.

[24] Vanvoorst B., Seidel S., and Barscz E., “Workload of an iPSC/860,” in Proceedings of Scalable High-Performance Computing Conf., pp.221-228, 1994.

[25] Wu J., “Adaptive fault-tolerant routing in cube- based multicomputers using safety vectors,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 4, pp. 321-334, 1998.

[26] Wu J. and Fernandez E. B., “Broadcasting in faulty hypercubes,” in Proceedings of 11th Symp. Reliable Distributed Systems, pp. 122-129, 1992. A Fault-Tolerant Routing Algorithm for 3-D Torus Interconnection Networks 79 Jehad Al-Sadi received his BSc degree in computer science from Tennessee State University, USA, in 1989, MSc degree in computer science from Jackson State University, USA, in 1993, and PhD degree in computer science from University of Glasgow, UK. He is currently an assistant professor in the Department of Computer Science at Zarka Private University, Jordan. Khaled Day received an engineering degree in computer science from the University of Tunis in 1986 and the MSc and PhD degree from the University of Minnesota in 1989 and 1992, respectively. He is currently serving as associate professor and head of the Department of Computer Science of Sultan Qaboos University in Oman. His areas of interest include interconnection networks, parallel algorithms, distributed systems, and computer networks. He is a member of the IEEE. Mohamed Ould-Khaoua received his BSc degree from the University of Algiers, Algeria, in 1986, and the MSc and PhD degrees in computer science from the University of Glasgow, UK, in 1990 and 1994, respectively. He is currently a lecturer in the Department of Computer Science at the University of Glasgow, UK. His research focuses on applying theoretical results from stochastic processor and queuing theory to the quantitative study of hardware and software architectures. His currently research interest are performance modeling/evaluation of parallel and distributed systems, parallel algorithms, ATM and multimedia networks. He is a member of the IEEE-CS, BCS, and C.Eng.