The International Arab Journal of Information Technology (IAJIT)


Verification of Cooperative Transient Fault Diagnosis and Recovery in Critical

Zibouda Aliouat,
 The faults caused by ambient cosmic radiation are a growing threat to the dependability of advanced embedded computer systems. Maintaining availability and cons istency in distributed applications is one of the fundamental attribute in building complex critical systems. To achieve this, a key factor is the ability to detect the fault and handle it by means of recovery. Such systems can use membership protocols designed to provide this function. The objective of membership protocol is to give all entities of every node in the cluste r a consistent view of the system status, all withi n a pre)defined time. This paper describes a formal analysis of an extension of the group membership algorithm implemented in the time) triggered protocol. The proposed extension is to allow nodes reintegrat ion after transient fault. We provide a detailed analysis of properties of formal model of the algorithm. The paper is intende d to verify the safety and liveness properties that the protocol must satisfy. The correctness of the protocol is verified by the PVS theorem prover.

[1] Aliouat Z., Formal Analysis of Fault#Tolerant Algorithm in the Time#Triggered Architecture, Journal of Computer Science , vol. 3, no. 1, pp. 28#34, 2007.

[2] Anceaume E., Charron#Bost B., Minet P., and Toueg S., On the Formal Specification of Group Verification of Cooperative Transient Fault Diagnosis and Recovery in Critical Embedded Systems 381 Membership Services, Technical Report, Cornell University, Unit de Recherche INRIA Rocquencourt, 1995.

[3] Barbosa R. and Johan Karlsson J., Formal Specification and Verification of A Protocol for Consistent Diagnosis in Real#Time Embedded Systems, in Proceedings of the 3 rd IEEE International Symposium on Industrial Embedded Systems, France, pp. 216#223, 2008.

[4] Bauman R., Soft Errors in Advanced Computer Systems, IEEE Design and Test of Computers , vol. 22, no. 3, pp. 258#266, 2005.

[5] Bouajjani A. and Merceron A., Parametric Verification of a Group Membership Algorithm, Journal of Theory and Practice of Logic Programming Cambridge University , vol. 6, no. 3, pp. 321#353, 2006.

[6] Constantinescu C., Impact of Deep Submicron Technology on Dependability of VLSI Circuits, in Proceedings of The International Conference on Dependable Systems and Networks , pp. 205# 209, 2002.

[7] Cristian F., Reaching Agreement on Processor# Group Membership in Synchronous Distributed Systems, Distributed Computing , vol. 4, no. 4, pp. 175#187, 1991.

[8] Cristian F. and Schmuck F., Agreeing on Processor Group Membership in Timed Asynchronous Distributed Systems, Technical Report , University of California, 1995.

[9] Heiner G. and Thurner T., Time#Triggered Architecture for Safety#Related Distributed Real# Time Systems in Transportation Systems, in Proceedings of 28 th Annual International Symposium on Fault)Tolerant Computing , Germany, pp. 402#407, 1998.

[10] Kopetz H. and Bauer G., The Time#Triggered Architecture, in Proceedings of the IEEE Special Issue on Modeling and Design of Embedded Software , pp. 1#14, 2002.

[11] Owre S., Rushby J., Shankar N., and Henke F., Formal Verification for Fault#Tolerant Architectures: Prolegomena to the Design of PVS, IEEE Transactions on Software Engineering , vol. 21, no. 2, pp. 107#125, 1995.

[12] Pascoe J., Loader R., and Sunderam V., Working Towards the Agreement Problem Protocol Verification Environment , in Alan C., Majid M., and Henk M., (Eds.), Communication Process Architectures, IOS Press, UK, 2001.

[13] Pfeifer H., Formal Verification of the TTP Group Membership Algorithm, in Proceedings of Formal Methods for Distributed System Development , Tommaso B., and Diego L., (Eds.), Italy, pp. 3#18, 2000.

[14] Ramasamy H., Cukier M., and Sanders W., Formal Specification and Verification of A GMP for an Intrusion#Tolerant Group Communication System, in Proceedings of the Pacific Rim International Symposium on Dependable Computing , USA, pp. 9#18, 2002.

[15] Ricciardi A. and Birman K., Using Process Groups to Implement Failure Detection in Asynchronous Environments, in Proceedings of the 10 th ACM Symposium on Principles of Distributed Computing , NY, pp. 341#352, 1991.

[16] Schiper A., Dynamic Group Communication, Distributed Computing , vol. 18, no. 5, pp. 359# 374, 2006.

[17] Ziade H., Ayoubi R., and Velazco R., A Survey on Fault Injection Techniques, The International Arab Journal for Information Technology , vol. 1, no. 2, pp. 171#186, 2004. Zibouda Aliouat obtained her engineer diploma in 1984 and MSc in 1993 from Constantine University. She received her PhD from Setif University of Algeria. She was an assistant at Constantine University from 1985 to 1994. She is currently an assistant professor in Computer Engineering Department at Setif University of Alger ia. Her research interests are in the areas of wireless mobile networks modelling and simulation, wireless sensor networks and fault tolerance of embedded systems. Makhlouf Aliouat received the engineer diploma degree from Computer Science Department of Constantine University, Algeria in 1978, and the MS and PhD degrees from polytechnic national institute of Grenoble, France in 1983 and 1986 respectively. He also received the enabling t o conduct research diploma from Constantine University in 2008. He was assistant professor from 1986 to 1990 in Computer Science Department of Constantine University. Since 1995, he is associate professor in Computer Science Department of Ferhat Abb s University of S tif, Algeria. His areas of research are operating systems, distributed systems , wireless mobile computing and wireless network security.