Exploiting non-determinism for reliability of mobile agent systems
Abstract
An important technical hurdle blocking the adoption of mobile agent technology is the lack of reliability. Designing a reliable mobile agent system is especially challenging because a mobile agent can be affected by the failure of any host that it visits or of any communication link that it needs to traverse. Previous work in this domain has relied on techniques such as periodic checkpointing of mobile agent state and restarting the agent once the machine or communication link recovers. Such approaches leave an agent unavailable until the failed machine or link itself recovers. In this paper, we take an alternative approach based on the premise that a mobile agent can often complete its task in more than one way. We capture this redundancy in non-deterministic constructs in the agent language and maintain state about the agent's actual computational path within its tree of possible computations. We design and implement a distributed recovery scheme that detects a failure, rolls back the agent's computation, and restarts the agent from an earlier point in its computational tree along a different but equivalent computational path, without waiting for the failure itself to be repaired.
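The recovery idea can be illustrated with a minimal, single-process sketch. It is not the paper's distributed scheme or its agent language: the names `HostFailure`, `visit`, and `run_choice`, the host labels, and the dictionary-based agent state are all hypothetical, chosen only to show a choice point with equivalent alternatives, a rollback to the state saved at that point, and a restart down another branch.

```python
import copy


class HostFailure(Exception):
    """Raised when a (simulated) host or link is unreachable."""


def visit(host, down_hosts):
    """Build a hypothetical agent step that is bound to a single host."""
    def step(state):
        if host in down_hosts:
            raise HostFailure(host)
        state["visited"].append(host)
    return step


def run_choice(alternatives, state):
    """Non-deterministic choice: try equivalent alternatives in order.

    The agent state is saved at the choice point. If a step fails, the
    state is rolled back to that checkpoint and the next branch is tried,
    without waiting for the failed host to recover.
    """
    checkpoint = copy.deepcopy(state)
    for steps in alternatives:
        try:
            for step in steps:
                step(state)
            return True
        except HostFailure:
            # Roll back to the choice point before taking another branch.
            state.clear()
            state.update(copy.deepcopy(checkpoint))
    return False  # every equivalent path failed


down = {"B"}                      # assumed failure: host B is unreachable
state = {"visited": []}
ok = run_choice(
    [
        [visit("A", down), visit("B", down)],  # path 1 fails at B
        [visit("C", down), visit("D", down)],  # equivalent path 2
    ],
    state,
)
print(ok, state["visited"])
```

In this sketch the partial progress made on host A is discarded when B fails, and the agent completes its task through C and D instead. A real system would additionally need failure detection and rollback across hosts, which this local example does not attempt to model.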