Priority inversion problem
Mars pathfinder is the first mission of NASA’s Discovery program for investigating the atmosphere and other factors of Mars. Although the project was successful from scientific point of view, its path to success was not smooth at the beginning. The system suffered from priority inversion problem – a real-time synchronization problem – soon after it was landed on Mars on July 4, 1997. The spacecraft kept resetting itself due to the priority inversion problem and caused significant delay in collecting scientific data from the surface of the red planet. However, the problem was mitigated in few days by patching the onboard software from Earth. In this article on the priority inversion problem, we will look into its consequences, debugging, correcting the problem.
In a previous article, I had discussed about the working of a semaphore and in the same post, I had discussed about a situation called ‘The unavoidable blocking.’ When a low prio job is in its critical section, it acquires the semaphore for the shared resource. Until the semaphore is released, no other job, even jobs with higher priority than the current job can pre-empt it to gain control of the shared resource. The higher prio job is given the semaphore only after the semaphore is released by the previous job.
Now consider three jobs J1, J2 and J3 such that prior(J1)>prior(J2)>prio(J3). Assume that job J1 and J3 shares a resource and J2 does not share any resource with other tasks. The following scenario in Figure 3 is an example of priority inversion: a medium priority task is prioritized over a higher priority task while a lower priority task blocks the higher priority task.
Job J3 starts it normal execution at time t0, accesses the shared resource at time t1 and starts execution in the critical section. At time t2, a higher priority job J1 arrives and preempts job J3 and executes up to time t3 without requiring the shared resource. At time t3, job J1 attempts to access the resource and is blocked since J3 has the resource. Then, job J3 starts execution of the critical section using the shared resource. However, at time t4, a medium priority job J2 arrives, preempts job J3 and executes its normal execution. The medium priority job does not share any resource with any other jobs. Now while the medium priority job J2 is executing, the highest priority job J1 is blocked. Job J2 can block J1 arbitrary long even though job J1 has higher priority and does not share any resource with J2. It feels that priorities of J1 and J2 are inverted and this is called the priority inversion problem.
A protocol to resolve the priority inversion problem is proposed by Sha et al. in the paper – ‘L. Sha, R. Rajkumar, and J. P. Lehoczky. 1990. Priority Inheritance Protocols: An Approach to Real-Time Synchronization. IEEE Trans. Comput. 39, 9 (September 1990), 1175-1185.’
This is called the priority inheritance protocol (PIP): a lower priority task temporarily inherits the highest priority of all the blocking tasks. In other words, when a lower priority task blocks a higher priority task, it temporality assumes the priority of the higher priority blocked task. Figure 4 shows how PIP solves the priority inversion problem.
When the medium-priority job J2 arrives at time t4, job J2 cannot preempt the execution of job J3 since J3 has already assumed (inherited) the highest priority when blocked job J1 at time t3. After finishing critical section execution of job J3 at time at time t 5, the higher priority job J1 can access the semaphore and can finish its execution. The highest priority job J1 is not blocked when the medium priority task is executed.
However, PIP can lead to deadlock. We will not go into the details of such issues because the basic principle of PIP is enough to understand the priority inversion problem in Mars Pathfinder.
This article is an excerpt from the 'Report for the Seminar Series on Software Failures' by Risat Mahmud Pathan
Other posts by Vivek