Which real-time priority is the highest priority in Linux

I did an experiment to nail this down, as follows:

  • process1: RT priority = 40, CPU affinity = CPU 0. This process “spins” for 10 seconds so it won’t let any lower-priority process run on CPU 0.

  • process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 second, sleeping in between. It prints out the elapsed time with each message.

I’m running a 2.6.33 kernel with the PREEMPT_RT patch.

To run the experiment, I run process2 in one window (as root) and then start process1 (as root) in another window. The result is process1 appears to preempt process2, not allowing it to run for a full 10 seconds.

In a second experiment, I change process2’s RT priority to 41. In this case, process2 is not preempted by process1.

This experiment shows that a larger RT priority value in sched_setscheduler() has a higher priority. This appears to contradict what Michael Foukarakis pointed out from sched.h, but actually it does not. In sched.c in the kernel source, we have:

static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
        BUG_ON(p->se.on_rq);

        p->policy = policy;
        p->rt_priority = prio;
        p->normal_prio = normal_prio(p);
        /* we are holding p->pi_lock already */
        p->prio = rt_mutex_getprio(p);
        if (rt_prio(p->prio))
                p->sched_class = &rt_sched_class;
        else
                p->sched_class = &fair_sched_class;
        set_load_weight(p);
}

rt_mutex_getprio(p) does the following:

return task->normal_prio;

While normal_prio() happens to do the following:

prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
...
return prio;

In other words, we have (my own interpretation):

p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority

Wow! That is confusing! To summarize:

  • With p->prio, a smaller value preempts a larger value.

  • With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().

Leave a Comment