Updated and verified
Of key importance to testing this is that the implementation doesn’t support deleting a nonexistent or previously deleted node! I spent way too long trying to figure out why my working solution was “broken”. This can be fixed by doing a preliminary search for the key and returning false if it’s not in the tree at all, and that solution was employed in the linked code at the bottom.
It doesn’t appear Sedgewick wrote a deletion for 2-3-4 deletion that is publicly available. His results specifically deal with 2-3 trees (he only makes cursory mention of 2-3-4 trees in that their average path length (and thus search cost), as well as that of other red-black trees, is indistinguishable from the 2-3 case). Nobody else seems to have one easily found, either, so here’s what I found after debugging the problem:
To begin, take Sedgewick’s code and fix the out of date bits. In the slides here (pg 31) you can see that his code still uses the old representation of 4 nodes where it was done by having two left reds in a row, rather than balance. The first bit to write a 2-3-4 deletion routine, then, is to fix this so that we can do a sanity check which will help us verify our fixes later:
private boolean is234(Node x)
{
if (x == null)
return true;
// Note the TD234 check is here because we also want this method to verify 2-3 trees
if (isRed(x.right))
return species == TD234 && isRed(x.left);
if (!isRed(x.right))
return true;
return is234(x.left) && is234(x.right);
}
Once we have this, we know a couple things. One, from the paper we see that 4 nodes should not be broken on the way up when using a 2-3-4 tree. Two, there’s a special case for a right 4-node on the search path. There’s a third special case that isn’t mentioned, and that is when you are going back up a tree, you may end up where you have h.right.left
be red, which would leave you invalid with just a rotate left. This is the mirror of the case described for insert on page 4 of the paper.
The rotation fix for a 4-node you need is as follows:
private Node moveRedLeft(Node h)
{ // Assuming that h is red and both h.left and h.left.left
// are black, make h.left or one of its children red.
colorFlip(h);
if (isRed(h.right.left))
{
h.right = rotateRight(h.right);
h = rotateLeft(h);
colorFlip(h);
if (isRed(h.right.right) )
h.right = rotateLeft(h.right);
}
return h;
}
And this removes the splitting on 2-3-4, as well as adds the fix for the third special case
private Node fixUp(Node h)
{
if (isRed(h.right))
{
if (species == TD234 && isRed(h.right.left))
h.right = rotateRight(h.right);
h = rotateLeft(h);
}
if (isRed(h.left) && isRed(h.left.left))
h = rotateRight(h);
if (species == BU23 && isRed(h.left) && isRed(h.right))
colorFlip(h);
return setN(h);
}
Finally, we need to test this and make sure it works. They don’t have to be the most efficient, but as I found during the debugging of this, they have to actually work with the expected tree behavior (i.e. not insert/delete duplicate data)! I did this with a test helper methods. The commented lines were there for when I was debugging, I’d break and check the tree for obvious imbalance. I’ve tried this method with 100000 nodes, and it performed flawlessly:
public static boolean Test()
{
return Test(System.nanoTime());
}
public static boolean Test(long seed)
{
StdOut.println("Seeding test with: " + seed);
Random r = new Random(seed);
RedBlackBST<Integer, Integer> llrb = new RedBlackBST<Integer,Integer>(TD234);
ArrayList<Integer> treeValues = new ArrayList<Integer>();
for (int i = 0; i < 1000; i++)
{
int val = r.nextInt();
if (!treeValues.contains(val))
{
treeValues.add(val);
llrb.put(val, val);
}
else
i--;
}
for (int i = 0; i < treeValues.size(); i++)
{
llrb.delete(treeValues.get(i));
if (!llrb.check())
{
return false;
}
// StdDraw.clear(Color.GRAY);
// llrb.draw(.95, .0025, .008);
}
return true;
}
The complete source can be found here.