This looks like a really neat idea, but sadly I believe the worst case behaviour is O(n^2).
Here is my attempt at a counterexample. (I’m not a mathematician so please forgive my use of Python instead of equations to express my ideas!)
Consider the string with 4K+1 symbols
s="a"*K+'X'+'a'*3*K
This will have
borders[1:] = range(K)*2+[K]*(2*K+1)
ne_borders[1:] = [-1]*(K-1)+[K-1]+[-1]*K+[K]*(2*K+1)
Note that:
1) ne_borders[i] will equal K for (2K+1) values of i.
2) for 0<=j<=K, borders[j]=j-1
3) the final loop in your algorithm will go into the inner loop with j==K for 2K+1 values of i
4) the inner loop will iterate K times to reduce j to 0
5) This results in the algorithm needing more than N*N/8 operations to do a worst case string of length N.
For example, for K=4 it goes round the inner loop 39 times
s="aaaaXaaaaaaaaaaaa"
borders[1:] = [0, 1, 2, 3, 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4]
ne_borders[1:] = [-1, -1, -1, 3, -1, -1, -1, -1, 4, 4, 4, 4, 4, 4, 4, 4, 4]
For K=2,248 it goes round the inner loop 10,111,503 times!
Perhaps there is a way to fix the algorithm for this case?