Can there be a (Java 7) FileSystem in which a Path is absolute (isAbsolute() returns true) but has a null root?

Question

Well, there are some obscure things with file systems. I have built a few enterprise search crawlers, and somewhere down the road you will notice some strange things going on with paths. BTW: these are all implementations of custom (overridden) file systems, not standard ones, and you can definitely argue for hours about which of these things are good ideas and which are not… Still, I don't think you'll encounter any of these cases with the standard file systems.

Here are a few examples of strange things:

Files in container file systems (OLE2, ZIP, TAR, etc.): c:\foo\bar\blah.zip\myfile

In this case, you have to decide which item is 'the root':

‘c:\’? That’s not the root of the zip file containing the file…
‘c:\foo\bar\blah.zip’? It might be the root of the file, but treating it that way might break your application.
‘blah.zip’? It might be the root of the zip file, but this will probably break your application as well.
‘/’? As in the ‘/’ folder in the zip file? It might be possible, but that will give you a serious headache in the long run.
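For comparison, the zip provider that ships with the JDK (zipfs) takes the last option: inside the archive, '/' is the root, and a path like "/myfile" is absolute with root "/". Here's a quick sketch to check that yourself; the class and method names (ZipRootDemo, createSampleZip, rootOfEntryInZip) are made up for the example, and it only uses standard java.nio.file and java.util.zip calls. It is expected to report "/".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipRootDemo {

    /** Writes a throw-away archive containing a single entry named "myfile". */
    static Path createSampleZip() throws IOException {
        Path zip = Files.createTempFile("blah", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry("myfile"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return zip;
    }

    /** Opens the archive as a FileSystem and returns the root reported for "/myfile". */
    static String rootOfEntryInZip() {
        try {
            Path zip = createSampleZip();
            // The cast picks the (Path, ClassLoader) overload; null means "installed providers only".
            try (FileSystem zipfs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
                Path inner = zipfs.getPath("/myfile");
                if (!inner.isAbsolute() || !Files.exists(inner)) {
                    throw new IllegalStateException("unexpected zip provider behaviour");
                }
                return inner.getRoot().toString();
            } finally {
                Files.deleteIfExists(zip); // runs after the zip file system has been closed
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("root inside the zip: " + rootOfEntryInZip());
    }
}
```

Note that the archive itself still lives at an ordinary path on the host file system; the zip provider simply starts a fresh hierarchy at its own '/'.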

‘Graph’-like structures such as HTTP:

The fact that you have ‘/foo/bar’ doesn’t imply that ‘/foo’ or even ‘/’ exists. (I suppose that meets your criterion.) The only thing you can do is walk the graph…
Note that protocols like WebDAV are HTTP-based and can give you a similar headache. I have some examples here of custom WebDAV file systems that don’t have a ‘root’ folder, but do have absolute paths.
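To make "walk the graph" concrete, here's a minimal sketch. The in-memory link map is a hypothetical stand-in for following links found in HTTP or WebDAV responses (no real HTTP client is involved), and LinkGraphCrawl, sampleSite and reachable are illustrative names only. Every path the walk discovers is absolute, yet neither '/foo' nor '/' is ever reached.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LinkGraphCrawl {

    /** Hypothetical site: absolute path -> absolute paths it links to. */
    static Map<String, List<String>> sampleSite() {
        Map<String, List<String>> links = new HashMap<>();
        links.put("/foo/bar", Arrays.asList("/foo/bar/a.html", "/other/page"));
        links.put("/foo/bar/a.html", Collections.<String>emptyList());
        links.put("/other/page", Arrays.asList("/foo/bar"));
        return links;
    }

    /** Breadth-first walk from a seed; returns every path discovered by following links. */
    static Set<String> reachable(Map<String, List<String>> links, String seed) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(seed);
        while (!queue.isEmpty()) {
            String path = queue.poll();
            if (!seen.add(path)) continue;   // already visited
            List<String> out = links.get(path);
            if (out == null) continue;       // dead link: nothing to follow
            queue.addAll(out);
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<String> found = reachable(sampleSite(), "/foo/bar");
        System.out.println(found);                                         // [/foo/bar, /foo/bar/a.html, /other/page]
        System.out.println(found.contains("/foo") || found.contains("/")); // false
    }
}
```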

Still, you can argue that the top-most common path you can reach (if one exists…) is the root, or that there is a root but you simply cannot reach it (even though it doesn't really exist).
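If you go for the "top-most common path you can reach" interpretation, computing it is just a longest common prefix over whole path segments. A small sketch, assuming '/'-separated absolute path strings such as the ones a crawler collects; CommonRoot and topMostCommonPath are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommonRoot {

    /** Splits "/a/b/c" into [a, b, c]; empty segments (e.g. from "//") are ignored. */
    static List<String> segments(String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("not an absolute path: " + absolutePath);
        }
        List<String> out = new ArrayList<>();
        for (String s : absolutePath.split("/")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    /**
     * Longest common prefix (by whole segments) of the reachable absolute paths.
     * Returns "/" when they share nothing but the leading slash.
     */
    static String topMostCommonPath(List<String> reachable) {
        if (reachable.isEmpty()) throw new IllegalArgumentException("no paths");
        List<String> prefix = segments(reachable.get(0));
        for (String p : reachable) {
            List<String> segs = segments(p);
            int i = 0;
            while (i < prefix.size() && i < segs.size() && prefix.get(i).equals(segs.get(i))) i++;
            prefix = prefix.subList(0, i);   // shrink to the part shared so far
        }
        StringBuilder sb = new StringBuilder();
        for (String s : prefix) sb.append('/').append(s);
        return sb.length() == 0 ? "/" : sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(topMostCommonPath(Arrays.asList(
                "/dav/projects/a/x.doc", "/dav/projects/b/y.doc")));   // /dav/projects
        System.out.println(topMostCommonPath(Arrays.asList(
                "/dav/projects/a/x.doc", "/archive/old.doc")));         // /
    }
}
```

Keep in mind that the second result, "/", is only a syntactic answer: nothing guarantees that "/" is actually reachable on the server.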

Samba/NetBIOS

If you treat an entire Samba (Windows networking) network as a single file system, then you basically end up with a ‘root’ containing all workgroups, a workgroup containing all computers, a computer containing all shares, and then the files in the share.

However… the root and the workgroups don’t really exist. They are made up from a broadcast protocol (which is also quite unreliable on a network of more than 1000 computers). From a crawler’s perspective, it makes all the sense in the world to treat the ‘root’ and ‘workgroup’ directories completely differently from the (reliable) rest.
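One way a crawler could make that distinction is to classify each path by its depth in the network view. This is only a sketch: it assumes a hypothetical "/WORKGROUP/COMPUTER/share/dir/file" path layout, it is not the API of any real SMB client library, and SmbLevels, Level and levelOf are made-up names.

```java
public class SmbLevels {

    /** Levels of the "whole network as one file system" view. */
    enum Level {
        NETWORK_ROOT(true), WORKGROUP(true), COMPUTER(false), SHARE(false), ENTRY(false);

        /** True for levels that only exist as the result of (unreliable) broadcast browsing. */
        final boolean broadcastDerived;

        Level(boolean broadcastDerived) {
            this.broadcastDerived = broadcastDerived;
        }
    }

    /** Classifies a hypothetical "/WORKGROUP/COMPUTER/share/..." path by its depth. */
    static Level levelOf(String path) {
        int depth = 0;
        for (String s : path.split("/")) {
            if (!s.isEmpty()) depth++;
        }
        switch (depth) {
            case 0:  return Level.NETWORK_ROOT;
            case 1:  return Level.WORKGROUP;
            case 2:  return Level.COMPUTER;
            case 3:  return Level.SHARE;
            default: return Level.ENTRY;   // a file or directory inside a share
        }
    }

    public static void main(String[] args) {
        String[] paths = {"/", "/WORKGROUP", "/WORKGROUP/PC01",
                          "/WORKGROUP/PC01/docs", "/WORKGROUP/PC01/docs/a.txt"};
        for (String p : paths) {
            Level level = levelOf(p);
            // Assumed policy: re-list broadcast-derived levels on every run and never treat
            // a missing entry there as a deletion; levels below them can be trusted.
            System.out.println(p + " -> " + level + (level.broadcastDerived ? " (unreliable)" : ""));
        }
    }
}
```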

However

These scenarios only describe paths whose root is unreachable, unreliable or otherwise odd. Theoretically, I suppose that for any URL you can think of, there is always a root. After all, a URL is a string of characters defining a hierarchy, which therefore by definition has a start.
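For hierarchical URLs you can even compute that start with plain java.net.URI: resolving "/" against an absolute, hierarchical URI keeps the scheme and authority and replaces the path with "/". A small sketch; UriRoot and rootOf are illustrative names. java.net.URI also has "opaque" URIs such as mailto:, which have no path hierarchy at all, so this sketch returns null for those.

```java
import java.net.URI;

public class UriRoot {

    /**
     * For an absolute, hierarchical URI, returns scheme://authority/, the point where the
     * path hierarchy starts. Returns null for opaque or relative URIs.
     */
    static URI rootOf(URI uri) {
        if (uri.isOpaque() || !uri.isAbsolute()) return null;
        return uri.resolve("/");
    }

    public static void main(String[] args) {
        System.out.println(rootOf(URI.create("https://example.com/foo/bar"))); // https://example.com/
        System.out.println(rootOf(URI.create("mailto:someone@example.com")));  // null
    }
}
```

Whether anything actually answers at that root is, of course, a different question.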
