The best way to do this is to use urllib.parse.
From the docs:
The module has been designed to match the Internet RFC on Relative
Uniform Resource Locators. It supports the following URL schemes:
file, ftp, gopher, hdl, http, https, imap, mailto, mms, news, nntp,
prospero, rsync, rtsp, rtspu, sftp, shttp, sip, sips, snews, svn,
svn+ssh, telnet, wais, ws, wss.
You’d want to do something like this using urlsplit and urlunsplit:
from urllib.parse import urlsplit, urlunsplit
split_url = urlsplit('http://127.0.0.1/asdf/login.php?q=abc#stackoverflow')
# You now have:
# split_url.scheme "http"
# split_url.netloc "127.0.0.1"
# split_url.path "/asdf/login.php"
# split_url.query "q=abc"
# split_url.fragment "stackoverflow"
# Use all the path except everything after the last "/"
clean_path = "".join(split_url.path.rpartition("/")[:-1])
# "/asdf/"
# urlunsplit joins a urlsplit tuple back into a URL string
# (split_url is unmodified here, so this reproduces the original URL)
clean_url = urlunsplit(split_url)
# "http://127.0.0.1/asdf/login.php?q=abc#stackoverflow"
# A more advanced example
advanced_split_url = urlsplit('http://foo:bar@127.0.0.1:5000/asdf/login.php?q=abc#stackoverflow')
# You now have *in addition* to the above:
# advanced_split_url.username "foo"
# advanced_split_url.password "bar"
# advanced_split_url.hostname "127.0.0.1"
# advanced_split_url.port 5000 (an int, not a string)
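
To actually get the cleaned path back into a URL, one option is to swap it into the split result before joining. This is only a minimal sketch: it assumes the goal is to drop the last path segment together with the query and fragment, and the helper name strip_last_segment is made up for illustration. It relies on the result of urlsplit being a named tuple, so _replace returns a modified copy.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_last_segment(url):
    """Return url without anything after the last "/" of its path.

    Hypothetical helper for illustration; it also drops the query and fragment.
    """
    parts = urlsplit(url)
    # rpartition("/") gives (before, "/", after); keep the first two pieces.
    clean_path = "".join(parts.path.rpartition("/")[:-1])
    # The split result is a named tuple, so _replace returns a changed copy.
    cleaned = parts._replace(path=clean_path, query="", fragment="")
    return urlunsplit(cleaned)


print(strip_last_segment("http://127.0.0.1/asdf/login.php?q=abc#stackoverflow"))
# http://127.0.0.1/asdf/
```

The netloc is left untouched, so any username, password and port in the original URL are carried over as they are.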