You can use urllib.parse.urlparse with os.path.basename:
import os
from urllib.parse import urlparse
url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
a = urlparse(url)
print(a.path) # Output: /kyle/09-09-201315-47-571378756077.jpg
print(os.path.basename(a.path)) # Output: 09-09-201315-47-571378756077.jpg
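One reason to go through urlparse rather than calling os.path.basename on the raw URL is that the query string and fragment are split off first. A minimal sketch, assuming a made-up URL (the photo.jpg name, the size=large query and the #top fragment are illustrative, not from the question):

```python
import os
from urllib.parse import urlparse

# Hypothetical URL with a query string and a fragment, for illustration only.
url = "http://photographs.500px.com/kyle/photo.jpg?size=large#top"

parsed = urlparse(url)
clean_name = os.path.basename(parsed.path)  # query and fragment already removed
raw_name = os.path.basename(url)            # still carries "?size=large#top"

print(parsed.path)   # /kyle/photo.jpg
print(parsed.query)  # size=large
print(clean_name)    # photo.jpg
print(raw_name)      # photo.jpg?size=large#top
```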
Your URL might contain percent-encoded characters like %20 for a space or %E7%89%B9%E8%89%B2 for “特色”. If that’s the case, you’ll need to unquote (or unquote_plus) them. You can also use pathlib.Path().name instead of os.path.basename, which makes it easier to add a suffix to the name (as asked in the original question):
from pathlib import Path
from urllib.parse import urlparse, unquote
url = "http://photographs.500px.com/kyle/09-09-2013%20-%2015-47-571378756077.jpg"
url_parsed = urlparse(url)
print(unquote(url_parsed.path)) # Output: /kyle/09-09-2013 - 15-47-571378756077.jpg
file_path = Path("/home/ubuntu/Desktop/") / unquote(Path(url_parsed.path).name)
print(file_path) # Output: /home/ubuntu/Desktop/09-09-2013 - 15-47-571378756077.jpg
new_file = file_path.with_stem(file_path.stem + "_small")  # with_stem requires Python 3.9+
print(new_file) # Output: /home/ubuntu/Desktop/09-09-2013 - 15-47-571378756077_small.jpg
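On the choice between unquote and unquote_plus: the two differ only in how they treat a literal "+". unquote leaves it alone, while unquote_plus turns it into a space, which matches form-style encoding as used in query strings. For a file name taken from the path part of a URL, unquote is usually the safer default. A small sketch with a made-up encoded name:

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical encoded file name containing both "+" and "%20".
encoded_name = "my+photo%20one.jpg"

plain = unquote(encoded_name)       # "+" is kept as a literal plus sign
plus = unquote_plus(encoded_name)   # "+" is decoded to a space as well

print(plain)  # my+photo one.jpg
print(plus)   # my photo one.jpg
```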
Alternatively, you can use unquote(urlparse(url).path.split("/")[-1]).
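The split-based approach and the pathlib approach behave differently when the URL path ends with a slash, which may matter if some of your URLs point at directories. A rough sketch; filename_from_url is a hypothetical helper name, and PurePosixPath is used here (instead of Path) only so the result does not depend on the operating system:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse, unquote


def filename_from_url(url: str) -> str:
    """Return the decoded last path segment of url (empty if the path ends in "/")."""
    return unquote(urlparse(url).path.split("/")[-1])


file_url = "http://photographs.500px.com/kyle/09-09-2013%20-%2015-47-571378756077.jpg"
dir_url = "http://photographs.500px.com/kyle/"  # hypothetical trailing-slash URL

file_name = filename_from_url(file_url)
split_name = filename_from_url(dir_url)                  # empty string
path_name = PurePosixPath(urlparse(dir_url).path).name   # last directory name

print(repr(file_name))   # '09-09-2013 - 15-47-571378756077.jpg'
print(repr(split_name))  # ''
print(repr(path_name))   # 'kyle'
```

An empty result from the split version is an easy signal that the URL has no file name, whereas .name silently returns the last directory component.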