How to extract a filename from a URL and append a word to it?

Question

You can use urllib.parse.urlparse with os.path.basename:

import os
from urllib.parse import urlparse

url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
a = urlparse(url)
print(a.path)                    # Output: /kyle/09-09-201315-47-571378756077.jpg
print(os.path.basename(a.path))  # Output: 09-09-201315-47-571378756077.jpg

Your URL might contain percent-encoded characters like %20 for space or %E7%89%B9%E8%89%B2 for “特色”. If that’s the case, you’ll need to unquote (or unquote_plus) them. You can also use pathlib.Path().name instead of os.path.basename, which could help to add a suffix in the name (like asked in the original question):

from pathlib import Path
from urllib.parse import urlparse, unquote

url = "http://photographs.500px.com/kyle/09-09-2013%20-%2015-47-571378756077.jpg"
urlparse(url).path

url_parsed = urlparse(url)
print(unquote(url_parsed.path))  # Output: /kyle/09-09-2013 - 15-47-571378756077.jpg
file_path = Path("/home/ubuntu/Desktop/") / unquote(Path(url_parsed.path).name)
print(file_path)        # Output: /home/ubuntu/Desktop/09-09-2013 - 15-47-571378756077.jpg

new_file = file_path.with_stem(file_path.stem + "_small")
print(new_file)         # Output: /home/ubuntu/Desktop/09-09-2013 - 15-47-571378756077_small.jpg

Also, an alternative is to use unquote(urlparse(url).path.split("/")[-1]).

Leave a Comment Cancel reply