Selenium headless: How to bypass Cloudflare detection using Selenium

Using Google Chrome v96.0 (the latest version at the time of writing), the user agent you retrieve depends on whether the browser runs in headless mode:

  • For the google-chrome browser the following user-agent is in use:

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
    
  • Whereas for the google-chrome-headless browser, the following user-agent is in use:

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/96.0.4664.110 Safari/537.36
    

In the majority of cases, the additional Headless string in the user agent is interpreted as the sign of a bot, and Cloudflare blocks access to the website.
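
The difference between the two strings can be checked in plain Python. The following is a minimal sketch, not part of the original answer: the helper names `is_headless_user_agent` and `normalize_user_agent` are hypothetical, and the check only looks at the `HeadlessChrome` token shown above.

```python
HEADLESS_TOKEN = "HeadlessChrome"


def is_headless_user_agent(user_agent: str) -> bool:
    """Return True if the user agent carries the HeadlessChrome product token."""
    return HEADLESS_TOKEN in user_agent


def normalize_user_agent(user_agent: str) -> str:
    """Replace the HeadlessChrome token with the regular Chrome token."""
    return user_agent.replace(HEADLESS_TOKEN, "Chrome")


# The headless user agent quoted above
headless_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/96.0.4664.110 Safari/537.36"
)
normal_ua = normalize_user_agent(headless_ua)

print(is_headless_user_agent(headless_ua))  # True
print(is_headless_user_agent(normal_ua))    # False
```

The normalized string could then be handed to Chrome through its `--user-agent` switch, for example `options.add_argument(f"--user-agent={normal_ua}")`. Changing the user agent alone is unlikely to be enough, since anti-bot services usually inspect other signals as well.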


Solution

There are different approaches to evade Cloudflare detection even when using Chrome in headless mode. Some of the more effective ones are as follows:

  • An efficient solution would be to use undetected-chromedriver to initialize the Chrome Browsing Context. undetected-chromedriver is an optimized Selenium ChromeDriver patch which, according to its documentation, does not trigger anti-bot services such as Distil Networks / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.

    • Code Block:

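      import undetected_chromedriver as uc
      
      # Use the package's own ChromeOptions so the patched driver receives them as expected
      options = uc.ChromeOptions()
      # options.headless is deprecated in newer Selenium 4 releases; pass the flag instead
      options.add_argument("--headless")
      options.add_argument("start-maximized")
      # The excludeSwitches / useAutomationExtension experimental options are left out here:
      # undetected-chromedriver deals with the automation flags itself, and newer releases
      # may reject these options as unrecognized
      driver = uc.Chrome(options=options)
      driver.get('https://bet365.com')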
      

You can find a couple of relevant detailed discussions in:

  • Selenium app redirect to Cloudflare page when hosted on Heroku
  • Is there any possible ways to bypass cloudflare security checks?

  • The most efficient solution would be to use Selenium Stealth to initialize the Chrome Browsing Context. selenium-stealth is a python package to prevent detection. This programme tries to make python selenium more stealthy.

    • Code Block:

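      from selenium import webdriver
      from selenium.webdriver.chrome.service import Service
      from selenium_stealth import stealth
      
      options = webdriver.ChromeOptions()
      options.add_argument("start-maximized")
      options.add_argument("--headless")
      options.add_experimental_option("excludeSwitches", ["enable-automation"])
      options.add_experimental_option('useAutomationExtension', False)
      # Selenium 4 takes the driver path through a Service object;
      # the executable_path argument of webdriver.Chrome is deprecated and removed in later releases
      service = Service(executable_path=r"C:\path\to\chromedriver.exe")
      driver = webdriver.Chrome(service=service, options=options)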
      
      stealth(driver,
              languages=["en-US", "en"],
              vendor="Google Inc.",
              platform="Win32",
              webgl_vendor="Intel Inc.",
              renderer="Intel Iris OpenGL Engine",
              fix_hairline=True,
              )
      
      driver.get("https://bot.sannysoft.com/")
      

You can find a couple of relevant detailed discussions in:

  • Can a website detect when you are using Selenium with chromedriver?
  • How to automate login to a site which is detecting my attempts to login using
    selenium-stealth
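
To check whether a given setup still exposes the markers discussed above, you can read them back from the page with `execute_script`. This is only a rough sketch: the helpers `read_automation_signals` and `summarize` and the `FakeDriver` stand-in are hypothetical names introduced here, and the two values inspected (`navigator.userAgent` and `navigator.webdriver`) are just a small subset of what anti-bot services may look at.

```python
def read_automation_signals(driver):
    """Collect two values that any page script can read from the browser."""
    return {
        "user_agent": driver.execute_script("return navigator.userAgent"),
        "webdriver": driver.execute_script("return navigator.webdriver"),
    }


def summarize(signals):
    """Turn the raw values into simple yes/no flags."""
    user_agent = signals["user_agent"] or ""
    return {
        "headless_token_in_ua": "HeadlessChrome" in user_agent,
        "webdriver_flag_set": bool(signals["webdriver"]),
    }


class FakeDriver:
    """Stand-in for a real WebDriver so the sketch runs without a browser."""

    def __init__(self, values):
        self._values = values

    def execute_script(self, script):
        return self._values[script]


# Simulated answers of an unpatched headless Chrome session
fake = FakeDriver({
    "return navigator.userAgent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) HeadlessChrome/96.0.4664.110 Safari/537.36"
    ),
    "return navigator.webdriver": True,
})
report = summarize(read_automation_signals(fake))
print(report)
```

With a real session you would pass the `driver` object created in either code block above instead of `FakeDriver`. If both flags come back false, these two particular markers are hidden, which does not guarantee that the session goes undetected.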
