Automate Web Scraping and Testing with Python Selenium Recipes
Many automation libraries exist for Python that help with scraping sites and performing tests. Projects like Roboform, BeautifulSoup and requests all provide excellent features.
One name slightly less known in Python projects is Selenium. It's an industry-accepted automation and testing framework for UIs. Unlike other web browsing and scraping libraries, it actually plays out inside a browser, as though a user were interacting with the site.
A major hurdle in web scraping is that JS can change the page after it has loaded. We might find that a response parsed with something like BeautifulSoup is missing lots of parts. This is because sites that use JS need a browser such as Chrome to actually run the JS code.
This is where Selenium comes into its own as it actually runs a browser window and sends our commands to it.
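The gap described above can be shown without any network call. The following is a minimal sketch: the page source is a made-up example (not the real markup of any site), and the standard library's html.parser is used as a stand-in for BeautifulSoup so the snippet has no dependencies. The element that JS would fill in is simply empty in the raw HTML.

```python
from html.parser import HTMLParser

# Hypothetical page source, as a plain HTTP request would receive it.
# The <p> is only filled in when a browser runs the script.
PAGE_SOURCE = """
<html>
  <body>
    <button id="run-button">Run</button>
    <p id="run-message"></p>
    <script>
      document.getElementById("run-button").onclick = function () {
        document.getElementById("run-message").textContent = "Loaded by JS";
      };
    </script>
  </body>
</html>
"""


class TextById(HTMLParser):
    """Collect the text inside the element with the given id.

    Enough for this small snippet, not a general-purpose parser.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # greater than 0 while inside the target element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def text_of(html, element_id):
    """Return the stripped text of the element with the given id."""
    parser = TextById(element_id)
    parser.feed(html)
    return "".join(parser.text).strip()


# The script was never executed, so the message element has no text.
print(repr(text_of(PAGE_SOURCE, "run-message")))  # prints ''
```

The parser only sees what the server sent, so the text that the script would add never appears.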
Here is a visual example of Selenium running through the JSONPlaceholder UI:
An entire ecosystem exists for Selenium, and it's possible to make a career out of being good at this kind of automation and testing. There are UI toolkits, a bespoke IDE and even a custom language that can be used to write tests.
However, we are going to completely skip all this and get straight into the Python API.
Install Chrome and the Chrome Driver
The first step in automating our Python UI testing is getting a supported browser and its associated driver.
For our examples we will be using Chrome, so get Chrome installed first.
Once in Chrome, we find its version by visiting chrome://settings/help
Now head over to Chrome Driver and get the driver that matches our Chrome version and OS:
As the installation instructions vary by OS, we won't go into detail on each process. In short: download the driver, unzip the contents and add its location to the system path.
For Windows it will involve unzipping into something like:
C:\Program Files (x86)\ChromeDriver
Then adding to the path via the System Properties > Environment Variables window.
For OS X and other Unix-based operating systems, simply copy the binary from the download to the /usr/local/bin folder.
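A quick way to confirm the driver ended up on the path is to ask Python to look for it. This is a small sketch using the standard library; the binary name chromedriver is an assumption based on the usual name of the download, and the helper find_driver is made up for illustration.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver binary if it is on the PATH, else None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver was not found on the PATH")
else:
    print(f"chromedriver found at {path}")
```

If the first message is printed, the folder holding the driver still needs to be added to the path (or the terminal restarted so the change is picked up).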
For a more comprehensive guide on Selenium and the installation process, see the docs:
Hello World of Selenium
In the following recipe we see how to perform simple interactions with a basic site, https://jsonplaceholder.typicode.com/
The recipe performs the actions from the GIF in the intro. Along the way we capture screenshots with the driver.save_screenshot(…) method; these were used to create the GIF.
We first set up the Options for Chrome. This is just a way of passing arguments.
We create the Chrome driver, which gives us our Chrome window.
We head over to https://jsonplaceholder.typicode.com/
We scroll down the page so the element with ID run-message is in the window. This isn't necessary for automation or testing, but it is needed for the screenshots to actually show the changes.
We click on the element with ID run-button, which triggers the demo for the page. We then wait 1 second for the AJAX request to fire and come back before testing the result.
Next we see how to access what's displayed in the browser from our Python code. We grab the text from the element with ID run-message and test it against a known text. This prints True to the console if there is a match, otherwise False.
This page will be updated with additional recipes.