Python API test automation framework (Part 4) Working with XML using python lxml

Logos in header image sources: Python, Requests, JSON, HTTP

This is fourth post in a series on how to build an API framework using python.

You can read previous parts below:

Any API framework would be incomplete without having the ability to deal with XML responses and requests.

You might primarily need this if you are automating a SOAP (Simple object access protocol) based services in your project or if you choose to use XML as a data format for configuration, test data and what not.

Though JSON, YAML are probably a more reasonable bet for this. XML is still quite a popular data format

Regardless of your use case/requirements.

Let’s see how can we work with XML

Introducing lxml

Let’s get started.

To set up, Ensure you have installed it in your pipenv using:

pipenv install lxml

An example

I’ve created a dummy service called covid_tracker.py which is a python flask API to return a canned and static XML response

To ensure the service is running, execute below commands

# cd to dir
cd people-api
# activate pipenv and ensure all dependencies are installed
pipenv shell
pipenv install
# Run the local flask service
python covid_tracker/covid_tracker.py

Below is the cURL for this:

curl --location --request GET 'http://localhost:3000/api/v1/summary/latest'

And this would return a response like below:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
<status>200</status>
<type>stack</type>
<data>
<summary>
<total_cases>69169558</total_cases>
<active_cases>19895522</active_cases>
<deaths>1574941</deaths>
<recovered>47699103</recovered>
<critical>104419</critical>
<tested>1003760026</tested>
<death_ratio>0.022769279514551762</death_ratio>
<recovery_ratio>0.6895967587359746</recovery_ratio>
</summary>
<change>
<total_cases>653173</total_cases>
<active_cases>142334</active_cases>
<deaths>12042</deaths>
<recovered>498799</recovered>
<critical>164</critical>
<tested>13146244</tested>
<death_ratio>-0.00004130805512226471</death_ratio>
<recovery_ratio>0.0007060065458233122</recovery_ratio>
</change>
<generated_on>1607547603</generated_on>
<regions>
<usa>
<name>USA</name>
<iso3166a2>US</iso3166a2>
<iso3166a3>USA</iso3166a3>
<iso3166numeric></iso3166numeric>
<total_cases>15740193</total_cases>
<active_cases>6277786</active_cases>
<deaths>295403</deaths>
<recovered>9167004</recovered>
<critical>26975</critical>
<tested>212565283</tested>
<death_ratio>0.018767431886000382</death_ratio>
<recovery_ratio>0.5823946377277585</recovery_ratio>
<change>
<total_cases>222261</total_cases>
<active_cases>84705</active_cases>
<deaths>2828</deaths>
<recovered>134728</recovered>
<death_ratio>-0.00008656231889753868</death_ratio>
<recovery_ratio>0.0003405341268405415</recovery_ratio>
</change>
</usa>

Let’s say, hypothetically we want to check that this API returns a valid no greater than a million of total worldwide cases and write a test for this.

Below is a test that achieves this.

import requests
from assertpy import assert_that
from lxml import etree

from config import COVID_TRACKER_HOST
from utils.print_helpers import pretty_print


def test_covid_cases_have_crossed_a_million():
response = requests.get(f'{COVID_TRACKER_HOST}/api/v1/summary/latest')
pretty_print(response.headers)

response_xml = response.text
xml_tree = etree.fromstring(bytes(response_xml, encoding='utf8'))

# use .xpath on xml_tree object to evaluate the expression
total_cases = xml_tree.xpath("//data/summary/total_cases")[0].text
assert_that(int(total_cases)).is_greater_than(1000000)

Let’s break it down and understand whats happening here.

We make an HTTP Get call to our GET API /api/v1/summary/latest and get the response XML in text format.

response = requests.get(f'{COVID_TRACKER_HOST}/api/v1/summary/latest')
response_xml = response.text

Next, to make use of this XML response, we need to deserialize (i.e. string to python object) it into a ElementTree object

Element tree belongs to the lxml library.

This can be done with below:

tree = etree.fromstring(bytes(response_xml, encoding='utf8'))

📝 Its important to provide fromstring() data in bytes format with UTF-8 encoding since without that it would give error like: ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

tree is now an object representation of the XML string and we can then use node.xpath('<your_xpath_expression>') to get the required node which we want to process.

In our current case we want the total_cases node under the summary section.

We can get that using relative XPath expression as follows. If you are unfamiliar with XPath syntax, you refer to this tutorial on w3schools.com

total_cases = tree.xpath("//data/summary/total_cases")[0].text

To get the text in the first node we use [0].text property

And finally now that we have the desired node, we could assert as follows

assert_that(int(total_cases)).is_greater_than(1000000)

Another way to work with XPath using lxml

Let’s say we want to assert that the total cases worldwide is greater than the total of cases across countries.

Below is the test, we could write for this:

def test_overall_covid_cases_match_sum_of_total_cases_by_country():
response = requests.get(f'{COVID_TRACKER_HOST}/api/v1/summary/latest')
pretty_print(response.headers)

response_xml = response.text
xml_tree = etree.fromstring(bytes(response_xml, encoding='utf8'))

overall_cases = int(xml_tree.xpath("//data/summary/total_cases")[0].text)
# Another way to specify XPath first and then use to evaluate
# on an XML tree
search_for = etree.XPath("//data//regions//total_cases")
cases_by_country = 0
for region in search_for(xml_tree):
cases_by_country += int(region.text)

assert_that(overall_cases).is_greater_than(cases_by_country)

First few lines should be familiar now, Notice we use:

search_for = etree.XPath("//data//regions//total_cases")

Which gives us an XPath object but does not evaluate it as that point itself.

We make use of it to get a list of elements from the XPath and then use a loop to get the total for that specific region

cases_by_country = 0
for region in search_for(xml_tree):
cases_by_country += int(region.text)

And finally we can assert:

When I run this test, I can see it fail:

>       assert_that(overall_cases).is_greater_than(cases_by_country)
E AssertionError: Expected <69169558> to be greater than <69822731>, but was not.

Which means that there is data mismatch bug 🐛 in this data set

Conclusion

Originally published at https://automationhacks.io on December 17, 2020.

Manager SDET at Gojek, Bengaluru, I ❤️ to build scalable test automation frameworks and teams. Blog at automationhacks.io 🇮🇳

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store