Automating Boring Stuff in Easy Steps with Python and Robot Framework
If you've ever spent hours renaming files or updating hundreds of spreadsheet cells, you know how tedious tasks like these can be. But what if you could have your computer do them for you?
Don’t spend your time doing work a well-trained monkey could do. Even if you’ve never written a line of code, you can make your computer do the grunt work.
What is Robot Framework?
Robot Framework is a generic open-source automation framework. It can be used for test automation and robotic process automation (RPA), and it can be integrated with virtually any other tool to create powerful and flexible automation solutions.
Robot Framework has easy syntax, utilizing human-readable keywords. Its capabilities can be extended by libraries implemented with Python or Java. The framework has a rich ecosystem around it, consisting of libraries and tools that are developed as separate projects.
The Robot Framework project is hosted on GitHub, where you can find further documentation, source code, and the issue tracker. Downloads are hosted on PyPI. Robot Framework is operating system and application independent. The core framework is implemented in Python and also runs on Jython (JVM) and IronPython (.NET).
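To give a flavor of how extending Robot Framework with Python works, here is a minimal sketch of a custom keyword library. The file name MyLibrary.py and the keyword itself are invented for illustration; Robot Framework exposes each public method of the class as a keyword.

# MyLibrary.py -- an illustrative custom Robot Framework keyword library.
# Robot Framework turns each public method into a keyword, so a test
# case could call this one as "String Should Start With".

class MyLibrary:

    def string_should_start_with(self, text, prefix):
        """Fail the test unless `text` starts with `prefix`."""
        if not text.startswith(prefix):
            raise AssertionError(f"'{text}' does not start with '{prefix}'")

A .robot test file could then load this with Library MyLibrary.py and use String Should Start With like any built-in keyword.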
Learn how to use Python to write programs that do in minutes what would take you hours to do by hand, with no prior programming experience required. Once you've mastered the basics of programming, you'll create Python programs that effortlessly perform useful and impressive feats of automation to:
- Search for text in a file or across multiple files
- Create, update, move, and rename files and folders
- Search the Web and download online content
- Update and format data in Excel spreadsheets of any size
- Split, merge, watermark, and encrypt PDFs
- Send reminder emails and text notifications
- Fill out online forms
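As a small taste of what such automation looks like, here is a minimal sketch that renames every .txt file in a folder by adding a date prefix. The folder name 'reports' and the '2023-' prefix are made up purely for illustration.

from pathlib import Path

# Rename every .txt file in a (hypothetical) 'reports' folder,
# prefixing each name with '2023-'.
folder = Path('reports')
for txt_file in folder.glob('*.txt'):
    new_name = txt_file.with_name('2023-' + txt_file.name)
    txt_file.rename(new_name)
    print(f'Renamed {txt_file.name} -> {new_name.name}')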
How to install Robot Framework?
We have made a video on how to install Robot Framework; watch it to get Robot Framework set up on your computer.
What is Web Scraping?
In those rare, terrifying moments when I’m without Wi-Fi, I realize just how much of what I do on the computer is really what I do on the internet. Since so much work on a computer involves going on the internet, it’d be great if your programs could get online.
Web scraping is the term for using a program to download and process content from the web. For example, Google runs many web scraping programs to index web pages for its search engine.
Examples of libraries from Python's vast ocean
Python has a vast ocean of libraries, thanks to its creative community. Here are a few examples that are useful for working with the web.
webbrowser
Comes with Python and opens a browser to a specific page.
The webbrowser module’s open() function can launch a new browser to a specified URL. Enter the following into the interactive shell:
>>> import webbrowser
>>> webbrowser.open('https://inventwithpython.com/')
A web browser tab will open to the URL https://inventwithpython.com/. This is about the only thing the webbrowser module can do.
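That single capability is still useful, though. Here is a small sketch of a script that opens a map for an address given on the command line; the Google Maps URL pattern used is the only assumption.

#! python3
# mapIt.py - opens a map in the browser for an address given on
# the command line, e.g.: python mapIt.py 870 Valencia St

import webbrowser, sys

if len(sys.argv) > 1:
    # Join all the command line words into one address string.
    address = ' '.join(sys.argv[1:])
    webbrowser.open('https://www.google.com/maps/place/' + address)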
Requests
Downloads files and web pages from the internet.
The requests module lets you easily download files from the web without having to worry about complicated issues such as network errors, connection problems, and data compression. The requests module doesn't come with Python, so you'll have to install it first. From the command line, run pip install --user requests.
Downloading a Web Page with the requests.get() Function
The requests.get() function takes a string of a URL to download. By calling type() on requests.get()’s return value, you can see that it returns a Response object, which contains the response that the web server gave for your request. I’ll explain the Response object in more detail later, but for now, enter the following into the interactive shell while your computer is connected to the internet:
>>> import requests
➊ >>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
>>> type(res)
<class 'requests.models.Response'>
➋ >>> res.status_code == requests.codes.ok
True
>>> len(res.text)
178981
>>> print(res.text[:250])
The URL goes to a text web page containing the entire play of Romeo and Juliet, provided on the Automate the Boring Stuff site ➊. You can tell that the request for this web page succeeded by checking the status_code attribute of the Response object. If it is equal to the value of requests.codes.ok, then everything went fine ➋. (Incidentally, the status code for "OK" in the HTTP protocol is 200. You may already be familiar with the 404 status code for "Not Found.") You can find a complete list of HTTP status codes and their meanings at https://en.wikipedia.org/wiki/List_of_HTTP_status_codes.
If the request succeeded, the downloaded web page is stored as a string in the Response object’s text variable. This variable holds a large string of the entire play; the call to len(res.text) shows you that it is more than 178,000 characters long. Finally, calling print(res.text[:250]) displays only the first 250 characters.
If the request failed and displayed an error message, like “Failed to establish a new connection” or “Max retries exceeded,” then check your internet connection. Connecting to servers can be quite complicated, and I can’t give a full list of possible problems here. You can find common causes of your error by doing a web search of the error message in quotes.
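A convenient way to check for errors is the Response object's raise_for_status() method, which raises an exception if the download failed and does nothing if it succeeded. Here is a minimal sketch; the broken URL is made up so that the request returns a 404.

import requests

# A deliberately bogus URL, so the server responds with a 404 status.
res = requests.get('https://inventwithpython.com/page_that_does_not_exist')
try:
    res.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx
except requests.exceptions.HTTPError as exc:
    print(f'There was a problem: {exc}')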
Saving Downloaded Files to the Hard Drive
>>> import requests
>>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
>>> res.raise_for_status()
>>> playFile = open('RomeoAndJuliet.txt', 'wb')
>>> for chunk in res.iter_content(100000):
...     playFile.write(chunk)
...
100000
78981
>>> playFile.close()
From here, you can save the web page to a file on your hard drive with the standard open() function and write() method. There are some slight differences, though. First, you must open the file in write binary mode by passing the string 'wb' as the second argument to open(). Even if the page is in plaintext (such as the Romeo and Juliet text you downloaded earlier), you need to write binary data instead of text data in order to maintain the Unicode encoding of the text.
To write the web page to a file, you can use a for loop with the Response object’s iter_content() method.
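Putting those pieces together, here is a slightly more idiomatic sketch of the same download, using a with statement so the file is closed automatically even if an error occurs:

import requests

res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
res.raise_for_status()

# 'wb' (write binary) preserves the text's Unicode encoding on disk.
with open('RomeoAndJuliet.txt', 'wb') as play_file:
    for chunk in res.iter_content(100000):  # up to 100,000 bytes per chunk
        play_file.write(chunk)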
bs4 (Beautiful Soup)
Parses HTML, the format that web pages are written in.
Beautiful Soup is a module for extracting information from an HTML page (and is much better for this purpose than regular expressions). The Beautiful Soup module's name is bs4 (for Beautiful Soup, version 4). To install it, you will need to run pip install --user beautifulsoup4 from the command line. While beautifulsoup4 is the name used for installation, to import Beautiful Soup you run import bs4.
Don't use regular expressions to parse HTML. Locating a specific piece of HTML in a string seems like a perfect case for regular expressions. However, I advise you against it. There are many different ways that HTML can be formatted and still be considered valid HTML, and trying to capture all these possible variations in a regular expression is tedious and error prone. A module developed specifically for parsing HTML, such as bs4, is less likely to result in bugs.
Creating a BeautifulSoup Object from HTML
The bs4.BeautifulSoup() function needs to be called with a string containing the HTML it will parse. The bs4.BeautifulSoup() function returns a BeautifulSoup object. Enter the following into the interactive shell while your computer is connected to the internet:
>>> import requests, bs4
>>> res = requests.get('https://nostarch.com')
>>> res.raise_for_status()
>>> noStarchSoup = bs4.BeautifulSoup(res.text, 'html.parser')
>>> type(noStarchSoup)
<class 'bs4.BeautifulSoup'>
Enter the following into the interactive shell (after making sure the example.html file is in the working directory):
>>> exampleFile = open('example.html')
>>> exampleSoup = bs4.BeautifulSoup(exampleFile, 'html.parser')
>>> type(exampleSoup)
<class 'bs4.BeautifulSoup'>
Examples of CSS Selectors
The selector you pass to the select() method determines which elements it will match:
- soup.select('div') matches all elements named <div>
- soup.select('#author') matches the element with an id attribute of author
- soup.select('.notice') matches all elements that use a CSS class attribute named notice
- soup.select('div span') matches all elements named <span> that are within an element named <div>
- soup.select('div > span') matches all elements named <span> that are directly within an element named <div>, with no other element in between
- soup.select('input[name]') matches all elements named <input> that have a name attribute with any value
- soup.select('input[type="button"]') matches all elements named <input> that have an attribute named type with value button
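To see select() in action, here is a small sketch that parses a made-up HTML snippet (the markup below is invented purely for demonstration) and pulls out elements by id and by class:

import bs4

# A tiny, made-up HTML snippet for demonstration.
html = '''<html><body>
<span id="author">Jane Doe</span>
<p class="notice">This download is free.</p>
</body></html>'''

soup = bs4.BeautifulSoup(html, 'html.parser')

author = soup.select('#author')              # list of matching Tag objects
print(author[0].getText())                   # Jane Doe
print(soup.select('.notice')[0].getText())   # This download is free.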
selenium
Launches and controls a web browser. The selenium module is able to fill in forms and simulate mouse clicks in this browser.
The selenium module lets Python directly control the browser by programmatically clicking links and filling in login information, almost as though there were a human user interacting with the page. Using selenium, you can interact with web pages in a much more advanced way than with requests and bs4; but because it launches a web browser, it is a bit slower and hard to run in the background if, say, you just need to download some files from the web.
Still, if you need to interact with a web page in a way that, say, depends on the JavaScript code that updates the page, you’ll need to use selenium instead of requests. That’s because major ecommerce websites such as Amazon almost certainly have software systems to recognize traffic that they suspect is a script harvesting their info or signing up for multiple free accounts. These sites may refuse to serve pages to you after a while, breaking any scripts you’ve made. The selenium module is much more likely to function on these sites long-term than requests.
Starting a selenium-Controlled Browser
The following examples will show you how to control the Chrome web browser. You can install selenium by running pip install --user selenium from a command line terminal.
>>> from selenium import webdriver
>>> browser = webdriver.Chrome()
>>> type(browser)
<class 'selenium.webdriver.chrome.webdriver.WebDriver'>
>>> browser.get('https://www.youtube.com')
You'll notice when webdriver.Chrome() is called, the Chrome web browser starts up. Calling type() on the returned value reveals it's of the WebDriver data type. And calling browser.get('https://www.youtube.com') directs the browser to https://www.youtube.com. Your browser should now be showing the YouTube home page.
If you encounter an error message like "'chromedriver' executable needs to be in PATH.", then you need to manually download the webdriver for Chrome before you can use selenium to control it. You can also control browsers other than Chrome if you install the webdriver for them.
For Chrome, go to https://sites.google.com/a/chromium.org/chromedriver/downloads and download the ZIP file for your operating system. This ZIP file will contain a chromedriver.exe (on Windows) or chromedriver (on macOS or Linux) file that you can put on your system PATH.
If you still have problems opening up a new browser under the control of selenium, it may be because the current version of the browser is incompatible with the selenium module. One workaround is to install an older version of the web browser — or, more simply, an older version of the selenium module. You can find the list of selenium version numbers at https://pypi.org/project/selenium/#history. Unfortunately, the compatibility between versions of selenium and a browser sometimes breaks, and you may need to search the web for possible solutions.
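Once the browser is up, you can find elements on the page and interact with them. Here is a minimal sketch using selenium 4's By locators; the site and the search box's name attribute 'q' are simply what python.org happens to use, so adjust them for other sites.

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get('https://www.python.org')

# Find the search box by its name attribute, type a query, and submit it.
search_box = browser.find_element(By.NAME, 'q')
search_box.send_keys('automation')
search_box.submit()

print(browser.title)  # title of the results page
browser.quit()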
Nonetheless, if you are still a beginner, or you want to learn Python to automate boring things or to land a well-paid job, you can check out our YouTube channel CODERS ARCADE, where we have covered Python from zero to hero for beginners.