Web scraping is an automated method for extracting large amounts of data from websites. The data on websites is typically unstructured. Web scraping collects this unstructured data and stores it in a structured form.
Python has a wide range of applications, with different libraries for different purposes. In the demonstration that follows, we will use the following libraries:
- Beautiful Soup: (Documentation link)
  - Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  - It builds a parse tree from each page it parses, which can then be used to extract data from the HTML. This makes it useful for web scraping.
- Selenium: (Documentation link)
  - The selenium module lets a Python program directly control the browser, clicking links and filling in text boxes (such as login information) almost as though a human user were interacting with the page.
  - Selenium allows us to interact with web pages in a much more advanced way than Requests and Beautiful Soup.
  - Because it launches a web browser, it is a bit slower and harder to run in the background if, say, you just need to download some files from the web.
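
To make the division of labour between the two libraries concrete, here is a minimal sketch. It assumes the `beautifulsoup4` package is installed, and it parses a small hypothetical HTML snippet (`SAMPLE_HTML`, invented for illustration) rather than a real site. The helper names `extract_books` and `fetch_rendered_html` are also illustrative. The Selenium helper assumes Selenium 4 with a local Chrome installation and is defined but not called, since it needs a real browser.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (not from a real site).
SAMPLE_HTML = """
<html><body>
  <h1>Book list</h1>
  <ul>
    <li class="book"><a href="/books/1">Dune</a> <span class="price">9.99</span></li>
    <li class="book"><a href="/books/2">Emma</a> <span class="price">4.50</span></li>
  </ul>
</body></html>
"""


def extract_books(html):
    """Parse unstructured HTML and return one structured dict per book."""
    # "html.parser" is the parser bundled with the Python standard library.
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for item in soup.find_all("li", class_="book"):
        link = item.find("a")
        price = item.find("span", class_="price")
        books.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "price": float(price.get_text(strip=True)),
        })
    return books


def fetch_rendered_html(url):
    """Load a page in a real browser and return its HTML.

    Assumption: Selenium 4 and Chrome are installed. Not called in this sketch.
    """
    from selenium import webdriver  # imported lazily so the rest runs without Selenium

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


if __name__ == "__main__":
    for book in extract_books(SAMPLE_HTML):
        print(book)
```

In a setup like this, Selenium would only be needed when a page has to be driven through a browser first. The HTML it returns could then be passed to `extract_books` in the same way as the sample string.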