The Ultimate Guide to Scraping Google Search Results with Python Easily
The web is brimming with data. Data is the new fuel of the 21st century. Every day, colossal amounts of information are uploaded to the web. When talking about the web, we can't ignore the web giant Google. You most likely arrived at this post through Google. Google's search algorithm is remarkably good at returning the most relevant search results.
For various reasons, one might want to fetch these results. Although there is an official API from Google for this, it comes with a few limitations. Here I would like to show you a tested method for getting Google results using Python 3. You can get the whole project on Github. Note: I have added extra functionality to the project on Github.
For this project we are going to use the following Python 3 modules: Beautiful Soup and Requests. Let's begin!
Create a Python file and name it ‘googleSearch.py’. Import all the necessary libraries.
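The imports for this walkthrough might look like the following. This is a sketch based on the modules named in this post (Requests, Beautiful Soup, re) plus urllib.parse, which is assumed here as the tool for the URL parsing step described later.

```python
# Standard library
import re                          # regular expressions for filtering hrefs
from urllib.parse import urlparse  # split a URL into scheme, host, path, ...

# Third-party packages (pip install requests beautifulsoup4)
import requests
from bs4 import BeautifulSoup
```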
Let's define a function for scraping Google search results which takes the Google search query as a parameter.
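The first thing such a function needs is the search URL for the query. Here is a minimal sketch of that step. The helper name build_search_url and the exact URL format are assumptions for illustration and may differ from the project on Github.

```python
from urllib.parse import quote_plus


def build_search_url(query):
    """Return a Google search URL for the query (hypothetical helper)."""
    # quote_plus escapes spaces and special characters for a query string
    return "https://www.google.com/search?q={}".format(quote_plus(query))


print(build_search_url("python web scraping"))
# https://www.google.com/search?q=python+web+scraping
```

Escaping the query keeps searches that contain spaces or symbols such as "&" from breaking the URL.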
Now, let us send a request for the above query and store the response in a variable ‘HTML’. We will also check for any network errors and handle the exceptions.
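One possible way to write the request step is sketched below. The helper name fetch_html, the timeout value and the User-Agent header are assumptions, not part of the original project, and the variable is written in lowercase as html to follow Python naming conventions.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the Response for url, or None if a network error occurs."""
    # Assumption: a browser-like User-Agent; Google may answer scripts differently
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        html = requests.get(url, headers=headers, timeout=timeout)
    except requests.exceptions.RequestException as err:
        # Base class for connection errors, timeouts, invalid URLs, etc.
        print("Request failed: {}".format(err))
        return None
    return html
```

Catching requests.exceptions.RequestException covers the library's network-related errors in one place, so the caller only has to check for None.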
After getting the response, if it returns a status code of 200, start scraping using Beautiful Soup.
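The status check and the parsing step could be sketched as follows. The helper name parse_response is hypothetical, and it takes the status code and body text as plain arguments so the example runs without a network call.

```python
from bs4 import BeautifulSoup


def parse_response(status_code, text):
    """Return a BeautifulSoup tree for a 200 response, otherwise None."""
    if status_code != 200:
        return None
    # "html.parser" ships with Python, so no extra parser package is needed
    return BeautifulSoup(text, "html.parser")


sample = "<html><body><a href='https://example.com'>x</a></body></html>"
soup = parse_response(200, sample)
print(soup.a["href"])  # https://example.com
```

With a real response object from Requests, the two arguments would be its status_code and text attributes.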
If you look at the rendered HTML of a Google search page, all the links are <a> tags. So, let us find all the <a> tags inside the parsed HTML.
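As a small illustration, Beautiful Soup's find_all collects every <a> tag in a parsed document. The sample markup below is made up and much simpler than a real Google results page.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded results page
sample_html = """
<div>
  <a href="/url?q=https://example.com/">Example</a>
  <a href="https://www.google.com/preferences">Settings</a>
  <a>No href here</a>
</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns every matching tag, in document order
a_tags = soup.find_all("a")
print(len(a_tags))  # 3
```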
Now we have all the <a> tags on the search results page. Next, extract the ‘href’ attribute from the <a> tags.
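Reading the attribute might look like this. The sketch uses made-up sample markup and Tag.get, which returns None for tags that have no href instead of raising an error.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded results page
sample_html = """
<div>
  <a href="/url?q=https://example.com/">Example</a>
  <a href="https://www.google.com/preferences">Settings</a>
  <a>No href here</a>
</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# Tag.get returns None when the attribute is missing, so no KeyError is raised
hrefs = [a.get("href") for a in soup.find_all("a")]
print(hrefs)
# ['/url?q=https://example.com/', 'https://www.google.com/preferences', None]
```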
Also, notice that we have fetched all the <a> tags, and not all of them are search results. So, we use the re module to get only the search result URLs using a regular expression.
We first used the re module to extract only the href values that match the pattern of a URL.
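A regular expression for this step could be sketched as below. The pattern and the helper name extract_url are assumptions. The pattern stops at whitespace or "&" so that tracking parameters appended after the target URL are dropped.

```python
import re

# Assumed pattern: "http://" or "https://" followed by anything up to whitespace or "&"
URL_PATTERN = re.compile(r"https?://[^\s&]+")


def extract_url(href):
    """Return the first absolute URL found inside href, or None."""
    match = URL_PATTERN.search(href or "")
    return match.group(0) if match else None


print(extract_url("/url?q=https://example.com/page&sa=U"))  # https://example.com/page
print(extract_url("/search?q=python"))                      # None
```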
Then we parsed each URL to check whether it belongs to google.com, so that such URLs can be excluded.
If the URL meets all our required conditions, we append it to the list ‘g_clean’. When the function finishes, it returns a list of Google search results based on the provided query.
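Putting the filtering steps together, a sketch of the link-cleaning logic might look like this. It works on an HTML string rather than a live response so it can run offline. The function name collect_result_links, the regular expression and the sample markup are assumptions, and the code in the Github project may differ.

```python
import re
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Assumed pattern: an absolute URL, cut off at whitespace or "&"
URL_PATTERN = re.compile(r"https?://[^\s&]+")


def collect_result_links(html_text):
    """Return non-Google URLs found in the <a> tags of html_text."""
    g_clean = []
    soup = BeautifulSoup(html_text, "html.parser")
    for a in soup.find_all("a", href=True):  # skip tags without an href
        match = URL_PATTERN.search(a["href"])
        if match is None:
            continue  # relative link such as "/search?q=..."
        url = match.group(0)
        netloc = urlparse(url).netloc
        if netloc == "google.com" or netloc.endswith(".google.com"):
            continue  # link back to Google itself, not a search result
        g_clean.append(url)
    return g_clean


sample_html = """
<a href="/url?q=https://example.com/page&amp;sa=U">Example</a>
<a href="https://www.google.com/preferences">Settings</a>
<a href="/search?q=python">Next</a>
<a href="/url?q=https://docs.python.org/3/&amp;sa=U">Docs</a>
"""
print(collect_result_links(sample_html))
# ['https://example.com/page', 'https://docs.python.org/3/']
```

Comparing the parsed host name, instead of searching the whole URL for "google.com", avoids discarding results that merely mention Google in their path.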
We have successfully made a Google search results scraper in Python.