python - How to extract a Google link's href from search results with Selenium?
I'm ultimately trying to get the href of the first link in Google's search results.
The information I need exists in the 'a' element, stored in the 'data-href' attribute, but I can't figure out how to extract it: get_attribute('data-href') returns None.
I'm using PhantomJS, and I have also tried the Firefox web driver.
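A quick way to check whether 'data-href' is really present in the HTML the driver received (as opposed to what the browser's inspector shows) is to parse driver.page_source with the standard library. The sketch below is a hypothetical helper, not part of Selenium; the sample markup is invented for illustration, and Google's real result markup changes often.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Hypothetical helper: record data-href and href of the first <a> tag."""

    def __init__(self):
        super().__init__()
        self.data_href = None
        self.href = None
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.found:
            return
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        self.data_href = attributes.get("data-href")
        self.href = attributes.get("href")
        self.found = True


def first_link(html):
    """Return (data_href, href) of the first <a> in html, or (None, None)."""
    parser = FirstLinkParser()
    parser.feed(html)
    parser.close()
    return parser.data_href, parser.href


# Invented sample markup; with Selenium you would pass driver.page_source instead.
sample = ('<div class="g"><h3><a href="/url?q=http://www.fcv.org.br/" '
          'data-href="http://www.fcv.org.br/">FCV</a></h3></div>')
print(first_link(sample))
# ('http://www.fcv.org.br/', '/url?q=http://www.fcv.org.br/')
```

If the attribute shows up here but get_attribute('data-href') still returns None, the difference lies in how the page is rendered rather than in the locator.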
The href is also displayed in the cite tag in Google search results (it can be found by inspecting the small green link text under each result).
Selenium apparently finds the cite element, but the text it returns (via element.text, get_attribute('innerHTML'), or get_attribute('text')) is not what is shown in the HTML.
For instance, for the cite tag <cite class="_rm">www.fcv.org.br/</cite>, element.text shows “wikimapia.org/.../fundação-cristiano-varella-hospital...”.
I have tried retrieving the cite element by css_selector, tag_name, class_name, and xpath, all with the same results.
```python
links = driver.find_elements_by_css_selector('div.g')  # i.e. div[class="g"]
link = links[0]  # the first link in the main results section
next = link.find_element_by_css_selector('div[class="s"]')  # container of the cite tag
nextb = next.find_element_by_tag_name('cite')
```
The div containing the cite tag (there is only one cite in this div):

```html
<div class="s">
  <div>
    <div class="f kv _swb" style="white-space:nowrap">
      <cite class="_rm">www.fcv.org.br/</cite>
```
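To compare what the served HTML actually contains inside each cite tag with what element.text returns, one option is to pull the raw cite text out of driver.page_source. This is a hypothetical stdlib helper, sketched under the assumption that the markup looks like the snippet above (closing tags added so the sample is complete).

```python
from html.parser import HTMLParser


class CiteTextParser(HTMLParser):
    """Hypothetical helper: collect the raw text inside every <cite> element."""

    def __init__(self):
        super().__init__()
        self.cites = []
        self._inside = False  # True while between <cite> and </cite>

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self._inside = True
            self.cites.append("")

    def handle_endtag(self, tag):
        if tag == "cite":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.cites[-1] += data  # text may arrive in several chunks


def cite_texts(html):
    """Return the stripped text of each <cite> element in document order."""
    parser = CiteTextParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.cites]


snippet = ('<div class="s"><div><div class="f kv _swb" style="white-space:nowrap">'
           '<cite class="_rm">www.fcv.org.br/</cite></div></div></div>')
print(cite_texts(snippet))  # ['www.fcv.org.br/']
```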
Find the first a element inside each search result and get its href attribute value:
```python
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get("https://www.google.com/search?q=test")

results = driver.find_elements_by_css_selector('div.g')  # one div.g per search result
link = results[0].find_element_by_tag_name("a")  # first <a> in the first result
href = link.get_attribute("href")
```
You can then extract the actual URL from the href value with urlparse:
```python
import urlparse  # Python 2 module; in Python 3 this lives in urllib.parse

print(urlparse.parse_qs(urlparse.urlparse(href).query)["q"])
```
This prints:

```
[u'http://www.speedtest.net/']
```
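The same idea in Python 3 uses urllib.parse. The sketch below assumes Google's redirect links have the path /url with the target in the q parameter, as in the output above; result_target is a hypothetical helper name, and any href that does not match that shape is returned unchanged so that direct links (or unrelated URLs that happen to carry their own q parameter) are not misread.

```python
from urllib.parse import parse_qs, urlparse


def result_target(href):
    """Hypothetical helper: return the destination of a Google result link.

    Assumes redirect links look like /url?q=<target>&...; anything else is
    treated as already being a direct link and returned as-is.
    """
    parts = urlparse(href)
    if parts.path == "/url":
        targets = parse_qs(parts.query).get("q")  # parse_qs also percent-decodes
        if targets:
            return targets[0]
    return href


print(result_target("https://www.google.com/url?q=http://www.speedtest.net/&sa=U"))
# http://www.speedtest.net/
print(result_target("https://example.com/search?q=test"))
# https://example.com/search?q=test
```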