python - How to extract a Google link's href from search results with Selenium?


I am ultimately trying to get the href of the first link in Google's search results.

The information I need exists in the 'a' element, stored in the 'data-href' attribute, but I cannot figure out how to extract it (get_attribute('data-href') returns None).

I am using PhantomJS, and I have also tried the Firefox web driver.


The href is also displayed in the cite tag in Google search (you can find it by inspecting the small green link text under each link in the Google search results).

Selenium apparently finds the cite element, but the text it returns (via element.text, get_attribute('innerHTML'), or text()) is not what is shown in the HTML.

For instance, where there is the cite tag <cite class="_rm">www.fcv.org.br/</cite>, element.text shows “wikimapia.org/.../fundação-cristiano-varella-hospital...”

I have tried retrieving the cite element by css_selector, tag_name, class_name and xpath, with the same results.

    links = driver.find_elements_by_css_selector('div.g')  # div[class="g"]
    link = links[0]  # looking at the first link in the main links section
    next = link.find_element_by_css_selector('div[class="s"]')  # location of the cite tag
    nextb = next.find_element_by_tag_name('cite')

The div containing the cite tag (there is only one in the div):

    <div class="s">
        <div>
            <div class="f kv _swb" style="white-space:nowrap">
                <cite class="_rm">www.fcv.org.br/</cite>

Find the first a element inside each search result and get its href attribute value:

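    from selenium import webdriver

    # Selenium 3-era API: PhantomJS and find_element(s)_by_* are not available in current Selenium 4 releases
    driver = webdriver.PhantomJS()
    driver.get("https://www.google.com/search?q=test")

    # Each organic result sits in a div with class "g"; take the first <a> of the first result
    results = driver.find_elements_by_css_selector('div.g')
    link = results[0].find_element_by_tag_name("a")
    href = link.get_attribute("href")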
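The same "first a inside each div.g" selection can be checked offline against a saved page or hand-written HTML. Below is a minimal sketch using the standard library's html.parser instead of Selenium. It assumes, like the 'div.g' selector above, that each result sits in a div whose class list contains "g"; Google's markup changes over time, so treat that as an assumption. The class name FirstResultLinkParser and the sample HTML are hypothetical, made up for illustration. Unlike find_element_by_tag_name, it skips a tags that have no href.

```python
from html.parser import HTMLParser


class FirstResultLinkParser(HTMLParser):
    """Collect the href of the first <a> (with an href) inside each <div class="g">.

    Hypothetical offline mirror of the Selenium lookup; assumes results are
    <div> elements whose class list contains "g".
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []          # one href per result container
        self._div_depth = 0      # nesting depth of <div>s inside the current result
        self._in_result = False  # True while inside a div.g
        self._found = False      # True once this result's first link is recorded

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._in_result:
                self._div_depth += 1
            elif "g" in (attrs.get("class") or "").split():
                self._in_result = True
                self._div_depth = 1
                self._found = False
        elif tag == "a" and self._in_result and not self._found:
            href = attrs.get("href")
            if href is not None:
                self.hrefs.append(href)
                self._found = True

    def handle_endtag(self, tag):
        if tag == "div" and self._in_result:
            self._div_depth -= 1
            if self._div_depth == 0:
                self._in_result = False


# Hypothetical sample markup; html.parser unescapes &amp; in attribute values
html = """
<div class="g"><h3><a href="/url?q=http://www.speedtest.net/&amp;sa=U">Speedtest</a></h3>
<div class="s"><cite>www.speedtest.net/</cite></div></div>
<div class="g"><a href="http://example.com/">Example</a></div>
"""
parser = FirstResultLinkParser()
parser.feed(html)
print(parser.hrefs)  # ['/url?q=http://www.speedtest.net/&sa=U', 'http://example.com/']
```

The parser tracks div nesting so that the closing tag of an inner div (such as div.s) does not end the current result early.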

Then you can extract the actual URL from the href value with urlparse:

    import urlparse

    print(urlparse.parse_qs(urlparse.urlparse(href).query)["q"])

prints:

    [u'http://www.speedtest.net/']
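The urlparse module only exists in Python 2; in Python 3 the same functions live in urllib.parse. A minimal Python 3 sketch follows. extract_target_url is a hypothetical helper name, and it assumes the href has the /url?q=<target>&... redirect form seen above. If there is no q parameter (for example, if the driver already returns the direct link), it returns the href unchanged instead of raising KeyError like the one-liner above.

```python
from urllib.parse import parse_qs, urlparse


def extract_target_url(href):
    """Return the destination URL of a Google redirect link.

    Assumes redirect links look like /url?q=<target>&sa=...; any href
    without a "q" query parameter is returned unchanged.
    """
    query = parse_qs(urlparse(href).query)  # values are lists of decoded strings
    targets = query.get("q")
    return targets[0] if targets else href


print(extract_target_url("https://www.google.com/url?q=http://www.speedtest.net/&sa=U&ved=0ah"))
# http://www.speedtest.net/
print(extract_target_url("http://www.speedtest.net/"))
# http://www.speedtest.net/
```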
