Web scraping - Unable to fetch the video source link from this website using Jsoup
I have a website from which I want to fetch the video link using Jsoup. I am unable to do so; the program throws an error. Can you please help me?
Here is my code:
    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class movmaker {
        public static void main(String[] args) {
            try {
                String url = "http://www.tamilyogi.tv/7aum-arivu-2011-hd-720p-tamil-movie-watch-online/";
                Document doc = Jsoup.connect(url).get();
                Element vid = doc.getElementsByTag("video").get(0);
                System.out.println("\nlink: " + vid.attr("src"));
                System.out.println("text: " + vid.text());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

My error:
    Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(Unknown Source)
        at java.util.ArrayList.get(Unknown Source)
        at movmaker.main(movmaker.java:16)
The page source from which I want to fetch the data is: here
I am new to Java and Jsoup, and I would be thankful if you could give me the code.
Regards, Bhuvanesh
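There is no <video> tag in the directly loaded HTML of the link you have given. The tag is instead created by JavaScript in the browser. Since Jsoup does not run JavaScript, getElementsByTag("video") returns an empty list, and calling get(0) on it throws the IndexOutOfBoundsException you see. With Jsoup alone you are out of luck here.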
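What you can do is either use a tool that actually executes JavaScript, or analyze the contents of the HTML, and maybe the network traffic that happens in the browser when you load the site, in order to find out whether you can construct the link by hand. In this case I had a quick look at the HTML and found that the video tag is generated within an iframe. In the source of the iframe you find this part: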
    <script type="text/javascript">
    jwplayer("vplayer").setup({
      sources: [{file:"http://cdn7.vidmad.tv/h7todtdxamlbu3tf6rutlihpzoz4di2fcsaje74hlrcqda7qibjmlb4vblxq/v.mp4",label:"720p"},{file:"http://cdn7.vidmad.tv/h7todtdxamlbu3tf6rutlihpzoz4di2fcsaje74hljcqda7qibjjd3opruyq/v.mp4",label:"360p","default": "true"},{file:"http://cdn7.vidmad.tv/h7todtdxamlbu3tf6rutlihpzoz4di2fcsaje74hlbcqda7qibjgcvfli2eq/v.mp4",label:"240p"}],
      image: "http://cdn7.vidmad.tv/i/01/00000/cjwf05thn2vm.jpg",
      duration:"9607",
      width: "100%",
      height: "350",
      aspectratio: "16:9",
      preload: "none",
      androidhls: "true",
      startparam: "start"
      ,tracks: []
      ,skin: "glow",abouttext:"vidmad", aboutlink:"http://vidmad.tv"
    });
    ...
    </script>

So the URL is part of a <script> tag. You can use regular expressions to extract it:
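    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Load the page; some sites reject requests without a browser-like user agent.
    Document doc = Jsoup.connect("http://www.tamilyogi.tv/7aum-arivu-2011-hd-720p-tamil-movie-watch-online/")
            .userAgent("Mozilla/5.0")
            .get();
    // The player lives in an iframe whose src contains "embed".
    Element iframeEl = doc.select("iframe[src*=embed]").first();
    if (iframeEl != null) {
        Document frameDoc = Jsoup.connect(iframeEl.attr("src"))
                .userAgent("Mozilla/5.0")
                .get();
        Elements scriptEls = frameDoc.select("script");
        for (Element scriptEl : scriptEls) {
            String html = scriptEl.html();
            // Capture the first file URL in the jwplayer "sources" array.
            Pattern p = Pattern.compile("sources:\\s*\\[\\{file:\"([^\"]+)");
            Matcher m = p.matcher(html);
            if (m.find()) {
                String link = m.group(1);
                System.out.println(link);
                break;
            }
        }
    }

Of course, the solution above works only for this site and this link. You may need to adapt the approach to fit your needs, but the general idea should be clear now.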
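If you want every quality variant rather than only the first URL, the same regular-expression idea can be extended to collect each file/label pair. The following is a minimal sketch using only java.util.regex. The class name JwSources, the method extractSources, and the example.com URLs are made up for illustration, and the pattern assumes the unquoted keys and the file-then-label order seen in the script shown above; a page that formats its player setup differently would need a different pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JwSources {
    // Matches one entry such as {file:"http://host/v.mp4",label:"720p"}.
    // Assumption: keys are unquoted and "file" comes directly before "label".
    private static final Pattern SOURCE = Pattern.compile(
            "\\{\\s*file\\s*:\\s*\"([^\"]+)\"\\s*,\\s*label\\s*:\\s*\"([^\"]+)\"");

    /** Returns label -> URL, in the order the entries appear in the script text. */
    public static Map<String, String> extractSources(String script) {
        Map<String, String> sources = new LinkedHashMap<>();
        Matcher m = SOURCE.matcher(script);
        while (m.find()) {
            sources.put(m.group(2), m.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the iframe's script text; the URLs are placeholders.
        String script = "jwplayer(\"vplayer\").setup({ sources: ["
                + "{file:\"http://example.com/a/v.mp4\",label:\"720p\"},"
                + "{file:\"http://example.com/b/v.mp4\",label:\"360p\",\"default\": \"true\"},"
                + "{file:\"http://example.com/c/v.mp4\",label:\"240p\"}], image: \"x.jpg\" });";
        extractSources(script).forEach((label, url) -> System.out.println(label + " -> " + url));
    }
}
```

In the Jsoup code above you would pass scriptEl.html() to extractSources and then pick the label you want, for example sources.get("720p"). An empty map means the script did not contain a matching sources array.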