javascript - Express app.get() equivalent in CasperJS
I have built a simple web scraper that scrapes a website and outputs the data I need when I visit a URL, localhost:3434/page. I implemented this functionality using the Express app.get() method.
I have the following questions:
1) I want to know if there is a way to implement this functionality in CasperJS.
2) Is there a way to make the code start scraping only after I visit the URL localhost:8081/scrape? I don't think I'm creating the endpoint correctly, because it starts scraping before I visit the URL.
3) When I visit the URL, it gives me an error saying the URL is not available.
I think all of these problems would be solved if I could set up the endpoint localhost:3434/page correctly in CasperJS. I don't need the results to appear on the page. I just need the scraping to start when I visit the URL.
Below is the code I developed to scrape the website and create a server in Casper.
var server = require('webserver').create();

var service = server.listen(3434, function(request, response) {
    var casper = require('casper').create({
        logLevel: "verbose",
        debug: true
    });
    var links;
    var name;
    var paragraph;
    var firstName;
    var expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;
    var regex = new RegExp(expression);

    casper.start('http://www.home.com/professionals/c/oho,-tn');

    casper.then(function getLinks() {
        links = this.evaluate(function() {
            var links = document.getElementsByClassName('pro-title');
            links = Array.prototype.map.call(links, function(link) {
                return link.getAttribute('href');
            });
            return links;
        });
    });

    casper.then(function() {
        this.each(links, function(self, link) {
            if (link.match(regex)) {
                self.thenOpen(link, function(a) {
                    var firstName = this.fetchText('div.info-list-text');
                    this.echo(firstName);
                });
            }
        });
    });

    casper.run(function() {
        response.statusCode = 200;
        response.write(firstName);
        response.close();
    });
});
The webserver used in your CasperJS script is PhantomJS's Web Server module, which is "intended for ease of communication between PhantomJS scripts and the outside world and is not recommended for use as a general production server".
You should not build your web server in PhantomJS. Check out these Node-Phantom bridges, which let you use Phantom from a regular Node.js web server:
- https://github.com/spookyjs/spookyjs
- https://github.com/peerigon/phridge
- https://github.com/baudehlo/node-phantom-simple
- https://github.com/johntitus/node-horseman
- https://github.com/nodeca/navit
SpookyJS is a driver specifically for CasperJS, whereas the others are for PhantomJS only.
That said, since CasperJS can be loaded from within PhantomJS, you can at least use it in Phridge (I'm not sure about the others), because Phridge has a .run function that runs a function directly inside the PhantomJS environment:
casperPath = path.join(require.resolve('casperjs/bin/bootstrap'), '/../..');
phantom.run(casperPath, function(casperPath) {
    phantom.casperPath = casperPath;
    phantom.injectJs(casperPath + '/bin/bootstrap.js');
    casper = require('casper').create();
    ...
Besides the ones that use PhantomJS, there are others:
ZombieJS uses native Node.js libraries, which makes it the fastest and most natural to use in a Node.js app. However, it's meant more for testing purposes and may not work on some sites that other scrapers can handle.
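To illustrate the advice above about keeping the web server in regular Node.js, here is a minimal sketch of an endpoint that starts a scrape only when the URL is visited. It uses Node's built-in http module rather than Express so that it stays self-contained, and it does not use any of the bridges listed above. It assumes that casperjs is on your PATH and that scraper.js is a standalone CasperJS script that prints its results to stdout. The names scraper.js, runCasper and createScrapeServer are hypothetical and chosen only for this example.

```javascript
// Sketch only: trigger a scrape when /scrape is requested, not before.
// Assumptions (not from the original post): `casperjs` is on PATH and
// `scraper.js` is a hypothetical CasperJS script that echoes its results.
const http = require('http');
const { execFile } = require('child_process');

// Runs the hypothetical CasperJS script as a child process and passes
// its stdout (or the error) to the callback.
function runCasper(callback) {
  execFile('casperjs', ['scraper.js'], (err, stdout) => callback(err, stdout));
}

// Builds an HTTP server whose GET /scrape route calls `runScraper`
// once per request. Nothing is scraped when the server starts.
function createScrapeServer(runScraper) {
  return http.createServer((req, res) => {
    if (req.method !== 'GET' || req.url !== '/scrape') {
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      res.end('Not found');
      return;
    }
    // The scrape begins here, i.e. only after the URL has been visited.
    runScraper((err, output) => {
      res.writeHead(err ? 500 : 200, { 'Content-Type': 'text/plain' });
      res.end(err ? 'Scrape failed' : output);
    });
  });
}
```

To try it, call createScrapeServer(runCasper).listen(8081) and open localhost:8081/scrape. Passing the runner in as a parameter is a design choice for this sketch: it lets you swap the child process for one of the bridges above, or for a stub while testing.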