java - How to filter IPs from a log file into another RDD
I am fetching IP addresses from an access log file. I tried using a regex pattern, but I am not getting the correct output.
    public class IpCount {
        public static void main(String[] args) {
            String IPADDRESS_PATTERN =
                "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
            Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);
            Matcher matcher = pattern.matcher(t);
            JavaSparkContext sc = new JavaSparkContext("local", "ipcount");
            @SuppressWarnings({ "unused", "serial" })
            JavaRDD<String> lines = sc.textFile("/home/bhaumik/documents/access_log", 5)
                .flatMap(new FlatMapFunction<String, String>() {
                    @Override
                    public Iterable<String> call(String t) throws Exception {
                        // TODO Auto-generated method stub
                        return null; // the IPs should be filtered out of the log line here
                    }
                });
        }
    }
Here's a Java method for extracting IPs from a JavaRDD<String>, assuming each line might contain zero, one, or more IPs:
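    import java.util.LinkedList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;

    public JavaRDD<String> getIps(JavaRDD<String> rdd) {
        final String IPADDRESS_PATTERN =
            "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
        // Compile once; the compiled Pattern is serializable and is shipped with the closure.
        final Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);

        return rdd.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String t) throws Exception {
                // Create the Matcher per line, inside call(), where the line t is in scope.
                final Matcher matcher = pattern.matcher(t);
                final LinkedList<String> matches = new LinkedList<>();
                // Collect every IP found on this line (possibly none).
                while (matcher.find()) {
                    matches.add(matcher.group());
                }
                // Spark 1.x signature; on Spark 2.0 and later call() returns an
                // Iterator<String>, so return matches.iterator() there instead.
                return matches;
            }
        });
    }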
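If you want to check the matching logic without starting Spark, the same loop can be run as plain Java. The following is only a minimal sketch: the class name `IpExtractor` and the methods `extractIps` and `countIps` are hypothetical names chosen for illustration, the sample log lines are made up, and the counting step is an assumption about the goal based on the class name in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class (not part of Spark or of the original post).
class IpExtractor {
    // Same regex as in the question, compiled once and reused.
    private static final Pattern IP_PATTERN = Pattern.compile(
        "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}"
        + "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)");

    // Returns every IP found in one line: zero, one, or more entries.
    static List<String> extractIps(String line) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = IP_PATTERN.matcher(line);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Counts occurrences of each IP across all lines, in first-seen order.
    static Map<String, Integer> countIps(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String ip : extractIps(line)) {
                counts.merge(ip, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines in a typical access-log shape.
        List<String> lines = Arrays.asList(
            "192.168.1.10 - - \"GET /index.html HTTP/1.1\" 200 2326",
            "10.0.0.1 - - \"GET /about.html HTTP/1.1\" 200 512",
            "192.168.1.10 - - \"GET /logo.png HTTP/1.1\" 304 0");

        System.out.println(extractIps("10.0.0.1 forwarded for 172.16.254.3"));
        // prints [10.0.0.1, 172.16.254.3]
        System.out.println(countIps(lines));
        // prints {192.168.1.10=2, 10.0.0.1=1}
    }
}
```

The body of `extractIps` is the same loop as in the `call` method above, so once it behaves as expected on a few sample lines it can be moved into the `flatMap` function unchanged.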