How to execute a Spark SQL query from a map function (Python)?
How does one execute Spark SQL queries from routines that are not in the driver portion of the program?
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

def doWork(rec):
    data = sqlContext.sql(
        "SELECT * FROM zip_data WHERE statefp = '{sfp}' AND countyfp = '{cfp}'"
        .format(sfp=rec[0], cfp=rec[1]))
    for item in data.collect():
        print(item)
    # return (rec[0], rec[1])

if __name__ == "__main__":
    sc = SparkContext(appName="some app")
    print("starting app")
    sqlContext = SQLContext(sc)
    parquetFile = sqlContext.read.parquet("/path/to/data/")
    parquetFile.registerTempTable("zip_data")
    df = sqlContext.sql(
        "SELECT DISTINCT statefp, countyfp FROM zip_data WHERE statefp IN ('12')")
    rslts = df.map(doWork)
    for rslt in rslts.collect():
        print(rslt)
In this example I'm attempting to query the same table, but I would like to query other tables registered in Spark SQL too.
One cannot execute nested operations on distributed data structures; it is not supported in Spark. You have to use joins, local (optionally broadcasted) data structures, or access external data directly instead.
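As a minimal sketch of the join approach, the per-row queries above can be replaced by joining the distinct key set back against the full table. This reuses the table name, path, and filter values from the question; note that passing a list of column names to DataFrame.join requires Spark 1.5+:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="some app")
sqlContext = SQLContext(sc)

parquetFile = sqlContext.read.parquet("/path/to/data/")
parquetFile.registerTempTable("zip_data")

# The distinct (statefp, countyfp) pairs of interest, as in the question.
keys = sqlContext.sql(
    "SELECT DISTINCT statefp, countyfp FROM zip_data WHERE statefp IN ('12')")

# Instead of issuing one SQL query per row inside map(), join the key set
# back against the full table. Both sides are DataFrames, so the work runs
# entirely on the executors.
zips = sqlContext.table("zip_data")
matched = zips.join(keys, ["statefp", "countyfp"])

for row in matched.collect():
    print(row)

If the key set is small, an alternative is to collect it to the driver, ship it to the executors with sc.broadcast, and filter against it locally inside a map or filter function.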