How to execute a spark sql query from a map function (Python)?


How can I execute Spark SQL queries from routines that are not in the driver portion of the program?

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *


def doWork(rec):
    # Nested query per record: this is what fails outside the driver
    data = sqlContext.sql(
        "select * from zip_data where statefp = '{sfp}' and countyfp = '{cfp}' ".format(
            sfp=rec[0], cfp=rec[1]))
    for item in data.collect():
        print(item)
    #
    return (rec[0], rec[1])


if __name__ == "__main__":
    sc = SparkContext(appName="some app")
    print("starting app")

    sqlContext = SQLContext(sc)

    parquetFile = sqlContext.read.parquet("/path/to/data/")
    parquetFile.registerTempTable("zip_data")

    df = sqlContext.sql("select distinct statefp, countyfp from zip_data where statefp in ('12') ")
    rslts = df.map(doWork)

    for rslt in rslts.collect():
        print(rslt)

In this example I'm attempting to query the same table, but I would like to query other tables registered in Spark SQL too.

You cannot execute nested operations on a distributed data structure; it is not supported in Spark. You have to use joins, local (optionally broadcast) data structures, or access the external data directly instead.
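As a minimal sketch of the join-based alternative, assuming the same zip_data temp table and /path/to/data/ parquet file from the question, the per-record queries inside doWork can be expressed as a single join issued from the driver:

from pyspark import SparkContext
from pyspark.sql import SQLContext

if __name__ == "__main__":
    sc = SparkContext(appName="some app")
    sqlContext = SQLContext(sc)

    parquetFile = sqlContext.read.parquet("/path/to/data/")
    parquetFile.registerTempTable("zip_data")

    # One join replaces the per-record sqlContext.sql() calls that were
    # attempted inside the map function.
    joined = sqlContext.sql(
        "select z.* "
        "from zip_data z "
        "join (select distinct statefp, countyfp "
        "      from zip_data where statefp in ('12')) keys "
        "on z.statefp = keys.statefp and z.countyfp = keys.countyfp")

    for row in joined.collect():
        print(row)

If the set of distinct (statefp, countyfp) pairs is small, you can instead collect() it to the driver and ship it to the executors as a broadcast variable with sc.broadcast(...), then filter against that local structure inside your functions.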

