### The following code shows how to transform binary-type columns of a Spark DataFrame into a struct-type column
- **This notebook runs with a PySpark kernel**, which provides the `spark` session object used below

In [3]:
geotwt_sdf=spark.read.parquet("BMC_UserGeoTwt/BMC_GeoTwt_Snappy*")

In [16]:
print(geotwt_sdf.count())
geotwt_sdf.show(2)

38820846
+--------------------+---------+------------------+----------+-----------+--------------------+--------------------+--------------------+--------------------+---------------+-------------+----------+--------------------+------+---------+----------------+-----------+-----------------+
|               ctime|      uid|             uname|       lat|        lng|    profile_location|                term|             hashtag|            fulltext|place_full_name|      country|place_type|        bounding_box|c_code|attribute|           geoid| place_name|__index_level_0__|
+--------------------+---------+------------------+----------+-----------+--------------------+--------------------+--------------------+--------------------+---------------+-------------+----------+--------------------+------+---------+----------------+-----------+-----------------+
|[54 75 65 20 53 6...|126371773|CompassUSAJobBoard|37.5407246|-77.4360481|[43 68 61 72 6C 6...|[6A 6F 69 6E 20 6...|[6A 6F 62 20 46 6...

  <br>
<font style="font-family:'Times new man';font-size: 1.15em">
    
We notice that several columns have been encoded as binary type. Casting a byte array to a string is easy with the `astype` function (an alias of `Column.cast`). However, it becomes tricky for columns that hold structured information, e.g., the `bounding_box` column, which contains a JSON string that is supposed to be read in as a struct-type column.  
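In the `show()` output above, the binary columns are rendered as hex bytes. Casting a binary column to `string` in Spark interprets those bytes as UTF-8 text. As a quick sanity check outside Spark, the visible (truncated) hex prefixes can be decoded in plain Python 3; `decode_hex_prefix` is a hypothetical helper for illustration only, not part of the pipeline.

```python
# Hex prefixes copied from the truncated show() output above.
prefixes = {
    "ctime": "54 75 65 20 53",
    "profile_location": "43 68 61 72 6C",
    "term": "6A 6F 69 6E 20",
    "hashtag": "6A 6F 62 20 46",
}


def decode_hex_prefix(hex_str):
    """Decode a space-separated hex string as UTF-8 text (hypothetical helper)."""
    # bytes.fromhex ignores the spaces between byte pairs
    return bytes.fromhex(hex_str).decode("utf-8")


for column, hex_str in prefixes.items():
    print(column, repr(decode_hex_prefix(hex_str)))
```

The prefixes decode to readable text (for example `ctime` starts with `'Tue S'`), which is why a plain string cast is enough for these columns.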
  
<br>
We have to specify the struct schema manually so that `from_json` can parse the string into the right structure.  
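The schema has to mirror the nesting of the JSON. Judging from the schema and the corner-extraction code in this notebook, `coordinates` holds a polygon as a list of rings, each ring a list of points, and each point a pair of numbers read as `[lng, lat]`. The hypothetical helper below checks a parsed Python object against that shape, with one loop per `ArrayType` level, to show why three nested arrays are needed. It is an illustration only; Spark performs its own parsing.

```python
def matches_bbox_schema(obj):
    """Return True if obj has the shape the StructType describes:
    {"type": str, "coordinates": list[list[list[number]]]}.

    Hypothetical helper for illustration only.
    """
    if not isinstance(obj, dict) or not isinstance(obj.get("type"), str):
        return False
    polygon = obj.get("coordinates")
    if not isinstance(polygon, list):          # outer ArrayType: the polygon
        return False
    for ring in polygon:
        if not isinstance(ring, list):         # middle ArrayType: one ring
            return False
        for point in ring:
            if not isinstance(point, list):    # inner ArrayType: one [lng, lat] point
                return False
            if not all(isinstance(v, (int, float)) for v in point):
                return False
    return True


# Made-up values: a correctly nested polygon, and one missing a nesting level
print(matches_bbox_schema({"type": "Polygon",
                           "coordinates": [[[-77.6, 37.4], [-77.3, 37.6]]]}))
print(matches_bbox_schema({"type": "Polygon", "coordinates": [[-77.6, 37.4]]}))
```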



In [4]:
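from pyspark.sql.types import StructType, StructField, StringType, ArrayType, FloatType

# JSON layout of bounding_box: a geometry type plus polygon rings of [lng, lat] points.
# FloatType is single precision; DoubleType would keep more decimal places.
schema = StructType([
    StructField("type", StringType(), True),
    StructField("coordinates", ArrayType(ArrayType(ArrayType(FloatType()))),
                True),
])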
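The next cell strips the letter `u` from the string before calling `from_json`. This suggests (an assumption, since the raw bytes are not shown in full here) that `bounding_box` holds a Python 2 `repr` of a dict such as `{u'type': u'Polygon', ...}` rather than strict JSON. Spark's JSON parser accepts single quotes by default (the `allowSingleQuotes` option), so removing the `u` prefixes is presumably all `from_json` needs. The plain-Python sketch below reproduces the idea on a made-up sample. `normalize_py2_repr` is a hypothetical name; unlike the notebook's regex, it removes only a `u` that starts a quoted token. Because Python's `json` module rejects single quotes, the sketch also swaps them for double quotes.

```python
import json
import re

# Made-up bounding_box value, assuming it was stored as a Python 2 repr
# (u'' prefixes, single quotes). The coordinates are placeholders.
raw = ("{u'type': u'Polygon', u'coordinates': "
       "[[[-77.6, 37.4], [-77.6, 37.6], [-77.3, 37.6], [-77.3, 37.4]]]}")


def normalize_py2_repr(s):
    """Turn a Python 2 dict repr into JSON text (hypothetical helper).

    Removes a 'u' only when it directly precedes a quote at a word boundary,
    then swaps single quotes for double quotes. The swap is naive and assumes
    no string value contains an apostrophe.
    """
    s = re.sub(r"\bu'", "'", s)
    return s.replace("'", '"')


bbox = json.loads(normalize_py2_repr(raw))
print(bbox["type"])                  # Polygon
print(len(bbox["coordinates"][0]))   # 4
```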

In [6]:
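import pyspark.sql.functions as func

# Cast the binary columns to strings (the bytes are decoded as UTF-8)
temp_sdf = geotwt_sdf.withColumn('ctime_str', geotwt_sdf.ctime.astype('string'))
temp_sdf = temp_sdf.withColumn('term_str', temp_sdf.term.astype('string'))
temp_sdf = temp_sdf.withColumn('bbox_str', temp_sdf.bounding_box.astype('string'))
# Remove every letter 'u', presumably to strip Python 2 u'' string prefixes;
# note that this would also delete any 'u' inside a value
temp_sdf = temp_sdf.withColumn('coords', func.regexp_replace('bbox_str', 'u', ""))
# Parse the cleaned string into a struct column using the schema above
temp_sdf = temp_sdf.withColumn('bbox', func.from_json('coords', schema))

temp_sdf = temp_sdf.withColumn('htag_str', temp_sdf.hashtag.astype('string'))
temp_sdf = temp_sdf.withColumn('plocation_str', temp_sdf.profile_location.astype('string'))

# Points are [lng, lat]: vertex 0 of the first ring is used as the lower-left
# corner and vertex 2 as the upper-right corner
temp_sdf = temp_sdf.withColumn(
    'll_lat',
    temp_sdf.bbox.coordinates.getItem(0).getItem(0).getItem(1)).withColumn(
        'll_lng',
        temp_sdf.bbox.coordinates.getItem(0).getItem(0).getItem(0))
temp_sdf = temp_sdf.withColumn(
    'ur_lat',
    temp_sdf.bbox.coordinates.getItem(0).getItem(2).getItem(1)).withColumn(
        'ur_lng',
        temp_sdf.bbox.coordinates.getItem(0).getItem(2).getItem(0))
# Drop the source columns and the intermediate parsing columns
temp_sdf = temp_sdf.drop('ctime', 'profile_location', 'term', 'hashtag',
                         'bounding_box', 'attribute', 'bbox', 'coords',
                         'bbox_str')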

temp_sdf.show(1)

+---------+------------------+----------+-----------+--------------------+---------------+-------------+----------+------+----------------+----------+-----------------+--------------------+--------------------+--------------------+-------------+---------+---------+--------+--------+
|      uid|             uname|       lat|        lng|            fulltext|place_full_name|      country|place_type|c_code|           geoid|place_name|__index_level_0__|           ctime_str|            term_str|            htag_str|plocation_str|   ll_lat|   ll_lng|  ur_lat|  ur_lng|
+---------+------------------+----------+-----------+--------------------+---------------+-------------+----------+------+----------------+----------+-----------------+--------------------+--------------------+--------------------+-------------+---------+---------+--------+--------+
|126371773|CompassUSAJobBoard|37.5407246|-77.4360481|Join the Crothall...|   Richmond, VA|United States|      city|    US|00f751614d8ce37b|  Richmon
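The corner columns above take vertices 0 and 2 of the first ring by position, which relies on every bounding box storing its corners in the same order (lower-left first, upper-right third). As a swapped-in, order-independent alternative, the sketch below takes the minimum and maximum of the latitudes and longitudes instead. It is plain Python on made-up coordinates, and `bbox_corners` is a hypothetical helper.

```python
def bbox_corners(coordinates):
    """Return (ll_lat, ll_lng, ur_lat, ur_lng) from polygon coordinates.

    Uses min/max over the first ring instead of fixed indices 0 and 2,
    so the result does not depend on vertex order. Points are [lng, lat].
    """
    ring = coordinates[0]
    lngs = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return min(lats), min(lngs), max(lats), max(lngs)


# Made-up coordinates, listed lower-left, upper-left, upper-right, lower-right
coords = [[[-77.6, 37.4], [-77.6, 37.6], [-77.3, 37.6], [-77.3, 37.4]]]
print(bbox_corners(coords))  # (37.4, -77.6, 37.6, -77.3)
```

For this ordering the result matches the index-based columns; the min/max version only differs when a record lists its vertices in another order. Inside Spark, the same idea could be expressed with the SQL higher-order function `transform` together with `array_min`/`array_max` (Spark 2.4+), if the vertex order in this dataset is not guaranteed.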