GCP Credentials
Exporting GCP credentials to S3 and then loading them in PySpark.
We now have a dataset that we can use as input to a PySpark pipeline, but we don’t yet have access to the bucket on GCS from our Spark environment.
Accessing the GCS bucket
With AWS, we were able to set up programmatic access to S3 using an access key and secret key. With GCP, the process is a bit more involved, because we need to move the JSON credentials file to the driver node of the cluster in order to read and write files on GCS.
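As a rough sketch of what this looks like in practice, the snippet below assumes the service account JSON has already been uploaded to S3 (the bucket name, key, and file paths are hypothetical) and that the GCS connector jar is available on the cluster. It downloads the keyfile to the driver with boto3 and then points the Hadoop GCS connector at the local copy.

```python
import boto3
from pyspark.sql import SparkSession

# Hypothetical names: replace with your own S3 bucket and object key.
S3_BUCKET = "my-spark-bucket"
S3_KEY = "credentials/gcp_service_account.json"
LOCAL_KEYFILE = "/tmp/gcp_service_account.json"

# Copy the GCP service account JSON from S3 to the driver node.
s3 = boto3.client("s3")
s3.download_file(S3_BUCKET, S3_KEY, LOCAL_KEYFILE)

# Configure Spark to authenticate to GCS with the downloaded keyfile.
spark = (
    SparkSession.builder
    .appName("gcs-access")
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            LOCAL_KEYFILE)
    .getOrCreate()
)

# Read a dataset from GCS (bucket and path are placeholders).
df = spark.read.csv("gs://my-gcs-bucket/data/*.csv", header=True)
df.show(5)
```

With this in place, any `gs://` path can be read or written from the same Spark session, since the connector picks up the keyfile from the Hadoop configuration.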