Professional-Data-Engineer Professional Data Engineer on Google Cloud Platform - Practice Questions

Question 4

You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods, when all 3 workers are at maximum CPU utilization, your pipeline struggles to process records in a timely fashion. Which two actions can you take to increase performance of your pipeline? (Choose two.)

Select all that apply.

○
Increase the number of max workers
○
Use a larger instance type for your Cloud Dataflow workers
○
Change the zone of your Cloud Dataflow pipeline to run in us-central1
○
Create a temporary table in Cloud Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Bigtable to BigQuery
○
Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery
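
For reference, the configuration described in the scenario (region europe-west4, at most 3 workers, n1-standard-1 machines) can be written as Apache Beam Python pipeline flags. This is only a sketch of where those settings live: the helper name `dataflow_worker_flags` is hypothetical, and the `--streaming` flag is an assumption based on the Pub/Sub source.

```python
def dataflow_worker_flags(region, max_workers, machine_type):
    """Build Beam Python command-line flags for a Dataflow job's worker settings.

    Hypothetical helper for illustration; it only assembles flag strings
    and does not launch a job.
    """
    return [
        "--runner=DataflowRunner",
        f"--region={region}",
        f"--max_num_workers={max_workers}",        # autoscaling upper bound
        f"--worker_machine_type={machine_type}",   # Compute Engine machine type
        "--streaming",                             # assumed: Pub/Sub source implies streaming
    ]


# The setup described in Question 4.
current_flags = dataflow_worker_flags("europe-west4", 3, "n1-standard-1")
print(" ".join(current_flags))
```

These flags would typically be appended to the command that launches the pipeline's main module.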

Question 5

You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? (Choose two.)

Select all that apply.

○
Denormalize the data as much as possible.
○
Preserve the structure of the data as much as possible.
○
Use BigQuery UPDATE to further reduce the size of the dataset.
○
Develop a data pipeline where status updates are appended to BigQuery instead of updated.
○
Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery’s support for external data sources to query.
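
To get a feel for the scale in this scenario, the growth figures from the question (1.5 PB initially, 3 TB per day) can be projected forward. The sketch below assumes decimal units (1 PB = 1000 TB) and a constant daily growth rate; the function name `projected_size_tb` is hypothetical.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def projected_size_tb(initial_pb, daily_growth_tb, days):
    """Return the dataset size in TB after `days` of constant daily growth."""
    return initial_pb * TB_PER_PB + daily_growth_tb * days


# Figures from Question 5: 1.5 PB initial size, growing by 3 TB per day.
size_after_one_year = projected_size_tb(1.5, 3, 365)
print(size_after_one_year)  # 2595.0 TB, roughly 2.6 PB
```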

Question 6

Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects to allow them to work with multiple GCP products in their projects. Your organization requires that all BigQuery data access logs be retained for 6 months. You need to ensure that only audit personnel in your company can access the data access logs for all projects. What should you do?

Select one option.

○
Enable data access logs in each Data Analyst’s project. Restrict access to Stackdriver Logging via Cloud IAM roles.
○
Export the data access logs via a project-level export sink to a Cloud Storage bucket in the Data Analysts’ projects. Restrict access to the Cloud Storage bucket.
○
Export the data access logs via a project-level export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project with the exported logs.
○
Export the data access logs via an aggregated export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project that contains the exported logs.
