DAS-C01 AWS Certified Data Analytics - Specialty - Practice Questions

Question 1

A company launched a service that produces millions of messages every day and uses Amazon Kinesis Data Streams as the streaming service.

The company uses the Kinesis SDK to write data to Kinesis Data Streams. A few months after launch, a data analyst found that write performance had degraded significantly. The data analyst investigated the metrics and determined that Kinesis was throttling the write requests. The data analyst wants to address this issue without significant changes to the architecture. Which actions should the data analyst take to resolve this issue? (Choose two.)

○ Increase the Kinesis Data Streams retention period to reduce throttling.
○ Replace the Kinesis API-based data ingestion mechanism with Kinesis Agent.
○ Increase the number of shards in the stream using the UpdateShardCount API.
○ Choose partition keys in a way that results in a uniform record distribution across shards.
○ Customize the application code to include retry logic to improve performance.
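
For background on the partition-key option: Kinesis Data Streams maps each partition key to a 128-bit integer with an MD5 hash and routes the record to the shard whose hash key range contains that value. The sketch below simulates that routing locally. It assumes the hash space is split into equal ranges, one per shard, and the key names are hypothetical.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key.

    Assumption: the hash space is divided into `shard_count` equal ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Count how many records land on each shard for the given keys."""
    counts = Counter(shard_for_key(key, shard_count) for key in keys)
    return [counts.get(index, 0) for index in range(shard_count)]


if __name__ == "__main__":
    # Every record shares one key, so every record lands on one shard.
    print("single key:", distribution(["device-type-A"] * 10_000, 4))
    # Many distinct keys spread records across all shards.
    print("distinct keys:", distribution([f"device-{i}" for i in range(10_000)], 4))
```

Running it with a single repeated key versus many distinct keys shows how key cardinality affects the per-shard record count.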

Question 2

An airline has been collecting metrics on flight activities for analytics. A recently completed proof of concept demonstrates how the company provides insights to data analysts to improve on-time departures. The proof of concept used objects in Amazon S3, which contained the metrics in .csv format, and used Amazon Athena for querying the data. As the amount of data increases, the data analyst wants to optimize the storage solution to improve query performance.

Which options should the data analyst use to improve performance as the data lake grows? (Choose three.)

○ Add a randomized string to the beginning of the keys in S3 to get more throughput across partitions.
○ Use an S3 bucket in the same account as Athena.
○ Compress the objects to reduce the data transfer I/O.
○ Use an S3 bucket in the same Region as Athena.
○ Preprocess the .csv data to JSON to reduce I/O by fetching only the document keys needed by the query.
○ Preprocess the .csv data to Apache Parquet to reduce I/O by fetching only the data blocks needed for predicates.
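
To make the compression option concrete, here is a minimal sketch that builds a small .csv payload in memory and gzips it using only the Python standard library. The column names and values are invented for illustration, and uploading to S3 is not shown. Athena can read gzip-compressed .csv objects.

```python
import csv
import gzip
import io


def build_csv(rows: int) -> bytes:
    """Build a small in-memory .csv of hypothetical flight metrics."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["flight_id", "origin", "destination", "delay_minutes"])
    for i in range(rows):
        writer.writerow([f"FL{i:05d}", "SEA", "JFK", i % 45])
    return buffer.getvalue().encode("utf-8")


def gzip_bytes(raw: bytes) -> bytes:
    """Compress the payload with gzip."""
    return gzip.compress(raw)


if __name__ == "__main__":
    raw = build_csv(10_000)
    packed = gzip_bytes(raw)
    print(f"raw: {len(raw)} bytes, gzip: {len(packed)} bytes")
```

The printed sizes show how many fewer bytes a query engine would have to transfer for the same rows.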

Question 3

A marketing company is using Amazon EMR clusters for its workloads. The company manually installs third-party libraries on the clusters by logging in to the master nodes. A data analyst needs to create an automated solution to replace the manual process.

Which options can fulfill these requirements? (Choose two.)

○ Place the required installation scripts in Amazon S3 and execute them using custom bootstrap actions.
○ Place the required installation scripts in Amazon S3 and execute them through Apache Spark in Amazon EMR.
○ Install the required third-party libraries on the existing EMR master node. Create an AMI out of that master node and use that custom AMI to re-create the EMR cluster.
○ Use an Amazon DynamoDB table to store the list of required applications. Trigger an AWS Lambda function with DynamoDB Streams to install the software.
○ Launch an Amazon EC2 instance with Amazon Linux and install the required third-party libraries on the instance. Create an AMI and use that AMI to create the EMR cluster.
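
For reference, a bootstrap action is declared as a name plus the S3 path of a script. The sketch below builds one entry in the shape accepted by the `BootstrapActions` parameter of the boto3 EMR client's `run_job_flow` call. The helper function, bucket, and script names are hypothetical, and the cluster launch itself is not shown.

```python
def bootstrap_action(name: str, script_s3_path: str, args=None) -> dict:
    """Build one entry for the BootstrapActions list of run_job_flow.

    `script_s3_path` must point at a script stored in Amazon S3.
    """
    if not script_s3_path.startswith("s3://"):
        raise ValueError("bootstrap scripts are referenced by an s3:// path")
    return {
        "Name": name,
        "ScriptBootstrapAction": {
            "Path": script_s3_path,
            "Args": list(args or []),
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and script; replace with real values.
    action = bootstrap_action(
        "install-third-party-libs",
        "s3://example-bucket/bootstrap/install_libs.sh",
        ["--quiet"],
    )
    print(action)
```

A list of such entries would be passed as `BootstrapActions=[...]` when the cluster is created.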
