AWS Certified Machine Learning - Specialty (MLS-C01)

Showing 1–3 of 15 questions

Question 1

A Machine Learning Specialist needs to move and transform data in preparation for training. Some of the data needs to be processed in near-real time, and other data can be moved hourly. Existing Amazon EMR MapReduce jobs clean the data and perform feature engineering on it.

Which of the following services can feed data to the MapReduce jobs? (Choose two.)

  • AWS DMS

  • Amazon Kinesis

  • AWS Data Pipeline

  • Amazon Athena

  • Amazon ES

Question 2

A company is observing low accuracy while training on the default built-in image classification algorithm in Amazon SageMaker. The Data Science team wants to use an Inception neural network architecture instead of a ResNet architecture.

Which of the following will accomplish this? (Choose two.)

  • Customize the built-in image classification algorithm to use Inception and use this for model training.

  • Create a support case with the SageMaker team to change the default image classification algorithm to Inception.

  • Bundle a Docker container with TensorFlow Estimator loaded with an Inception network and use this for model training.

  • Use custom code in Amazon SageMaker with TensorFlow Estimator to load the model with an Inception network, and use this for model training.

  • Download and apt-get install the Inception network code into an Amazon EC2 instance and use this instance as a Jupyter notebook in Amazon SageMaker.

Question 3

A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users.

The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features). Each data sample consists of 200 features including user age, device, location, and play patterns.

Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set. However, the prediction results on a test dataset were not satisfactory.

Which of the following approaches should the Data Science team take to mitigate this issue? (Choose two.)

  • Add more deep trees to the random forest to enable the model to learn more features.

  • Include a copy of the samples in the test dataset in the training dataset.

  • Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data.

  • Change the cost function so that false negatives have a higher impact on the cost value than false positives.

  • Change the cost function so that false positives have a higher impact on the cost value than false negatives.
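
To make the numbers in Question 3 concrete, here is a minimal Python sketch of the arithmetic behind the class split given in the scenario (1,000 positive and 999,000 negative samples). It is not the team's model: the "always predict negative" baseline, the helper names, and the cost weights are all hypothetical and exist only for illustration. The sketch shows how accuracy alone can look excellent on such a split, and how a cost function with separate weights for false negatives and false positives would be computed.

```python
# Class counts taken from the scenario in Question 3.
N_POSITIVE = 1_000
N_NEGATIVE = 999_000


def confusion_for_always_negative(n_pos, n_neg):
    """Confusion counts for a hypothetical model that predicts 'negative' for every user."""
    return {"tp": 0, "fn": n_pos, "fp": 0, "tn": n_neg}


def accuracy(c):
    """Fraction of all samples that were classified correctly."""
    return (c["tp"] + c["tn"]) / (c["tp"] + c["tn"] + c["fp"] + c["fn"])


def recall(c):
    """Fraction of true positives that the model actually found."""
    positives = c["tp"] + c["fn"]
    return c["tp"] / positives if positives else 0.0


def weighted_cost(c, fn_weight, fp_weight):
    """Total misclassification cost with a separate (hypothetical) weight per error type."""
    return fn_weight * c["fn"] + fp_weight * c["fp"]


baseline = confusion_for_always_negative(N_POSITIVE, N_NEGATIVE)
print(f"accuracy = {accuracy(baseline):.3f}")
print(f"recall   = {recall(baseline):.3f}")
print("cost with equal weights:", weighted_cost(baseline, 1.0, 1.0))
```

Under these assumptions the baseline reaches 99.9% accuracy while finding none of the paying users, so accuracy by itself says little about how useful a model is on this dataset. Changing fn_weight and fp_weight shows how each type of error contributes to the total cost.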