Updated Feb-2022 Pass AWS-Certified-Machine-Learning-Specialty Exam - Real Practice Test Questions [Q28-Q52]


Certification Path of AWS Certified Machine Learning - Specialty

The Amazon MLS certification path includes only one certification exam.


AWS Machine Learning Specialty Exam Syllabus Topics:

Section / Objectives

Data Engineering - 20%

Create data repositories for machine learning.
- Identify data sources (e.g., content and location, primary sources such as user data)
- Determine storage mediums (e.g., DB, Data Lake, S3, EFS, EBS)

Identify and implement a data ingestion solution.
- Data job styles/types (batch load, streaming)
  • Kinesis
  • Kinesis Analytics
  • Kinesis Firehose
  • EMR
  • Glue
- Data ingestion pipelines (batch-based ML workloads and streaming-based ML workloads)
- Job scheduling

Identify and implement a data transformation solution.
- Transforming data in transit (ETL: Glue, EMR, AWS Batch)
- Handle ML-specific data using MapReduce (Hadoop, Spark, Hive)

Exploratory Data Analysis - 24%

Sanitize and prepare data for modeling.
- Identify and handle missing data, corrupt data, stop words, etc.
- Formatting, normalizing, augmenting, and scaling data
- Labeled data (recognizing when you have enough labeled data and identifying mitigation strategies; data labeling tools such as Mechanical Turk and manual labor)

Perform feature engineering.
- Identify and extract features from data sets, including from data sources such as text, speech, image, public datasets, etc.
- Analyze/evaluate feature engineering concepts (binning, tokenization, outliers, synthetic features, one-hot encoding, reducing dimensionality of data)

Analyze and visualize data for machine learning.
- Graphing (scatter plot, time series, histogram, box plot)
- Interpreting descriptive statistics (correlation, summary statistics, p-value)
- Clustering (hierarchical, diagnosing, elbow plot, cluster size)

Modeling - 36%

Frame business problems as machine learning problems.
- Determine when to use/when not to use ML
- Know the difference between supervised and unsupervised learning
- Selecting from among classification, regression, forecasting, clustering, recommendation, etc.


Understanding functional and technical aspects of AWS Certified Machine Learning Specialty Exam Modeling

The following will be discussed here:

  • Train machine learning models
  • Evaluate machine learning models
  • Perform hyperparameter optimization
  • Frame business problems as machine learning problems
  • Select the appropriate model(s) for a given machine learning problem

 

NEW QUESTION 28
An insurance company is developing a new device for vehicles that uses a camera to observe drivers' behavior and alert them when they appear distracted. The company created approximately 10,000 training images in a controlled environment that a Machine Learning Specialist will use to train and evaluate machine learning models.
During the model evaluation, the Specialist notices that the training error rate diminishes faster as the number of epochs increases and the model is not accurately inferring on the unseen test images.
Which of the following should be used to resolve this issue? (Choose two.)

  • A. Add vanishing gradient to the model.
  • B. Add L2 regularization to the model.
  • C. Perform data augmentation on the training data.
  • D. Make the neural network architecture complex.
  • E. Use gradient checking in the model.

Answer: B,C

Explanation:
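The model is overfitting: the training error keeps falling while accuracy on unseen test images stays poor. L2 regularization and data augmentation both reduce overfitting, the first by penalizing large weights and the second by enlarging and diversifying the training set.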
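
The two remedies can be illustrated with a minimal Python sketch. It is not the Specialist's actual training code: the one-dimensional model, the penalty value `l2=14.0`, and the tiny nested-list "image" are assumptions chosen only so the effect is easy to see.

```python
# Minimal sketch of the two remedies; all names and values are illustrative.

def fit_ridge_1d(xs, ys, l2=0.0):
    """Least-squares slope for y ~ w * x with an L2 penalty l2 * w**2.

    Setting the derivative of sum((y - w*x)**2) + l2 * w**2 to zero
    gives w = sum(x*y) / (sum(x*x) + l2).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


def horizontal_flip(image):
    """Return a mirrored copy of an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w_plain = fit_ridge_1d(xs, ys)         # no penalty
w_l2 = fit_ridge_1d(xs, ys, l2=14.0)   # the penalty shrinks the weight

image = [[0, 1, 2],
         [3, 4, 5]]
augmented = [image, horizontal_flip(image)]  # one extra training example
```

The penalized weight is smaller than the unpenalized one, which is the shrinkage that L2 regularization relies on; the flip is one of many label-preserving transformations used for augmentation.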

 

NEW QUESTION 29
A Data Scientist wants to gain real-time insights into a data stream of GZIP files.
Which solution would allow the use of SQL to query the stream with the LEAST latency?

  • A. Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data.
  • B. Amazon Kinesis Data Firehose to transform the data and put it into an Amazon S3 bucket.
  • C. AWS Glue with a custom ETL script to transform the data.
  • D. An Amazon Kinesis Client Library to transform the data and save it to an Amazon ES cluster.

Answer: A

Explanation:
https://aws.amazon.com/big-data/real-time-analytics-featured-partners/

 

NEW QUESTION 30
A Data Science team is designing a dataset repository where it will store a large amount of training data commonly used in its machine learning models. As Data Scientists may create an arbitrary number of new datasets every day, the solution has to scale automatically and be cost-effective. Also, it must be possible to explore the data using SQL.
Which storage scheme is MOST adapted to this scenario?

  • A. Store datasets as files in an Amazon EBS volume attached to an Amazon EC2 instance.
  • B. Store datasets as files in Amazon S3.
  • C. Store datasets as global tables in Amazon DynamoDB.
  • D. Store datasets as tables in a multi-node Amazon Redshift cluster.

Answer: B

 

NEW QUESTION 31
A city wants to monitor its air quality to address the consequences of air pollution. A Machine Learning Specialist needs to forecast the air quality in parts per million of contaminants for the next 2 days in the city. As this is a prototype, only daily data from the last year is available.
Which model is MOST likely to provide the best results in Amazon SageMaker?

  • A. Use Amazon SageMaker Random Cut Forest (RCF) on the single time series consisting of the full year of data
  • B. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of regressor
  • C. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of classifier
  • D. Use the Amazon SageMaker k-Nearest-Neighbors (kNN) algorithm on the single time series consisting of the full year of data with a predictor_type of regressor.

Answer: D

 

NEW QUESTION 32
A web-based company wants to improve its conversion rate on its landing page. Using a large historical dataset of customer visits, the company has repeatedly trained a multi-class deep learning network algorithm on Amazon SageMaker. However, there is an overfitting problem: training data shows 90% accuracy in predictions, while test data shows 70% accuracy only.
The company needs to boost the generalization of its model before deploying it into production to maximize conversions of visits to purchases.
Which action is recommended to provide the HIGHEST accuracy model for the company's test and validation data?

  • A. Apply L1 or L2 regularization and dropouts to the training
  • B. Increase the randomization of training data in the mini-batches used in training
  • C. Allocate a higher proportion of the overall data to the training dataset
  • D. Reduce the number of layers and units (or neurons) from the deep learning network

Answer: A

 

NEW QUESTION 33
A large company has developed a BI application that generates reports and dashboards using data collected from various operational metrics. The company wants to provide executives with an enhanced experience so they can use natural language to get data from the reports. The company wants the executives to be able to ask questions using written and spoken interfaces.
Which combination of services can be used to build this conversational interface? (Choose three.)

  • A. Amazon Connect
  • B. Amazon Transcribe
  • C. Amazon Polly
  • D. Alexa for Business
  • E. Amazon Comprehend
  • F. Amazon Lex

Answer: B,E,F

 

NEW QUESTION 34
A Data Science team within a large company uses Amazon SageMaker notebooks to access data stored in Amazon S3 buckets. The IT Security team is concerned that internet-enabled notebook instances create a security vulnerability where malicious code running on the instances could compromise data privacy. The company mandates that all instances stay within a secured VPC with no internet access, and data communication traffic must stay within the AWS network.
How should the Data Science team configure the notebook instance placement to meet these requirements?

  • A. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Place the Amazon SageMaker endpoint and S3 buckets within the same VPC.
  • B. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has S3 VPC endpoints and Amazon SageMaker VPC endpoints attached to it.
  • C. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has a NAT gateway and an associated security group allowing only outbound connections to Amazon S3 and Amazon SageMaker.
  • D. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Use IAM policies to grant access to Amazon S3 and Amazon SageMaker.

Answer: B

Explanation:
VPC endpoints (a gateway endpoint for Amazon S3 and interface endpoints for Amazon SageMaker) are required to satisfy the requirement that "data communication traffic must stay within the AWS network".
https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-interface-endpoint.html

 

NEW QUESTION 35
A Machine Learning Specialist is packaging a custom ResNet model into a Docker container so the company can leverage Amazon SageMaker for training. The Specialist is using Amazon EC2 P3 instances to train the model and needs to properly configure the Docker container to leverage the NVIDIA GPUs.
What does the Specialist need to do?

  • A. Bundle the NVIDIA drivers with the Docker image
  • B. Organize the Docker container's file structure to execute on GPU instances.
  • C. Build the Docker container to be NVIDIA-Docker compatible
  • D. Set the GPU flag in the Amazon SageMaker CreateTrainingJob request body

Answer: C

 

NEW QUESTION 36
A Machine Learning Specialist is working for a credit card processing company and receives an unbalanced dataset containing credit card transactions. It contains 99,000 valid transactions and 1,000 fraudulent transactions. The Specialist is asked to score a model that was run against the dataset. The Specialist has been advised that identifying valid transactions is equally as important as identifying fraudulent transactions.
What metric is BEST suited to score the model?

  • A. Recall
  • B. Area Under the ROC Curve (AUC)
  • C. Root Mean Square Error (RMSE)
  • D. Precision

Answer: B
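
To make the candidate metrics concrete, the following Python sketch computes precision, recall, and AUC on a small made-up sample. The labels, scores, and the 0.5 decision threshold are assumptions for illustration and are not taken from the question's dataset.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form equals the area under
    the ROC curve and does not depend on any decision threshold.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, imbalanced sample: 8 valid (0) and 2 fraudulent (1) transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.6, 0.2, 0.7, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, recall = precision_recall(y_true, y_pred)
area = auc(y_true, scores)
```

Precision and recall change when the threshold moves, whereas AUC summarizes how well the scores rank both classes across all thresholds.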

 

NEW QUESTION 37
A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions.
Here is an example from the dataset:
"The quck BROWN FOX jumps over the lazy dog."
Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Choose three.)

  • A. Tokenize the sentence into words.
  • B. Remove stop words using an English stopword dictionary.
  • C. Perform part-of-speech tagging and keep the action verb and the nouns only.
  • D. Correct the typography on "quck" to "quick."
  • E. Normalize all words by making the sentence lowercase.
  • F. One-hot encode all words in the sentence.

Answer: A,B,E

Explanation:
1- Tokenize the sentence into words
2- Remove stop words
3- Lowercase all words so that "BROWN" and "brown" map to the same token
https://towardsdatascience.com/nlp-extracting-the-main-topics-from-your-dataset-using-lda-in-minutes-21486f5aa925
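
A repeatable version of tokenizing, lowercasing, and stop-word removal can be sketched in a few lines of Python. The stop-word set below is a tiny hypothetical stand-in for a full English stop-word dictionary, and the regular expression is one possible tokenizer, not a prescribed one.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would load a fuller
# English dictionary.
STOP_WORDS = {"the", "a", "an", "over", "of", "and", "to", "in"}


def preprocess(sentence):
    """Lowercase, tokenize into words, and drop stop words."""
    lowered = sentence.lower()
    tokens = re.findall(r"[a-z0-9']+", lowered)
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The quck BROWN FOX jumps over the lazy dog.")
```

The misspelled "quck" passes through unchanged, since this sketch performs no spelling correction.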

 

NEW QUESTION 38
A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs.
The workflow consists of the following processes:
- Start the workflow as soon as data is uploaded to Amazon S3.
- When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3.
- Store the results of joining datasets in Amazon S3.
- If one of the jobs fails, send a notification to the Administrator.
Which configuration will meet these requirements?

  • A. Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3. Use AWS Glue to join the datasets. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
  • B. Develop the ETL workflow using AWS Batch to trigger the start of ETL jobs when data is uploaded to Amazon S3. Use AWS Glue to join the datasets in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
  • C. Develop the ETL workflow using AWS Lambda to start an Amazon SageMaker notebook instance. Use a lifecycle configuration script to join the datasets and persist the results in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
  • D. Use AWS Lambda to chain other Lambda functions to read and join the datasets in Amazon S3 as soon as the data is uploaded to Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.

Answer: A

Explanation:
https://aws.amazon.com/step-functions/use-cases/

 

NEW QUESTION 39
A Machine Learning Specialist is building a convolutional neural network (CNN) that will classify 10 types of animals. The Specialist has built a series of layers in a neural network that will take an input image of an animal, pass it through a series of convolutional and pooling layers, and then finally pass it through a dense and fully connected layer with 10 nodes. The Specialist would like to get an output from the neural network that is a probability distribution of how likely it is that the input image belongs to each of the 10 classes.
Which function will produce the desired output?

  • A. Smooth L1 loss
  • B. Softmax
  • C. Dropout
  • D. Rectified linear units (ReLU)

Answer: B

Explanation:
https://medium.com/data-science-bootcamp/understand-the-softmax-function-in-minutes-f3a59641e86d
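
As a minimal sketch of what softmax does to the 10 outputs of the final dense layer, the Python function below converts raw scores into a probability distribution. The example logits are made up, and subtracting the maximum is a common numerical-stability step rather than part of the definition.

```python
import math


def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of the 10-node dense layer for one image.
logits = [2.0, 1.0, 0.1] + [0.0] * 7
probs = softmax(logits)
predicted_class = probs.index(max(probs))
```

Each entry of `probs` can be read as the model's estimated probability that the image belongs to the corresponding class.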

 

NEW QUESTION 40
A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The dataset is transformed into a numpy.array, which appears to be negatively affecting the speed of the training.
What should the Specialist do to optimize the data for training on SageMaker?

  • A. Use AWS Glue to compress the data into the Apache Parquet format
  • B. Use the SageMaker batch transform feature to transform the training data into a DataFrame
  • C. Transform the dataset into the RecordIO protobuf format
  • D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data

Answer: C

 

NEW QUESTION 41
A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions.
Here is an example from the dataset:
"The quck BROWN FOX jumps over the lazy dog."
Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Select THREE)

  • A. Tokenize the sentence into words.
  • B. One-hot encode all words in the sentence
  • C. Normalize all words by making the sentence lowercase
  • D. Remove stop words using an English stopword dictionary.
  • E. Correct the typography on "quck" to "quick."
  • F. Perform part-of-speech tagging and keep the action verb and the nouns only

Answer: A,C,D

 

NEW QUESTION 42
A Data Engineer needs to build a model using a dataset containing customer credit card information.
How can the Data Engineer ensure the data remains encrypted and the credit card information is secure?

  • A. Use an Amazon SageMaker launch configuration to encrypt the data once it is copied to the SageMaker instance in a VPC. Use the SageMaker principal component analysis (PCA) algorithm to reduce the length of the credit card numbers.
  • B. Use an IAM policy to encrypt the data on the Amazon S3 bucket and Amazon Kinesis to automatically discard credit card numbers and insert fake credit card numbers.
  • C. Use a custom encryption algorithm to encrypt the data and store the data on an Amazon SageMaker instance in a VPC. Use the SageMaker DeepAR algorithm to randomize the credit card numbers.
  • D. Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card numbers from the customer data with AWS Glue.

Answer: D

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/pca.html

 

NEW QUESTION 43
A Data Science team within a large company uses Amazon SageMaker notebooks to access data stored in Amazon S3 buckets. The IT Security team is concerned that internet-enabled notebook instances create a security vulnerability where malicious code running on the instances could compromise data privacy. The company mandates that all instances stay within a secured VPC with no internet access, and data communication traffic must stay within the AWS network.
How should the Data Science team configure the notebook instance placement to meet these requirements?

  • A. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Place the Amazon SageMaker endpoint and S3 buckets within the same VPC.
  • B. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has S3 VPC endpoints and Amazon SageMaker VPC endpoints attached to it.
  • C. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has a NAT gateway and an associated security group allowing only outbound connections to Amazon S3 and Amazon SageMaker.
  • D. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Use IAM policies to grant access to Amazon S3 and Amazon SageMaker.

Answer: B

 

NEW QUESTION 44
An online reseller has a large, multi-column dataset with one column missing 30% of its data. A Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct the missing data.
Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?

  • A. Multiple imputation
  • B. Listwise deletion
  • C. Mean substitution
  • D. Last observation carried forward

Answer: A

Explanation:
https://worldwidescience.org/topicpages/i/imputing+missing+values.html
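
The idea behind multiple imputation can be sketched in Python as follows. This is a deliberately simplified version under stated assumptions: a single predictor column, a straight-line fit, Gaussian noise sized from the residuals, and made-up data. A full procedure would also account for uncertainty in the fitted parameters and pool the per-dataset estimates with Rubin's rules.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def multiple_impute(rows, m=5, seed=0):
    """Return m completed copies of rows, where rows are (x, y) pairs
    and a missing y is None. Each copy fills the gaps with the regression
    prediction plus random noise, so the copies differ from one another."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in rows if y is not None]
    xs = [x for x, _ in observed]
    ys = [y for _, y in observed]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in observed]
    noise_sd = statistics.pstdev(residuals)
    completed = []
    for _ in range(m):
        completed.append([
            (x, y if y is not None
             else slope * x + intercept + rng.gauss(0, noise_sd))
            for x, y in rows
        ])
    return completed


# Made-up data: y is roughly 2 * x, with two missing values.
rows = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2),
        (5, 9.8), (6, None), (7, 14.1)]
datasets = multiple_impute(rows, m=5)

# Pool a simple statistic (the mean of y) across the imputed datasets.
pooled_mean_y = statistics.fmean(
    [statistics.fmean([y for _, y in d]) for d in datasets]
)
```

Unlike mean substitution, the imputed values follow the relationship with the other column, and the spread across the `m` copies reflects the uncertainty introduced by the missing data.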

 

NEW QUESTION 45
A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric.
This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours.
With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyperparameter range(s).
Which visualization will accomplish this?

  • A. A scatter plot showing the correlation between maximum tree depth and the objective metric.
  • B. A histogram showing whether the most important input feature is Gaussian.
  • C. A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension.
  • D. A scatter plot showing the performance of the objective metric over each training iteration.

Answer: A

Explanation:
https://medium.com/all-things-ai/in-depth-parameter-tuning-for-random-forest-d67bb7e920d

 

NEW QUESTION 46
A Machine Learning Specialist is working with a large company to leverage machine learning within its products. The company wants to group its customers into categories based on which customers will and will not churn within the next 6 months. The company has labeled the data available to the Specialist.
Which machine learning model type should the Specialist use to accomplish this task?

  • A. Clustering
  • B. Linear regression
  • C. Reinforcement learning
  • D. Classification

Answer: D

Explanation:
The goal of classification is to determine the class or category to which a data point (a customer, in this case) belongs. For classification problems, data scientists use historical data with predefined target variables, also known as labels (churner/non-churner), which are the answers that need to be predicted, to train an algorithm.
With classification, businesses can answer the following questions:
Will this customer churn or not?
Will a customer renew their subscription?
Will a user downgrade a pricing plan?
Are there any signs of unusual customer behavior?
https://www.kdnuggets.com/2019/05/churn-prediction-machine-learning.html

 

NEW QUESTION 47
A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker. The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant.
Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test?

  • A. Build custom Amazon CloudWatch Logs and then leverage Amazon ES and Kibana to query and visualize the log data as it is generated by Amazon SageMaker.
  • B. Send Amazon CloudWatch Logs that were generated by Amazon SageMaker to Amazon ES and use Kibana to query and visualize the log data
  • C. Review SageMaker logs that have been written to Amazon S3 by leveraging Amazon Athena and Amazon QuickSight to visualize logs as they are being produced.
  • D. Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker.

Answer: D

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html

 

NEW QUESTION 48
A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operations using Amazon Athena and Amazon S3.
The source systems send data in .CSV format in real time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3.
Which solution takes the LEAST effort to implement?

  • A. Ingest .CSV data from Amazon Kinesis Data Streams and use Amazon Glue to convert data into Parquet.
  • B. Ingest .CSV data from Amazon Kinesis Data Streams and use Amazon Kinesis Data Firehose to convert data into Parquet.
  • C. Ingest .CSV data using Apache Spark Structured Streaming in an Amazon EMR cluster and use Apache Spark to convert data into Parquet.
  • D. Ingest .CSV data using Apache Kafka Streams on Amazon EC2 instances and use Kafka Connect S3 to serialize data as Parquet

Answer: A

 

NEW QUESTION 49
A Machine Learning Specialist built an image classification deep learning model. However, the Specialist ran into an overfitting problem in which the training and testing accuracies were 99% and 75%, respectively.
How should the Specialist address this issue and what is the reason behind it?

  • A. The epoch number should be increased because the optimization process was terminated before it reached the global minimum.
  • B. The dropout rate at the flatten layer should be increased because the model is not generalized enough.
  • C. The dimensionality of dense layer next to the flatten layer should be increased because the model is not complex enough.
  • D. The learning rate should be increased because the optimization process was trapped at a local minimum.

Answer: B
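
Option B refers to the dropout rate. As a minimal Python sketch of the mechanism, the function below applies inverted dropout to a flattened activation vector; the vector of ones, the rate of 0.5, and the fixed seed are assumptions for illustration.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate` and
    rescale the survivors by 1 / (1 - rate) so the expected sum is kept."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
flat = [1.0] * 1000                      # stand-in for a flatten layer output
dropped = dropout(flat, rate=0.5, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
```

Raising the rate silences more units on each training step, which discourages the network from memorizing the training images. Dropout is applied during training only and is disabled at inference time.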

 

NEW QUESTION 50
A company is using Amazon Textract to extract textual data from thousands of scanned text-heavy legal documents daily. The company uses this information to process loan applications automatically. Some of the documents fail business validation and are returned to human reviewers, who investigate the errors. This activity increases the time to process the loan applications.
What should the company do to reduce the processing time of loan applications?

  • A. Use an Amazon Textract synchronous operation instead of an asynchronous operation.
  • B. Configure Amazon Textract to route low-confidence predictions to Amazon Augmented AI (Amazon A2I).
    Perform a manual review on those words before performing a business validation.
  • C. Use Amazon Rekognition's feature to detect text in an image to extract the data from scanned images. Use this information to process the loan applications.
  • D. Configure Amazon Textract to route low-confidence predictions to Amazon SageMaker Ground Truth.
    Perform a manual review on those words before performing a business validation.

Answer: B

 

NEW QUESTION 51
A retail company intends to use machine learning to categorize new products. A labeled dataset of current products was provided to the Data Science team. The dataset includes 1,200 products.
The labeled dataset has 15 features for each product such as title, dimensions, weight, and price.
Each product is labeled as belonging to one of six categories such as books, games, electronics, and movies.
Which model should be used for categorizing new products using the provided dataset for training?

  • A. A DeepAR forecasting model based on a recurrent neural network (RNN)
  • B. A regression forest where the number of trees is set equal to the number of product categories
  • C. An XGBoost model where the objective parameter is set to multi:softmax
  • D. A deep convolutional neural network (CNN) with a softmax activation function for the last layer

Answer: C

Explanation:
This is a multi-class classification problem on a small tabular dataset, which suits XGBoost with the multi:softmax objective.
https://medium.com/@gabrielziegler3/multiclass-multilabel-classification-with-xgboost-66195e4d9f2d
A CNN is used for image classification problems.
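
A hedged sketch of the corresponding XGBoost configuration is shown below as a plain Python dictionary, so it runs without the library installed. `objective` and `num_class` follow the question; `max_depth` and `eta` are real XGBoost parameters, but the values are arbitrary assumptions, and the training call itself is omitted.

```python
# The question names only four of the six categories; the rest are unknown.
named_categories = ["books", "games", "electronics", "movies"]

# multi:softmax expects integer class labels in the range 0 .. num_class - 1.
label_of = {name: i for i, name in enumerate(named_categories)}

params = {
    "objective": "multi:softmax",  # predict one class label per product
    "num_class": 6,                # six product categories in the dataset
    "max_depth": 4,                # assumed value, to be tuned
    "eta": 0.3,                    # assumed learning rate, to be tuned
}
```

With only 1,200 labeled rows and 15 tabular features, a boosted-tree model of this kind is generally a more natural fit than a deep network.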

 

NEW QUESTION 52
......