AWS MLS-C01 Exam Questions for Effective Preparation | AWS Certified Machine Learning - Specialty

Preparing for the AWS Certified Machine Learning - Specialty (MLS-C01) exam requires a comprehensive approach. One of the most effective ways to get ready is by utilizing high-quality exam questions and answers, test questions, and mock exams. These resources not only help you assess your knowledge but also provide valuable insights into the exam's format and the types of questions you can expect. Reputable providers like SPOTO offer a wide range of study materials, including exam questions, exam preparation guides, and practice tests. Their resources are carefully crafted by subject matter experts and kept up-to-date with the latest exam objectives. With these study aids, you can identify your strengths and weaknesses, allowing you to focus your efforts on areas that need improvement. Additionally, SPOTO's mock exams simulate the actual exam environment, helping you build confidence and familiarize yourself with the time constraints and question formats. By leveraging these exam resources and practicing diligently, you can increase your chances of passing the AWS Certified Machine Learning - Specialty certification exam successfully on your first attempt.

Question #1
A company that runs an online library is implementing a chatbot using Amazon Lex to provide book recommendations based on category. This intent is fulfilled by an AWS Lambda function that queries an Amazon DynamoDB table for a list of book titles, given a particular category. For testing, there are only three categories implemented as the custom slot types: "comedy," "adventure," and "documentary." A machine learning (ML) specialist notices that sometimes the request cannot be fulfilled because Amazon Lex ca
A. Add the unrecognized words in the enumeration values list as new values in the slot type
B. Create a new custom slot type, add the unrecognized words to this slot type as enumeration values, and use this slot type for the slot
C. Use the AMAZON
D. Add the unrecognized words as synonyms in the custom slot type
Correct Answer: D
Question #2
A medical imaging company wants to train a computer vision model to detect areas of concern on patients' CT scans. The company has a large collection of unlabeled CT scans that are linked to each patient and stored in an Amazon S3 bucket. The scans must be accessible to authorized users only. A machine learning engineer needs to build a labeling pipeline. Which set of steps should the engineer take to build the labeling pipeline with the LEAST effort?
A. Configure Amazon Textract to route low-confidence predictions to Amazon SageMaker Ground Truth
B. Use an Amazon Textract synchronous operation instead of an asynchronous operation
C. Configure Amazon Textract to route low-confidence predictions to Amazon Augmented AI (Amazon A2I)
D. Use Amazon Rekognition's feature to detect text in an image to extract the data from scanned images
Correct Answer: C
Question #3
A Machine Learning Specialist receives customer data for an online shopping website. The data includes demographics, past visits, and locality information. The Specialist must develop a machine learning approach to identify the customer shopping patterns, preferences, and trends to enhance the website for better service and smart recommendations. Which solution should the Specialist recommend?
A. Latent Dirichlet Allocation (LDA) for the given collection of discrete data to identify patterns in the customer database
B. A neural network with a minimum of three layers and random initial weights to identify patterns in the customer database
C. Collaborative filtering based on user interactions and correlations to identify patterns in the customer database
D. Random Cut Forest (RCF) over random subsamples to identify patterns in the customer database
Correct Answer: C
Question #4
A company that promotes healthy sleep patterns by providing cloud-connected devices currently hosts a sleep tracking application on AWS. The application collects device usage information from device users. The company's Data Science team is building a machine learning model to predict if and when a user will stop utilizing the company's devices. Predictions from this model are used by a downstream application that determines the best approach for contacting users. The Data Science team is building multiple v
A. Build and host multiple models in Amazon SageMaker
B. Build and host multiple models in Amazon SageMaker
C. Build and host multiple models in Amazon SageMaker Neo to take into account different types of medical devices
D. Build and host multiple models in Amazon SageMaker
Correct Answer: D
Question #5
A Machine Learning Specialist needs to move and transform data in preparation for training. Some of the data needs to be processed in near-real time, and other data can be moved hourly. Existing Amazon EMR MapReduce jobs clean the data and perform feature engineering on it. Which of the following services can feed data to the MapReduce jobs? (Choose two.)
A. Build the Docker image with the inference code
B. Serialize the trained model so the format is compressed for deployment
C. Serialize the trained model so the format is compressed for deployment
D. Build the Docker image with the inference code
Correct Answer: BC
Question #6
A Machine Learning Specialist prepared the following graph displaying the results of k-means for k = [1..10]: Considering the graph, what is a reasonable selection for the optimal choice of k?
A.
B.
C.
D. 0
Correct Answer: B
Question #7
A company sells thousands of products on a public website and wants to automatically identify products with potential durability problems. The company has 1,000 reviews with date, star rating, review text, review summary, and customer email fields, but many reviews are incomplete and have empty fields. Each review has already been labeled with the correct durability result. A machine learning specialist must train a model to identify reviews expressing concerns over product durability. The first model needs
A. Train a custom classifier by using Amazon Comprehend
B. Build a recurrent neural network (RNN) in Amazon SageMaker by using Gluon and Apache MXNet
C. Train a built-in BlazingText model using Word2Vec mode in Amazon SageMaker
D. Use a built-in seq2seq model in Amazon SageMaker
Correct Answer: B
Question #8
A Machine Learning Specialist is building a logistic regression model that will predict whether or not a person will order a pizza. The Specialist is trying to build the optimal model with an ideal classification threshold. What model evaluation technique should the Specialist use to understand how different classification thresholds will impact the model's performance?
A. Receiver operating characteristic (ROC) curve
B. Misclassification rate
C. Root Mean Square Error (RMSE)
D. L1 norm
Correct Answer: A
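To illustrate why a ROC curve answers the threshold question, here is a minimal sketch that sweeps every distinct score as a classification threshold and records the resulting false positive rate and true positive rate. The function name, labels, and scores are hypothetical toy values, not data from the question.

```python
def roc_points(labels, scores):
    """Return (threshold, FPR, TPR) for each distinct score used as a threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive when the score is at or above the threshold.
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


# Hypothetical toy data: 1 = ordered a pizza, 0 = did not.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

for threshold, fpr, tpr in roc_points(labels, scores):
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each printed row is one point on the ROC curve, so lowering the threshold shows directly how much extra recall is bought at the cost of more false positives.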
Question #9
A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker. The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant. Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test?
A. Review SageMaker logs that have been written to Amazon S3 by leveraging Amazon Athena and Amazon QuickSight to visualize logs as they are being produced
B. Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker
C. Build custom Amazon CloudWatch Logs and then leverage Amazon ES and Kibana to query and visualize the log data as it is generated by Amazon SageMaker
D. Send Amazon CloudWatch Logs that were generated by Amazon SageMaker to Amazon ES and use Kibana to query and visualize the log data
Correct Answer: B
Question #10
A company is observing low accuracy while training on the default built-in image classification algorithm in Amazon SageMaker. The Data Science team wants to use an Inception neural network architecture instead of a ResNet architecture. Which of the following will accomplish this? (Choose two.)
A. The learning rate should be increased because the optimization process was trapped at a local minimum
B. The dropout rate at the flatten layer should be increased because the model is not generalized enough
C. The dimensionality of dense layer next to the flatten layer should be increased because the model is not complex enough
D. The epoch number should be increased because the optimization process was terminated before it reached the global minimum
Correct Answer: CD
Question #11
A large company has developed a BI application that generates reports and dashboards using data collected from various operational metrics. The company wants to provide executives with an enhanced experience so they can use natural language to get data from the reports. The company wants the executives to be able to ask questions using written and spoken interfaces. Which combination of services can be used to build this conversational interface? (Choose three.)
A. Upload the model to an Amazon SageMaker notebook instance and use the Amazon SageMaker HPO feature to optimize the model’s hyperparameters
B. Add more data to the training set and retrain the model using transfer learning to reduce the bias
C. Use a neural network model with more layers that are pretrained on ImageNet and apply transfer learning to increase the variance
D. Train a new model using the current neural network architecture
Correct Answer: BEF
Question #12
A Machine Learning team runs its own training algorithm on Amazon SageMaker. The training algorithm requires external assets. The team needs to submit both its own algorithm code and algorithm-specific parameters to Amazon SageMaker. What combination of services should the team use to build a custom algorithm in Amazon SageMaker? (Choose two.)
A. 0
B. 0
C. 00
D. ,400
Correct Answer: CE
Question #13
A manufacturing company has a large set of labeled historical sales data. The manufacturer would like to predict how many units of a particular part should be produced each quarter. Which machine learning approach should be used to solve this problem?
A. Logistic regression
B. Random Cut Forest (RCF)
C. Principal component analysis (PCA)
D. Linear regression
Correct Answer: D
Question #14
An office security agency conducted a successful pilot using 100 cameras installed at key locations within the main office. Images from the cameras were uploaded to Amazon S3 and tagged using Amazon Rekognition, and the results were stored in Amazon ES. The agency is now looking to expand the pilot into a full production system using thousands of video cameras in its office locations globally. The goal is to identify activities performed by non-employees in real time. Which solution should the agency consider
A. Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique Amazon Kinesis Video Streams video stream
B. Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique Amazon Kinesis Video Streams video stream
C. Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to Amazon Kinesis Video Streams for each camera
D. Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to Amazon Kinesis Video Streams for each camera
Correct Answer: A
Question #15
A Machine Learning Specialist wants to determine the appropriate SageMakerVariantInvocationsPerInstance setting for an endpoint automatic scaling configuration. The Specialist has performed a load test on a single instance and determined that peak requests per second (RPS) without service degradation is about 20 RPS. As this is the first deployment, the Specialist intends to set the invocation safety factor to 0.5. Based on the stated parameters and given that the invocations per instance setting is measured
A. Initialize the words by term frequency-inverse document frequency (TF-IDF) vectors pretrained on a large collection of news articles related to the energy sector
B. Use gated recurrent units (GRUs) instead of LSTM and run the training process until the validation loss stops decreasing
C. Reduce the learning rate and run the training process until the training loss stops decreasing
D. Initialize the words by word2vec embeddings pretrained on a large collection of news articles related to the energy sector
Correct Answer: C
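The load-test figures stated in this question (20 RPS peak, safety factor 0.5) can be turned into a scaling target with the formula given in the AWS load-testing guidance: SageMakerVariantInvocationsPerInstance = (MAX_RPS x SAFETY_FACTOR) x 60. The sketch below assumes the metric is expressed per minute, which is where the factor of 60 comes from; the function name is illustrative only.

```python
def invocations_per_instance(max_rps, safety_factor, seconds_per_minute=60):
    """Target value for SageMakerVariantInvocationsPerInstance (per-minute metric assumed)."""
    return max_rps * safety_factor * seconds_per_minute


# Values taken from the question: 20 RPS peak, safety factor 0.5.
target = invocations_per_instance(max_rps=20, safety_factor=0.5)
print(target)  # 600.0
```

With these inputs the target works out to 600 invocations per instance per minute.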
Question #16
A bank wants to launch a low-rate credit promotion. The bank is located in a town that recently experienced economic hardship. Only some of the bank's customers were affected by the crisis, so the bank's credit team must identify which customers to target with the promotion. However, the credit team wants to make sure that loyal customers' full credit history is considered when the decision is made. The bank's data science team developed a model that classifies account transactions and understands credit eli
A. Use Amazon SageMaker Studio to rebuild the model
B. Use Amazon SageMaker Studio to rebuild the model
C. Create an Amazon SageMaker notebook instance
D. Use Amazon SageMaker Studio to rebuild the model
Correct Answer: C
Question #17
A manufacturer is operating a large number of factories with a complex supply chain relationship where unexpected downtime of a machine can cause production to stop at several factories. A data scientist wants to analyze sensor data from the factories to identify equipment in need of preemptive maintenance and then dispatch a service team to prevent unplanned downtime. The sensor readings from a single machine can include up to 200 data points including temperatures, voltages, vibrations, RPMs, and pressure
A. Deploy the model in Amazon SageMaker
B. Deploy the model on AWS IoT Greengrass in each factory
C. Deploy the model to an Amazon SageMaker batch transformation job
D. Deploy the model in Amazon SageMaker and use an IoT rule to write data to an Amazon DynamoDB table
Correct Answer: B
Question #18
A Machine Learning Specialist is deciding between building a naive Bayesian model or a full Bayesian network for a classification problem. The Specialist computes the Pearson correlation coefficients between each feature and finds that their absolute values range from 0.1 to 0.95. Which model describes the underlying data in this situation?
A. A naive Bayesian model, since the features are all conditionally independent
B. A full Bayesian network, since the features are all conditionally independent
C. A naive Bayesian model, since some of the features are statistically dependent
D. A full Bayesian network, since some of the features are statistically dependent
Correct Answer: D
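The Pearson coefficient mentioned in this question can be computed directly from its definition. The following is a minimal sketch with hypothetical feature values; an absolute value near 1 signals strong linear dependence between two features, which is what undermines the naive independence assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical features: the second is an exact multiple of the first.
feature_a = [1, 2, 3, 4]
feature_b = [2, 4, 6, 8]
print(round(pearson(feature_a, feature_b), 6))
```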
Question #19
A Data Scientist is building a linear regression model and will use resulting p-values to evaluate the statistical significance of each coefficient. Upon inspection of the dataset, the Data Scientist discovers that most of the features are normally distributed. The plot of one feature in the dataset is shown in the graphic. What transformation should the Data Scientist apply to satisfy the statistical assumptions of the linear regression model?
A. Exponential transformation
B. Logarithmic transformation
C. Polynomial transformation
D. Sinusoidal transformation
Correct Answer: B
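The graphic for this question is not reproduced here, so the sketch below assumes the feature is right-skewed, which is the usual case where a logarithmic transformation helps. It compares the sample skewness of a hypothetical feature before and after taking logs; the data values are invented for illustration.

```python
import math


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / second_moment ** 1.5


# Hypothetical right-skewed feature (all values positive, long right tail).
raw = [1, 2, 2, 3, 4, 6, 9, 15, 40, 120]
logged = [math.log(v) for v in raw]

print(f"skewness before log: {skewness(raw):.2f}")
print(f"skewness after log:  {skewness(logged):.2f}")
```

The transformed values are noticeably closer to symmetric, which is the property the question is after.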
Question #20
A Data Engineer needs to build a model using a dataset containing customer credit card information. How can the Data Engineer ensure the data remains encrypted and the credit card information is secure?
A. Use a custom encryption algorithm to encrypt the data and store the data on an Amazon SageMaker instance in a VPC
B. Use an IAM policy to encrypt the data on the Amazon S3 bucket and Amazon Kinesis to automatically discard credit card numbers and insert fake credit card numbers
C. Use an Amazon SageMaker launch configuration to encrypt the data once it is copied to the SageMaker instance in a VPC. Use the SageMaker principal component analysis (PCA) algorithm to reduce the length of the credit card numbers
D. Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card numbers from the customer data with AWS Glue
Correct Answer: D
Question #21
An interactive online dictionary wants to add a widget that displays words used in similar contexts. A Machine Learning Specialist is asked to provide word features for the downstream nearest neighbor model powering the widget. What should the Specialist do to meet these requirements?
A. Create one-hot word encoding vectors
B. Produce a set of synonyms for every word using Amazon Mechanical Turk
C. Create word embedding vectors that store edit distance with every other word
D. Download word embeddings pre-trained on a large corpus
Correct Answer: D
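As a rough illustration of how pre-trained embeddings feed a nearest neighbor lookup, the sketch below ranks words by cosine similarity. The three-dimensional vectors are hypothetical toy values standing in for real pre-trained embeddings, and the helper names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings):
    """Return the other word whose vector is most similar to the query word's vector."""
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine(embeddings[word], embeddings[w]))


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
embeddings = {
    "happy": [0.9, 0.1, 0.0],
    "glad": [0.8, 0.2, 0.1],
    "table": [0.0, 0.1, 0.9],
}
print(nearest("happy", embeddings))
```

One-hot vectors would make every pair of distinct words equally dissimilar, which is why they cannot drive this kind of lookup.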
Question #22
A Machine Learning Specialist is implementing a full Bayesian network on a dataset that describes public transit in New York City. One of the random variables is discrete, and represents the number of minutes New Yorkers wait for a bus given that the buses cycle every 10 minutes, with a mean of 3 minutes. Which prior probability distribution should the ML Specialist use for this variable?
A. Poisson distribution
B. Uniform distribution
C. Normal distribution
D. Binomial distribution
Correct Answer: A
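For the answer listed above, the Poisson probability mass function with the stated mean of 3 minutes can be evaluated directly. This is a minimal sketch using only the standard library; the function name is illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Probability of each whole-minute wait from 0 to 10, with a mean of 3 minutes.
for minutes in range(11):
    print(f"P(wait = {minutes:2d} min) = {poisson_pmf(minutes, 3):.4f}")
```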
Question #23
A machine learning specialist is developing a proof of concept for government users whose primary concern is security. The specialist is using Amazon SageMaker to train a convolutional neural network (CNN) model for a photo classifier application. The specialist wants to protect the data so that it cannot be accessed and transferred to a remote host by malicious code accidentally installed on the training container. Which action will provide the MOST secure protection?
A. Create a workforce with AWS Identity and Access Management (IAM)
B. Create an Amazon Mechanical Turk workforce and manifest file
C. Create a private workforce and manifest file
D. Create a workforce with Amazon Cognito
Correct Answer: D
Question #24
A Machine Learning Specialist needs to be able to ingest streaming data and store it in Apache Parquet files for exploration and analysis. Which of the following services would both ingest and store this data in the correct format?
A. AWS DMS
B. Amazon Kinesis Data Streams
C. Amazon Kinesis Data Firehose
D. Amazon Kinesis Data Analytics
Correct Answer: C
Question #25
A Machine Learning Specialist uploads a dataset to an Amazon S3 bucket protected with server-side encryption using AWS KMS. How should the ML Specialist define the Amazon SageMaker notebook instance so it can read the same dataset from Amazon S3?
A. Define security group(s) to allow all HTTP inbound/outbound traffic and assign those security group(s) to the Amazon SageMaker notebook instance
B. Configure the Amazon SageMaker notebook instance to have access to the VPC
C. Assign an IAM role to the Amazon SageMaker notebook with S3 read access to the dataset
D. Assign the same KMS key used to encrypt data in Amazon S3 to the Amazon SageMaker notebook instance
Correct Answer: D
Question #26
A Machine Learning Specialist is configuring Amazon SageMaker so multiple Data Scientists can access notebooks, train models, and deploy endpoints. To ensure the best operational performance, the Specialist needs to be able to track how often the Scientists are deploying models, GPU and CPU utilization on the deployed SageMaker endpoints, and all errors that are generated when an endpoint is invoked. Which services are integrated with Amazon SageMaker to track this information? (Choose two.)
A. Require that the stores switch to capturing their data locally on AWS Storage Gateway for loading into Amazon S3, then use AWS Glue to do the transformation
B. Deploy an Amazon EMR cluster running Apache Spark with the transformation logic, and have the cluster run each day on the accumulating records in Amazon S3, outputting new/transformed records to Amazon S3
C. Spin up a fleet of Amazon EC2 instances with the transformation logic, have them transform the data records accumulating on Amazon S3, and output the transformed records to Amazon S3
D. Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms raw record attributes into simple transformed values using SQL
Correct Answer: AD
Question #27
A data scientist must build a custom recommendation model in Amazon SageMaker for an online retail company. Due to the nature of the company's products, customers buy only 4-5 products every 5-10 years. So, the company relies on a steady stream of new customers. When a new customer signs up, the company collects data on the customer's preferences. Below is a sample of the data available to the data scientist. How should the data scientist split the dataset into a training and test set for this use case?
A. Shuffle all interaction data
B. Identify the most recent 10% of interactions for each user
C. Identify the 10% of users with the least interaction data
D. Randomly select 10% of the users
Correct Answer: D
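A user-level holdout like the one in the listed answer can be sketched as follows. The function name, the 10% default, and the fixed seed are illustrative assumptions; the point is that whole users, not individual interactions, go into the test set, which mimics scoring brand-new customers.

```python
import random


def split_by_user(user_ids, test_fraction=0.1, seed=0):
    """Randomly assign whole users to the test set; everyone else goes to training."""
    unique_users = sorted(set(user_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(unique_users)
    n_test = max(1, round(len(unique_users) * test_fraction))
    test_users = set(unique_users[:n_test])
    train_users = set(unique_users[n_test:])
    return train_users, test_users


# Hypothetical user IDs; a real dataset would carry interactions per user.
users = [f"user-{i}" for i in range(100)]
train_users, test_users = split_by_user(users)
print(len(train_users), len(test_users))
```

Interactions are then routed to the training or test set according to which group their user belongs to, so no test user leaks into training.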
Question #28
A data scientist is training a text classification model by using the Amazon SageMaker built-in BlazingText algorithm. There are 5 classes in the dataset, with 300 samples for category A, 292 samples for category B, 240 samples for category C, 258 samples for category D, and 310 samples for category E. The data scientist shuffles the data and splits off 10% for testing. After training the model, the data scientist generates confusion matrices for the training and test sets. What could the data scientist con
A. Autoregressive Integrated Moving Average (ARIMA)
B. Exponential Smoothing (ETS)
C. Convolutional Neural Network - Quantile Regression (CNN-QR)
D. Prophet
Correct Answer: A
Question #29
A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours. With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input
A. A histogram showing whether the most important input feature is Gaussian
B. A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension
C. A scatter plot showing the performance of the objective metric over each training iteration
D. A scatter plot showing the correlation between maximum tree depth and the objective metric
Correct Answer: B
Question #30
A company offers an online shopping service to its customers. The company wants to enhance the site’s security by requesting additional information when customers access the site from locations that are different from their normal location. The company wants to update the process to call a machine learning (ML) model to determine when additional information should be requested. The company has several terabytes of data from its existing ecommerce web servers containing the source IP addresses for each reques
A. Use Amazon SageMaker Ground Truth to label each record as either a successful or failed access attempt
B. Use Amazon SageMaker to train a model using the IP Insights algorithm
C. Use Amazon SageMaker Ground Truth to label each record as either a successful or failed access attempt
D. Use Amazon SageMaker to train a model using the Object2Vec algorithm
Correct Answer: B
Question #31
A Data Scientist wants to gain real-time insights into a data stream of GZIP files. Which solution would allow the use of SQL to query the stream with the LEAST latency?
A. Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data
B. AWS Glue with a custom ETL script to transform the data
C. An Amazon Kinesis Client Library to transform the data and save it to an Amazon ES cluster
D. Amazon Kinesis Data Firehose to transform the data and put it into an Amazon S3 bucket
Correct Answer: A
Question #32
The chief editor for a product catalog wants the research and development team to build a machine learning system that can be used to detect whether or not individuals in a collection of images are wearing the company's retail brand. The team has a set of training data. Which machine learning algorithm should the researchers use that BEST meets their requirements?
A. Latent Dirichlet Allocation (LDA)
B. Recurrent neural network (RNN)
C. K-means
D. Convolutional neural network (CNN)
Correct Answer: D
Question #33
A Machine Learning Specialist working for an online fashion company wants to build a data ingestion solution for the company's Amazon S3-based data lake. The Specialist wants to create a set of ingestion mechanisms that will enable future capabilities comprised of:
• Real-time analytics
• Interactive analytics of historical data
• Clickstream analytics
• Product recommendations
Which services should the Specialist use?
A. AWS Glue as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for real-time data insights; Amazon Kinesis Data Firehose for delivery to Amazon ES for clickstream analytics; Amazon EMR to generate personalized product recommendations
B. Amazon Athena as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for near-real-time data insights; Amazon Kinesis Data Firehose for clickstream analytics; AWS Glue to generate personalized product recommendations
C. AWS Glue as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for historical data insights; Amazon Kinesis Data Firehose for delivery to Amazon ES for clickstream analytics; Amazon EMR to generate personalized product recommendations
D. Amazon Athena as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for historical data insights; Amazon DynamoDB streams for clickstream analytics; AWS Glue to generate personalized product recommendations
Correct Answer: A
Question #34
A data scientist is developing a pipeline to ingest streaming web traffic data. The data scientist needs to implement a process to identify unusual web traffic patterns as part of the pipeline. The patterns will be used downstream for alerting and incident response. The data scientist has access to unlabeled historic data to use, if needed. The solution needs to do the following:
• Calculate an anomaly score for each web traffic entry.
• Adapt unusual event identification to changing web patterns over time.
Which
A. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker Random Cut Forest (RCF) built-in model
B. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker built-in XGBoost model
C. Collect the streaming data using Amazon Kinesis Data Firehose
D. Collect the streaming data using Amazon Kinesis Data Firehose
Correct Answer: D
Question #35
A Machine Learning Specialist is building a convolutional neural network (CNN) that will classify 10 types of animals. The Specialist has built a series of layers in a neural network that will take an input image of an animal, pass it through a series of convolutional and pooling layers, and then finally pass it through a dense and fully connected layer with 10 nodes. The Specialist would like to get an output from the neural network that is a probability distribution of how likely it is that the input imag
A. Root Mean Square Error (RMSE)
B. Residual plots
C. Area under the curve
D. Confusion matrix
Correct Answer: C
Question #36
A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided. Based on this information, which model would have the HIGHEST accuracy?
A. Long short-term memory (LSTM) model with scaled exponential linear unit (SELU)
B. Logistic regression
C. Support vector machine (SVM) with non-linear kernel
D. Single perceptron with tanh activation function
Correct Answer: C
Question #37
A developer at a retail company is creating a daily demand forecasting model. The company stores the historical hourly demand data in an Amazon S3 bucket. However, the historical data does not include demand data for some hours. The developer wants to verify that an autoregressive integrated moving average (ARIMA) approach will be a suitable model for the use case. How should the developer verify the suitability of an ARIMA approach?
A. Use Amazon SageMaker Data Wrangler
B. Use Amazon SageMaker Autopilot
C. Use Amazon SageMaker Data Wrangler
D. Use Amazon SageMaker Autopilot
Correct Answer: A
Question #38
A Machine Learning Specialist is packaging a custom ResNet model into a Docker container so the company can leverage Amazon SageMaker for training. The Specialist is using Amazon EC2 P3 instances to train the model and needs to properly configure the Docker container to leverage the NVIDIA GPUs. What does the Specialist need to do?
A. Bundle the NVIDIA drivers with the Docker image
B. Build the Docker container to be NVIDIA-Docker compatible
C. Organize the Docker container's file structure to execute on GPU instances
D. Set the GPU flag in the Amazon SageMaker CreateTrainingJob request body
Correct Answer: B
Question #39
A Marketing Manager at a pet insurance company plans to launch a targeted marketing campaign on social media to acquire new customers. Currently, the company has the following data in Amazon Aurora:
• Profiles for all past and existing customers
• Profiles for all past and existing insured pets
• Policy-level information
• Premiums received
• Claims paid
What steps should be taken to implement a machine learning model to identify potential new customers on social media?
A. Use regression on customer profile data to understand key characteristics of consumer segments
B. Use clustering on customer profile data to understand key characteristics of consumer segments
C. Use a recommendation engine on customer profile data to understand key characteristics of consumer segments
D. Use a decision tree classifier engine on customer profile data to understand key characteristics of consumer segments
Correct Answer: C
Question #40
A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only. How should the Machine Learning Specialist transform the dataset to minimize query runtime?
A. Convert the records to Apache Parquet format
B. Convert the records to JSON format
C. Convert the records to GZIP CSV format
D. Convert the records to XML format
Correct Answer: A
Question #41
A real estate company wants to create a machine learning model for predicting housing prices based on a historical dataset. The dataset contains 32 features. Which model will meet the business requirement?
A. Logistic regression
B. Linear regression
C. K-means
D. Principal component analysis (PCA)
Correct Answer: B
Question #42
A health care company is planning to use neural networks to classify their X-ray images into normal and abnormal classes. The labeled data is divided into a training set of 1,000 images and a test set of 200 images. The initial training of a neural network model with 50 hidden layers yielded 99% accuracy on the training set, but only 55% accuracy on the test set. What changes should the Specialist consider to solve this issue? (Choose three.)
A. Early stopping
B. Random initialization of weights with appropriate seed
C. Increasing the number of epochs
D. Adding another layer with 100 neurons
View answer
Correct Answer: BDF
Question #43
Which of the following metrics should a Machine Learning Specialist generally use to compare/evaluate machine learning classification models against each other?
A. Recall
B. Misclassification rate
C. Mean absolute percentage error (MAPE)
D. Area Under the ROC Curve (AUC)
View answer
Correct Answer: D
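To make the metric concrete, here is a minimal pure-Python sketch of AUC computed as the share of (positive, negative) pairs that the model ranks correctly, with ties counted as half. It is an illustration for small inputs, not an optimized implementation; the function name `auc` is chosen for this example.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model scores, where higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because the value depends only on how examples are ranked, it does not change with the classification threshold, which is what makes it convenient for comparing classifiers.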
Question #44
A large consumer goods manufacturer has the following products on sale:
• 34 different toothpaste variants
• 48 different toothbrush variants
• 43 different mouthwash variants
The entire sales history of all these products is available in Amazon S3. Currently, the company is using custom-built autoregressive integrated moving average (ARIMA) models to forecast demand for these products. The company wants to predict the demand for a new product that will soon be launched. Which solution should a Machine Learning Spe
A. Train a custom ARIMA model to forecast demand for the new product
B. Train an Amazon SageMaker DeepAR algorithm to forecast demand for the new product
C. Train an Amazon SageMaker k-means clustering algorithm to forecast demand for the new product
D. Train a custom XGBoost model to forecast demand for the new product
View answer
Correct Answer: B
Question #45
A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users. The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features)
A. Drop all records from the dataset where age has been set to 0
B. Replace the age field value for records with a value of 0 with the mean or median value from the dataset
C. Drop the age feature from the dataset and train the model using the rest of the features
D. Use k-means clustering to handle missing features
View answer
Correct Answer: CD
Question #46
A company will use Amazon SageMaker to train and host a machine learning (ML) model for a marketing campaign. The majority of data is sensitive customer data. The data must be encrypted at rest. The company wants AWS to maintain the root of trust for the master keys and wants encryption key usage to be logged. Which implementation will meet these requirements?
A. Use encryption keys that are stored in AWS CloudHSM to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3
B. Use SageMaker built-in transient keys to encrypt the ML data volumes
C. Use customer managed keys in AWS Key Management Service (AWS KMS) to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3
D. Use AWS Security Token Service (AWS STS) to create temporary tokens to encrypt the ML storage volumes, and to encrypt the model artifacts and data in Amazon S3
View answer
Correct Answer: C
Question #47
A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a machine learning specialist will build a binary classifier based on two features: age of account, denoted by x, and transaction month, denoted by y. The class distributions are illustrated in the provided figure. The positive class is portrayed in red, while the negative class is portrayed in black. Which model would have the HIGHEST accuracy?
A. Linear support vector machine (SVM)
B. Decision tree
C. Support vector machine (SVM) with a radial basis function kernel
D. Single perceptron with a Tanh activation function
View answer
Correct Answer: C
Question #48
A data science team is planning to build a natural language processing (NLP) application. The application’s text preprocessing stage will include part-of-speech tagging and key phrase extraction. The preprocessed text will be input to a custom classification algorithm that the data science team has already written and trained using Apache MXNet. Which solution can the team build MOST quickly to meet these requirements?
A. Use Amazon Comprehend for the part-of-speech tagging, key phrase extraction, and classification tasks
B. Use an NLP library in Amazon SageMaker for the part-of-speech tagging
C. Use Amazon Comprehend for the part-of-speech tagging and key phrase extraction tasks
D. Use Amazon Comprehend for the part-of-speech tagging and key phrase extraction tasks
View answer
Correct Answer: B
Question #49
An online reseller has a large, multi-column dataset with one column missing 30% of its data. A Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct the missing data. Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?
A. Listwise deletion
B. Last observation carried forward
C. Multiple imputation
D. Mean substitution
View answer
Correct Answer: C
Question #50
A company is launching a new product and needs to build a mechanism to monitor comments about the company and its new product on social media. The company needs to be able to evaluate the sentiment expressed in social media posts, and visualize trends and configure alarms based on various thresholds. The company needs to implement this solution quickly, and wants to minimize the infrastructure and data science resources needed to evaluate the messages. The company already has a solution in place to collect p
A. Train a model in Amazon SageMaker by using the BlazingText algorithm to detect sentiment in the corpus of social media posts
B. Train a model in Amazon SageMaker by using the semantic segmentation algorithm to model the semantic content in the corpus of social media posts
C. Trigger an AWS Lambda function when social media posts are added to the S3 bucket
D. Trigger an AWS Lambda function when social media posts are added to the S3 bucket
View answer
Correct Answer: D
Question #51
A Data Scientist is developing a machine learning model to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuous value as its prediction. The data available includes labeled outcomes for a set of 4,000 patients. The study was conducted on a group of individuals over the age of 65 who have a particular disease that is known to worsen with age. Initial models have performed poorly. While reviewing the underlying data, t
A. Store datasets as files in Amazon S3
B. Store datasets as files in an Amazon EBS volume attached to an Amazon EC2 instance
C. Store datasets as tables in a multi-node Amazon Redshift cluster
D. Store datasets as global tables in Amazon DynamoDB
View answer
Correct Answer: D
Question #52
A company is running a machine learning prediction service that generates 100 TB of predictions every day. A Machine Learning Specialist must generate a visualization of the daily precision-recall curve from the predictions, and forward a read-only version to the Business team. Which solution requires the LEAST coding effort?
A. Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3
B. Generate daily precision-recall data in Amazon QuickSight, and publish the results in a dashboard shared with the Business team
C. Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3
D. Generate daily precision-recall data in Amazon ES, and publish the results in a dashboard shared with the Business team
View answer
Correct Answer: C
Question #53
A Machine Learning Specialist is planning to create a long-running Amazon EMR cluster. The EMR cluster will have 1 master node, 10 core nodes, and 20 task nodes. To save on costs, the Specialist will use Spot Instances in the EMR cluster. Which nodes should the Specialist launch on Spot Instances?
A. Master node
B. Any of the core nodes
C. Any of the task nodes
D. Both core and task nodes
View answer
Correct Answer: C
Question #54
A data scientist has explored and sanitized a dataset in preparation for the modeling phase of a supervised learning task. The statistical dispersion can vary widely between features, sometimes by several orders of magnitude. Before moving on to the modeling phase, the data scientist wants to ensure that the prediction performance on the production data is as accurate as possible. Which sequence of steps should the data scientist take to meet these requirements?
A. Apply random sampling to the dataset
B. Split the dataset into training, validation, and test sets
C. Rescale the dataset
D. Split the dataset into training, validation, and test sets
View answer
Correct Answer: D
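The options above are cut off, so the following is only an illustration of one widely used practice for features with very different scales: split first, compute the scaling statistics on the training split alone, then reuse those statistics on the other splits. The helper names `fit_scaler` and `transform` and the sample numbers are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(train_column):
    """Learn mean and standard deviation from the training split only."""
    mu = mean(train_column)
    sigma = pstdev(train_column) or 1.0  # guard against a constant column
    return mu, sigma


def transform(column, mu, sigma):
    """Standardize a column using previously learned statistics."""
    return [(x - mu) / sigma for x in column]


# Hypothetical values standing in for one feature after the split.
train = [10.0, 20.0, 30.0]
test = [40.0]

mu, sigma = fit_scaler(train)              # statistics come from train only
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)   # same statistics reused here
print(train_scaled, test_scaled)
```

Fitting the scaler before splitting would let information from the validation and test sets influence the training data, which tends to make offline metrics look better than production performance.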
Question #55
A manufacturing company uses machine learning (ML) models to detect quality issues. The models use images that are taken of the company's product at the end of each production step. The company has thousands of machines at the production site that generate one image per second on average. The company ran a successful pilot with a single manufacturing machine. For the pilot, ML specialists used an industrial PC that ran AWS IoT Greengrass with a long-running AWS Lambda function that uploaded the images to Ama
A. Set up a 10 Gbps AWS Direct Connect connection between the production site and the nearest AWS Region
B. Extend the long-running Lambda function that runs on AWS IoT Greengrass to compress the images and upload the compressed files to Amazon S3
C. Use auto scaling for SageMaker
D. Deploy the Lambda function and the ML models onto the AWS IoT Greengrass core that is running on the industrial PCs that are installed on each machine
View answer
Correct Answer: D
Question #56
A company is setting up a mechanism for data scientists and engineers from different departments to access an Amazon SageMaker Studio domain. Each department has a unique SageMaker Studio domain. The company wants to build a central proxy application that data scientists and engineers can log in to by using their corporate credentials. The proxy application will authenticate users by using the company's existing identity provider (IdP). The application will then route users to the appropriate SageMaker Studi
A. Use the SageMaker CreatePresignedDomainUrl API to generate a presigned URL for each domain according to the DynamoDB table
B. Use the SageMaker CreateHumanTaskUi API to generate a UI URL
C. Use the Amazon SageMaker ListHumanTaskUis API to list all UI URLs
D. Use the SageMaker CreatePresignedNotebookInstanceUrl API to generate a presigned URL
View answer
Correct Answer: A
Question #57
A manufacturing company asks its machine learning specialist to develop a model that classifies defective parts into one of eight defect types. The company has provided roughly 100,000 images per defect type for training. During the initial training of the image classification model, the specialist notices that the validation accuracy is 80%, while the training accuracy is 90%. It is known that human-level performance for this type of image classification is around 90%. What should the specialist consider to
A. A longer training time
B. Making the network larger
C. Using a different optimizer
D. Using some form of regularization
View answer
Correct Answer: D
Question #58
A company has set up and deployed its machine learning (ML) model into production with an endpoint using Amazon SageMaker hosting services. The ML team has configured automatic scaling for its SageMaker instances to support workload changes. During testing, the team notices that additional instances are being launched before the new instances are ready. This behavior needs to change as soon as possible. How can the ML team solve this issue?
A. Decrease the cooldown period for the scale-in activity
B. Replace the current endpoint with a multi-model endpoint using SageMaker
C. Set up Amazon API Gateway and AWS Lambda to trigger the SageMaker inference endpoint
D. Increase the cooldown period for the scale-out activity
View answer
Correct Answer: D
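As a hedged sketch of what "increase the scale-out cooldown" can look like in practice, the snippet below edits a target-tracking policy configuration of the shape used by Application Auto Scaling. The numeric values and the helper name `with_scale_out_cooldown` are assumptions for illustration; no AWS call is made here.

```python
import copy


def with_scale_out_cooldown(policy_config: dict, seconds: int) -> dict:
    """Return a copy of the policy config with a new scale-out cooldown."""
    updated = copy.deepcopy(policy_config)
    updated["ScaleOutCooldown"] = seconds
    return updated


# Hypothetical existing configuration; the values are illustrative only.
current_config = {
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
    },
    "ScaleInCooldown": 300,
    "ScaleOutCooldown": 60,   # short cooldown: a new scale-out can start early
}

# Give new instances time to come into service before scaling out again.
new_config = with_scale_out_cooldown(current_config, 300)
print(new_config["ScaleOutCooldown"])
```

The updated dictionary would then be supplied as the target-tracking configuration when the scaling policy is re-applied.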
Question #59
A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a corporate VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However, the ML Specialist cannot find the Amazon SageMaker notebook instance’s EBS volume or Amazon EC2 instance within the VPC. Why can't the ML Specialist see the instance in the VPC?
A. Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but they run outside of VPCs
B. Amazon SageMaker notebook instances are based on the Amazon ECS service within customer accounts
C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts
D. Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service accounts
View answer
Correct Answer: C
Question #60
A data scientist wants to use Amazon Forecast to build a forecasting model for inventory demand for a retail company. The company has provided a dataset of historic inventory demand for its products as a .csv file stored in an Amazon S3 bucket. The table below shows a sample of the dataset. How should the data scientist transform the data?
A. Redeploy the model as a batch transform job on an M5 instance
B. Redeploy the model on an M5 instance
C. Redeploy the model on a P3dn instance
D. Deploy the model onto an Amazon Elastic Container Service (Amazon ECS) cluster using a P3 instance
View answer
Correct Answer: A
Question #61
A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided. Based on this information, which model would have the HIGHEST recall with respect to the fraudulent class?
A. Decision tree
B. Linear support vector machine (SVM)
C. Naive Bayesian classifier
D. Single Perceptron with sigmoidal activation function
View answer
Correct Answer: C
Question #62
An insurance company is developing a new device for vehicles that uses a camera to observe drivers’ behavior and alert them when they appear distracted. The company created approximately 10,000 training images in a controlled environment that a Machine Learning Specialist will use to train and evaluate machine learning models. During the model evaluation, the Specialist notices that the training error rate diminishes faster as the number of epochs increases and the model is not accurately inferring on the un
A. CSV files
B. Parquet files
C. Compressed JSON
D. RecordIO
View answer
Correct Answer: BE
Question #63
A Machine Learning Specialist is assigned to a Fraud Detection team and must tune an XGBoost model, which is working appropriately for test data. However, with unknown data, it is not working as expected. The existing parameters are provided as follows. Which parameter tuning guidelines should the Specialist follow to avoid overfitting?
A. Increase the max_depth parameter value
B. Lower the max_depth parameter value
C. Update the objective to binary:logistic
D. Lower the min_child_weight parameter value
View answer
Correct Answer: B
Question #64
A Data Scientist received a set of insurance records, each consisting of a record ID, the final outcome among 200 categories, and the date of the final outcome. Some partial information on claim contents is also provided, but only for a few of the 200 categories. For each outcome category, there are hundreds of records distributed over the past 3 years. The Data Scientist wants to predict how many claims to expect in each category from month to month, a few months in advance. What type of machine learning mo
A. Classification month-to-month using supervised learning of the 200 categories based on claim contents
B. Reinforcement learning using claim IDs and timestamps where the agent will identify how many claims in each category to expect from month to month
C. Forecasting using claim IDs and timestamps to identify how many claims in each category to expect from month to month
D. Classification with supervised learning of the categories for which partial information on claim contents is provided, and forecasting using claim IDs and timestamps for all other categories
View answer
Correct Answer: C
Question #65
A Data Scientist is developing a machine learning model to classify whether a financial transaction is fraudulent. The labeled data available for training consists of 100,000 non-fraudulent observations and 1,000 fraudulent observations. The Data Scientist applies the XGBoost algorithm to the data, resulting in the following confusion matrix when the trained model is applied to a previously unseen validation dataset. The accuracy of the model is 99.1%, but the Data Scientist needs to reduce the number of fal
A. Change preprocessing to use n-grams
B. Add more nodes to the recurrent neural network (RNN) than the largest sentence's word count
C. Adjust hyperparameters related to the attention mechanism
D. Choose a different weight initialization type
View answer
Correct Answer: DE
Question #66
A Data Scientist needs to analyze employment data. The dataset contains approximately 10 million observations on people across 10 different features. During the preliminary analysis, the Data Scientist notices that income and age distributions are not normal. While income levels show a right skew as expected, with fewer individuals having a higher income, the age distribution also shows a right skew, with fewer older individuals participating in the workforce. Which feature transformations can the Data Scien
A. Increase the randomization of training data in the mini-batches used in training
B. Allocate a higher proportion of the overall data to the training dataset
C. Apply L1 or L2 regularization and dropouts to the training
D. Reduce the number of layers and units (or neurons) from the deep learning network
View answer
Correct Answer: BD
Question #67
A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist collated a large custom dataset of pictures containing different vehicle makes and models. What should the Specialist do to initialize the model to re-train it with the custom data?
A. Initialize the model with random weights in all layers including the last fully connected layer
B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer
C. Initialize the model with random weights in all layers and replace the last fully connected layer
D. Initialize the model with pre-trained weights in all layers including the last fully connected layer
View answer
Correct Answer: B
Question #68
A company ingests machine learning (ML) data from web advertising clicks into an Amazon S3 data lake. Click data is added to an Amazon Kinesis data stream by using the Kinesis Producer Library (KPL). The data is loaded into the S3 data lake from the data stream by using an Amazon Kinesis Data Firehose delivery stream. As the data volume increases, an ML specialist notices that the rate of data ingested into Amazon S3 is relatively constant. There also is an increasing backlog of data for Kinesis Data Stream
A. Increase the number of S3 prefixes for the delivery stream to write to
B. Decrease the retention period for the data stream
C. Increase the number of shards for the data stream
D. Add more consumers using the Kinesis Client Library (KCL)
View answer
Correct Answer: C
Question #69
A law firm handles thousands of contracts every day. Every contract must be signed. Currently, a lawyer manually checks all contracts for signatures. The law firm is developing a machine learning (ML) solution to automate signature detection for each contract. The ML solution must also provide a confidence score for each contract page. Which Amazon Textract API action can the law firm use to generate a confidence score for each page of each contract?
A. Use the AnalyzeDocument API action
B. Use the Prediction API call on the documents
C. Use the StartDocumentAnalysis API action to detect the signatures
D. Use the GetDocumentAnalysis API action to detect the signatures
View answer
Correct Answer: A
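A minimal sketch of how per-page signature confidence might be read from an AnalyzeDocument-style response. It assumes the response contains a `Blocks` list whose signature entries carry `BlockType`, `Confidence`, and `Page` fields, as described in the Textract documentation; the sample response is hand-written for illustration and is not real service output. The helper name is hypothetical.

```python
from collections import defaultdict


def signature_confidence_by_page(response: dict) -> dict:
    """Group signature confidence scores by page number.

    Assumes an AnalyzeDocument-style response in which detected
    signatures appear as blocks with BlockType == "SIGNATURE".
    """
    pages = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "SIGNATURE":
            # Default to page 1 when the Page field is absent.
            pages[block.get("Page", 1)].append(block["Confidence"])
    return dict(pages)


# Hand-written sample shaped like a response; not real Textract output.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "SIGNATURE", "Page": 1, "Confidence": 97.5},
        {"BlockType": "LINE", "Page": 2, "Confidence": 99.0},
        {"BlockType": "SIGNATURE", "Page": 2, "Confidence": 62.0},
    ]
}

print(signature_confidence_by_page(sample_response))
```

Pages that are missing from the result, or whose scores fall under a threshold the firm chooses, could then be routed to a lawyer for manual review.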
Question #70
A Machine Learning Specialist is designing a scalable data storage solution for Amazon SageMaker. There is an existing TensorFlow-based model implemented as a train.py script that relies on static training data that is currently stored as TFRecords. Which method of providing training data to Amazon SageMaker would meet the business requirements with the LEAST development overhead?
A. Use Amazon SageMaker script mode and use train
B. Use Amazon SageMaker script mode and use train
C. Rewrite the train
D. Prepare the data in the format accepted by Amazon SageMaker
View answer
Correct Answer: B
Question #71
This graph shows the training and validation loss against the epochs for a neural network. The network being trained is as follows:
• Two dense layers, one output neuron
• 100 neurons in each layer
• 100 epochs
• Random initialization of weights
Which technique can be used to improve model performance in terms of accuracy in the validation set?
A. Linear regression is inappropriate
B. Linear regression is inappropriate
C. Linear regression is appropriate
D. Linear regression is appropriate
View answer
Correct Answer: A
Question #72
A Machine Learning Specialist is working for an online retailer that wants to run analytics on every customer visit, processed through a machine learning pipeline. The data needs to be ingested by Amazon Kinesis Data Streams at up to 100 transactions per second, and the JSON data blob is 100 KB in size. What is the MINIMUM number of shards in Kinesis Data Streams the Specialist should use to successfully ingest this data?
A. 1 shard
B. 10 shards
C. 100 shards
D. 1,000 shards
View answer
Correct Answer: B
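The arithmetic behind this answer can be sketched as follows. It relies on the per-shard write limits of Kinesis Data Streams (1 MB per second and 1,000 records per second) and, as a simplifying assumption, treats 1 MB as 1,000 KB. The function names are made up for this example.

```python
# Per-shard write limits for Kinesis Data Streams.
SHARD_WRITE_KB_PER_SEC = 1_000        # 1 MB/s, taking 1 MB = 1,000 KB
SHARD_WRITE_RECORDS_PER_SEC = 1_000


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def min_shards(records_per_sec: int, record_size_kb: int) -> int:
    """Smallest shard count that satisfies both write limits."""
    by_throughput = ceil_div(records_per_sec * record_size_kb,
                             SHARD_WRITE_KB_PER_SEC)
    by_record_count = ceil_div(records_per_sec,
                               SHARD_WRITE_RECORDS_PER_SEC)
    return max(by_throughput, by_record_count, 1)


# 100 transactions/s at 100 KB each is 10,000 KB/s of incoming data.
print(min_shards(100, 100))
```

Here the throughput limit is the binding one: 100 records per second is far below the record-count limit, but 10,000 KB per second needs 10 shards.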
Question #73
A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operations using Amazon Athena and Amazon S3. The source systems send data in .CSV format in real time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3. Which solution takes the LEAST effort to implement?
A. Ingest
B. Ingest
C. Ingest
D. Ingest
View answer
Correct Answer: B
Question #74
A library is developing an automatic book-borrowing system that uses Amazon Rekognition. Images of library members’ faces are stored in an Amazon S3 bucket. When members borrow books, the Amazon Rekognition CompareFaces API operation compares real faces against the stored faces in Amazon S3. The library needs to improve security by making sure that images are encrypted at rest. Also, when the images are used with Amazon Rekognition, they need to be encrypted in transit. The library also must ensure that the
A. Enable server-side encryption on the S3 bucket
B. Switch to using an Amazon Rekognition collection to store the images
C. Switch to using the AWS GovCloud (US) Region for Amazon S3 to store images and for Amazon Rekognition to compare faces
D. Enable client-side encryption on the S3 bucket
View answer
Correct Answer: A
Question #75
A logistics company needs a forecast model to predict next month's inventory requirements for a single item in 10 warehouses. A machine learning specialist uses Amazon Forecast to develop a forecast model from 3 years of monthly data. There is no missing data. The specialist selects the DeepAR+ algorithm to train a predictor. The predictor's mean absolute percentage error (MAPE) is much larger than the MAPE produced by the current human forecasters. Which changes to the CreatePredictor API call could improve t
A. Use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metadata dataset
B. Use a Jupyter notebook in Amazon SageMaker to separate the dataset into a related time series dataset and an item metadata dataset
C. Use AWS Batch jobs to separate the dataset into a target time series dataset, a related time series dataset, and an item metadata dataset
D. Use a Jupyter notebook in Amazon SageMaker to transform the data into the optimized protobuf recordIO format
View answer
Correct Answer: AD
Question #76
A company is setting up an Amazon SageMaker environment. The corporate data security policy does not allow communication over the internet. How can the company enable the Amazon SageMaker service without enabling direct internet access to Amazon SageMaker notebook instances?
A. Create a NAT gateway within the corporate VPC
B. Route Amazon SageMaker traffic through an on-premises network
C. Create Amazon SageMaker VPC interface endpoints within the corporate VPC
D. Create VPC peering with Amazon VPC hosting Amazon SageMaker
View answer
Correct Answer: C
