
Pass the AWS Exam Easily with Updated MLS-C01 Practice Questions

Prepare effectively for the AWS MLS-C01 exam with SPOTO's comprehensive collection of exam questions and answers tailored for success. Our practice questions cover the key topics of the AWS Certified Machine Learning - Specialty certification, ensuring thorough preparation and confidence on exam day. Access valuable exam resources and study materials designed to deepen your understanding of building, training, optimizing, and deploying machine learning models on the AWS platform. Our focus is on providing the tools and knowledge you need to pass, demonstrating your expertise in implementing cloud-based machine learning initiatives.


Question #1
A manufacturer of car engines collects data from cars as they are being driven. The data collected includes timestamp, engine temperature, rotations per minute (RPM), and other sensor readings. The company wants to predict when an engine is going to have a problem, so it can notify drivers in advance to get engine maintenance. The engine data is loaded into a data lake for training. Which is the MOST suitable predictive model that can be deployed into production?
A. Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem
B. This data requires an unsupervised learning algorithm
C. Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem
D. This data is already formulated as a time series
Correct Answer: A
Question #2
A data scientist needs to identify fraudulent user accounts for a company's ecommerce platform. The company wants the ability to determine if a newly created account is associated with a previously known fraudulent user. The data scientist is using AWS Glue to cleanse the company's application logs during ingestion. Which strategy will allow the data scientist to identify fraudulent accounts?
A. Execute the built-in FindDuplicates Amazon Athena query
B. Create a FindMatches machine learning transform in AWS Glue
C. Create an AWS Glue crawler to infer duplicate accounts in the source data
D. Search for duplicate accounts in the AWS Glue Data Catalog
Correct Answer: B
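To ground answer B, here is a minimal boto3 sketch of creating a FindMatches ML transform; the Glue database, table, key column, role ARN, and tradeoff value are hypothetical placeholders.
```python
import boto3

glue = boto3.client("glue")

# Sketch: a FindMatches ML transform over a cataloged accounts table.
# All names below are hypothetical placeholders.
response = glue.create_ml_transform(
    Name="find-fraudulent-account-matches",
    InputRecordTables=[
        {"DatabaseName": "ecommerce_logs", "TableName": "user_accounts"}
    ],
    Parameters={
        "TransformType": "FIND_MATCHES",
        "FindMatchesParameters": {
            "PrimaryKeyColumnName": "account_id",
            # Closer to 0 favors recall, so fewer fraud duplicates slip through.
            "PrecisionRecallTradeoff": 0.1,
        },
    },
    Role="arn:aws:iam::123456789012:role/GlueFindMatchesRole",
    MaxCapacity=10.0,
)
print(response["TransformId"])
```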
Question #3
A retail company intends to use machine learning to categorize new products. A labeled dataset of current products was provided to the Data Science team. The dataset includes 1,200 products. The labeled dataset has 15 features for each product, such as title, dimensions, weight, and price. Each product is labeled as belonging to one of six categories, such as books, games, electronics, and movies. Which model should be used for categorizing new products using the provided dataset for training?
A. An XGBoost model where the objective parameter is set to multi:softmax
B. A deep convolutional neural network (CNN) with a softmax activation function for the last layer
C. A regression forest where the number of trees is set equal to the number of product categories
D. A DeepAR forecasting model based on a recurrent neural network (RNN)
Correct Answer: A
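Answer A maps to the built-in SageMaker XGBoost algorithm with a multiclass objective. A minimal sketch, assuming hypothetical S3 paths and an execution role:
```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
# Built-in XGBoost container for the current region.
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # hypothetical
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/product-categorizer/output",
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    objective="multi:softmax",  # the multiclass objective named in answer A
    num_class=6,                # six product categories
    num_round=100,
)
estimator.fit({
    "train": TrainingInput("s3://my-bucket/product-categorizer/train/",
                           content_type="text/csv"),
})
```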
Question #4
A data scientist uses an Amazon SageMaker notebook instance to conduct data exploration and analysis. This requires certain Python packages that are not natively available on Amazon SageMaker to be installed on the notebook instance. How can a machine learning specialist ensure that required packages are automatically available on the notebook instance for the data scientist to use?
A. Install AWS Systems Manager Agent on the underlying Amazon EC2 instance and use Systems Manager Automation to execute the package installation commands
B. Create a Jupyter notebook file (
C. Use the conda package manager from within the Jupyter notebook console to apply the necessary conda packages to the default kernel of the notebook
D. Create an Amazon SageMaker lifecycle configuration with package installation commands and assign the lifecycle configuration to the notebook instance
Correct Answer: D
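Answer D can be scripted with a lifecycle configuration whose on-start hook installs the packages. A minimal boto3 sketch, assuming a hypothetical instance name and package list:
```python
import base64
import boto3

sm = boto3.client("sagemaker")

# Runs every time the notebook instance starts; package names are hypothetical.
on_start = """#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
source /home/ec2-user/anaconda3/bin/activate python3
pip install --upgrade lightgbm shap
source /home/ec2-user/anaconda3/bin/deactivate
EOF
"""

sm.create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="install-extra-packages",
    OnStart=[{"Content": base64.b64encode(on_start.encode()).decode()}],
)

# Attach the configuration to the (stopped) notebook instance.
sm.update_notebook_instance(
    NotebookInstanceName="data-exploration-notebook",
    LifecycleConfigName="install-extra-packages",
)
```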
Question #5
A machine learning specialist works for a fruit processing company and needs to build a system that categorizes apples into three types. The specialist has collected a dataset that contains 150 images for each type of apple and applied transfer learning on a neural network that was pretrained on ImageNet with this dataset. The company requires at least 85% accuracy to make use of the model. After an exhaustive grid search, the optimal hyperparameters produced the following:
• 68% accuracy on the training set
• 67% accuracy on the validation set
A. Convert the images to grayscale and retrain the model
B. Reduce the number of distinct items from 10 to 2, build the model, and iterate
C. Attach different colored labels to each item, take the images again, and build the model
D. Augment training data for each item using image variants like inversions and translations, build the model, and iterate
Correct Answer: B
Question #6
A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is stored in Amazon RDS. Which approach should the Specialist use for training a model using that data?
A. Write a direct connection to the SQL database within the notebook and pull data in
B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook
C. Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in
D. Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access
Correct Answer: B
Question #7
A company is building a predictive maintenance model based on machine learning (ML). The data is stored in a fully private Amazon S3 bucket that is encrypted at rest with AWS Key Management Service (AWS KMS) CMKs. An ML specialist must run data preprocessing by using an Amazon SageMaker Processing job that is triggered from code in an Amazon SageMaker notebook. The job should read data from Amazon S3, process it, and upload it back to the same S3 bucket. The preprocessing code is stored in a container image in Amazon Elastic Container Registry (Amazon ECR). The ML specialist needs to grant permissions to ensure a smooth data preprocessing workflow. Which set of actions should the ML specialist take to meet these requirements?
A. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs, S3 read and write access to the relevant S3 bucket, and appropriate KMS and ECR permissions
B. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs
C. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs and to access Amazon ECR
D. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs
Correct Answer: D
Question #8
A Machine Learning Specialist is working with a large company to leverage machine learning within its products. The company wants to group its customers into categories based on which customers will and will not churn within the next 6 months. The company has labeled the data available to the Specialist. Which machine learning model type should the Specialist use to accomplish this task?
A. Linear regression
B. Classification
C. Clustering
D. Reinforcement learning
Correct Answer: B
Question #9
A machine learning (ML) specialist is administering a production Amazon SageMaker endpoint with model monitoring configured. Amazon SageMaker Model Monitor detects violations on the SageMaker endpoint, so the ML specialist retrains the model with the latest dataset. This dataset is statistically representative of the current production traffic. The ML specialist notices that even after deploying the new SageMaker model and running the first monitoring job, the SageMaker endpoint still has violations. What should the ML specialist do to resolve the violations?
A. Manually trigger the monitoring job to re-evaluate the SageMaker endpoint traffic sample
B. Run the Model Monitor baseline job again on the new training set
C. Delete the endpoint and recreate it with the original configuration
D. Retrain the model again by using a combination of the original training set and the new training set
Correct Answer: B
Question #10
A company uses camera images of the tops of items displayed on store shelves to determine which items were removed and which ones still remain. After several hours of data labeling, the company has a total of 1,000 hand-labeled images covering 10 distinct items. The training results were poor. Which machine learning approach fulfills the company's long-term needs?
A. A k-fold cross-validation strategy with k=5
B. A stratified k-fold cross-validation strategy with k=5
C. A k-fold cross-validation strategy with k=5 and 3 repeats
D. An 80/20 stratified split between training and validation
Correct Answer: D
Question #11
A Machine Learning Specialist is required to build a supervised image-recognition model to identify a cat. The ML Specialist performs some tests and records the following results for a neural network-based image classifier:
Total number of images available = 1,000
Test set images = 100 (constant test set)
The ML Specialist notices that, in over 75% of the misclassified images, the cats were held upside down by their owners. Which techniques can be used by the ML Specialist to improve this specific test error?
A. Increase the training data by adding variation in rotation for training images
B. Increase the number of epochs for model training
C. Increase the number of layers for the neural network
D. Increase the dropout rate for the second-to-last layer
Correct Answer: A
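Answer A amounts to rotation-based data augmentation. A minimal Keras sketch, assuming a hypothetical training directory of cat and non-cat images:
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment training images with rotations and flips so the model also
# sees upside-down cats. The directory path is hypothetical.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=180,    # include fully inverted orientations
    horizontal_flip=True,
    vertical_flip=True,
)

train_generator = train_datagen.flow_from_directory(
    "data/train",
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",   # cat vs. not-cat
)
# model.fit(train_generator, epochs=10)  # assuming a compiled Keras model
```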
Question #12
When submitting Amazon SageMaker training jobs using one of the built-in algorithms, which common parameters MUST be specified? (Choose three.)
A. Binarization
B. One-hot encoding
C. Tokenization
D. Normalization transformation
Correct Answer: AEF
Question #13
A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like based on the users' similarity to other users. What should the Specialist do to meet this objective?
A. Build a content-based filtering recommendation engine with Apache Spark ML on Amazon EMR
B. Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR
C. Build a model-based filtering recommendation engine with Apache Spark ML on Amazon EMR
D. Build a combinative filtering recommendation engine with Apache Spark ML on Amazon EMR
Correct Answer: B
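Answer B corresponds to collaborative filtering, for example with Spark ML's ALS implementation running on Amazon EMR. A minimal PySpark sketch with hypothetical column names and input path:
```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("product-recommendations").getOrCreate()
# Hypothetical interactions dataset: one row per user-product interaction.
ratings = spark.read.parquet("s3://my-bucket/user-product-interactions/")

als = ALS(
    userCol="user_id",
    itemCol="product_id",
    ratingCol="rating",
    implicitPrefs=True,        # interactions rather than explicit ratings
    coldStartStrategy="drop",  # skip users/items unseen during training
)
model = als.fit(ratings)

# Top 10 product recommendations per user, based on similar users' behavior.
model.recommendForAllUsers(10).show(truncate=False)
```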
Question #14
A company is converting a large number of unstructured paper receipts into images. The company wants to create a model based on natural language processing (NLP) to find relevant entities such as date, location, and notes, as well as some custom entities such as receipt numbers. The company is using optical character recognition (OCR) to extract text for data labeling. However, documents are in different structures and formats, and the company is facing challenges with setting up the manual workflows for each document format.
A. Extract text from receipt images by using Amazon Textract
B. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace
C. Extract text from receipt images by using Amazon Textract
D. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace
Correct Answer: C
Question #15
A Machine Learning Specialist is applying a linear least squares regression model to a dataset with 1,000 records and 50 features. Prior to training, the ML Specialist notices that two features are perfectly linearly dependent. Why could this be an issue for the linear least squares regression model?
A. It could cause the backpropagation algorithm to fail during training
B. It could create a singular matrix during optimization, which fails to define a unique solution
C. It could modify the loss function during optimization, causing it to fail during training
D. It could introduce non-linear dependencies within the data, which could invalidate the linear assumptions of the model
Correct Answer: B
Question #16
An ecommerce company wants to launch a new cloud-based product recommendation feature for its web application. Due to data localization regulations, any sensitive data must not leave its on-premises data center, and the product recommendation model must be trained and tested using nonsensitive data only. Data transfer to the cloud must use IPsec. The web application is hosted on premises with a PostgreSQL database that contains all the data. The company wants the data to be uploaded securely to Amazon S3 each day.
A. Create an AWS Glue job to connect to the PostgreSQL DB instance
B. Create an AWS Glue job to connect to the PostgreSQL DB instance
C. Use AWS Database Migration Service (AWS DMS) with table mapping to select PostgreSQL tables with no sensitive data through an SSL connection
D. Use PostgreSQL logical replication to replicate all data to PostgreSQL in Amazon EC2 through AWS Direct Connect with a VPN connection
Correct Answer: C
Question #17
A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data?
A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries
B. Use AWS Glue to catalogue the data and Amazon Athena to run queries
C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries
D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries
Correct Answer: B
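Once an AWS Glue crawler has cataloged the S3 data, Amazon Athena can query it with standard SQL. A minimal boto3 sketch, assuming a hypothetical database, table, and results location:
```python
import boto3

athena = boto3.client("athena")

# Database, table, and output location are hypothetical placeholders.
response = athena.start_query_execution(
    QueryString="SELECT machine_id, AVG(temperature) "
                "FROM sensor_readings GROUP BY machine_id",
    QueryExecutionContext={"Database": "manufacturing_catalog"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
print(response["QueryExecutionId"])
```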
Question #18
A Machine Learning Specialist at a company sensitive to security is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII). The dataset:
• Must be accessible from a VPC only.
• Must not traverse the public internet.
How can these requirements be satisfied?
A. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC
B. Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance
C. Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance
D. Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance
Correct Answer: A
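The bucket policy in answer A can deny any request that does not arrive through the given VPC endpoint. A minimal boto3 sketch with a hypothetical bucket name and endpoint ID:
```python
import json
import boto3

s3 = boto3.client("s3")

# Bucket name and VPC endpoint ID are hypothetical placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyAccessOutsideVpcEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::training-data-pii",
            "arn:aws:s3:::training-data-pii/*",
        ],
        "Condition": {
            "StringNotEquals": {"aws:sourceVpce": "vpce-0123456789abcdef0"}
        },
    }],
}
s3.put_bucket_policy(Bucket="training-data-pii", Policy=json.dumps(policy))
```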
Question #19
A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The Specialist is using one of the SageMaker built-in algorithms for the training. The dataset is stored in .CSV format and is transformed into a numpy.array, which appears to be negatively affecting the speed of the training. What should the Specialist do to optimize the data for training on SageMaker?
A. Use the SageMaker batch transform feature to transform the training data into a DataFrame
B. Use AWS Glue to compress the data into the Apache Parquet format
C. Transform the dataset into the RecordIO protobuf format
D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data
Correct Answer: C
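Answer C converts the numpy arrays to RecordIO-protobuf before uploading them back to Amazon S3. A minimal sketch with synthetic data and a hypothetical bucket and key:
```python
import io

import boto3
import numpy as np
import sagemaker.amazon.common as smac

# Synthetic stand-ins for the real feature matrix and labels.
features = np.random.rand(1000, 50).astype("float32")
labels = np.random.randint(0, 2, size=1000).astype("float32")

# Serialize to the RecordIO-protobuf format read by built-in algorithms.
buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, features, labels)
buf.seek(0)

# Bucket and key are hypothetical placeholders.
boto3.client("s3").upload_fileobj(buf, "my-bucket", "train/recordio-pb-data")
```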
Question #20
A data scientist is using an Amazon SageMaker notebook instance and needs to securely access data stored in a specific Amazon S3 bucket. How should the data scientist accomplish this?
A. Add an S3 bucket policy allowing GetObject, PutObject, and ListBucket permissions to the Amazon SageMaker notebook ARN as principal
B. Encrypt the objects in the S3 bucket with a custom AWS Key Management Service (AWS KMS) key that only the notebook owner has access to
C. Attach the policy to the IAM role associated with the notebook that allows GetObject, PutObject, and ListBucket operations to the specific S3 bucket
D. Use a script in a lifecycle configuration to configure the AWS CLI on the instance with an access key ID and secret
Correct Answer: C
Question #21
A machine learning specialist stores IoT soil sensor data in an Amazon DynamoDB table and stores weather event data as JSON files in Amazon S3. The dataset in DynamoDB is 10 GB in size and the dataset in Amazon S3 is 5 GB in size. The specialist wants to train a model on this data to help predict soil moisture levels as a function of weather events using Amazon SageMaker. Which solution will accomplish the necessary transformation to train the Amazon SageMaker model with the LEAST amount of administrative overhead?
A. Launch an Amazon EMR cluster
B. Crawl the data using AWS Glue crawlers
C. Enable Amazon DynamoDB Streams on the sensor table
D. Crawl the data using AWS Glue crawlers
Correct Answer: D
Question #22
A machine learning specialist needs to analyze comments on a news website with users across the globe. The specialist must find the most discussed topics in the comments that are in either English or Spanish. What steps could be used to accomplish this task? (Choose two.)
A. Use an Amazon SageMaker BlazingText algorithm to find the topics independently from language
B. Use an Amazon SageMaker seq2seq algorithm to translate from Spanish to English, if necessary
C. Use Amazon Translate to translate from Spanish to English, if necessary
D. Use Amazon Translate to translate from Spanish to English, if necessary
E. Use Amazon Translate to translate from Spanish to English, if necessary
Correct Answer: B
Question #23
A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor, and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset. Which tool should be used to improve the validation accuracy?
A. Amazon Comprehend syntax analysis and entity detection
B. Amazon SageMaker BlazingText cbow mode
C. Natural Language Toolkit (NLTK) stemming and stop word removal
D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer
Correct Answer: D
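Answer D pairs a TF-IDF vectorizer with a simple classifier so rare but uninformative terms are down-weighted. A minimal scikit-learn sketch on toy, hypothetical data:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hypothetical training data.
texts = ["great product, loved it", "terrible service", "would buy again"]
labels = [1, 0, 1]

# TF-IDF re-weights the rich vocabulary before a simple sentiment model.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["loved the service"]))
```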
Question #24
A retail chain has been ingesting purchasing records from its network of 20,000 stores to Amazon S3 using Amazon Kinesis Data Firehose. To support training an improved machine learning model, training records will require new but simple transformations, and some attributes will be combined. The model needs to be retrained daily. Given the large number of stores and the legacy data ingestion, which change will require the LEAST amount of development effort?
A. Dropout
B. Smooth L1 loss
C. Softmax
D. Rectified linear units (ReLU)
Correct Answer: D
Question #25
A Machine Learning Specialist wants to bring a custom algorithm to Amazon SageMaker. The Specialist implements the algorithm in a Docker container supported by Amazon SageMaker. How should the Specialist package the Docker container so that Amazon SageMaker can launch the training correctly?
A. Modify the bash_profile file in the container and add a bash command to start the training program
B. Use CMD config in the Dockerfile to add the training program as a CMD of the image
C. Configure the training program as an ENTRYPOINT named train
D. Copy the training program to directory /opt/ml/train
Correct Answer: B
Question #26
A telecommunications company is developing a mobile app for its customers. The company is using an Amazon SageMaker hosted endpoint for machine learning model inferences. Developers want to introduce a new version of the model for a limited number of users who subscribed to a preview feature of the app. After the new version of the model is tested as a preview, developers will evaluate its accuracy. If a new version of the model has better accuracy, developers need to be able to gradually release the new version over time.
A. Update the ProductionVariant data type with the new version of the model by using the CreateEndpointConfig operation with the InitialVariantWeight parameter set to 0
B. Configure two SageMaker hosted endpoints that serve the different versions of the model
C. Update the DesiredWeightsAndCapacity data type with the new version of the model by using the UpdateEndpointWeightsAndCapacities operation with the DesiredWeight parameter set to 0
D. Configure two SageMaker hosted endpoints that serve the different versions of the model
Correct Answer: D
Question #27
A company provisions Amazon SageMaker notebook instances for its data science team and creates Amazon VPC interface endpoints to ensure communication between the VPC and the notebook instances. All connections to the Amazon SageMaker API are contained entirely and securely using the AWS network. However, the data science team realizes that individuals outside the VPC can still connect to the notebook instances across the internet. Which set of actions should the data science team take to fix the issue?
A. Modify the notebook instances' security group to allow traffic only from the CIDR ranges of the VPC
B. Create an IAM policy that allows the sagemaker:CreatePresignedNotebookInstanceUrl and sagemaker:DescribeNotebookInstance actions from only the VPC endpoints
C. Add a NAT gateway to the VPC. Convert all of the subnets where the Amazon SageMaker notebook instances are hosted to private subnets
D. Change the network ACL of the subnet the notebook is hosted in to restrict access to anyone outside the VPC
Correct Answer: B
Question #28
A technology startup is using complex deep neural networks and GPU compute to recommend the company's products to its existing customers based upon each customer's habits and interactions. The solution currently pulls each dataset from an Amazon S3 bucket before loading the data into a TensorFlow model pulled from the company's Git repository that runs locally. This job then runs for several hours while continually outputting its progress to the same S3 bucket. The job can be paused, restarted, and continued.
A. Implement the solution using AWS Deep Learning Containers and run the container as a job using AWS Batch on a GPU-compatible Spot Instance
B. Implement the solution using a low-cost GPU-compatible Amazon EC2 instance and use the AWS Instance Scheduler to schedule the task
C. Implement the solution using AWS Deep Learning Containers, run the workload using AWS Fargate running on Spot Instances, and then schedule the task using the built-in task scheduler
D. Implement the solution using Amazon ECS running on Spot Instances and schedule the task using the ECS service scheduler
Correct Answer: A
Question #29
A Machine Learning Specialist is working with a large cybersecurity company that manages security events in real time for companies around the world. The cybersecurity company wants to design a solution that will allow it to use machine learning to score malicious events as anomalies on the data as it is being ingested. The company also wants to be able to save the results in its data lake for later processing and analysis. What is the MOST efficient way to accomplish these tasks?
A. Ingest the data using Amazon Kinesis Data Firehose, and use Amazon Kinesis Data Analytics Random Cut Forest (RCF) for anomaly detection
B. Ingest the data into Apache Spark Streaming using Amazon EMR, and use Spark MLlib with k-means to perform anomaly detection
C. Ingest the data and store it in Amazon S3
D. Ingest the data and store it in Amazon S3
Correct Answer: A
Question #30
A Machine Learning Specialist previously trained a logistic regression model using scikit-learn on a local machine, and the Specialist now wants to deploy it to production for inference only. What steps should be taken to ensure Amazon SageMaker can host a model that was trained locally?
A. Use a database, such as Amazon DynamoDB, to store the images, and set the IAM policies to restrict access to only the desired IAM users
B. Use an Amazon S3-backed data lake to store the raw images, and set up the permissions using bucket policies
C. Set up Amazon EMR with Hadoop Distributed File System (HDFS) to store the files, and restrict access to the EMR instances using IAM policies
D. Configure Amazon EFS with IAM policies to make the data available to Amazon EC2 instances owned by the IAM users
Correct Answer: A
Question #31
The displayed graph is from a forecasting model for testing a time series. Considering the graph only, which conclusion should a Machine Learning Specialist make about the behavior of the model?
A. The model predicts both the trend and the seasonality well
B. The model predicts the trend well, but not the seasonality
C. The model predicts the seasonality well, but not the trend
D. The model does not predict the trend or the seasonality well
Correct Answer: A
Question #32
A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as linear regression and logistic regression. During exploratory data analysis, the Specialist observes that many features are highly correlated with each other. This may make the model unstable. What should be done to reduce the impact of having such a large number of features?
A. Perform one-hot encoding on highly correlated features
B. Use matrix multiplication on highly correlated features
C. Create a new feature space using principal component analysis (PCA)
D. Apply the Pearson correlation coefficient
Correct Answer: C
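Answer C projects the correlated features onto orthogonal principal components. A minimal scikit-learn sketch on synthetic data:
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with one highly correlated feature pair.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=1000)

# Standardize first so no feature dominates, then keep enough components
# to explain 95% of the variance.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_reduced = pipeline.fit_transform(X)
print(X_reduced.shape)  # fewer, mutually uncorrelated features
```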
Question #33
A financial services company is building a robust serverless data lake on Amazon S3. The data lake should be flexible and meet the following requirements:
• Support querying old and new data on Amazon S3 through Amazon Athena and Amazon Redshift Spectrum.
• Support event-driven ETL pipelines.
• Provide a quick and easy way to understand metadata.
Which approach meets these requirements?
A. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Glue ETL job, and an AWS Glue Data Catalog to search and discover metadata
B. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Batch job, and an external Apache Hive metastore to search and discover metadata
C. Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Batch job, and an AWS Glue Data Catalog to search and discover metadata
D. Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Glue ETL job, and an external Apache Hive metastore to search and discover metadata
Correct Answer: A
Question #34
A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large with millions of data points and is hosted in an Amazon S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on the notebook instance. Which approach allows the Specialist to use all the data to train the model?
A. Load a smaller subset of the data into the SageMaker notebook and train locally
B. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the instance
C. Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible with Amazon SageMaker
D. Load a smaller subset of the data into the SageMaker notebook and train locally
Correct Answer: A
Question #35
A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company's dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices. Which step should a machine learning specialist take to remove features that are irrelevant?
A. Plot a histogram of the features and compute their standard deviation
B. Plot a histogram of the features and compute their standard deviation
C. Build a heatmap showing the correlation of the dataset against itself
D. Run a correlation check of all features against the target variable
Correct Answer: D
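Answer D is a correlation check of every feature against the target. A minimal pandas sketch, assuming a hypothetical CSV file and column names:
```python
import pandas as pd

# Hypothetical dataset with a "sale_price" target column.
df = pd.read_csv("house_sales.csv")

correlations = df.corr(numeric_only=True)["sale_price"].drop("sale_price")
print(correlations.sort_values(key=abs, ascending=False))

# Drop features with a negligible linear relationship to the sale price.
weak = correlations[correlations.abs() < 0.05].index
df = df.drop(columns=weak)
```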
Question #36
A trucking company is collecting live image data from its fleet of trucks across the globe. The data is growing rapidly and approximately 100 GB of new data is generated every day. The company wants to explore machine learning use cases while ensuring the data is only accessible to specific IAM users. Which storage option provides the most processing flexibility and will allow access control with IAM?
A. Run self-correlation on all features and remove highly correlated features
B. Normalize all numerical values to be between 0 and 1
C. Use an autoencoder or principal component analysis (PCA) to replace original features with new features
D. Cluster raw data using k-means and use sample data from each cluster to build a new dataset
Correct Answer: C
Question #37
A retail company wants to combine its customer orders with the product description data from its product catalog. The structure and format of the records in each dataset is different. A data analyst tried to use a spreadsheet to combine the datasets, but the effort resulted in duplicate records and records that were not properly combined. The company needs a solution that it can use to combine similar records from the two datasets and remove any duplicates. Which solution will meet these requirements?
A. Use an AWS Lambda function to process the data
B. Create AWS Glue crawlers for reading and populating the AWS Glue Data Catalog
C. Create AWS Glue crawlers for reading and populating the AWS Glue Data Catalog
D. Create an AWS Lake Formation custom transform
Correct Answer: C
Question #38
A media company with a very large archive of unlabeled images, text, audio, and video footage wishes to index its assets to allow rapid identification of relevant content by the Research team. The company wants to use machine learning to accelerate the efforts of its in-house researchers who have limited machine learning expertise. Which is the FASTEST route to index the assets?
A. Use Amazon Rekognition, Amazon Comprehend, and Amazon Transcribe to tag data into distinct categories/classes
B. Create a set of Amazon Mechanical Turk Human Intelligence Tasks to label all footage
C. Use Amazon Transcribe to convert speech to text
D. Use the AWS Deep Learning AMI and Amazon EC2 GPU instances to create custom models for audio transcription and topic modeling, and use object detection to tag data into distinct categories/classes
Correct Answer: A
Question #39
A machine learning (ML) specialist must develop a classification model for a financial services company. A domain expert provides the dataset, which is tabular with 10,000 rows and 1,020 features. During exploratory data analysis, the specialist finds no missing values and a small percentage of duplicate rows. There are correlation scores of > 0.9 for 200 feature pairs. The mean value of each feature is similar to its 50th percentile. Which feature engineering strategy should the ML specialist use with Amazon SageMaker?
A. Apply dimensionality reduction by using the principal component analysis (PCA) algorithm
B. Drop the features with low correlation scores by using a Jupyter notebook
C. Apply anomaly detection by using the Random Cut Forest (RCF) algorithm
D. Concatenate the features with high correlation scores by using a Jupyter notebook
Correct Answer: A
Question #40
A financial company is trying to detect credit card fraud. The company observed that, on average, 2% of credit card transactions were fraudulent. A data scientist trained a classifier on a year's worth of credit card transactions data. The model needs to identify the fraudulent transactions (positives) from the regular ones (negatives). The company's goal is to accurately capture as many positives as possible. Which metrics should the data scientist use to optimize the model? (Choose two.)
A. Remove Amazon S3 access permissions from the SageMaker execution role
B. Encrypt the weights of the CNN model
C. Encrypt the training and validation dataset
D. Enable network isolation for training jobs
Correct Answer: AB
Question #41
A data scientist is trying to improve the accuracy of a neural network classification model. The data scientist wants to run a large hyperparameter tuning job in Amazon SageMaker. However, previous smaller tuning jobs on the same model often ran for several weeks. The ML specialist wants to reduce the computation time required to run the tuning job. Which actions will MOST reduce the computation time for the hyperparameter tuning job? (Select TWO.)
A. Use the Hyperband tuning strategy
B. Increase the number of hyperparameters
C. Set a lower value for the MaxNumberOfTrainingJobs parameter
D. Use the grid search tuning strategy
E. Set a lower value for the MaxParallelTrainingJobs parameter
Correct Answer: AC
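Answers A and C combine in the SageMaker Python SDK as a Hyperband tuning job with a modest job cap. A minimal sketch; the container, role, S3 paths, metric regex, and ranges are hypothetical placeholders:
```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import (
    ContinuousParameter,
    HyperparameterTuner,
    IntegerParameter,
)

# Hypothetical training container, role, and instance settings.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/nn-classifier:latest",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
)

# Answer A: Hyperband stops underperforming training jobs early.
# Answer C: capping max_jobs (MaxNumberOfTrainingJobs) bounds total work.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    metric_definitions=[
        {"Name": "validation:accuracy", "Regex": "val_acc=([0-9\\.]+)"}
    ],
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-5, 1e-1, scaling_type="Logarithmic"),
        "batch_size": IntegerParameter(32, 512),
    },
    strategy="Hyperband",
    max_jobs=50,
    max_parallel_jobs=5,
)
tuner.fit({"train": "s3://my-bucket/train/"})
```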
Question #42
A Data Scientist needs to create a serverless ingestion and analytics solution for high-velocity, real-time streaming data. The ingestion process must buffer and convert incoming records from JSON to a query-optimized, columnar format without data loss. The output datastore must be highly available, and Analysts must be able to run SQL queries against the data and connect to existing business intelligence dashboards. Which solution should the Data Scientist build to satisfy the requirements?
A. Create a schema in the AWS Glue Data Catalog of the incoming data format
B. Write each JSON record to a staging location in Amazon S3
C. Write each JSON record to a staging location in Amazon S3
D. Use Amazon Kinesis Data Analytics to ingest the streaming data and perform real-time SQL queries to convert the records to Apache Parquet before delivering to Amazon S3
Correct Answer: A
Question #43
A financial services company wants to adopt Amazon SageMaker as its default data science environment. The company's data scientists run machine learning (ML) models on confidential financial data. The company is worried about data egress and wants an ML engineer to secure the environment. Which mechanisms can the ML engineer use to control data egress from SageMaker? (Choose three.)
A. Amazon EMR for data discovery, enrichment, and transformation; Amazon Athena for querying and analyzing the results in Amazon S3 using standard SQL; Amazon QuickSight for reporting and getting insights
B. Amazon Kinesis Data Analytics for data ingestion; Amazon EMR for data discovery, enrichment, and transformation; Amazon Redshift for querying and analyzing the results in Amazon S3
C. AWS Glue for data discovery, enrichment, and transformation; Amazon Athena for querying and analyzing the results in Amazon S3 using standard SQL; Amazon QuickSight for reporting and getting insights
D. AWS Data Pipeline for data transfer; AWS Step Functions for orchestrating AWS Lambda jobs for data discovery, enrichment, and transformation; Amazon Athena for querying and analyzing the results in Amazon S3 using standard SQL; Amazon QuickSight for reporting and getting insights
Correct Answer: BDF
Question #44
A Data Science team is designing a dataset repository where it will store a large amount of training data commonly used in its machine learning models. As Data Scientists may create an arbitrary number of new datasets every day, the solution has to scale automatically and be cost-effective. Also, it must be possible to explore the data using SQL. Which storage scheme is MOST adapted to this scenario?
A. The model needs to be completely re-engineered because it is unable to handle product inventory changes
B. The model's hyperparameters should be periodically updated to prevent drift
C. The model should be periodically retrained from scratch using the original data while adding a regularization term to handle product inventory changes
D. The model should be periodically retrained using the original training data plus new data as product inventory changes
Correct Answer: A
Question #45
A company is building a line-counting application for use in a quick-service restaurant. The company wants to use video cameras pointed at the line of customers at a given register to measure how many people are in line and deliver notifications to managers if the line grows too long. The restaurant locations have limited bandwidth for connections to external services and cannot accommodate multiple video streams without impacting other operations. Which solution should a machine learning specialist implement?
A. Install cameras compatible with Amazon Kinesis Video Streams to stream the data to AWS over the restaurant's existing internet connection
B. Deploy AWS DeepLens cameras in the restaurant to capture video
C. Recognized
D. Build a custom model in Amazon SageMaker to recognize the number of people in an image
E. Build a custom model in Amazon SageMaker to recognize the number of people in an image
Correct Answer: A
Question #46
A global bank requires a solution to predict whether customers will leave the bank and choose another bank. The bank is using a dataset to train a model to predict customer loss. The training dataset has 1,000 rows. The training dataset includes 100 instances of customers who left the bank. A machine learning (ML) specialist is using Amazon SageMaker Data Wrangler to train a churn prediction model by using a SageMaker training job. After training, the ML specialist notices that the model returns only false results.
A. Apply anomaly detection to remove outliers from the training dataset before training
B. Apply Synthetic Minority Oversampling Technique (SMOTE) to the training dataset before training
C. Apply normalization to the features of the training dataset before training
D. Apply undersampling to the training dataset before training
Correct Answer: B
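Answer B oversamples the minority (churned) class with SMOTE before training. A minimal sketch using imbalanced-learn on synthetic data shaped like the question's dataset:
```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic stand-in: 1,000 rows with a roughly 10% positive class.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42
)
print("before:", Counter(y))  # ~900 stayed vs. ~100 left

X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X, y)
print("after:", Counter(y_resampled))  # balanced classes for training
```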
Question #47
A Machine Learning Specialist built an image classification deep learning model. However, the Specialist ran into an overfitting problem in which the training and testing accuracies were 99% and 75%, respectively. How should the Specialist address this issue and what is the reason behind it?
A. Implement an AWS Lambda function to log Amazon SageMaker API calls to Amazon S3
B. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3
C. Implement an AWS Lambda function to log Amazon SageMaker API calls to AWS CloudTrail
D. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3
Correct Answer: B
Question #48
A retail company is using Amazon Personalize to provide personalized product recommendations for its customers during a marketing campaign. The company sees a significant increase in sales of recommended items to existing customers immediately after deploying a new solution version, but these sales decrease a short time after deployment. Only historical data from before the marketing campaign is available for training. How should a data scientist adjust the solution?
A. Use the event tracker in Amazon Personalize to include real-time user interactions
B. Add user metadata and use the HRNN-Metadata recipe in Amazon Personalize
C. Implement a new solution using the built-in factorization machines (FM) algorithm in Amazon SageMaker
D. Add event type and event value fields to the interactions dataset in Amazon Personalize
Correct Answer: A
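Answer A streams live interactions through an Amazon Personalize event tracker. A minimal boto3 sketch with hypothetical ARNs and IDs:
```python
from datetime import datetime, timezone

import boto3

personalize = boto3.client("personalize")
personalize_events = boto3.client("personalize-events")

# Create an event tracker for the dataset group (ARN is hypothetical).
tracker = personalize.create_event_tracker(
    name="campaign-interactions-tracker",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/retail",
)

# Stream a live interaction so the solution adapts to campaign behavior.
personalize_events.put_events(
    trackingId=tracker["trackingId"],
    userId="user-123",
    sessionId="session-456",
    eventList=[{
        "eventType": "purchase",
        "itemId": "sku-789",
        "sentAt": datetime.now(timezone.utc),
    }],
)
```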
Question #49
A Machine Learning Specialist is assigned a TensorFlow project using Amazon SageMaker for training, and needs to continue working for an extended period with no Wi-Fi access. Which approach should the Specialist use to continue working?
A. Install Python 3 and boto3 on their laptop and continue the code development using that environment
B. Download the TensorFlow Docker container used in Amazon SageMaker from GitHub to their local environment, and use the Amazon SageMaker Python SDK to test the code
C. Download TensorFlow from tensorflow
D. Download the SageMaker notebook to their local environment, then install Jupyter Notebooks on their laptop and continue the development in a local notebook
Correct Answer: B
Question #50
A Data Science team within a large company uses Amazon SageMaker notebooks to access data stored in Amazon S3 buckets. The IT Security team is concerned that internet-enabled notebook instances create a security vulnerability where malicious code running on the instances could compromise data privacy. The company mandates that all instances stay within a secured VPC with no internet access, and data communication traffic must stay within the AWS network. How should the Data Science team configure the notebook instances to meet these requirements?
A. Associate the Amazon SageMaker notebook with a private subnet in a VPC
B. Associate the Amazon SageMaker notebook with a private subnet in a VPC
C. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has S3 VPC endpoints and Amazon SageMaker VPC endpoints attached to it
D. Associate the Amazon SageMaker notebook with a private subnet in a VPC
Correct Answer: C
Question #51
A city wants to monitor its air quality to address the consequences of air pollution. A Machine Learning Specialist needs to forecast the air quality in parts per million of contaminates for the next 2 days in the city. As this is a prototype, only daily data from the last year is available. Which model is MOST likely to provide the best results in Amazon SageMaker?
A. Use the Amazon SageMaker k-Nearest-Neighbors (kNN) algorithm on the single time series consisting of the full year of data with a predictor_type of regressor
B. Use Amazon SageMaker Random Cut Forest (RCF) on the single time series consisting of the full year of data
C. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of regressor
D. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of classifier
Correct Answer: C
Question #52
An aircraft engine manufacturing company is measuring 200 performance metrics in a time series. Engineers want to detect critical manufacturing defects in near-real time during testing. All of the data needs to be stored for offline analysis. What approach would be the MOST effective to perform near-real-time defect detection?
A. Use AWS IoT Analytics for ingestion, storage, and further analysis
B. Use Amazon S3 for ingestion, storage, and further analysis
C. Use Amazon S3 for ingestion, storage, and further analysis
D. Use Amazon Kinesis Data Firehose for ingestion and Amazon Kinesis Data Analytics Random Cut Forest (RCF) to perform anomaly detection
Correct Answer: D
Question #53
A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions. Here is an example from the dataset: "The quck BROWN FOX jumps over the lazy dog." Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Choose three.)
A. Convert current documents to SSML with pronunciation tags
B. Create an appropriate pronunciation lexicon
C. Output speech marks to guide in pronunciation
D. Use Amazon Lex to preprocess the text files for pronunciation
Correct Answer: BCF
Question #54
A Data Scientist needs to migrate an existing on-premises ETL process to the cloud. The current process runs at regular time intervals and uses PySpark to combine and format multiple large data sources into a single consolidated output for downstream processing. The Data Scientist has been given the following requirements for the cloud solution:
• Combine multiple data sources.
• Reuse existing PySpark logic.
• Run the solution on the existing schedule.
• Minimize the number of servers that will need to be managed.
Which solution meets these requirements?
A. Write the raw data to Amazon S3
B. Write the raw data to Amazon S3
C. Write the raw data to Amazon S3
D. Use Amazon Kinesis Data Analytics to stream the input data and perform real-time SQL queries against the stream to carry out the required transformations within the stream
Correct Answer: D
Question #55
A company's Machine Learning Specialist needs to improve the training speed of a time-series forecasting model using TensorFlow. The training is currently implemented on a single-GPU machine and takes approximately 23 hours to complete. The training needs to be run daily. The model accuracy is acceptable, but the company anticipates a continuous increase in the size of the training data and a need to update the model on an hourly, rather than a daily, basis. The company also wants to minimize coding effort and infrastructure changes.
A. Do not change the TensorFlow code
B. Change the TensorFlow code to implement a Horovod distributed framework supported by Amazon SageMaker
C. Switch to using a built-in AWS SageMaker DeepAR model
D. Move the training to Amazon EMR and distribute the workload to as many machines as needed to achieve the business goals
Correct Answer: B
Question #56
Given the following confusion matrix for a movie classification model, what is the true class frequency for Romance and the predicted class frequency for Adventure?
A. The true class frequency for Romance is 77
B. The true class frequency for Romance is 57
C. The true class frequency for Romance is 0
D. The true class frequency for Romance is 77
Correct Answer: B
Question #57
A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The workflow consists of the following processes:
• Start the workflow as soon as data is uploaded to Amazon S3.
• When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3.
• Store the results of joining datasets in Amazon S3.
• If one of the jobs fails, send a notification to the Administrator.
Which configuration will meet these requirements?
A. Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3
B. Develop the ETL workflow using AWS Lambda to start an Amazon SageMaker notebook instance
C. Develop the ETL workflow using AWS Batch to trigger the start of ETL jobs when data is uploaded to Amazon S3
D. Use AWS Lambda to chain other Lambda functions to read and join the datasets in Amazon S3 as soon as the data is uploaded to Amazon S3
Correct Answer: A
Question #58
A Data Scientist is training a multilayer perceptron (MLP) on a dataset with multiple classes. The target class of interest is unique compared to the other classes within the dataset, but it does not achieve an acceptable recall metric. The Data Scientist has already tried varying the number and size of the MLP's hidden layers, which has not significantly improved the results. A solution to improve recall must be implemented as quickly as possible. Which techniques should be used to meet these requirements?
A. Gather more data using Amazon Mechanical Turk and then retrain
B. Train an anomaly detection model instead of an MLP
C. Train an XGBoost model instead of an MLP
D. Add class weights to the MLP's loss function and then retrain
Correct Answer: D
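Answer D adds class weights to the loss function. A minimal Keras sketch with hypothetical, synthetic data:
```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow import keras

# Synthetic, hypothetical imbalanced data: class 1 is the rare target.
y_train = np.array([0] * 950 + [1] * 50)
X_train = np.random.rand(1000, 20)

weights = compute_class_weight(
    class_weight="balanced", classes=np.unique(y_train), y=y_train
)
class_weight = dict(enumerate(weights))  # rare class gets a larger weight

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# class_weight scales the loss per class, pushing recall up on the rare class.
model.fit(X_train, y_train, epochs=5, class_weight=class_weight)
```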
Question #59
A Machine Learning Specialist is given a structured dataset on the shopping habits of a company's customer base. The dataset contains thousands of columns of data and hundreds of numerical columns for each customer. The Specialist wants to identify whether there are natural groupings for these columns across all customers and visualize the results as quickly as possible. What approach should the Specialist take to accomplish these tasks?
A. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a scatter plot
B. Run k-means using the Euclidean distance measure for different values of k and create an elbow plot
C. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a line graph
D. Run k-means using the Euclidean distance measure for different values of k and create box plots for each numerical column within each cluster
Correct Answer: A
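Answer A embeds the numerical columns with t-SNE and scatter-plots the result. A minimal scikit-learn sketch on synthetic data:
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for the customers-by-features matrix.
X = np.random.rand(1000, 100)

# Embed into 2-D and scatter-plot to eyeball natural groupings.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(embedding[:, 0], embedding[:, 1], s=2)
plt.title("t-SNE embedding of customer shopping features")
plt.show()
```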
Question #60
A large mobile network operating company is building a machine learning model to predict customers who are likely to unsubscribe from the service. The company plans to offer an incentive for these customers as the cost of churn is far greater than the cost of the incentive. The model produces the following confusion matrix after evaluating on a test dataset of 100 customers. Based on the model evaluation results, why is this a viable model for production?
A. The model is 86% accurate and the cost incurred by the company as a result of false negatives is less than the false positives
B. The precision of the model is 86%, which is less than the accuracy of the model
C. The model is 86% accurate and the cost incurred by the company as a result of false positives is less than the false negatives
D. The precision of the model is 86%, which is greater than the accuracy of the model
Correct Answer: A
Question #61
A Machine Learning Specialist works for a credit card processing company and needs to predict which transactions may be fraudulent in near-real time. Specifically, the Specialist must train a model that returns the probability that a given transaction may be fraudulent. How should the Specialist frame this business problem?
A. Streaming classification
B. Binary classification
C. Multi-category classification
D. Regression classification
Correct Answer: B
Question #62
A Machine Learning Specialist is building a model to predict future employment rates based on a wide range of economic factors. While exploring the data, the Specialist notices that the magnitudes of the input features vary greatly. The Specialist does not want variables with a larger magnitude to dominate the model. What should the Specialist do to prepare the data for model training?
A. Apply quantile binning to group the data into categorical bins to keep any relationships in the data by replacing the magnitude with distribution
B. Apply the Cartesian product transformation to create new combinations of fields that are independent of the magnitude
C. Apply normalization to ensure each field will have a mean of 0 and a variance of 1 to remove any significant magnitude
D. Apply the orthogonal sparse bigram (OSB) transformation to apply a fixed-size sliding window to generate new features of a similar magnitude
Correct Answer: C
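Answer C standardizes each feature to mean 0 and variance 1. A minimal scikit-learn sketch on synthetic data:
```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features with very different magnitudes.
X = np.column_stack([
    np.random.uniform(0, 1, 500),        # small-magnitude rate
    np.random.uniform(1e6, 1e9, 500),    # large-magnitude GDP-style figure
])

# Standardize so no single feature dominates the model.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```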
Question #63
An employee found a video clip with audio on a company's social media feed. The language used in the video is Spanish. English is the employee's first language, and they do not understand Spanish. The employee wants to do a sentiment analysis. What combination of services is the MOST efficient to accomplish the task?
A. Amazon Transcribe, Amazon Translate, and Amazon Comprehend
B. Amazon Transcribe, Amazon Comprehend, and Amazon SageMaker seq2seq
C. Amazon Transcribe, Amazon Translate, and Amazon SageMaker Neural Topic Model (NTM)
D. Amazon Transcribe, Amazon Translate, and Amazon SageMaker BlazingText
Correct Answer: A
Question #64
During mini-batch training of a neural network for a classification problem, a Data Scientist notices that training accuracy oscillates. What is the MOST likely cause of this issue?
A. The class distribution in the dataset is imbalanced
B. Dataset shuffling is disabled
C. The batch size is too big
D. The learning rate is very high
Correct Answer: D
Question #65
An agricultural company is interested in using machine learning to detect specific types of weeds in a 100-acre grassland field. Currently, the company uses tractor-mounted cameras to capture multiple images of the field as 10 × 10 grids. The company also has a large training dataset that consists of annotated images of popular weed classes like broadleaf and non-broadleaf docks. The company wants to build a weed detection model that will detect specific types of weeds and the location of each type within the field.
A. Prepare the images in RecordIO format and upload them to Amazon S3
B. Prepare the images in Apache Parquet format and upload them to Amazon S3
C. Prepare the images in RecordIO format and upload them to Amazon S3
D. Prepare the images in Apache Parquet format and upload them to Amazon S3
Correct Answer: C
Question #66
A data scientist has been running an Amazon SageMaker notebook instance for a few weeks. During this time, a new version of Jupyter Notebook was released along with additional software updates. The security team mandates that all running SageMaker notebook instances use the latest security and software updates provided by SageMaker. How can the data scientist meet these requirements?
A. Call the CreateNotebookInstanceLifecycleConfig API operation
B. Create a new SageMaker notebook instance and mount the Amazon Elastic Block Store (Amazon EBS) volume from the original instance
C. Stop and then restart the SageMaker notebook instance
D. Call the UpdateNotebookInstanceLifecycleConfig API operation
Correct Answer: C
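Answer C can be automated with boto3: stop the instance, wait, and start it again to pick up the latest SageMaker-provided updates. A minimal sketch with a hypothetical instance name:
```python
import boto3

sm = boto3.client("sagemaker")
name = "data-science-notebook"  # hypothetical instance name

# Stopping and restarting moves the instance onto the latest
# SageMaker-provided software and security patches.
sm.stop_notebook_instance(NotebookInstanceName=name)
sm.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)
sm.start_notebook_instance(NotebookInstanceName=name)
```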
Question #67
An ecommerce company is automating the categorization of its products based on images. A data scientist has trained a computer vision model using the Amazon SageMaker image classification algorithm. The images for each product are classified according to specific product lines. The accuracy of the model is too low when categorizing new products. All of the product images have the same dimensions and are stored within an Amazon S3 bucket. The company wants to improve the model so it can be used for new products.
A. Classes C and D are too similar
B. The dataset is too small for holdout cross-validation
C. The data distribution is skewed
D. The model is overfitting for classes B and E
Correct Answer: BCE
