Question #1

A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours. With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyp

A. A histogram showing whether the most important input feature is Gaussian

B. A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension

C. A scatter plot showing the performance of the objective metric over each training iteration

D. A scatter plot showing the correlation between maximum tree depth and the objective metric

Correct answer: D

Question #2

A real-estate company is launching a new product that predicts the prices of new houses. The historical data for the properties and prices is stored in .csv format in an Amazon S3 bucket. The data has a header, some categorical fields, and some missing values. The company's data scientists have used Python with a common open-source library to fill the missing values with zeros. The data scientists have dropped all of the categorical fields and have trained a model by using the open-source linear regression algo

A. Create a service-linked role for Amazon Elastic Container Service (Amazon ECS) with access to the S3 bucket

B. Create an Amazon SageMaker notebook with a new IAM role that is associated with the notebook

C. Create an IAM role with access to Amazon S3, Amazon SageMaker, and AWS Lambda

D. Create an IAM role for Amazon SageMaker with access to the S3 bucket

Correct answer: D

Question #3

A data scientist is training a text classification model by using the Amazon SageMaker built-in BlazingText algorithm. There are 5 classes in the dataset, with 300 samples for category A, 292 samples for category B, 240 samples for category C, 258 samples for category D, and 310 samples for category E. The data scientist shuffles the data and splits off 10% for testing. After training the model, the data scientist generates confusion matrices for the training and test sets. What could the data scientist conclude

A. Classes C and D are too similar

B. The dataset is too small for holdout cross-validation

C. The data distribution is skewed

D. The model is overfitting for classes B and E

Correct answer: B
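
A quick count shows how thin a 10% holdout is for this dataset. The sketch below uses only the class counts from the question and assumes a uniformly random split; the helper name is hypothetical.

```python
# Class counts taken from the question.
class_counts = {"A": 300, "B": 292, "C": 240, "D": 258, "E": 310}
TEST_FRACTION = 0.10


def expected_test_counts(counts, fraction):
    """Expected held-out samples per class for a uniformly random split."""
    return {label: round(n * fraction) for label, n in counts.items()}


total = sum(class_counts.values())
test_counts = expected_test_counts(class_counts, TEST_FRACTION)

print(total)        # 1400
print(test_counts)  # {'A': 30, 'B': 29, 'C': 24, 'D': 26, 'E': 31}
```

With roughly 24 to 31 test samples per class, a single misclassification moves a class's test recall by about 3 to 4 percentage points, so per-class differences between the training and test confusion matrices are noisy.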

Question #4

A Machine Learning Specialist discovers the following statistics while experimenting on a model. What can the Specialist conclude from the experiments?

A. The model in Experiment 1 had a high variance error that was reduced in Experiment 3 by regularization. Experiment 2 shows that there is minimal bias error in Experiment 1

B. The model in Experiment 1 had a high bias error that was reduced in Experiment 3 by regularization. Experiment 2 shows that there is minimal variance error in Experiment 1

C. The model in Experiment 1 had a high bias error and a high variance error that were reduced in Experiment 3 by regularization. Experiment 2 shows that high bias cannot be reduced by increasing layers and neurons in the model

D. The model in Experiment 1 had a high random noise error that was reduced in Experiment 3 by regularization. Experiment 2 shows that random noise cannot be reduced by increasing layers and neurons in the model

Correct answer: C

Question #5

A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company's dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices. Which step should a machine learning specialist take to remove features that are irreleva

A. Plot a histogram of the features and compute their standard deviation

B. Plot a histogram of the features and compute their standard deviation

C. Build a heatmap showing the correlation of the dataset against itself

D. Run a correlation check of all features against the target variable

Correct answer: D
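
A correlation check against the target can be sketched in a few lines of standard-library Python. The feature names, toy values, and the 0.3 cutoff below are hypothetical; note that Pearson correlation only captures linear relationships, which matches the linear regression setting in the question.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def correlated_features(features, target, threshold=0.3):
    """Keep features whose absolute correlation with the target meets the threshold."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Hypothetical toy data: sale price (in thousands) and two candidate features.
sale_price = [200, 250, 300, 350, 400]
features = {
    "living_area": [100, 125, 150, 175, 200],  # moves with price
    "random_id": [2, 5, 1, 4, 3],              # unrelated to price
}

print(correlated_features(features, sale_price))
```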

Question #6

A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided. Based on this information, which model would have the HIGHEST recall with respect to the fraudulent class?

A. Decision tree

B. Linear support vector machine (SVM)

C. Naive Bayesian classifier

D. Single Perceptron with sigmoidal activation function

Correct answer: C
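
Recall with respect to the fraudulent class is TP / (TP + FN): the share of truly fraudulent cases the model catches. A minimal sketch with hypothetical labels:

```python
def recall(y_true, y_pred, positive="fraud"):
    """Recall for the positive class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical labels: three fraudulent cases, one of which is missed.
y_true = ["fraud", "fraud", "normal", "fraud", "normal"]
y_pred = ["fraud", "normal", "normal", "fraud", "fraud"]
print(recall(y_true, y_pred))  # 0.6666666666666666
```

The false positive in the last position does not affect recall; it only lowers precision.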

Question #7

A manufacturing company uses machine learning (ML) models to detect quality issues. The models use images that are taken of the company's product at the end of each production step. The company has thousands of machines at the production site that generate one image per second on average. The company ran a successful pilot with a single manufacturing machine. For the pilot, ML specialists used an industrial PC that ran AWS IoT Greengrass with a long-running AWS Lambda function that uploaded the images to Amazon

A. Set up a 10 Gbps AWS Direct Connect connection between the production site and the nearest AWS Region

B. Extend the long-running Lambda function that runs on AWS IoT Greengrass to compress the images and upload the compressed files to Amazon S3

C. Use auto scaling for SageMaker

D. Deploy the Lambda function and the ML models onto the AWS IoT Greengrass core that is running on the industrial PCs that are installed on each machine

Correct answer: D

Question #8

A Machine Learning Specialist built an image classification deep learning model. However, the Specialist ran into an overfitting problem in which the training and testing accuracies were 99% and 75%, respectively. How should the Specialist address this issue and what is the reason behind it?

A. The learning rate should be increased because the optimization process was trapped at a local minimum

B. The dropout rate at the flatten layer should be increased because the model is not generalized enough

C. The dimensionality of dense layer next to the flatten layer should be increased because the model is not complex enough

D. The epoch number should be increased because the optimization process was terminated before it reached the global minimum

Correct answer: B
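
Dropout randomly zeroes activations during training so the network cannot rely on any single unit, which is why raising the dropout rate is a standard response to overfitting. The sketch below implements inverted dropout on a plain list of activations; it is framework-independent and the function name is hypothetical (deep learning frameworks ship their own dropout layers).

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout: zero each activation with probability `rate` during
    training and scale survivors by 1 / (1 - rate) so the expected value is
    unchanged. At inference time the input passes through untouched."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


out = dropout([1.0] * 1000, rate=0.5, rng=random.Random(0))
print(sum(out) / len(out))
```

Each surviving value becomes 2.0 at a rate of 0.5, so the printed mean stays near the original 1.0 while individual units are switched off at random.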

Question #9

A retail company is selling products through a global online marketplace. The company wants to use machine learning (ML) to analyze customer feedback and identify specific areas for improvement. A developer has built a tool that collects customer reviews from the online marketplace and stores them in an Amazon S3 bucket. This process yields a dataset of 40 reviews. A data scientist building the ML models must identify additional sources of data to increase the size of the dataset. Which data sources should the

A. Emails exchanged by customers and the company’s customer service agents

B. Social media posts containing the name of the company or its products

C. A publicly available collection of news articles

D. A publicly available collection of customer reviews

E. Product sales revenue figures for the company

F. Instruction manuals for the company’s products

Correct answer: BDF

Question #10

A Data Scientist needs to create a serverless ingestion and analytics solution for high-velocity, real-time streaming data. The ingestion process must buffer and convert incoming records from JSON to a query-optimized, columnar format without data loss. The output datastore must be highly available, and Analysts must be able to run SQL queries against the data and connect to existing business intelligence dashboards. Which solution should the Data Scientist build to satisfy the requirements?

A. Create a schema in the AWS Glue Data Catalog of the incoming data format

B. Use an Amazon Kinesis Data Firehose delivery stream to stream the data and transform the data to Apache Parquet or ORC format using the AWS Glue Data Catalog before delivering to Amazon S3

C. Write each JSON record to a staging location in Amazon S3

D. Write each JSON record to a staging location in Amazon S3

E. Have the Analysts query and run dashboards from the RDS database

F. Use Amazon Kinesis Data Analytics to ingest the streaming data and perform real-time SQL queries to convert the records to Apache Parquet before delivering to Amazon S3

Correct answer: C

Question #11

A Data Scientist is developing a binary classifier to predict whether a patient has a particular disease based on a series of test results. The Data Scientist has data on 400 patients randomly selected from the population. The disease is seen in 3% of the population. Which cross-validation strategy should the Data Scientist adopt?

A. A k-fold cross-validation strategy with k=5

B. A stratified k-fold cross-validation strategy with k=5

C. A k-fold cross-validation strategy with k=5 and 3 repeats

D. An 80/20 stratified split between training and validation

Correct answer: B
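
With 400 patients and 3% prevalence, only about 12 positive cases exist, so an unstratified 5-fold split can leave a fold with very few or no positives. The hypothetical helper below implements stratified k-fold by shuffling each class separately and dealing its indices round-robin across folds; libraries such as scikit-learn provide `StratifiedKFold` for the same purpose.

```python
import random


def stratified_k_fold(labels, k, seed=0):
    """Return k lists of indices; each class is spread round-robin across folds."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds


labels = [1] * 12 + [0] * 388  # 12 positives = 3% of 400 patients
folds = stratified_k_fold(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)  # [3, 3, 2, 2, 2]
```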

Question #12

A Machine Learning Specialist at a security-sensitive company is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII). The dataset must be accessible from a VPC only and must not traverse the public internet. How can these requirements be satisfied?

A. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC

B. Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance

C. Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance

D. Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance

Correct answer: A
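
A bucket policy that denies any request not arriving through a specific VPC endpoint is a common way to enforce this kind of restriction. The sketch below builds such a policy as a Python dict using the `aws:SourceVpce` condition key; the bucket name and endpoint ID are placeholders. A blanket `Deny` like this also blocks console and administrative access from outside the endpoint, so it is worth testing on a non-production bucket first.

```python
import json


def vpce_only_bucket_policy(bucket, vpce_id):
    """Deny all S3 actions on the bucket unless the request comes through vpce_id."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAccessOutsideVpce",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [arn, f"{arn}/*"],  # the bucket itself and its objects
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }


# Placeholder names for illustration only.
policy = vpce_only_bucket_policy("example-pii-bucket", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```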

Question #13

During mini-batch training of a neural network for a classification problem, a Data Scientist notices that training accuracy oscillates. What is the MOST likely cause of this issue?

A. The class distribution in the dataset is imbalanced

B. Dataset shuffling is disabled

C. The batch size is too big

D. The learning rate is very high

Correct answer: D
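
A toy one-dimensional example shows how an oversized step produces oscillation: gradient descent on f(w) = w² overshoots the minimum at w = 0 and flips sign on every step when the learning rate is large. This illustrates the mechanism only, not a neural network; in mini-batch training the same overshooting shows up as loss and accuracy bouncing between batches.

```python
def gradient_descent(w0, lr, steps):
    """Minimize f(w) = w**2 (gradient 2 * w); return w after every step."""
    path = [w0]
    w = w0
    for _ in range(steps):
        w = w - lr * 2.0 * w
        path.append(w)
    return path


small = gradient_descent(1.0, lr=0.1, steps=5)   # each step multiplies w by 0.8
large = gradient_descent(1.0, lr=0.95, steps=5)  # each step multiplies w by -0.9
print([round(w, 3) for w in small])  # [1.0, 0.8, 0.64, 0.512, 0.41, 0.328]
print([round(w, 3) for w in large])  # [1.0, -0.9, 0.81, -0.729, 0.656, -0.59]
```

Any learning rate above 1.0 in this toy problem makes the factor exceed 1 in magnitude, so the oscillation grows instead of decaying.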

Question #14

A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a machine learning specialist will build a binary classifier based on two features: age of account, denoted by x, and transaction month, denoted by y. The class distributions are illustrated in the provided figure. The positive class is portrayed in red, while the negative class is portrayed in black. Which model would have the HIGHEST accuracy?

A. Linear support vector machine (SVM)

B. Decision tree

C. Support vector machine (SVM) with a radial basis function kernel

D. Single perceptron with a Tanh activation function

Correct answer: C
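
An RBF kernel scores similarity as a value that decays with squared distance, which lets an SVM draw curved, closed decision boundaries around clusters in the (x, y) plane rather than a single straight line. A minimal sketch of the kernel function itself; `gamma` is a free parameter and the point values are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Points are (account_age, transaction_month) pairs; values are hypothetical.
print(rbf_kernel((3.0, 6.0), (3.0, 6.0)))             # 1.0
print(rbf_kernel((0.0, 0.0), (1.0, 1.0), gamma=0.5))  # exp(-1), about 0.3679
```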

Question #15

A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only. How should the Machine Learning Specialist transform the dataset to minimize query runtime?

A. Convert the records to Apache Parquet format

B. Convert the records to JSON format

C. Convert the records to GZIP CSV format

D. Convert the records to XML format

Correct answer: A
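
A back-of-the-envelope estimate of data read per query, using the sizes in the question (about 1.2 TB in total). It assumes all 200 columns are the same width and ignores compression, so the numbers are illustrative only.

```python
RECORDS = 800_000
RECORD_MB = 1.5
TOTAL_COLUMNS = 200


def estimated_scan_mb(columns_queried, columnar):
    """Rough MB read per query, assuming equal-sized columns and no compression."""
    total_mb = RECORDS * RECORD_MB
    if not columnar:
        return total_mb  # row-oriented CSV: every column of every row is read
    return total_mb * columns_queried / TOTAL_COLUMNS


for cols in (5, 10):
    print(cols, estimated_scan_mb(cols, columnar=False), estimated_scan_mb(cols, columnar=True))
```

Under these assumptions a CSV query reads 1,200,000 MB regardless of the columns selected, while a columnar layout reads about 30,000 MB for 5 columns and 60,000 MB for 10.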

Question #16

A city wants to monitor its air quality to address the consequences of air pollution. A Machine Learning Specialist needs to forecast the air quality in parts per million of contaminants for the next 2 days in the city. As this is a prototype, only daily data from the last year is available. Which model is MOST likely to provide the best results in Amazon SageMaker?

A. Use the Amazon SageMaker k-Nearest-Neighbors (kNN) algorithm on the single time series consisting of the full year of data with a predictor_type of regressor

B. Use Amazon SageMaker Random Cut Forest (RCF) on the single time series consisting of the full year of data

C. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of regressor

D. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of classifier

Correct answer: A
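
Feeding a single time series to a regression algorithm (one trained with `predictor_type` set to `regressor`) typically means building lagged-window features first. The hypothetical helper below turns daily readings into rows of the form (last N days, value k days ahead); the window length of 3 and the toy data are assumptions for illustration.

```python
def make_supervised(series, n_lags, horizon):
    """Frame a single daily series as (features, target) rows for regression.

    Each row's features are `n_lags` consecutive days; the target is the value
    `horizon` days after the last of those days.
    """
    rows = []
    for end in range(n_lags, len(series) - horizon + 1):
        features = series[end - n_lags:end]
        target = series[end + horizon - 1]
        rows.append((features, target))
    return rows


# Hypothetical daily readings; the real input would be a year of values.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rows = make_supervised(readings, n_lags=3, horizon=2)
print(rows[0])    # ([1, 2, 3], 5)
print(len(rows))  # 6
```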

Question #17

A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users. The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features). E

A. Add more deep trees to the random forest to enable the model to learn more features

B. Include a copy of the samples in the test dataset in the training dataset

C. Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data

D. Change the cost function so that false negatives have a higher impact on the cost value than false positives

E. Change the cost function so that false positives have a higher impact on the cost value than false negatives

Correct answer: CD
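
Both remedies in options C and D can be sketched in plain Python: oversampling positives with small Gaussian jitter, and a log loss that multiplies the penalty on positive examples (the ones that become false negatives when missed). The copy count, noise level, weight, and feature rows are hypothetical and would need tuning for a 1,000 to 999,000 class ratio.

```python
import math
import random


def oversample_with_noise(positives, copies, noise_std, seed=0):
    """Return the original positive rows plus `copies` jittered duplicates of each."""
    rng = random.Random(seed)
    augmented = [list(row) for row in positives]
    for _ in range(copies):
        for row in positives:
            augmented.append([x + rng.gauss(0.0, noise_std) for x in row])
    return augmented


def weighted_log_loss(y_true, p_pred, fn_weight):
    """Binary log loss where errors on positive examples (potential false
    negatives) are multiplied by `fn_weight`."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if y == 1:
            total += -fn_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical numeric feature rows for two paying users.
positives = [[1.0, 20.0], [0.5, 35.0]]
augmented = oversample_with_noise(positives, copies=3, noise_std=0.01)
print(len(augmented))  # 8

# A confidently missed positive (p = 0.1) now costs about 10x the unweighted loss.
print(weighted_log_loss([1], [0.1], fn_weight=10.0))
```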

Question #18

A Machine Learning Specialist is assigned a TensorFlow project using Amazon SageMaker for training, and needs to continue working for an extended period with no Wi-Fi access. Which approach should the Specialist use to continue working?

A. Install Python 3 and boto3 on their laptop and continue the code development using that environment

B. Download the TensorFlow Docker container used in Amazon SageMaker from GitHub to their local environment, and use the Amazon SageMaker Python SDK to test the code

C. Download TensorFlow from tensorflow

D. Download the SageMaker notebook to their local environment then install Jupyter Notebooks on their laptop and continue the development in a local notebook

Correct answer: B

Question #19

A Machine Learning Specialist is creating a new natural language processing application that processes a dataset consisting of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions. Here is an example from the dataset: "The quck BROWN FOX jumps over the lazy dog." Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Select THREE)

A. Perform part-of-speech tagging and keep the action verb and the nouns only

B. Normalize all words by making the sentence lowercase

C. Remove stop words using an English stopword dictionary

D. Correct the typography on "quck" to "quick"

E. One-hot encode all words in the sentence

F. Tokenize the sentence into words

Correct answer: BCF
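
Lowercasing, tokenizing, and stop-word removal are deterministic steps that can be applied identically to every sentence. A minimal standard-library sketch follows; the regex tokenizer and the short stop-word set are stand-ins (assumptions) for a proper tokenizer and a full English stop-word dictionary.

```python
import re

# Illustrative subset only; a real pipeline would load a full English stop-word list.
STOP_WORDS = {"the", "over", "a", "an", "and", "of", "to", "in"}


def preprocess(sentence):
    """Lowercase, tokenize into alphabetic words, and drop stop words."""
    lowered = sentence.lower()
    tokens = re.findall(r"[a-z]+", lowered)
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("The quck BROWN FOX jumps over the lazy dog "))
# ['quck', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

"quck" passes through unchanged: fixing an individual typo by hand is not a repeatable pipeline step.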

Question #20

A Machine Learning Specialist is deciding between building a naive Bayesian model or a full Bayesian network for a classification problem. The Specialist computes the Pearson correlation coefficients between each feature and finds that their absolute values range from 0.1 to 0.95. Which model describes the underlying data in this situation?

A. A naive Bayesian model, since the features are all conditionally independent

B. A full Bayesian network, since the features are all conditionally independent

C. A naive Bayesian model, since some of the features are statistically dependent

D. A full Bayesian network, since some of the features are statistically dependent

Correct answer: D

Question #21

A company wants to enhance audits for its machine learning (ML) systems. The auditing system must be able to perform metadata analysis on the features that the ML models use. The audit solution must generate a report that analyzes the metadata. The solution also must be able to set the data sensitivity and authorship of features. Which solution will meet these requirements with the LEAST development effort? The solution that will meet the requirements with the least development effort is to use Amazon SageM

A. Use Amazon SageMaker Feature Store to select the features

B. Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use

C. Use Amazon SageMaker Feature Store to apply custom algorithms to analyze the feature-level metadata that the company requires

D. Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use

Correct answer: D

Question #22

When submitting Amazon SageMaker training jobs using one of the built-in algorithms, which common parameters MUST be specified? (Choose three.)

A. The training channel identifying the location of training data on an Amazon S3 bucket

B. The validation channel identifying the location of validation data on an Amazon S3 bucket

C. The IAM role that Amazon SageMaker can assume to perform tasks on behalf of the users

D. Hyperparameters in a JSON array as documented for the algorithm used

E. The Amazon EC2 instance class specifying whether training will be run using CPU or GPU

F. The output path specifying where on an Amazon S3 bucket the trained model will persist

Correct answer: AEF
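
For reference, the low-level `CreateTrainingJob` request carries these settings in named fields. The sketch below only builds the request as a dict, so no AWS call is made; the job name, image URI, role ARN, bucket paths, and instance type are placeholders. With boto3 it could be submitted as `boto3.client("sagemaker").create_training_job(**request)`.

```python
def training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                         output_s3_uri, instance_type):
    """Build a CreateTrainingJob request body for a built-in algorithm."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,  # built-in algorithm container
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,  # IAM role SageMaker assumes on the user's behalf
        "InputDataConfig": [{
            "ChannelName": "train",  # training channel
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},  # model artifacts
        "ResourceConfig": {  # CPU vs GPU is chosen through the instance type
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Placeholder values for illustration only.
request = training_job_request(
    job_name="example-job",
    image_uri="<built-in-algorithm-image-uri>",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    instance_type="ml.m5.xlarge",
)
```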

Question #23

A Data Scientist needs to create a serverless ingestion and analytics solution for high-velocity, real-time streaming data. The ingestion process must buffer and convert incoming records from JSON to a query-optimized, columnar format without data loss. The output datastore must be highly available, and Analysts must be able to run SQL queries against the data and connect to existing business intelligence dashboards. Which solution should the Data Scientist build to satisfy the requirements?

A. Create a schema in the AWS Glue Data Catalog of the incoming data format

B. Write each JSON record to a staging location in Amazon S3

C. Write each JSON record to a staging location in Amazon S3

D. Use Amazon Kinesis Data Analytics to ingest the streaming data and perform real-time SQL queries to convert the records to Apache Parquet before delivering to Amazon S3

Correct answer: A
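
One way to wire a Glue Data Catalog schema into Kinesis Data Firehose record format conversion is the `DataFormatConversionConfiguration` block of an extended S3 destination. The sketch below builds that configuration as a dict without calling AWS; the ARNs, database, and table names are placeholders. It could be passed to boto3's Firehose `create_delivery_stream` as `ExtendedS3DestinationConfiguration`. Per the Firehose documentation, format conversion requires a buffer size of at least 64 MB, hence the 128 MB value.

```python
def firehose_parquet_destination(bucket_arn, role_arn, glue_database, glue_table):
    """ExtendedS3DestinationConfiguration that converts JSON records to Parquet
    using a table schema stored in the AWS Glue Data Catalog."""
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        # Record format conversion requires a buffer of at least 64 MB.
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "SchemaConfiguration": {
                "RoleARN": role_arn,
                "DatabaseName": glue_database,
                "TableName": glue_table,
            },
            # Parse incoming JSON records...
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            # ...and write them out as columnar Parquet.
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
        },
    }


# Placeholder values for illustration only.
destination = firehose_parquet_destination(
    bucket_arn="arn:aws:s3:::example-analytics-bucket",
    role_arn="arn:aws:iam::123456789012:role/ExampleFirehoseRole",
    glue_database="example_db",
    glue_table="clickstream",
)
```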

Question #24

A machine learning specialist stores IoT soil sensor data in an Amazon DynamoDB table and stores weather event data as JSON files in Amazon S3. The dataset in DynamoDB is 10 GB in size and the dataset in Amazon S3 is 5 GB in size. The specialist wants to train a model on this data to help predict soil moisture levels as a function of weather events using Amazon SageMaker. Which solution will accomplish the necessary transformation to train the Amazon SageMaker model with the LEAST amount of administrative overhead?

A. Launch an Amazon EMR cluster

B. Crawl the data using AWS Glue crawlers

C. Enable Amazon DynamoDB Streams on the sensor table

D. Crawl the data using AWS Glue crawlers

Correct answer: C

View The Updated MLS-C01 Exam Questions

View The Updated AWS Exam Questions