A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions.
Here is an example from the dataset:
"The quck BROWN FOX jumps over the lazy dog."
Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Choose three.)
A. erform part-of-speech tagging and keep the action verb and the nouns only
B. ormalize all words by making the sentence lowercase
C. emove stop words using an English stopword dictionary
D. orrect the typography on "quck" to "quick
E. ne-hot encode all words in the sentence
F. okenize the sentence into words