Tokenization

Tokenization. Consider the following text version of a post to an online learning forum in a statistics course:

a. Identify 10 non-word tokens in the passage.
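One way to make "non-word token" concrete is to tokenize the text and flag every token that is not made up purely of letters. This is a minimal sketch under stated assumptions: the sample sentence is hypothetical (it is not the forum post from the exercise), and both regular expressions are illustrative choices, not a standard definition of "word". It treats numbers, symbols, emoticons, and URLs as non-word tokens.

```python
import re

# Hypothetical sample text; it is NOT the forum post referred to in the exercise.
SAMPLE = "Hi all :) In Q3 of HW#2, is the p-value < 0.05? See https://example.com/hw2"

# Assumed tokenizer: URLs, word-like runs (allowing internal '-', '.', or "'"),
# or runs of punctuation/symbols.
TOKEN_RE = re.compile(r"https?://\S+|\w+(?:[-.']\w+)*|[^\w\s]+")

# Assumed definition of a "word": letters only, optionally hyphenated or with an apostrophe.
WORD_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def tokenize(text: str) -> list[str]:
    """Split text into tokens using TOKEN_RE."""
    return TOKEN_RE.findall(text)


def non_word_tokens(text: str) -> list[str]:
    """Return the tokens that are not plain alphabetic words."""
    return [t for t in tokenize(text) if not WORD_RE.fullmatch(t)]


print(non_word_tokens(SAMPLE))
```

With these rules, tokens such as `Q3`, `0.05`, `:)`, `#`, and the URL are flagged, while a hyphenated term like `p-value` counts as a word; a different tokenizer could reasonably draw these lines elsewhere.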

b. Suppose this passage constitutes a document to be classified, but you are not certain of the business goal of the classification task. Identify material (at least 20% of the terms) that, in your judgment, could be discarded fairly safely without knowing that goal.
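A common task-independent heuristic is to drop stop words (very frequent function words such as "the" or "is") and tokens that contain no letters or digits. The sketch below illustrates that heuristic and reports what fraction of tokens it removes, which can be checked against the 20% target. The stop-word list and the example sentence are hypothetical; in practice a curated list, such as the English stop words shipped with NLTK or scikit-learn, would normally be used.

```python
import re

# Illustrative stop-word list (an assumption, deliberately small).
STOP_WORDS = {"a", "am", "an", "and", "are", "i", "in", "is", "it", "of",
              "on", "or", "the", "this", "that", "to"}


def prune(tokens: list[str]) -> tuple[list[str], float]:
    """Drop stop words and tokens with no letters or digits.

    Returns the kept tokens and the fraction of tokens discarded.
    """
    kept = [t for t in tokens
            if t.lower() not in STOP_WORDS and re.search(r"\w", t)]
    return kept, 1 - len(kept) / len(tokens)


# Hypothetical, already-tokenized example sentence.
tokens = "I am confused : is the answer to part b in the notes ?".split()
kept, dropped = prune(tokens)
print(kept, f"{dropped:.0%} discarded")
```

Which tokens are "safe" to drop still depends on the list chosen; for example, a word like "not" is often on stock stop-word lists even though removing it can reverse meaning.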

c. Suppose the classification task is to predict whether this post requires the attention of the instructor, or whether a teaching assistant might suffice. Identify the 20% of the terms that you think might be most helpful in that task.

d. What aspect of the passage is most problematic from the standpoint of simply using a bag-of-words approach, as opposed to an approach in which meaning is extracted?
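As a general illustration of the limitation part d points at: a bag-of-words representation keeps only token counts and discards word order, so sentences that say different things can map to the same representation. The two sentences below are hypothetical and are not taken from the forum post.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens (word order is discarded)."""
    return Counter(text.lower().split())


a = "the TA answered my question but the instructor did not"
b = "the instructor answered my question but the TA did not"

# Opposite meanings, identical bags: who answered is lost once order is dropped.
print(bag_of_words(a) == bag_of_words(b))
```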