Artificial Intelligence Question 1:

Consider a Naive Bayes classifier for spam filtering. We are given a training set of 500 randomly chosen emails. We examine them and label 200 of them as spam and 300 as non-spam. The 200 spam emails contain 2000 words in total. Of the spam emails, 200 contain the word "a", 60 contain the word "good", and 50 contain the word "job". The 300 non-spam emails contain 1000 words in total. Of the non-spam emails, 150 contain the word "a", 30 contain the word "good", and 10 contain the word "job". We use S to denote the random event that an email is spam, and NS to denote that it is non-spam. We use P(word|S) to denote the conditional probability that a given word appears in a spam email (P(word|NS) is defined similarly). P(word1, word2|S) is the probability that both word1 and word2 appear in a spam email.

i. What is the best approximation to P("a"|S), P("good"|S), and P("job"|S) given the training set?

ii. What is the best approximation to P("a", "good", "job"|S) and P("a", "good", "job"|NS) given the training set? (Hint: use the structure of a Naive Bayes network to answer this question.)

iii. Consider the test email "Well done! You did a good job in CS471!". Will a Naive Bayes classifier trained on the training set above classify it as spam? Why or why not? (Hint: base the decision on P(S|"a", "good", "job") and P(NS|"a", "good", "job").)

Answer Preview

i. P(a|S) = 200/200 = 1,

P(good|S) = 60/200 = 0.3,

P(job|S) = 50/200 = 0.25.

ii.

P(a,good,job|S) = P(a|S)P(good|S)P(job|S) = (200/200)*(60/200)*(50/200) = 1*0.3*0.25 = 0.075

P(a,good,job|NS)=P(a|NS)P(good|NS)P(job|NS)=(150/300)*(30/300)*(10/300) = 0.001667
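The estimates in parts i and ii can be checked with a short script. This is only a sketch: it assumes P(word|class) is estimated as the fraction of emails in that class containing the word, and the names `word_likelihoods` and `joint_likelihood` are illustrative helpers, not part of the original problem.

```python
from fractions import Fraction

# Email-level counts taken from the problem statement.
N_SPAM, N_NONSPAM = 200, 300
spam_counts = {"a": 200, "good": 60, "job": 50}
nonspam_counts = {"a": 150, "good": 30, "job": 10}


def word_likelihoods(counts, n_emails):
    """Estimate P(word | class) as the fraction of class emails containing the word."""
    return {word: Fraction(count, n_emails) for word, count in counts.items()}


def joint_likelihood(likelihoods, words):
    """Multiply per-word likelihoods (the Naive Bayes conditional-independence assumption)."""
    result = Fraction(1)
    for word in words:
        result *= likelihoods[word]
    return result


words = ["a", "good", "job"]
p_word_s = word_likelihoods(spam_counts, N_SPAM)
p_word_ns = word_likelihoods(nonspam_counts, N_NONSPAM)
p_joint_s = joint_likelihood(p_word_s, words)
p_joint_ns = joint_likelihood(p_word_ns, words)

print(float(p_joint_s))   # 0.075
print(float(p_joint_ns))  # approximately 0.001667
```

Exact fractions are used here so that the products 3/40 and 1/600 are not blurred by floating-point rounding.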

iii.

P(S|a,good,job) = (P(a,good,job|S)P(S))/P(a,good,job) = (0.075*(200/500))/P(a,good,job) = 0.03/P(a,good,job)

P(NS|a,good,job) = (P(a,good,job|NS)P(NS))/P(a,good,job) = (0.001667*(300/500))/P(a,good,job) = 0.001/P(a,good,job)

P(a,good,job) is the same denominator in both expressions, and 0.03 > 0.001, so P(S|a,good,job) > P(NS|a,good,job).

The classifier therefore labels the email as spam.
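For part iii, the two posteriors can also be normalized explicitly instead of only compared by numerator. The sketch below assumes the evidence P(a,good,job) is obtained from the law of total probability over S and NS; `posterior_spam` is a hypothetical helper name, and the likelihoods are recomputed from the counts in the problem statement.

```python
def posterior_spam(lik_s, lik_ns, prior_s):
    """Return P(S | words) by Bayes' rule, normalizing over the two classes."""
    joint_s = lik_s * prior_s            # P(words | S) * P(S)
    joint_ns = lik_ns * (1.0 - prior_s)  # P(words | NS) * P(NS)
    return joint_s / (joint_s + joint_ns)


# Class-conditional likelihoods under the Naive Bayes independence assumption.
lik_s = (200 / 200) * (60 / 200) * (50 / 200)
lik_ns = (150 / 300) * (30 / 300) * (10 / 300)
prior_s = 200 / 500  # fraction of training emails labeled spam

p_spam = posterior_spam(lik_s, lik_ns, prior_s)
label = "spam" if p_spam > 0.5 else "non-spam"
print(f"P(S | a, good, job) = {p_spam:.4f} -> {label}")
# prints: P(S | a, good, job) = 0.9677 -> spam
```

With two classes, comparing the normalized posterior against 0.5 gives the same decision as comparing the unnormalized numerators 0.03 and 0.001.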
