Fix typos #203

Open · wants to merge 1 commit into base: main
18 changes: 8 additions & 10 deletions 04_naive_bayes.Rmd
@@ -157,12 +157,11 @@ the spam filter.
 As we mentioned, what we are facing here is a *classification* problem,
 and we will code from scratch and use a *supervised learning* algorithm
 to find a solution with the help of Bayes' theorem. We're going to use a
-*naive Bayes* classifier to create our spam filter. We're going to use a
-\emph{naive Bayes} classifier to create our spam filter. This method is
-going to treat each email just as a collection of words, with no regard
-for the order in which they appear. This means we won't take into
-account semantic considerations like the particular relationship between
-words and their context.
+*naive Bayes* classifier to create our spam filter. This method is going
+to treat each email just as a collection of words, with no regard for
+the order in which they appear. This means we won't take into account
+semantic considerations like the particular relationship between words
+and their context.
 
 Our strategy will be to estimate a probability of an incoming email
 being ham or spam and make a decision based on that. Our general
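As an aside on the bag-of-words treatment described in the hunk above: the whole idea fits in a few lines. A minimal Python sketch, with invented example strings:

```python
from collections import Counter

def bag_of_words(email_text):
    """Reduce an email to word counts, discarding word order entirely."""
    return Counter(email_text.lower().split())

# Word order is ignored: both emails map to the same representation.
print(bag_of_words("you won a big prize"))
print(bag_of_words("a big prize you won"))
```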
@@ -200,10 +199,9 @@ common in our example's training data. We would therefore expect
 $P(email|spam)$, the probability of the new email being generated by the
 words encountered in the training spam email set, to be relatively high.
 
-(The word \\emph{win} appears in the form \\emph{won} in the training
-set, but that's OK. The standard linguistic technique of
-\\emph{lemmatization} groups together any related forms of a word and
-treats them as the same word.)
+(The word *win* appears in the form *won* in the training set, but
+that's OK. The standard linguistic technique of *lemmatization* groups
+together any related forms of a word and treats them as the same word.)
 
 Mathematically, the way to calculate $P(email|spam)$ is to take each
 word in our target email, calculate the probability of it appearing in
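A toy sketch of the lemmatization idea from the parenthetical in this hunk; the lemma table below is hypothetical, and a real filter would rely on an NLP library's lemmatizer rather than a hand-written dictionary:

```python
# Hypothetical lemma table, for illustration only.
LEMMAS = {"won": "win", "wins": "win", "winning": "win"}

def lemmatize(word):
    """Collapse related forms of a word into one canonical form."""
    return LEMMAS.get(word, word)

print(lemmatize("won"))  # -> "win": "won" and "win" count as the same word
```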
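And a minimal sketch of the per-word calculation the closing paragraph begins to describe: score each word of the target email against the spam training counts and combine. Working in log space and the Laplace (add-one) smoothing are assumptions for illustration, as are all the counts; none of this is taken from the chapter:

```python
import math

def log_p_email_given_spam(email_words, spam_counts, total_spam_words, vocab_size):
    """Log of P(email|spam): sum of log P(word|spam) over the email's words,
    with add-one smoothing so an unseen word doesn't zero out the product."""
    return sum(
        math.log((spam_counts.get(word, 0) + 1) / (total_spam_words + vocab_size))
        for word in email_words
    )

# Invented counts, for illustration only.
spam_counts = {"win": 40, "lottery": 25, "prize": 30}
print(log_p_email_given_spam(["win", "lottery", "now"], spam_counts, 500, 1000))
```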