Fix typos #203

Open · wants to merge 1 commit into base: main
18 changes: 8 additions & 10 deletions 04_naive_bayes.Rmd
@@ -157,12 +157,11 @@ the spam filter.
 As we mentioned, what we are facing here is a *classification* problem,
 and we will code from scratch and use a *supervised learning* algorithm
 to find a solution with the help of Bayes' theorem. We're going to use a
-*naive Bayes* classifier to create our spam filter. We're going to use a
-\emph{naive Bayes} classifier to create our spam filter. This method is
-going to treat each email just as a collection of words, with no regard
-for the order in which they appear. This means we won't take into
-account semantic considerations like the particular relationship between
-words and their context.
+*naive Bayes* classifier to create our spam filter. This method is going
+to treat each email just as a collection of words, with no regard for
+the order in which they appear. This means we won't take into account
+semantic considerations like the particular relationship between words
+and their context.
 
 Our strategy will be to estimate a probability of an incoming email
 being ham or spam and make a decision based on that. Our general
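As an aside on the bag-of-words treatment described in the hunk above: the whole idea fits in a few lines. A minimal Python sketch, with invented example strings:

```python
from collections import Counter

def bag_of_words(email_text):
    """Reduce an email to word counts, discarding word order entirely."""
    return Counter(email_text.lower().split())

# Word order is ignored: both emails map to the same representation.
print(bag_of_words("you won a big prize"))
print(bag_of_words("a big prize you won"))
```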
@@ -200,10 +199,9 @@ common in our example's training data. We would therefore expect
 $P(email|spam)$, the probability of the new email being generated by the
 words encountered in the training spam email set, to be relatively high.
 
-(The word \\emph{win} appears in the form \\emph{won} in the training
-set, but that's OK. The standard linguistic technique of
-\\emph{lemmatization} groups together any related forms of a word and
-treats them as the same word.)
+(The word *win* appears in the form *won* in the training set, but
+that's OK. The standard linguistic technique of *lemmatization* groups
+together any related forms of a word and treats them as the same word.)
 
 Mathematically, the way to calculate $P(email|spam)$ is to take each
 word in our target email, calculate the probability of it appearing in
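A toy sketch of the lemmatization idea from the parenthetical in this hunk; the lemma table below is hypothetical, and a real filter would rely on an NLP library's lemmatizer rather than a hand-written dictionary:

```python
# Hypothetical lemma table, for illustration only.
LEMMAS = {"won": "win", "wins": "win", "winning": "win"}

def lemmatize(word):
    """Collapse related forms of a word into one canonical form."""
    return LEMMAS.get(word, word)

print(lemmatize("won"))  # -> "win": "won" and "win" count as the same word
```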
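And a minimal sketch of the per-word calculation the closing paragraph begins to describe: score each word of the target email against the spam training counts and combine. Working in log space and the Laplace (add-one) smoothing are assumptions for illustration, as are all the counts; none of this is taken from the chapter:

```python
import math

def log_p_email_given_spam(email_words, spam_counts, total_spam_words, vocab_size):
    """Log of P(email|spam): sum of log P(word|spam) over the email's words,
    with add-one smoothing so an unseen word doesn't zero out the product."""
    return sum(
        math.log((spam_counts.get(word, 0) + 1) / (total_spam_words + vocab_size))
        for word in email_words
    )

# Invented counts, for illustration only.
spam_counts = {"win": 40, "lottery": 25, "prize": 30}
print(log_p_email_given_spam(["win", "lottery", "now"], spam_counts, 500, 1000))
```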