Showing 14 changed files with 827 additions and 1,351 deletions.
@@ -1,8 +1,8 @@
# High dimensional data {.unnumbered}

- There is a variety of computational techniques and statistical concepts that are useful for analysis of datasets for which each observation is associated with a large number of numerical variables. In this chapter we provide a basic introduction to these techniques and concepts by describing matrix operations in R, dimension reduction, regularization, and matrix factorization. Handwritten digits data and movie recommendation systems serve as motivating examples.
+ There is a variety of computational techniques and statistical concepts that are useful for the analysis of datasets in which each observation is associated with a large number of numerical variables. In this chapter, we provide a basic introduction to these techniques and concepts by describing matrix operations in R, dimension reduction, regularization, and matrix factorization. Handwritten digits data and movie recommendation systems serve as motivating examples.

- A task that serves as motivation for this part of the book is quantifying the similarity between any two observations. For example, we might want to know how much two handwritten digits look like each other. However, note that each observations is associated with $28 \times 28 = 784$ pixels so we can't simply use subtraction as we would do if our data was one dimensional.
+ A task that serves as motivation for this part of the book is quantifying the similarity between any two observations. For example, we might want to know how much two handwritten digits look like each other. However, note that each observation is associated with $28 \times 28 = 784$ pixels, so we can't simply use subtraction as we would if our data were one dimensional.
Instead, we will define observations as *points* in a *high-dimensional* space and mathematically define a *distance*. Many machine learning techniques, discussed in the next part of the book, require this calculation.

- Additionally, this part of the book discusses dimension reduction. Here we search of data summaries that result in more manageable lower dimension versions of the data, but preserve most or all the *information* we need. Here too we can use distance between observations as specific challenge: we will reduce the dimensions summarize the data into lower dimensions, but in a way that preserves the distance between any two observations. We use *linear algebra* as a mathematical foundation for all the techniques presented here.
+ Additionally, this part of the book discusses dimension reduction. Here we search for data summaries that result in more manageable, lower-dimensional versions of the data, but that preserve most or all of the *information* we need. Here too we can use the distance between observations as a specific challenge: we will summarize the data into lower dimensions, but in a way that preserves the distance between any two observations. We use *linear algebra* as the mathematical foundation for all the techniques presented here.
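The changed paragraphs above motivate treating each handwritten digit as a point in $784$-dimensional space and comparing digits by a distance rather than simple subtraction. As a minimal illustrative sketch of that idea (in Python for self-containment, although the book itself works in R; the pixel values below are made up, not real digit data):

```python
import math
import random

# Two hypothetical "handwritten digit" observations, each flattened into a
# 28 x 28 = 784-pixel vector. The pixel intensities here are random
# placeholders, not actual MNIST data.
random.seed(0)
x1 = [random.random() for _ in range(784)]
x2 = [random.random() for _ in range(784)]

# Subtracting two single numbers no longer applies in 784 dimensions.
# Instead, treat each observation as a point and compute the Euclidean
# distance: the square root of the sum of squared pixel-wise differences.
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
print(dist)  # one number summarizing how far apart the two digits are
```

The same quantity is the norm of the difference vector, which is why the chapter leans on linear algebra as its foundation.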