data_analysis.md

#Introduction to Data Analysis

  • a buzzword, and so close to meaningless on its own
  • very easy to find real-world motivation

##Fields

  • statistical analysis
  • exploratory / descriptive
  • vs. hypothesis testing

##Languages

  • what is R
  • what is S
  • what is SAS
  • pandas

##Why pandas

  • incorporates 90% of the good stuff from R
  • keep python syntax

#Data Structures

  • what is a dataframe
    • like a 2d "matrix"
    • index
    • columns
    • associated functions/methods
  • what is the right way to represent 2d data
    • programmer's trick question: "how do you represent Excel?"
  • what is the right way to represent higher-dimensional data
  • what is a series? time-series?
  • what is a csv?
  • what is a flatfile?
  • how is it different from a database?
  • what is a vectorized operation?
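
A minimal sketch of the ideas in this list, assuming pandas is installed. The CSV text, the column names, and the values are invented for illustration. It shows a flat file loaded into a dataframe, the index and columns, a column as a series, a vectorized operation, and a small time series.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a flat file on disk.
csv_text = """city,year,population
Oslo,2020,700000
Oslo,2021,710000
Bergen,2020,280000
Bergen,2021,285000
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# A DataFrame is a 2D table with a row index and named columns.
print(df.index)
print(df.columns)

# Each column is a Series: a 1D array that shares the row index.
pop = df["population"]

# Vectorized operation: the division applies to the whole column
# at once, with no explicit Python loop.
df["population_k"] = pop / 1000

# A time series is a Series whose index holds dates.
ts = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
daily_change = ts.diff()  # first value is NaN, there is no previous day
```

Unlike a database table, the dataframe here lives entirely in memory and the flat file carries no schema or types of its own; pandas infers them while reading.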

#exploratory sitelinks example

#hypothesis

  • crosstab
  • logistic regression
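
A rough sketch of the crosstab step, assuming pandas. The `group` and `clicked` columns are hypothetical data made up for this example. Logistic regression is not shown, since it needs a modelling library beyond pandas.

```python
import pandas as pd

# Hypothetical observations, invented for illustration.
df = pd.DataFrame({
    "group":   ["a", "a", "a", "b", "b", "b", "b", "a"],
    "clicked": ["yes", "no", "yes", "no", "no", "yes", "no", "yes"],
})

# Count how often each (group, clicked) combination occurs.
table = pd.crosstab(df["group"], df["clicked"])

# The same table as proportions within each row.
rates = pd.crosstab(df["group"], df["clicked"], normalize="index")

print(table)
print(rates)
```

A table like this is the usual starting point for asking whether the two variables are related.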

#plotting

  • what is plotting?
  • what is a grammar of graphics?

#homework

  • find a csv or an html table, subset it, get some descriptive statistics, and plot it
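
One possible shape for the homework, as a sketch only. The data, the column names, and the `plot_subset` helper are hypothetical; a real submission would read an actual file with `pd.read_csv` or an HTML table with `pd.read_html`.

```python
import io

import pandas as pd

# Hypothetical data standing in for a CSV you found.
csv_text = """name,height_cm,weight_kg
ann,160,55
bob,180,80
cat,170,65
dan,190,95
"""

df = pd.read_csv(io.StringIO(csv_text))

# Subset: rows taller than 165 cm, keeping two columns.
tall = df.loc[df["height_cm"] > 165, ["height_cm", "weight_kg"]]

# Descriptive statistics: count, mean, std, min, quartiles, max.
summary = tall.describe()
print(summary)


def plot_subset(frame):
    """Scatter-plot weight against height (needs matplotlib installed)."""
    return frame.plot(kind="scatter", x="height_cm", y="weight_kg")
```

Calling `plot_subset(tall)` draws the figure; it is left uncalled here so the rest of the sketch runs without a plotting backend.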