Understanding Feature Space in Machine Learning

Presentation contents:

Slide 0

Understanding Feature Space in Machine Learning
Alice Zheng, Dato
September 9, 2015

Slide 1

My journey so far
Applied machine learning (data science)
Build ML tools

Slide 2

Why machine learning? Model data. Make predictions. Build intelligent applications.

Slide 3

The machine learning pipeline
“I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …”
Raw data → Features

Slide 4

Feature = numeric representation of raw data

Slide 5

Representing natural text
Raw text: “It is a puppy and it is extremely cute.”
Classify: puppy or not?
What’s important? Phrases? Specific words? Ordering? Subject, object, verb?

Slide 6

Representing natural text
Raw text: “It is a puppy and it is extremely cute.”
Classify: puppy or not?
Sparse vector representation
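A sparse vector representation of this sentence can be sketched as a mapping from each word to its count, with every word that does not appear implicitly at zero. The tokenizer below (lowercasing, keeping runs of letters) is a simplifying assumption for illustration, not something specified on the slide, and `bag_of_words` is a hypothetical helper name.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Map a text to a sparse word-count vector (word -> count)."""
    # Assumption: lowercase the text and treat runs of letters as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


sentence = "It is a puppy and it is extremely cute."
vector = bag_of_words(sentence)

# Only words that occur are stored; any other word has count 0.
for word, count in sorted(vector.items()):
    print(word, count)
print("cat", vector["cat"])
```

Storing only the nonzero counts is what makes the representation sparse: the full vector has one dimension per vocabulary word, but a single sentence touches only a handful of them.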

Slide 7

Representing images
Raw image: millions of RGB triplets, one for each pixel
Image source: “Recognizing and learning object categories,” Li Fei-Fei, Rob Fergus, Antonio Torralba, ICCV 2005-2009.

Slide 8

Representing images
Raw image → deep learning features
Dense vector representation: 3.29, -15, -5.24, 48.3, 1.36, 47.1, -1.92, 36.5, 2.83, 95.4, -19, -89, 5.09, 37.8

Slide 9

Feature space in machine learning
Raw data → high-dimensional vectors
Collection of data points → point cloud in feature space
Model = geometric summary of point cloud
Feature engineering = creating features of the appropriate granularity for the task

Slide 10

Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes. -- Masha Gessen, “Perfect Rigor”

Slide 11

Algebra vs. Geometry
Algebra: $a^2 + b^2 = c^2$
Geometry (Euclidean space): [Figure: triangle with sides a, b, c]

Slide 12

Visualizing a sphere in 2D
$x^2 + y^2 = 1$

Slide 13

Visualizing a sphere in 3D
$x^2 + y^2 + z^2 = 1$
[Figure: axes x, y, z, each marked at 1]

Slide 14

Visualizing a sphere in 4D
$x^2 + y^2 + z^2 + t^2 = 1$
[Figure: axes x, y, z, each marked at 1]

Slide 15

Why are we looking at spheres?
[Figure: a sequence of shapes related by “=” signs]
Poincaré Conjecture: All physical objects without holes are “equivalent” to a sphere.

Slide 16

The power of higher dimensions
A sphere in 4D can model the birth and death process of physical objects
Point clouds = approximate geometric shapes
High-dimensional features can model many things

Slide 17

Visualizing Feature Space

Slide 18

The challenge of high-dimensional geometry
Feature space can have hundreds to millions of dimensions
In high dimensions, our geometric imagination is limited
Algebra comes to our aid
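One way to see how algebra helps: the unit-sphere equation from the earlier slides (sum of squared coordinates equals 1) can be checked in any number of dimensions with the same few lines, even where no picture is possible. This is a minimal sketch; the function names and the tolerance `tol` are assumptions chosen for illustration.

```python
import math


def squared_norm(point):
    """Sum of squared coordinates; works for any number of dimensions."""
    return sum(x * x for x in point)


def on_unit_sphere(point, tol=1e-9):
    """True if the point satisfies x_1^2 + ... + x_n^2 = 1 up to tol."""
    return abs(squared_norm(point) - 1.0) <= tol


print(on_unit_sphere([1.0, 0.0]))             # a point on the 2D circle
print(on_unit_sphere([0.5, 0.5, 0.5, 0.5]))   # a point on the 4D sphere
n = 1000
print(on_unit_sphere([1 / math.sqrt(n)] * n))  # a point in 1000 dimensions
print(on_unit_sphere([1.0, 1.0]))             # not on the unit circle
```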

Slide 19

Visualizing bag-of-words
“I have a puppy and it is extremely cute”

Slide 20

Visualizing bag-of-words
[Figure: axes puppy, cute, extremely, each marked at 1]

Slide 21

Document point cloud
[Figure: axes word 1, word 2]

Slide 22

What is a model?
Model = mathematical “summary” of data
What’s a summary? A geometric shape

Slide 23

Classification model
Decide between two classes
[Figure: axes Feature 1, Feature 2]
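As one concrete instance of a geometric shape that decides between two classes, the sketch below fits a straight line in a two-feature space with the classic perceptron rule. The slide does not name a specific algorithm, so the perceptron, the toy points, and the labels are all assumptions made for illustration.

```python
def predict(weights, bias, point):
    """Return +1 or -1 depending on which side of the line the point falls."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score > 0 else -1


def train_perceptron(points, labels, epochs=20):
    """Nudge the separating line toward every misclassified point."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        for point, label in zip(points, labels):
            if predict(weights, bias, point) != label:
                weights = [w + label * x for w, x in zip(weights, point)]
                bias += label
    return weights, bias


# Hypothetical toy data: (feature 1, feature 2) with class labels -1 / +1.
points = [(1.0, 1.0), (2.0, 1.5), (1.5, 0.5), (4.0, 4.5), (5.0, 4.0), (4.5, 5.5)]
labels = [-1, -1, -1, 1, 1, 1]

weights, bias = train_perceptron(points, labels)
print("weights:", weights, "bias:", bias)
print("prediction for (5.0, 5.0):", predict(weights, bias, (5.0, 5.0)))
```

The learned weights and bias define the line `weights · x + bias = 0`; the model is nothing more than that line and the rule for which side counts as which class.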

Slide 24

Clustering model
Group data points tightly
[Figure: axes Feature 1, Feature 2]
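Grouping points tightly can be illustrated with k-means, which summarizes each group by its center. This is a minimal sketch under stated assumptions: the toy points are hypothetical, the starting centroids are picked by hand rather than at random, and `math.dist` requires Python 3.8 or later.

```python
import math


def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    assignments = [closest(p, centroids) for p in points]
    return centroids, assignments


# Hypothetical toy data: two visibly separate groups in a 2D feature space.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, [points[0], points[3]])
print("centroids:", centroids)
print("assignments:", assignments)
```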

Slide 25

Regression model
Fit the target values
[Figure: axes Feature, Target]
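For a single feature, fitting the target values can be sketched as ordinary least squares for a straight line. The slide does not specify the fitting method, so least squares and the toy data below are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for target = slope * feature + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: one feature and a roughly linear target.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(xs, ys)
print("slope:", slope, "intercept:", intercept)
print("prediction at feature = 5.0:", slope * 5.0 + intercept)
```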

Slide 26

Visualizing Feature Engineering

Slide 27

When does bag-of-words fail?
Task: find a surface that separates documents about dogs vs. cats
Problem: the word “have” adds fluff instead of information
[Figure: axes puppy, cat, have, with tick marks at 1 and 2]

Slide 28

Improving on bag-of-words
Idea: “normalize” word counts so that popular words are discounted
Term frequency (tf) = number of times a term appears in a document
Inverse document frequency of a word (idf) = $\log(N / \text{number of documents containing the word})$, where N = total number of documents
Tf-idf count = tf × idf

Slide 29

From BOW to tf-idf
idf(puppy) = log 4
idf(cat) = log 4
idf(have) = log 1 = 0
[Figure: axes puppy, cat, have, with tick marks at 1 and 2]

Slide 30

From BOW to tf-idf
tfidf(puppy) = log 4
tfidf(cat) = log 4
tfidf(have) = 0
Tf-idf flattens uninformative dimensions in the BOW point cloud
[Figure: axes puppy, cat, have, with tick marks at 1 and log 4]
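The idf values on these slides can be reproduced with a small script. The slides do not list the documents, so the four-document corpus below is hypothetical: it is chosen so that “have” appears in every document while “puppy” and “cat” each appear in exactly one, and it assumes idf is the log of the total document count divided by the number of documents containing the word.

```python
import math
from collections import Counter

# Hypothetical corpus (not from the slides): "have" is in all four documents,
# "puppy" and "cat" are each in exactly one.
documents = [
    "i have a puppy",
    "i have a cat",
    "i have a kitten",
    "i have a dog",
]

tokenized = [doc.split() for doc in documents]
n_docs = len(tokenized)

# Document frequency: in how many documents does each word occur?
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

# Assumed definition: idf(word) = log(N / number of documents containing word).
idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(tokens):
    """Scale each raw word count (tf) by the word's idf."""
    counts = Counter(tokens)
    return {word: count * idf[word] for word, count in counts.items()}


vectors = [tf_idf(tokens) for tokens in tokenized]
print("idf(puppy):", idf["puppy"], "idf(cat):", idf["cat"], "idf(have):", idf["have"])
print("tf-idf of first document:", vectors[0])
```

Because “have” occurs in every document, its idf is log 1 = 0, so the “have” coordinate of every document collapses to zero: that dimension is flattened, and only the informative words separate the points.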

Slide 31

Entry points of feature engineering
Start from data and task
  What’s the best text representation for classification?
Start from modeling method
  What kind of features does k-means assume?
  What does linear regression assume about the data?

Slide 32

That’s not all, folks!
There’s a lot more to feature engineering:
  Feature normalization
  Feature transformations
  “Regularizing” models
  Learning the right features
Dato is hiring!
@RainyData