# Demystifying Machine Learning

Slide 0

@dhianadeva MACHINE LEARNING FOR EVERYONE Demystifying machine learning!

Slide 1

AGENDA
Goal: Encourage you to start a machine learning project. Today!
● About me
● About you
● Machine Learning
● Problems
● Design
● Algorithms
● Evaluation
● Code snippets
● Pay-as-you-go
● Competitions

Slide 2

ABOUT ME Electronics Engineering, Software Development and Data Science… Why not?

Slide 3

DHIANA DEVA

Slide 4

NEURALTB

Slide 5

CERN

Slide 6

NEURALRINGER

Slide 7

DJBRAZIL

Slide 8

HIGGS CHALLENGE

Slide 9

ABOUT YOU You can do it!

Slide 10

FOR ALL

Slide 11

MASSIVE ONLINE OPEN COURSES

Slide 12

OPEN SOURCE TOOLS

Slide 13

OPEN SOURCE PYTHON TOOLS

Slide 14

PAY-AS-YOU-GO SERVICES

Slide 15

MACHINE LEARNING Learning, machine learning!

Slide 16

EXPECTATIONS

Slide 17

REALITY

Slide 18

FEATURE EXTRACTION Item → { Feature 1, Feature 2, …, Feature N }

Slide 19

FEATURE SPACE

Slide 20

SUPERVISED LEARNING
Training: Items → Feature Vectors + Labels → Machine Learning Algorithm → Predictive Model
Prediction: New item → Feature Vector → Predictive Model → Expected Label
Example feature vector (truncated on the slide): 458316,86.513,24.312,64.983,65.8

Slide 21

UNSUPERVISED LEARNING
Training: Items → Feature Vectors → Machine Learning Algorithm → Predictive Model
Prediction: New item → Feature Vector → Predictive Model → Better Representation
Example feature vector (truncated on the slide): 458316,86.513,24.312,64.983,65.8

Slide 22

MODELS

Slide 23

BIOLOGICAL MOTIVATION

Slide 24

PROBLEMS I've got 99 problems, but machine learning ain't one!

Slide 25

CLASSIFICATION A ? B

Slide 26

CLASSIFICATION

Slide 27

REGRESSION ? 8 15 7 1 11 13 6 3

Slide 28

REGRESSION

Slide 29

CLUSTERING

Slide 30

CLUSTERING

Slide 31

DIMENSIONALITY REDUCTION

Slide 32

DIMENSIONALITY REDUCTION

Slide 33

DESIGN DECISIONS 1, 2 steps!

Slide 34

NORMALIZATION
● z-score
● min-max
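
The two normalization options named on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the function names `z_score` and `min_max` and the sample `feature` list are assumptions, and the z-score here uses the population standard deviation.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column, made up for this example.
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_score(feature))
print(min_max(feature))
```

In a real project the mean, standard deviation, minimum and maximum would typically be computed on the training set only and then reused for new items.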

Slide 35

TRAINING

Slide 36

REGULARIZATION

Slide 37

RELEVANCE ANALYSIS

Slide 38

CROSS VALIDATION
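
As a rough sketch of the idea behind this slide, k-fold cross validation splits the items into k folds and lets each fold serve once as the validation set while the rest is used for training. The helper name `k_fold_indices` is an assumption made for this example, and the sketch does not shuffle; shuffling the items first is usually advisable.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Each item lands in exactly one validation fold.
for train, validation in k_fold_indices(n_items=10, k=3):
    print("train:", train, "validation:", validation)
```

A model would be trained and scored once per fold, and the k scores averaged to estimate how well it generalizes.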

Slide 39

ALGORITHMS Cheat sheet included!

Slide 40

CHEAT SHEET

Slide 41

CHEAT SHEET

Slide 42

ALGORITHMS PT. I Linear Regression Decision Trees Random Forest

Slide 43

ALGORITHMS PT. II K-Nearest Neighbors K-Means
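
To give a feel for the first algorithm on this slide, here is a minimal k-nearest neighbors classifier in plain Python. It is a sketch under assumptions: the function name `knn_predict`, the Euclidean distance and the toy points are all made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair every training item's distance to the query with its label.
    distances = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature items forming two groups.
vectors = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(vectors, labels, (1.5, 1.5)))
print(knn_predict(vectors, labels, (5.5, 5.0)))
```

Because the vote depends on distances, features on very different scales should usually be normalized first.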

Slide 44

NEURAL NETWORKS

Slide 45

SELF-ORGANIZING MAPS

Slide 46

PRINCIPAL COMPONENTS ANALYSIS

Slide 47

T-SNE

Slide 48

EVALUATION How you doin'?

Slide 49

PRECISION AND ACCURACY

Slide 50

CONFUSION MATRIX
TP = True Positives
TN = True Negatives
FP = False Positives
FN = False Negatives
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 * precision * recall / (precision + recall)
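
The precision, recall and F1-score defined on this slide can be computed directly from the four confusion-matrix counts. The sketch below is illustrative only: the function names and the tiny `actual` and `predicted` label lists are assumptions, and undefined ratios are mapped to 0.0 as one possible convention.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Apply the slide's formulas, returning 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels, made up for this example.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)
print(precision_recall_f1(tp, fp, fn))
```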

Slide 51

ROC CURVE

Slide 52

A/B TESTS

Slide 53

Code Snippets "Hello, Machine Learning"

Slide 54

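MATLAB 101

    [x,y] = ovarian_dataset;      % load the example dataset (inputs x, targets y)
    net = patternnet(5);          % pattern recognition network with 5 hidden neurons
    [net,tr] = train(net,x,y);    % train; tr records the train/validation/test split
    testX = x(:,tr.testInd);      % select the held-out test inputs
    testY = net(testX);           % predict on the test inputs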

Slide 55

MATLAB 201

    % train_inputs, train_targets and test_inputs are assumed to be defined beforehand
    net = patternnet(14);
    net.inputs{1}.processFcns = {'mapminmax', 'fixunknowns', 'processpca'};
    net.inputs{1}.processParams{3}.maxfrac = 0.02;
    net.trainFcn = 'trainlm';
    net.performFcn = 'mse';
    net.divideParam.trainRatio = 70/100;
    net.divideParam.valRatio = 15/100;
    net.divideParam.testRatio = 15/100;
    [net, tr] = train(net, train_inputs, train_targets);
    outputs = net(test_inputs);

Slide 56

R

    library(randomForest)
    # Read the tab-separated training data
    raw.orig <- read.csv(file="train.txt", header=T, sep="\t")
    # Predict Metal from three features
    frmla = Metal ~ OTW + AirDecay + Koc
    fit.rf = randomForest(frmla, data=raw.orig)
    print(fit.rf)
    importance(fit.rf)

Slide 57

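SCIKIT LEARN

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Training data: the 'Activity' column is the label, the rest are features
    dataset = pd.read_csv('Data/train.csv')
    target = dataset.Activity.values
    train = dataset.drop('Activity', axis=1).values
    test = pd.read_csv('Data/test.csv').values
    rf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    rf.fit(train, target)
    # Probability of the positive class for each test item
    predicted_probs = [x[1] for x in rf.predict_proba(test)]
    importances = rf.feature_importances_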

Slide 58

PAY-AS-YOU-GO SERVICES Amazon Machine Learning

Slide 59

AMAZON MACHINE LEARNING
Five easy steps
1. Upload CSV dataset to Amazon S3
2. Create Datasource with metadata about uploaded dataset
3. Create ML Model with configurations for model training
4. Create Evaluation to analyse and tune model efficiency
5. Create Prediction to use trained model with new data

Slide 60

EVALUATION

Slide 61

SDKs

Slide 62

DATA SCIENCE COMPETITIONS Challenge accepted!

Slide 63

KAGGLE

Slide 64

Slide 65

END TO END
TRAIN.CSV → TRAIN → TRAINED.DAT
TEST.CSV → RUN → SOLUTION.CSV

Slide 66

THANK YOU Questions? Dhiana Deva
