Demystifying Machine Learning



Slide 0

@dhianadeva MACHINE LEARNING FOR EVERYONE Demystifying machine learning!


Slide 1

AGENDA
Goal: Encourage you to start a machine learning project. Today!
● About me
● About you
● Machine Learning
● Problems
● Design
● Algorithms
● Evaluation
● Code snippets
● Pay-as-you-go
● Competitions


Slide 2

ABOUT ME Electronics Engineering, Software Development and Data Science… Why not?


Slide 3

DHIANA DEVA


Slide 4

NEURALTB


Slide 5

CERN


Slide 6

NEURALRINGER


Slide 7

DJBRAZIL


Slide 8

HIGGS CHALLENGE


Slide 9

ABOUT YOU You can do it!


Slide 10

FOR ALL


Slide 11

MASSIVE ONLINE OPEN COURSES


Slide 12

OPEN SOURCE TOOLS


Slide 13

OPEN SOURCE PYTHON TOOLS


Slide 14

PAY-AS-YOU-GO SERVICES


Slide 15

MACHINE LEARNING Learning, machine learning!


Slide 16

EXPECTATIONS


Slide 17

REALITY


Slide 18

FEATURE EXTRACTION
Item → { Feature 1, Feature 2, …, Feature N }


Slide 19

FEATURE SPACE


Slide 20

SUPERVISED LEARNING
Training: Items → Feature Vectors + Labels → Machine Learning Algorithm → Predictive Model
Prediction: New item → Feature Vector → Predictive Model → Expected Label
[Diagram: feature vectors shown as rows of comma-separated numeric values]
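A minimal sketch of the supervised flow above, assuming scikit-learn and a decision tree as the learning algorithm (both are my choices for illustration; the slide names no specific algorithm). The toy feature vectors and labels are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: each row is the feature vector of one item,
# each entry of y_train is that item's known label.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # label 0
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])  # label 1
y_train = np.array([0, 0, 0, 1, 1, 1])

# The learning algorithm turns (feature vectors, labels) into a predictive model.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# A new item is described by its feature vector; the model returns the expected label.
new_item = np.array([[7.5, 8.2]])
expected_label = model.predict(new_item)[0]
print(expected_label)  # 1
```

Any other scikit-learn classifier could be swapped in; the `fit` then `predict` pattern stays the same.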


Slide 21

UNSUPERVISED LEARNING
Training: Items → Feature Vectors (no labels) → Machine Learning Algorithm → Predictive Model
Application: New item → Feature Vector → Predictive Model → Better Representation
[Diagram: feature vectors shown as rows of comma-separated numeric values]
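A small sketch of the unsupervised flow, using scikit-learn's PCA as one possible "better representation" learner (my choice; clustering would be another). The data is hypothetical: 2-D points that mostly vary along a single direction, so one component captures almost all of it.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical 2-D feature vectors lying close to the line x2 = 2 * x1.
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=200)])

# No labels: the algorithm only learns a new representation of the items.
pca = PCA(n_components=1)
pca.fit(X)

# A new item is mapped from 2 features down to 1.
new_item = np.array([[0.5, 1.0]])
better_representation = pca.transform(new_item)
print(better_representation.shape)     # (1, 1)
print(pca.explained_variance_ratio_)   # close to 1.0 for this data
```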


Slide 22

MODELS


Slide 23

BIOLOGICAL MOTIVATION


Slide 24

PROBLEMS I've got 99 problems, but machine learning ain't one!


Slide 25

CLASSIFICATION
[Diagram: a new item (?) to be assigned to class A or class B]


Slide 26

CLASSIFICATION


Slide 27

REGRESSION
[Diagram: an unknown numeric value (?) to be estimated from known numeric values]


Slide 28

REGRESSION


Slide 29

CLUSTERING


Slide 30

CLUSTERING


Slide 31

DIMENSIONALITY REDUCTION


Slide 32

DIMENSIONALITY REDUCTION


Slide 33

DESIGN DECISIONS 1, 2 steps!


Slide 34

NORMALIZATION
● z-score
● min-max
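The two normalizations can be sketched in a few lines of NumPy, applied per feature (column). The sample matrix is hypothetical.

```python
import numpy as np

def z_score(X):
    """Shift each feature (column) to mean 0 and scale it to standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std

def min_max(X):
    """Rescale each feature (column) linearly into the [0, 1] range."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return (X - lo) / (hi - lo)

# Hypothetical data: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
print(z_score(X))
print(min_max(X))  # each column becomes 0, 0.5, 1
```

In practice the mean/std (or min/max) should be computed on the training set only and reused unchanged on new data. A constant column would divide by zero in both functions, so such columns are usually dropped first.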


Slide 35

TRAINING


Slide 36

REGULARIZATION
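One common form of regularization is an L2 penalty on the weights (ridge regression). The sketch below assumes that form (the slide does not specify which penalty) and uses its closed-form solution, without an intercept term, on hypothetical random data to show the weights shrinking toward zero.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """L2-regularized least squares: w = (X^T X + lam * I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + 0.5 * rng.normal(size=20)

w_plain = ridge_fit(X, y, lam=0.0)   # lam = 0 is ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # larger lam penalizes large weights
print(np.linalg.norm(w_plain), np.linalg.norm(w_ridge))
```

The penalty strength `lam` is a hyperparameter, typically chosen with cross validation.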


Slide 37

RELEVANCE ANALYSIS


Slide 38

CROSS VALIDATION
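A minimal k-fold cross validation sketch with scikit-learn, using the bundled iris dataset and a decision tree purely as placeholders: the model is trained on k-1 folds and scored on the held-out fold, k times.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5 folds; shuffling matters here because iris rows are sorted by class.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print(scores)                       # one accuracy score per fold
print(scores.mean(), scores.std())  # summary across folds
```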


Slide 39

ALGORITHMS Cheat sheet included!


Slide 40

CHEAT SHEET


Slide 41

CHEAT SHEET


Slide 42

ALGORITHMS PT. I
● Linear Regression
● Decision Trees
● Random Forest
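Of the three, linear regression is the easiest to show from scratch. This sketch fits it by least squares with NumPy on hypothetical noise-free data generated from y = 2·x1 - 3·x2 + 5, so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical data generated from y = 2*x1 - 3*x2 + 5 (no noise).
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 0.0]])
y = 2 * X[:, 0] - 3 * X[:, 1] + 5

# Append a column of ones so the last coefficient acts as the intercept.
X1 = np.column_stack([X, np.ones(len(X))])

# Least squares: find coef minimizing ||X1 @ coef - y||^2.
coef, residuals, rank, singular_values = np.linalg.lstsq(X1, y, rcond=None)
print(coef)  # approximately [ 2. -3.  5.]
```

scikit-learn's `LinearRegression` computes the same kind of fit; decision trees and random forests are used through the same `fit`/`predict` interface.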


Slide 43

ALGORITHMS PT. II
● K-Nearest Neighbors
● K-Means
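K-Means can be sketched with Lloyd's algorithm: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This is an illustrative NumPy version on hypothetical data, not a production implementation (scikit-learn's `KMeans` adds smarter initialization and multiple restarts).

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (a centroid with no points keeps its previous position).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = k_means(X, k=2)
print(labels, centroids)
```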


Slide 44

NEURAL NETWORKS


Slide 45

SELF-ORGANIZING MAPS


Slide 46

PRINCIPAL COMPONENTS ANALYSIS


Slide 47

T-SNE


Slide 48

EVALUATION How you doin'?


Slide 49

PRECISION AND ACCURACY


Slide 50

CONFUSION MATRIX
TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
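The three formulas can be checked directly by counting TP, FP and FN from true and predicted labels. The labels below are hypothetical; returning 0.0 when a denominator is zero is my convention, not part of the formulas.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Count TP, FP and FN for the positive class, then apply the formulas."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: TP = 3, FP = 1, FN = 1, TN = 3.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```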


Slide 51

ROC CURVE
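An ROC curve plots the true positive rate against the false positive rate as the decision threshold on a classifier's score is lowered. A from-scratch sketch with NumPy, including the area under the curve via the trapezoidal rule, on hypothetical scores (scikit-learn offers `roc_curve` and `roc_auc_score` for real use):

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """Sweep the threshold from high to low and record (FPR, TPR) pairs."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    positives = np.sum(y_true == 1)
    negatives = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing is positive
    for threshold in np.unique(scores)[::-1]:
        predicted_pos = scores >= threshold
        tpr.append(np.sum(predicted_pos & (y_true == 1)) / positives)
        fpr.append(np.sum(predicted_pos & (y_true == 0)) / negatives)
    return np.array(fpr), np.array(tpr)

def area_under_curve(fpr, tpr):
    """Trapezoidal rule over the ROC points (FPR is non-decreasing)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve_points(y_true, scores)
print(area_under_curve(fpr, tpr))  # 0.75
```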


Slide 52

A/B TESTS
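One common way to read an A/B test on conversion rates is a two-proportion z-test. This sketch assumes that test (the slide does not say which analysis it uses) and hypothetical counts: 200 of 1000 users convert in variant A, 250 of 1000 in variant B.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

z, p = two_proportion_z_test(200, 1000, 250, 1000)
print(round(z, 2), round(p, 4))
```

A small p-value suggests the difference is unlikely to be chance alone. The normal approximation behind this test assumes reasonably large samples in both groups.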


Slide 53

Code Snippets "Hello, Machine Learning"


Slide 54

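MATLAB 101
% Load the ovarian cancer sample dataset shipped with MATLAB's neural network toolbox
[x, y] = ovarian_dataset;
% Pattern-recognition network with one hidden layer of 5 neurons
net = patternnet(5);
% Train; tr records how samples were split into training/validation/test sets
[net, tr] = train(net, x, y);
% Run the trained network on the held-out test samples
testX = x(:, tr.testInd);
testY = net(testX);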


Slide 55

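MATLAB 201
% Pattern-recognition network with 14 hidden neurons
net = patternnet(14);
% Input preprocessing: scale to [-1, 1], handle unknown (NaN) values, then PCA
net.inputs{1}.processFcns = {'mapminmax', 'fixunknowns', 'processpca'};
% PCA step: drop components contributing less than 2% of the total variation
net.inputs{1}.processParams{3}.maxfrac = 0.02;
% Levenberg-Marquardt training, mean squared error as performance function
net.trainFcn = 'trainlm';
net.performFcn = 'mse';
% Random 70/15/15 split into training, validation and test sets
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% train_inputs, train_targets and test_inputs are assumed to exist already
[net, tr] = train(net, train_inputs, train_targets);
outputs = net(test_inputs);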


Slide 56

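R
library(randomForest)
# Tab-separated training file with a header row
raw <- read.csv(file = "train.txt", header = TRUE, sep = "\t")
# Model formula: predict Metal from OTW, AirDecay and Koc
frmla <- Metal ~ OTW + AirDecay + Koc
fit.rf <- randomForest(frmla, data = raw)
print(fit.rf)       # summary, including the out-of-bag error estimate
importance(fit.rf)  # importance score of each feature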


Slide 57

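SCIKIT LEARN
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Training data: every column except 'Activity' is a feature; 'Activity' is the label
dataset = pd.read_csv('Data/train.csv')
target = dataset.Activity.values
train = dataset.drop('Activity', axis=1).values
# Test data: same feature columns, without 'Activity'
test = pd.read_csv('Data/test.csv').values

# Random forest with 100 trees, using all CPU cores
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
rf.fit(train, target)
# Probability of class rf.classes_[1] (the positive class for 0/1 labels) per test row
predicted_probs = rf.predict_proba(test)[:, 1]
importances = rf.feature_importances_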


Slide 58

PAY-AS-YOU-GO SERVICES Amazon Machine Learning


Slide 59

AMAZON MACHINE LEARNING
Five easy steps:
1. Upload a CSV dataset to Amazon S3
2. Create a Datasource with metadata about the uploaded dataset
3. Create an ML Model with the configuration for model training
4. Create an Evaluation to analyse and tune the model's predictive performance
5. Create a Prediction to apply the trained model to new data


Slide 60

EVALUATION


Slide 61

SDKs


Slide 62

DATA SCIENCE COMPETITIONS Challenge accepted!


Slide 63

KAGGLE


Slide 64

SPONSORED


Slide 65

END TO END
TRAIN.CSV → TRAIN → TRAINED.DAT
TRAINED.DAT + TEST.CSV → RUN → SOLUTION.CSV
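A sketch of that train/run pipeline in Python, under several assumptions of mine: CSV files without headers, the label in the last column of the training file, a scikit-learn random forest as the model, `pickle` for the trained-model file, and a solution file of `row_index,label` pairs. The function names `train` and `run` and the tiny files are hypothetical.

```python
import csv
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier

def train(train_csv, model_path):
    """TRAIN.CSV -> TRAINED.DAT: fit a model and serialize it to disk."""
    with open(train_csv, newline="") as f:
        rows = [list(map(float, r)) for r in csv.reader(f)]
    X = [r[:-1] for r in rows]      # assumption: all but the last column are features
    y = [int(r[-1]) for r in rows]  # assumption: the last column is the label
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

def run(model_path, test_csv, solution_csv):
    """TRAINED.DAT + TEST.CSV -> SOLUTION.CSV: load the model, write predictions."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)  # only unpickle files you created yourself
    with open(test_csv, newline="") as f:
        X = [list(map(float, r)) for r in csv.reader(f)]
    with open(solution_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for i, label in enumerate(model.predict(X)):
            writer.writerow([i, int(label)])

# Tiny hypothetical files in a temporary directory, just to exercise the flow.
workdir = Path(tempfile.mkdtemp())
(workdir / "train.csv").write_text("0,0,0\n0,1,0\n1,0,0\n9,9,1\n9,8,1\n8,9,1\n")
(workdir / "test.csv").write_text("0.5,0.5\n8.5,8.5\n")
train(workdir / "train.csv", workdir / "trained.dat")
run(workdir / "trained.dat", workdir / "test.csv", workdir / "solution.csv")
print((workdir / "solution.csv").read_text())
```

Keeping training and prediction as separate steps linked only by the saved model file means the model can be trained once and then run on many test files.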


Slide 66

THANK YOU Questions? Dhiana Deva

