Machine Learning at Netflix


Slide 0

Slide 1

Machine Learning @ Netflix (and some lessons learned) Yves Raimond (@moustaki) Research/Engineering Manager Search & Recommendations Algorithm Engineering

Slide 2

Netflix evolution

Slide 3

Netflix scale

● > 69M members
● > 50 countries
● > 1000 device types
● > 3B hours/month
● 36% of peak US downstream traffic

Slide 4

Recommendations @ Netflix

● Goal: Help members find content to watch and enjoy, to maximize satisfaction and retention
● Over 80% of what people watch comes from our recommendations
● Top Picks, Because You Watched, Trending Now, Row Ordering, Evidence, Search, Search Recommendations, Personalized Genre Rows, ...

Slide 5

Models & Algorithms

▪ Regression (linear, logistic, elastic net)
▪ SVD and other matrix factorizations
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Deep Neural Networks
▪ Markov Models and Graph Algorithms
▪ Clustering
▪ Latent Dirichlet Allocation
▪ Gradient Boosted Decision Trees/Random Forests
▪ Gaussian Processes
▪ …

Slide 6

Some lessons learned

Slide 7

Build the offline experimentation framework first

Slide 8

When tackling a new problem

● What offline metrics can we compute that capture what online improvements we're actually trying to achieve?
● How should the input data to that evaluation be constructed (train, validation, test)?
● How fast and easy is it to run a full cycle of offline experimentation?
  ○ Minimize time to first metric
● How replicable is the evaluation? How shareable are the results?
  ○ Provenance (see Dagobah)
  ○ Notebooks (see Jupyter, Zeppelin, Spark Notebook)
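The bullets above can be sketched in code. The helpers below (`temporal_split`, `recall_at_k`) are hypothetical illustrations of the two ingredients the slide names, not Netflix's actual framework: constructing train/validation/test sets from time-ordered data, and computing an offline metric meant to proxy an online goal.

```python
def temporal_split(events, train_frac=0.8, valid_frac=0.1):
    """Split time-ordered (member, item, timestamp) events into
    train/validation/test sets. Splitting by time rather than at random
    avoids leaking future behaviour into the training data.
    (Hypothetical helper for illustration only.)
    """
    events = sorted(events, key=lambda e: e[2])
    n = len(events)
    i = int(n * train_frac)
    j = int(n * (train_frac + valid_frac))
    return events[:i], events[i:j], events[j:]


def recall_at_k(recommended, watched, k=10):
    """One possible offline metric: the fraction of items a member
    actually watched that appear in the top-k recommendations."""
    if not watched:
        return 0.0
    hits = len(set(recommended[:k]) & set(watched))
    return hits / len(watched)
```

Keeping helpers like these fast and cheap to run is what "minimize time to first metric" amounts to in practice.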

Slide 9

When tackling an old problem

● Same…
  ○ Are the metrics designed when first running experimentation in that space still appropriate now?

Slide 10

Think about distribution from the outermost layers

Slide 11

1. For each combination of hyper-parameters (e.g. grid search, random search, Gaussian processes…)
2. For each subset of the training data
   a. Multi-core learning (e.g. HogWild)
   b. Distributed learning (e.g. ADMM, distributed L-BFGS, …)
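The point of "outermost layers first" is that layer 1 is embarrassingly parallel before any learning algorithm is distributed: each hyper-parameter combination trains and evaluates independently. A minimal sketch (the `train_fn`/`eval_fn` callables are assumptions for illustration):

```python
import itertools


def grid_search(train_fn, eval_fn, param_grid):
    """Outermost loop: try every hyper-parameter combination.

    Each (train, evaluate) pair is independent of the others, so this
    loop can be farmed out across machines without touching the
    learning algorithm itself.
    """
    best_score, best_params = float("-inf"), None
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        model = train_fn(params)
        score = eval_fn(model)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

Only once this outer loop no longer hides the cost does it pay to look inward, at layer 2's multi-core or distributed learning.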

Slide 12

When to use distributed learning?

● The impact of communication overhead when building distributed ML algorithms is non-trivial
● Is your data big enough that the distribution offsets the communication overhead?
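The question on this slide can be answered with a back-of-envelope estimate. The Amdahl-style model below is my own simplification, not from the talk: compute splits across workers, but synchronization cost is paid regardless.

```python
def distribution_speedup(compute_time, comm_overhead, workers):
    """Crude estimate of distributed vs. single-machine speedup.

    compute_time: single-machine time for one pass over the data
    comm_overhead: per-pass cost of synchronizing model state
    A result below 1.0 means distribution makes things slower.
    (Simplified model for illustration only.)
    """
    distributed_time = compute_time / workers + comm_overhead
    return compute_time / distributed_time
```

For example, a 100s pass with 5s of synchronization across 10 workers yields a ~6.7x speedup, while a 10s pass with 20s of synchronization is a net loss: the data was not big enough.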

Slide 13

Example: Uncollapsed Gibbs sampler for LDA (more details here)

Slide 14

Design production code to be experimentation-friendly

Slide 15

Example development process

Idea → Offline Modeling (R, Python, MATLAB, …; iterate on data) → Implement in production system (Java, C++, …) → Production environment (A/B test) → Final model

Common gaps between the offline model and its actual output in production: missing postprocessing logic, data discrepancies, code discrepancies, performance issues.

Slide 16

Avoid dual implementations

Instead of maintaining separate experiment code and production code, run both Experiment and Production on top of a Shared Engine.
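One way to read "shared engine": the scoring logic lives in exactly one place, and both the offline experimentation harness and the online serving path call into it. The class below is a hypothetical illustration of that structure, not Netflix's actual engine.

```python
class SharedScorer:
    """Single source of truth for scoring logic, invoked by both the
    offline experimentation harness and the production serving path,
    so there is no second implementation to drift out of sync.
    (Hypothetical sketch for illustration only.)
    """

    def __init__(self, weights):
        self.weights = weights

    def score(self, features):
        # Linear scoring as a stand-in for the real model.
        return sum(self.weights.get(name, 0.0) * value
                   for name, value in features.items())


engine = SharedScorer({"recency": 2.0, "popularity": 0.5})
# Same call from both paths, so offline results predict online behaviour:
offline_score = engine.score({"recency": 1.0, "popularity": 2.0})  # experiment
online_score = engine.score({"recency": 1.0, "popularity": 2.0})   # production
```

This directly removes the "code discrepancies" failure mode from the development process on the earlier slide.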

Slide 17

To be continued...

Slide 18

We’re hiring! Yves Raimond (@moustaki)

Slide 19