From Idea to Execution: Spotify's Discover Weekly


Slide 0

From Idea to Execution: Spotify’s Discover Weekly
Or: 5 lessons in building recommendation products at scale
Chris Johnson :: @MrChrisJohnson
Edward Newett :: @scaladaze
DataEngConf • NYC • Nov 2015


Slide 1

Who are we?
Chris Johnson
Edward Newett


Slide 2

Spotify in Numbers
• Started in 2006, now available in 58 markets
• 75+ million active users, 20 million paying subscribers
• 30+ million songs, 20,000 new songs added per day
• 1.5 billion user-generated playlists
• 1 TB of user data logged per day
• 1,700-node Hadoop cluster
• 10,000+ Hadoop jobs run daily


Slide 3

Challenge: 30M songs… how do we recommend music to users?


Slide 4

Discover


Slide 5

Radio


Slide 6

Related Artists


Slide 7

Discover Weekly


Slide 8

The Road to Discover Weekly


Slide 9

2013 :: Discover Page v1.0
• Personalized News Feed of recommendations
• Artists, Album Reviews, News Articles, New Releases, Upcoming Concerts, Social Recommendations, Playlists…
• Required a lot of attention and digging to engage with recommendations
• No organization of content


Slide 10

2014 :: Discover Page v2.0
• Recommendations grouped into strips (à la Netflix)
• Limited to Albums and New Releases
• More organized than the News Feed, but still requires active interaction


Slide 11

Insight: users were spending more time on editorial Browse playlists than on Discover.


Slide 12

Idea: combine the personalized experience of Discover with the lean-back ease of Browse


Slide 13

Meanwhile… 2014 Year In Music


Slide 14

Play it forward: same content as the Discover Page, but as a playlist


Slide 15

Lesson 1: Be data driven from start to finish


Slide 16

2008 2012 2015 Slide from Dan McKinley - Etsy


Slide 17

Define success metrics BEFORE you release your test
• Reach: How many users are you reaching?
• Depth: For the users you reach, how deeply do they engage?
• Retention: For the users you reach, how many do you retain?


Slide 18

Discover Weekly Key Success Metrics
• Reach: DW WAU / Spotify WAU
• Depth: DW Time Spent / Spotify WAU
• Retention: DW week-over-week retention
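The three metrics above can be sketched as simple ratios. This is a minimal illustration, not Spotify's actual pipeline: the function names, the use of minutes as the time unit, and the definition of retention as "share of last week's DW listeners who listened again this week" are all assumptions. Depth follows the slide's formula as written (DW time spent divided by Spotify WAU).

```python
def reach(dw_wau: int, spotify_wau: int) -> float:
    """Reach: Discover Weekly WAU as a fraction of all Spotify WAU."""
    return dw_wau / spotify_wau


def depth(dw_minutes_spent: float, spotify_wau: int) -> float:
    """Depth: DW time spent divided by Spotify WAU, as defined on the slide."""
    return dw_minutes_spent / spotify_wau


def week_over_week_retention(last_week_users: set, this_week_users: set) -> float:
    """Retention (assumed definition): fraction of last week's DW listeners
    who also listened to DW this week."""
    if not last_week_users:
        return 0.0
    return len(last_week_users & this_week_users) / len(last_week_users)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(reach(25, 100))
    print(depth(500.0, 100))
    print(week_over_week_retention({"a", "b", "c", "d"}, {"a", "b", "x"}))
```

Fixing these definitions in code before the test starts makes it harder to redefine success after the results come in.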


Slide 19

2008 2012 2015 Slide from Dan McKinley - Etsy


Slide 20

Step 1: Prototype (employee test)


Slide 21

Step 1: Prototype (employee test)


Slide 22

Results of Employee Test were very positive!


Slide 23

2008 2012 2015 Slide from Dan McKinley - Etsy


Slide 24

Step 2: Release AB Test to 1% of Users


Slide 25

Google Form 1% Results


Slide 26

Personalized image resulted in 10% lift in WAU
• Initial 0.5% user test
• 1% Spaceman image
• 1% Personalized image
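A test like the one above needs a stable way to place users into small groups and a way to compare the groups. The sketch below is one common approach, not the mechanism used for Discover Weekly: the hash-based `assign_bucket`, the salt string, the bucket names, and the reading of "10% lift" as a relative lift are all assumptions.

```python
import hashlib

# Hypothetical bucket sizes out of 10,000 slots: two groups of 1% each,
# mirroring the Spaceman and Personalized image groups on the slide.
BUCKETS = [("spaceman_image", 100), ("personalized_image", 100)]


def assign_bucket(user_id: str, salt: str = "dw-image-test") -> str:
    """Deterministically map a user to a test group via a salted hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    slot = int(digest, 16) % 10_000
    upper = 0
    for name, size in BUCKETS:
        upper += size
        if slot < upper:
            return name
    return "not_in_test"


def relative_lift(control_rate: float, treatment_rate: float) -> float:
    """Relative lift of the treatment rate over the control rate."""
    return (treatment_rate - control_rate) / control_rate


if __name__ == "__main__":
    print(assign_bucket("user-123"))
    # Hypothetical WAU rates: 40% in control, 44% in treatment.
    print(relative_lift(0.40, 0.44))
```

Hashing the user ID keeps each user in the same group across sessions without storing an assignment table, and changing the salt reshuffles users for the next test.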


Slide 27

Lesson 2: Reuse existing infrastructure in creative ways


Slide 28

Discover Weekly Data Flow


Slide 29

Recommendation Models


Slide 30

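Implicit Matrix Factorization
• Aggregate all (user, track) streams into a large matrix
• Goal: Approximate the binary preference matrix by the inner product of 2 smaller matrices, minimizing the weighted RMSE (root mean squared error) with a function of plays, context, and recency as the weight

[Figure: binary Users × Songs matrix approximated by the product of a user-factor matrix X and a song-factor matrix Y]
10001001
00100100
10100011
01000100
00100100
10001001

$$\min_{x,y,\beta} \sum_{u,i} c_{ui}\,\big(p_{ui} - x_u^T y_i - \beta_u - \beta_i\big)^2 + \lambda\Big(\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2\Big)$$

• $p_{ui}$ = 1 if user $u$ streamed track $i$, else 0
• $c_{ui}$ = weight (a function of plays, context, and recency)
• $x_u$ = user latent factor vector
• $y_i$ = item latent factor vector
• $\beta_u$ = bias for user
• $\beta_i$ = bias for item
• $\lambda$ = regularization parameter

[1] Hu Y., Koren Y. & Volinsky C. (2008) Collaborative Filtering for Implicit Feedback Datasets. 8th IEEE International Conference on Data Mining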
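To make the objective concrete, here is a small pure-Python sketch that fits user and item factors plus biases to the binary matrix from the slide by minimizing a weighted squared error with L2 regularization. It is an illustration under stated assumptions: the weight `1 + 4 * p` is a placeholder for the real function of plays, context, and recency; plain SGD is swapped in for brevity (Hu et al. [1] use alternating least squares); and the hyperparameters are arbitrary.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_loss(P, C, X, Y, bu, bi, lam):
    """Weighted squared error plus L2 penalty on the factor vectors."""
    loss = 0.0
    for u, row in enumerate(P):
        for i, p in enumerate(row):
            err = p - dot(X[u], Y[i]) - bu[u] - bi[i]
            loss += C[u][i] * err * err
    reg = sum(dot(x, x) for x in X) + sum(dot(y, y) for y in Y)
    return loss + lam * reg


def sgd_epoch(P, C, X, Y, bu, bi, lam, lr):
    """One pass of SGD over every (user, item) cell.
    The constant factor 2 of the gradient is absorbed into lr."""
    for u, row in enumerate(P):
        for i, p in enumerate(row):
            err = p - dot(X[u], Y[i]) - bu[u] - bi[i]
            g = C[u][i] * err
            old_xu = X[u][:]  # keep the pre-update user vector for the item step
            for f in range(len(old_xu)):
                X[u][f] += lr * (g * Y[i][f] - lam * old_xu[f])
                Y[i][f] += lr * (g * old_xu[f] - lam * Y[i][f])
            bu[u] += lr * g
            bi[i] += lr * g


def train(P, C, k=2, lam=0.01, lr=0.02, epochs=50, seed=0):
    rng = random.Random(seed)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in P]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in P[0]]
    bu, bi = [0.0] * len(P), [0.0] * len(P[0])
    history = [weighted_loss(P, C, X, Y, bu, bi, lam)]
    for _ in range(epochs):
        sgd_epoch(P, C, X, Y, bu, bi, lam, lr)
        history.append(weighted_loss(P, C, X, Y, bu, bi, lam))
    return X, Y, bu, bi, history


# Binary preference matrix from the slide: rows are users, columns are songs.
P = [[int(ch) for ch in row] for row in
     ["10001001", "00100100", "10100011", "01000100", "00100100", "10001001"]]
# Hypothetical weights: streamed cells count 5x as much as unstreamed ones.
C = [[1.0 + 4.0 * p for p in row] for row in P]

if __name__ == "__main__":
    *_, history = train(P, C)
    print(f"loss before: {history[0]:.2f}, after: {history[-1]:.2f}")
```

At Spotify's scale the same objective would be optimized on a cluster rather than in a single loop; the sketch only shows what is being minimized.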


Slide 31

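Can also use Logistic Loss!
• Aggregate all (user, track) streams into a large matrix
• Goal: Model the probability of a user playing a song as logistic, then maximize the log likelihood of the binary preference matrix, weighting positive observations by a function of plays, context, and recency

[Figure: binary Users × Songs matrix modeled through a user-factor matrix X and a song-factor matrix Y]
10001001
00100100
10100011
01000100
00100100
10001001

$$\max_{x,y,\beta} \sum_{u,i} \Big[ w_{ui}\,\big(x_u^T y_i + \beta_u + \beta_i\big) - (1 + w_{ui}) \log\big(1 + \exp(x_u^T y_i + \beta_u + \beta_i)\big) \Big] - \frac{\lambda}{2}\Big(\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2\Big)$$

• $w_{ui}$ = weight on positive observations (a function of plays, context, and recency; 0 if user $u$ never streamed track $i$)
• $x_u$ = user latent factor vector
• $y_i$ = item latent factor vector
• $\beta_u$ = bias for user
• $\beta_i$ = bias for item
• $\lambda$ = regularization parameter

[2] Johnson C. (2014) Logistic Matrix Factorization for Implicit Feedback Data. NIPS Workshop on Distributed Matrix Computations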
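The logistic variant can be sketched the same way. The code below only evaluates the model: the play probability for one (user, song) pair and the regularized, weighted log likelihood over a matrix. It assumes a weight matrix `W` that is 0 for unplayed cells and positive for played ones (standing in for the real function of plays, context, and recency) and an L2 penalty of `lam / 2`; both names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def play_probability(x_u, y_i, b_u, b_i):
    """Logistic probability that the user plays the song."""
    score = dot(x_u, y_i) + b_u + b_i
    return 1.0 / (1.0 + math.exp(-score))


def log_likelihood(W, X, Y, bu, bi, lam):
    """Weighted log likelihood of the binary preference matrix.

    For a cell with weight w and score s:
        w * log(p) + log(1 - p) = w * s - (1 + w) * log(1 + exp(s))
    so unplayed cells (w = 0) contribute log(1 - p) only.
    """
    total = 0.0
    for u, row in enumerate(W):
        for i, w in enumerate(row):
            s = dot(X[u], Y[i]) + bu[u] + bi[i]
            total += w * s - (1.0 + w) * math.log1p(math.exp(s))
    reg = sum(dot(x, x) for x in X) + sum(dot(y, y) for y in Y)
    return total - 0.5 * lam * reg


if __name__ == "__main__":
    # One user, two songs; the user played only the first song (weight 1).
    W = [[1.0, 0.0]]
    X = [[0.0, 0.0]]
    Y = [[0.0, 0.0], [0.0, 0.0]]
    print(play_probability(X[0], Y[0], 0.0, 0.0))
    print(log_likelihood(W, X, Y, [0.0], [0.0, 0.0], lam=0.1))
```

Training would then climb this objective with gradient ascent, alternating between user and item parameters.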


Slide 32

NLP Models on News and Blogs


Slide 33

NLP Models work great on Playlists!
Playlist itself is a document
Songs in playlist are words
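The "playlist as document, songs as words" idea can be illustrated with plain co-occurrence counts: two songs are treated as related when they appear in the same playlists, just as two words are related when they appear in the same documents. This is a deliberately simple stand-in, not the NLP models used in production; the function names and the toy playlists are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(playlists):
    """Count how often each pair of tracks shares a playlist."""
    counts = defaultdict(Counter)
    for playlist in playlists:
        # Deduplicate so a track repeated in one playlist counts once.
        for a, b in combinations(sorted(set(playlist)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def most_similar(counts, track, n=3):
    """Tracks that co-occur most often with the given track."""
    return [other for other, _ in counts[track].most_common(n)]


if __name__ == "__main__":
    # Toy playlists of hypothetical track IDs.
    playlists = [["a", "b", "c"], ["a", "b"], ["b", "d"]]
    counts = cooccurrence(playlists)
    print(most_similar(counts, "a", n=1))
```

Embedding or topic models generalize this by compressing the co-occurrence structure into dense vectors, which is what places songs into a latent space.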


Slide 34

Deep Learning on Audio [3] http://benanne.github.io/2014/08/05/spotify-cnns.html


Slide 35

Songs in a Latent Space representation
• normalized item-vectors


Slide 36

Songs in a Latent Space representation
• user-vector in same space
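Once songs and users live in the same latent space, recommending comes down to finding the item vectors closest to a user vector. The sketch below ranks items by cosine similarity, assuming that is the notion of closeness; the vectors, track names, and the `exclude` parameter (for already-heard tracks) are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def recommend(user_vec, item_vecs, n=2, exclude=()):
    """Rank items by cosine similarity to the user vector."""
    u = normalize(user_vec)
    scores = {
        track: dot(u, normalize(vec))
        for track, vec in item_vecs.items()
        if track not in exclude
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    item_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    user_vec = [2.0, 0.1]
    print(recommend(user_vec, item_vecs, n=2))
    print(recommend(user_vec, item_vecs, n=2, exclude={"a"}))
```

With tens of millions of songs, the exhaustive scan here would be replaced by an approximate nearest-neighbor index.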


Slide 37

Lesson 3: Don’t scale until you need to


Slide 38

Scaling to 100%: Rollout Challenges
‣ Create and publish 75M playlists every week
‣ Downloading and processing Facebook images
‣ Language translations


Slide 39

Scaling to 100%: Weekly refresh
‣ Time sensitive updates
‣ Refresh 75M playlists every Sunday night
‣ Take timezones into account
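Taking timezones into account means that "Sunday night" is a different UTC instant for each user. A minimal sketch, assuming the deadline is Monday 00:00 in the user's local time and that each user is described by a fixed UTC offset (a simplification that ignores daylight saving time); `next_refresh_utc` is a hypothetical helper, not part of the actual publishing flow.

```python
from datetime import datetime, timedelta, timezone


def next_refresh_utc(now_utc: datetime, utc_offset_hours: float) -> datetime:
    """Return the next Monday 00:00 in the user's local time, as a UTC instant."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_now = now_utc.astimezone(local_tz)
    days_ahead = (7 - local_now.weekday()) % 7  # Monday is weekday 0
    candidate = (local_now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if candidate <= local_now:
        # It is already Monday locally, so target the following week.
        candidate += timedelta(days=7)
    return candidate.astimezone(timezone.utc)


if __name__ == "__main__":
    now = datetime(2015, 11, 11, 12, 0, tzinfo=timezone.utc)  # a Wednesday
    for offset in (9, 0, -5):
        print(offset, next_refresh_utc(now, offset).isoformat())
```

Sorting users by this deadline lets the earliest timezones be published first, which spreads the 75M playlist writes over many hours instead of one spike.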


Slide 40

Discover Weekly publishing flow


Slide 41


Slide 42


Slide 43


Slide 44

What’s next? Iterating on content quality and interface enhancements


Slide 45

Iterating on quality and adding a feedback loop.


Slide 46

DW feedback comes at the expense of presentation bias.


Slide 47

Lesson 4: Users know best. In the end, AB Test everything!


Slide 48

Lesson 5 (final lesson!): Empower bottom-up innovation in your org and amazing things will happen.


Slide 49

Thank You! (btw, we’re hiring Machine Learning and Data Engineers, come chat with us!)


Slide 50

