The Technology Behind The Spotify Wrapped Feature


Music listening has been an integral part of our lives, and over the last few decades it has come a long way. Gone are the days when you had to buy vinyl records, cassettes and CDs (I used to collect a lot); some of you even bought digital downloads (Eminem, Green Day, etc. were all the rage at the time). Now all you have to do is download an app that lets you play music at any time of the day from the device in your pocket. Such apps are music streaming services that give you access to millions of songs for a subscription fee, billed monthly or annually, so that you don’t have to purchase individual songs or albums.

Spotify is probably the best music streaming service available out there (at least for most people). There are other big players like Apple Music, Amazon Music Unlimited and YouTube Music, but Spotify fares better than its competitors, according to this article.

It is no secret that Spotify has been one of Google Cloud Platform’s biggest customers since February 2016. In 2018, Spotify disclosed that it would spend at least $450 million on its Google Cloud infrastructure over the following three years. That was a massive investment for a music streaming service at the time.

Spotify’s Journey from on-premise to the Cloud

In December 2019, Spotify released that year’s Wrapped as an early Christmas gift to its users. The feature gives users a look back at their listening habits. The 2019 edition also introduced A Decade Wrapped, which showcases the songs, albums, artists and podcasts people tuned in to over the past 10 years.

From 2010 to 2019, you’ve likely discovered new tunes and podcasts, fallen back in love with old favorites, and maybe even grown to enjoy a new genre or two. That’s why this year, we’re not only bringing back your annual personalized “Spotify Wrapped,” but we’re also showcasing our users’ listening throughout the last decade.

Spotify

In 2019, Spotify ran the largest Google Cloud Dataflow pipeline job to date. Dataflow allowed the team to scale immensely with less operational overhead. Google released Cloud Dataflow in early 2015 as a cloud product based on two of its internal systems for batch and streaming data processing, FlumeJava and MillWheel (see the references at the end of this post). Dataflow introduced a unified model for batch and streaming that consolidates ideas from these earlier systems, and Google later donated the model and SDK code to the Apache Software Foundation as Apache Beam.
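To make the "unified model" idea a little more concrete, here is a minimal sketch in plain Python. It does not use the Beam or Dataflow SDKs, and it is not how Spotify builds Wrapped. The `Play` record, `count_plays` and `top_tracks` are hypothetical names invented for this illustration. The point is only that the same aggregation logic can be written once and fed either a bounded dataset (batch) or events arriving one at a time (streaming). A real streaming pipeline would also need windowing and triggers, which this sketch leaves out.

```python
from collections import Counter
from typing import Iterable, Iterator, List, NamedTuple


class Play(NamedTuple):
    """Hypothetical listening event; not a real Spotify schema."""
    user: str
    track: str
    year: int  # event time, reduced to a year for this sketch


def count_plays(events: Iterable[Play]) -> Counter:
    """Count plays per (user, year, track). Accepts any iterable of events."""
    counts: Counter = Counter()
    for event in events:
        counts[(event.user, event.year, event.track)] += 1
    return counts


def top_tracks(counts: Counter, user: str, year: int, n: int = 3) -> List[str]:
    """Return up to n most-played tracks for one user in one year."""
    per_track = Counter(
        {track: c for (u, y, track), c in counts.items() if u == user and y == year}
    )
    return [track for track, _ in per_track.most_common(n)]


# Bounded input (batch): the whole dataset is already at rest.
batch = [
    Play("ana", "song_a", 2019),
    Play("ana", "song_b", 2019),
    Play("ana", "song_a", 2019),
    Play("ana", "song_c", 2018),
]


def stream() -> Iterator[Play]:
    """Stream-style input: events are handed over one at a time."""
    yield from batch


batch_counts = count_plays(batch)
stream_counts = count_plays(stream())
```

Both calls go through the same `count_plays` function, so `top_tracks(batch_counts, "ana", 2019)` and `top_tracks(stream_counts, "ana", 2019)` give the same answer regardless of how the events were delivered.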

Spotify in Google Cloud

Spotify has built and open-sourced Scio, a Scala API for big data processing on Apache Beam and Google Cloud Dataflow. Spotify Wrapped 2019 included lists for both the year and the decade. Spotify states that it worked closely with Google Cloud’s engineering teams and specialists and learned from its previous mistakes, which helped it run one of the most complex jobs it has ever written.

Apart from Dataflow, Spotify also relied on Bigtable, which is ideal for storing very large amounts of single-keyed data with very low latency. This approach saved a lot of expense, and along the way the team developed several tools that it still uses today. These tools help process huge volumes of data in parallel, which also saves money.
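As a rough illustration of what "single-keyed data" means in practice, the sketch below uses an in-memory dictionary as a stand-in for a keyed store. It does not use the Bigtable client library, and the `KeyedStore` class, the `row_key` format and the column names are all assumptions made up for this example, not Spotify's actual schema. The idea it shows is that separate processing steps can each write their results under the same row key, and a later step can fetch everything about one user and year with a single keyed lookup.

```python
from collections import defaultdict
from typing import Dict


class KeyedStore:
    """In-memory stand-in for a single-keyed store (NOT the Bigtable client)."""

    def __init__(self) -> None:
        # row key -> {column name -> value}
        self._rows: Dict[str, Dict[str, int]] = defaultdict(dict)

    def write(self, key: str, column: str, value: int) -> None:
        """Store one value under a row key and column."""
        self._rows[key][column] = value

    def read(self, key: str) -> Dict[str, int]:
        """Return a copy of every column stored under a row key."""
        return dict(self._rows.get(key, {}))


def row_key(user_id: str, year: int) -> str:
    """Hypothetical row key layout: one row per user per year."""
    return f"{user_id}#{year}"


store = KeyedStore()

# Two independent steps write different columns to the same row.
store.write(row_key("ana", 2019), "minutes_listened", 1200)
store.write(row_key("ana", 2019), "top_artist_plays", 87)

# A later step reads the combined row with one lookup by key.
summary = store.read(row_key("ana", 2019))
```

Because every read and write is addressed by a single key, each lookup touches only one row, which is the access pattern the paragraph above says Bigtable is well suited to.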

More References

Google’s Internal Applications replaced by Dataflow

FlumeJava: https://research.google/pubs/pub35650/

MillWheel: https://research.google/pubs/pub41378/

Spotify’s Big Data processing Journey

https://engineering.atspotify.com/2017/10/16/big-data-processing-at-spotify-the-road-to-scio-part-1/

Spotify’s The Top Songs, Artists, Playlists, and Podcasts of 2019—and the Last Decade
