{"id":110,"date":"2020-08-05T12:35:04","date_gmt":"2020-08-05T12:35:04","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/08\/05\/netflixs-polynote-is-a-new-open-source-framework-to-build-better-data-science-notebooks\/"},"modified":"2020-08-05T12:35:04","modified_gmt":"2020-08-05T12:35:04","slug":"netflixs-polynote-is-a-new-open-source-framework-to-build-better-data-science-notebooks","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/08\/05\/netflixs-polynote-is-a-new-open-source-framework-to-build-better-data-science-notebooks\/","title":{"rendered":"Netflix\u2019s Polynote is a New Open Source Framework to Build Better Data Science Notebooks"},"content":{"rendered":"<div id=\"post-\">\n<div>\n<img src=\"https:\/\/i.ibb.co\/QKYMmnc\/rodriguez-polynote-0.png\" alt=\"Figure\" width=\"100%\">\n<\/div>\n<p>\u00a0<\/p>\n<p>Notebooks are a data scientist\u2019s best friend, but they can also be a nightmare to work with. For someone accustomed to modern integrated development environments (IDEs), working with notebooks feels like going back decades. Furthermore, most notebook environments are built primarily around Python and lack first-class support for other programming languages. 
A few days ago,\u00a0<a href=\"https:\/\/github.com\/polynote\/polynote\" rel=\"noopener noreferrer\" target=\"_blank\">Netflix open sourced Polynote<\/a>, a new notebook environment that addresses some of those challenges.<\/p>\n<p>Polynote was born out of the need to accelerate data science experimentation at Netflix. Over the years, Netflix has built\u00a0<a href=\"https:\/\/www.slideshare.net\/FaisalZakariaSiddiqi\/ml-infra-for-netflix-recommendations-ai-nextcon-talk\" rel=\"noopener noreferrer\" target=\"_blank\">a world-class machine learning platform<\/a>\u00a0based largely on JVM languages like Scala. Support for those languages in mainstream notebook technologies such as Jupyter is fairly basic, so the team needed a better solution. Polynote started from that basic requirement, but it also incorporates lessons learned from building one of the most ambitious notebook-based experimentation platforms in the data science world.<\/p>\n<p>\u00a0<\/p>\n<h3>Inside Netflix\u2019s Notebook-Driven Architecture<\/h3>\n<p>\u00a0<br \/>Over the last few years, Netflix has transformed its use of data science notebooks from an experimentation artifact into a key component of the machine learning lifecycle. Initially, Netflix adopted Jupyter notebooks as a data exploration and analysis tool. However, the engineering team quickly realized that Jupyter offered tangible advantages in runtime abstraction, extensibility, code interpretability and debugging that could have a major impact on data science workloads if used correctly. 
In order to expand the use of Jupyter as a data science runtime, the Netflix team needed to solve a few major challenges:<\/p>\n<ul>\n<li>\n<strong>The Code-Output Mismatch:<\/strong>\u00a0Notebooks change frequently and, in many cases, the output displayed in the environment no longer corresponds to the current code.\n<\/li>\n<li>\n<strong>The Server Requirement:<\/strong>\u00a0Notebooks typically require a notebook server to run, which poses an architectural challenge when they are adopted at scale.\n<\/li>\n<li>\n<strong>Scheduling:<\/strong>\u00a0Most data science models need to be executed periodically, but the tools for scheduling notebooks are still fairly limited.\n<\/li>\n<li>\n<strong>Parametrizing:<\/strong>\u00a0Notebooks are fairly static code environments, and passing input parameters to them is far from trivial.\n<\/li>\n<li>\n<strong>Integration Testing:<\/strong>\u00a0Notebooks are isolated code environments that are notoriously difficult to integrate with other notebooks. As a result, tasks like integration testing become a nightmare.\n<\/li>\n<\/ul>\n<p>To address those requirements, Netflix built a very ambitious architecture that enables the operationalization of Jupyter notebooks. The initial implementation included technologies such as\u00a0<a href=\"https:\/\/github.com\/nteract\/papermill\" rel=\"noopener noreferrer\" target=\"_blank\">Papermill<\/a>, which enables the parametrization of notebooks.<\/p>\n<div>\n<img src=\"https:\/\/i.ibb.co\/yS5J5Q2\/rodriguez-polynote-1.png\" alt=\"Figure\" width=\"100%\">\n<\/div>\n<p>\u00a0<\/p>\n<p>While the initial notebook architecture at Netflix was certainly ambitious, it was also constrained to Python programs. Now it was time to expand.<\/p>\n<p>\u00a0<\/p>\n<h3>Enter Polynote<\/h3>\n<p>\u00a0<br \/>Polynote is a multi-language notebook experimentation environment. 
In addition to Python, the current release supports languages such as SQL, Vega (for visualizations) and, of course, Scala. The platform is also integrated with data science infrastructure such as Apache Spark. At its core, Polynote includes the following capabilities:<\/p>\n<p>a)\u00a0<strong>Improved Editing Experience:<\/strong>\u00a0Polynote aims to provide an editing experience closer to that of modern IDEs.<br \/>b)\u00a0<strong>Multi-Language Support:<\/strong>\u00a0Polynote introduces first-class support for Scala and other languages used in data science environments.<br \/>c)\u00a0<strong>Data Visualization Improvements:<\/strong>\u00a0Polynote integrates native visualizations of notebook datasets without requiring much additional code.<br \/>d)\u00a0<strong>Configuration and Dependency Management:<\/strong>\u00a0Programs in languages like Scala often rely on complex package dependencies. Polynote saves the dependency configuration within the notebook itself, addressing some of the common challenges JVM developers face in this area.<br \/>e)\u00a0<strong>Reproducibility:<\/strong>\u00a0Combining code, data and execution results in a single document makes notebooks powerful, but also difficult to reproduce. Polynote treats reproducibility as a first-class capability of the framework.<\/p>\n<p>\u00a0<\/p>\n<h3>Improved Editing Experience<\/h3>\n<p>\u00a0<br \/>Polynote includes features common in IDEs, such as code auto-completion and syntax error highlighting, which improve the experience for data scientists and researchers building notebooks. 
Many of these editing capabilities are powered by the\u00a0<a href=\"https:\/\/microsoft.github.io\/monaco-editor\/\" rel=\"noopener noreferrer\" target=\"_blank\">Monaco<\/a>\u00a0editor, which also powers Visual Studio Code.<\/p>\n<div>\n<img src=\"https:\/\/i.ibb.co\/kxFXnW9\/rodriguez-polynote-2.png\" alt=\"Figure\" width=\"100%\">\n<\/div>\n<p>\u00a0<\/p>\n<h3>Multi-Language Support<\/h3>\n<p>\u00a0<br \/>Polynote not only supports multiple languages, it also allows them to be combined in a single notebook. In Polynote, each cell can use a different language. When a cell is run, the kernel provides the available typed input values to the cell\u2019s language interpreter. In turn, the interpreter returns the resulting typed output values to the kernel. This allows cells written in different languages to operate within the same context. The example below uses a Python library to compute an isotonic regression of a dataset generated with Scala.<\/p>\n<div>\n<img src=\"https:\/\/i.ibb.co\/y0zcMS8\/rodriguez-polynote-3.gif\" alt=\"Figure\" width=\"100%\">\n<\/div>\n<p>\u00a0<\/p>\n<h3>Data Visualization Improvements<\/h3>\n<p>\u00a0<br \/>Data visualizations are a common component of most notebook environments. However, Polynote takes visualization a step further by making it a native component of the platform, so developers do not need to write any code to visually explore a dataset.<\/p>\n<div>\n<img src=\"https:\/\/i.ibb.co\/SVDRtpS\/rodriguez-polynote-4.gif\" alt=\"Figure\" width=\"100%\">\n<\/div>\n<p>\u00a0<\/p>\n<h3>Configuration and Dependency Management<\/h3>\n<p>\u00a0<br \/>Most of the time, data scientists working in notebooks can rely on Python\u2019s relatively simple package management model to handle a program\u2019s dependencies. 
However, in JVM languages like Scala, dependency management can quickly become a nightmare. Polynote addresses that challenge by storing the configuration and dependency information directly in the notebook itself, rather than relying on external files. Additionally, Polynote provides a user-friendly Configuration section where users can set dependencies for each notebook.<\/p>\n<div>\n<img src=\"https:\/\/i.ibb.co\/PQPJcNR\/rodriguez-polynote-5.png\" alt=\"Figure\" width=\"100%\">\n<\/div>\n<p>\u00a0<\/p>\n<h3>Reproducibility<\/h3>\n<p>\u00a0<br \/>With Polynote, Netflix introduced a new code interpretation model instead of relying on a\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Read%E2%80%93eval%E2%80%93print_loop\" rel=\"noopener noreferrer\" target=\"_blank\">REPL<\/a>\u00a0model like traditional notebooks. One of the key capabilities of the new interpretation model is that it removes hidden state, which allows data scientists to copy cells within a notebook without carrying over any state from their previous position.<\/p>\n<div>\n<img src=\"https:\/\/i.ibb.co\/YZqc51k\/rodriguez-polynote-6.gif\" alt=\"Figure\" width=\"100%\">\n<\/div>\n<p>\u00a0<\/p>\n<p>Polynote is a new entrant in the competitive space of data science notebooks, but one that stands on its own merits. Its support for JVM-based languages could make Polynote a favorite of developers working on Spark infrastructure. The editing and reproducibility capabilities are also welcome enhancements to traditional notebook environments. 
Polynote is\u00a0<a href=\"https:\/\/github.com\/polynote\/polynote\" rel=\"noopener noreferrer\" target=\"_blank\">available on GitHub<\/a>, and you can also follow the project on its\u00a0<a href=\"https:\/\/polynote.org\/\" rel=\"noopener noreferrer\" target=\"_blank\">website<\/a>.<\/p>\n<p>\u00a0<br \/><a href=\"https:\/\/medium.com\/dataseries\/netflixs-polynote-is-a-new-open-source-framework-to-build-better-data-science-notebooks-4bdab6b8d0ae\" target=\"_blank\" rel=\"noopener noreferrer\">Original<\/a>. Reposted with permission.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/08\/netflix-polynote-open-source-framework-better-data-science-notebooks.html<\/p>\n","protected":false},"author":0,"featured_media":111,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/110"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=110"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/110\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/111"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=110"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=110"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=110"}],"curies":[{"
name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}