{"id":1577,"date":"2020-09-15T14:37:26","date_gmt":"2020-09-15T14:37:26","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/09\/15\/heres-what-you-need-to-look-for-in-a-model-server-to-build-ml-powered-services\/"},"modified":"2020-09-15T14:37:26","modified_gmt":"2020-09-15T14:37:26","slug":"heres-what-you-need-to-look-for-in-a-model-server-to-build-ml-powered-services","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/09\/15\/heres-what-you-need-to-look-for-in-a-model-server-to-build-ml-powered-services\/","title":{"rendered":"Here\u2019s what you need to look for in a model server to build ML-powered services"},"content":{"rendered":"<div id=\"post-\">\n<p><b>By <a href=\"https:\/\/twitter.com\/bigdata\" target=\"_blank\" rel=\"noopener noreferrer\">Ben Lorica<\/a> (helping organize #Raysummit) and <a href=\"https:\/\/people.eecs.berkeley.edu\/~istoica\/\" target=\"_blank\" rel=\"noopener noreferrer\">Ion Stoica<\/a> (Berkeley, Anyscale)<\/b><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/anyscale.com\/wp-content\/uploads\/2020\/08\/Harpa-Kalkofnsvegur-Reykjavi%CC%81k-Detail-1-crop.jpg\" width=\"90%\"><\/p>\n<p>Machine learning is being embedded in applications that involve many data types and data sources. This means that software developers from different backgrounds need to work on projects that involve ML. In our\u00a0<a href=\"https:\/\/anyscale.com\/blog\/five-key-features-for-a-machine-learning-platform\/\" target=\"_blank\" rel=\"noopener noreferrer\">previous post<\/a>, we listed key features that machine learning platforms need to have in order to meet current and future workloads. 
We also described MLOps, a set of practices focused on productionizing the machine learning lifecycle.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/lh4.googleusercontent.com\/UhAyncKNC02LpYrgL0FWlHkEGbMHnCWMXSt5e8i0YcgLULhyNw9iu799AFqgFKpE7WvRGIxtaJ_UI3K_Lk6T3ujYZss8nurPK_deYEbCf7bykxOmWrjcBHz-ROJv6M_DM54YGo6A\" width=\"90%\"><\/p>\n<p>In this post, we focus on model servers, the software at the heart of machine learning services that operate in real time or offline. There are two common approaches to serving machine learning models. The first embeds model evaluation in a web server (e.g., Flask) as an API endpoint dedicated to a prediction service.<\/p>\n<p>The second approach offloads model evaluation to a separate service. This is an active area for startups, and a growing number of options fall into this category. Offerings include services from\u00a0<em>cloud providers<\/em>\u00a0(<a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/how-it-works-hosting.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker<\/a>,\u00a0<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/how-to-deploy-and-where\" target=\"_blank\" rel=\"noopener noreferrer\">Azure<\/a>,\u00a0<a href=\"https:\/\/cloud.google.com\/ai-platform\/prediction\/docs\/deploying-models\" target=\"_blank\" rel=\"noopener noreferrer\">Google Cloud<\/a>),\u00a0<em>open source projects for model serving<\/em>\u00a0(<a href=\"https:\/\/docs.ray.io\/en\/master\/serve\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">Ray Serve<\/a>, Seldon, TorchServe, TensorFlow Serving, etc.),\u00a0<em>proprietary software<\/em>\u00a0(SAS, Datatron, ModelOp, etc.), and bespoke solutions, usually written in a general-purpose framework.<\/p>\n<p>While machine learning can be used for one-off projects, most developers seek to embed machine learning across their products and services. 
Model servers are important components of the software infrastructure for productionizing machine learning, and companies need to evaluate their options carefully. This post focuses on key features companies should look for in a model server.<\/p>\n<p>\u00a0<\/p>\n<h3>Support for popular toolkits<\/h3>\n<p>\u00a0<\/p>\n<p>Your model server is probably separate from your model training system, so choose one that can load trained model artifacts produced by a range of popular tools. Developers and machine learning engineers build models using many different libraries, including ones for deep learning (PyTorch, TensorFlow) and for machine learning and statistics (scikit-learn, XGBoost, SAS,\u00a0<a href=\"https:\/\/www.statsmodels.org\/stable\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">statsmodels<\/a>). Model builders also continue to use a variety of programming languages. While Python has emerged as the dominant language for machine learning, other languages such as R, Java, Scala, Julia, and SAS still have many users. More recently, many companies have adopted data science workbenches like Databricks, Cloudera, Dataiku, Domino Data Lab, and others.<\/p>\n<p>\u00a0<\/p>\n<h3>A GUI for model deployment and more<\/h3>\n<p>\u00a0<\/p>\n<p>Developers may use a command line interface, but enterprise users will want a graphical user interface that guides them through the process of deploying models and highlights the different stages of the machine learning lifecycle. As their deployment processes mature, teams may migrate more toward scripting and automation. 
Model servers with user interfaces include\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=iTVY4GI1bhs\" target=\"_blank\" rel=\"noopener noreferrer\">Seldon Deploy<\/a>,\u00a0<a href=\"https:\/\/www.sas.com\/en_us\/software\/model-manager.html\" target=\"_blank\" rel=\"noopener noreferrer\">SAS Model Manager<\/a>,\u00a0<a href=\"https:\/\/www.datatron.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Datatron<\/a>, and others that target enterprise users.<\/p>\n<p>\u00a0<\/p>\n<h3>Easy to operate and deploy, but with high performance and scalability<\/h3>\n<p>\u00a0<\/p>\n<p>As machine learning gets embedded in critical applications, companies will need low-latency model servers that can power large-scale prediction services. Companies like Facebook and Google have machine learning services that provide real-time responses\u00a0<a href=\"https:\/\/engineering.fb.com\/ml-applications\/transitioning-entirely-to-neural-machine-translation\/\" target=\"_blank\" rel=\"noopener noreferrer\">billions of times each day<\/a>. While these might be extreme cases, many companies also deploy applications like\u00a0<a href=\"https:\/\/www.sigarch.org\/deep-learning-its-not-all-about-recognizing-cats-and-dogs\/\" target=\"_blank\" rel=\"noopener noreferrer\">recommendation and personalization systems<\/a>\u00a0that interact with many users on a daily basis. With the availability of open source software like Ray Serve, companies now have access to low-latency model servers that can scale to many machines.<\/p>\n<p>Most model servers use a microservice architecture and are accessible through a REST or gRPC API. This makes it easier to integrate machine learning (\u201crecommender\u201d) with other services (\u201cshopping cart\u201d). Depending on your setup, you may want a model server that lets you deploy models in the cloud, on-premises, or both. 
Your model server also has to support infrastructure features like auto-scaling, resource management, and hardware provisioning.<\/p>\n<p>Some model servers have added recent innovations that reduce complexity, boost performance, and provide flexible options for integrating with other services. With the introduction of a new Tensor data type, RedisAI supports\u00a0<em>data locality<\/em>\u00a0\u2013 a feature that enables users to get and set Tensors from their favorite client and \u201crun their AI model where their data lives.\u201d Ray Serve\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=fABgQ5hA4qI&amp;feature=youtu.be&amp;t=1361\" target=\"_blank\" rel=\"noopener noreferrer\">brings model evaluation logic closer to business logic\u00a0<\/a>by giving developers end-to-end control from the API endpoint to model evaluation, and back to the API endpoint. In addition, Ray Serve is easy to operate and as easy to deploy as a simple web server.<\/p>\n<p>\u00a0<\/p>\n<h3>Includes tools for testing, deployment, and rollouts<\/h3>\n<p>\u00a0<\/p>\n<p>Once a model is trained, it has to be reviewed and tested before it gets deployed. Seldon Deploy, Datatron, and other model servers have capabilities that let you test models with a single prediction or with a load test. To facilitate error identification and testing, these model servers also let you upload test data and visualize test predictions.<\/p>\n<p>Once your model has been reviewed and tested, your model server should give you the ability to safely promote and demote models. 
Other popular rollout patterns include:<\/p>\n<ul>\n<li>\n<a href=\"https:\/\/rollout.io\/blog\/canary-deployment\/\" target=\"_blank\" rel=\"noopener noreferrer\">Canary<\/a>: A small share of requests is sent to the new model, while the bulk of requests is routed to an existing model.<\/li>\n<li>\n<a href=\"https:\/\/www.getambassador.io\/docs\/latest\/topics\/using\/shadowing\/\" target=\"_blank\" rel=\"noopener noreferrer\">Shadowing<\/a>: Production traffic is copied to a non-production service to test the model before running it in production.<\/li>\n<\/ul>\n<p>Ideally, rollout tools are fully automatable, so your deployment tools can be plugged into your CI\/CD or MLOps process.<\/p>\n<p>\u00a0<\/p>\n<h3>Support for complex deployment patterns<\/h3>\n<p>\u00a0<\/p>\n<p>As your usage of machine learning grows, your model server should be able to support many models in production, as well as complex deployment patterns that involve deploying more than one model at a time. It should support a variety of patterns, including:<\/p>\n<ul>\n<li>\n<strong>A\/B tests<\/strong>: A fraction of predictions use one model, and the rest go to another model.<\/li>\n<li>\n<strong>Ensembles<\/strong>: Multiple models are combined to form a more powerful predictive model.<\/li>\n<li>\n<strong>Cascade<\/strong>: If a baseline model produces a prediction with low confidence, traffic is routed to an alternative model. 
Another use case is refinement: detect whether there is a car in the picture, and, if there is one, send the picture to a model that reads the car\u2019s license plate.<\/li>\n<li>\n<strong>Multi-armed bandit<\/strong>:\u00a0A form of reinforcement learning in which traffic is allocated across several competing models.<\/li>\n<\/ul>\n<p>\u00a0<\/p>\n<h3>Out-of-the-box metrics and monitoring<\/h3>\n<p>\u00a0<\/p>\n<p>Machine learning models can degrade over time, and it\u2019s important to have systems in place that indicate when models become less accurate or begin demonstrating bias and other unexpected behavior. Your model server should emit performance, usage, and other custom metrics that can be consumed by visualization and real-time monitoring tools. Some model servers are beginning to provide advanced capabilities, including anomaly detection and alerts. There are even startups (<a href=\"https:\/\/superwise.ai\/\" target=\"_blank\" rel=\"noopener noreferrer\">Superwise<\/a>,\u00a0<a href=\"https:\/\/techcrunch.com\/2020\/02\/18\/tubemogul-execs-launch-arize-ai-for-ai-troublehsooting\/\" target=\"_blank\" rel=\"noopener noreferrer\">Arize<\/a>) that focus on using \u201cmachine learning to monitor machine learning.\u201d While these are currently specialized tools that are separate from and need to be integrated with model servers, it\u2019s quite likely that some model servers will build advanced monitoring and observability capabilities into their offerings.<\/p>\n<p>\u00a0<\/p>\n<h3>Integrates with model management tools<\/h3>\n<p>\u00a0<\/p>\n<p>As you deploy more models to production, your model server will need to integrate with your model management tools. 
These tools come under many labels \u2013 access control, model catalog, model registry, model governance dashboard \u2013 but in essence, they provide you with a 360-degree view of past and current models.<\/p>\n<p>Because models will need to be periodically inspected, your model server should interface with services for auditing and reproducing models.\u00a0<em>Model versioning<\/em>\u00a0is now standard and comes with most of the model servers we examined. Datatron has a model governance dashboard that provides tools for auditing underperforming models. Many model servers have\u00a0<em>data lineage<\/em>\u00a0services that record when requests were sent and what the model inputs and outputs were. Debugging and auditing models also require a refined understanding of their key drivers. Seldon Deploy integrates with\u00a0<a href=\"https:\/\/github.com\/SeldonIO\/alibi\" target=\"_blank\" rel=\"noopener noreferrer\">an open source tool<\/a>\u00a0for model inspection and explainability.<\/p>\n<p>\u00a0<\/p>\n<h3>Unifies batch and online scoring<\/h3>\n<p>\u00a0<\/p>\n<p>Suppose you have updated your model, or you have received a large number of new records. In either case, you may need to apply your model to a large dataset. You will need a model server that can score large datasets efficiently in mini-batches, as well as provide low-latency online scoring (e.g., Ray Serve supports both batch and online scoring).<\/p>\n<p>\u00a0<\/p>\n<h3>Summary<\/h3>\n<p>\u00a0<\/p>\n<p>As machine learning gets embedded in more software applications, companies need to select their model servers carefully. While\u00a0<a href=\"https:\/\/docs.ray.io\/en\/master\/serve\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">Ray Serve<\/a>\u00a0is a relatively new open source model server, it already has many of the features we\u2019ve listed in this post. Ray Serve is a scalable, simple, and flexible tool for deploying, operating, and monitoring machine learning models. 
As we noted in our\u00a0<a href=\"https:\/\/anyscale.com\/blog\/five-key-features-for-a-machine-learning-platform\/\" target=\"_blank\" rel=\"noopener noreferrer\">previous post<\/a>, we believe that Ray and Ray Serve will be foundations of many ML platforms in the future.<\/p>\n<p><a href=\"https:\/\/anyscale.com\/blog\/heres-what-you-need-to-look-for-in-a-model-server-to-build-ml-powered-services\/\" target=\"_blank\" rel=\"noopener noreferrer\">Original<\/a>. Reposted with permission.<\/p>\n<p>\u00a0<\/p>\n<p><strong>Bio:<\/strong> <a href=\"https:\/\/twitter.com\/bigdata\" target=\"_blank\" rel=\"noopener noreferrer\">Ben Lorica<\/a>\u00a0organizes #SparkAISummit and #raysummit, and has been the Program Chair of Strataconf and OReillyAI.<\/p>\n<p><b>Related:<\/b><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/09\/model-server-build-ml-powered-services.html<\/p>\n","protected":false},"author":0,"featured_media":1578,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1577"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=1577"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1577\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/1578"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=1577"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\
/wp-json\/wp\/v2\/categories?post=1577"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=1577"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}