{"id":2931,"date":"2020-10-07T12:09:10","date_gmt":"2020-10-07T12:09:10","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/10\/07\/5-challenges-to-scaling-machine-learning-models\/"},"modified":"2020-10-07T12:09:10","modified_gmt":"2020-10-07T12:09:10","slug":"5-challenges-to-scaling-machine-learning-models","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/10\/07\/5-challenges-to-scaling-machine-learning-models\/","title":{"rendered":"5 Challenges to Scaling Machine Learning Models"},"content":{"rendered":"<div id=\"post-\">\n<p><b>By <a href=\"https:\/\/www.sigmoid.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Sigmoid Analytics<\/a><\/b><\/p>\n<p><img class=\"aligncenter\" data-sizes=\"(max-width: 800px) 100vw, 1140px\" src=\"https:\/\/www.sigmoid.com\/wp-content\/uploads\/2020\/08\/sigmoid-blogs-putting_ml_models_production.jpg\" width=\"100%\"><br \/>\u00a0<\/p>\n<p>Machine learning (ML) models are designed for defined business goals. Productionizing an ML model means hosting, scaling, and running it on top of relevant datasets. ML models in production also need to be resilient and flexible enough to accommodate future changes and feedback. A recent study by <a href=\"https:\/\/www.rtinsights.com\/forrester-ml-development\/\" rel=\"noopener noreferrer\" target=\"_blank\">Forrester<\/a> identifies improving customer experience, profitability, and revenue growth as the key goals organizations plan to achieve with ML initiatives.<\/p>\n<p>Despite worldwide acclaim, ML models are hard to translate into tangible business gains. A plethora of engineering, data, and business concerns become bottlenecks when handling live data and putting ML models into production. In our poll, 43% of respondents said they hit roadblocks in ML model production and integration. 
It is important to ensure that ML models deliver the objectives businesses intend, because their adoption across organizations worldwide is increasing at an unprecedented rate, thanks to robust and inexpensive open source infrastructure. <a href=\"https:\/\/www.gartner.com\/smarterwithgartner\/gartner-predicts-the-future-of-ai-technologies\/\" rel=\"noopener noreferrer\" target=\"_blank\">Gartner<\/a> predicts that over 40% of the world\u2019s leading organizations plan to deploy AI solutions by the end of 2020. To understand the common pitfalls in productionizing ML models, let\u2019s dive into the top 5 challenges that organizations face.<\/p>\n<p>\u00a0<\/p>\n<h3><b>1. Complexities with Data<\/b><\/h3>\n<p>\u00a0<br \/>Training an ML model can require about a million relevant records, and it cannot be just any data. Data feasibility and predictability risks come into the picture. Assessing whether we have relevant datasets, and whether we get them fast enough to make predictions on them, isn\u2019t straightforward. Getting contextual data is also a problem. In one of Sigmoid\u2019s ML scaling projects with Yum Brands, some of the company\u2019s brands, such as KFC (with a new loyalty program), didn\u2019t have enough customer data. Having data isn\u2019t enough either. Most ML teams start with a non-data-lake approach and train ML models on top of their traditional data warehouses. With traditional data systems, data scientists often spend 80% of their time cleaning and managing data rather than training models. A strong governance system and data cataloging are also required so that data is shared transparently and cataloged well enough to be reused. Because of this data complexity, the return on an ML model relative to the cost of maintaining and running it diminishes over time.<\/p>\n<p>\u00a0<\/p>\n<h3><b>2. 
Engineering and Deployment<\/b><\/h3>\n<p>\u00a0<br \/>Once the data is available, the infrastructure and technology stack have to be finalized according to the use case and the need for future resilience. ML systems can be quite difficult to engineer. A wide breadth of technology is available in the machine learning space. Standardizing technology stacks across different areas, while choosing each one so that it doesn\u2019t make productionizing harder, is crucial for the model\u2019s success. For instance, data scientists may use tools like Pandas and code in Python. But these don\u2019t necessarily translate well to a production environment where Spark or PySpark is more desirable. Improperly engineered technical solutions can be costly. Lifecycle challenges, along with managing and stabilizing multiple models in production, can become unwieldy too.<\/p>\n<p>\u00a0<\/p>\n<h3><b>3. Integration Risks<\/b><\/h3>\n<p>\u00a0<br \/>A scalable production environment that is well integrated with different datasets and modeling technologies is crucial for the ML model to be successful. Integrating different teams and operational systems is always challenging. Complicated codebases have to be made into well-structured systems ready to be pushed into production. In the absence of a standardized process for taking a model to production, the team can get stuck at any stage. Workflow automation is necessary for different teams to integrate into the workflow system and test. If the model isn\u2019t tested at the right stage, the entire ecosystem would have to be fixed at the end. Technology stacks have to be standardized; otherwise, integration could be a real nightmare. Integration is also a crucial time to make sure that the machine learning experimentation framework isn\u2019t a one-time wonder. 
Otherwise, if the business environment changes or a catastrophic event occurs, the model would cease to provide value.<\/p>\n<p>\u00a0<\/p>\n<h3><b>4. Testing and Model Sustenance<\/b><\/h3>\n<p>\u00a0<br \/>Testing machine learning models is difficult, but it is at least as important as the other steps of the production process. Understanding results, running health checks, monitoring model performance, watching out for data anomalies, and retraining the model together close the productionizing cycle. Even after running the tests, a proper machine learning lifecycle management tool might be needed to watch for issues that tests don\u2019t reveal.<\/p>\n<p>\u00a0<\/p>\n<h3><b>5. Assigning Roles and Communication<\/b><\/h3>\n<p>\u00a0<br \/>Maintaining transparent communication across data science, data engineering, DevOps, and other relevant teams is pivotal to ML models\u2019 success. But assigning roles, granting detailed access, and monitoring every team is complex. Strong collaboration and an overdose of communication are essential to identify risks across different areas at an early stage. Keeping data scientists deeply involved can also decide the future of the ML model.<\/p>\n<p>In addition to the above challenges, unforeseen events such as COVID-19 have to be watched for. When customers\u2019 buying behavior suddenly changes, the solutions from the past cease to apply, and the absence of new data to adequately train models becomes a roadblock. Scaling ML models isn\u2019t easy. Watch out for our next piece on the best practices to productionize ML models at scale.<\/p>\n<p>\u00a0<br \/><a href=\"https:\/\/www.sigmoid.com\/blogs\/5-challenges-to-be-prepared-for-before-scaling-machine-learning-models\/\" target=\"_blank\" rel=\"noopener noreferrer\">Original<\/a>. 
Reposted with permission.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/10\/5-challenges-scaling-machine-learning-models.html<\/p>\n","protected":false},"author":0,"featured_media":2932,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/2931"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=2931"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/2931\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/2932"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=2931"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=2931"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=2931"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}