{"id":8188,"date":"2021-04-05T00:18:08","date_gmt":"2021-04-05T00:18:08","guid":{"rendered":"https:\/\/wealthrevelation.com\/data-science\/2021\/04\/05\/easy-automl-in-python\/"},"modified":"2021-04-05T00:18:08","modified_gmt":"2021-04-05T00:18:08","slug":"easy-automl-in-python","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2021\/04\/05\/easy-automl-in-python\/","title":{"rendered":"Easy AutoML in Python"},"content":{"rendered":"<div id=\"post-\">\n   <!-- post_author Dylan Sherry -->  <\/p>\n<p><b>By <a href=\"https:\/\/www.linkedin.com\/in\/dylansherry\/\" target=\"_blank\" rel=\"noopener\">Dylan Sherry<\/a>, EvalML Team Lead<\/b><\/p>\n<p><img alt=\"Easy, Open-Source AutoML in Python with EvalML\" class=\"aligncenter\" src=\"https:\/\/innovation.alteryx.com\/content\/images\/size\/w2000\/2021\/02\/evalml_opengraph-3.png\" width=\"100%\"><\/p>\n<p>Alteryx hosts two open-source projects for modeling.<\/p>\n<p><strong><a href=\"https:\/\/featuretools.alteryx.com\/en\/stable\/\" rel=\"noopener\" target=\"_blank\">Featuretools<\/a><\/strong>\u00a0is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.<\/p>\n<p><strong><a href=\"https:\/\/github.com\/alteryx\/compose\" rel=\"noopener\" target=\"_blank\">Compose<\/a><\/strong>\u00a0is a tool for automated prediction engineering. It allows you to structure prediction problems and generate labels for supervised learning.<\/p>\n<p>We\u2019ve seen Featuretools and Compose enable users to easily combine multiple tables into transformed and aggregated features for machine learning, and to define time series supervised machine learning use-cases.<\/p>\n<p>The question we then asked was: what happens next? 
How can users of Featuretools and Compose build machine learning models in a simple and flexible way?<\/p>\n<p>We\u2019re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem.\u00a0<strong><a href=\"https:\/\/github.com\/alteryx\/evalml\" rel=\"noopener\" target=\"_blank\">EvalML<\/a><\/strong>\u00a0is a library for automated machine learning (AutoML) and model understanding, written in Python.<\/p>\n<div>\n<pre><code>import evalml\r\n\r\n# obtain features, a target and a problem type for that target\r\nX, y = evalml.demos.load_breast_cancer()\r\nproblem_type = 'binary'\r\nX_train, X_test, y_train, y_test = evalml.preprocessing.split_data(\r\n    X, y, problem_type=problem_type, test_size=.2)\r\n\r\n# perform a search across multiple pipelines and hyperparameters,\r\n# using only the training split so the test split stays held out\r\nautoml = evalml.automl.AutoMLSearch(\r\n    X_train=X_train, y_train=y_train, problem_type=problem_type)\r\nautoml.search()\r\n\r\n# the best pipeline is already refitted on the entire training data\r\nbest_pipeline = automl.best_pipeline\r\nbest_pipeline.predict(X_test)<\/code><\/pre>\n<\/div>\n<div><img src=\"https:\/\/innovation.alteryx.com\/content\/images\/2021\/02\/automl_standard.gif\" alt=\"Figure\" width=\"100%\"><br \/><span><\/p>\n<p>EvalML&#8217;s AutoML search in action<\/p>\n<p><\/span><\/div>\n<p>\u00a0<\/p>\n<p>EvalML provides a simple, unified interface for building machine learning models and for using those models to generate insights and make accurate predictions. EvalML provides access to multiple modeling libraries under the same API. EvalML supports a variety of supervised machine learning problem types, including regression, binary classification and multiclass classification. Custom objective functions let users phrase their search for a model directly in terms of what they value. Above all, we\u2019ve aimed to make EvalML stable and performant, with ML performance testing on every release.<\/p>\n<p>\u00a0<\/p>\n<h3>What\u2019s Cool about EvalML<\/h3>\n<p>\u00a0<br \/><strong>1. 
Simple Unified Modeling API<\/strong><\/p>\n<p>EvalML reduces the amount of effort it takes to get to an accurate model, saving time and reducing complexity.<\/p>\n<p>EvalML pipelines produced by AutoML include preprocessing and feature engineering steps out of the box. Once users have identified the target column of the data which they\u2019d like to model, EvalML\u2019s AutoML runs a search algorithm to train and score a collection of models. Users can then select one or more models from that collection and use them for insight-driven analysis or to generate predictions.<\/p>\n<p>EvalML was designed to work well with\u00a0<a href=\"https:\/\/featuretools.com\/?__hstc=142826602.43730bd3179999cf11c14fbc47b01062.1613430843886.1613430843886.1613430843886.1&amp;__hssc=142826602.1.1613430843886&amp;__hsfp=264117289\" rel=\"noopener\" target=\"_blank\">Featuretools<\/a>, which can integrate data from multiple tables and generate features to turbocharge ML models, and with\u00a0<a href=\"https:\/\/compose.alteryx.com\/\" rel=\"noopener\" target=\"_blank\">Compose<\/a>, a tool for label engineering and time series aggregation. EvalML users can easily control how EvalML treats each input feature: as a numeric feature, a categorical feature, text, date-time, etc.<\/p>\n<div><img src=\"https:\/\/innovation.alteryx.com\/content\/images\/2021\/02\/image-1.png\" alt=\"Figure\" width=\"100%\"><br \/><span><\/p>\n<p>You can use Compose and Featuretools with EvalML to build machine learning models<\/p>\n<p><\/span><\/div>\n<p>\u00a0<\/p>\n<p>EvalML models are represented using a pipeline data structure, composed of a graph of components. Every operation applied to your data by AutoML is recorded in the pipeline. This makes it easy to move from selecting a model to deploying it. It&#8217;s also easy to define custom components, pipelines and objectives in EvalML, whether for use in AutoML or as standalone elements.<\/p>\n<p>\u00a0<br \/><strong>2. 
Domain-Specific Objective Functions<\/strong><\/p>\n<p>EvalML supports defining custom objective functions which you can tailor to match your data and your domain. This allows you to articulate what makes a model valuable in your domain, and then to use AutoML to find models which deliver that value.<\/p>\n<p>Custom objectives are used to rank models on the AutoML leaderboard during and after the search process. Using a custom objective helps guide the AutoML search towards the models with the highest impact. AutoML also uses custom objectives to tune the classification threshold of binary classification models.<\/p>\n<p>The EvalML documentation provides\u00a0<a href=\"https:\/\/evalml.alteryx.com\/en\/v0.18.1\/demos\/lead_scoring.html\" rel=\"noopener\" target=\"_blank\">examples of custom objectives<\/a>\u00a0and how to use them effectively.<\/p>\n<p>\u00a0<br \/><strong>3. Model Understanding<\/strong><\/p>\n<p>EvalML grants access to a variety of models and tools for model understanding. Currently supported are feature importance and permutation importance, partial dependence, precision-recall curves, confusion matrices, ROC curves, prediction explanations, and binary classifier threshold optimization.<\/p>\n<div><img src=\"https:\/\/innovation.alteryx.com\/content\/images\/2021\/02\/Screen-Shot-2021-02-04-at-7.56.29-PM.png\" alt=\"Figure\" width=\"100%\"><br \/><span><\/p>\n<p><\/span><\/div>\n<p>\u00a0<\/p>\n<p>\u00a0<br \/><strong>4. Data Checks<\/strong><\/p>\n<p>EvalML&#8217;s data checks can catch common problems with your data prior to modeling, before they cause model quality problems or mysterious bugs and stack traces. 
Current data checks include a simple approach to detecting\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Leakage_(machine_learning)\" rel=\"noopener\" target=\"_blank\">target leakage<\/a> (where the model is given access during training to information which won\u2019t be available at prediction time), as well as detection of invalid datatypes, high class imbalance, highly null columns, constant columns, and columns which are likely an ID and not useful for modeling.<\/p>\n<p><img alt=\"\" class=\"aligncenter\" src=\"https:\/\/innovation.alteryx.com\/content\/images\/2021\/02\/target_leakage_2.gif\" width=\"100%\"><\/p>\n<p>\u00a0<\/p>\n<h3>Getting Started Using EvalML<\/h3>\n<p>\u00a0<br \/>You can get started with EvalML by visiting\u00a0<a href=\"http:\/\/evalml.alteryx.com\/\" rel=\"noopener\" target=\"_blank\">our documentation page<\/a>, where we have\u00a0<a href=\"https:\/\/evalml.alteryx.com\/en\/stable\/install.html\" rel=\"noopener\" target=\"_blank\">installation instructions<\/a>\u00a0as well as\u00a0<a href=\"https:\/\/evalml.alteryx.com\/en\/stable\/tutorials.html\" rel=\"noopener\" target=\"_blank\">tutorials<\/a>\u00a0which show examples of how to use EvalML,\u00a0<a href=\"https:\/\/evalml.alteryx.com\/en\/stable\/user_guide.html\" rel=\"noopener\" target=\"_blank\">a user guide<\/a>\u00a0which describes the components and core concepts of EvalML, an\u00a0<a href=\"https:\/\/evalml.alteryx.com\/en\/stable\/api_reference.html\" rel=\"noopener\" target=\"_blank\">API reference<\/a>\u00a0and more. The EvalML codebase lives at\u00a0<a href=\"https:\/\/github.com\/alteryx\/evalml\" rel=\"noopener\" target=\"_blank\">https:\/\/github.com\/alteryx\/evalml<\/a>. To get in touch with the team, check out our\u00a0<a href=\"https:\/\/join.slack.com\/t\/alteryx-oss\/shared_invite\/zt-6inxevps-RSbpr9lsACE1kObXz4rIuA\" rel=\"noopener\" target=\"_blank\">open-source Slack<\/a>. 
We are actively contributing to the repository and will respond to any issues you post.<\/p>\n<p>\u00a0<\/p>\n<h3>What\u2019s Next?<\/h3>\n<p>\u00a0<br \/>EvalML has an active feature roadmap, including time series modeling, parallel evaluation of pipelines during AutoML, upgrades to the AutoML algorithm, new model types and preprocessing steps, tools for model debugging and model deployment, support for anomaly detection, and much more.<\/p>\n<p>Want to hear more? If you\u2019re interested in hearing about updates as the project continues, please take a moment to follow this blog, star\u00a0<a href=\"https:\/\/github.com\/alteryx\/evalml\" rel=\"noopener\" target=\"_blank\">our repo in GitHub<\/a>, and stay tuned for more features and content on the way!<\/p>\n<p>\u00a0<br \/><b>Bio: <a href=\"https:\/\/www.linkedin.com\/in\/dylansherry\/\" target=\"_blank\" rel=\"noopener\">Dylan Sherry<\/a><\/b> is a software engineer who leads the team building the EvalML AutoML package. Dylan has a decade of experience working on automated modeling technologies.<\/p>\n<p><a href=\"https:\/\/innovation.alteryx.com\/introducing-evalml\/\" target=\"_blank\" rel=\"noopener\">Original<\/a>. 
Reposted with permission.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2021\/04\/easy-automl-python.html<\/p>\n","protected":false},"author":0,"featured_media":8189,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8188"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8188"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8188\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8189"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8188"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8188"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8188"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}