{"id":8081,"date":"2021-01-11T17:31:33","date_gmt":"2021-01-11T17:31:33","guid":{"rendered":"https:\/\/wealthrevelation.com\/data-science\/2021\/01\/11\/5-tools-for-effortless-data-science\/"},"modified":"2021-01-11T17:31:33","modified_gmt":"2021-01-11T17:31:33","slug":"5-tools-for-effortless-data-science","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2021\/01\/11\/5-tools-for-effortless-data-science\/","title":{"rendered":"5 Tools for Effortless Data Science"},"content":{"rendered":"<div id=\"post-\">\n   <!-- post_author Nicole Janeway Bills -->  <\/p>\n<div><img src=\"https:\/\/miro.medium.com\/max\/4512\/1*qj2QMbiaJAm64-gNE0j7SA.jpeg\" alt=\"Figure\" width=\"100%\"><br \/><span><\/p>\n<p><\/span><\/div>\n<p>\u00a0<\/p>\n<p>In Stephen Covey\u2019s masterful\u00a0<em>7 Habits of Highly Effective People<\/em>, the seventh habit is \u201csharpen the saw.\u201d This refers to enhancing our assets to seek continuous improvement in our work. As Abe Lincoln said,<\/p>\n<blockquote>\n<p>\nGive me eight hours to chop down a tree, and I will spend the first six sharpening the saw.\n<\/p>\n<\/blockquote>\n<p>\u00a0<\/p>\n<p>Better tools to\u00a0<strong>structure, simplify, and broaden<\/strong>\u00a0our Data Science work will make us more effective thinkers, decisionmakers, and practitioners.<\/p>\n<p>In this article, we\u2019ll explore how to sharpen our Data Science saws \u2014 and also investigate the unanswered question of who is handing out saws to so many motivational speakers.<\/p>\n<p>Here are five tools for the practice of effortless Data Science.<\/p>\n<p>\u00a0<\/p>\n<h3>#1 \u2014 Cookiecutter<\/h3>\n<p>\u00a0<br \/><strong>Usecase<\/strong>: structure the repository of your Data Science project with this pre-built file structure setup.<\/p>\n<p>Data scientists should be organized in order to gather insights through repeatable projects.\u00a0<a href=\"https:\/\/drivendata.github.io\/cookiecutter-data-science\/\" 
rel=\"noopener\" target=\"_blank\">Cookiecutter by DrivenData<\/a>\u00a0helps us share and execute Data Science tasks with an organized repository structure. To get started, simply run\u00a0<code>cookiecutter\u00a0<a href=\"https:\/\/github.com\/drivendata\/cookiecutter-data-science\" rel=\"noopener\" target=\"_blank\">https:\/\/github.com\/drivendata\/cookiecutter-data-science<\/a><\/code>\u00a0from the command line. This creates the Cookiecutter file structure.<\/p>\n<p>Beginners benefit from the expertise of the DrivenData team in building best practices into this repo structure. Experts can use this template as a flexible jump-start to their projects.<\/p>\n<div><img src=\"https:\/\/miro.medium.com\/max\/572\/1*BKdsm-KNLzGr_tNFJst8nA.png\" alt=\"Figure\" width=\"80%\"><br \/><span><\/p>\n<p><\/span><\/div>\n<p>\u00a0<\/p>\n<p>Ultimately, Cookiecutter promotes logical standardization. That makes it easy for you, your collaborators, and project stakeholders to find data, notebooks, reports, visualizations, etc. Cookiecutter promotes reproducibility and code quality. 
Setting up your Data Science experiment with Cookiecutter is fast and supremely useful.<\/p>\n<p>Two additional tools are referenced in the directory structure:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.sphinx-doc.org\/en\/master\/\" rel=\"noopener\" target=\"_blank\">Sphinx<\/a>\u00a0\u2014 documentation generator that will translate a set of plain text source files into various output formats, automatically generating cross-references\n<\/li>\n<li><a href=\"https:\/\/tox.readthedocs.io\/en\/latest\/\" rel=\"noopener\" target=\"_blank\">Tox<\/a>\u00a0\u2014<a href=\"https:\/\/towardsdatascience.com\/10-underrated-python-skills-dfdff5741fdf\" rel=\"noopener\" target=\"_blank\">\u00a0virtualenv<\/a>\u00a0management and test command-line tool to ensure that packages will install correctly with different Python versions and interpreters; it can also act as a frontend to Continuous Integration servers\n<\/li>\n<\/ul>\n<p><strong>How to use:\u00a0<\/strong>start your next project with\u00a0<code>cookiecutter\u00a0<a href=\"https:\/\/github.com\/drivendata\/cookiecutter-data-science\" rel=\"noopener\" target=\"_blank\">https:\/\/github.com\/drivendata\/cookiecutter-data-science<\/a><\/code>.<\/p>\n<p>\u00a0<\/p>\n<h3>#2 \u2014 Deon<\/h3>\n<p>\u00a0<br \/><strong>Usecase<\/strong>: address ethical considerations of your Data Science project and document your findings.<\/p>\n<p>Checklists are a proven way to limit blind spots and reduce errors. As an ethics checklist for responsible Data Science,\u00a0<a href=\"https:\/\/deon.drivendata.org\/\" rel=\"noopener\" target=\"_blank\">Deon<\/a>\u00a0represents a promising starting point for any project. 
Teams should use this tool to evaluate considerations ranging from data collection through machine learning model deployment.<\/p>\n<p>Running\u00a0<code>deon -o ETHICS.md<\/code>\u00a0from the root of your project file structure will generate a Markdown file where you can document your review of the ethical considerations of your model.<\/p>\n<div><img src=\"https:\/\/miro.medium.com\/max\/525\/1*Psf1_gTv_Tx-764dl6kYMw.png\" alt=\"Figure\" width=\"100%\"><br \/><span><\/p>\n<p><\/span><\/div>\n<p>\u00a0<\/p>\n<p>The nuanced discussions spurred by Deon can ensure that risks inherent to machine learning technology do not adversely impact the subjects of the model or the reputation of the organization.\u00a0<em>Read more<\/em>:<\/p>\n<p><a href=\"https:\/\/medium.com\/atlas-research\/ethical-ai-tools-b9d276a49fea\" rel=\"noopener\" target=\"_blank\"><b>3 Open Source Tools for Ethical AI<\/b><\/a><br \/>Before integrating artificial intelligence into your organization\u2019s workflow, consider these tools to prevent machine\u2026<br \/>\u00a0<\/p>\n<p><strong>How to use:\u00a0<\/strong>add the checklist Markdown file to your root folder by running\u00a0<code>deon -o ETHICS.md<\/code>, then schedule conversations with your stakeholders to fill out the checklist.<\/p>\n<p>\u00a0<\/p>\n<h3>#3 \u2014 PyCaret<\/h3>\n<p>\u00a0<br \/><strong>Usecase:\u00a0<\/strong>in just a few lines of code, exponentiate your potential with the PyCaret library for simplified Data Science.<\/p>\n<p>PyCaret is great for beginners or seasoned coders looking to increase their efficiency. 
This library helps you implement the typical steps of a Data Science workflow in fewer lines of code.<\/p>\n<p><strong>How to use:<\/strong>\u00a0leverage PyCaret\u2019s functionality for preprocessing and modeling \u2014 e.g.<\/p>\n<div>\n<pre><code>from pycaret.datasets import get_data\r\nfrom pycaret.regression import *\r\nboston = get_data('boston')  # sample Boston housing dataset bundled with PyCaret\r\nexp_name = setup(data=boston, target='medv', train_size=0.7)<\/code><\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<h3>#4 \u2014 ktrain<\/h3>\n<p>\u00a0<br \/><strong>Usecase:\u00a0<\/strong>a low-code wrapper for Keras that enshrines machine learning best practices in the hyperparameter tuning and model training pipeline.<\/p>\n<p><a href=\"https:\/\/medium.com\/u\/4581d07591d5?source=post_page-----f16ecd91c95d--------------------------------\" rel=\"noopener\" target=\"_blank\">Arun Maiya<\/a>, a machine learning researcher and data science team lead, has compiled the recent advancements from arXiv into functions that can be effortlessly deployed across computer vision, natural language processing, and graph-based approaches.<\/p>\n<p><strong>How to use:<\/strong>\u00a0simplify the training, inspection, and application of state-of-the-art machine learning models \u2014 e.g.<\/p>\n<div>\n<pre><code>from ktrain import text as txt\r\nmodel = txt.text_classifier('bert', trn, preproc=preproc)<\/code><\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<h3>#5 \u2014 MLflow<\/h3>\n<p>\u00a0<br \/><strong>Usecase:\u00a0<\/strong>move your experiment tracking from manual Excel logs to this automated platform.<\/p>\n<p><a href=\"https:\/\/mlflow.org\/docs\/latest\/index.html\" rel=\"noopener\" target=\"_blank\">MLflow<\/a>\u00a0enables the automatic tracking of parameters, code versions, metrics, and output files. The MlflowClient class creates and manages experiments, pipeline runs, and model versions. Log artifacts (e.g. 
datasets), metrics, and hyperparameters with\u00a0<code>mlflow.log_artifact()<\/code>,\u00a0<code>mlflow.log_metric()<\/code>,\u00a0and\u00a0<code>mlflow.log_param()<\/code>.<\/p>\n<p>You can easily view all metadata and results across experiments in a browser on localhost with the\u00a0<code>mlflow ui<\/code>\u00a0command.<\/p>\n<p><strong>How to use:<\/strong>\u00a0set up MLflow with\u2026<\/p>\n<div>\n<pre><code><strong>from<\/strong> random <strong>import<\/strong> randint\r\n<strong>from<\/strong> mlflow <strong>import<\/strong> log_param\r\n\r\n<strong>if<\/strong> __name__ <strong>==<\/strong> \"__main__\":\r\n    <em># Log a parameter (key-value pair)<\/em>\r\n    log_param(\"param1\", randint(0, 100))<\/code><\/pre>\n<\/div>\n<p>\u2026then run existing projects with the\u00a0<code>mlflow run<\/code>\u00a0command, which runs a project from either a local directory or a GitHub URL.<\/p>\n<p>\u00a0<\/p>\n<h3>Summary<\/h3>\n<p>\u00a0<br \/>Okay, I may have\u00a0<a href=\"https:\/\/quoteinvestigator.com\/2014\/03\/29\/sharp-axe\/\" rel=\"noopener\" target=\"_blank\">prevaricated slightly<\/a>\u00a0about Abe Lincoln\u2019s pithy lumberjack quote, but I hope you still enjoyed the article. Having the right tool does make the task so much easier. 
Hopefully, you\u2019re now well-equipped with some new means to connect data to strategic outcomes.<\/p>\n<p><strong>If you enjoyed this writeup<\/strong>, follow me on\u00a0<a href=\"https:\/\/medium.com\/@nicolejaneway\" rel=\"noopener\" target=\"_blank\">Medium<\/a>,\u00a0<a href=\"http:\/\/www.linkedin.com\/in\/nicole-janeway-bills\" rel=\"noopener\" target=\"_blank\">LinkedIn<\/a>,\u00a0<a href=\"https:\/\/www.youtube.com\/channel\/UCO6JE24WY82TKabcGI8mA0Q?view_as=subscriber\" rel=\"noopener\" target=\"_blank\">YouTube<\/a>, and\u00a0<a href=\"https:\/\/twitter.com\/Nicole_Janeway\" rel=\"noopener\" target=\"_blank\">Twitter<\/a>\u00a0for more ideas to improve your Data Science skills.<\/p>\n<p>\u00a0<\/p>\n<h3>More resources<\/h3>\n<p>\u00a0<br \/><a href=\"https:\/\/towardsdatascience.com\/10-underrated-python-skills-dfdff5741fdf\" rel=\"noopener\" target=\"_blank\"><b>10 Underrated Python Skills<\/b><\/a><br \/>Up your Data Science game with these tips for improving your Python coding for better EDA, target analysis, feature\u2026<br \/>\u00a0<\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/must-read-data-science-papers-487cce9a2020\" rel=\"noopener\" target=\"_blank\"><b>5 Must-Read Data Science Papers (and How to Use Them)<\/b><\/a><br \/>Foundational ideas to keep you on top of the data science game.<br \/>\u00a0<\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/10-python-skills-419e5e4c4d66\" rel=\"noopener\" target=\"_blank\"><b>10 Python Skills They Don\u2019t Teach in Bootcamp<\/b><\/a><br \/>Ascend to new heights in Data Science and Machine Learning with this list of coding tips.<br \/>\u00a0<\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/model-selection-and-deployment-cf754459f7ca\" rel=\"noopener\" target=\"_blank\"><b>How to Future-Proof Your Data Science Project<\/b><\/a><br \/>5 critical elements of ML model selection &amp; deployment<br \/>\u00a0<\/p>\n<p><a 
href=\"https:\/\/towardsdatascience.com\/10-python-skills-beginners-3066305f0d3c\" rel=\"noopener\" target=\"_blank\"><b>10 Python Skills for Beginners<\/b><\/a><br \/>Python is the fastest growing, most-beloved programming language. Get started with these Data Science tips.<br \/>\u00a0<\/p>\n<p>\u00a0<br \/><strong>Bio: <a href=\"https:\/\/www.linkedin.com\/in\/nicole-janeway-bills\/\" target=\"_blank\" rel=\"noopener\">Nicole Janeway Bills<\/a><\/strong>\u00a0is a Data Scientist with experience in commercial and federal consulting. She helps organizations leverage their top asset: a simple and robust Data Strategy. <a href=\"https:\/\/page.co\/ahje9p\" rel=\"noopener\" target=\"_blank\"><strong>Sign up for more of her writing<\/strong><\/a>.<\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/data-science-tools-f16ecd91c95d\" target=\"_blank\" rel=\"noopener\">Original<\/a>. Reposted with permission.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2021\/01\/5-tools-effortless-data-science.html<\/p>\n","protected":false},"author":0,"featured_media":8082,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8081"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8081"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8081\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8082"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.
com\/data-science\/wp-json\/wp\/v2\/media?parent=8081"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8081"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8081"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}