{"id":8044,"date":"2020-12-29T00:26:49","date_gmt":"2020-12-29T00:26:49","guid":{"rendered":"https:\/\/healinglifespan.com\/data-science\/2020\/12\/29\/how-to-easily-check-if-your-machine-learning-model-is-fair\/"},"modified":"2020-12-29T00:26:49","modified_gmt":"2020-12-29T00:26:49","slug":"how-to-easily-check-if-your-machine-learning-model-is-fair","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/12\/29\/how-to-easily-check-if-your-machine-learning-model-is-fair\/","title":{"rendered":"How to easily check if your Machine Learning model is fair?"},"content":{"rendered":"<div id=\"post-\">\n<p><b>By <a href=\"https:\/\/medium.com\/@jakwisn\/about\" target=\"_blank\" rel=\"noopener noreferrer\">Jakub Wi\u015bniewski<\/a>, data science student and research software engineer in MI2 DataLab<\/b>.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/1250\/1*WHxWnEoQDZAdxgKxzBxO2A.jpeg\" width=\"90%\"><\/p>\n<p><em>Photo by\u00a0<a class=\"cm gp\" href=\"https:\/\/unsplash.com\/@ekrull?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">Eric Krull<\/a>\u00a0on\u00a0<a class=\"cm gp\" href=\"https:\/\/unsplash.com\/s\/photos\/robot?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">Unsplash<\/a>.<\/em><\/p>\n<p>We live in a world that is getting more divided each day. In some parts of the world, the differences and inequalities between races, ethnicities, and sometimes sexes are aggravating. The data we use for modeling is, in the major part, a reflection of the world it derives from. 
And the world can be biased, so data and therefore the model will likely reflect that.\u00a0<strong>We propose a way in which ML engineers can easily check if their model is biased.\u00a0<\/strong>Our fairness tool currently works only with classification models.<\/p>\n<p>\u00a0<\/p>\n<h3>Case study<\/h3>\n<p>\u00a0<\/p>\n<p>To showcase the abilities of the\u00a0<a href=\"https:\/\/dalex.drwhy.ai\/\" target=\"_blank\" rel=\"noopener noreferrer\">dalex fairness module<\/a>, we will be using the well-known\u00a0<a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/statlog+(german+credit+data)\" target=\"_blank\" rel=\"noopener noreferrer\">German Credit Data dataset<\/a>\u00a0to assign risk for each credit-seeker. For this simple task we will use an interpretable\u00a0<em>decision tree classifier<\/em>.<\/p>\n<p>Once we have\u00a0<em>dx.Explainer<\/em>\u00a0we need to execute the method\u00a0<em>model_fairness()<\/em>, so it can calculate all necessary metrics among the subgroups from the\u00a0<em>protected\u00a0<\/em>vector, which is an array or a list with sensitive attributes denoting sex, race, nationality, etc., for each observation (individual). Apart from that, we need to indicate which subgroup (that is, which unique element of\u00a0<em>protected<\/em>) is the most privileged. This is done through the\u00a0<em>privileged\u00a0<\/em>parameter, which in our case will be older males.<\/p>\n<p>The resulting fairness object has many attributes, and we will not go through each and every one of them. A more detailed overview can be found in this\u00a0<a href=\"http:\/\/dalex.drwhy.ai\/python-dalex-fairness.html\" target=\"_blank\" rel=\"noopener noreferrer\">tutorial<\/a>. Instead, we will focus on one method and two plots.<\/p>\n<p>\u00a0<\/p>\n<h3>So is our model biased or not?<\/h3>\n<p>\u00a0<\/p>\n<p>This question is simple, but because of the nature of bias, the response will be: it depends. 
But this method measures bias from several perspectives, so that no biased model can slip through. To check fairness, one has to use the\u00a0<em>fairness_check()<\/em>\u00a0method.<\/p>\n<div>\n<pre>fobject.fairness_check(epsilon = 0.8) # default epsilon\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>The following chunk is the console output from the code above.<\/p>\n<div>\n<pre>Bias detected in 1 metric: FPR\r\n\r\nConclusion: your model cannot be called fair because 1 metric score exceeded acceptable limits set by epsilon.\r\nIt does not mean that your model is unfair but it cannot be automatically approved based on these metrics.\r\n\r\nRatios of metrics, based on 'male_old'. Parameter 'epsilon' was set to 0.8 and therefore metrics should be within (0.8, 1.25)\r\n                   TPR       ACC       PPV       FPR       STP\r\nfemale_old    1.006508  1.027559  1.000000  0.765051  0.927739\r\nfemale_young  0.971800  0.937008  0.879594  0.775330  0.860140\r\nmale_young    1.030369  0.929134  0.875792  0.998532  0.986014\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>The bias was spotted in the FPR metric, which is the False Positive Rate. As the output above states, the model cannot be\u00a0<strong>automatically\u00a0<\/strong>approved, so it is up to the user to decide. In my opinion, it is not a fair model. An FPR ratio below 1 (about 0.77 for both female subgroups) means that the privileged subgroup is getting False Positives more frequently than the unprivileged ones.<\/p>\n<p><strong>More on\u00a0<em>fairness_check()<\/em><\/strong><\/p>\n<p>We get the information about bias, the conclusion, and the raw DataFrame of metric ratios. The metrics are TPR (True Positive Rate), ACC (Accuracy), PPV (Positive Predictive Value), FPR (False Positive Rate), and STP (Statistical Parity). 
The metrics are derived from a\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Confusion_matrix\" target=\"_blank\" rel=\"noopener noreferrer\">confusion matrix<\/a>\u00a0for each unprivileged subgroup and then divided by the metric values of the privileged subgroup. There are 3 types of possible conclusions:<\/p>\n<div>\n<pre># not fair\r\nConclusion: your model is not fair because 2 or more metric scores exceeded acceptable limits set by epsilon.\r\n\r\n# neither fair nor unfair\r\nConclusion: your model cannot be called fair because 1 metric score exceeded acceptable limits set by epsilon. It does not mean that your model is unfair but it cannot be automatically approved based on these metrics.\r\n\r\n# fair\r\nConclusion: your model is fair in terms of checked fairness metrics.\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>A truly fair model would not exceed the limits on any metric, but when the true values (target) depend on sensitive attributes, things get complicated and fall outside the scope of this post. In short, some metrics will not be equal, but they will not necessarily exceed the user&#8217;s threshold. If you want to know more, I strongly suggest checking out the\u00a0<a href=\"https:\/\/fairmlbook.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Fairness and machine learning<\/a>\u00a0book, especially chapter 2.<\/p>\n<p><strong>But one could ask why our model is not fair. On what grounds are we deciding?<\/strong><\/p>\n<p>The answer to this question is tricky, but this method of judging fairness seems to be the best so far. Generally, the score for each subgroup should be close to the score of the privileged subgroup. To put it in a more mathematical perspective, the ratios between the metric scores of the unprivileged and privileged subgroups should be close to 1. The closer a ratio is to 1, the fairer the model is. 
But to relax this criterion a little bit, it can be written more thoughtfully:<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/563\/0*lQZs30P10n27PB4H.png\" width=\"90%\"><\/p>\n<p>\u00a0<\/p>\n<p>Where the\u00a0<em>epsilon\u00a0<\/em>is a value between 0 and 1, it should be a minimum acceptable value of the ratio. By default, it is 0.8, which adheres to the\u00a0<a href=\"https:\/\/www.hirevue.com\/blog\/hiring\/what-is-adverse-impact-and-why-measuring-it-matters\" target=\"_blank\" rel=\"noopener noreferrer\">four-fifths<\/a>\u00a0rule (80% rule) often looked at in hiring. It is hard to find a non-arbitrary boundary between a fair and discriminative difference in metrics, and checking if the ratios of the metrics are exactly 1 would be pointless because what if the ratio is 0.99? This is why we decided to choose 0.8 as our default\u00a0<em>epsilon\u00a0<\/em>as it is the only known value to be a tangible threshold for the acceptable amount of discrimination. Of course, a user may change this value to their needs.<\/p>\n<p>\u00a0<\/p>\n<h3>Bias can also be plotted<\/h3>\n<p>\u00a0<\/p>\n<p>There are two bias detection plots available (however, there are more ways to visualize bias in the package)<\/p>\n<ul>\n<li><em>fairness_check<\/em>\u2014 visualization of\u00a0<em>fairness_check()<\/em>\u00a0method<\/li>\n<li><em>metric_scores<\/em>\u2014 visualization of\u00a0<em>metric_scores\u00a0<\/em>attribute which is raw scores of metrics.<\/li>\n<\/ul>\n<p>The types just need to be passed to the\u00a0<em>type\u00a0<\/em>parameter of the <em>plot\u00a0<\/em>method.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*JdUnopLblUdENEyV0JbySg.png\" width=\"90%\"><\/p>\n<p><em>fobject.plot()<\/em><\/p>\n<p>The plot above shows similar things to the fairness check output. Metric names are changed to more standard fairness equivalents, but the formulas point to which metrics we are referring to. 
Looking at the plot above, the intuition is simple: if the bars reach the red fields, then the metrics exceed the epsilon-based range. The length of a bar is equal to |1-M|, where M is the unprivileged metric score divided by the privileged metric score (just like in the fairness check before).<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*duvFDZ6Qn5wi0U2RxHS-pw.png\" width=\"90%\"><\/p>\n<p><em>fobject.plot(type='metric_scores')<\/em><\/p>\n<p>The <em>Metric Scores<\/em>\u00a0plot paired with the\u00a0<em>Fairness Check\u00a0<\/em>plot gives good intuition about the metrics and their ratios. Here the points are raw (not divided) metric scores. The vertical line marks the metric score of the privileged subgroup. The closer a point is to that line, the better.<\/p>\n<p><strong>Multiple models<\/strong>\u00a0can be put into one plot so they can be easily compared with each other. Let\u2019s add some models and visualize the\u00a0<em>metric_scores:<\/em><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*tz3iBhQ0oM2_X5fMlXktpg.png\" width=\"90%\"><\/p>\n<p><em>Output of the code above.<\/em><\/p>\n<p>Now let\u2019s check the plot based on <em>fairness_check:<\/em><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*IJKlnchdXykH-bsqf35XZg.png\" width=\"90%\"><\/p>\n<p>We can see that\u00a0<em>RandomForestClassifier<\/em>\u00a0is within the green zone, and therefore in terms of these metrics, it is fair. On the other hand, the\u00a0<em>LogisticRegression<\/em>\u00a0is reaching red zones in three metrics and, therefore, cannot be called fair.<\/p>\n<p>Every plot is interactive, made with the Python visualization package\u00a0<em>plotly<\/em>.<\/p>\n<p>\u00a0<\/p>\n<h3>Summary<\/h3>\n<p>\u00a0<\/p>\n<p>The fairness module in\u00a0<em>dalex\u00a0<\/em>is a unified and accessible way to check whether models are fair. 
There are other ways to visualize bias in models, so be sure to check them out! In the future, bias mitigation methods will be added. There is a long-term plan to add support for\u00a0<em>individual fairness<\/em>\u00a0and\u00a0<em>fairness in regression<\/em>.<\/p>\n<p>Be sure to check it out. You can install\u00a0dalex\u00a0with:<\/p>\n<p>\u00a0<\/p>\n<p>If you want to learn more about fairness, then I really recommend:<\/p>\n<p><a href=\"https:\/\/medium.com\/responsibleml\/how-to-easily-check-if-your-ml-model-is-fair-2c173419ae4c\" target=\"_blank\" rel=\"noopener noreferrer\">Original<\/a>. Reposted with permission.<\/p>\n<p>\u00a0<\/p>\n<p><b>Related:<\/b><\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/12\/machine-learning-model-fair.html<\/p>\n","protected":false},"author":0,"featured_media":8045,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8044"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8044"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8044\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8045"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8044"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8044"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.co
m\/data-science\/wp-json\/wp\/v2\/tags?post=8044"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}