{"id":8205,"date":"2021-04-08T15:53:26","date_gmt":"2021-04-08T15:53:26","guid":{"rendered":"https:\/\/wealthrevelation.com\/data-science\/2021\/04\/08\/why-machine-learning-struggles-with-causality\/"},"modified":"2021-04-08T15:53:26","modified_gmt":"2021-04-08T15:53:26","slug":"why-machine-learning-struggles-with-causality","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2021\/04\/08\/why-machine-learning-struggles-with-causality\/","title":{"rendered":"Why machine learning struggles with causality"},"content":{"rendered":"<div id=\"post-\">\n<p><b>By\u00a0<a href=\"https:\/\/bdtechtalks.com\/author\/bendee983\/\" target=\"_blank\" rel=\"noopener\">Ben Dickson<\/a>, a software engineer and the founder of TechTalks<\/b>.<\/p>\n<p><img class=\"aligncenter size-full wp-image-125357\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/machine-learning-struggles-causality.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p>When you look at the following short video sequence, you can make inferences about causal relations between different elements. For instance, you can see the bat and the baseball player\u2019s arm moving in unison, but you also know that it is the player\u2019s arm that is causing the bat\u2019s movement and not the other way around. You also don\u2019t need to be told that the bat is causing the sudden change in the ball\u2019s direction.<\/p>\n<p>Likewise, you can think about counterfactuals, such as what would happen if the ball flew a bit higher and didn\u2019t hit the bat.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/05\/baseball-bat.gif?resize=640%2C360&amp;ssl=1\" width=\"80%\"><\/p>\n<p>Such inferences come to us humans intuitively. We learn them at a very early age, without being explicitly instructed by anyone and just by observing the world. 
But for\u00a0<a href=\"https:\/\/bdtechtalks.com\/2017\/08\/28\/artificial-intelligence-machine-learning-deep-learning\/\" target=\"_blank\" rel=\"noopener\">machine learning<\/a>\u00a0algorithms, which have managed to outperform humans in complicated tasks such as Go and chess, causality remains a challenge. Machine learning algorithms, especially\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/08\/05\/what-is-artificial-neural-network-ann\/\" target=\"_blank\" rel=\"noopener\">deep neural networks<\/a>, are particularly good at ferreting out subtle patterns in huge sets of data. They can transcribe audio in real time, label thousands of images and video frames per second, and examine X-ray and MRI scans for cancerous patterns. But they struggle to make simple causal inferences like the ones we just saw in the baseball video above.<\/p>\n<p>In a\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2102.11107\" target=\"_blank\" rel=\"noopener\">paper<\/a>\u00a0titled \u201cTowards Causal Representation Learning,\u201d researchers at the Max Planck Institute for Intelligent Systems, the Montreal Institute for Learning Algorithms (Mila), and Google Research discuss the challenges arising from the lack of causal representations in machine learning models and provide directions for creating artificial intelligence systems that can learn causal representations.<\/p>\n<p>This is one of several efforts that aim to explore and solve machine learning\u2019s lack of causality, which can be key to overcoming some of the\u00a0<a href=\"https:\/\/bdtechtalks.com\/2018\/02\/27\/limits-challenges-deep-learning-gary-marcus\/\" target=\"_blank\" rel=\"noopener\">major challenges the field faces today<\/a>.<\/p>\n<p>\u00a0<\/p>\n<h3>Independent and identically distributed data<\/h3>\n<p>\u00a0<\/p>\n<p>Why do machine learning models fail at generalizing beyond their narrow domains and training data?<\/p>\n<p>\u201cMachine learning often disregards information that animals use heavily: 
interventions in the world, domain shifts, temporal structure \u2014 by and large, we consider these factors a nuisance and try to engineer them away,\u201d write the authors of the causal representation learning paper. \u201cIn accordance with this, the majority of current successes of machine learning boil down to large scale pattern recognition on suitably collected\u00a0<em>independent and identically distributed (i.i.d.)<\/em>\u00a0data.\u201d<\/p>\n<p>i.i.d. is a term often used in machine learning. It supposes that random observations in a problem space are not dependent on each other and have a constant probability of occurring. The simplest example of i.i.d. is flipping a coin or tossing a die. The result of each new flip or toss is independent of previous ones, and the probability of each outcome remains constant.<\/p>\n<p>When it comes to more complicated areas such as\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/01\/14\/what-is-computer-vision\/\" target=\"_blank\" rel=\"noopener\">computer vision<\/a>, machine learning engineers try to turn the problem into an i.i.d. domain by training the model on very large corpora of examples. The assumption is that, with enough examples, the machine learning model will be able to encode the general distribution of the problem into its parameters. But in the real world, distributions often change due to factors that cannot be considered and controlled in the training data. 
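<\/p>
<p>To make the i.i.d. assumption and its failure mode concrete, here is a minimal sketch (my own toy illustration, not an example from the paper): a model fitted to i.i.d. coin flips estimates the bias well on fresh i.i.d. data, but the same estimate breaks down as soon as the distribution shifts.<\/p>

```python
import random

random.seed(0)

# i.i.d. sampling: every flip is independent and has a fixed probability of heads.
def flip_coin(p_heads, n):
    return [1 if random.random() < p_heads else 0 for _ in range(n)]

# Training: estimate the probability of heads from i.i.d. samples.
train = flip_coin(0.5, 10_000)
estimate = sum(train) / len(train)

# Under the i.i.d. assumption, the estimate transfers to fresh data...
test_iid = flip_coin(0.5, 10_000)
iid_error = abs(estimate - sum(test_iid) / len(test_iid))

# ...but not when the distribution shifts (the coin itself changes).
test_shifted = flip_coin(0.8, 10_000)
shift_error = abs(estimate - sum(test_shifted) / len(test_shifted))
```

<p>No amount of extra flips of the original coin fixes the second error; only knowledge of how the coin changed would.<\/p>
<p>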
For instance,\u00a0<a href=\"https:\/\/bdtechtalks.com\/2020\/01\/06\/convolutional-neural-networks-cnn-convnets\/\" target=\"_blank\" rel=\"noopener\">convolutional neural networks<\/a>\u00a0trained on millions of images can fail when they see objects under new lighting conditions, from slightly different angles, or against new backgrounds.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2019\/12\/objectnet_controls_table.png?ssl=1\" width=\"90%\"><\/p>\n<p><em>Objects in training datasets vs. objects in the real world (source: objectnet.dev).<\/em><\/p>\n<p>Efforts to address these problems mostly involve training machine learning models on more examples. But as the environment grows in complexity, it becomes impossible to cover the entire distribution by adding more training examples. This is especially true in domains where AI agents must interact with the world, such as robotics and self-driving cars. Lack of causal understanding makes it very hard to make predictions and deal with novel situations. This is why you see\u00a0<a href=\"https:\/\/bdtechtalks.com\/2020\/07\/29\/self-driving-tesla-car-deep-learning\/\" target=\"_blank\" rel=\"noopener\">self-driving cars make weird and dangerous mistakes<\/a>\u00a0even after having been trained for millions of miles.<\/p>\n<p>\u201cGeneralizing well outside the i.i.d. setting requires learning not mere statistical associations between variables, but an underlying causal model,\u201d the AI researchers write.<\/p>\n<p>Causal models also allow humans to repurpose previously gained knowledge for new domains. For instance, when you learn a real-time strategy game such as Warcraft, you can quickly apply your knowledge to other similar games, such as StarCraft and Age of Empires. Transfer learning in machine learning algorithms, however, is limited to very superficial uses, such as fine-tuning an image classifier to detect new types of objects. 
In more complex tasks, such as learning video games, machine learning models need huge amounts of training (thousands of years\u2019 worth of play) and respond poorly to minor changes in the environment (e.g., playing on a new map or with a slight change to the rules).<\/p>\n<p>\u201cWhen learning a causal model, one should thus require fewer examples to adapt as most knowledge, i.e., modules can be reused without further training,\u201d the authors of the causal machine learning paper write.<\/p>\n<p>\u00a0<\/p>\n<h3>Causal learning<\/h3>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/03\/causal-graph.jpg?resize=768%2C432&amp;ssl=1\" width=\"90%\"><\/p>\n<p>So, why has i.i.d. remained the dominant form of machine learning despite its known weaknesses? Pure observation-based approaches are scalable. You can continue to achieve incremental gains in accuracy by adding more training data, and you can speed up the training process by adding more compute power. In fact, one of the key factors behind the recent success of deep learning is the\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/11\/25\/ai-research-neural-networks-compute-costs\/\" target=\"_blank\" rel=\"noopener\">availability of more data and stronger processors<\/a>.<\/p>\n<p>i.i.d.-based models are also easy to evaluate: take a large dataset, split it into training and test sets, tune the model on the training data, and validate its performance by measuring the accuracy of its predictions on the test set. Continue the training until you reach the accuracy you require. There are already many public datasets that provide such benchmarks, such as ImageNet, CIFAR-10, and MNIST. There are also task-specific datasets such as the COVIDx dataset for COVID-19 diagnosis and the Wisconsin Breast Cancer Diagnosis dataset. 
In all cases, the challenge is the same: develop a machine learning model that can predict outcomes based on statistical regularities.<\/p>\n<p>But as the AI researchers observe in their paper, accurate predictions are often not sufficient to inform decision-making. For instance, during the coronavirus pandemic, many\u00a0<a href=\"https:\/\/bdtechtalks.com\/2020\/05\/25\/coroanavirus-artificial-intelligence-mistakes\/\" target=\"_blank\" rel=\"noopener\">machine learning systems began to fail<\/a>\u00a0because they had been trained on statistical regularities instead of causal relations. As life patterns changed, the accuracy of the models dropped.<\/p>\n<p>Causal models remain robust when interventions change the statistical distributions of a problem. For instance, when you see an object for the first time, your mind will subconsciously factor out lighting from its appearance. That\u2019s why, in general, you can recognize the object when you see it under new lighting conditions.<\/p>\n<p>Causal models also allow us to respond to situations we haven\u2019t seen before and think about counterfactuals. We don\u2019t need to drive a car off a cliff to know what will happen. Counterfactuals play an important role in cutting down the number of training examples a machine learning model needs.<\/p>\n<p>Causality can also be crucial to dealing with\u00a0<a href=\"https:\/\/bdtechtalks.com\/2020\/07\/15\/machine-learning-adversarial-examples\/\" target=\"_blank\" rel=\"noopener\">adversarial attacks<\/a>, subtle manipulations that force machine learning systems to fail in unexpected ways. \u201cThese attacks clearly constitute violations of the i.i.d. assumption that underlies statistical machine learning,\u201d the authors of the paper write, adding that adversarial vulnerabilities are proof of the differences in the robustness mechanisms of human intelligence and machine learning algorithms. 
The researchers also suggest that causality could serve as a defense against adversarial attacks.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2019\/02\/ai-adversarial-example-panda-gibbon.png?resize=768%2C299&amp;ssl=1\" width=\"90%\"><\/p>\n<p><em>Adversarial attacks exploit machine learning\u2019s reliance on the i.i.d. assumption. Here, adding an imperceptible layer of noise to a panda picture causes a convolutional neural network to mistake it for a gibbon.<\/em><\/p>\n<p>In a broad sense, causality can address machine learning\u2019s lack of generalization. \u201cIt is fair to say that much of the current practice (of solving i.i.d. benchmark problems) and most theoretical results (about generalization in i.i.d. settings) fail to tackle the hard open challenge of generalization across problems,\u201d the researchers write.<\/p>\n<p>\u00a0<\/p>\n<h3>Adding causality to machine learning<\/h3>\n<p>\u00a0<\/p>\n<p>In their paper, the AI researchers bring together several concepts and principles that can be essential to creating causal machine learning models.<\/p>\n<p>Two of these concepts are \u201cstructural causal models\u201d and \u201cindependent causal mechanisms.\u201d In general, the principles state that instead of looking for superficial statistical correlations, an AI system should be able to identify causal variables and separate their effects on the environment.<\/p>\n<p>This is the mechanism that enables you to detect different objects regardless of the viewing angle, background, lighting, and other noise. Disentangling these causal variables will make AI systems more robust against unpredictable changes and interventions. 
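<\/p>
<p>What disentangling buys can be shown with a toy structural causal model (hypothetical and invented for illustration, not taken from the paper): lighting and object identity are independent causes of the observed pixels, and a predictor that factors lighting out stays accurate even under an intervention that pushes lighting far outside anything previously observed.<\/p>

```python
import random

random.seed(2)

# Hypothetical structural causal model: independent causes L (lighting) and
# O (object identity) jointly produce the observation X (pixel brightness).
def sample_scm(do_lighting=None):
    lighting = random.uniform(0.5, 1.5) if do_lighting is None else do_lighting
    obj = random.choice([0, 1])          # which object is in the scene
    pixels = lighting * (1.0 + obj)      # observation mixes both causes
    return lighting, obj, pixels

# Disentangled predictor: recover the object after factoring lighting out.
def predict(pixels, lighting):
    return 0 if pixels / lighting < 1.5 else 1

# The prediction survives the intervention do(L = 5.0), far outside the
# 0.5-1.5 range seen in ordinary samples.
samples = [sample_scm(do_lighting=5.0) for _ in range(1000)]
hits = sum(predict(x, l) == o for l, o, x in samples)
accuracy_under_intervention = hits / len(samples)
```

<p>A predictor fitted directly to raw brightness statistics would carry no such guarantee; the point of the causal factorization is that the mechanism linking object to appearance does not change when lighting is intervened on.<\/p>
<p>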
As a result, causal AI models won\u2019t need huge training datasets.<\/p>\n<p>\u201cOnce a causal model is available, either by external human knowledge or a learning process,\u00a0<em>causal reasoning<\/em>\u00a0allows drawing conclusions on the effect of interventions, counterfactuals, and potential outcomes,\u201d the authors of the causal machine learning paper write.<\/p>\n<p>The authors also explore how these concepts can be applied to different branches of machine learning, including\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/05\/28\/what-is-reinforcement-learning\/\" target=\"_blank\" rel=\"noopener\">reinforcement learning<\/a>, which is crucial to problems where an intelligent agent relies heavily on exploring its environment and discovering solutions through trial and error. Causal structures can help make the training of reinforcement learning agents more efficient by allowing them to make informed decisions from the start of their training instead of taking random and irrational actions.<\/p>\n<p>The researchers provide ideas for AI systems that combine machine learning mechanisms and structural causal models: \u201cTo combine structural causal modeling and representation learning, we should strive to embed an SCM into larger machine learning models whose inputs and outputs may be high-dimensional and unstructured, but whose inner workings are at least partly governed by an SCM (that can be parameterized with a neural network). 
The result may be a modular architecture, where the different modules can be individually fine-tuned and repurposed for new tasks.\u201d<\/p>\n<p>Such concepts bring us closer to the modular approach the human mind uses (at least as far as we know) to link and reuse knowledge and skills across different domains and areas of the brain.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/03\/causal-machine-learning-model.jpg?w=936&amp;ssl=1\" width=\"90%\"><\/p>\n<p><em>Combining causal graphs with machine learning will enable AI agents to create modules that can be applied to different tasks without much training.<\/em><\/p>\n<p>It is worth noting, however, that the ideas presented in the paper are at the conceptual level. As the authors acknowledge, implementing these concepts faces several challenges: \u201c(a) in many cases, we need to infer abstract causal variables from the available low-level input features; (b) there is no consensus on which aspects of the data reveal causal relations; (c) the usual experimental protocol of training and test set may not be sufficient for inferring and evaluating causal relations on existing data sets, and we may need to create new benchmarks, for example with access to environment information and interventions; (d) even in the limited cases we understand, we often lack scalable and numerically sound algorithms.\u201d<\/p>\n<p>But what\u2019s interesting is that the researchers draw inspiration from much of the parallel work being done in the field. The paper contains references to the work done by Judea Pearl, a Turing Award-winning scientist best known for his work on\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/12\/09\/judea-pearl-the-book-of-why-ai-causality\/\" target=\"_blank\" rel=\"noopener\">causal inference<\/a>. Pearl is a vocal critic of pure deep learning methods. 
Meanwhile, Yoshua Bengio, one of the co-authors of the paper and another Turing Award winner, is one of the pioneers of deep learning.<\/p>\n<p>The paper also contains several ideas that overlap with the\u00a0<a href=\"https:\/\/bdtechtalks.com\/2020\/03\/04\/gary-marcus-hybrid-ai\/\" target=\"_blank\" rel=\"noopener\">hybrid AI models<\/a>\u00a0proposed by Gary Marcus, which combine the reasoning power of\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/11\/18\/what-is-symbolic-artificial-intelligence\/\" target=\"_blank\" rel=\"noopener\">symbolic systems<\/a>\u00a0with the pattern recognition power of neural networks. The paper does not, however, make any direct reference to hybrid systems.<\/p>\n<p>The paper is also in line with\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/12\/23\/yoshua-bengio-neurips-2019-deep-learning\/\" target=\"_blank\" rel=\"noopener\">System 2 deep learning<\/a>, a concept first proposed by Bengio in a talk at the NeurIPS 2019 AI conference. The idea behind System 2 deep learning is to create a type of neural network architecture that can learn higher-level representations from data. Higher-level representations are crucial to causality, reasoning, and transfer learning.<\/p>\n<p>While it\u2019s not clear which of the several proposed approaches will help solve machine learning\u2019s causality problem, the fact that ideas from different\u2014and often conflicting\u2014schools of thought are coming together is likely to produce interesting results.<\/p>\n<p>\u201cAt its core, i.i.d. pattern recognition is but a mathematical abstraction, and causality may be essential to most forms of animate learning,\u201d the authors write. \u201cUntil now, machine learning has neglected a full integration of causality, and this paper argues that it would indeed benefit from integrating causal concepts.\u201d<\/p>\n<p>\u00a0<\/p>\n<p><a href=\"https:\/\/bdtechtalks.com\/2021\/03\/15\/machine-learning-causality\/\">Original<\/a>. 
Reposted with permission.<\/p>\n<p><strong>Bio:<\/strong>\u00a0<a href=\"https:\/\/bdtechtalks.com\/author\/bendee983\/\" target=\"_blank\" rel=\"noopener\">Ben Dickson<\/a>\u00a0is a software engineer and the founder of TechTalks. He writes about technology, business and politics.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2021\/04\/machine-learning-struggles-causality.html<\/p>\n","protected":false},"author":0,"featured_media":8206,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8205"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8205"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8205\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8206"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8205"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8205"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8205"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}