{"id":8432,"date":"2021-08-18T00:14:20","date_gmt":"2021-08-18T00:14:20","guid":{"rendered":"https:\/\/wealthrevelation.com\/data-science\/2021\/08\/18\/why-your-data-science-team-needs-separate-testing-validation-training-sets\/"},"modified":"2021-08-18T00:14:20","modified_gmt":"2021-08-18T00:14:20","slug":"why-your-data-science-team-needs-separate-testing-validation-training-sets","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2021\/08\/18\/why-your-data-science-team-needs-separate-testing-validation-training-sets\/","title":{"rendered":"Why Your Data Science Team Needs Separate Testing, Validation &amp; Training Sets"},"content":{"rendered":"<div>\n<p>Automated testing of machine learning models can help to dramatically reduce the amount of time and effort that your team has to dedicate to debugging them. Applying these techniques improperly, however, can do a great deal of harm if the process is left unmonitored and doesn\u2019t adhere to best practices. Some tests can be applied to models after the training stage is over, while others should be applied directly to test the assumptions that the model is operating under.<\/p>\n<p>Traditionally, it\u2019s proven difficult to test machine learning models because they\u2019re complex systems that play host to a number of learned operations which can\u2019t be clearly decoupled from the underlying software. Conventional software can be broken up into individual units that each accomplish a specific task. 
The same can\u2019t be said of ML models, which are often solely the product of training and therefore can\u2019t be decomposed into smaller parts.<\/p>\n<p>Testing and <a href=\"https:\/\/www.methodspace.com\/three-stages-data-analysis-evaluating-raw-data\/\">evaluating the data sets<\/a> that you use for training could be the first step in unraveling this problem.<\/p>\n<h2>Monitoring Data Sets in a Training Environment<\/h2>\n<p>ML testing is very different from testing application software because what\u2019s under test is probabilistic rather than deterministic. An otherwise sound model could occasionally make mistakes and still be the best model anyone could develop. Engineers working on a spreadsheet or database program wouldn\u2019t be able to tolerate even the slightest rounding errors, but it\u2019s at least somewhat acceptable to find the occasional flaw in the output of a program that processes data by way of learned responses. The <a href=\"https:\/\/data-science-blog.com\/blog\/2020\/11\/02\/bias-and-variance-in-machine-learning\/\">level of variance<\/a> will differ somewhat depending on the tasks that a particular model is being trained to accomplish, but some degree of variance will likely always be present.<\/p>\n<p>As a result, it\u2019s important to at least examine the initial data that\u2019s being used to train ML models. If this data doesn\u2019t accurately represent the kind of information that a real-world environment would thrust onto a model, then the model can\u2019t hope to perform adequately once such input finally arrives. 
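One lightweight way to act on this is to compare simple summary statistics of a feature between the training sample and incoming real-world data before trusting the model with it. The sketch below is a hypothetical, stdlib-only illustration; the function name, the tolerance value and the toy data are all invented for the example:

```python
import statistics

def representativeness_report(train_col, live_col, tol=0.25):
    """Compare summary statistics of one feature between the training
    set and live data, flagging mean shifts that exceed a tolerance."""
    t_mean, l_mean = statistics.mean(train_col), statistics.mean(live_col)
    t_std = statistics.pstdev(train_col)
    spread = t_std or 1.0  # avoid dividing by zero on constant features
    shift = abs(t_mean - l_mean) / spread
    return {
        # How far the live mean drifted, in training standard deviations.
        "mean_shift": shift,
        # Whether live values stay inside the range seen during training.
        "covered": min(train_col) <= min(live_col) and max(live_col) <= max(train_col),
        "flagged": shift > tol,
    }

report = representativeness_report([1, 2, 3, 4, 5], [2, 3, 4])
print(report["flagged"])  # False: live data sits inside the training range
```

A check like this won't catch every mismatch, but it's cheap enough to run on every feature before training begins.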
Decent input specifications will help to ensure that the model comes away with a reasonably accurate representation of natural variability in whatever domain it\u2019s being applied to.<\/p>\n<p>Pure performance measurements can come from essentially any test set, but data scientists will normally want to fix the hyperparameters of their model first so there\u2019s a clear, stable configuration to judge while taking those measurements. Those who consistently favor one model over another solely for its performance on a particular test set may end up effectively fitting their models to the test set, hunting for something that performs exactly as they want it to.<\/p>\n<p>Those who are working with smaller data sets will need to find some way to evaluate them in spite of their diminutive size.<\/p>\n<h2>Managing a Smaller Set of Data Safely<\/h2>\n<p>Those working with particularly large data sets have typically gone with 60-20-20 or 80-10-10 splits. This has helped to strike a decent balance between the competing needs of <a href=\"https:\/\/medium.com\/atoti\/how-to-reduce-machine-learning-bias-eb24923dd18e\">reducing potential bias<\/a> and ensuring that the simulations run fast enough to be repeated several times over.<\/p>\n<p>Those working with a smaller data set might find that it simply isn\u2019t representative enough, yet for whatever reason it isn\u2019t possible to add more information to the test set. Cross-validation might be the best option for those who find themselves in this sort of situation. It\u2019s commonly used in applied ML to compare and select models since it\u2019s relatively easy to understand.<\/p>\n<p>K-fold cross-validation algorithms are often used to estimate the skill of a particular ML model on new data irrespective of the size of the data in question. No matter what method you decide to try, though, your team needs to keep the concepts of testing and validation data separate when training your ML model. 
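Both ideas above can be sketched in plain Python without assuming any particular ML library. The function names and the 60-20-20 default are illustrative, not a standard API: the first helper carves one data set into the three separate sets, and the second yields the index pairs a k-fold cross-validation loop would iterate over.

```python
import random

def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle one data set and carve it into training, validation
    and test sets (60-20-20 by default)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

def k_fold_indices(n, k=5, seed=42):
    """Yield (train_indices, holdout_indices) pairs for k-fold
    cross-validation over n examples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        holdout = folds[i]
        rest = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield rest, holdout

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

In practice a library routine would do the same job; the point is that the three sets come from disjoint slices of one shuffle, so no example can leak between them.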
When you square off <a href=\"https:\/\/deepchecks.com\/training-validation-and-test-sets-what-are-the-differences\/\">training data vs. test data<\/a>, the distinctions come out something like this:<\/p>\n<ul>\n<li>Test sets are examples that are only ever used to judge the performance of a classifier that\u2019s been completely specified.<\/li>\n<li>Validation sets are deployed when data scientists are tuning the parameters of a classifier. You might use a validation set, for instance, to choose the number of hidden units in a neural network.<\/li>\n<li>Training sets are used exclusively for learning. Many experts will define these as sets that are designed to fit the parameters of the initial classifier.<\/li>\n<\/ul>\n<p>Segmenting testing, validation and training sets might not seem natural to those who are used to relying on one long inclusive data set in order to ensure that their ML models work in any scenario. Nevertheless, it\u2019s vital to keep them separated as much as possible. <a href=\"https:\/\/data-science-blog.com\/blog\/2020\/09\/09\/test-data-management-support-in-test-automation-development\/\">Test data management<\/a> should always be part of your QA workflows. On top of this, it\u2019s important to keep an eye on how a model responds as it learns from the data, even if accuracy does appear to increase over time. This is because there are several high-quality insights an operator can derive from the learning process.<\/p>\n<h2>Taking a Closer Look at Weights During the Training Process<\/h2>\n<p>An ideal model will enjoy lower losses and a higher degree of accuracy over time, which is often more than enough to please the data scientists who develop it. However, you can learn more by taking a close look at what areas are receiving the heaviest weights during training. 
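As a rough illustration of what that inspection might look like, the sketch below trains a tiny plain-Python logistic regression on synthetic data where only the first feature carries signal, then runs the kind of sanity checks an operator could automate: the weights actually moved away from their initial values, and the informative feature ended up outweighing the noise feature. The model, data and checks are all invented for the example; a real pipeline would log the same statistics from its own framework.

```python
import math
import random

def train_logreg(data, labels, lr=0.5, epochs=200):
    """Logistic regression via batch gradient descent, recording the
    weight vector after every epoch so silent bugs show up early."""
    n_features = len(data[0])
    w = [0.0] * n_features
    history = []
    for _ in range(epochs):
        grad = [0.0] * n_features
        for x, y in zip(data, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            for i, xi in enumerate(x):
                grad[i] += (p - y) * xi
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
        history.append(list(w))
    return w, history

rng = random.Random(0)
# Feature 0 drives the label; feature 1 is pure noise.
data = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
labels = [1 if x[0] > 0 else 0 for x in data]
w, history = train_logreg(data, labels)

# Checks an operator might automate: weights moved off their init,
# and the informative feature outweighs the noise feature.
assert w != [0.0, 0.0]
assert abs(w[0]) > abs(w[1])
```

If the second assertion failed, that would be a hint that the training loop never distinguished the choices it was supposed to weigh, which is exactly the class of bug discussed below.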
A buggy piece of code could produce outcomes where different potential choices aren\u2019t given different weights. Alternatively, it could be that they\u2019re not really weighed against one another at all. In these cases, the overall results might end up looking realistic when they\u2019re actually wrong. Finding bugs in this way is especially important in a world where ML agents are being used to debug conventional software applications.<\/p>\n<p>Taking a closer look at the weights themselves can help specialists to discover these problems before a model ever makes its way out into the wild. Debugging ML models as though they were application software will never work, simply because so much of a neural network\u2019s behavior is learned rather than written and can\u2019t be decomposed into something that could be <a href=\"https:\/\/data-science-blog.com\/blog\/2020\/10\/27\/10360\/\">mapped on a flowchart<\/a>. However, it should be possible to detect certain types of problems by paying close attention to these weights.<\/p>\n<p>Developing any piece of software takes quite a bit of time, and the fact that ML models have to be trained means that they\u2019ll often take even longer. 
Give yourself enough lead time, and you should find that your testing, validation and training sets separate into neat packages that make the process much simpler.<\/p>\n<div id=\"author-bio-box\">\n<h3><a href=\"https:\/\/data-science-blog.com\/en\/blog\/author\/piletic\/\" title=\"All posts by Philip Piletic\" rel=\"author\">Philip Piletic<\/a><\/h3>\n<div class=\"bio-gravatar\"><img alt=\"\" src=\"https:\/\/secure.gravatar.com\/avatar\/8505182aa12d8365c7a44de38a9e28de?s=70&amp;d=mm&amp;r=g\" class=\"avatar avatar-70 photo\" height=\"70\" width=\"70\" loading=\"lazy\"><\/div>\n<p><a target=\"_blank\" rel=\"nofollow noopener noreferrer\" href=\"https:\/\/lockedon.com\/\" class=\"bio-icon bio-icon-website\"><\/a><a target=\"_blank\" rel=\"nofollow noopener noreferrer\" href=\"https:\/\/au.linkedin.com\/in\/philippiletic\" class=\"bio-icon bio-icon-linkedin\"><\/a><\/p>\n<p class=\"bio-description\">My primary focus is a fusion of technology, small business, and marketing. I\u2019m a writer, marketing consultant and guest author at several authority websites. In love with startups, the latest tech trends and helping others get their ideas off the ground. 
You can find me on LinkedIn.<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/data-science-blog.com\/en\/blog\/2021\/08\/15\/why-your-data-science-team-needs-separate-testing-validation-training-sets\/<\/p>\n","protected":false},"author":0,"featured_media":8433,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8432"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8432"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8432\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8433"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8432"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8432"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8432"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}