{"id":296,"date":"2020-08-11T21:31:30","date_gmt":"2020-08-11T21:31:30","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/08\/11\/data-science-internship-interview-questions\/"},"modified":"2020-08-11T21:31:30","modified_gmt":"2020-08-11T21:31:30","slug":"data-science-internship-interview-questions","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/08\/11\/data-science-internship-interview-questions\/","title":{"rendered":"Data Science Internship Interview Questions"},"content":{"rendered":"<div id=\"post-\">\n<p><b>By <a href=\"https:\/\/www.linkedin.com\/in\/jay-feng-ab66b049\/\" target=\"_blank\" rel=\"noopener noreferrer\">Jay Feng<\/a>, Head of Data Science and Co-Founder <a href=\"https:\/\/www.interviewquery.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Interview Query<\/a><\/b>.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/www.interviewquery.com\/content\/images\/size\/w2000\/2020\/07\/artificial-intelligence-3382507_1280.jpg\" width=\"90%\"><\/p>\n<p>Data science is an attractive field. It\u2019s lucrative, you get opportunities to work on interesting projects, and you\u2019re always learning new things. Hence, breaking into the world of data science is extremely competitive. One of the best ways to start your data science career is through a data science internship.<\/p>\n<p>In this article, we\u2019ll look at the<strong>\u00a0general\u00a0<\/strong>level of knowledge that\u2019s required, the components of a typical interview process, and some example interview questions. 
Note that the term \u2018general\u2019 is emphasized because the specifics differ from company to company.<\/p>\n<p>\u00a0<\/p>\n<h3>What&#8217;s expected in a data science internship interview?<\/h3>\n<p>\u00a0<\/p>\n<p>The biggest difference between a data science internship interview and a full-time data scientist interview is that you typically won\u2019t be expected to know extremely specific details regarding machine learning or deep learning concepts.<\/p>\n<p>However, you will be expected to have the fundamental building blocks in place: Python, R, or SQL; statistics and probability basics; and basic machine learning concepts.<\/p>\n<p>Below is a list of essential knowledge and skills that will make you an attractive candidate:<\/p>\n<p><strong>Python or R<\/strong><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/blog.interviewquery.com\/content\/images\/2020\/07\/image-7.png\" width=\"90%\"><\/p>\n<p><em>Python data science libraries from\u00a0<a href=\"https:\/\/techvidvan.com\/tutorials\/python-for-data-science\/\" target=\"_blank\" rel=\"noopener noreferrer\">TechVidvan<\/a>.<\/em><\/p>\n<p>You should have programming experience in a scripting language, ideally Python or R. If you\u2019re a Python programmer, you should also have a basic understanding of popular libraries like\u00a0<strong>Scikit-learn\u00a0<\/strong>and\u00a0<strong>Pandas.<\/strong><\/p>\n<p><strong>What you should know:<\/strong>\u00a0You should know how to write basic functions and have a fundamental understanding of various data structures and their uses. You should also know about Scikit-learn\u2019s basic (yet essential) utilities, like train_test_split and StandardScaler. For Pandas, you should be comfortable manipulating DataFrames much as you would query a table using SQL.<\/p>\n<p>For example, you may be required to build a simple machine learning model to predict the quantity sold for a product. 
In this case, if you\u2019re a Python user, it would be extremely useful to understand the Scikit-learn library, as it provides a number of prebuilt functions, like the ones mentioned above.<\/p>\n<p><strong>How to prepare:<\/strong>\u00a0Try data science projects on Kaggle or take-home assignments on Interview Query to get an idea of what projects you might need to complete.<\/p>\n<p>To get more familiar with Scikit-learn, build a simple machine learning model using it or walk through a few data science projects that other people have completed.<\/p>\n<p>Lastly, try practicing Python problems on Interview Query to get a sense of what interviewers might ask you.<\/p>\n<p><strong>SQL<\/strong><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/blog.interviewquery.com\/content\/images\/size\/w1000\/2020\/07\/image-9.png\" width=\"90%\"><\/p>\n<p><em>SQL database from\u00a0<a href=\"https:\/\/hackersandslackers.com\/welcome-to-sql-modifying-databases-and-tables\/\" target=\"_blank\" rel=\"noopener noreferrer\">HackersAndSlackers<\/a>.<\/em><\/p>\n<p><strong>You won&#8217;t be expected to have too much experience in relational databases, but at the minimum, you should know how SQL works.<\/strong>\u00a0If you\u2019re vying for a data science internship, then you\u2019ll most likely be working for a company that has an immense amount of data. You\u2019ll be expected to navigate through that data yourself to solve problems.<\/p>\n<p><strong>What you should know:\u00a0<\/strong>You should be able to write basic queries, and you should know how to manipulate data using SQL queries. It\u2019s very common for companies to incorporate SQL in their take-home case studies, so it\u2019s essential that you know SQL well.<\/p>\n<p>Example Question:<\/p>\n<p><em>Write an SQL query to get the second highest salary from the\u00a0<\/em><em>Employee<\/em><em>\u00a0table. 
For example, given the Employee table below, the query should return\u00a0<\/em><em>200<\/em><em>\u00a0as the second highest salary. If there is no second highest salary, then the query should return\u00a0<\/em><em>null<\/em><em>.<\/em><\/p>\n<p>+----+----------+<br \/>| Id | Salary |<br \/>+----+----------+<br \/>| 1 \u00a0| 100\u00a0 \u00a0 \u00a0|<br \/>| 2 \u00a0| 200\u00a0 \u00a0 \u00a0|<br \/>| 3 \u00a0| 300\u00a0 \u00a0 \u00a0|<br \/>+----+----------+<\/p>\n<p><strong>How to prepare:<\/strong>\u00a0Mode provides a great resource for learning basic SQL, which can be found\u00a0<a href=\"https:\/\/mode.com\/sql-tutorial\/introduction-to-sql\/\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>. Additionally, there are many SQL practice problems and practice case studies that you can find online.<\/p>\n<p><strong>Statistics &amp; Probability<\/strong><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/blog.interviewquery.com\/content\/images\/size\/w1000\/2020\/07\/image-8.png\" width=\"90%\"><\/p>\n<p><em>Image from\u00a0<a href=\"https:\/\/unsplash.com\/photos\/jrh5lAq-mIs\" target=\"_blank\" rel=\"noopener noreferrer\">Unsplash<\/a>.<\/em><\/p>\n<p>You should have an understanding of basic\u00a0<strong>statistics\u00a0and probability<\/strong>. These concepts serve as the base for most machine learning and data science concepts. In addition, many of the interview questions asked for data science positions are related to statistics.<\/p>\n<p><strong>What you should know:<\/strong>\u00a0You should have a solid understanding of fundamental concepts including but not limited to probability basics, probability distributions, estimation, and hypothesis testing. 
A very common application of statistics is conditional probability \u2014 for example, what is the probability that a customer will purchase product B given that they purchased product C?<\/p>\n<p><strong>How to prepare:\u00a0<\/strong>If any of these concepts sound foreign to you, there are a number of free resources that you can leverage, like Khan Academy or Georgia Institute of Technology.<\/p>\n<p><strong>Machine Learning Concepts<\/strong><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/blog.interviewquery.com\/content\/images\/2020\/07\/image-10.png\" width=\"90%\"><\/p>\n<p><em>Machine learning from\u00a0<a href=\"https:\/\/www.forbes.com\/sites\/kalevleetaru\/2019\/01\/15\/why-machine-learning-needs-semantics-not-just-statistics\/#3322ac3b77b5\" target=\"_blank\" rel=\"noopener noreferrer\">Forbes<\/a>.<\/em><\/p>\n<p>While you\u2019re not expected to be an expert, you should have a good understanding of fundamental\u00a0<strong>machine learning models<\/strong>\u00a0and concepts. This is especially the case if the job description says that you\u2019ll be working on building models.<\/p>\n<p><strong>What you should know:<\/strong>\u00a0This includes but is not limited to concepts like linear regression, support vector machines, and clustering. Ideally, you should have a fundamental understanding of these concepts and understand when it\u2019s appropriate to use various machine learning methods.<\/p>\n<p>For example, you may be required to implement linear regression on a product\u2019s price point to determine the quantity sold. 
That being said, you won\u2019t be required to productionize or deploy a machine learning model as an intern.<\/p>\n<p><strong>Domain Knowledge<\/strong><\/p>\n<p>You should have\u00a0<strong>domain knowledge of the field<\/strong>\u00a0that you are applying to (and if you don\u2019t have it, you should learn it).<\/p>\n<p>For example, if you are applying for a data science position in the marketing department, it would be a good idea to learn about the different marketing channels (e.g., social media, affiliate, TV) as well as core metrics (e.g., LTV, CAC).<\/p>\n<p>\u00a0<\/p>\n<h3>Data Science Internship Interview Process<\/h3>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/blog.interviewquery.com\/content\/images\/size\/w1000\/2020\/07\/image-6.png\" width=\"90%\"><\/p>\n<p><em>Image from\u00a0<a href=\"https:\/\/unsplash.com\/photos\/7aakZdIl4vg\" target=\"_blank\" rel=\"noopener noreferrer\">Unsplash<\/a>.<\/em><\/p>\n<p>Again, the interview process ultimately depends on the company that you are applying to. Generally, though, most (if not all) companies include the following steps, which I\u2019ll explain below.<\/p>\n<p>The\u00a0<strong>worst thing you can do as an intern is not do your research<\/strong>\u00a0into what the company does and its cultural mission and values.<\/p>\n<p><strong>Initial Screening<\/strong><\/p>\n<p>Typically, there\u2019s an initial screening (usually a phone screen) conducted by a recruiter or the hiring manager of the company. The purpose is for the interviewee to get a better understanding of the role and for the interviewer to get a better understanding of the interviewee.<\/p>\n<p>You should expect them to ask about your interest in the role and company, why you think you\u2019d be a good fit, and questions related to your past experiences. 
In rare cases, you may also be asked one or two simple technical questions.<\/p>\n<p>The interviewer is simply making sure that you\u2019re genuinely interested in the company, that you\u2019re a good communicator, and that you raise no red flags.<\/p>\n<p><strong>Take-home case<\/strong><\/p>\n<p>For many data science internships now, companies will require you to complete a take-home challenge. This means that they\u2019ll give you a set time period to complete a case study, which is typically reflective of the kind of problems you\u2019d encounter in the actual role.<\/p>\n<p>This is done to see how you would approach a problem (i.e., your thought process) and whether you have the basic knowledge that\u2019s required to complete the problem. Examples of cases include\u00a0<strong>cleaning a dataset<\/strong>\u00a0and\u00a0<strong>building a machine learning model<\/strong>\u00a0to make a given prediction, or\u00a0<strong>querying a dataset<\/strong>\u00a0and\u00a0<strong>analyzing the data<\/strong>, or a combination of the two.<\/p>\n<p><strong>On-site Interview<\/strong><\/p>\n<p>Last is the on-site interview, which can consist of anywhere from one to six rounds of interviews. These interviews are composed of a\u00a0<strong>mixture of behavioral and technical\u00a0<\/strong>interview questions. You may also be required to complete a case on the spot for one of the rounds.<\/p>\n<p>While they are trying to make sure that you have a strong understanding of the fundamental knowledge that\u2019s required to be successful in the role, they\u2019re also assessing your behavior, your motives, and ultimately whether you\u2019d be a good fit for the team. 
Make sure you\u2019re on your best behavior, but don\u2019t forget to be yourself!<\/p>\n<p>\u00a0<\/p>\n<h3>Interview Questions<\/h3>\n<p>\u00a0<\/p>\n<p>Below are some example interview questions that you should be prepared to answer:<\/p>\n<ul>\n<li>What is a p-value?<\/li>\n<li>What is regularization, and what problem does it try to solve?<\/li>\n<li>How can you incorporate the relationship between, say, age and income into a linear model?<\/li>\n<li>What is the probability of getting a sum of 4 if you have two equally weighted dice?<\/li>\n<li>What are some of the steps that you take when wrangling and cleaning a dataset?<\/li>\n<li>What is cross-validation, and why is it necessary?<\/li>\n<li>Give an example of when accuracy is not the best metric in determining the effectiveness of a machine learning model.<\/li>\n<li>What&#8217;s the difference between an INNER and OUTER JOIN?<\/li>\n<\/ul>\n<p><a href=\"https:\/\/www.interviewquery.com\/blog-data-science-internship-interview\/\" target=\"_blank\" rel=\"noopener noreferrer\">Original<\/a>. 
Reposted with permission.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/08\/data-science-internship-interview-questions.html<\/p>\n","protected":false},"author":0,"featured_media":297,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/296"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=296"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/296\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/297"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=296"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=296"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=296"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}