{"id":8424,"date":"2021-07-28T17:55:52","date_gmt":"2021-07-28T17:55:52","guid":{"rendered":"https:\/\/wealthrevelation.com\/data-science\/2021\/07\/28\/dbt-for-data-transformation-hands-on-tutorial\/"},"modified":"2021-07-28T17:55:52","modified_gmt":"2021-07-28T17:55:52","slug":"dbt-for-data-transformation-hands-on-tutorial","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2021\/07\/28\/dbt-for-data-transformation-hands-on-tutorial\/","title":{"rendered":"dbt for Data Transformation \u2013 Hands-on Tutorial"},"content":{"rendered":"<div id=\"post-\">\n   <!-- post_author Essi Alizadeh -->  <\/p>\n<p><b>By <a href=\"https:\/\/www.linkedin.com\/in\/alizadehesmaeil\/\" target=\"_blank\" rel=\"noopener\">Essi Alizadeh<\/a> is an engineer and a senior data scientist, in permanent beta<\/b>.<\/p>\n<p><img class=\"aligncenter size-full wp-image-130553\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/dbt-data-transformation-tutorial.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p>dbt (data build tool) is a data transformation tool that uses select SQL statements. It allows you to create complex models, use variables and macros (aka functions), run tests, generate documentation, and many more features.<\/p>\n<p>dbt does not extract or load data, but it\u2019s powerful at transforming data that\u2019s already available in the database \u2014dbt does the <strong>T<\/strong> in ELT (Extract, Load, Transform) processes.<\/p>\n<p>In this post, you will learn how to:<\/p>\n<ul>\n<li>Configure a dbt project.<\/li>\n<li>Create dbt models (SELECT statements).<\/li>\n<li>Build complex dbt models using global variables and macros.<\/li>\n<li>Build complex models by referring to other dbt models.<\/li>\n<li>Run tests.<\/li>\n<li>Generate documentation.<\/li>\n<\/ul>\n<p>\u00a0<\/p>\n<h2>Pre-requisite<\/h2>\n<p>\u00a0<\/p>\n<h3>Signup<\/h3>\n<p>You can sign up at <a href=\"https:\/\/cloud.getdbt.com\/\" target=\"_blank\" rel=\"noopener\">getdbt.com<\/a>. The free plan is a great plan for small projects and testing.<\/p>\n<h3>Database with populated data<\/h3>\n<p>You can check my post on <a href=\"https:\/\/ealizadeh.com\/blog\/deploy-postgresql-db-heroku\" target=\"_blank\" rel=\"noopener\">how to deploy a <em>free <\/em>PostgreSQL database on Heroku<\/a>. The post provides step-by-step instructions on how to do it. You can also check the <a href=\"https:\/\/github.com\/e-alizadeh\/sample_dbt_project\/blob\/master\/data\/data_ingestion.py\" target=\"_blank\" rel=\"noopener\">data ingestion script<\/a> in the <a href=\"https:\/\/github.com\/e-alizadeh\/sample_dbt_project\" target=\"_blank\" rel=\"noopener\">GitHub repo<\/a> accompanying this article.<\/p>\n<p>Following the above, we generated two tables in a PostgreSQL database that we are going to use in this post. There are two tables in the database, named covid_latest and population_prosperity. You can find the ingestion script on the GitHub repo for this post.<\/p>\n<h3>dbt CLI Installation<\/h3>\n<p>You can install the dbt command-line interface (CLI) by following the instructions on the following <a href=\"https:\/\/docs.getdbt.com\/dbt-cli\/installation\/\" target=\"_blank\" rel=\"noopener\">dbt documentation page<\/a>.<\/p>\n<p>\u00a0<\/p>\n<h2>Basics of a dbt project<\/h2>\n<p>\u00a0<\/p>\n<p>There are three main things to know about in order to use the dbt tool:<\/p>\n<ul>\n<li>dbt project<\/li>\n<li>database connection<\/li>\n<li>dbt commands<\/li>\n<\/ul>\n<h3>How to use dbt?<\/h3>\n<p>A dbt project is a directory containing <em>.sql<\/em> and <em>.yml<\/em> files. The minimum required files are:<\/p>\n<ul>\n<li>A project file named <em>dbt_project.yml<\/em>: This file contains configurations of a dbt project.<\/li>\n<li>Model(s) (<em>.sql<\/em> files): A model in dbt is simply a single <em>.sql<\/em> file containing a <strong>single <\/strong><em><strong>select<\/strong><\/em><strong> statement<\/strong>.<\/li>\n<\/ul>\n<p>Every dbt project needs a <em>dbt_project.yml<\/em> file \u2014 this is how dbt knows a directory is a dbt project. It also contains important information that tells dbt how to operate on your project.<\/p>\n<p>You can find more information about dbt projects <a href=\"https:\/\/docs.getdbt.com\/docs\/introduction#dbt-projects\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<blockquote>\n<p>A <strong>dbt model<\/strong> is basically a <em>.sql<\/em> file with a <strong>SELECT<\/strong> statement.<\/p>\n<\/blockquote>\n<h3>dbt Commands<\/h3>\n<p>dbt commands start with <em>dbt<\/em> and can be executed using one of the following ways:<\/p>\n<ul>\n<li>dbt Cloud (the command section at the bottom of the dbt Cloud dashboard),<\/li>\n<li>dbt CLI<\/li>\n<\/ul>\n<p>Some commands can only be used in dbt CLI like <em>dbt init<\/em>. Some dbt commands we will use in this post are<\/p>\n<ul>\n<li><em>dbt init<\/em> (only in dbt CLI)<\/li>\n<li><em>dbt run<\/em><\/li>\n<li><em>dbt test<\/em><\/li>\n<li><em>dbt docs generate<\/em><\/li>\n<\/ul>\n<p>\u00a0<\/p>\n<h2>dbt Project Setup<\/h2>\n<p>\u00a0<\/p>\n<h3>Step 1: Initialize a dbt project (sample files) using dbt CLI<\/h3>\n<p>You can use <a href=\"https:\/\/docs.getdbt.com\/reference\/commands\/init\" target=\"_blank\" rel=\"noopener\">dbt init<\/a> to generate sample files\/folders. In particular, <em>dbt init project_name<\/em> will create the following:<\/p>\n<ul>\n<li>a\u00a0<em>~\/.dbt\/profiles.yml\u00a0<\/em>file if one does not already exist<\/li>\n<li>a new folder called\u00a0<em>[project_name]<\/em><\/li>\n<li>directories and sample files necessary to get started with dbt<\/li>\n<\/ul>\n<blockquote>\n<p><strong>Caution<\/strong>: Since <em>dbt init<\/em> generates a directory named<em> project_name<\/em>, and in order to avoid any conflict, you should <em>not have any existing folder with an identical name<\/em>.<\/p>\n<\/blockquote>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/ealizadeh.com\/_next\/image?url=https%3A%2F%2Fsuper-static-assets.s3.amazonaws.com%2F8f45149e-e437-4fac-b1f5-df8c7f6255ec%2Fimages%2Fff12c9fc-af84-48be-8896-8d484f163037.png&amp;w=1920&amp;q=80\" width=\"90%\"><\/p>\n<p><em>dbt init &lt;project_name&gt;<\/em><\/p>\n<p>The result is a directory with the following sample files.<\/p>\n<div>\n<pre>sample_dbt_project\r\n\u251c\u2500\u2500 README.md\r\n\u251c\u2500\u2500 analysis\r\n\u251c\u2500\u2500 data\r\n\u251c\u2500\u2500 dbt_project.yml\r\n\u251c\u2500\u2500 macros\r\n\u251c\u2500\u2500 models\r\n\u2502   \u2514\u2500\u2500 example\r\n\u2502       \u251c\u2500\u2500 my_first_dbt_model.sql\r\n\u2502       \u251c\u2500\u2500 my_second_dbt_model.sql\r\n\u2502       \u2514\u2500\u2500 schema.yml\r\n\u251c\u2500\u2500 snapshots\r\n\u2514\u2500\u2500 tests\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>For this post, we will just consider the minimum files and remove the extra stuff.<\/p>\n<div>\n<pre>sample_dbt_project\r\n\u251c\u2500\u2500 README.md\r\n\u251c\u2500\u2500 dbt_project.yml\r\n\u2514\u2500\u2500 models\r\n    \u251c\u2500\u2500 my_first_dbt_model.sql\r\n    \u251c\u2500\u2500 my_second_dbt_model.sql\r\n    \u2514\u2500\u2500 schema.yml\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<h3>Step 2: Set Up a Git Repository<\/h3>\n<p>You can use an existing repo, as specified during the setup. You can configure the repositories by following the dbt documentation <a href=\"https:\/\/docs.getdbt.com\/docs\/dbt-cloud\/cloud-configuring-dbt-cloud\/cloud-configuring-repositories\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p><strong>Or, if you want to create a new repo&#8230;<\/strong><\/p>\n<p>You can create a new repository from inside the created directory. You can do that as below:<\/p>\n<div>\n<pre>git init\r\ngit add .\r\ngit commit -m \"first commit\"\r\ngit remote add origing \r\ngit push -u origin master\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<h3>Step 3: Set Up a New Project on dbt Cloud Dashboard<\/h3>\n<p>In the previous step, we created a sample dbt project containing sample models and configurations. Now, we want to create a new project and connect our database and repository on the dbt Cloud dashboard.<\/p>\n<p>Before we continue, you should have<\/p>\n<ul>\n<li>some data already available in a database,<\/li>\n<li>a repository with the files generated at the previous step<\/li>\n<\/ul>\n<p>You can follow the steps below to set up a new project in dbt Cloud (keep in mind this step is different than the previous step in that we only generated some sample files).<\/p>\n<\/p>\n<p>The <em>dbt_project.yml<\/em> file for our project is shown below (you can find the complete version in the <a href=\"https:\/\/github.com\/e-alizadeh\/sample_dbt_project.git\" target=\"_blank\" rel=\"noopener\">GitHub repo<\/a> to this post).<\/p>\n<div>\n<pre>name: 'my_new_project'\r\nversion: '1.0.0'\r\nconfig-version: 2\r\n\r\nvars:\r\n  selected_country: USA\r\n\tselected_year: 2019\r\n\r\n# This setting configures which \"profile\" dbt uses for this project.\r\nprofile: 'default'\r\n\r\n# There are other stuff that are generated automatically when you run `dbt init`\r\n\r\n<\/pre>\n<\/div>\n<p><em>dbt_project.yml<\/em><\/p>\n<p>\u00a0<\/p>\n<h2>dbt Models and Features<\/h2>\n<p>\u00a0<\/p>\n<h3>dbt models<\/h3>\n<p>Let&#8217;s create simple dbt models that retrieve few columns of the tables.<\/p>\n<div>\n<pre>select \"iso_code\", \"total_cases\", \"new_cases\" from covid_latest\r\n\r\n<\/pre>\n<\/div>\n<p><em><strong>covid19_latest_stats <\/strong>dbt model (models\/covid19_latest_stats.sql)<\/em><\/p>\n<p>\u00a0<\/p>\n<div>\n<pre>select \"code\", \"year\", \"continent\", \"total_population\" from population_prosperity\r\n\r\n<\/pre>\n<\/div>\n<p><em><strong>population <\/strong>dbt model (models\/population.sql)<\/em><\/p>\n<blockquote>\n<p><strong>Caution:<\/strong> The dbt model name is the filename of the sql file in the <em>models<\/em> directory. The model name may differ from the table name in the database. For instance, in the above, the dbt model <em>population<\/em> is the result of a <em>SELECT<\/em> statement on <em>population_prosperity<\/em> table in the database.<\/p>\n<\/blockquote>\n<h3>Run models<\/h3>\n<p>You can run all models in your dbt project by executing <em>dbt run<\/em>. A sample dbt run output is shown below. You can see a summary or detailed log of running all dbt models. This helps a lot to debug any issue you may have in the queries. For instance, you can see a failed model that throws a Postgres error.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/ealizadeh.com\/_next\/image?url=https%3A%2F%2Fsuper-static-assets.s3.amazonaws.com%2F8f45149e-e437-4fac-b1f5-df8c7f6255ec%2Fimages%2Fbcbb7dce-222a-455f-a186-57aa13577aac.png&amp;w=1920&amp;q=80\" width=\"90%\"><\/p>\n<p><em>Detailed log of failed <strong>jinja_and_variable_usage <\/strong>dbt model.<\/em><\/p>\n<h3>Jinja &amp; Macros<\/h3>\n<p>dbt uses the <a href=\"https:\/\/jinja.palletsprojects.com\/\" target=\"_blank\" rel=\"noopener\">Jinja<\/a> templating language, which makes a dbt project an ideal programming environment for SQL. With Jinja, you can do transformations that are not normally possible in SQL, like using environment variables or macros \u2014 abstract snippets of SQL, analogous to functions in most programming languages. Whenever you see a<em> {{ &#8230; }}<\/em>, you&#8217;re already using Jinja. For more information about Jinja and additional Jinja-style functions defined, you can check <a href=\"https:\/\/docs.getdbt.com\/docs\/building-a-dbt-project\/jinja-macros\/\" target=\"_blank\" rel=\"noopener\">dbt documentation<\/a>.<\/p>\n<p>Later in this post, we will cover custom macros defined by dbt.<\/p>\n<h3>Using Variables<\/h3>\n<p><strong>Define a variable<\/strong><\/p>\n<p>You can define your variables under the <em>vars<\/em> section in your <em>dbt_project.yml<\/em>. For instance, let&#8217;s define a variable called <em>selected_country<\/em> whose default value is <em>USA<\/em> and another one called <em>selected_year<\/em> whose default value is <em>2019<\/em>.<\/p>\n<div>\n<pre>name: 'my_new_project'\r\nversion: '1.0.0'\r\nconfig-version: 2\r\n\r\nvars:\r\n  selected_country: USA\r\n\tselected_year: 2019\r\n<\/pre>\n<\/div>\n<p><em>dbt_project.yml<\/em><\/p>\n<p><strong>Use a Variable<\/strong><\/p>\n<p>You can use variables in your dbt models via the\u00a0<a href=\"https:\/\/docs.getdbt.com\/reference\/dbt-jinja-functions\/var\" target=\"_blank\" rel=\"noopener\"><em>var()<\/em><\/a> Jinja function (<em>{{ var(&#8220;var_key_name&#8221;) }}<\/em>) .<\/p>\n<h3>Macros<\/h3>\n<p>There are many useful transformations and useful macros in <em>dbt_utils<\/em> that can be used in your project. For a list of all available macros, you can check their <a href=\"https:\/\/hub.getdbt.com\/dbt-labs\/dbt_utils\/latest\/\" target=\"_blank\" rel=\"noopener\">GitHub repo<\/a>.<\/p>\n<p>Now, let&#8217;s add dbt_utils to our project and install it by following the below steps:<\/p>\n<ol>\n<li>Add dbt_utils macro to your <em>packages.yml<\/em> file, as follows:<\/li>\n<\/ol>\n<div>\n<pre>packages:\r\n  - package: dbt-labs\/dbt_utils\r\n    version: 0.6.6\r\n<\/pre>\n<\/div>\n<p><em>Add <strong>dbt_utils <\/strong>package to packages.yml.<\/em><\/p>\n<ol start=\"2\">\n<li>Run\u00a0<em>dbt deps<\/em> to install the package.<\/li>\n<\/ol>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/ealizadeh.com\/_next\/image?url=https%3A%2F%2Fsuper-static-assets.s3.amazonaws.com%2F8f45149e-e437-4fac-b1f5-df8c7f6255ec%2Fimages%2F031613cd-c16b-4d55-a6c7-df806cea42ae.png&amp;w=1920&amp;q=80\" width=\"90%\"><\/p>\n<p><em>Install packages using <strong>dbt deps<\/strong>.<\/em><\/p>\n<h3>Complex dbt models<\/h3>\n<p>The models (selects) are usually stacked on top of one another. For building more complex models, you will have to use <a href=\"https:\/\/docs.getdbt.com\/reference\/dbt-jinja-functions\/ref\" target=\"_blank\" rel=\"noopener\">ref()<\/a> macro. <em>ref()<\/em> is the most important function in dbt as it allows you to refer to other models. For instance, you may have a model (aka SELECT query) that does multiple stuff, and you don\u2019t want to use it in other models. It will be difficult to build a complex model without using macros introduced earlier.<\/p>\n<p><strong>dbt model using <em>ref()<\/em> and global variables<\/strong><\/p>\n<p>We can build more complex models using the two dbt models defined earlier in the post. For instance, let&#8217;s create a new dbt model that joins the above two tables on the country code and then filters based on selected country and year.<\/p>\n<div>\n<pre>select *\r\nfrom {{ref('population')}} \r\ninner join {{ref('covid19_latest_stats')}} \r\non {{ref('population')}}.code = {{ref('covid19_latest_stats')}}.iso_code \r\nwhere code='{{ var(\"selected_country\") }}' AND year='{{ var(\"selected_year\") }}'\r\n\r\n<\/pre>\n<\/div>\n<p><em><strong>jinja_and_variable_usage <\/strong>dbt model (models\/jinja_and_variable_usage.sql).<\/em><\/p>\n<p>A few points about the query above:<\/p>\n<ul>\n<li><em>{{ref(&#8216;dbt_model_name&#8217;)}}<\/em> is used to refer to dbt models available in the project.<\/li>\n<li>You can get a column from the model like <em>{{ref(&#8216;dbt_model_name&#8217;)}}.column_name<\/em>.<\/li>\n<li>You can use variables defined in <em>dbt_project.yml<\/em> file by <em>{{var(&#8220;variable_name)}}<\/em>.<\/li>\n<\/ul>\n<p>The above code snippet joins the data from population and covid19_latest_stats models on the country code and filters them based on the selected_country=USA and selected_year=2019. The output of the model is shown below.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/ealizadeh.com\/_next\/image?url=https%3A%2F%2Fsuper-static-assets.s3.amazonaws.com%2F8f45149e-e437-4fac-b1f5-df8c7f6255ec%2Fimages%2F21544905-8203-423c-a060-176003295383.png&amp;w=1920&amp;q=80\" width=\"90%\"><\/p>\n<p><em>The output of the <strong>jinja_and_variable_usage <\/strong>dbt model.<\/em><\/p>\n<p>\u00a0<\/p>\n<p>You can also see the compiled SQL code snippet by clicking on <strong>compile sql<\/strong> button. This is very useful, particularly if you want to run the query outside the dbt tool.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/ealizadeh.com\/_next\/image?url=https%3A%2F%2Fsuper-static-assets.s3.amazonaws.com%2F8f45149e-e437-4fac-b1f5-df8c7f6255ec%2Fimages%2Fb6d9a58e-73fd-48c3-af59-1cefecfcef3e.png&amp;w=1920&amp;q=80\" width=\"90%\"><\/p>\n<p><em>Compiled SQL code for <strong>jinja_and_variable_usage <\/strong>dbt model.<\/em><\/p>\n<p><strong>dbt model using dbt_utils package and macros<\/strong><\/p>\n<p>The <em>dbt_utils<\/em> package contains macros (aka functions) you can use in your dbt projects. A list of all macros is available on <a href=\"https:\/\/github.com\/dbt-labs\/dbt-utils\/\" target=\"_blank\" rel=\"noopener\">dbt_utils&#8217; GitHub page<\/a>.<\/p>\n<p>Let&#8217;s use dbt_utils <a href=\"https:\/\/github.com\/dbt-labs\/dbt-utils\/#pivot-source\" target=\"_blank\" rel=\"noopener\">pivot()<\/a> and <a href=\"https:\/\/github.com\/dbt-labs\/dbt-utils\/#get_column_values-source\" target=\"_blank\" rel=\"noopener\">get_column_values()<\/a> macros in a dbt model as below:<\/p>\n<div>\n<pre>select\r\n  continent,\r\n  {{ dbt_utils.pivot(\r\n      \"population.year\",\r\n      dbt_utils.get_column_values(ref('population'), \"year\")\r\n  ) }}\r\nfrom {{ ref('population') }}\r\ngroup by continent\r\n\r\n<\/pre>\n<\/div>\n<p><em><strong>using_dbt_utils_macros <\/strong>dbt model (models\/using_dbt_utils_macros.sql).<\/em><\/p>\n<p>The above dbt model will compile to the following SQL query in dbt.<\/p>\n<div>\n<pre>select\r\n  continent,\r\n    sum(case when population.year = '2015' then 1 else 0 end) as \"2015\",\r\n\t\tsum(case when population.year = '2017' then 1 else 0 end) as \"2017\",\r\n\t\tsum(case when population.year = '2017' then 1 else 0 end) as \"2016\",\r\n\t\tsum(case when population.year = '2017' then 1 else 0 end) as \"2018\",\r\n\t\tsum(case when population.year = '2017' then 1 else 0 end) as \"2019\"\r\nfrom \"d15em1n30ihttu\".\"dbt_ealizadeh\".\"population\"\r\ngroup by continent\r\nlimit 500\r\n\/* limit added automatically by dbt cloud *\/\r\n\r\n<\/pre>\n<\/div>\n<p><em>Compiled SQL query from <strong>using_dbt_utils_macros <\/strong>dbt model.<\/em><\/p>\n<p>\u00a0<\/p>\n<h2>Run Tests in dbt<\/h2>\n<p>\u00a0<\/p>\n<p>Another benefit of using dbt is the ability to test your data. Out of the box, dbt has the following generic tests: <em>unique<\/em>, <em>not_null<\/em>, <em>accepted_values<\/em>, and <em>relationships<\/em>. An example of these tests on the model is shown below:<\/p>\n<div>\n<pre>version: 2\r\n\r\nmodels:\r\n    - name: covid19_latest_stats\r\n      description: \"A model of latest stats for covid19\"\r\n      columns:\r\n          - name: iso_code\r\n            description: \"The country code\"\r\n            tests:\r\n                - unique\r\n                - not_null\r\n\r\n<\/pre>\n<\/div>\n<p><em>schema.yml (dbt testing).<\/em><\/p>\n<p>You can run the tests via<em> dbt test<\/em>. You can see the output below.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/ealizadeh.com\/_next\/image?url=https%3A%2F%2Fsuper-static-assets.s3.amazonaws.com%2F8f45149e-e437-4fac-b1f5-df8c7f6255ec%2Fimages%2F06102e3c-c6e8-4c61-83a4-4568c44adaf2.png&amp;w=1920&amp;q=80\" width=\"90%\"><\/p>\n<p><em>Results of running dbt test on the dbt Cloud dashboard.<\/em><\/p>\n<p>For more information on testing in dbt, you can visit <a href=\"https:\/\/docs.getdbt.com\/docs\/building-a-dbt-project\/tests\" target=\"_blank\" rel=\"noopener\">dbt documentation<\/a>.<\/p>\n<p>\u00a0<\/p>\n<h2>Generate Documentation in dbt<\/h2>\n<p>\u00a0<\/p>\n<p>You can generate documentation for your dbt project by simply running <em>dbt docs generate<\/em> in the command section as shown below.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/ealizadeh.com\/_next\/image?url=https%3A%2F%2Fsuper-static-assets.s3.amazonaws.com%2F8f45149e-e437-4fac-b1f5-df8c7f6255ec%2Fimages%2F01305cdb-c062-4477-bcba-c916e2ee18d7.png&amp;w=1920&amp;q=80\" width=\"90%\"><\/p>\n<p><em>Generate documentation for a dbt project.<\/em><\/p>\n<p>You can browse through the generated documentation by clicking on<strong> view docs<\/strong>. You can see an overview of the generated docs below.<\/p>\n<\/p>\n<p>In addition to dbt docs generate, dbt docs can also serve a webserver with the generated documentation. To do so, you need to simply run dbt docs serve. More information about generating docs for your dbt project is available <a href=\"https:\/\/docs.getdbt.com\/docs\/building-a-dbt-project\/documentation\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p>\u00a0<\/p>\n<h2>Other Features<\/h2>\n<p>\u00a0<\/p>\n<h3>Database administration using Hooks &amp; Operations<\/h3>\n<p>There are database management tasks that require running additional SQL queries, such as:<\/p>\n<ul>\n<li>Create user-defined functions<\/li>\n<li>Grant privileges on a table<\/li>\n<li>and many more<\/li>\n<\/ul>\n<p>dbt has two interfaces (hooks and operations) for executing these tasks, and, importantly, version controls them. Hooks and operations are briefly introduced here. For more info, you can check <a href=\"https:\/\/docs.getdbt.com\/docs\/building-a-dbt-project\/hooks-operations\" target=\"_blank\" rel=\"noopener\">dbt documentation<\/a>.<\/p>\n<h3>Hooks<\/h3>\n<p>Hooks are simply SQL snippets that are executed at different times. Hooks are defined in the <em>dbt_project.yml<\/em> file. Different hooks are:<\/p>\n<ul>\n<li><em>pre-hook<\/em>: executed before a model is built<\/li>\n<li><em>post-hook<\/em>: executed after a model is built<\/li>\n<li><em>on-run-start<\/em>: executed at the start of dbt run<\/li>\n<li><em>on-run-end<\/em>: executed at the end of dbt run<\/li>\n<\/ul>\n<h3>Operations<\/h3>\n<p>Operations are a convenient way to invoke a macro without running a model. Operations are triggered using <a href=\"https:\/\/docs.getdbt.com\/reference\/commands\/run-operation\" target=\"_blank\" rel=\"noopener\">dbt run-operation<\/a> command. Note that, unlike hooks, you need to explicitly execute the SQL in a <a href=\"https:\/\/docs.getdbt.com\/docs\/building-a-dbt-project\/hooks-operations#operations\" target=\"_blank\" rel=\"noopener\">dbt operation<\/a>.<\/p>\n<p>\u00a0<\/p>\n<h2>Conclusion<\/h2>\n<p>\u00a0<\/p>\n<p>dbt is a nice tool that is definitely worth giving a try as it may simplify your data ELT(or ETL) pipeline. In this post, we learned how to set up and use dbt for data transformation. I walked you through the different features of this tool. In particular, I provided a step-by-step guide on:<\/p>\n<ul>\n<li>Configuring a dbt project<\/li>\n<li>Creating dbt models (SELECT statements)<\/li>\n<li>Build complex dbt models using global variables and macros<\/li>\n<li>Building complex models by referring to other dbt models<\/li>\n<li>Running tests<\/li>\n<li>Generating documentation<\/li>\n<\/ul>\n<p>You can find the <a href=\"https:\/\/github.com\/e-alizadeh\/sample_dbt_project\" target=\"_blank\" rel=\"noopener\">GitHub repo<\/a> containing all scripts (including the data ingestion script) below. <em>Feel free to fork the source code of this article.<\/em><\/p>\n<p>\u00a0<\/p>\n<p><strong>Bio:<\/strong> <a href=\"https:\/\/ealizadeh.com\/\" target=\"_blank\" rel=\"noopener\">Essi Alizadeh<\/a>\u00a0(<a href=\"https:\/\/twitter.com\/es_alizadeh\" target=\"_blank\" rel=\"noopener\">@es_alizadeh<\/a>) is an engineer and a senior data scientist who is in permanent beta. He likes to write about different technologies, Statistics, Time Series, and Machine Learning.<\/p>\n<p><b>Related:<\/b><\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2021\/07\/dbt-data-transformation-tutorial.html<\/p>\n","protected":false},"author":0,"featured_media":8425,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8424"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8424"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8424\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8425"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8424"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}