DBT is an open-source tool to help data analyst to transform data in their warehouse more easily, efficiently and effectively. It was started at RJMetrics in 2016 with basic transformation capabilities. DBT was open source, in 2018, DBT Labs team released a commercial product on top of exiting DBT. DBT application is written in Python and is open source. DBT can be installed using pip (python installer) command.
.It doesn’t follow ETL; It is designed to transform data already inside a warehouse. More than 280 companies are running DBT in production currently. With the help of DBT you can run your analytical queries and select statements to transform your data transformation, these select statement are used to create tables and views.
A typical DBT project consists of a directory of .sql and .yml files. The directory must contain:
Models: A model is a .sql file with a select statement.
Project file: dbt_project.yml file which contains the configurations of your DBTproject, apart from that it may have snapshots, and seed files.
You can work with DBT using Command Line Interface or web-based Integrated Development Environment in DBT Cloud. The DBT command line offers functionalities to execute commands to run tests, compile, generate documentation, etc. DBT supports connections to Presto, BigQuery, Snowflake, Redshift and Postgres databases; it also supports other data base application. Similar to other data base tools in DBT also you need to specify which will be used by DBT for creating Models (tables and view). When you define your connection, you’ll also be able to specify the target schema where DBT should create your models as tables and views. See Managing environments for more information on picking target schema names.
The output of models can be stored in different ways-
Materialize a table — full refresh
Append to table — incrementally load data
Ephemeral — the output will be temporarily stored(not in DW) but it can be used as data source from other models.