Metabase Query Builder

Written by Impact Atlas
Updated 8 months ago

What is the query builder?

The Metabase query builder is a graphical interface that allows users to create and construct queries in the Metabase analytics platform. It provides a user-friendly environment for designing and customizing queries without the need for writing complex SQL code.

How do I access the query builder?

To access the Metabase Homepage, start by navigating to the "Analytics" submenu in the left menu of the Impact Atlas.

Next, click the "+New" button located at the top right of the page. Once clicked, a drop-down menu will appear, and from there, select the first option labeled "*Question."

Upon choosing the initial option, you will be prompted to pick the database corresponding to the cohort. Following that, select one of the tables associated with the cohort, which corresponds to the forms or surveys. Refer to the screenshots below for a visual guide outlining these steps.

After choosing the table, Metabase will redirect you to the query builder interface.

The interface of the query builder offers seven options to customize the query:

  1. Picking data
  2. Joining data
  3. Creating custom columns
  4. Filtering
  5. Summarizing and grouping by
  6. Sorting Results
  7. Setting a row limit

To the right of the box that picks up the data is a Preview button (which looks like a Play button—a triangle pointing to the right) that shows you the first 10 rows of the results of your question.

After clicking the Preview button, the initial 10 results from the registration table will be shown, and you have the option to close the preview list if desired.

    Now, we will examine each option currently present in the query builder and offer some examples.

    Picking data

    The data section is where you select the data you want to work with. In this section, you will select a table from a database or a saved question. You can click on the table to select which columns you want to include in your results.

    EXAMPLE 1.

    Let's say we want to create a query that displays all columns (survey questions) of the registration table (form). To achieve this, we should select the "down arrow" beside the registration table, revealing all the available columns. Subsequently, we must mark each column present in the drop-down menu.

    Preview Result:

    EXAMPLE 2.

    Let's consider that we want to display the basic information of participants, such as the first name, last name, gender, age, national ID, neighborhood, and section. In this case, we should select these specific columns in the drop-down menu located next to the registration table.

    Preview Result:

    Joining data

    In metabase, joins allow you to combine data from two or more tables in a database based on a related column between them.

    Once you click the "Join data" button to create a join, you will need to choose the data table that you wish to combine.

    Keep in mind that you can only choose tables from the same database as your joined data. It's important to note that the first table will always be the one selected in the data section — in our case, the registration table

    Refer to the screenshot below for a visual representation of the previously explained step.

    Afterward, you'll need to identify a common column between the tables being joined. This involves selecting a column from the first table and another from the second table. Additionally, you must choose the type of join between the two tables, which can be left, right, or inner. By default, Metabase will perform a LEFT JOIN.

    Left join - Retrieves all rows from the left table and matching rows from the right table. If there is no match, NULL values are returned for columns from the right table.

    Right join - Retrieves all rows from the right table and matching rows from the left table. If there is no match, NULL values are returned for columns from the left table.

    Inner join - Retrieves only the rows where there is a match in both tables.

    EXAMPLE 1.

    Let's say we want to create a query that combines data from two tables: registration and case notes. The registration table contains participant-related data, while the case notes table holds various notes associated with a participant.

    The data in the registration table is:

    The data in the case notes table is:

    To achieve our goal, we need to find the link between these two tables. By analyzing the data from the two tables, we can easily see that the common and unique data is the participant ID.

    Below, you can see the screenshot with the join relationship between the two tables.

    Next, you need to select the data to be displayed in the query result and specify the type of join. In this particular example, I chose two columns from the registration table—first name and last name—and three columns from the case note table—note type, note description, and note participant ID. I selected INNER JOIN as the type of JOIN.

    The result is as follows:

    The displayed query result (shown in the screenshot above) presents three participant rows identified by the IDs 114, 115, and 117. These specific participants are present in the data from both tables. Participants from the registration table not found in the case notes table will not be included in the result.

    Let's update the type of JOIN to LEFT. The result would then be:

    In the screenshot above, you can observe all the rows from the initial result. Additionally, participants from the left table (registration) are included, even if they do not have a corresponding match in the right table (case note). In such instances, the respective columns will display as empty.

    For example, participant Daniel Brown has all the columns from the Case Note survey as empty (Participant ID, Note Type, and Note Description).

    Let's update the type of JOIN to RIGHT. The result would then be:

    The result is the same as for the INNER JOIN type. All the participants from the right table (Case Note table) need to appear in our result as mandatory, and then the data from the registration table can be omitted, but in this specific example, all three participants from the right table (Case Note) have data also in the left table (Participant Registration).

    Creating custom columns

    Custom columns in Metabase refer to the ability to create new calculated or derived columns in the query results. These "custom columns" allow users to perform various operations on existing columns, such as mathematical calculations or string manipulations, to obtain new information.

    Clicking the "Custom Column" button in Metabase opens a dialog or field, allowing you to input the expression for your custom column. Below this dialog, you can give a name to the newly calculated column. See the screenshot below:

    There is a list of pre-defined expressions that you can use in the expression field. Please see the list documented at the following link.

    EXAMPLE 1.

    Let's suppose we want to generate dynamic values that combine the participant's name and age. For this, we can use the pre-defined concat function. After entering the "concat" function into the expression field, we can access its definition and usage instructions. See the description below:

    The function will look like below:

    The result is:

    EXAMPLE 2.

    Suppose we want to calculate the total of birth-registered children and children attending school by adding them up. If we are unsure of the exact column name to input in the expression field, simply typing a partial column name will prompt Metabase to provide a list of suggested columns from the current table. See below:

    The expression will look like the following when entered into the expression field in Metabase:

    The result is:

    Filtering

    In Metabase, filtering refers to the process of narrowing down the dataset based on specific conditions or criteria. The query builder supports one or more filters.

    You will encounter different types of filters depending on the selected data type. There are three types of columns, each with its own set of filtering options:

    1. Text type - allows you to limit the results by searching for substrings within the column data or exclude empty results from your query. Additionally, you can enable the case-sensitive option to search for case-sensitive substrings.

    2. Date type - allows you to filter the results by specific date ranges, relative date ranges, or more granular time intervals.

    3. Numeric type - allows you to perform various mathematical operations on the dataset or exclude empty results.

    EXAMPLE 1

    Let's filter out the participants located in the 'Lagwas' neighborhood. See the applied filter below:

    The result is:

    EXAMPLE 2

    Let's filter out participants between the ages of 20 and 30, excluding those who are 25 years old. See the two filters applied below:

    The result is:

    Sorting Results

    In Metabase, sorting refers to the process of arranging data in a specific order based on one or more columns. For each column you select, you can choose whether to sort it in ascending or descending order.

    By clicking on it, you will get a drop-down list of all columns (questions) from the selected table. Then, you need to choose the columns by which the data will be sorted.

    EXAMPLE 1

    Let's consider sorting participants by their income in descending order. The current descending sort is displayed below, with the downward arrow indicating the descending order. To switch to ascending order, simply click on the arrow to move it upward.

    EXAMPLE 2

    Let's sort participants by neighborhood in ascending order and by last name in descending order. To achieve this, we need to select two columns in the sorting area and update the sorting type for each column. See the screenshot below:

    Setting a row limit

    Setting a row limit in Metabase involves specifying the maximum number of rows to be displayed in the query results or visualizations. It is a feature that allows users to control the amount of data presented in their analysis, particularly when dealing with large datasets.

    Clicking on it will open a numeric field where you can enter the limit number for the query result.

    EXAMPLE 1

    Let's suppose we want to showcase the initial two results from the registration table. The corresponding setting to be applied is represented in the image below:

    Summarizing and grouping by

    In Metabase, "summarizing and grouping by" refers to the process of aggregating and organizing data based on specific criteria. It involves creating summaries of data within specified groups, which can provide valuable insights for understanding the data and generating various types of reports.

    You can pick one or more metrics and optionally group those metrics by one or more dimensions (columns). When picking your metrics, you can choose from basic functions like sumaverage, and count, or you can create a custom expression by writing a formula.

    Click on "Pick the metric you want to see" to view the basic metrics available in the query builder:

    EXAMPLE 1.

    Let's assume we want to see the number of participants in each neighborhood. To achieve this, we will use the "count" metric and group the data by the neighborhood field or column. See the screenshot below:

    EXAMPLE 2

    Let's find out who is the oldest participant in our database. To achieve this, we will select the "maximum of" metric and then select the participant age column. See the screenshot below:

    After completing the query creation in the builder tool, the next step is to select the appropriate format for visualizing the data results. Various types of visualizations are available, including bar charts, pie charts, line charts, scatter plots, heat maps, histograms, and more. Each type should be chosen based on the nature of the data and the specific insights you want to emphasize.

    Did this answer your question?