Skip to main content
Version: 0.16.8

How to connect to a SQL database

Introduction

In this guide we will demonstrate how to connect Great Expectations to SQL databases. GX uses SQLAlchemy to connect to SQL data, and therefore supports most SQL dialects that SQLAlchemy does. For more information on the SQL dialects supported by SQLAlchemy, please see SQLAlchemy's official documentation on dialects.

Prerequisites

This guide assumes you have:
  • An installation of GX set up to work with SQL
  • Source data stored in a SQL database
  • A passion for data quality

Steps

1. Import GX and instantiate a Data Context

The code to import Great Expectations and instantiate a Data Context is:

import great_expectations as gx

context = gx.get_context()

2. Determine your connection string

GX supports a variety of different SQL source data systems. However, most SQL dialects have their own specifications for how to define a connection string. You should reference the corresponding dialect's official documentation to determine the connection string for your SQL Database.

Some examples of different connection strings:

Here are some examples of connection strings for various SQL dialects. GX also has dialect-specific guides on setting up any extra dependencies, configuring credentials, and using the advanced block-config method of connecting to these particular SQL database types. These guides are included as the links in the following list of connection string formats.

For purposes of this guide's examples, we will connect to a PostgreSQL database. Here is an example of our connection string, stored in the variable sql_connection_string with plain text credentials:

Python code
sql_connection_string = "postgresql+psycopg2://username:my_password@localhost/test"
Is there a more secure way to include my credentials?

You can use either environment variables or a key in config_variables.yml to safely store any passwords needed by your connection string. After defining your password in one of those ways, you can reference it in your connection string like this:

Python code
connection_string="postgresql+psycopg2://<USERNAME>:${MY_PASSWORD}@<HOST>:<PORT>/<DATABASE>"

In the above example MY_PASSWORD would be the name of the environment variable or the key to the value in config_variables.yml that corresponds to your password.

If you include a password as plain text in your connection string when you define your Datasource, GX will automatically strip it out, add it to config_variables.yml and substitute it in the Datasource's saved configuration with a variable as was shown above.

3. Create a SQL Datasource

Creating a SQL Datasource is as simple as providing the add_sql(...) method a name by which to reference it in the future and the connection_string with which to access it.

Python code
datasource = context.sources.add_sql(name="my_datasource", connection_string=sql_connection_string)

Next steps

Now that you have connected to a SQL database, next you will want to: