This post follows up on our live webinar, "Top data quality tactics to boost trust and improve collaboration." In the webinar, I shared issues that data practitioners often face and how GX can help solve them. I also gave a live demo of GX Cloud, with special emphasis on the collaboration features that let data practitioners test their data and share the results with their teams.
You can view the webinar on demand here.
At the end of the webinar, we had a few minutes for an audience Q&A. In this post, I've collected all the questions that were asked, along with their answers, including the ones we weren't able to get to live.
Expectations
How many Expectations does GX Cloud have?
GX Cloud has 44 Expectations built in. If none of these meet your needs, you can also add your own custom SQL Expectations to GX Cloud.
Most GX Core Expectations are available in GX Cloud. Three additional Expectations have not yet been implemented in GX Cloud directly; if you need one of them, you can optionally use the GX Core framework to access it.
Does GX Cloud support custom Expectations?
Yes, GX Cloud supports creating custom Expectations with SQL.
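Custom SQL Expectations in GX are built around an "unexpected rows" pattern. You write a query that selects the rows violating your rule, and the Expectation fails if any rows come back. In GX Core 1.x, the corresponding class is `UnexpectedRowsExpectation`, whose query refers to the data under test through a `{batch}` placeholder.

The sketch below is not the GX API. It reproduces the pattern with Python's built-in `sqlite3` module. The `orders` table, its rows, and the `run_unexpected_rows_check` helper are all hypothetical.

```python
import sqlite3


def run_unexpected_rows_check(conn: sqlite3.Connection, table: str, query_template: str) -> dict:
    """Run a custom SQL check whose query selects rows that violate a rule.

    The check passes only if the query returns no rows. The "{batch}" placeholder
    mimics the convention used by GX Core's UnexpectedRowsExpectation; here it is
    simply replaced with a trusted table name (illustration only, not the GX API).
    """
    query = query_template.format(batch=table)
    unexpected = conn.execute(query).fetchall()
    return {"success": len(unexpected) == 0, "unexpected_rows": unexpected}


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 3), (2, 0), (3, -1)])

# Hypothetical rule: order quantities must be non-negative.
result = run_unexpected_rows_check(
    conn, "orders", "SELECT id, quantity FROM {batch} WHERE quantity < 0"
)
print(result)
```

Returning the offending rows rather than a bare pass/fail flag mirrors the way GX reports unexpected values for failing Expectations.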
Using GX Cloud
Where can I see more examples of how I might use GX Cloud?
You can see example use cases in our documentation’s "Learn" section.
Who is the typical user of GX Cloud, and where do they sit in the organization?
We often see members of data engineering teams using GX Cloud, as well as members of other data teams, such as data analysts. Because GX Cloud emphasizes collaboration, teams can also bring in nontechnical data consumers, such as business stakeholders. They can share results with them and collaborate on creating and managing Expectations.
GX Cloud and data privacy
What is the data privacy position of GX Cloud?
GX Cloud does not store data from the source that it connects to, and it has read-only access when it connects. You can find additional information about our data privacy position in our Trust Center.
Does GX Cloud store data it accesses?
GX Cloud does not move or change data in the source that it connects to, and it has read-only access. However, the overview tab for a Data Asset displays statistics that GX Cloud gathers about that asset.
GX Cloud also retains metadata about your data validation in order to display the results of your validation. By default, this includes unexpected values found for failing Expectations. You can change the scope of this metadata by using GX Core to create your original Cloud Data Context with a different result format. For more information, see this link.
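For reference, GX Core controls how much detail validation results keep with a `result_format` setting. The levels are `BOOLEAN_ONLY`, `BASIC`, `SUMMARY`, and `COMPLETE`. `BOOLEAN_ONLY` records only whether each Expectation passed, so no unexpected values are kept.

The snippet below is a minimal sketch. Exactly where you supply this setting depends on your GX Core version and setup. As an assumption, the comment shows a GX Core 1.x `Checkpoint`, and the names in it are hypothetical.

```python
# Minimal result_format configuration: keep only pass/fail for each Expectation,
# so no unexpected values are retained in the validation results.
result_format = {"result_format": "BOOLEAN_ONLY"}

# The available levels, from least to most detail (listed for comparison only).
RESULT_FORMAT_LEVELS = ["BOOLEAN_ONLY", "BASIC", "SUMMARY", "COMPLETE"]

# Assumption: in GX Core 1.x this dictionary can be supplied to a Checkpoint, e.g.
#   import great_expectations as gx
#   checkpoint = gx.Checkpoint(
#       name="my_checkpoint",                  # hypothetical name
#       validation_definitions=[my_val_def],   # hypothetical ValidationDefinition
#       result_format=result_format,
#   )
```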
GX Cloud and data sources
Does GX Cloud work with Redshift?
Currently, you can’t add a Redshift Data Source directly in GX Cloud, although direct support is on our roadmap for early 2025.
To work with a Redshift Data Source in GX Cloud right now, you can use GX Core and the GX API to create and connect to a generic SQL Data Source that’s backed by Redshift. You can then send that Data Source's validation results to GX Cloud. For more information, see our documentation and the SQLAlchemy documentation on dialects.
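A generic SQL Data Source is defined by a SQLAlchemy connection string. The sketch below builds one for Redshift. It assumes the `sqlalchemy-redshift` dialect and the `psycopg2` driver are installed, which is what provides the `redshift+psycopg2://` URL scheme.

Credentials are URL-encoded with `urllib.parse.quote_plus` so that characters like `@` or `/` in a password don't break the URL. The host, user, password, and database are hypothetical. The GX Core call in the trailing comment is an assumption based on the GX Core 1.x API, so check our documentation for your version.

```python
from urllib.parse import quote_plus


def redshift_connection_string(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a SQLAlchemy URL for Redshift (assumes the sqlalchemy-redshift dialect)."""
    # URL-encode credentials so special characters don't corrupt the URL.
    return (
        f"redshift+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


conn_str = redshift_connection_string(
    "analyst",                                                   # hypothetical user
    "p@ss/word",                                                 # hypothetical password
    "example-cluster.abc123.us-east-1.redshift.amazonaws.com",   # hypothetical host
    5439,                                                        # Redshift's default port
    "dev",                                                       # hypothetical database
)
print(conn_str)

# Assumed GX Core 1.x usage (requires great_expectations and GX Cloud credentials):
#   import great_expectations as gx
#   context = gx.get_context(mode="cloud")
#   data_source = context.data_sources.add_sql(name="redshift", connection_string=conn_str)
```

In practice, you would read the password from an environment variable or secrets manager rather than hard-coding it.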
Does GX support DataFrames?
GX Core supports DataFrames directly. GX Cloud does not currently connect to DataFrames directly, but you can use GX Core to integrate DataFrames with GX Cloud. For more information, please see our documentation.
How does GX Cloud differ from dbt?
We’re big fans of dbt here, but there are definitely ways that GX offers additional value in data quality testing. Even in single-source, single-team deployments within a SQL warehouse:
GX has a richer vocabulary of tests (Expectations) and tools for developing tests than dbt does
GX captures more metadata about each test run
GX’s validation results double as documentation artifacts that update automatically
Once a pipeline has multiple pieces, languages, and/or teams, GX’s higher expressivity and documentation really start to shine. At that point, GX becomes a translation layer for data quality and documentation:
GX’s test suites (Expectation Suites) let users easily run the same tests against data in different tables and on different backends (for example, Snowflake and many other SQL databases)
GX’s validation results are readable by both machines and people (even nontechnical ones) so non-engineering SMEs and stakeholders can work with them right out of the box
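To make the "same tests, different tables and backends" idea concrete, here is a plain-Python analogy rather than GX code. One hypothetical suite of SQL checks is defined once and run unchanged against two tables. The tables live in two separate SQLite databases that stand in for two backends.

In GX itself, the reusable unit is an Expectation Suite, and GX handles the differences between backends. This sketch does not attempt that. The `SUITE` dictionary, the `validate` helper, and the sample tables are all hypothetical.

```python
import sqlite3

# A hypothetical "suite": named checks, each a SQL query that returns violating rows.
# This mimics the idea of an Expectation Suite in plain Python; it is not the GX API.
SUITE = {
    "id_not_null": "SELECT * FROM {table} WHERE id IS NULL",
    "amount_non_negative": "SELECT * FROM {table} WHERE amount < 0",
}


def validate(conn: sqlite3.Connection, table: str) -> dict:
    """Run every check in SUITE against one table; return {check_name: passed}."""
    return {
        name: not conn.execute(query.format(table=table)).fetchall()
        for name, query in SUITE.items()
    }


# Two separate databases stand in for two different backends.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE payments (id INTEGER, amount REAL)")
warehouse.executemany("INSERT INTO payments VALUES (?, ?)", [(1, 10.0), (2, 5.5)])

staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE raw_payments (id INTEGER, amount REAL)")
staging.executemany("INSERT INTO raw_payments VALUES (?, ?)", [(None, 3.0), (4, -2.0)])

print(validate(warehouse, "payments"))
print(validate(staging, "raw_payments"))
```

Because the checks are defined once and only the target changes, a fix to a rule applies everywhere the suite runs.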
Collaboration is a critical component of any successful data quality project, and GX makes it easier for you to communicate with nontechnical stakeholders. That is one of the big reasons you’d choose it over dbt.
We also have a full tutorial on how to integrate dbt with GX Core.
Other
Does GX Cloud have more than one Expectation Suite or Checkpoint?
Under the hood, GX Cloud provides a default Expectation Suite and Checkpoint for each Data Asset. These are "gx-managed" Expectation Suites and Checkpoints, and users can't modify them. Users who want to create additional Expectation Suites and Checkpoints can do so through the API, using GX Core.
Is the GX library available to be used by data engineers?
Yes. The heart of the GX platform is GX Core, which is the world's most popular data quality framework. GX Cloud always uses GX Core behind the scenes to connect to data, create Expectations, validate data, and so on. On top of that, GX Cloud adds a user interface and a variety of collaboration-focused features.
Interested users can also interact with GX Core directly as part of their GX Cloud usage. GX Core is free and open source, and its repository is available on GitHub.
Further resources
If you have other questions about GX Cloud that aren't answered here, we invite you to check out our GX Cloud FAQ. You're also welcome to email our support team at support@greatexpectations.io or reach out via our community Slack channel or Discourse forum.