The next step for GX OSS: 1.0

Big updates, semantic versioning, and more coming in 2024

James Campbell
December 05, 2023

It’s finally happening: Great Expectations is going to 1.0.

In some important ways, we’ve been acting as a 1.0 product for a while now:

  • We have a clear deprecation policy.

  • We avoid breaking changes.

  • We have a strong support presence through our developer relations team and the community at large.

When you’re building a product, you always need to listen to your users, and the early days in particular are a time to take in a lot of feedback. Which we have!

And we’re going to turn a whole lot of that feedback into changes that we’ll implement in 1.0.

Long story short

GX 1.0 is going to be a lot simpler to use, contribute to, and maintain, with most of that coming from a significantly streamlined API. We expect to release GX 1.0 within four months.

Long story long

As part of our commitment to our open source community, we want to be as transparent as possible about what’s led us to take this (big, exciting) step.

What we learned

One of GX’s driving principles from day one was to empower developers to work any way they wanted. It turns out it’s possible to take this too far.

GX doesn’t dictate any particular process or approach, and it supports all kinds of workflows. That’s because the GX API currently supports many ways of accomplishing most things, with lots of configuration options, most of which you can also override at runtime. Avoiding breaking changes while adding capabilities meant that GX developed a huge surface area.

From the point of view of someone highly experienced who has fully implemented their pipeline, the freedom to do that implementation any way they wanted is great.

From the point of view of anyone learning GX, or actually in the midst of an implementation, these layers of options on top of options are less great.

We started hearing very consistent feedback from the community:

  • The plethora of options made the learning curve too steep. 

  • The constant need to pick how to accomplish the next step left users feeling lost even when they used our documentation. 

  • People struggled to understand what settings belonged in the configuration versus in code, and to grasp their asset and test definitions.

We started to hear similar refrains internally, too:

  • Building full test coverage and maintaining the codebase was increasingly difficult, and subtle bugs in configuration management and reconciliation ate up engineering time. 

  • To meet community demand for clearer start-to-finish guidance, our technical writers were facing the prospect of documenting a nearly infinite number of potential workflows.

  • During troubleshooting, our developer advocates had to spend a lot of time reasoning about the state of the user’s GX implementation.

We’ve taken some steps toward addressing this feedback already: for example, with our Fluent Data Sources. But it’s pretty conclusive that piecemeal work isn’t going to cut it. 

We need to take bold, decisive action.

What we’re going to do

One reason that we wanted to give developers so much control at GX’s inception is that our founders didn’t necessarily know which of the many workflows that data scientists or engineers use for data quality would be the best ones.

But times have changed and so have we. We’ve talked to many (many) people looking for data quality, both GX users and not. Our collective institutional knowledge has grown by orders of magnitude—from an original team of just two, there are now more than 50 people on the GX team!

In summary: we now have much more informed opinions about how things should be done, and we’re going to reflect them in the GX codebase.

We’re going to streamline the API in a big way. At the end, GX will have a clear expected path with a right way to accomplish common data quality tasks and centralized settings. Configuration will be hugely simplified, with fewer ways to accomplish each thing.

These changes, and the time and energy they will free up on the GX engineering and developer relations teams, will have a huge positive impact on the GX experience across the board:

  • GX will be easier to use, with configuration that’s much more straightforward.

  • Documentation will be less complex and have more substantive guidance.

  • Contributors will find it simpler to understand and implement their ideas.

  • New releases will be able to focus on adding functionality more than on fixing bugs.

You might be wondering: how will the changes do these things? Here are some examples:

To improve the interactive Expectation experience, we’re going to add stronger typing for Expectations, ensuring you can reason about (or be reminded of) the Expectations’ arguments without having to refer to the documentation. We’re also going to completely rework the Validator, moving it into the background and clearly differentiating the workflow for building Expectations from validating them in a pipeline.
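
As a rough illustration of what stronger typing can buy, here is a minimal Python sketch. The ValuesBetween class and its is_met method are hypothetical names invented for this example; they are not the GX 1.0 API, which had not been published when this post was written. The point is only that typed, named arguments can be discovered in an editor and checked when the object is built.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ValuesBetween:
    """Hypothetical typed expectation; illustrative only, not a GX class."""

    column: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def __post_init__(self) -> None:
        # Fail at construction time rather than later, during validation.
        if self.min_value is None and self.max_value is None:
            raise ValueError("set min_value, max_value, or both")

    def is_met(self, values: Iterable[float]) -> bool:
        """Return True if every value falls inside the configured bounds."""
        for value in values:
            if self.min_value is not None and value < self.min_value:
                return False
            if self.max_value is not None and value > self.max_value:
                return False
        return True


# Arguments are explicit and named, so there is less need to consult the docs.
age_check = ValuesBetween(column="age", min_value=0, max_value=120)
```

With a shape like this, a missing bound is reported as soon as the object is created, and the field names and types show up in autocomplete.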

To improve the Batch and Checkpoint experience, we’re going to radically simplify the configuration options. For example, there are currently 19 parameters for run_checkpoint, including 6 dictionaries or lists of dictionaries. This is too many parameters, and we plan to reduce them by a factor of… 19 (well, almost). We’re also going to be giving a lot of attention to the usability of result_format.

To improve the Data Source experience, we’re going to remove block configuration for Data Sources and say farewell to test_yaml_config. Ever since we introduced Fluent Data Sources, we’ve been transparent that they (shorter, clearer, and easier to use) are the future, and the move to 1.0 is the right time to cut the old approach loose. We're also going to improve the way users can connect to their data and use splitters to validate just what they need.

To improve the data profiling experience, we’re going to remove Data Assistants. They were an ambitious project we were excited about, but after a lot of feedback from the community it’s clear we missed the mark on execution, especially around contributor experience and performance.

We’ve learned a lot about what we can do better and have plenty of ideas, so we’re taking GX 1.0 as an opportunity to reset data profiling in GX. While Data Assistants will be departing, profiling is critical and will return!

Finally, we’re going to officially adopt semantic versioning, as per semver.org. Formalizing semantic versioning is a natural step toward making it easier to introduce GX into new environments, and it will let us use version numbers to more clearly indicate when breaking changes do occur.
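
To make the versioning rule concrete, here is a minimal Python sketch of how a user might read semver-style version numbers. The helper names parse_version and may_break are invented for this example and are not part of GX. The sketch handles only plain MAJOR.MINOR.PATCH strings and ignores pre-release and build tags.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def may_break(current: str, candidate: str) -> bool:
    """Return True if moving between the two versions may include breaking changes."""
    old, new = parse_version(current), parse_version(candidate)
    if old.major == 0 or new.major == 0:
        # Per semver.org, major version zero is for initial development,
        # so any change in version may alter the public API.
        return old != new
    # From 1.0.0 onward, only a major version change signals incompatibility.
    return new.major != old.major
```

Under these rules, may_break("1.0.0", "1.4.2") is False, while may_break("1.4.2", "2.0.0") and may_break("0.18.0", "1.0.0") are both True.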

How we’re going to do it

We want GX 1.0 to be in your hands within four months. This is going to take a ton of work from our engineering team, so we’re going to take a few steps to maximize their effectiveness.

First of all, we have created an 0.18 branch, which will be the final 0.x version of GX OSS. We’ll push critical fixes to the 0.18 branch over the next several months, but we won’t be adding new features or fixing non-critical bugs in it. 

We remain committed to working in the open, so we’re also going to create and publish 1.0 pre-releases from the develop branch of the GX repository (https://github.com/great-expectations/great_expectations). You’ll be able to see our work in progress there anytime.


We’re incredibly excited to bring you GX 1.0 early next year!

To be among the first to hear future updates about GX 1.0, grab an invite to our monthly community meetups: https://greatexpectations.io/meetup.
