#47 - Efficient Data Management Strategies
How ETL and Reverse ETL Transform Knowledge Work and Improve Efficiency
Introduction
I recently engaged in discussions around data strategies like ETL, reverse ETL, and ELT. These buzzwords play a significant role in the data world. When you think about it, data is all around us: it shows up in project tracking, pull requests, and online courses. Given that nearly everything we do involves some form of data exchange, what can we learn from these strategies to improve how we handle data and ultimately become more effective?
Decision-Making with ETL
When we need to make a quick decision, we usually know what outcome we’re aiming for and what data we need to rely on. In these cases, we can use ETL, the most basic and common approach. ETL stands for Extract, Transform, and Load. It takes more time up front, because extraction and transformation require effort, but it reliably gets you to a decision.
Take, for example, making a decision about a specific project. Extract would mean gathering all the relevant data about the project (goals, tasks, estimates, and potential impacts) into one place. Transform would involve distilling it into a brief or summary that’s digestible for others. Finally, Load could be presenting that summary in a meeting or sharing it as a document for everyone to review offline. This one-time process fits scenarios where you know exactly what you need to make an informed decision.
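To make the analogy concrete, here is a minimal sketch of that one-time flow in Python. The records, field names, and the “checkout-revamp” project are all hypothetical; the point is the three distinct steps.

```python
# Hypothetical raw records, as they might be exported from a tracker.
RAW_SOURCES = [
    {"project": "checkout-revamp", "type": "goal", "text": "Cut checkout time by 30%"},
    {"project": "checkout-revamp", "type": "task", "text": "Redesign payment form", "estimate_days": 5},
    {"project": "checkout-revamp", "type": "task", "text": "Add one-click reorder", "estimate_days": 8},
    {"project": "search-upgrade", "type": "goal", "text": "Improve result relevance"},
]

def extract(project: str) -> list[dict]:
    """Extract: pull every record relevant to the project into one place."""
    return [r for r in RAW_SOURCES if r["project"] == project]

def transform(records: list[dict]) -> str:
    """Transform: condense the raw records into a brief others can digest."""
    goals = [r["text"] for r in records if r["type"] == "goal"]
    tasks = [r for r in records if r["type"] == "task"]
    total_days = sum(r.get("estimate_days", 0) for r in tasks)
    return f"Goals: {'; '.join(goals)}. Tasks: {len(tasks)} ({total_days} estimated days)."

def load(brief: str) -> None:
    """Load: share the result (printed here; a meeting deck or doc in practice)."""
    print(brief)

load(transform(extract("checkout-revamp")))
```

Because the whole pipeline runs once per decision, it’s fine for it to be slow and manual; the effort pays off in the quality of the brief.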
Gather Insights for Future Decisions with Reverse ETL
Sometimes, the decision you’re gathering insights for is still in the future, or you don’t yet need to make it. In these cases, you might have insights and data points you want to consider and save for later, even if you’re unsure what they’ll be used for or when. This is where reverse ETL becomes very useful.
You can start by extracting and transforming the data point into something digestible, such as a table where you log the insight, link, impact, and potential future relevance. Then, later, when you need to make a decision, you have a resource to refer back to. For example, imagine you collect customer feedback about an unmet need over time; when the company eventually decides to kick off a new solution in that area, you already have a body of insights to rely on.
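A minimal sketch of this pattern follows, assuming a simple CSV file as the insight log; in practice it could be a spreadsheet or a notes database, and the columns shown are just one possible shape.

```python
import csv
from datetime import date
from pathlib import Path

LOG = Path("insights.csv")  # hypothetical log; a spreadsheet or doc works too
FIELDS = ["date", "insight", "link", "impact", "future_relevance"]

def log_insight(insight: str, link: str, impact: str, future_relevance: str) -> None:
    """Extract and transform the data point into a digestible row, then save it."""
    is_new = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"date": date.today().isoformat(), "insight": insight,
                         "link": link, "impact": impact,
                         "future_relevance": future_relevance})

def recall(topic: str) -> list[dict]:
    """Months later, pull back everything logged about a topic."""
    with LOG.open(newline="") as f:
        return [row for row in csv.DictReader(f)
                if topic.lower() in row["insight"].lower()]

log_insight("Customers keep asking for bulk export",
            "https://example.com/ticket/123",
            "Blocks two enterprise deals",
            "Relevant if we ever build a reporting suite")
print(recall("bulk export"))
```

The key property is that the extract and transform happen cheaply at capture time, so recall is nearly free when the decision finally arrives.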
Enriching Data Insights with ELT
In some situations, you want to enrich data insights with additional dimensions and data points. For example, when someone opens a Jira ticket, you might want to capture details like priority, team, component, size, and related documents. However, if opening a Jira ticket requires filling out 15 different fields, it becomes time-consuming, overly complex, and ultimately less efficient.
In such cases, it might make sense to use an ELT approach. ELT stands for Extract, Load, Transform. Imagine that you provide only the title and description when opening the Jira ticket. In the description, you might mention the priority and components in free text. A separate process then transforms this initial input into the structured fields.
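Here is a rough sketch of that flow, with naive keyword matching standing in for the deferred transform step; a real setup might use automation rules or an LLM, but the shape is the same: load the raw input first, structure it later.

```python
# Load first: the ticket is stored as-is, with only a title and a free-text
# description, so opening it costs the reporter almost nothing.
raw_ticket = {
    "title": "Checkout page slow on mobile",
    "description": "This is high priority and affects the payments component.",
}

# Hypothetical vocabularies for the structured fields.
PRIORITIES = ["critical", "high", "medium", "low"]
COMPONENTS = ["payments", "search", "onboarding"]

def transform(ticket: dict) -> dict:
    """Transform later: derive structured fields from the free text."""
    text = ticket["description"].lower()
    structured = dict(ticket)
    structured["priority"] = next((p for p in PRIORITIES if p in text), None)
    structured["component"] = next((c for c in COMPONENTS if c in text), None)
    return structured

print(transform(raw_ticket))
# -> adds priority='high' and component='payments' to the raw ticket
```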
Another example could be feature requests. When someone opens a feature request for your product, you don’t want to create obstacles by requiring extensive details. They can simply explain what they want. Later, the product team can transform this information by adding custom fields, assigning categories, or rephrasing the feature request in standard terms. This approach is ideal for cases where you need a more efficient, fast, and scalable way to ingest data.
Schedule Sync vs. Real-Time
One approach to gathering product insights, such as customer requests, friction points, or feature usage, is Schedule Sync. With this method, you review various insights from multiple sources on a recurring basis (e.g., weekly). You collect relevant insights and push them into a chosen destination—whether it’s a database, a document, or a spreadsheet.
The other approach is Real-Time. Whenever you encounter a data point related to the product, like a support chat, a public review, or a feature request, you immediately extract, transform, and load it into the destination.
The difference is that the Real-Time approach ensures data freshness but can compromise consistency. On the other hand, Schedule Sync offers more consistency and a reliable view of the data, though it comes with a delay before insights are available.
Choosing between the two depends on the use case. Real-Time is usually preferred when timing is critical and you must act quickly, as is often the case with new competitors in the space, strategic feature requests, or urgent bugs.
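The two approaches share the same extract-transform-load core and differ only in the trigger, which a small sketch makes visible; the sources, event shape, and in-memory destination below are all hypothetical.

```python
from datetime import datetime

DESTINATION: list[dict] = []  # stands in for a database, doc, or spreadsheet

def etl(datapoint: dict) -> dict:
    """Shared extract/transform step: keep only what the destination needs."""
    return {"source": datapoint["source"], "insight": datapoint["text"],
            "loaded_at": datetime.now().isoformat()}

def on_datapoint(datapoint: dict) -> None:
    """Real-Time: run the moment you encounter a data point
    (a support chat, a public review, a feature request)."""
    DESTINATION.append(etl(datapoint))

def scheduled_sync(sources: list[list[dict]]) -> None:
    """Schedule Sync: a recurring (e.g., weekly) sweep over all sources
    at once; consistent, but insights wait until the next run."""
    for source in sources:
        for datapoint in source:
            DESTINATION.append(etl(datapoint))

# Real-Time: handle a single event as it arrives.
on_datapoint({"source": "support-chat", "text": "CSV export is missing"})

# Schedule Sync: the weekly sweep over all sources.
scheduled_sync([
    [{"source": "reviews", "text": "Love the new dashboard"}],
    [{"source": "sales-calls", "text": "Prospect asked about SSO"}],
])
print(DESTINATION)
```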
Stay Up-to-Date with CDC
Imagine you’re tracking a major feature release and monitoring where you stand on a weekly basis. You need information on the engineering tasks, the marketing campaign plan, and the design studio’s collateral. So you gather the marketing tasks from Monday.com, pull in Gantt charts from Jira, and build your own view of the project and its expected launch date.
As you review this week by week, you need to ensure that your destination—whether a slide or a spreadsheet—stays up-to-date. Instead of rechecking all tasks and repeating the entire process, you can use a Change Data Capture (CDC) approach. With CDC, you can query and extract only the items that have changed and perform ETL on those updates.
For instance, you could set up a view that shows Jira tickets updated in the last seven days, or retrieve the latest updates from Monday.com boards. This makes keeping your destination data current much more efficient and less time-consuming.
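A minimal sketch of the idea, using an updated_at timestamp as the change marker (most trackers expose one); in Jira itself, a saved filter like `updated >= -7d` plays the same role.

```python
from datetime import datetime, timedelta

now = datetime.now()

# Hypothetical tracker items, each carrying an updated_at timestamp.
TASKS = [
    {"key": "ENG-1", "status": "Done",        "updated_at": now - timedelta(days=30)},
    {"key": "ENG-2", "status": "In Progress", "updated_at": now - timedelta(days=2)},
    {"key": "MKT-7", "status": "To Do",       "updated_at": now - timedelta(days=1)},
]

def changed_since(tasks: list[dict], cutoff: datetime) -> list[dict]:
    """CDC: extract only the items that changed since the last sync,
    instead of re-reading everything."""
    return [t for t in tasks if t["updated_at"] >= cutoff]

# Weekly review: only these deltas need to be transformed and loaded into
# the slide or spreadsheet; untouched rows stay as they are.
for task in changed_since(TASKS, now - timedelta(days=7)):
    print(f"update row for {task['key']}: {task['status']}")
```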
Conclusions
As knowledge workers, we spend much of our time ingesting, manipulating, and managing data. Understanding the patterns and strategies the software world uses to handle data can enhance our productivity and improve our day-to-day workflows.