The Software/Data Engineering Pendulum

If you've read The Engineer/Manager Pendulum by mipsytipsy, you're already familiar with the engineering pendulum: going back and forth between roles. Her post is about the EM/IC pendulum. Not sticking to one role for the rest of your life, but moving from IC to EM to IC to EM to IC and so on.

If you haven't read her post, I really recommend reading it. It's very interesting. I agree with her post and I do the pendulum myself. I started as an IC (of course), then got "promoted" to EM, and then I got an opportunity to be an IC again.

But it's not only the EM/IC pendulum. I'm also swinging on another axis: the Software/Data Engineering pendulum.

This pendulum hasn't been discussed much, so I want to talk about it in this post. Because it's been fun for me!

Software/Data Engineering Differences

Some of you might think "what's so pendulum about that? aren't they both the same, like... coding?".

Yeah, they're both coding. But also, different. When I first got the Data Engineering role, I went through lots of learning materials and I went "what is that? what's that for? we need to do that? what's that doing???" It was a whole new world for me. My software engineering skills helped only a little.

For software engineers: Data engineering is building the systems that collect, process, and prepare data so it can actually be used. It's pipelines, ETL jobs, data quality checks, managing a data catalog, handling late-arriving data, dealing with schema changes, handling "wrong" numbers, handling "data not updated" complaints, and making sure data and dashboards stay accessible to everyone in the company during the usage spike at the end of every month, quarter, and year. It's batch jobs that run at 3am, backfills that take days, and finding out three weeks later that you accidentally used data that was actually smoke test data.

For data engineers: Software engineering is building the product that users actually see and use. It's applications, internal tools, APIs, frontends, real-time systems, handling user input, making things fast and responsive. It's deploys that happen 10 times a day, feature flags, A/B tests, and finding out immediately when something breaks because users are screaming.

The key difference: software engineering is about right now, data engineering is about everything that already happened.

BUT

From the user's perspective, from the business perspective, it's all one system.

They are different, but they should be ONE

A user clicks "buy" on your app. That's software engineering. Handle the request, process payment, update inventory, show confirmation.

But then the business wants to know. How many people bought today? Which products are trending? What's our conversion rate? That's data engineering. The click event got logged, went through a pipeline, got transformed, ended up in a warehouse, and now someone can query it.
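To make that second half a bit more concrete, here's a tiny sketch of what "someone can query it" boils down to. Everything in it is made up for illustration: the event shape, the field names, and the `summarize` helper are my assumptions, and I'm defining conversion rate as buyers divided by viewers, which your business might define differently. A real pipeline would do this in a warehouse, not in a Python list.

```python
from collections import Counter

# Made-up raw events, roughly as the app might log them (hypothetical shape).
events = [
    {"user": "a", "type": "view", "product": "mug"},
    {"user": "a", "type": "buy", "product": "mug"},
    {"user": "b", "type": "view", "product": "mug"},
    {"user": "b", "type": "buy", "product": "mug"},
    {"user": "c", "type": "view", "product": "hat"},
    {"user": "c", "type": "buy", "product": "hat"},
    {"user": "d", "type": "view", "product": "hat"},
]


def summarize(events):
    """Turn raw events into the numbers the business asks about."""
    viewers = {e["user"] for e in events if e["type"] == "view"}
    buyers = {e["user"] for e in events if e["type"] == "buy"}
    purchases = Counter(e["product"] for e in events if e["type"] == "buy")
    return {
        "buyers": len(buyers),
        # Assumption: conversion rate = distinct buyers / distinct viewers.
        "conversion_rate": len(buyers) / len(viewers) if viewers else 0.0,
        "top_products": purchases.most_common(1),
    }


summary = summarize(events)
print(summary)
```

The point isn't the code. The point is that every number here depends on the app logging the event with the fields the query expects.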

If one side fails, the whole thing fails. Users don't care whether it's the software engineering side or the data engineering side. What they care about is that it fails. An error is an error.

The gap between those two teams

Unfortunately, both sides sometimes don't understand the perspective of the other.

Software engineers don't realize that it doesn't end after the order is processed and the payment is received. People in the company are using that data for their reviews and decision making: the business team, sales, marketing, product, and so on. Whatever data is submitted, good or bad, complete or missing, valid or not, they will see it, even years after the event happened. The data engineering team picks up the process from there and serves the data to the people in the company.

Data engineers don't realize that software is a living being. It doesn't sit still. It gets bug fixes, improvements, and new features. It will and it must change. That's how software teams support the company's business, by always improving and expanding. Those changes might affect the data pipelines. Meaning, data pipelines are living beings too. And data engineers must embrace that, not deny it.

Nobody's wrong. Both sides are optimizing for different things.

One classic example: the software team deploys a new feature and changes a field from user_id to account_id. Makes sense for the product. But the data pipeline breaks because it's still looking for user_id, and nobody notices until the weekly business review, when all the user metrics are zero. The software team thinks "well it's your problem". The data team thinks "well you're the cause". Very unproductive. Much preventable.
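Here's a minimal sketch of one way to catch this before the weekly review does, assuming both teams agree on a list of fields the pipeline depends on. The field list, the sample events, and the `missing_fields` helper are all hypothetical. The idea is just a shared check that could run in the software team's tests, so the rename fails loudly at deploy time instead of silently at report time.

```python
# Hypothetical contract: fields the data pipeline reads from order events.
PIPELINE_REQUIRED_FIELDS = {"user_id", "order_id", "amount"}


def missing_fields(event, required):
    """Return the required fields that are absent from the event, sorted."""
    return sorted(required - set(event))


# Made-up sample payloads before and after the rename.
old_event = {"user_id": 42, "order_id": "A-1", "amount": 19.99}
new_event = {"account_id": 42, "order_id": "A-1", "amount": 19.99}

print(missing_fields(old_event, PIPELINE_REQUIRED_FIELDS))  # []
print(missing_fields(new_event, PIPELINE_REQUIRED_FIELDS))  # ['user_id']
```

It's not the only way to do it, and it doesn't replace talking to each other. But a check like this turns "nobody notices" into "somebody gets a failing test".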

Being the pendulum

I started as a software engineer. Built apps, web apps, APIs, shipped features. Then I moved to data engineering. Built pipelines, worked on data infrastructure, made dashboards work.

It was humbling. I was a senior software engineer and suddenly I was a noob again. I didn't know Airflow. I didn't know dbt. I didn't know BigQuery. I didn't understand why everyone was so paranoid about schema changes. I wrote a backfill job that seemed fine but actually had a bug that took two weeks to find.

But after that, I became the bridge.

When software engineers complained about data team requests, I got it. But I also understood why the data team needed it. When data engineers complained about schema changes, I got it. But I also understood why the software team shipped without waiting.

I could translate. I could explain. I could say "hey, if we just add this one field now, it'll save us three weeks of pain later" in a way that software engineers understood. I could say "hey, if we make this pipeline a bit more flexible, we won't block the product team every time" in a way that data engineers understood. I can see their blind spots. I can fill in what they're missing. I can communicate with both sides. When we're building a new feature, or a new app, I can see the end-to-end usage of this ONE system, both from the users' (customers') perspective and the users' (sales/marketing/finance/management) perspective.

Why you should try both, and be a noob again (at least at first)

I mean, I'm not saying everyone needs to do this. But if you've only been on one side, you're curious about the other, and you've been thinking about it, just try it.

You'll be a beginner again. You'll watch YouTube again looking for tutorials. You'll enroll in a course again. You'll write bad code again (oh wait, you still do?). That's fine. That's how learning works. And that's awesome!

But you'll understand the whole system. You'll understand why the other engineering team makes the decisions it makes. You'll stop scratching your head because of them. You'll realize they're not being difficult. They're solving different problems.

I think companies need more people who understand both sides. Someone who understands how to design an app that's both usable for users AND easy to get data out of. Someone who understands how to build a data pipeline that's both reliable AND doesn't block the product team for weeks. Someone who can actually make the end-to-end process work. (lol shameless plug, I'm open to work guys)

I know not everyone has the opportunity to switch (or wants to switch). But if you do, don't be afraid. Listen to mipsytipsy. You don't have to choose one and stick to it for the rest of your life. Do software engineering for a while. Then do data engineering. Then you can go back, if you want. You can always go back. It's called a pendulum, not a hole in that 300 movie.