Publishing Pipeline – Refactoring

From a One-Off Script to a Publishing Platform

Three Weeks of Refactoring, Learning Python, and Building Something That Scales

Three weeks ago, my publishing “pipeline” was exactly what many automation projects start as: a single Python script, built to solve a single problem, for a single platform.

It worked — until it didn’t.

Today, that script has evolved into a modular, extensible publishing platform that can target multiple services, reuse logic cleanly, and grow without collapsing under its own weight. Along the way, I learned more about Python, architecture, and disciplined automation than I had in the previous three years combined.

This post is a recap of that journey, what changed technically, why it matters, and where this project is heading next.


Where We Started

The original setup had some clear limitations:

  • One script, one platform
  • Tight coupling between:
    • content loading
    • publishing logic
    • database access
  • Minimal state tracking
  • Hard to debug
  • Hard to extend
  • Impossible to reason about once it grew past a few hundred lines

In short: it worked, but it didn’t scale — neither technically nor mentally.

Here is what the flow looked like:

monolithic-script

Characteristics

  • One script owns everything
  • Platform logic intertwined
  • Hard to add new platforms
  • Any change risks breaking unrelated behavior
  • Re-running safely is difficult

The New Architecture: Clear Responsibilities Everywhere

Over the past weeks, the pipeline was redesigned around layers, each with a single responsibility.

1. Pipeline Layer (Orchestration)

Responsible only for flow:

  • Load posts
  • Process media
  • Inject backlinks
  • Publish to enabled platforms
  • Track results

No platform-specific logic lives here.


2. Publisher Layer (Per Platform Semantics)

Each platform gets:

  • a Publisher (what to publish, when, and how)
  • a Client (how to talk to the API)

This distinction turned out to be crucial.

For example:

  • WordPress supports updates
  • Dev.to supports updates with constraints
  • X (Twitter) is publish-once-only

That difference now lives exactly where it belongs: inside the publisher, not smeared across the pipeline.


3. Client Layer (Pure API Communication)

Clients do one thing only:

  • authenticate
  • send requests
  • return normalized results

There is

  • No database access
  • No publishing decisions
  • No content logic

This makes them:

  • testable
  • replaceable
  • reusable

4. Database Layer (State & Deduplication)

The database now tracks:

  • what was published
  • where
  • with which content hash
  • which media assets already exist
  • canonical URLs

This enables:

  • idempotent runs
  • safe re-publishing
  • deduplication of media
  • correct canonical handling across platforms

This is what the new flow looks like:

new-architecture

What Changed

  • Pipeline only controls flow
  • Publishers decide what to do
  • Clients only talk to APIs
  • Database provides state and idempotency
  • Each part is testable and replaceable

This is the turning point where the system stops being fragile.


Why This Is Faster (and Calmer)

One unexpected benefit: performance and reliability improved naturally.

Not because of clever optimizations, but because:

  • Small modules load faster
  • Less global state stays in memory
  • Failures are isolated
  • Logs are precise and meaningful
  • Re-runs are cheap and safe

Instead of hoping nothing breaks, the pipeline now knows when nothing needs to happen.


Adding a New Platform Is Now Boring (That’s a Compliment)

To add a new platform today, the steps are:

  • Write a client (platforms/foo_client.py)
  • Write a publisher (publishers/foo.py)
  • Wire it into the composition root

No changes to:

  • media handling
  • backlinks
  • canonical logic
  • database schema
  • pipeline flow

That’s the kind of boring you (well, I for certain) want.


Media, Backlinks, Canonicals — All Solved Properly

Some highlights that are now just working:

  • Media deduplication via hashing → upload once, reuse everywhere
  • Featured images correctly attached per post
  • Backlinks injected deterministically
  • Canonical URLs handled centrally and propagated correctly
  • Per-platform semantics respected (no more accidental re-posting)

None of this required hacks — only the right separation of concerns.


What I Learned (The Unexpected Part)

This project wasn’t about learning Python — but it turned into exactly that.

In three weeks, I learned more about:

  • Python module design
  • data modeling
  • immutability vs mutation
  • error handling
  • API semantics
  • architectural boundaries

…than I had in the previous three years of “occasionally writing Python”.

The key insight:

Python becomes dramatically easier once the architecture is clean.

Most complexity wasn’t Python at all — it was unclear responsibility.


What’s Next: Roadmap to v1.4.0

The current version 1.3.1 closes the stabilization phase. The next milestone, 1.4.0, will focus on capabilities.

Planned Features

1. Platform-Aware Update Strategies

  • Publish-once (X)
  • Update-in-place (WordPress)
  • Conditional update (Dev.to)
  • Future: quote-tweet or follow-up logic

2. Rate Limiting & Scheduling

  • Per-platform throttling
  • Optional delayed publishing
  • CI-safe dry-run mode

3. Observability Improvements

  • Structured logs
  • Optional JSON output
  • Better failure summaries

New Platforms (Starting with One)

At least one new platform will be added in v1.4.0.

Candidates:

  • LinkedIn (high priority)
  • Xing
  • Indeed (content syndication angle)

Later roadmap:

  • Substack
  • Patreon
  • Possibly others

Thanks to the current architecture, adding these is now a contained task, not a rewrite.


And Yes — This Will Become a Series

This post is likely the first of a small series covering:

  • Python for infrastructure-minded people
  • Real-world refactoring
  • Automation that survives growth
  • Designing for change, not just success

Not tutorials — but honest write-ups of systems that evolved under real constraints.


Final Thoughts

What started as a utility script has become a small platform.

Not because it needed to be fancy — but because it needed to be understandable, extendable, and trustworthy.

And that, more than anything, made the difference.

Down the road it might become a service interested people can use for their own writing. Since I am developing and running my own kubernetes cloud, there should be nothing in the way of making this a container solution accessible for tenants.


Did you find this post helpful? You can support me.

Hetzner Referral
Confdroid Feedback Portal

#

Author Profile

12ww1160DevOps engineer & architect

Leave a Reply

Your email address will not be published. Required fields are marked *

seventeen − three =