Don't Smoke Your Database With This Common Mistake
Third-Party API Calls Inside Database Transactions
It’s an easy mistake to make, and an intuitive one. You’re building a command that does a few things: call Stripe a couple of times, get back payment objects, and write the results to the database. You want the whole thing to be atomic — if anything fails, nothing should be persisted. So you wrap everything in a transaction. Done, right?
Not quite. This pattern — third-party API calls inside database transactions — is one of those things that looks correct on paper but causes real pain in production. I’ve seen it take down services at exactly the wrong times. Let me explain why, and what to do instead.
The Intuitive Approach
Imagine you’re scheduling a payment. The operation requires three calls to Stripe: create a customer, attach a payment method, and schedule a charge. After each call you get back an object and you want to persist it. Something like this:
def call
  ActiveRecord::Base.transaction do
    customer = Stripe::Customer.create(email: user.email)
    db_customer = StripeCustomer.create!(stripe_id: customer.id, user: user)

    payment_method = Stripe::PaymentMethod.attach(
      payment_method_id,
      customer: customer.id
    )
    db_payment_method = StripePaymentMethod.create!(
      stripe_id: payment_method.id,
      stripe_customer: db_customer
    )

    charge = Stripe::PaymentIntent.create(
      amount: amount_in_cents,
      currency: 'usd',
      customer: customer.id,
      payment_method: payment_method.id
    )
    StripeCharge.create!(stripe_id: charge.id, stripe_payment_method: db_payment_method)
  end
end
The appeal here is obvious. If the third call, Stripe::PaymentIntent.create, blows up, the transaction rolls back and none of those partial records land in the database. Clean and atomic.
The problem is what you’re doing to your database in the meantime.
The Latency Mismatch Problem
A database transaction is not a free operation. Depending on your isolation level, it holds locks — at minimum on the rows it has written to, potentially on the pages or tables those rows live in. Those locks are held for the duration of the transaction.
On a well-provisioned system, say two services in the same AWS availability zone, a straightforward database operation (a simple SELECT, INSERT, or UPDATE) should complete in well under 10 milliseconds. A batch of several such operations inside a single transaction? Still probably under 10ms total.
Stripe API calls, on the other hand, regularly take 500–700ms. Sometimes more. There’s no amount of infrastructure tuning you can do on your end to change that. You’re at the mercy of Stripe’s servers and the public internet.
So what you’ve actually built is a transaction that holds database locks for the better part of a second, per call, times however many Stripe calls you need to make. Three Stripe calls? You’re looking at a transaction that holds locks for potentially 2 seconds or more. And if you have any concurrency at all — multiple workers, background jobs, web requests — those locks start competing.
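To make the arithmetic concrete, here is a back-of-the-envelope sketch. The method name and the per-operation figures (600ms per API call, 2ms per write) are illustrative assumptions picked from the ranges above, not measurements.

```ruby
# Rough estimate of how long a transaction holds its locks when
# everything listed runs inside it. All figures are illustrative assumptions.
def lock_hold_ms(api_calls:, api_latency_ms:, db_writes:, db_write_ms:)
  (api_calls * api_latency_ms) + (db_writes * db_write_ms)
end

# Three Stripe calls at ~600ms each plus three ~2ms inserts, all inside the transaction
with_api_calls = lock_hold_ms(api_calls: 3, api_latency_ms: 600, db_writes: 3, db_write_ms: 2)

# The same three inserts in a transaction that makes no API calls
db_only = lock_hold_ms(api_calls: 0, api_latency_ms: 600, db_writes: 3, db_write_ms: 2)

puts "with API calls: #{with_api_calls}ms" # 1806ms
puts "DB only:        #{db_only}ms"        # 6ms
```

Under these assumed numbers, nearly all of the lock hold time is spent waiting on the network rather than doing database work.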
What This Looks Like in Production
At Mudflap, we had a background job that ran every morning to update fuel prices and validate a large number of cached objects. This job made multiple third-party API calls and also needed to commit a significant number of database updates. At some point, someone had wired this up with the API calls living inside transactions.
During the morning run, we started seeing cascading database performance issues. Connections were getting held. Queries were backing up. The combination of high-concurrency writes and long-running transactions caused severe lock contention — and the morning price update, which customers depend on at the start of their day, was the worst possible time for it.
The fix was straightforward once we diagnosed the problem: pull the API calls out of the transactions entirely.
The Pre-Transaction Pattern
The solution is to restructure your command object so that all third-party API calls happen in a dedicated pre-transaction step. The results are stored as instance variables on the command object. Then, in a separate step, you open the transaction and do nothing inside it except write to the database.
Here’s what that looks like:
class SchedulePaymentCommand
  def call
    fetch_stripe_objects # pre-transaction step: all API calls happen here
    persist_to_database  # transaction step: DB writes only, no network I/O
  end

  private

  def fetch_stripe_objects
    @stripe_customer = Stripe::Customer.create(email: user.email)
    @stripe_payment_method = Stripe::PaymentMethod.attach(
      payment_method_id,
      customer: @stripe_customer.id
    )
    @stripe_charge = Stripe::PaymentIntent.create(
      amount: amount_in_cents,
      currency: 'usd',
      customer: @stripe_customer.id,
      payment_method: @stripe_payment_method.id
    )
  end

  def persist_to_database
    ActiveRecord::Base.transaction do
      db_customer = StripeCustomer.create!(
        stripe_id: @stripe_customer.id,
        user: user
      )
      db_payment_method = StripePaymentMethod.create!(
        stripe_id: @stripe_payment_method.id,
        stripe_customer: db_customer
      )
      StripeCharge.create!(
        stripe_id: @stripe_charge.id,
        stripe_payment_method: db_payment_method
      )
    end
  end
end
The transaction is now doing only what it should: coordinating a set of fast, local database writes. Lock hold time drops from roughly 1.5 to 2 seconds (three Stripe calls at 500–700ms each) to under 10ms.
“But What About Atomicity?”
This is the obvious objection. In the original pattern, if anything fails, the transaction rolls back. With the pre-transaction pattern, what happens if the API calls all succeed but then the database write fails?
It’s a fair concern, but it’s also fully solvable — and handling it explicitly is actually better than relying on a transaction rollback, because a rollback doesn’t undo what already happened on Stripe’s end anyway. If you made three Stripe API calls before the DB write failed, rolling back your database changes leaves you with orphaned Stripe objects. The transaction gives you a false sense of safety.
The right pattern is a structured error handling routine on the command object — something that has a specific job: clean up whatever API-side state was created if the overall operation fails. Since the Stripe responses are stored in instance variables, the error handler still has access to them:
class SchedulePaymentCommand
  def call
    fetch_stripe_objects
    persist_to_database
  rescue => e
    handle_error(e)
    raise
  end

  private

  def fetch_stripe_objects
    @stripe_customer = Stripe::Customer.create(email: user.email)
    @stripe_payment_method = Stripe::PaymentMethod.attach(
      payment_method_id,
      customer: @stripe_customer.id
    )
    @stripe_charge = Stripe::PaymentIntent.create(
      amount: amount_in_cents,
      currency: 'usd',
      customer: @stripe_customer.id,
      payment_method: @stripe_payment_method.id
    )
  end

  def persist_to_database
    ActiveRecord::Base.transaction do
      db_customer = StripeCustomer.create!(
        stripe_id: @stripe_customer.id,
        user: user
      )
      db_payment_method = StripePaymentMethod.create!(
        stripe_id: @stripe_payment_method.id,
        stripe_customer: db_customer
      )
      StripeCharge.create!(
        stripe_id: @stripe_charge.id,
        stripe_payment_method: db_payment_method
      )
    end
  end

  def handle_error(error)
    # Cancel whatever was created on Stripe's end, if anything
    Stripe::PaymentIntent.cancel(@stripe_charge.id) if @stripe_charge
    Stripe::PaymentMethod.detach(@stripe_payment_method.id) if @stripe_payment_method
    Stripe::Customer.delete(@stripe_customer.id) if @stripe_customer
    Rails.logger.error("SchedulePaymentCommand failed: #{error.message}")
  end
end
The key insight: handle_error is a hook that each command implements for itself. If your commands share a base class, it is the method that subclasses override. It’s not a catch-all; it’s the specific, intentional cleanup routine for this command. Each command knows what it created and knows how to undo it. That’s a cleaner contract than hoping a database rollback somehow undoes side effects that already happened in an external system.
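One way to arrange the override is a small base class that owns the call/rescue flow and leaves the three steps to subclasses. The sketch below is a minimal illustration, not the production code: ExternalCommand, DemoCommand, and safely_handle_error are hypothetical names, and the external service is replaced by an in-memory array so the example runs on its own. It also wraps the cleanup so that a failure during cleanup cannot mask the original error.

```ruby
# Hypothetical base class; the names are illustrative, not from a real library.
class ExternalCommand
  def call
    perform_external_calls # slow network I/O, outside any transaction
    persist_to_database    # fast, DB-only step
  rescue StandardError => e
    safely_handle_error(e)
    raise e                # re-raise the original error after cleanup
  end

  private

  def perform_external_calls
    raise NotImplementedError, "#{self.class} must implement perform_external_calls"
  end

  def persist_to_database
    raise NotImplementedError, "#{self.class} must implement persist_to_database"
  end

  # Default: nothing to clean up. Subclasses override this.
  def handle_error(_error); end

  # A failure during cleanup is reported but must not mask the original error.
  def safely_handle_error(error)
    handle_error(error)
  rescue StandardError => cleanup_error
    warn "cleanup failed: #{cleanup_error.message}"
  end
end

# Demo subclass with an in-memory stand-in for the external service.
class DemoCommand < ExternalCommand
  attr_reader :remote_objects

  def initialize(fail_on_persist:)
    @fail_on_persist = fail_on_persist
    @remote_objects = []
  end

  private

  def perform_external_calls
    @remote_objects << :customer << :payment_method
  end

  def persist_to_database
    raise "db write failed" if @fail_on_persist
  end

  # Undo external objects in reverse order of creation.
  def handle_error(_error)
    @remote_objects.pop until @remote_objects.empty?
  end
end

command = DemoCommand.new(fail_on_persist: true)
begin
  command.call
rescue RuntimeError => e
  puts "re-raised: #{e.message}"
end
puts command.remote_objects.inspect
```

The caller still sees the original exception, and each subclass only has to describe what it created and how to undo it.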
The Container Shutdown Problem
There’s a harder failure mode that doesn’t get talked about as much, and I want to be upfront: it’s a problem with the pre-transaction pattern, but not with that pattern specifically. The original wrapped-transaction pattern has it too. Moving the API calls outside the transaction doesn’t introduce it; it already existed. It just becomes more visible when you start thinking carefully about the pre-transaction step in isolation.
Here’s the scenario. You have a container running in production. A deploy kicks off, the orchestrator sends a SIGTERM, and the container is supposed to drain in-flight requests before shutting down. Supposed to. In practice, containers sometimes get killed before they finish draining — either because the shutdown grace period is too short, the orchestrator loses patience, or the drain logic itself has a bug.
Now imagine that kill signal arrives at exactly the wrong moment: after a Stripe call has been dispatched from the pre-transaction step, but before the response has been received and written to the database. The HTTP request is already in flight to Stripe’s servers. From Stripe’s perspective, the operation completes successfully — the customer, payment method, or charge now exists in their system. From your system’s perspective, the process is dead. The response never arrived. Nothing was committed to the database. The operation simply didn’t happen as far as your records are concerned.
This is sometimes called a “ghost” operation: a real side effect in an external system with no corresponding record in yours.
What happens next depends on how your retry logic works. If the job is re-enqueued on restart and the operation runs again without any memory of what the previous attempt created, you make the same Stripe API calls again. Without idempotency keys, Stripe treats each of these as a new, independent request. In the case of Stripe::PaymentIntent.create, that means two separate charges get created for the same order. The customer gets billed twice.
Idempotency keys are the standard mitigation here, and Stripe supports them well. The idea is to generate a stable, deterministic key for each logical operation — typically derived from something like the order ID and the operation type — and pass it to Stripe on every call. If Stripe receives two requests with the same idempotency key within a 24-hour window, it returns the result of the first request instead of processing a new one.
def fetch_stripe_objects
  idempotency_key = "schedule_payment_#{order.id}"

  @stripe_customer = Stripe::Customer.create(
    { email: user.email },
    { idempotency_key: "#{idempotency_key}_customer" }
  )
  @stripe_payment_method = Stripe::PaymentMethod.attach(
    payment_method_id,
    { customer: @stripe_customer.id },
    { idempotency_key: "#{idempotency_key}_payment_method" }
  )
  @stripe_charge = Stripe::PaymentIntent.create(
    {
      amount: amount_in_cents,
      currency: 'usd',
      customer: @stripe_customer.id,
      payment_method: @stripe_payment_method.id
    },
    { idempotency_key: "#{idempotency_key}_charge" }
  )
end
With idempotency keys in place, a retry after a botched shutdown is safe: Stripe deduplicates the calls and returns the original response objects. Your retry can then persist those results to the database as if the first attempt had never been interrupted.
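The deduplication behavior can be illustrated with a small in-memory stand-in. FakeIdempotentApi is a hypothetical simulation written for this example, not Stripe's implementation or client library; it only models the "same key returns the first response" rule, and the order ID in the key is made up.

```ruby
# In-memory stand-in for a server that honors idempotency keys.
# Hypothetical simulation for illustration only.
class FakeIdempotentApi
  attr_reader :created_count

  def initialize
    @responses = {}    # idempotency key => response of the first request
    @created_count = 0 # how many objects were really created
  end

  def create_charge(amount:, idempotency_key:)
    # Only the first request with a given key creates anything;
    # later requests with that key get the stored response back.
    @responses[idempotency_key] ||= begin
      @created_count += 1
      { id: "ch_#{@created_count}", amount: amount }
    end
  end
end

api = FakeIdempotentApi.new
key = "schedule_payment_42_charge" # stable key derived from a made-up order ID

first_attempt = api.create_charge(amount: 1000, idempotency_key: key)  # process dies before persisting
second_attempt = api.create_charge(amount: 1000, idempotency_key: key) # retried job

puts first_attempt[:id] == second_attempt[:id] # true
puts api.created_count                         # 1
```

The important property is that the key is derived from the logical operation, so a retry that knows nothing about the first attempt still produces the same key.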
But idempotency keys don’t fully solve the problem. They protect you from creating duplicate objects in Stripe, but they don’t record anything on your side. Recovery depends on a retry actually running: the retried job has to call Stripe again with the same key to get the original response back. The window between “API call succeeded” and “result written to the database” is still a gap where a process death causes silent data loss, because if nothing retries the operation, the Stripe objects exist with no local record. And depending on how you structure retries, you may not even know the first attempt ever started.
At the time of writing, we’re still working on a robust solution to this. The approaches worth exploring involve making the pre-transaction step itself more durable — writing an intent record to the database before making the API call, so that a retry can detect partially completed operations and reconcile them. That’s a more involved pattern, and it deserves its own writeup.
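As a rough illustration of the intent-record idea, and not the solution alluded to above, the sketch below uses a Hash in place of a database table. IntentLog and call_with_intent are hypothetical names; in a real system each write to the log would be its own short, committed database write, and the key would also be passed to the external service as the idempotency key.

```ruby
# Hypothetical intent log backed by a Hash instead of a database table.
class IntentLog
  def initialize
    @rows = {}
  end

  # Record that an external call is about to be attempted (before making it).
  def start(key)
    @rows[key] ||= { status: :pending, result: nil }
  end

  # Record the external result once the response has arrived.
  def complete(key, result)
    @rows.fetch(key).merge!(status: :completed, result: result)
  end

  def find(key)
    @rows[key]
  end

  # Intents that were started but never completed: candidates for reconciliation.
  def pending_keys
    @rows.select { |_key, row| row[:status] == :pending }.keys
  end
end

def call_with_intent(log, key)
  existing = log.find(key)
  return existing[:result] if existing && existing[:status] == :completed

  log.start(key)
  result = yield(key) # the external call; the key doubles as the idempotency key
  log.complete(key, result)
  result
end

log = IntentLog.new

# Simulate an attempt that dies before its result is recorded.
begin
  call_with_intent(log, "order_42_charge") { |_key| raise "process killed" }
rescue RuntimeError
  # The intent row survives even though the attempt did not finish.
end
puts log.pending_keys.inspect # ["order_42_charge"]

# On retry, the call runs again under the same key and the intent completes.
result = call_with_intent(log, "order_42_charge") { |key| { id: "ch_1", key: key } }
puts result[:id]              # ch_1
puts log.pending_keys.inspect # []
```

The point of the pending rows is that an interrupted attempt leaves evidence behind, so a sweeper or the retry itself can find and reconcile it instead of relying on the job being re-enqueued.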
The Rule
Here’s how I think about it now: a database transaction should contain only database operations. Any call that goes over a network — to Stripe, to a payment processor, to any external service — should live outside the transaction. Make all your external calls first, validate that the results look sane, then commit everything to the database in a fast, tight transaction. Handle failures explicitly with cleanup logic that knows what it actually needs to undo.
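A rule like this is easier to keep if something enforces it. The sketch below is one hypothetical way to do that: ExternalCallGuard and ExternalCallInTransactionError are made-up names, and the guard takes any callable that reports whether a transaction is open. In a Rails app that callable could be `-> { ActiveRecord::Base.connection.transaction_open? }`, which is an assumption about your setup; the demo uses a plain boolean flag so it runs without a database.

```ruby
# Hypothetical guard that refuses to run an external call while a
# database transaction is open.
class ExternalCallInTransactionError < StandardError; end

class ExternalCallGuard
  # transaction_open: any callable returning true when a transaction is open
  def initialize(transaction_open:)
    @transaction_open = transaction_open
  end

  def call(description)
    if @transaction_open.call
      raise ExternalCallInTransactionError,
            "#{description} attempted inside an open database transaction"
    end
    yield
  end
end

# Demo with a fake transaction flag instead of a real database connection.
in_transaction = false
guard = ExternalCallGuard.new(transaction_open: -> { in_transaction })

puts guard.call("Stripe::Customer.create") { "customer created" }

in_transaction = true
begin
  guard.call("Stripe::Customer.create") { "customer created" }
rescue ExternalCallInTransactionError => e
  puts e.message
end
```

One caveat if you try this in a Rails test suite: tests that wrap each example in a transaction will report an open transaction everywhere, so the guard would need to be disabled or adapted there.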
Transactions are a powerful primitive. They’re designed for coordinating local, fast, reliable operations. Stretching them to cover slow, unreliable network calls doesn’t make your system more correct — it just makes it slower and more fragile at the same time.

