There's a particular class of Laravel Horizon bug that's dangerous precisely because everything looks fine. Horizon's dashboard shows green. Your workers are running. supervisord is happy. Your logs are empty. And yet somehow, over the course of eight hours, zero jobs have actually been processed.

The culprit is almost always the same: your application is dispatching jobs to one queue name, and your Horizon workers are listening on a different one. Nothing fails — the jobs just sit in Redis, piling up, while Horizon cheerfully reports success on a queue nobody's using.

This post walks through how to spot the bug, why it happens, and — most importantly — how Redis cluster curly-brace queue names turn it from an occasional nuisance into something that can quietly kill jobs for hours.

How the failure actually looks

The classic symptoms:

  • Horizon dashboard shows workers as "running", throughput graph is flat or near-zero
  • php artisan horizon:status returns running
  • Your application dispatches jobs without errors
  • Users report that background work (emails, reports, exports) never completed
  • Queue length in Redis is growing, not shrinking
  • No entries in failed_jobs — because jobs aren't failing, they're just never being picked up

The worst part is that your tests pass. In CI you typically run jobs on the sync driver, or fake the queue entirely with Queue::fake(), so the queue name mismatch never shows up until production.
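One way to catch the mismatch in CI anyway is a test that cross-checks the queues your code dispatches to against the Horizon config. A minimal sketch, assuming you maintain the list of dispatch targets by hand (the class name and queue names here are illustrative; this checks the 'defaults' block, so extend it to 'environments' if you override queue lists per environment):

// tests/Feature/HorizonQueueCoverageTest.php (sketch)
use Tests\TestCase;

class HorizonQueueCoverageTest extends TestCase
{
    /** Every queue name the application dispatches to, maintained by hand. */
    private array $dispatchedQueues = ['default', 'notifications', 'reports'];

    public function test_every_dispatched_queue_has_a_horizon_worker(): void
    {
        $watched = collect(config('horizon.defaults'))
            ->flatMap(fn (array $supervisor) => $supervisor['queue'])
            ->unique()
            ->all();

        foreach ($this->dispatchedQueues as $queue) {
            $this->assertContains($queue, $watched, "No Horizon supervisor listens on [{$queue}]");
        }
    }
}

The hand-maintained list is the weak point, but even a stale list catches the common case of adding a new onQueue() call and forgetting Horizon entirely.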

The fastest diagnostic

Before you touch any config, run this against your production Redis:

redis-cli --scan --pattern 'queues:*'

You'll see every queue Laravel has pushed work into. Compare that output against what Horizon is configured to listen on:

php artisan horizon:list

If the queue names in the first list don't match the queue names in the second, you've found your bug. Now you just need to understand why they don't match.

The three common causes

1. Dispatching to a queue name that doesn't exist in Horizon config

The simplest case. Somewhere in your code you have:

ProcessReport::dispatch($report)->onQueue('reports');

But your config/horizon.php only lists default and notifications:

'defaults' => [
    'supervisor-1' => [
        'queue' => ['default', 'notifications'],
        // ...
    ],
],

The reports queue gets populated in Redis, but no worker is watching it. Fix: add it to the Horizon supervisor's queue list.
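One wrinkle worth knowing: Horizon merges each environment's block over 'defaults', so if your production environment defines its own queue list, adding the name only to 'defaults' won't help. The entry below is a sketch of what the production override would need to include:

// config/horizon.php
'environments' => [
    'production' => [
        'supervisor-1' => [
            'queue' => ['default', 'notifications', 'reports'],
        ],
    ],
],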

2. Environment-specific queue names

Perhaps the most common version in multi-environment apps. You prefix queues per environment to keep them isolated:

// In a queued event listener (viaQueue/viaConnection are listener hooks)
public function viaConnection(): string
{
    return 'redis';
}

public function viaQueue(): string
{
    return config('app.env') . '-emails';
}

But in Horizon config:

'queue' => ['production-emails'],

Staging dispatches to staging-emails, while Horizon on staging listens for production-emails. Jobs pile up forever. This one bites hard on clone-from-production setups where someone forgot to make the Horizon config environment-aware.
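One fix is to derive the queue name in config/horizon.php the same way the dispatch side does, so the two can't drift apart. A sketch using env() (the right call inside config files, where config() isn't available yet):

// config/horizon.php
'defaults' => [
    'supervisor-1' => [
        'connection' => 'redis',
        'queue' => [env('APP_ENV', 'production') . '-emails'],
        // ...
    ],
],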

3. Redis cluster curly-brace naming (the sneaky one)

This is the version that cost me a full day. If you're running Redis in cluster mode, queue names need curly-brace "hash tags" so that all of a queue's related keys hash to the same cluster node:

// config/queue.php
'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('REDIS_QUEUE', '{default}'),  // <-- note the braces
    'retry_after' => 90,
    'block_for' => null,
],

The braces tell Redis cluster "hash only the part inside {} when deciding which node holds this key". Without them, the main queue key, the reserved set, the delayed set, and the notifications key can all land on different nodes — and job coordination falls apart.
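You can see the effect with CLUSTER KEYSLOT against any cluster node. Keys that share a hash tag map to the same slot (the :reserved suffix mirrors Laravel's reserved-job key; exact slot numbers will vary):

redis-cli CLUSTER KEYSLOT 'queues:{default}'
redis-cli CLUSTER KEYSLOT 'queues:{default}:reserved'
# same slot for both, because only "default" is hashed

redis-cli CLUSTER KEYSLOT 'queues:default'
redis-cli CLUSTER KEYSLOT 'queues:default:reserved'
# usually different slots, because the whole key name is hashed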

The trap: this naming must be consistent everywhere. If your application dispatches to {default} but your Horizon config lists default (no braces), they are different queues as far as Redis is concerned. And because Horizon doesn't validate that its configured queues actually match real Redis keys, it'll happily report as running while listening on a queue that will never receive work.

The fix is boringly mechanical — make every reference use the same bracketed form:

// config/horizon.php
'defaults' => [
    'supervisor-1' => [
        'connection' => 'redis',
        'queue' => ['{default}', '{notifications}', '{emails}'],
        'balance' => 'auto',
        'processes' => 10,
        'tries' => 3,
    ],
],

And in any place you dispatch with an explicit queue name:

SendReport::dispatch($report)->onQueue('{reports}');
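Even better, stop repeating the literal string at all. A small constants class (hypothetical, not part of Laravel; requires PHP 7.1+ for the keyword-named constant) keeps the bracketed form in exactly one place:

// app/Support/Queues.php — hypothetical helper, not a Laravel API
final class Queues
{
    public const DEFAULT = '{default}';
    public const NOTIFICATIONS = '{notifications}';
    public const REPORTS = '{reports}';
}

// Dispatch side:
SendReport::dispatch($report)->onQueue(Queues::REPORTS);

// config/horizon.php:
'queue' => [Queues::DEFAULT, Queues::NOTIFICATIONS, Queues::REPORTS],

Now a rename, or a decision to add or drop the braces, is a one-line change that both sides pick up together.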

Monitoring so this never happens again

The core lesson I took from this: Horizon's own dashboard is not enough to tell you whether jobs are actually being processed. The dashboard tells you about workers, not about work.

Three pieces of monitoring I now add to every project:

1. Queue depth alerts. Use the laravel-horizon-prometheus-exporter package or a custom metric that reports Redis::llen('queues:{default}') to your monitoring system. Alert if any queue exceeds a sensible threshold (e.g., 500 jobs for default, 50 for high-priority queues). If the queue is full and Horizon says everything's fine, you've found the mismatch fast.
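If you'd rather not add a package, a scheduled artisan command can do the depth check itself. A sketch, where the thresholds and the alert transport are assumptions to adapt to your stack:

// app/Console/Commands/CheckQueueDepth.php (sketch)
use Illuminate\Console\Command;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Redis;

class CheckQueueDepth extends Command
{
    protected $signature = 'queue:check-depth';
    protected $description = 'Alert when any Redis queue grows past its threshold';

    public function handle(): int
    {
        $thresholds = ['queues:{default}' => 500, 'queues:{emails}' => 50];

        foreach ($thresholds as $key => $limit) {
            $depth = Redis::llen($key);
            if ($depth > $limit) {
                // Swap in your real alerting here (Slack webhook, PagerDuty, ...)
                Log::critical("Queue {$key} depth {$depth} exceeds threshold {$limit}");
            }
        }

        return self::SUCCESS;
    }
}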

2. Synthetic job canary. Schedule a tiny job every minute that just writes a timestamp to a Redis key:

// app/Console/Kernel.php
protected function schedule(Schedule $schedule): void
{
    $schedule->job(new QueueCanaryJob())->everyMinute();
}

The job itself:

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Support\Facades\Redis;

class QueueCanaryJob implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function handle(): void
    {
        // If this timestamp stops advancing, no worker is consuming the queue
        Redis::set('queue:canary:last_run', now()->toIso8601String());
    }
}

Then have your uptime monitoring check that the value is less than, say, 5 minutes old. If the canary stops updating, you know Horizon isn't processing work regardless of what the dashboard says.
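The check itself can live behind a small health endpoint that your uptime monitor polls. A sketch (the route path and five-minute threshold are arbitrary choices):

// routes/web.php
use Illuminate\Support\Carbon;
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Facades\Route;

Route::get('/health/queue', function () {
    $last = Redis::get('queue:canary:last_run');
    $stale = $last === null || Carbon::parse($last)->lt(now()->subMinutes(5));

    // 503 makes any off-the-shelf uptime checker treat a stalled queue as downtime
    return response()->json(
        ['queue_worker' => $stale ? 'stale' : 'ok', 'last_run' => $last],
        $stale ? 503 : 200
    );
});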

3. Job throughput SLO. If you normally process hundreds of jobs per hour, a dashboard showing zero throughput for 30 minutes should page you. Horizon exposes throughput via its metrics endpoint; plug it into whatever alerting you use.

The takeaway

Horizon's biggest strength — its hands-off, "just works" operational model — is also the reason this bug class is so painful. The abstraction is so clean that when the connection between "dispatcher" and "worker" silently breaks, there's no natural place to notice.

If you take one thing from this post: don't trust the dashboard alone. Measure the actual flow of work, not the health of the workers. A healthy worker listening on the wrong queue is the same as no worker at all.