Scraping hiring signals with Apify + LinkedIn

Hiring signals are the cleanest outbound trigger I have. Someone posting a specific role is telling the world, in advance, that a budget exists, a pain exists, and a hiring manager is actively looking at adjacent solutions. The list it produces is smaller than a flat ICP pull but converts at 3–4x the rate, reliably.

The problem is getting fresh signals at scale without spending half your week on manual LinkedIn searches. This is the Apify setup I use to pull 200–500 qualified hiring signals per week on autopilot, with the filters, rate limits, and dedupe logic that keep it stable.

Nothing in here is a secret. What makes it work is the discipline around which jobs you keep and how you post-process them, not the scraping itself.

Upfront honesty on the gray zone: LinkedIn’s terms of service do not love scraping. Apify’s actors for LinkedIn run in a gray area — they work because they mimic logged-in user behaviour at human rate, not because LinkedIn explicitly permits it. Use a dedicated account you wouldn’t mind losing. Don’t scrape at volumes that would alarm any reasonable person looking at account activity. If your business model depends on LinkedIn data, plan the risk accordingly.

What I actually want out of each run

Before I configure a single actor, I write down what a “good row” looks like and what I’ll do with it:

Company: name, domain (resolved separately), size band, industry, country.
Role: title (exact), seniority (Senior/Director/VP/C-level), function (Sales/Marketing/Engineering/Ops), posting date, location, remote/hybrid/onsite.
Signal context: time since posting (I drop anything > 21 days old), number of applicants if available, whether reposted.

I drop every row that doesn’t meet all three of:

Company size is inside my ICP band (I don’t care about hiring signals at companies I wouldn’t sell to anyway).
Role is one I can match to a recognizable tradeoff I can open with (see Signal-based openers).
Posted in the last 21 days.

Everything else is noise. A list of 300 filtered rows beats a list of 3,000 raw rows, and the per-row enrichment cost downstream is a lot cheaper on the small list.
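
A minimal sketch of that keep/drop rule in Python. The field names, the 50 to 500 employee band, and the set of matchable functions are placeholders I made up for illustration, not part of any actor's output schema; swap in your own.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical ICP settings, for illustration only.
ICP_SIZE_BAND = (50, 500)                     # min and max employees
MATCHABLE_FUNCTIONS = {"Sales", "Marketing"}  # functions I have an opener for
MAX_AGE_DAYS = 21


@dataclass
class JobRow:
    company: str
    employees: Optional[int]  # comes from enrichment, so it may be missing
    function: str
    title: str
    posted_on: date


def keep(row: JobRow, today: date) -> bool:
    """Return True only if the row passes all three filters."""
    if row.employees is None:
        return False  # cannot confirm ICP fit, so treat as noise
    in_band = ICP_SIZE_BAND[0] <= row.employees <= ICP_SIZE_BAND[1]
    matchable = row.function in MATCHABLE_FUNCTIONS
    fresh = 0 <= (today - row.posted_on).days <= MAX_AGE_DAYS
    return in_band and matchable and fresh
```

Rows with no size data are dropped here rather than kept "just in case"; that is a design choice in the sketch, in line with the small-list-beats-big-list point above.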

Days 1–2: Set up the Apify account and actor

Step 1 — Create a dedicated Apify account

Business plan or above, so you can run scheduled actors and retain results for 30+ days.
Create an API token specifically for your outbound workflow — not the default one. Makes it easier to revoke if something goes sideways.

Step 2 — Pick the actor

The two I rotate between:

LinkedIn Jobs Scraper (most actors for this job are named some variant of this). Pulls public job postings without logged-in access. Volume is capped per run, but the results are the cleanest.
LinkedIn Profile Scraper paired with a job-search URL. More flexibility on filters, but requires logged-in cookies and carries higher block risk.

For 90% of my work I use the first one. The cookie-based route is a last resort when I need filters the public job search doesn’t expose (e.g. filtering by applicant count).

Step 3 — Configure a dedicated LinkedIn account (only if needed)

If you go the cookie-based route:

Age the account at least 3 months before using it for scraping.
Connect to 200+ people in your target industry before first scrape, so the account looks like a normal user to LinkedIn’s risk systems.
Use a residential proxy region that matches the account’s stated location. Mismatched geo is one of the top triggers for soft blocks.

Days 2–3: Build the job-search URL set

The core input to the actor is a list of LinkedIn job-search URLs. Each URL represents one query — a combination of keyword, location, time posted, and filters.

My filter discipline

I build one URL per (role_family × region × seniority_band) combination. Example for a US-focused ICP targeting revenue leaders at SMBs:

VP Sales + United States + last 7 days + Associate/Senior/Director/Executive seniority
Head of Sales + United States + last 7 days + Director/Executive seniority
Director of Sales + United States + last 7 days + Director seniority
Revenue Operations + United States + last 14 days + Associate/Senior/Director seniority

That’s four URLs. I’ll typically have 12–25 URLs running in a single schedule.
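
Here is a rough sketch of generating those four URLs programmatically. The query parameters (keywords, location, f_TPR for "posted within N seconds", f_E for experience level) and the numeric seniority codes are assumptions based on what the URL looks like in a browser after applying the filters by hand. LinkedIn does not document them, so check them against a URL copied from your own search before relying on this.

```python
from urllib.parse import urlencode

BASE = "https://www.linkedin.com/jobs/search/"

# Assumed codes for the f_E (experience level) parameter; verify in a browser.
SENIORITY_CODES = {"Associate": "3", "Senior": "4", "Director": "5", "Executive": "6"}


def job_search_url(keywords: str, location: str, days: int, seniority: list) -> str:
    """Build one search URL for a (role, region, recency, seniority) combination."""
    params = {
        "keywords": keywords,
        "location": location,
        "f_TPR": f"r{days * 86400}",  # assumed format: r + window in seconds
        "f_E": ",".join(SENIORITY_CODES[s] for s in seniority),
    }
    return f"{BASE}?{urlencode(params)}"


# The four example queries from the list above.
QUERIES = [
    ("VP Sales", "United States", 7, ["Associate", "Senior", "Director", "Executive"]),
    ("Head of Sales", "United States", 7, ["Director", "Executive"]),
    ("Director of Sales", "United States", 7, ["Director"]),
    ("Revenue Operations", "United States", 14, ["Associate", "Senior", "Director"]),
]

urls = [job_search_url(*q) for q in QUERIES]
```

Keeping the queries as data makes it cheap to grow the set from four to a couple of dozen without hand-editing URLs.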

Why not one big URL

Each URL has a cap (LinkedIn returns roughly 1,000 results max per search). A single broad URL returns the same big-name companies repeatedly and misses the long tail. Ten narrower URLs cover more ground for the same scrape budget.

What not to filter on

Company size. LinkedIn’s size filter is noisy and often mislabeled. I pull size from a downstream enrichment step (Apollo, Clay) instead.
“Easy Apply” filter. Easy Apply postings over-index toward specific ATS providers and give you a biased view of the market. Skip this filter.

Days 3–4: Schedule, rate-limit, retry

Schedule

One run per URL per day, at a staggered time. Actors running back-to-back from the same account get flagged faster than the same total volume spread across the day.

6 am UTC: region A URLs
10 am UTC: region B URLs
2 pm UTC: executive seniority URLs (lower volume, okay to run together)
6 pm UTC: ops/individual-contributor URLs
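
One way to keep that stagger honest is to hold the schedule as data and check the spacing in code. This sketch assumes the scheduler accepts standard five-field cron expressions evaluated in UTC; the group names are hypothetical labels for the URL sets above.

```python
# Cron expression (minute hour day month weekday) -> URL group to run.
SCHEDULE = {
    "0 6 * * *": "region_a",
    "0 10 * * *": "region_b",
    "0 14 * * *": "executive",
    "0 18 * * *": "ops_ic",
}


def min_gap_hours(schedule: dict) -> int:
    """Smallest gap, in hours, between consecutive daily runs (wrapping past midnight)."""
    hours = sorted(int(expr.split()[1]) for expr in schedule)
    gaps = [(later - earlier) % 24 for earlier, later in zip(hours, hours[1:] + hours[:1])]
    return min(gaps)
```

If someone later adds a fifth slot an hour after an existing one, min_gap_hours makes the bunching visible before LinkedIn does.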

Rate limits

Apify lets you set concurrency and max-requests-per-minute. I keep:

Max 1 concurrent request per actor run.
20–30 requests/minute ceiling. Anything faster and LinkedIn’s bot detection picks it up inside a week.
Between-job delay of 2–4 seconds, randomised. Uniform delay patterns are themselves a detection signal.
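
The delay and the per-minute ceiling are linked: a 2-second floor works out to at most 30 requests a minute. A small sketch of the pacing logic, written as plain Python rather than as any particular actor's settings (how it maps onto an actor's input options depends on the actor):

```python
import random
import time

MIN_DELAY_S = 2.0
MAX_DELAY_S = 4.0
MAX_REQUESTS_PER_MIN = 30


def next_delay(rng: random.Random) -> float:
    """Pick a randomised pause so requests never land on a fixed cadence."""
    return rng.uniform(MIN_DELAY_S, MAX_DELAY_S)


def paced(items, rng=None, sleep=time.sleep):
    """Yield items one at a time, pausing a random 2 to 4 seconds between them."""
    rng = rng or random.Random()
    for i, item in enumerate(items):
        if i > 0:
            sleep(next_delay(rng))
        yield item


# Sanity check: even the shortest delay must respect the per-minute ceiling.
assert 60 / MIN_DELAY_S <= MAX_REQUESTS_PER_MIN
```

The sleep function is injectable so the pacing can be tested without actually waiting.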

Retries and dead-letter queue

Each actor run either succeeds, partially succeeds, or gets blocked mid-run. Partial success is the tricky case: you got some data but not all, and the gap isn’t necessarily at the end of the list, so you can’t just resume from the last row.

My setup:

If an actor returns < 30% of expected volume: retry once, 4 hours later, from a different IP. If the retry also underproduces, mark the URL as “temporarily blocked” and skip for 48h.
If two consecutive runs fail from the same account cookie: rotate to a backup account. This doesn’t happen often, but when it does, you want the swap to be automatic, not a fire drill.
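
The first rule can be sketched as a small per-URL state machine. The names are hypothetical, and "expected volume" is whatever baseline you track for that URL (for example, a trailing average). Account rotation is left out to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

UNDERPRODUCE_RATIO = 0.30
RETRY_AFTER = timedelta(hours=4)
BLOCK_FOR = timedelta(hours=48)


@dataclass
class UrlState:
    retried: bool = False
    skip_until: Optional[datetime] = None


def next_action(state: UrlState, got: int, expected: int, now: datetime) -> str:
    """Decide what to do with one URL after a run finishes."""
    if expected <= 0 or got >= UNDERPRODUCE_RATIO * expected:
        state.retried = False
        state.skip_until = None
        return "ok"
    if not state.retried:
        # First underproducing run: retry once, later, from a different IP.
        state.retried = True
        state.skip_until = now + RETRY_AFTER
        return "retry_from_new_ip"
    # The retry also underproduced: park the URL for 48 hours.
    state.retried = False
    state.skip_until = now + BLOCK_FOR
    return "temporarily_blocked"
```

The scheduler then only has to check skip_until before launching a run for that URL.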

Days 4–5: Normalise, dedupe, enrich

Scraped data is dirty by default. Before anything touches your outbound sequencer, it runs through these five steps.

Step 1 — Normalise titles

LinkedIn titles are free-form. “VP of Sales”, “VP Sales”, “Vice President, Sales”, “Sales VP” all mean the same thing for your purposes. I maintain a lookup table that maps 200+ variants down to ~15 canonical titles. Every new variant I see gets added.

Skipping this step means your “VP Sales” list is actually split across many unmerged title variants, and you under-count your own coverage.
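
A minimal sketch of the lookup, using only the four variants mentioned above. The real table is much larger; the cleaning step (lowercase, strip punctuation, collapse spaces) is my assumption about a sensible way to keep the table small.

```python
import re
from typing import Optional

# Tiny illustrative slice of the lookup; keys are already cleaned.
CANONICAL = {
    "vp of sales": "VP Sales",
    "vp sales": "VP Sales",
    "vice president sales": "VP Sales",
    "sales vp": "VP Sales",
}


def clean(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    lowered = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return re.sub(r"\s+", " ", lowered).strip()


def normalise(title: str) -> Optional[str]:
    """Return the canonical title, or None for a variant not yet in the table."""
    return CANONICAL.get(clean(title))
```

Titles that come back as None are the ones to review and add to the table.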

Step 2 — Resolve company to domain

LinkedIn gives you company name, not domain. I use Clearbit Autocomplete (free tier works for < 500/day) to go from company name to domain. Success rate is ~85% for US companies, ~70% for EU, ~50% for SMBs under 50 employees.

For unresolved rows: drop them. I tried manual resolution early on; the time cost vs. yield didn’t justify it.

Step 3 — Dedupe

Dedupe on (company_domain, role_family, posting_date_week). A company posting three similar SDR roles in the same week is one signal, not three. Dedupe before enrichment or you’ll pay for the same company three times.
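
A sketch of that key, assuming each row is a dict with company_domain, role_family, and a posted_on date. "Week" is taken as the ISO week here, and the most recent posting in each group is the one kept; both are choices made for the example.

```python
from datetime import date


def dedupe_key(row: dict) -> tuple:
    """(domain, role family, ISO year, ISO week) identifies one signal."""
    iso = row["posted_on"].isocalendar()
    return (row["company_domain"].lower(), row["role_family"], iso[0], iso[1])


def dedupe(rows: list) -> list:
    """Keep one row per key, preferring the most recent posting."""
    seen = {}
    for row in sorted(rows, key=lambda r: r["posted_on"], reverse=True):
        seen.setdefault(dedupe_key(row), row)
    return list(seen.values())
```

Including the ISO year in the key stops week 1 of one year from colliding with week 1 of the next.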

Step 4 — Enrich to a person

The hiring signal is about the company. But your outbound needs a person. The standard enrichment step:

If the posting lists a hiring manager, use them.
If not, find the highest-ranking person in the relevant department at that company (VP Sales for a sales hire, CMO for a marketing hire, CEO for a first-of-role at a small company).
Cross-reference their tenure: if they started in the last 90 days, they’re the priority target (new leaders are more receptive to outbound and more likely to make moves).
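
The selection order above, as a sketch. The Person fields and the numeric rank are hypothetical; in practice they would come from whatever your enrichment provider returns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

NEW_LEADER_DAYS = 90


@dataclass
class Person:
    name: str
    title: str
    rank: int          # hypothetical seniority rank, higher means more senior
    started_on: date


def pick_target(hiring_manager: Optional[Person], department: list, today: date):
    """Return (person, is_priority): the listed hiring manager if there is one,
    otherwise the highest-ranking person in the department."""
    person = hiring_manager
    if person is None:
        person = max(department, key=lambda p: p.rank, default=None)
    if person is None:
        return None, False
    # Leaders who started in the last 90 days are flagged as priority targets.
    is_priority = (today - person.started_on).days <= NEW_LEADER_DAYS
    return person, is_priority
```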

Step 5 — Score

Each row gets a 1–5 priority score based on:

Signal freshness (1–7 days = 5, 8–14 days = 4, 15–21 days = 3, older = drop).
Signal strength (first-of-role hire = 5, replacement hire = 4, team expansion = 3).
Company fit (inside core ICP = 5, adjacent = 3).
Target role seniority (decision-maker = 5, influencer = 4, end-user = 2).

Average score across the four. Only rows at 4.0+ go into the active outbound queue. Rows at 3.0–3.9 go into a secondary nurture queue. Rows under 3.0 get dropped.
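
The scoring and routing rules translate almost directly into code. The dictionary keys are labels I chose for the sketch; the point values and thresholds are the ones listed above.

```python
from typing import Optional

STRENGTH = {"first_of_role": 5, "replacement": 4, "expansion": 3}
FIT = {"core": 5, "adjacent": 3}
SENIORITY = {"decision_maker": 5, "influencer": 4, "end_user": 2}


def freshness_score(age_days: int) -> Optional[int]:
    """Map posting age to a score; None means too old to keep."""
    if age_days <= 7:
        return 5
    if age_days <= 14:
        return 4
    if age_days <= 21:
        return 3
    return None


def route(age_days: int, strength: str, fit: str, seniority: str):
    """Return (average score, queue) for one row."""
    fresh = freshness_score(age_days)
    if fresh is None:
        return None, "drop"
    score = (fresh + STRENGTH[strength] + FIT[fit] + SENIORITY[seniority]) / 4
    if score >= 4.0:
        return score, "active"
    if score >= 3.0:
        return score, "nurture"
    return score, "drop"
```

For example, a 10-day-old team-expansion posting at an adjacent-ICP company with an influencer contact scores (4 + 3 + 3 + 4) / 4 = 3.5 and lands in the nurture queue.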

Days 5+: Maintain, monitor, iterate

Monitor these three numbers weekly

Post-dedupe, post-enrichment yield per 100 scraped rows. Healthy is 25–35 usable rows. If you’re hitting 10–15, your filters are too broad or your dedupe is too aggressive.
Enrichment fill rate on the scraped list. Should match your usual waterfall fill rate (55–65%). If it’s noticeably lower, the companies you’re scraping are too small for your standard enrichment stack — swap providers.
Reply rate on signal-based sends vs. non-signal sends. If signal-sourced rows aren’t outperforming the baseline by at least 2x, either the signals are stale by the time you send, or your opener isn’t using the signal. Both fixable.
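
A small sketch of the weekly check, using the thresholds above (25 usable rows per 100 scraped, 55% fill, 2x reply lift). The function and argument names are hypothetical, and it assumes every denominator is non-zero.

```python
def weekly_health(scraped, usable, enrich_attempted, enrich_filled,
                  signal_replies, signal_sends, base_replies, base_sends):
    """Compute the three weekly numbers and list any that fall below target."""
    yield_per_100 = 100 * usable / scraped
    fill_rate = enrich_filled / enrich_attempted
    # Ratio of the signal reply rate to the baseline reply rate.
    lift = (signal_replies * base_sends) / (signal_sends * base_replies)

    flags = []
    if yield_per_100 < 25:
        flags.append("yield below 25 usable rows per 100 scraped")
    if fill_rate < 0.55:
        flags.append("enrichment fill rate below 55%")
    if lift < 2.0:
        flags.append("signal reply lift below 2x baseline")
    return {"yield_per_100": yield_per_100, "fill_rate": fill_rate,
            "lift": lift, "flags": flags}
```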

Red flags

Sudden yield drop across all URLs. LinkedIn changed something on the page structure. Check Apify’s actor changelog; there’s usually an update within 48h. Until then, scraping is broken.
Yield normal but enrichment fill collapses. Usually means the scraped companies have shifted: maybe you picked up a wave of hiring signals from micro-companies that your enrichment providers don’t cover. Tighten company-size filtering before the person-enrichment step, using the size data from your enrichment provider rather than LinkedIn’s own size filter.
Reply rate drops on signal-sourced sends. Often means the market noticed the signal before you did. If LinkedIn returned the same posting for 30 other salespeople, the hiring manager has already heard from 30 of them. Tighten the freshness window (21 days → 7 days).

When to stop scraping and buy a feed instead

Apify for LinkedIn works up to about 1,500–2,000 rows per week. Above that, you’re either spending 5+ hours/week babysitting blocked actors, or you’re running so many accounts the risk profile stops making sense.

At that volume, move to a commercial hiring-signal feed — Ocean.io, Predictive Hire Signal, or the hiring-intent feeds from the major data providers. They’re not cheap (typically $1,500–5,000/month for a real feed), but the time and risk you free up is usually worth it.

Most B2B outbound programs never need more than 500 hiring signals per week to saturate their tier-1 and tier-2 account lists. If you’re still under that, Apify is the right tool.

Final checklist

Before I ship a scheduled scraping workflow:

[ ] Dedicated LinkedIn account(s), aged 3+ months, connected into relevant network.
[ ] 12–25 job-search URLs covering the ICP × seniority × region matrix.
[ ] Staggered daily schedule, max 30 req/min, randomised delays.
[ ] Retry + dead-letter logic on partial failures.
[ ] Title-normalisation lookup maintained and growing.
[ ] Company-to-domain resolution step with >70% fill.
[ ] Dedupe on (domain, role_family, week).
[ ] Person-enrichment step with priority logic for new leaders.
[ ] 4-factor scoring, only 4.0+ goes into active queue.
[ ] Weekly monitor on yield, fill rate, and signal-based reply lift.

What comes next

A working hiring-signal feed is an enormous upgrade over cold ICP lists, but it’s still the input to the outbound machine. What happens to those rows after they land — the opener you pair with each signal, the bridge you build to your offer, the speed with which you send — is where the conversion rate lives. The signal gets you through the opener. Everything downstream is still your problem.