If you run a 15-room boutique inn, you can probably stop reading. Google will crawl your whole site before its coffee gets cold, and crawl budget is a problem you get to not have. Congratulations.
But if you run a 140-room resort, a multi-property group, or anything with a booking engine that quietly spawns a URL for every date, room type, and “adults plus children plus rate plan” combination, then buckle up. Your site is not 140 pages. It is closer to 140,000 pages, and almost all of them are garbage that Google is wasting its time on instead of crawling the pages that actually sell rooms.
That is what crawl budget is really about. Not some mystical SEO ranking factor. It is a time-and-attention problem. Let’s fix it.
What crawl budget actually is (in plain hotelier English)
Googlebot does not have infinite patience for your site. It decides roughly how many URLs it is willing to fetch from you in a given window, based on two things:
- Crawl capacity: how much fetching your server can handle without falling over. Fast, healthy responses raise the limit; a slow, wheezing server or a run of server errors lowers it.
- Crawl demand: how much Google wants your pages, based on popularity and how often they change.
Put those together and you get the practical reality: a finite number of fetches per day. If 90 percent of those fetches land on junk URLs, your new spa package landing page or your updated rooms page can sit there for weeks, undiscovered, while Googlebot lovingly re-crawls ?check_in=2027-11-14&adults=2&sort=price_desc for the ten-thousandth time.
The mental model: crawl budget is Googlebot’s grocery budget. You want it buying ingredients for meals you actually serve (rooms, packages, offers, location pages), not filling the cart with 9,000 nearly identical cans of beans (parameter and filter URLs). Your job is to write the shopping list.
This is closely related to your site’s bone structure. If you have not read it yet, our guide to hotel website architecture that ranks covers how to lay out a property so the important pages sit shallow and well-linked. Crawl budget is what happens when that architecture goes sideways at scale.
How resort sites accidentally manufacture 100,000 junk URLs
Nobody sits down and decides to create a hundred thousand thin pages. It happens by accident, usually from four culprits.
1. The booking engine and date parameters
Your booking widget is the biggest offender. Every interaction can generate a crawlable link: ?check_in=, ?check_out=, ?adults=, ?children=, ?promo=, ?rate_plan=. If those URLs are reachable by a plain link (not just a form POST), Googlebot will find them, follow them, and try to index a near-infinite calendar. There is no last day on a calendar. Googlebot will crawl into the year 2099 if you let it.
2. Faceted navigation and filters
“Filter by: ocean view, king bed, swim-up, under 400 a night, pet friendly.” Lovely for guests. Each filter combination is often a unique URL, and the combinations multiply. Ten filters with a few options each is thousands of permutations, most of which return the same three rooms in a different order.
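To see how fast these combinations stack up, here is a back-of-the-envelope count in Python. The parameter ranges are hypothetical assumptions chosen for illustration, not figures from any real booking engine: a year of check-in dates, 1 to 14 nights (standing in for check_out), 1 to 4 adults, 0 to 3 children, ten on/off filters and four sort orders.

```python
from math import prod

# Hypothetical booking-widget parameters and how many distinct values each can take.
# Stay length stands in for check_out. Illustrative assumptions, not measurements.
booking_params = {
    "check_in": 365,  # any date in the next year
    "nights": 14,     # 1 to 14 nights
    "adults": 4,      # 1 to 4 adults
    "children": 4,    # 0 to 3 children
}

# Hypothetical faceted navigation: ten filters that are each on or off, plus four sort orders.
filter_count = 10
sort_orders = 4

# Every independent choice multiplies the number of distinct URLs.
booking_urls = prod(booking_params.values())
filter_urls = 2 ** filter_count * sort_orders

print(f"Booking URL combinations: {booking_urls:,}")     # Booking URL combinations: 81,760
print(f"Filter and sort combinations: {filter_urls:,}")  # Filter and sort combinations: 4,096
```

Add a promo or rate_plan parameter and the booking count multiplies again, which is why the calendar is usually the first leak worth plugging.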
3. Multi-property duplication
Resort groups love templates. Property A’s “Dining” page and Property B’s “Dining” page share 80 percent of the same boilerplate. Add a “Things to Do” page per property that all pull from the same regional content, and you have duplication that splits ranking signals and burns crawls on pages that compete with each other.
4. Session IDs, tracking params, and print versions
?utm_source=, ?sessionid=, ?ref=, &print=true. Each one creates a “new” URL pointing at content Google already has. Multiply by every page on the site.
Here is roughly how the math runs on a mid-sized resort. These numbers are illustrative, not measured — but they show the shape of the problem:
| URL type | Real, useful pages | URLs Googlebot can find |
|---|---|---|
| Rooms and suites | 12 | 12 |
| Packages and offers | 8 | 8 |
| Location and dining | 15 | 15 |
| Booking date and rate params | 0 | 40,000+ |
| Faceted filter combinations | 0 | 25,000+ |
| Tracking and session URLs | 0 | 10,000+ |
Thirty-five useful pages. Seventy-five thousand crawlable URLs. You can see why your new package page is gathering dust.
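The same arithmetic in Python, using the table's illustrative numbers as if they were exact (they are not measured, and the junk rows are really lower bounds):

```python
# The illustrative figures from the table above; the junk rows are lower bounds ("40,000+").
useful_pages = {"rooms_and_suites": 12, "packages_and_offers": 8, "location_and_dining": 15}
junk_urls = {"booking_params": 40_000, "faceted_filters": 25_000, "tracking_and_session": 10_000}

useful = sum(useful_pages.values())
crawlable = useful + sum(junk_urls.values())
useful_share = useful / crawlable

print(f"{useful} useful pages out of {crawlable:,} crawlable URLs")  # 35 useful pages out of 75,035 crawlable URLs
print(f"Useful share: {useful_share:.3%}")                          # Useful share: 0.047%
```

Because the junk counts are floors, the real share of useful URLs is, if anything, even lower.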
Step one: see what Google is actually doing
Do not guess. Open the evidence first.
Google Search Console, Crawl Stats (Settings, then Crawl Stats). This shows total crawl requests over time, average response time, and breakdowns by response code, file type, purpose (discovery versus refresh), and Googlebot type, each with sample URLs you can click into. Look for two red flags: sample URLs dominated by query strings, and a rising average response time, which signals your server is the bottleneck.
The Pages report (formerly Coverage). The numbers that matter:
- Crawled, currently not indexed: Google fetched it and decided it was not worth indexing. Often thin or duplicate.
- Discovered, currently not indexed: Google knows the URL exists but has not crawled it yet. Google's own explanation is that it usually postponed the crawl because fetching it was expected to overload your site. This is the smoking gun for crawl budget problems. Google is rationing.
- Duplicate, Google chose different canonical than user: your parameter and filter URLs colliding.
If “Discovered, currently not indexed” is in the tens of thousands, Googlebot is drowning and triaging your site for you. Badly.
Run a crawl yourself. Point a crawler like Screaming Frog at your site and watch how many URLs it finds versus how many you intended to publish. If you intended 35 pages and the crawler is still discovering URLs at 60,000, congratulations, you found your leak. Server log analysis is the black-belt version of this — pull your access logs and count how many Googlebot hits land on parameter URLs versus real pages. It is tedious and it is the truth.
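A minimal sketch of that log count in Python, assuming Apache or Nginx access logs in the standard "combined" format. It treats any request whose user agent mentions Googlebot as Googlebot, which is a simplification: user agents can be spoofed, and Google recommends confirming genuine Googlebot traffic with a reverse DNS lookup before you trust the numbers. The function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Request path and user agent from a line in the Apache/Nginx "combined" log format.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_crawl_split(lines):
    """Count Googlebot hits on clean paths vs parameter URLs, plus which parameters appear."""
    hits = Counter()
    params = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # Simplification: trusts the user-agent string; verify real Googlebot via reverse DNS.
        if not match or "Googlebot" not in match.group("agent"):
            continue
        query = urlsplit(match.group("path")).query
        if query:
            hits["parameter"] += 1
            for pair in query.split("&"):
                params[pair.split("=", 1)[0]] += 1
        else:
            hits["clean"] += 1
    return hits, params

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'192.0.2.10 - - [10/Mar/2026:10:00:00 +0000] "GET /rooms/ocean-suite HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'192.0.2.10 - - [10/Mar/2026:10:00:05 +0000] "GET /book?check_in=2027-11-14&adults=2 HTTP/1.1" 200 8200 "-" "{GOOGLEBOT}"',
    f'192.0.2.10 - - [10/Mar/2026:10:00:09 +0000] "GET /rooms?sort=price_desc HTTP/1.1" 200 4100 "-" "{GOOGLEBOT}"',
    '198.51.100.7 - - [10/Mar/2026:10:01:00 +0000] "GET /offers HTTP/1.1" 200 3000 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits, params = googlebot_crawl_split(sample)
share = hits["parameter"] / max(1, sum(hits.values()))
print(f"Googlebot hits: {dict(hits)}, parameter share {share:.0%}")
print("Top parameters:", params.most_common(3))
```

On a real resort log the interesting output is the parameter list: it usually tells you which single parameter (often check_in) is eating the budget.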
The single most common thing we see on big resort sites: nobody has ever looked at where Googlebot actually spends its time. Once you look, the fix is usually obvious and the wasted crawl is usually 70 to 90 percent of total activity.
Step two: stop manufacturing junk URLs
Now you turn off the tap. There is an order of operations here, and getting it wrong (looking at you, “noindex plus robots.txt block”) is how people make things worse.
Canonical tags — your default tool
A canonical tag tells Google “this URL is a variant of that URL, give the credit to that one.” For sorted, filtered, and parameter versions of a page that you still want crawled and consolidated, a self-referencing canonical on the clean version plus a canonical pointing to it from the messy versions is the right move. It does not save crawl budget directly — Google still fetches the page to read the canonical — but it stops the duplication from splitting your signals.
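A minimal Python sketch of working out the clean URL that a messy variant should name as its canonical. It assumes a hypothetical allowlist approach: only parameters listed in CONTENT_PARAMS (here just a pagination parameter called page) survive, and everything else, including sort, filter, date, tracking and session parameters, is stripped. The domain is a placeholder.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: parameters that genuinely change what the page shows.
# Everything else (sort, filters, booking dates, utm_*, sessionid, print) is dropped.
CONTENT_PARAMS = {"page"}

def canonical_url(url: str) -> str:
    """Return the clean URL that parameter variants should declare as canonical."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key in CONTENT_PARAMS]
    # Rebuild without the dropped parameters and without any #fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page at `url`."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'

print(canonical_tag("https://www.example-resort.com/rooms?sort=price_desc&utm_source=newsletter"))
print(canonical_url("https://www.example-resort.com/rooms?page=2&adults=2"))
```

Running the clean URL through the same function gives you the self-referencing canonical, because a URL with nothing to strip maps to itself.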
robots.txt disallow — for pure crawl waste
When a URL pattern has zero SEO value and you never want it fetched, block it in robots.txt. The classic targets:
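```text
User-agent: *
# Each parameter is listed twice: "?" catches it as the first parameter, "&" catches it later in the query string
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?sessionid=
Disallow: /*&sessionid=
Disallow: /*?print=
Disallow: /*&print=
Disallow: /search
Disallow: /*?check_in=
Disallow: /*&check_in=
```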
This is how you stop Googlebot from crawling the infinite calendar. The catch: robots.txt stops crawling, not indexing. A blocked URL can still appear in results as a bare link if other pages link to it. And critically, if you block a URL in robots.txt, Google cannot crawl it, which means it cannot see a noindex tag on it. The two tools do not stack.
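Robots.txt patterns are easy to get subtly wrong. In Google's robots.txt rules, * means "any sequence of characters", a trailing $ means "end of URL", and rules are matched from the start of the path with the query string included. The simplified Python matcher below handles Disallow rules only (it ignores Allow rules and the longest-match precedence a real parser applies), but it is enough to show one consequence: /*?sort= only catches sort when it is the first parameter.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a regex."""
    anchored = pattern.endswith("$")  # trailing $ means "URL ends here"
    body = pattern[:-1] if anchored else pattern
    # Escape literal text; each * becomes "any sequence of characters".
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))

def is_disallowed(path_and_query: str, disallow_rules: list) -> bool:
    """True if any non-empty Disallow rule matches; re.match anchors at the start, like robots.txt."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in disallow_rules if rule)

url = "/rooms?adults=2&sort=price_desc"
print(is_disallowed("/rooms?sort=price_desc", ["/*?sort="]))  # True
print(is_disallowed(url, ["/*?sort="]))                       # False: sort is not the first parameter
print(is_disallowed(url, ["/*?sort=", "/*&sort="]))           # True
```

Before deploying, it is worth testing the live file against real URLs from your logs with a proper robots.txt parser rather than trusting a sketch like this one.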
noindex — for pages you want crawled but not ranked
A robots meta tag, `<meta name="robots" content="noindex, follow">`, tells Google “drop this from the index but keep following its links.” Use it for thin pages that must stay crawlable, like a filter result you want Google to pass through but not rank. The page stays crawlable, so Google can keep reading the directive.
Here is the decision rule, the one most people get wrong:
Want it consolidated, not separately ranked? Canonical. Pure waste you never want fetched again? robots.txt disallow. Want it crawled and link-followed but kept out of the index? noindex. Never combine noindex with a robots.txt block on the same URL — Google cannot read a noindex it is not allowed to crawl.
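The same decision rule written as a small Python function. The UrlPattern fields are hypothetical labels you would assign yourself while auditing a URL pattern, not anything Search Console reports; the point is the order of the checks, and that each pattern gets exactly one tool.

```python
from dataclasses import dataclass

@dataclass
class UrlPattern:
    # Hypothetical audit labels for one URL pattern.
    example: str
    should_rank: bool           # a real page you want in search results
    worth_fetching: bool        # False for pure waste: session IDs, the date calendar
    duplicates_clean_url: bool  # a sorted or filtered copy of a page that already exists

def choose_tool(p: UrlPattern) -> str:
    if p.should_rank:
        return "leave indexable"
    if not p.worth_fetching:
        # Blocked URLs are never fetched, so a noindex on them would never be read.
        return "robots.txt disallow"
    if p.duplicates_clean_url:
        return "canonical to the clean URL"
    return "noindex, follow"

patterns = [
    UrlPattern("/rooms/ocean-suite", True, True, False),
    UrlPattern("/book?check_in=2027-11-14", False, False, False),
    UrlPattern("/rooms?view=ocean", False, True, True),
    UrlPattern("/offers/filter/pet-friendly", False, True, False),
]
for p in patterns:
    print(f"{p.example}: {choose_tool(p)}")
```

Because the function returns a single answer per pattern, it can never recommend noindex and a robots.txt block for the same URL.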
Handle parameters at the source
The cleanest fix is upstream. Configure your booking engine so that date and rate selections happen via form submission and JavaScript state, not crawlable <a href> links. If Googlebot cannot find the parameter URL as a link in the first place, you never have to clean it up. Fewer links to the infinite calendar means fewer trips down it.
Step three: build sitemaps that double as a control panel
Your XML sitemap is not just a list of URLs. For a big site it is how you tell Google your priorities and how you measure indexation per segment.
Rules that matter for resort and multi-property sites:
- One sitemap maxes out at 50,000 URLs and 50MB uncompressed. Past that, you split and use a sitemap index file that points to the children.
- Split by property and page type. A sitemap index referencing `/sitemaps/property-a-rooms.xml`, `/sitemaps/property-b-rooms.xml`, `/sitemaps/blog.xml`, and `/sitemaps/offers.xml` lets you open Search Console and see exactly which segment is under-indexed. If Property B’s rooms sitemap shows 4 of 12 indexed, you know precisely where to dig.
- Only include canonical, indexable, 200-status URLs. No redirects, no noindex pages, no parameter junk. A sitemap full of non-indexable URLs trains Google to trust your sitemap less.
- Keep lastmod honest. If you stamp every page with today’s date on every deploy, Google learns to ignore the field. Update lastmod only when the content genuinely changes, and Google will lean on it to crawl your updated pages faster.
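A minimal Python sketch of those rules using only the standard library. It assumes you already have, for each hypothetical segment such as property-a-rooms, a list of pages with their HTTP status, an indexable flag, and a lastmod date recorded when the content last genuinely changed; the Page class, function names and domain are illustrative. It drops anything that is not a 200, indexable, parameter-free URL, splits at the 50,000-URL limit, and writes a sitemap index pointing at every child. It does not check the 50MB size cap.

```python
from dataclasses import dataclass
from typing import Optional
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap URL limit from the sitemaps protocol

@dataclass
class Page:
    url: str
    status: int
    indexable: bool                # False if noindex or canonicalised to another URL
    lastmod: Optional[str] = None  # e.g. "2026-03-01", set only when content really changed

def eligible(page: Page) -> bool:
    # Only canonical, indexable, 200-status, parameter-free URLs belong in a sitemap.
    return page.status == 200 and page.indexable and "?" not in page.url

def urlset_xml(pages: list) -> bytes:
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = page.url
        if page.lastmod:
            ET.SubElement(entry, "lastmod").text = page.lastmod
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

def build_sitemaps(segments: dict, base_url: str) -> dict:
    """Map each segment name to its pages; return {filename: xml bytes}, index included."""
    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name, pages in segments.items():
        keep = [p for p in pages if eligible(p)]
        for n, start in enumerate(range(0, len(keep), MAX_URLS), start=1):
            # Only number the files when a segment needs more than one.
            filename = f"{name}-{n}.xml" if len(keep) > MAX_URLS else f"{name}.xml"
            files[filename] = urlset_xml(keep[start:start + MAX_URLS])
            child = ET.SubElement(index, "sitemap")
            ET.SubElement(child, "loc").text = f"{base_url}/sitemaps/{filename}"
    files["sitemap-index.xml"] = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return files

segments = {
    "property-a-rooms": [
        Page("https://www.example-resort.com/a/rooms/ocean-suite", 200, True, "2026-03-01"),
        Page("https://www.example-resort.com/a/rooms?sort=price_desc", 200, True),  # parameter junk
        Page("https://www.example-resort.com/a/rooms/old-villa", 301, True),        # redirect
    ],
    "offers": [Page("https://www.example-resort.com/offers/spa-package", 200, True)],
}
files = build_sitemaps(segments, "https://www.example-resort.com")
print(sorted(files))  # ['offers.xml', 'property-a-rooms.xml', 'sitemap-index.xml']
```

Keeping one file per segment is what makes the per-property indexation counts in Search Console readable; the split-by-50,000 logic only kicks in for the largest segments.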
The payoff: a clean, segmented sitemap turns “is my site indexed?” from a vague anxiety into a dashboard you can read in 90 seconds.
Step four: spend the reclaimed crawl on pages that book rooms
Cutting junk is only half the win. The other half is making sure your good pages are easy to reach and worth crawling often.
- Flatten your click depth. A page buried six clicks from the homepage gets crawled rarely. Your money pages — rooms, top packages, primary location pages — should be reachable in two or three clicks. This is the architecture work again, and it directly affects how often Google refreshes those pages.
- Internal-link your important pages hard. Crawl frequency follows internal links. If your spa package is linked from the homepage, the rooms pages, and the offers hub, Google treats it as important and re-crawls it more.
- Fix your server speed. Crawl capacity scales with response time. A faster server literally earns you more crawls. This is the same work that wins you guests — see hotel page speed and direct bookings for why a fast site converts and gets crawled more.
- Consolidate duplicate multi-property content. Where Property A and Property B have near-identical pages, differentiate them with genuinely local content or consolidate where it makes sense. Two thin pages competing is worse than one strong page.
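To find money pages that sit too deep, you can compute click depth from an internal-link graph. A minimal Python sketch, assuming you have exported a hypothetical mapping of each page to the pages it links to (many desktop crawlers can export link data): a breadth-first search from the homepage gives the fewest clicks to each page, and a count of linking pages shows which pages your internal links treat as important. The three-click threshold follows the guidance above; the graph itself is invented.

```python
from collections import Counter, deque

def click_depths(links: dict, home: str = "/") -> dict:
    """Fewest clicks from `home` to every reachable page (breadth-first search)."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit in BFS is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

def inlink_counts(links: dict) -> Counter:
    """How many distinct pages link to each page."""
    return Counter(t for targets in links.values() for t in set(targets))

# Invented internal-link graph for illustration.
links = {
    "/": ["/rooms", "/offers", "/dining"],
    "/rooms": ["/rooms/ocean-suite", "/offers/spa-package"],
    "/offers": ["/offers/spa-package"],
    "/dining": ["/dining/beach-grill"],
    "/dining/beach-grill": ["/blog/2019/menu-update"],
    "/blog/2019/menu-update": ["/offers/honeymoon"],
}

depths = click_depths(links)
inlinks = inlink_counts(links)
for page, d in sorted(depths.items(), key=lambda item: item[1]):
    flag = "  <- deeper than 3 clicks" if d > 3 else ""
    print(f"{d}  {inlinks[page]} inlinks  {page}{flag}")
```

In this toy graph the honeymoon offer is four clicks deep with a single inlink from an old blog post, exactly the kind of page that gets crawled rarely until it is linked from the offers hub.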
Why this connects to the bigger picture
Crawl budget feels like deep plumbing, and it is. But it ties straight back to the thing you actually care about: getting found for your own name and your own offers instead of ceding that ground. When Googlebot wastes its budget on parameter sludge, your branded and package pages get crawled and refreshed slowly — which is one quiet reason a hotel can rank below the OTAs for its own name. The OTAs have armies of engineers keeping their crawl efficiency razor-sharp. Your booking engine, left to its defaults, is doing the opposite.
This is not about beating the OTAs at their own game — they will outspend you on technical infrastructure every day of the week. It is about not handing them an easy advantage. A clean, crawl-efficient site means Google sees your pages fresh and complete, which is the precondition for winning back more direct bookings and clawing back the 15 to 25 percent commission you hand over on every OTA reservation. The math on that margin is exactly why we wrote up how OTAs win the search game and how a healthier booking mix starts with the basics being right.
If you are newer to all this, our hotel SEO 2026 starter guide is the gentler on-ramp before you go log-file diving.
The 30-minute audit you can run today
You do not need a consultant to find out if you have a problem. Here is the short version:
- Open Search Console, Crawl Stats. Note total crawl requests and how many of the sample URLs carry query strings.
- Open the Pages report. Read your “Discovered, currently not indexed” and “Crawled, currently not indexed” counts.
- Add `?test=1`-style parameters and check whether your filters and booking widget create crawlable links. View source and search for `?check_in` in your `<a href>` links.
- Open your sitemap. Count URLs. Check whether it includes any redirects, noindex pages, or parameter URLs (it should not).
- Pick the single biggest leak — usually the booking engine parameters — and fix that one first.
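For the view-source check, a minimal Python sketch using the standard library's HTMLParser: it lists every `<a href>` whose query string contains a booking or sorting parameter. The SUSPECT_PARAMS names are assumptions based on the parameters mentioned earlier; swap in whatever your booking engine actually emits. It only sees the HTML you give it, so links added by JavaScript after the page loads will not show up here, even though Googlebot can render JavaScript and may still find them. The sample page is invented.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit

# Assumed parameter names, taken from the ones discussed above; use your engine's real ones.
SUSPECT_PARAMS = {"check_in", "check_out", "adults", "children", "promo", "rate_plan",
                  "sort", "sessionid"}

class ParamLinkFinder(HTMLParser):
    """Collects <a href> values whose query string contains a suspect parameter."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        keys = {key for key, _ in parse_qsl(urlsplit(href).query, keep_blank_values=True)}
        if keys & SUSPECT_PARAMS:
            self.hits.append(href)

def find_param_links(html: str) -> list:
    finder = ParamLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hits

# Invented page: one clean link, one crawlable booking link, one form (not a link).
page = """
<a href="/rooms/ocean-suite">Ocean Suite</a>
<a href="/book?check_in=2027-11-14&amp;adults=2">Book these dates</a>
<form action="/book" method="post"><button>Check availability</button></form>
"""
print(find_param_links(page))
```

The form-based booking flow is ignored because it is not a link, which is the whole point of moving date selection into form submissions.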
If those five steps turn up tens of thousands of junk URLs and a tiny fraction of your real pages indexed, you have found a meaningful, fixable problem — and probably a chunk of direct-booking revenue that Google has been too distracted to surface.
Let’s read your logs
Crawl budget and indexation work is unglamorous, high-leverage, and exactly the kind of technical SEO that gets ignored on big resort sites until traffic mysteriously plateaus. If your property has thousands of URLs and you are not sure how many of them Google is actually wasting time on, that is a one-afternoon diagnosis with a real payoff.
We do this for independent and boutique resort groups all day — see our hotel SEO service for how the technical audit fits into the bigger direct-booking picture, check pricing to see what fits your group, or just book a call and we will tell you straight whether crawl budget is your problem or a distraction from a bigger one.