Hotel SEO Foundations

Crawl Budget and Indexing for Big Resort Sites

How larger resort and multi-property hotel sites can manage crawl budget, indexation, faceted URLs, and sitemaps so Google spends its time on pages that actually book rooms.

HotelSEO Lab · March 24, 2026 · 11 min

If you run a 15-room boutique inn, you can probably stop reading. Google will crawl your whole site before its coffee gets cold, and crawl budget is a problem you get to not have. Congratulations.

But if you run a 140-room resort, a multi-property group, or anything with a booking engine that quietly spawns a URL for every date, room type, and “adults plus children plus rate plan” combination, then buckle up. Your site is not 140 pages. It is closer to 140,000 pages, and almost all of them are garbage that Google is wasting its time on instead of crawling the pages that actually sell rooms.

That is what crawl budget is really about. Not some mystical SEO ranking factor. It is a time-and-attention problem. Let’s fix it.

What crawl budget actually is (in plain hotelier English)

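Googlebot does not have infinite patience for your site. It decides roughly how many URLs it is willing to fetch from you in a given window, based on two things: how much crawling your server can handle without slowing down (the crawl capacity limit), and how much Google wants your content in the first place (crawl demand).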

Multiply those together and you get the practical reality: a finite number of fetches per day. If 90 percent of those fetches land on junk URLs, your new spa package landing page or your updated rooms page can sit there for weeks, undiscovered, while Googlebot lovingly re-crawls ?check_in=2027-11-14&adults=2&sort=price_desc for the ten-thousandth time.

The mental model: crawl budget is Googlebot’s grocery budget. You want it buying ingredients for meals you actually serve (rooms, packages, offers, location pages), not filling the cart with 9,000 nearly identical cans of beans (parameter and filter URLs). Your job is to write the shopping list.

This is closely related to your site’s bone structure. If you have not read it yet, our guide to hotel website architecture that ranks covers how to lay out a property so the important pages sit shallow and well-linked. Crawl budget is what happens when that architecture goes sideways at scale.

How resort sites accidentally manufacture 100,000 junk URLs

Nobody sits down and decides to create a hundred thousand thin pages. It happens by accident, usually from four culprits.

1. The booking engine and date parameters

Your booking widget is the biggest offender. Every interaction can generate a crawlable link: ?check_in=, ?check_out=, ?adults=, ?children=, ?promo=, ?rate_plan=. If those URLs are reachable by a plain link (not just a form POST), Googlebot will find them, follow them, and try to index a near-infinite calendar. There is no last day on a calendar. Googlebot will crawl into the year 2099 if you let it.

2. Faceted navigation and filters

“Filter by: ocean view, king bed, swim-up, under 400 a night, pet friendly.” Lovely for guests. Each filter combination is often a unique URL, and the combinations multiply. Ten filters with a few options each is thousands of permutations, most of which return the same three rooms in a different order.
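To see how quickly filter combinations multiply, here is a minimal sketch. It assumes a hypothetical set of ten filters with two options each, where every filter can also be left unset; your own filter names and option counts will differ.

```python
from math import prod

def count_filter_urls(options: dict[str, int]) -> int:
    """Count distinct filter combinations, excluding the unfiltered page.

    Each filter is either unset or set to exactly one of its options,
    so it contributes (options + 1) states to the product.
    """
    return prod(n + 1 for n in options.values()) - 1

# Hypothetical: ten filters, two options each (names are placeholders).
TEN_FILTERS = {f"filter_{i}": 2 for i in range(1, 11)}

total = count_filter_urls(TEN_FILTERS)
print(total)  # 3**10 - 1 = 59048
```

Even with only two options per filter, ten filters produce tens of thousands of possible URLs, which is why faceted navigation needs a deliberate crawl policy.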

3. Multi-property duplication

Resort groups love templates. Property A’s “Dining” page and Property B’s “Dining” page share 80 percent of the same boilerplate. Add a “Things to Do” page per property that all pull from the same regional content, and you have duplication that splits ranking signals and burns crawls on pages that compete with each other.

4. Session IDs, tracking params, and print versions

?utm_source=, ?sessionid=, ?ref=, &print=true. Each one creates a “new” URL pointing at content Google already has. Multiply by every page on the site.
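A rough way to measure this kind of duplication is to strip the parameters that never change page content and count what is left. The sketch below assumes the parameter names listed above and a made-up domain (resort.example); adjust the set to whatever your own stack appends.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of parameters that never change what the page shows.
TRACKING_PARAMS = {"sessionid", "ref", "print"}

def clean_url(url: str) -> str:
    """Drop tracking-style parameters and any fragment, keep the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

variants = [
    "https://resort.example/rooms/ocean-suite?utm_source=newsletter",
    "https://resort.example/rooms/ocean-suite?sessionid=abc123&ref=home",
    "https://resort.example/rooms/ocean-suite?view=ocean&print=true",
]
unique_pages = {clean_url(u) for u in variants}
print(len(variants), "crawlable URLs ->", len(unique_pages), "distinct pages")
```

Run the same function over a crawl export or a log file and the gap between the two counts is your duplicate load.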

Here is roughly how the math runs on a mid-sized resort. These numbers are illustrative, not measured — but they show the shape of the problem:

URL type                        Real, useful pages   URLs Googlebot can find
Rooms and suites                12                   12
Packages and offers             8                    8
Location and dining             15                   15
Booking date and rate params    0                    40,000+
Faceted filter combinations     0                    25,000+
Tracking and session URLs       0                    10,000+

Thirty-five useful pages. Seventy-five thousand crawlable URLs. You can see why your new package page is gathering dust.

Step one: see what Google is actually doing

Do not guess. Open the evidence first.

Google Search Console, Crawl Stats (Settings, then Crawl Stats). This shows total crawl requests over time, average response time, and a breakdown by file type, response code, and Googlebot type. Look for two red flags: a huge share of requests going to URLs with query strings, and a rising average response time, which signals your server is the bottleneck.

The Pages report (formerly Coverage). The numbers that matter are the counts under “Discovered, currently not indexed” and “Crawled, currently not indexed.”

If “Discovered, currently not indexed” is in the tens of thousands, Googlebot is drowning and triaging your site for you. Badly.

Run a crawl yourself. Point a crawler like Screaming Frog at your site and watch how many URLs it finds versus how many you intended to publish. If you intended 35 pages and the crawler is still discovering URLs at 60,000, congratulations, you found your leak. Server log analysis is the black-belt version of this — pull your access logs and count how many Googlebot hits land on parameter URLs versus real pages. It is tedious and it is the truth.
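The log count can be sketched in a few lines. This assumes access logs in the common combined format and identifies Googlebot by user-agent string only, which can be spoofed; a real audit should also verify the requesting IPs. The sample lines are invented for illustration.

```python
import re
from collections import Counter

# Pulls the requested path out of a combined-format log line.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*"')

def summarize_googlebot_hits(lines):
    """Count Googlebot requests to parameter URLs versus clean URLs."""
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue  # user-agent match only; does not verify the IP
        match = REQUEST_RE.search(line)
        if not match:
            continue
        bucket = "parameter" if "?" in match.group("path") else "clean"
        counts[bucket] += 1
    return counts

GOOGLEBOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
SAMPLE_LOG = [
    f'66.249.66.1 - - [24/Mar/2026:10:00:00 +0000] "GET /rooms?check_in=2027-11-14&adults=2 HTTP/1.1" 200 5123 "-" {GOOGLEBOT_UA}',
    f'66.249.66.1 - - [24/Mar/2026:10:00:02 +0000] "GET /rooms?sort=price_desc HTTP/1.1" 200 5090 "-" {GOOGLEBOT_UA}',
    f'66.249.66.1 - - [24/Mar/2026:10:00:05 +0000] "GET /packages/spa-weekend HTTP/1.1" 200 8841 "-" {GOOGLEBOT_UA}',
    '203.0.113.9 - - [24/Mar/2026:10:00:06 +0000] "GET /rooms HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

counts = summarize_googlebot_hits(SAMPLE_LOG)
wasted_share = counts["parameter"] / sum(counts.values())
print(counts, f"{wasted_share:.0%} of Googlebot hits on parameter URLs")
```

To run it on a real file, pass the open file object instead of the sample list: `summarize_googlebot_hits(open("access.log"))`, where the filename is whatever your server writes.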

The single most common thing we see on big resort sites: nobody has ever looked at where Googlebot actually spends its time. Once you look, the fix is usually obvious and the wasted crawl is usually 70 to 90 percent of total activity.

Step two: stop manufacturing junk URLs

Now you turn off the tap. There is an order of operations here, and getting it wrong (looking at you, “noindex plus robots.txt block”) is how people make things worse.

Canonical tags — your default tool

A canonical tag tells Google “this URL is a variant of that URL, give the credit to that one.” For sorted, filtered, and parameter versions of a page that you still want crawled and consolidated, a self-referencing canonical on the clean version plus a canonical pointing to it from the messy versions is the right move. It does not save crawl budget directly — Google still fetches the page to read the canonical — but it stops the duplication from splitting your signals.
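To spot-check that a messy URL really points at its clean version, you can pull the canonical out of the page HTML. This is a minimal sketch using only the standard library; the sample markup and the resort.example domain are placeholders, and it only reads the link element, not canonicals sent as HTTP headers.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")

def find_canonical(html: str):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Hypothetical HTML served at /rooms?sort=price_desc
SORTED_PAGE = (
    '<html><head><link rel="canonical" '
    'href="https://resort.example/rooms"></head><body></body></html>'
)
print(find_canonical(SORTED_PAGE))
```

If a sorted or filtered URL returns its own address here, or nothing at all, the consolidation is not happening.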

robots.txt disallow — for pure crawl waste

When a URL pattern has zero SEO value and you never want it fetched, block it in robots.txt. The classic targets:

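User-agent: *
# Match each parameter whether it comes first (?) or later (&)
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?sessionid=
Disallow: /*&sessionid=
Disallow: /*?print=
Disallow: /*&print=
Disallow: /*?check_in=
Disallow: /*&check_in=
Disallow: /search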

This is how you stop Googlebot from crawling the infinite calendar. The catch: robots.txt stops crawling, not indexing. A blocked URL can still appear in results as a bare link if it has external inbound links. And critically — if you block a URL in robots.txt, Google cannot crawl it, which means it cannot see a noindex tag on it. The two tools do not stack.
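Before deploying disallow patterns, it helps to test them against real URLs, because a parameter can appear first (after ?) or later (after &), and a pattern written for only one position misses the other. The matcher below is a simplified sketch of wildcard matching (* for any characters, a trailing $ as an end anchor). It ignores Allow rules, rule precedence, and user-agent groups, so treat it as a sanity check rather than a copy of Googlebot's behavior.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces, then let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path_and_query: str, disallow_patterns) -> bool:
    """True if any pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path_and_query) for p in disallow_patterns)

first_position = "/rooms?check_in=2027-11-14&adults=2"
later_position = "/rooms?adults=2&check_in=2027-11-14"

print(is_disallowed(first_position, ["/*?check_in="]))                  # True
print(is_disallowed(later_position, ["/*?check_in="]))                  # False
print(is_disallowed(later_position, ["/*?check_in=", "/*&check_in="]))  # True
```

The second result is the one to watch: a rule that only covers the ? position lets the same calendar URL through whenever the booking engine reorders its parameters.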

noindex — for pages you want crawled but not ranked

<meta name="robots" content="noindex, follow"> (written out: a robots meta tag set to noindex, follow) tells Google “drop this from the index but keep following its links.” Use it for thin pages that must stay crawlable — like a filter result you want Google to pass through but not rank. The page stays crawlable, so Google can keep reading the directive.

Here is the decision rule, the one most people get wrong:

Want it consolidated, not separately ranked? Canonical. Pure waste you never want fetched again? robots.txt disallow. Want it crawled and link-followed but kept out of the index? noindex. Never combine noindex with a robots.txt block on the same URL — Google cannot read a noindex it is not allowed to crawl.
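The rule is simple enough to write down as a lookup plus one conflict check. This is an illustrative sketch only: the goal names, the UrlPolicy structure, and the URL patterns are all hypothetical, meant for keeping your own inventory of URL patterns honest.

```python
from dataclasses import dataclass

# The decision rule as a lookup: what you want -> which tool to use.
TOOL_FOR_GOAL = {
    "consolidate_signals": "canonical",
    "never_fetch": "robots.txt disallow",
    "crawl_but_keep_out_of_index": "noindex, follow",
}

@dataclass(frozen=True)
class UrlPolicy:
    pattern: str          # hypothetical label for a URL pattern
    robots_blocked: bool  # disallowed in robots.txt
    noindex: bool         # page carries a noindex directive

def find_conflicts(policies):
    """Return patterns that pair noindex with a robots.txt block."""
    return [p.pattern for p in policies if p.robots_blocked and p.noindex]

policies = [
    UrlPolicy("/search", robots_blocked=True, noindex=False),
    UrlPolicy("/rooms?view=", robots_blocked=False, noindex=True),
    UrlPolicy("/book?check_in=", robots_blocked=True, noindex=True),
]
print(find_conflicts(policies))  # ['/book?check_in=']
```

Any pattern this flags has a noindex that Google will never read, so pick one tool for it and remove the other.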

Handle parameters at the source

The cleanest fix is upstream. Configure your booking engine so that date and rate selections happen via form submission and JavaScript state, not crawlable <a href> links. If Googlebot cannot find the parameter URL as a link in the first place, you never have to clean it up. Fewer links to the infinite calendar means fewer trips down it.
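To check whether your templates currently expose booking parameters as plain links, you can scan the rendered HTML for anchors whose query string carries them. A minimal sketch, assuming the parameter names mentioned earlier in this article and invented sample markup; it only sees links present in the HTML you feed it, not ones injected later by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

# Parameter names taken from the booking-engine examples in this article.
BOOKING_PARAMS = {"check_in", "check_out", "adults", "children", "promo", "rate_plan"}

class BookingLinkFinder(HTMLParser):
    """Collects <a href> values whose query string has booking parameters."""

    def __init__(self):
        super().__init__()
        self.leaky_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # forms and buttons are not crawlable links
        href = dict(attrs).get("href")
        if not href:
            return
        query_keys = set(parse_qs(urlsplit(href).query, keep_blank_values=True))
        if query_keys & BOOKING_PARAMS:
            self.leaky_links.append(href)

def find_booking_links(html: str):
    finder = BookingLinkFinder()
    finder.feed(html)
    return finder.leaky_links

PAGE = """
<a href="/rooms">Rooms</a>
<a href="/book?check_in=2027-11-14&amp;adults=2">Check availability</a>
<form action="/book" method="post"><button>Book</button></form>
"""
print(find_booking_links(PAGE))
```

An empty list on your key templates is the goal. Every href it prints is a doorway into the infinite calendar.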

Step three: build sitemaps that double as a control panel

Your XML sitemap is not just a list of URLs. For a big site it is how you tell Google your priorities and how you measure indexation per segment.

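Rules that matter for resort and multi-property sites: split sitemaps by property and by page type, keep each one under 50,000 URLs, reference them all from a single sitemap index, and list only clean, indexable URLs (no redirects, noindex pages, or parameter URLs).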

The payoff: a clean, segmented sitemap turns “is my site indexed?” from a vague anxiety into a dashboard you can read in 90 seconds.
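A segmented setup usually means one sitemap index pointing at child sitemaps per property and page type, each kept under the 50,000-URL limit. Here is a minimal sketch of generating that index with the standard library; the child sitemap filenames and the resort.example domain are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into groups small enough for one sitemap each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap_index(child_sitemap_urls):
    """Return sitemap index XML that references each child sitemap."""
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in child_sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical segments: one child sitemap per property and page type.
CHILD_SITEMAPS = [
    "https://resort.example/sitemaps/property-a-rooms.xml",
    "https://resort.example/sitemaps/property-b-rooms.xml",
    "https://resort.example/sitemaps/blog.xml",
]
index_xml = build_sitemap_index(CHILD_SITEMAPS)
print(index_xml)
```

Because each child sitemap is submitted and reported separately, a segment with poor indexation stands out instead of hiding inside one site-wide number.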

Step four: spend the reclaimed crawl on pages that book rooms

Cutting junk is only half the win. The other half is making sure your good pages are easy to reach and worth crawling often.

Why this connects to the bigger picture

Crawl budget feels like deep plumbing, and it is. But it ties straight back to the thing you actually care about: getting found for your own name and your own offers instead of ceding that ground. When Googlebot wastes its budget on parameter sludge, your branded and package pages get crawled and refreshed slowly — which is one quiet reason a hotel can rank below the OTAs for its own name. The OTAs have armies of engineers keeping their crawl efficiency razor-sharp. Your booking engine, left to its defaults, is doing the opposite.

This is not about beating the OTAs at their own game — they will outspend you on technical infrastructure every day of the week. It is about not handing them an easy advantage. A clean, crawl-efficient site means Google sees your pages fresh and complete, which is the precondition for winning back more direct bookings and clawing back the 15 to 25 percent commission you hand over on every OTA reservation. The math on that margin is exactly why we wrote up how OTAs win the search game and how a healthier booking mix starts with the basics being right.

If you are newer to all this, our hotel SEO 2026 starter guide is the gentler on-ramp before you go log-file diving.

The 30-minute audit you can run today

You do not need a consultant to find out if you have a problem. Here is the short version:

  1. Open Search Console, Crawl Stats. Note total crawl requests and how many hit query-string URLs.
  2. Open the Pages report. Read your “Discovered, currently not indexed” and “Crawled, currently not indexed” counts.
  3. Check whether your filters and booking widget create crawlable links: view source and search for ?check_in in your <a href> links. Then append a ?test=1 style parameter to a real page and confirm its canonical still points at the clean URL.
  4. Open your sitemap. Count URLs. Check whether it includes any redirects, noindex pages, or parameter URLs (it should not).
  5. Pick the single biggest leak — usually the booking engine parameters — and fix that one first.

If those five steps turn up tens of thousands of junk URLs and a tiny fraction of your real pages indexed, you have found a meaningful, fixable problem — and probably a chunk of direct-booking revenue that Google has been too distracted to surface.
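The sitemap check in step 4 is easy to script in part. This sketch counts the URLs in a sitemap and flags any that carry a query string. It assumes a standard urlset sitemap and uses invented sample data; it does not detect redirects or noindex pages, which would require fetching each URL.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def audit_sitemap(xml_text: str) -> dict:
    """Count sitemap URLs and list the ones containing a query string."""
    root = ET.fromstring(xml_text)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
    return {
        "total": len(urls),
        "with_parameters": [u for u in urls if "?" in u],
    }

# Hypothetical sitemap with one parameter URL that should not be there.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://resort.example/rooms</loc></url>
  <url><loc>https://resort.example/packages/spa-weekend</loc></url>
  <url><loc>https://resort.example/rooms?sort=price_desc</loc></url>
</urlset>"""

report = audit_sitemap(SAMPLE_SITEMAP)
print(report["total"], "URLs,", len(report["with_parameters"]), "with parameters")
```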

Let’s read your logs

Crawl budget and indexation work is unglamorous, high-leverage, and exactly the kind of technical SEO that gets ignored on big resort sites until traffic mysteriously plateaus. If your property has thousands of URLs and you are not sure how many of them Google is actually wasting time on, that is a one-afternoon diagnosis with a real payoff.

We do this for independent and boutique resort groups all day — see our hotel SEO service for how the technical audit fits into the bigger direct-booking picture, check pricing to see what fits your group, or just book a call and we will tell you straight whether crawl budget is your problem or a distraction from a bigger one.

FAQ

Quick answers

Does crawl budget actually matter for a small boutique hotel?

Usually not. If your site is under a few thousand URLs, Google can crawl all of it easily and crawl budget is a non-issue. It becomes real once you have a multi-property resort, a big events calendar, or a booking widget that spawns thousands of dated and filtered URLs.

What is the fastest way to see crawl problems on a resort site?

Open Google Search Console and check the Pages report and the Crawl Stats report (under Settings). If you see tens of thousands of URLs under “Discovered, currently not indexed” or “Crawled, currently not indexed,” or Googlebot is hammering parameter URLs, you have a crawl and indexation problem worth fixing.

Should I block faceted or parameter URLs with robots.txt or noindex?

It depends on the goal. Use a canonical when you want variant URLs crawled and their signals consolidated into the clean page. Use noindex when you want a page crawled and its links followed but kept out of the index. Use robots.txt disallow when the URLs are pure crawl waste you never want fetched, like internal search and infinite calendar links. Do not noindex a page you also block in robots.txt, because Google cannot read the noindex if it cannot crawl the page.

How many sitemaps should a multi-property hotel group have?

Split by property and by page type so each sitemap stays under 50,000 URLs and you can read indexation per segment. A common setup is one sitemap index that references child sitemaps for each property, plus shared sitemaps for blog content and landing pages.
