We need to improve the SEO of a site that has millions of pages. I want to give Google quality information so it can read each page cleanly and rank it, without getting lost. The hard part: the site is multi-language, we have millions of products, and sometimes a page contains almost nothing beyond the product name, the brand, and a short extract from the manual. What should we actually serve to Google?
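With millions of pages, your real enemy is how Google treats thin, mass-produced content. If the pages look like they were created mainly to rank (what Google's spam policies call scaled content abuse), Google may demote them, and a large mass of low-value pages can pull down how the whole site ranks. So make every page clearly different, and give Google a clear structure.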
- One clear hierarchy. A main page, a page per brand linking to its products, a page per product. One page per product, not one per manual, or you create near-duplicate pages.
- Remove pages with little unique content. Thin or duplicated pages can drag down the ranking of the whole domain, so delete them, merge them into a fuller page, or mark them noindex.
- Structured data in the head. Add JSON-LD using the schema.org Product type so Google can read the product details without guessing.
- Translate the whole page, properly. Each language needs a real, fully translated page, declared as a translation with hreflang annotations, not a half-translated copy.
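The usual way to declare those translations is with hreflang annotations. Below is a minimal sketch that assumes a hypothetical URL scheme (`https://example.com/{lang}/p/{slug}`) and a hypothetical helper, `hreflang_links`. It builds the `<link rel="alternate">` tags that every language version would put in its `<head>`. Each version lists every language, itself included, and `x-default` names the fallback page:

```python
from html import escape

# Hypothetical URL scheme: https://example.com/{lang}/p/{slug}
BASE_URL = "https://example.com"


def hreflang_links(slug: str, languages: list[str], default_lang: str) -> list[str]:
    """Return <link rel="alternate"> tags for every translated version of a page.

    Every language version should emit this same full set (including a link
    to itself) so the annotations are reciprocal.
    """
    links = []
    for lang in languages:
        url = f"{BASE_URL}/{lang}/p/{slug}"
        links.append(
            f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        )
    # x-default is the version shown when no listed language matches the user.
    default_url = f"{BASE_URL}/{default_lang}/p/{slug}"
    links.append(
        f'<link rel="alternate" hreflang="x-default" href="{escape(default_url)}">'
    )
    return links


if __name__ == "__main__":
    for tag in hreflang_links("acme-drill-x200", ["en", "fr", "de"], "en"):
        print(tag)
```

With three languages this prints four tags: one per language, plus `x-default`. Google also accepts the same annotations inside the sitemap, so one of the two places is enough.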
Structure first, content second: make the page extremely easy for Google to interpret.
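To find candidates for the second point (pages with little unique content), one rough approach is to count the words a page has beyond the shared template, then compare pages by overlapping word runs ("shingles"). This sketch rests on loud assumptions. The thresholds are invented, not values Google publishes. The pairwise comparison is also quadratic, so at millions of pages you would run it per brand or switch to MinHash with locality-sensitive hashing:

```python
import re

# Both thresholds are illustrative assumptions, not values Google publishes.
MIN_UNIQUE_WORDS = 80   # fewer words than this beyond the template counts as "thin"
MAX_SIMILARITY = 0.9    # shingle overlap at or above this counts as "near-duplicate"


def words(text: str) -> list[str]:
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Overlapping runs of `size` words, used to compare pages."""
    w = words(text)
    return {tuple(w[i:i + size]) for i in range(max(len(w) - size + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two pages have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def flag_thin_pages(pages: dict[str, str], template: str) -> dict[str, str]:
    """Return {page_id: reason} for pages that look thin or near-duplicate.

    `template` is the text every page shares (navigation, footer); words that
    also appear in it are not counted as unique content.
    """
    template_words = set(words(template))
    flagged: dict[str, str] = {}
    kept: list[tuple[str, set]] = []
    for page_id, text in pages.items():
        unique = [w for w in words(text) if w not in template_words]
        if len(unique) < MIN_UNIQUE_WORDS:
            flagged[page_id] = "thin"
            continue
        page_shingles = shingles(" ".join(unique))
        for other_id, other_shingles in kept:
            if jaccard(page_shingles, other_shingles) >= MAX_SIMILARITY:
                flagged[page_id] = f"near-duplicate of {other_id}"
                break
        else:  # no near-duplicate found: keep it as a reference for later pages
            kept.append((page_id, page_shingles))
    return flagged
```

The output is a review list, not an automatic delete list. Flagged pages are candidates to merge, enrich, or noindex.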
I've done most of that. Here is what is live now:
- Full translation, with the structured information high in the HTML, in a layout Google reads easily.
- JSON-LD in the head, adapted to the data I actually have for each product.
- Product and manual images on the page.
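Adapting the JSON-LD to the data you actually have mostly means omitting properties rather than filling them with placeholders. Below is a minimal sketch using the schema.org `Product` and `Brand` types. The function name and its parameters are hypothetical. Going by Google's product snippet documentation, a rich result also needs `offers`, `review`, or `aggregateRating`, so a page with only a name and a brand can be understood without getting one:

```python
import json
from typing import Optional


def product_jsonld(
    name: str,
    brand: Optional[str] = None,
    description: Optional[str] = None,
    image_urls: Optional[list[str]] = None,
    sku: Optional[str] = None,
) -> str:
    """Build a schema.org Product <script> tag, emitting only fields we actually know."""
    data = {"@context": "https://schema.org", "@type": "Product", "name": name}
    if brand:
        data["brand"] = {"@type": "Brand", "name": brand}
    if description:
        data["description"] = description
    if image_urls:
        data["image"] = image_urls
    if sku:
        data["sku"] = sku
    # ensure_ascii=False keeps translated text readable; escaping "</" stops the
    # JSON from closing the surrounding <script> element early.
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    print(product_jsonld("X200 Drill", brand="Acme"))
```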
I also added AI-generated descriptions: when someone visits a page and I'm confident enough about the product data, I generate a richer description in that language. It's lazy, to avoid wasting tokens: the text is generated on the first visit in a given language and saved for every visit after that.
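The generate-once behavior described here amounts to a cache keyed by product and language, with the confidence check applied before any generation. This sketch makes several assumptions. `generate` stands in for whatever LLM call you use. The confidence score and the 0.8 threshold are hypothetical. An in-memory dict replaces the database you would really use:

```python
import threading
from typing import Callable, Optional

MIN_CONFIDENCE = 0.8  # hypothetical threshold on a hypothetical 0-1 confidence score


class DescriptionCache:
    """Generate a description at most once per (product, language), then reuse it."""

    def __init__(self, generate: Callable[[str, str], str]):
        self._generate = generate  # e.g. a call to your LLM provider
        self._store: dict[tuple[str, str], str] = {}
        self._lock = threading.Lock()

    def get(self, product_id: str, lang: str, confidence: float) -> Optional[str]:
        key = (product_id, lang)
        with self._lock:
            if key in self._store:
                return self._store[key]
            if confidence < MIN_CONFIDENCE:
                return None  # too uncertain: serve the page without a description
            # Generating while holding the lock guarantees one generation per key,
            # but it also serializes all generations; acceptable for a sketch only.
            text = self._generate(product_id, lang)
            self._store[key] = text
            return text
```

In production you would typically replace the single lock with a per-key lock or a database "in progress" flag. That way two languages, or two products, can be generated in parallel.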
Caching it is smart, but be careful about what triggers the generation. Googlebot can fetch a huge number of your pages in a short time. If any visit can start a generation, the first crawl alone will trigger it across millions of pages, in every language, and your token bill will explode.
- Don't generate for bots. Generate only when a real human opens the product page, then serve the saved version to everyone after that, crawler included.
- Don't generate when the data is too uncertain. No description at all is better than a generic one that adds nothing.
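Both rules fit in one request-time check: look up the saved text first, and only fall through to generation for a likely human with confident data. This is a minimal sketch. The user-agent pattern, the threshold, and the function names are assumptions. User agents are easy to fake, and Google documents verifying its crawlers by reverse DNS lookup rather than trusting the string alone:

```python
import re
from typing import Callable, Optional

# Hypothetical, deliberately incomplete pattern; matches "Googlebot", "bingbot", etc.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp|bingpreview", re.IGNORECASE)
MIN_CONFIDENCE = 0.8  # assumed threshold


def is_probable_bot(user_agent: str) -> bool:
    """Treat an empty user agent, or one matching the pattern, as a bot."""
    return not user_agent or bool(BOT_PATTERN.search(user_agent))


def description_for_request(
    user_agent: str,
    product_id: str,
    lang: str,
    confidence: float,
    store: dict[tuple[str, str], str],
    generate: Callable[[str, str], str],
) -> Optional[str]:
    """Serve a saved description to anyone; create one only for a likely human."""
    key = (product_id, lang)
    if key in store:
        return store[key]  # bots and humans get exactly the same saved text
    if is_probable_bot(user_agent) or confidence < MIN_CONFIDENCE:
        return None  # no generation for crawlers or for uncertain data
    store[key] = generate(product_id, lang)
    return store[key]
```

The crawler only ever reads the saved text, so once a description exists, humans and Googlebot see the same page.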
The other thing to solve is your sitemap. A single sitemap file is capped at 50,000 URLs (and 50 MB uncompressed), so with this many URLs you have to split it, and splitting it cleanly takes some thought. Let me look into the right structure.
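Here is a sketch of the splitting itself: chunk the URL list into files of at most 50,000 entries, then list those files in a sitemap index. The filenames and the base URL are hypothetical. The 50,000 cap comes from the sitemaps.org protocol, which also limits an index to 50,000 child sitemaps:

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # protocol limit; files must also stay under 50 MB uncompressed
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into consecutive slices of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def urlset_xml(urls: list[str]) -> str:
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'


def sitemap_index_xml(sitemap_urls: list[str]) -> str:
    entries = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls)
    return (
        f'<?xml version="1.0" encoding="UTF-8"?>'
        f'<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'
    )


def build_sitemaps(urls: list[str], base: str) -> dict[str, str]:
    """Return {filename: xml} for the child sitemaps plus sitemap-index.xml."""
    files = {}
    for n, part in enumerate(chunk(urls), start=1):
        files[f"sitemap-{n}.xml"] = urlset_xml(part)
    # The index points at every child file by its absolute URL.
    files["sitemap-index.xml"] = sitemap_index_xml([f"{base}/{name}" for name in files])
    return files
```

For 120,000 URLs this produces three child files (50,000, 50,000, and 20,000 entries) and an index that lists the three.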
I solved the sitemap myself. I built a sitemap index that points to child sitemaps split by the first letter of the brand, with products organized by brand and then by product. So one index references many letter-based sitemaps. And I put the translation alternates inside the sitemap itself, so I don't need a separate sitemap per language. It simplified everything, and Google parses it and keeps indexing at a steady pace.
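The layout described here can be sketched as follows. The sketch assumes a hypothetical URL scheme, a fixed language list, and brand and slug values that are already URL-safe. Each language version gets its own `<url>` entry, which lists every version (itself included) as an `xhtml:link` alternate. A letter bucket is split into numbered files once it passes 50,000 URLs, since a popular initial can easily exceed the per-file cap:

```python
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr

MAX_URLS = 50_000
LANGS = ["en", "fr", "de"]  # hypothetical set of site languages


def product_url(lang: str, brand: str, slug: str) -> str:
    # Hypothetical URL scheme; assumes brand and slug are already URL-safe.
    return f"https://example.com/{lang}/{brand}/{slug}"


def bucket_key(brand: str) -> str:
    """Brand initial; digits, symbols and non-ASCII initials share the "0" bucket."""
    first = brand[:1].lower()
    return first if first.isascii() and first.isalpha() else "0"


def url_entries(brand: str, slug: str) -> list[str]:
    """One <url> per language; each lists every language version, itself included."""
    alternates = "".join(
        f'<xhtml:link rel="alternate" hreflang={quoteattr(alt)} '
        f"href={quoteattr(product_url(alt, brand, slug))}/>"
        for alt in LANGS
    )
    return [
        f"<url><loc>{escape(product_url(lang, brand, slug))}</loc>{alternates}</url>"
        for lang in LANGS
    ]


def letter_sitemaps(products: list[tuple[str, str]]) -> dict[str, str]:
    """Group (brand, slug) pairs by brand initial; split a letter past MAX_URLS."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for brand, slug in sorted(products):  # tuples sort by brand, then by product
        buckets[bucket_key(brand)].extend(url_entries(brand, slug))
    files = {}
    for key, entries in sorted(buckets.items()):
        for n, start in enumerate(range(0, len(entries), MAX_URLS), start=1):
            body = "".join(entries[start:start + MAX_URLS])
            files[f"sitemap-{key}-{n}.xml"] = (
                '<?xml version="1.0" encoding="UTF-8"?>'
                '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
                'xmlns:xhtml="http://www.w3.org/1999/xhtml">'
                f"{body}</urlset>"
            )
    return files
```

Each product now produces one entry per language, and every entry carries all the alternates. The files therefore also grow quickly in bytes, so the 50 MB uncompressed limit is worth watching alongside the URL count.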
Claude is an AI and can make mistakes. Please triple-check responses.