We need to improve the SEO of every page, and we have a lot of them. I want to feed Google quality information so it scrapes the page cleanly and ranks it, without getting lost. The catch: we're multi-language, we have millions of products, and sometimes there's almost nothing beyond the name, the brand, and a bit from the manual. What should we actually serve Google?
With millions of pages, your real enemy is thin content: Google may conclude you spun up pages just to rank, and demote the lot. So differentiate hard and give it structure.
- One clear hierarchy. A main page, a page per brand linking to its products, a page per product. One page per product, not one per manual, or you create near-duplicates.
- Kill low-differentiation pages. Anything thin or duplicated drags the whole domain down, so prune or merge it.
- Structured data in the head. Add JSON-LD describing the product so Google reads the page without guessing.
- Translate the whole document, properly. Each language needs a real, fully translated page declared as such, not a half-translated copy.
Structure first, content second: make the page trivially easy for Google to interpret.
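As a rough illustration of the structured-data point, here is a minimal Python sketch that builds a schema.org Product JSON-LD tag and only emits the fields that actually exist for a product. The function name `build_product_jsonld` and its parameters are hypothetical, not taken from your codebase; the real field set depends on what your catalog holds.

```python
import json
from typing import Iterable, Optional


def build_product_jsonld(
    name: str,
    brand: Optional[str] = None,
    description: Optional[str] = None,
    image_urls: Optional[Iterable[str]] = None,
) -> str:
    """Return a <script> tag with schema.org Product JSON-LD.

    Hypothetical helper: optional fields are left out entirely when
    the data is missing, instead of being emitted as empty strings.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
    }
    if brand:
        data["brand"] = {"@type": "Brand", "name": brand}
    if description:
        data["description"] = description
    if image_urls:
        data["image"] = list(image_urls)

    payload = json.dumps(data, ensure_ascii=False)
    # Escape "</" so product text cannot close the script tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```

A product with only a name and a brand would then produce a tag with just those two properties, which keeps sparse pages honest about what is known.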
Done most of that. I shipped:
- Full translation, with the structured info high in the HTML, in a layout Google reads easily.
- JSON-LD in the head, adapted to whatever data I actually have for the product.
- Product and manual images on the page.
I also added AI-generated descriptions: on a page visit, if I'm confident enough about the product, I generate a richer text in that language, but lazily, to avoid burning tokens. It's generated on the first visit in a given language and saved for every visit after.
Smart to cache it, but watch the trigger. Googlebot crawls your pages fast, so if a visit generates the description, the crawler will set off generation across millions of cold pages at once, and your token bill spikes on the first crawl.
- Don't generate for bots. Generate only when a real human opens the product page, then serve the saved version to everyone after, crawler included.
- Skip the low-confidence ones. No description beats a generic, low-signal one.
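The two rules above can be sketched in Python roughly as follows. Everything here is an assumption for illustration: the `DescriptionCache` class, the user-agent pattern, the 0.8 confidence threshold, and the in-memory dict standing in for whatever store you actually persist to. User-agent matching is also a crude bot check; treat it as a placeholder for whatever detection you trust.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Crude, hypothetical bot check based on common crawler user-agent tokens.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def is_bot(user_agent: str) -> bool:
    return bool(BOT_PATTERN.search(user_agent or ""))


class DescriptionCache:
    """Lazily generate one description per (product, language).

    `generate` is a placeholder for the real model call; the dict is a
    stand-in for a persistent store.
    """

    def __init__(
        self,
        generate: Callable[[str, str], str],
        min_confidence: float = 0.8,
    ) -> None:
        self._generate = generate
        self._min_confidence = min_confidence
        self._store: Dict[Tuple[str, str], str] = {}

    def get(
        self, product_id: str, lang: str, user_agent: str, confidence: float
    ) -> Optional[str]:
        key = (product_id, lang)
        # Saved text is served to everyone, crawlers included.
        if key in self._store:
            return self._store[key]
        # Never trigger generation for bots or low-confidence products.
        if is_bot(user_agent) or confidence < self._min_confidence:
            return None
        text = self._generate(product_id, lang)
        self._store[key] = text
        return text
```

When `get` returns `None`, the page would render without the generated description rather than with a generic one.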
The other piece is your index. With this many URLs you can't ship a single sitemap, since one sitemap file is capped at 50,000 URLs and 50 MB uncompressed, but splitting it cleanly takes some thought. Let me look into the right structure.
I worked out the sitemap myself. I built a sitemap index that points to child sitemaps split by the first letter of the brand, with products organized by brand then product. So one index references many letter-sitemaps. And I put the translation alternates inside the sitemap itself, so I don't need a separate sitemap per language. It simplified everything, and Google parses it and keeps indexing steadily.
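A minimal Python sketch of that layout, using only the standard library. The function names, the example.com URLs, and the input shape (one dict of language code to URL per product) are assumptions for illustration, not the actual implementation. Each URL entry lists every language version, itself included, which is how the sitemap form of hreflang annotations is normally written.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize sitemap tags unprefixed and alternates as xhtml:link.
ET.register_namespace("", SM_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap_index(child_urls: Iterable[str]) -> str:
    """Index file pointing at the per-letter child sitemaps."""
    root = ET.Element(f"{{{SM_NS}}}sitemapindex")
    for url in child_urls:
        sitemap = ET.SubElement(root, f"{{{SM_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SM_NS}}}loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_child_sitemap(products: List[Dict[str, str]]) -> str:
    """Child sitemap; each product is a dict of language code -> URL."""
    root = ET.Element(f"{{{SM_NS}}}urlset")
    for translations in products:
        for url in translations.values():
            entry = ET.SubElement(root, f"{{{SM_NS}}}url")
            ET.SubElement(entry, f"{{{SM_NS}}}loc").text = url
            # Every entry lists all language versions, itself included.
            for alt_lang, alt_url in translations.items():
                ET.SubElement(
                    entry,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": alt_lang, "href": alt_url},
                )
    return ET.tostring(root, encoding="unicode")
```

For example, `build_sitemap_index(["https://example.com/sitemaps/a.xml"])` would give the index document, and `build_child_sitemap([{"en": "https://example.com/en/acme/x100", "fr": "https://example.com/fr/acme/x100"}])` would give one child file with two URL entries, each carrying both alternates.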