Territoria tracks elected officials across many elections. Every time there's a new vote, for most people in the database I need to update their mandate, merge duplicates, match them back to who they were before, and handle name changes: sometimes a completely new name, sometimes a new double-barrelled surname when a woman marries. The official dataset only comes out a month and a half after the election, so in the meantime my clients need to update the database themselves from whatever they have: Facebook posts, random PDFs, WhatsApp messages. How would you approach that?

One thing decides the architecture: are you allowed to run this through AI APIs at all, given how sensitive political contact data is?

Yes, but privacy first. It has to run on European models, ideally Mistral.

Good, Mistral keeps it in the EU and off the big US providers. Build it as a pipeline with a human gate at the end, never a straight write to the live database:

  • Extract. Feed each source (Facebook post, PDF, WhatsApp export) to Mistral and pull a structured candidate record: name, role, mandate, dates, constituency. Messy in, normalized out.
  • Match. Fuzzy-match each candidate against existing people on name plus constituency plus previous mandate. Flag likely duplicates and name changes (the married-name and double-surname cases especially), and score the confidence of every match.
  • Validate. Nothing commits automatically. An intermediate review page proposes each change ("merge A into B", "update mandate", "rename"), shows the source snippet it came from, and a human confirms or corrects before it writes.

The extraction does the grunt work, the matching surfaces the risky calls, and the validation page keeps a person accountable for sensitive data, exactly where you don't want an AI deciding alone.

That's the right shape. The validation page is the part I was missing: it lets the AI do the heavy lifting while my clients stay in control of what actually changes.

Claude is an AI and can make does make mistakes. Please triple-check responses.