HIPE-2026 is a public AI competition (a “shared task”) run as part of CLEF. Teams build systems that read historical texts and answer a surprisingly tricky question:
“Which person is connected to which place — and when?”
Why this matters
Libraries and archives have millions of historical documents (newspapers, books, letters). Many are digitized using OCR, which often creates messy text (typos, broken words, old spelling). These texts are also multilingual (French, German, English, Luxembourgish) and come from different centuries.
Historians and researchers would love to automatically build:
- timelines of where someone lived or traveled,
- biographies,
- “who was where when?” maps,
- and knowledge graphs (databases of people, places, and facts).
But you can’t solve this just by checking if a person name and a place name appear in the same article. The text might mention a place for context without saying the person actually went there.
What HIPE-2026 asks AI systems to do
For each document, the system looks at pairs: (Person, Place).
Then it must decide two things:
1) at = “Has the person ever been at this place (before the article’s date)?”
This has three labels:
- true: the text clearly supports it
- probable: not directly stated, but strongly implied
- false: no evidence (or evidence against it)
2) isAt = “Is the person at this place around the publication time?”
This is binary:
- +: yes, around that time
- –: no evidence for that
So at covers any time up to the article’s date, while isAt covers only the period around the document’s publication.
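To make the two labels concrete, here is a minimal sketch of how one labeled (Person, Place) pair could be represented in Python. The class and field names (At, IsAt, PairLabel, make_label) are assumptions made for illustration, not the official HIPE-2026 data format, and the person and place are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class At(Enum):
    """Three-way label: has the person ever been at the place?"""
    TRUE = "true"
    PROBABLE = "probable"
    FALSE = "false"


class IsAt(Enum):
    """Binary label: is the person there around publication time?"""
    YES = "+"
    NO = "-"


@dataclass(frozen=True)
class PairLabel:
    # Hypothetical record layout, not the official schema.
    person: str
    place: str
    at: At
    is_at: IsAt


def make_label(person: str, place: str, at: str, is_at: str) -> PairLabel:
    """Build a PairLabel from raw strings, rejecting unknown label values."""
    # Accept an en dash as well as an ASCII hyphen for the negative label.
    normalized = is_at.strip().replace("\u2013", "-")
    # Enum lookup raises ValueError for anything outside the label set.
    return PairLabel(person, place, At(at.strip().lower()), IsAt(normalized))


example = make_label("Person A", "Place B", "probable", "\u2013")
print(example.at.value, example.is_at.value)  # probable -
```

Keeping the two labels as separate enums makes it hard to mix them up: at has three possible values, isAt only two.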
A real example (in plain words)
If an article says:
“Colonel X, commanding officer of Myrtle Beach Air Force Base…”
That strongly connects the colonel to the base — so at = true and likely isAt = +.
But if the article only mentions “Myrtle Beach” as a city name, that might suggest a connection but not physical presence. So it might become at = probable and isAt = –.
The key idea: the system must understand context, time, and indirect clues.
What makes it hard
- Old OCR text is noisy (misspellings, broken formatting).
- Multiple languages and old writing styles.
- The number of person-place pairs can explode (many people × many places in one document).
- “Probable” requires reasonable inference, but not wild guessing.
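The pair explosion is easy to see in a few lines of Python. This is only a sketch: it assumes the person and place mentions have already been extracted, and the function name candidate_pairs and the sample names are made up for illustration.

```python
from itertools import product


def candidate_pairs(persons, places):
    """Return every (person, place) combination for one document."""
    # dict.fromkeys removes repeated mentions while keeping first-seen order.
    unique_persons = list(dict.fromkeys(persons))
    unique_places = list(dict.fromkeys(places))
    return list(product(unique_persons, unique_places))


persons = ["Person A", "Person B", "Person C"]
places = ["Place X", "Place Y", "Place X"]  # "Place X" is mentioned twice

pairs = candidate_pairs(persons, places)
print(len(pairs))  # 3 persons x 2 unique places = 6
```

The count grows as the product of the two lists, so a document with 20 people and 15 places already yields 300 pairs to classify.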
How HIPE-2026 evaluates systems (3 angles)
HIPE-2026 isn’t only about being “most accurate.” It also tests whether systems are practical.
- Accuracy: How often the labels are correct, balanced across labels so that rare labels count as much as frequent ones.
- Accuracy + Efficiency: Rewards systems that are accurate without being too expensive or heavy to run.
- Generalization: A surprise test set checks if the model still works on a different kind of text (e.g., older French literary texts).
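One common way to score labels in a balanced way is macro-averaged recall: compute recall separately for each label, then average. The sketch below assumes that reading; the official HIPE-2026 scorer may define its metric differently, and the function name macro_recall and the toy data are illustrative only.

```python
from collections import defaultdict


def macro_recall(gold, predicted):
    """Average of per-label recall, so every gold label weighs the same."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    total = defaultdict(int)    # gold items seen per label
    correct = defaultdict(int)  # correctly predicted items per label
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy data: "false" dominates, as it plausibly would with many unrelated pairs.
gold = ["false"] * 8 + ["true", "probable"]
always_false = ["false"] * 10

plain_accuracy = sum(g == p for g, p in zip(gold, always_false)) / len(gold)
print(plain_accuracy)                              # 0.8
print(round(macro_recall(gold, always_false), 3))  # 0.333
```

A system that always answers false looks good on plain accuracy (0.8) but scores only about 0.33 here, because it never finds a single true or probable pair.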
Bottom line
HIPE-2026 pushes AI beyond simple “find names” tasks. It’s about extracting useful human meaning from historical documents: person–place links with time awareness. If successful, it can power better tools for historians, digital humanities research, and building large historical knowledge graphs.
source: https://arxiv.org/pdf/2602.17663