
Restoring ancient texts with AI

15 March 2022
A representation of how a deep neural network could help to restore damaged or incomplete ancient Greek texts. Source: DeepMind

DeepMind, the artificial intelligence (AI) wing of Alphabet, has developed a deep neural network that can restore the missing text of damaged inscriptions, identify their original location and help establish when they were created.

Writing changed humanity and the world forever. About 2,500 years ago, the Greeks began writing on stone, pottery and metal to document everything, giving the first detailed insight into the Mediterranean region.

However, much of this text is incomplete: many inscriptions have been damaged over the centuries or moved from their original locations.

Modern dating techniques such as radiocarbon dating cannot be used on these materials, which makes the inscriptions difficult and very time consuming to interpret.

As such, DeepMind created Ithaca in collaboration with the Department of Humanities of Ca' Foscari University of Venice, the Classics Faculty of the University of Oxford, and the Department of Informatics of the Athens University of Economics and Business.

The goal is to help historians better interpret these inscriptions, giving a richer understanding of ancient history, and to explore further how AI can help historians restore ancient items.

Ithaca is named after the Greek island in Homer’s Odyssey and builds on Pythia, DeepMind’s previous system for textual restoration. Ithaca achieves 62% accuracy in restoring damaged texts and 71% accuracy in identifying their original location, and it can date texts to within 30 years of their true date ranges.

DeepMind is working with Google Cloud and Google Arts & Culture to launch a free interactive version of Ithaca. It has also opened the source code and pre-trained models and developed an interactive Colaboratory notebook.

How it works

DeepMind trained Ithaca on the largest digital dataset of Greek inscriptions, from the Packard Humanities Institute. Language models are commonly trained on words, because the order in which words appear in sentences and the relationships between them provide extra context and meaning.

Since many of the inscriptions are damaged or missing chunks of text, the company trained Ithaca using both words and individual characters as inputs. The neural network evaluates these two inputs in parallel, so it can still draw on individual characters where whole words are damaged.
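The joint word-and-character input can be illustrated with a minimal Python sketch. It is not DeepMind's code: the "-" marker for a lost character, the "<unk>" placeholder for a partly lost word, the function name joint_inputs and the transliterated sample text are all assumptions made for illustration.

```python
from typing import List, Tuple

MISSING = "-"           # assumed marker for a lost character (illustrative)
UNKNOWN_WORD = "<unk>"  # assumed placeholder for a word with lost characters


def joint_inputs(text: str) -> Tuple[List[str], List[str]]:
    """Return parallel character and word sequences of equal length.

    Each character position is paired with the word it belongs to, so a
    model could read both views of the inscription side by side.
    """
    chars = list(text)
    words: List[str] = []
    for word in text.split(" "):
        # A word containing any lost character is treated as unknown.
        token = UNKNOWN_WORD if MISSING in word else word
        # Repeat the word token once per character to keep the views aligned.
        words.extend([token] * len(word))
        words.append(" ")  # the space that separated this word from the next
    words.pop()  # remove the separator added after the final word
    return chars, words


# Hypothetical transliterated fragment with one lost character.
chars, words = joint_inputs("demos a-enaion")
for c, w in zip(chars, words):
    print(repr(c), repr(w))
```

In this sketch the damaged word contributes nothing at the word level, but its surviving characters are still available at the character level, which is the motivation the article gives for using both inputs.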

Additionally, DeepMind created visual aids to make the results interpretable by historians. These include:

  • Restoration hypotheses: Ithaca generates several prediction hypotheses for the text restoration task for historians to choose from using their expertise.
  • Geographical attribution: Ithaca shows its uncertainty by giving historians a probability distribution over all possible predictions — instead of just a single output.
  • Chronological attribution: When dating a text, Ithaca produces a distribution of predicted dates across all decades from 800 BCE to 800 CE.
  • Saliency maps: To convey the results to historians, Ithaca uses a technique commonly employed in computer vision that identifies which input sequences contribute most to a prediction.
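The distribution-style outputs described in the list above can be sketched in a few lines of Python. This is a simplified illustration rather than Ithaca's implementation: the helper names, the use of a softmax over raw scores, the ten-year bin width and the probability-weighted mean as a summary date are all assumptions.

```python
import math


def decade_bins(start=-800, end=800, width=10):
    """Left edges of the decade bins; negative years stand for BCE.

    Simplification: year 0 is treated as a real year, although the
    BCE/CE calendar has none.
    """
    return list(range(start, end, width))


def softmax(scores):
    """Turn arbitrary real-valued scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_year(probs, bins, width=10):
    """Probability-weighted mean of the bin centres (an assumed summary)."""
    return sum(p * (b + width / 2) for p, b in zip(probs, bins))


def top_k(probs, labels, k=3):
    """The k most probable labels, for example candidate regions."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


bins = decade_bins()
scores = [0.0] * len(bins)
scores[bins.index(-430)] = 5.0  # hypothetical model favouring the 430s BCE
probs = softmax(scores)
print(round(expected_year(probs, bins)))
print(top_k(probs, bins, k=3))
```

Showing the whole distribution or the top few candidates, rather than a single answer, lets a historian see how confident the model is and weigh the alternatives.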

The results

Ithaca’s design decisions and visualization aids made it easier for historians to interpret results.

Working alone to restore ancient texts, historians achieved just 25% accuracy. With Ithaca, their performance increased to 72%, surpassing the model’s own performance. This shows the potential for human-machine cooperation in interpreting history, establishing the dating of historical events and contributing to current methodological debates.

Ancient text dating

Historians disagree on the date of a series of important Athenian decrees made at the time of Socrates and Pericles. The decrees have long been thought to have been written before 446/445 BCE, but new evidence suggests a date of 420 BCE.

The Ithaca training data contains the earlier figure of 446/445 BCE. To test the model, the researchers retrained Ithaca on a dataset that did not contain the dated inscriptions and then submitted these held-out texts for analysis. Ithaca’s average predicted date for the decrees is 421 BCE, aligning with the most recent dating analysis. This shows how machine learning can contribute to debates around Greek history, DeepMind said.

The full research can be found in the journal Nature.

To contact the author of this article, email PBrown@globalspec.com

