Artificial intelligence (AI) is transforming the study of ancient texts, from deciphering inscriptions inaccessible for 2,000 years to decoding unknown hieroglyphs.
The examination of ancient texts has long been a meticulous task reserved for specialists in paleography, linguistics, and history. However, AI is revolutionizing this field with its ability to process vast amounts of data and learn complex patterns.
A recent article in Nature examines how certain AI models are leading this revolution, achieving unprecedented advances in the interpretation of ancient texts and becoming influential tools in historical research.
While computational technologies for text analysis are not new, traditional Optical Character Recognition (OCR) faced significant limitations with ancient texts due to irregular handwriting, material wear, and unique linguistic contexts.
The breakthrough came with machine learning, a branch of AI allowing algorithms to learn from data rather than follow predefined rules. This method trains systems with large data volumes, enabling them to identify patterns and make predictions. Yet, even this approach struggled with fragmented texts or languages no longer spoken.
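The shift from predefined rules to learned patterns can be illustrated with a deliberately tiny sketch (invented toy data, not a real paleography model): a nearest-centroid classifier that "learns" which script a fragment belongs to purely from labeled examples.

```python
from collections import Counter

# Toy illustration of learning from data rather than rules: a nearest-centroid
# classifier that infers a fragment's script from labeled training examples.

def char_profile(text):
    """Relative frequency of each character in a text."""
    counts = Counter(text.replace(" ", ""))
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def train(examples):
    """examples: list of (text, label). Returns one averaged profile per label."""
    grouped = {}
    for text, label in examples:
        grouped.setdefault(label, []).append(char_profile(text))
    centroids = {}
    for label, profs in grouped.items():
        keys = set().union(*profs)
        centroids[label] = {k: sum(p.get(k, 0) for p in profs) / len(profs)
                            for k in keys}
    return centroids

def predict(centroids, text):
    """Return the label whose centroid is closest to the fragment's profile."""
    prof = char_profile(text)
    def dist(c):
        keys = set(prof) | set(c)
        return sum((prof.get(k, 0) - c.get(k, 0)) ** 2 for k in keys)
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))
```

Fed a handful of labeled Greek and Latin strings, `predict` assigns an unseen fragment to the script whose character statistics it most resembles; no rule about alphabets is ever written down.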
Deep learning has expanded possibilities significantly. This technique employs artificial neural networks inspired by the human brain, analyzing data with unprecedented complexity. In the realm of ancient texts, neural networks not only recognize letters and words but also learn linguistic and cultural contexts, enhancing accuracy and versatility.
An example is the Pythia model, developed for interpreting ancient Greek inscriptions. Pythia was trained on over 35,000 transcribed Greek inscriptions, allowing it to learn writing patterns and linguistic structures.
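Pythia itself is a deep neural network, but the core idea of restoring a gap from learned writing patterns can be caricatured in a few lines. The sketch below is a drastically simplified, frequency-based stand-in, not Pythia's actual method: a character bigram model fills a lacuna (marked "-") with the letter that most often follows the preceding character in a training corpus (the corpus here is invented).

```python
from collections import Counter, defaultdict

# Simplified stand-in for neural text restoration: a character bigram model
# proposes the most likely letter for each gap ("-") from its left context.

def train_bigrams(corpus):
    """Count, for each character, which characters follow it and how often."""
    following = defaultdict(Counter)
    for text in corpus:
        for a, b in zip(text, text[1:]):
            following[a][b] += 1
    return following

def restore(text, following):
    """Replace each '-' with the most frequent successor of the previous char."""
    out = list(text)
    for i, ch in enumerate(out):
        if ch == "-" and i > 0 and following[out[i - 1]]:
            out[i] = following[out[i - 1]].most_common(1)[0][0]
    return "".join(out)
```

A real restoration model conditions on context both before and after the gap, and on thousands of inscriptions rather than two lines, but the principle of predicting missing text from learned regularities is the same.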
AI applied to historical texts combines several key technologies, resulting in a comprehensive and efficient tool for challenging interpretations.
Advanced optical recognition and computer vision technologies have overcome the limitations of traditional OCR systems. These tools analyze the physical characteristics of texts, such as ink type, brush strokes, or wear marks, which is crucial for interpreting damaged documents or inscriptions on irregular materials like stone or ceramics.
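A first step in such imaging pipelines is separating faint ink from the background. The following is an illustrative sketch on invented data, not a real imaging pipeline: it flags pixels of a grayscale "scan" that are noticeably darker than the image's average brightness.

```python
# Illustrative sketch (invented data): recover faint ink traces from a
# grayscale scan by flagging pixels darker than a fraction of the mean.

def detect_ink(image, sensitivity=0.9):
    """image: 2D list of brightness values (0 = black ink, 255 = blank page).
    Returns a binary mask: True where a pixel is darker than
    sensitivity * mean brightness, i.e. a likely ink trace."""
    flat = [px for row in image for px in row]
    threshold = sensitivity * (sum(flat) / len(flat))
    return [[px < threshold for px in row] for row in image]
```

Production systems use far more sophisticated techniques, such as local adaptive thresholds and learned filters, but the goal is the same: turn barely visible physical traces into a signal a recognition model can read.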
Generative models such as Generative Adversarial Networks (GANs) pit two neural networks against each other: one generates hypotheses (e.g., missing words or letters), while the other evaluates their plausibility. These tools are particularly useful for reconstructing incomplete texts, proposing multiple solutions grounded in historical and linguistic context.
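The generator/critic division of labor can be caricatured without any neural networks at all. In this deliberately simplified sketch (invented corpus and gap, and random proposals in place of a trained generator), a "generator" proposes candidate words for a lacuna and a "critic" scores each by its frequency in a reference corpus:

```python
import random
from collections import Counter

# Caricature of a GAN's two roles: a proposer of candidate words for a gap,
# and a critic that scores each candidate's plausibility against a corpus.

def make_critic(corpus):
    """Return a scoring function: a word's relative frequency in the corpus."""
    counts = Counter(corpus.split())
    total = sum(counts.values())
    return lambda word: counts[word] / total

def generate_candidates(length, alphabet, n, rng):
    """Randomly propose n candidate words of the given length."""
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(n)]

def fill_gap(length, alphabet, critic, n=5000, seed=0):
    """Keep the candidate the critic rates as most plausible."""
    rng = random.Random(seed)
    return max(generate_candidates(length, alphabet, n, rng), key=critic)
```

In an actual GAN both roles are neural networks trained jointly, with the generator improving precisely because the critic keeps rejecting its weak proposals; here the feedback loop is collapsed into a single generate-and-score pass for clarity.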
Natural Language Processing (NLP) models are also key to analyzing ancient texts. They not only identify words but also analyze the meaning of phrases and their cultural context, aiding the translation of extinct or poorly documented languages such as Phoenician and Linear A.
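A core idea behind NLP for low-resource languages is distributional: words that appear in similar contexts likely play similar roles, even when neither word can yet be translated. A minimal sketch on an invented corpus:

```python
from collections import defaultdict

# Distributional sketch: record each word's neighbouring words, then compare
# two words by the overlap of their observed contexts.

def context_sets(corpus):
    """Map each word to the set of words seen immediately next to it."""
    contexts = defaultdict(set)
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            if i > 0:
                contexts[w].add(words[i - 1])
            if i < len(words) - 1:
                contexts[w].add(words[i + 1])
    return contexts

def similarity(contexts, a, b):
    """Jaccard overlap of two words' contexts, from 0 to 1."""
    ca, cb = contexts[a], contexts[b]
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0
```

Modern NLP models replace these raw context sets with dense learned embeddings, but the underlying signal, shared distributional behavior, is what lets them group unknown symbols by grammatical role without any prior translation.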
The combination of these technologies profoundly impacts archaeology and history, particularly in restoring damaged manuscripts. Documents previously unreadable due to deterioration, like a burned Roman manuscript inaccessible for 2,000 years, can now be analyzed by AI, which detects minimal ink traces and proposes complete reconstructions.
Additionally, AI enables the decoding of dead languages directly from unknown texts, identifying grammatical and syntactical patterns without prior translations.
For fragmented inscriptions found at archaeological sites, algorithms can reconstruct missing words with unprecedented precision, revealing unexpected historical connections between seemingly isolated cultures. By analyzing large data sets, algorithms have identified surprising similarities among texts from different civilizations, suggesting greater interconnectivity than previously thought.
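The corpus-comparison idea behind such findings can be sketched simply (with invented texts, not real epigraphic data): represent each text as a word-frequency vector and measure cosine similarity, so an unexpectedly high score between texts from different sites flags a possible connection for a historian to examine.

```python
import math
from collections import Counter

# Sketch of cross-corpus comparison: cosine similarity between the
# word-frequency vectors of two texts.

def cosine(text_a, text_b):
    """1.0 for identical word distributions, 0.0 for no shared vocabulary."""
    counts_a, counts_b = Counter(text_a.split()), Counter(text_b.split())
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm = (math.sqrt(sum(v * v for v in counts_a.values()))
            * math.sqrt(sum(v * v for v in counts_b.values())))
    return dot / norm if norm else 0.0
```

At scale, such similarity scores computed across thousands of digitized texts are what let algorithms surface resemblances between corpora no single specialist could read in full; the score itself proves nothing, but it tells researchers where to look.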