Unmasking the Alchemists: Using Bayesian Inference to Solve 17th-Century Authorship Disputes
Introduction
The early 17th century was a period of intellectual upheaval, defined in part by the “Rosicrucian Manifestos”—a series of anonymous pamphlets that ignited a firestorm across Europe. While these texts claimed to represent a secret brotherhood, historians have spent four centuries debating whether they were the work of a single mastermind or a collective of theologians, occultists, and scientists. Traditional stylistic analysis often relies on subjective interpretations, which are easily swayed by personal bias. However, by leveraging Bayesian inference, we can move beyond gut feeling and quantify the probability of distinct authorship with statistical rigor.
Bayesian authorship attribution is a powerful tool in digital humanities. It allows us to calculate the probability that a specific author wrote an anonymous work, given a set of known linguistic “fingerprints.” By updating our beliefs about authorship as we analyze new stylistic evidence, we can move from speculative history to a data-driven understanding of these elusive 17th-century texts.
Key Concepts
At its core, Bayesian inference is about updating the probability of a hypothesis as more evidence becomes available. In the context of authorship attribution, we are interested in two main components:
- The Prior Probability: This is our initial belief about who wrote the text before we look at the linguistic data. If we know that Johann Valentin Andreae lived in the same city where a pamphlet was printed, our prior probability for him is higher than for a scholar living in London.
- The Likelihood: This is the probability of observing the text's stylistic markers (such as function word frequency or punctuation patterns) if a given author wrote it. It is computed separately for each candidate.
The Posterior Probability is the final result: the updated belief about the author’s identity after accounting for both the prior knowledge and the observed linguistic evidence. By Bayes' rule it is proportional to the prior multiplied by the likelihood, rescaled so that the probabilities across all candidates sum to one. Unlike “frequentist” statistics, which interpret probability as long-run frequency and have no formal place for prior beliefs, Bayesian inference allows us to integrate historical context, an essential feature when dealing with the niche, highly curated language of 17th-century theological pamphlets.
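As a minimal sketch of that update, the snippet below multiplies each candidate's prior by a likelihood and renormalizes. The function name `posterior` and every number in it are illustrative assumptions, not measurements from any real corpus.

```python
def posterior(priors, likelihoods):
    """Combine priors and likelihoods into normalized posterior probabilities."""
    unnormalized = {author: priors[author] * likelihoods[author] for author in priors}
    total = sum(unnormalized.values())
    return {author: value / total for author, value in unnormalized.items()}

# Made-up values for illustration only.
priors = {"Andreae": 0.5, "Fludd": 0.3, "Maier": 0.2}
likelihoods = {"Andreae": 0.04, "Fludd": 0.01, "Maier": 0.02}

post = posterior(priors, likelihoods)
for author, p in post.items():
    print(f"{author}: {p:.3f}")
```

A candidate with a modest prior can still end up ahead if the evidence favors them strongly enough, which is the point of keeping the two components separate.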
Step-by-Step Guide
- Curate a Stylistic Corpus: Gather known works from suspected authors (e.g., Andreae, Fludd, Maier) and a control group of contemporary writers. Ensure your corpus is cleaned of modern annotations and normalized for spelling variations common in the early 1600s.
- Extract Linguistic Features: Focus on “stylometry markers”—elements writers use unconsciously. The most effective include the frequency of function words (e.g., “the,” “and,” “but,” “for”), common punctuation patterns, and the use of specific Latinate versus Germanic sentence structures.
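- Calculate Feature Distributions: Fit a probabilistic model (such as a Multinomial Naive Bayes classifier) to the frequency of these markers for each known author. This creates a “signature” for each writer. Distance measures such as Burrows’ Delta are a common stylometric baseline, but they rank candidates by similarity and do not by themselves yield the likelihoods that a Bayesian update needs.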
- Apply the Bayesian Model: Compare the anonymous Rosicrucian text against these signatures. For each feature, calculate the likelihood that it originated from Author A vs. Author B. Use these likelihoods to update your prior probability.
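- Validation and Sensitivity Analysis: Test your model by running it on texts from the same period whose authors are known, with the author labels hidden from the model. If it correctly identifies the author of a known pamphlet, you gain confidence in its assessment of the actual Rosicrucian mystery. Then vary the priors and the feature set to check whether the conclusion holds under different reasonable choices.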
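The feature-extraction, signature, and update steps above can be sketched as a small multinomial Naive Bayes model in plain Python. This is a sketch under stated assumptions, not a finished pipeline: the `FUNCTION_WORDS` list, the toy training strings, and the author labels "A" and "B" are all hypothetical placeholders for a properly curated corpus.

```python
import math
from collections import Counter

# Hypothetical marker list; a real study would use a much longer one.
FUNCTION_WORDS = ["und", "der", "die", "das", "zu", "in"]

def count_markers(text):
    """Count only the function words we treat as stylistic markers."""
    return Counter(t for t in text.lower().split() if t in FUNCTION_WORDS)

def train(corpus_by_author, alpha=1.0):
    """Estimate add-alpha smoothed log P(word | author) for each marker."""
    model = {}
    for author, text in corpus_by_author.items():
        counts = count_markers(text)
        total = sum(counts.values()) + alpha * len(FUNCTION_WORDS)
        model[author] = {w: math.log((counts[w] + alpha) / total)
                         for w in FUNCTION_WORDS}
    return model

def attribute(model, priors, text):
    """Return posterior P(author | text), computed in log space."""
    counts = count_markers(text)
    log_post = {a: math.log(priors[a])
                   + sum(n * model[a][w] for w, n in counts.items())
                for a in model}
    # Log-sum-exp normalization avoids underflow on long texts.
    top = max(log_post.values())
    log_z = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    return {a: math.exp(v - log_z) for a, v in log_post.items()}

# Toy "known works"; placeholders, not real quotations.
corpus = {"A": "und und und der die", "B": "der der die das zu in"}
model = train(corpus)
result = attribute(model, {"A": 0.5, "B": 0.5}, "und der und")
print(result)
```

Working in log space matters in practice: multiplying thousands of small probabilities directly would underflow to zero. The smoothing constant `alpha` keeps a marker that an author never used in the training sample from forcing that author's posterior to zero.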
Examples and Case Studies
Consider the Fama Fraternitatis, the first of the Rosicrucian manifestos. Historians have long debated if it was written solely by Johann Valentin Andreae. By applying Bayesian inference, researchers can examine the use of specific high-frequency “stop words”—words that are grammatically essential but semantically neutral.
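If the model finds that the frequency of the word “und” (and) in the Fama matches the idiosyncratic frequency found in Andreae’s undisputed German writings, but differs from the usage patterns of the other candidates, the posterior probability shifts toward Andreae. The comparison is only meaningful between texts in the same language: the Fama was printed in German, so a candidate such as Robert Fludd, who published mainly in Latin, cannot be tested on German function words at all. A single word rarely settles the question. When the analysis is run across some 50 distinct function words, the individual shifts accumulate and a clear signal can emerge, although models that treat each word as independent evidence tend to overstate their own certainty.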
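To make the accumulation of evidence concrete, here is a hedged sketch that sums per-word log likelihood ratios for two candidates, labeled A and B. It assumes a simple binomial model for each word and treats the words as independent, which is a simplification. The rates, counts, and the helper name `log_bayes_factor` are hypothetical and not taken from any real text.

```python
import math

def log_bayes_factor(k, n, rate_a, rate_b):
    """Log likelihood ratio (A vs. B) for a word seen k times in n tokens."""
    return (k * math.log(rate_a / rate_b)
            + (n - k) * math.log((1 - rate_a) / (1 - rate_b)))

# Hypothetical per-token usage rates: word -> (rate for A, rate for B).
rates = {
    "und": (0.040, 0.030),
    "der": (0.035, 0.036),
    "zu": (0.012, 0.015),
}
# Hypothetical counts in a 1,000-token disputed text.
observed = {"und": 41, "der": 35, "zu": 12}
n_tokens = 1000

log_odds = math.log(0.5 / 0.5)  # equal prior odds for A and B
for word, k in observed.items():
    rate_a, rate_b = rates[word]
    contribution = log_bayes_factor(k, n_tokens, rate_a, rate_b)
    log_odds += contribution
    print(f"{word}: {contribution:+.2f}")

prob_a = 1 / (1 + math.exp(-log_odds))
print(f"P(A | evidence) = {prob_a:.3f}")
```

Printing each contribution shows which words actually drive the result. A word whose rates barely differ between candidates contributes almost nothing, however often it occurs.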
This method has a well-known precedent. In their 1964 study of the Federalist Papers, Frederick Mosteller and David Wallace used Bayesian analysis of function-word frequencies to attribute the disputed essays to James Madison rather than Alexander Hamilton. By applying the same logic to 17th-century pamphlet culture, we can try to distinguish between a single author mimicking multiple voices and a genuine circle of contributors.
Common Mistakes
- Over-reliance on Content Words: Beginners often count keywords like “alchemy” or “gold.” These are often dictated by the topic, not the author. Focus on function words (the, of, that) which authors use subconsciously.
- Ignoring Spelling Normalization: 17th-century printing was inconsistent. Failing to normalize text—turning “v” into “u” where appropriate or standardizing archaic spellings—will result in high error rates because the model will perceive these as unique words.
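- Small Sample Sizes: Applying Bayesian models to a short paragraph is unreliable, because a handful of function-word occurrences cannot pin down an author's habitual rates. Ensure your anonymous target text is long enough that each marker appears many times; if it is not, expect a posterior that stays close to the prior.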
- The “Closed Set” Assumption: Assuming the author must be one of your suspects is a dangerous trap. Always include a “None of the Above” category in your model to account for the possibility that the true author is someone whose work is lost or not in your database.
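The spelling-normalization point above can be sketched as a small preprocessing function. The `VARIANTS` table here is a tiny hypothetical sample; a real project would need a far larger mapping built from a period lexicon and checked by hand.

```python
import re
import unicodedata

# Hypothetical variant table mapping early-modern spellings to one form.
VARIANTS = {"vnd": "und", "vnnd": "und", "jhr": "ihr", "seyn": "sein"}

def normalize(text):
    """Lowercase, replace the long s, tokenize, and map known variants."""
    text = unicodedata.normalize("NFC", text).lower()
    text = text.replace("ſ", "s")  # long s to modern s
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    return [VARIANTS.get(token, token) for token in tokens]

print(normalize("Vnd ſie ſeyn"))
```

A lookup table is used here instead of a blanket "v" to "u" substitution because the blanket rule would corrupt words in which "v" is correct. Without any normalization, "vnd" and "und" would be counted as two different markers and the author's true rate would be split between them.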
Advanced Tips
To move your analysis to a professional level, consider combining Topic Modeling (Latent Dirichlet Allocation) with your Bayesian model. This allows you to reduce the “subject matter noise.” If two authors write about alchemy, they will use similar content words. Topic modeling groups such words into topic clusters, and you can then exclude the words that load heavily on those topics from your feature set, so that the model focuses on each author’s habits of function-word use and syntax.
Furthermore, use bootstrap resampling, the procedure that underlies Bootstrap Aggregation (Bagging). By running the model 1,000 times, each time on a version of the input text resampled with replacement, you can see how stable the result is. If the attributed author fluctuates wildly between runs, the result is inconclusive. If the attribution remains consistent despite the variation, you have found a robust stylistic signature.
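The resampling check described above might look like the following sketch. It resamples individual tokens with replacement and records how often each candidate wins. The `profiles` dictionary, the token list, and both function names are hypothetical, and the likelihood scorer is deliberately minimal.

```python
import math
import random
from collections import Counter

def attribute(tokens, profiles):
    """Pick the author whose marker-rate profile best explains the tokens."""
    markers = next(iter(profiles.values())).keys()
    counts = Counter(t for t in tokens if t in markers)
    scores = {author: sum(n * math.log(rates[w]) for w, n in counts.items())
              for author, rates in profiles.items()}
    return max(scores, key=scores.get)

def bootstrap_stability(tokens, profiles, runs=1000, seed=0):
    """Share of resampled runs in which each author is the top candidate."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    wins = Counter()
    for _ in range(runs):
        sample = rng.choices(tokens, k=len(tokens))  # sample with replacement
        wins[attribute(sample, profiles)] += 1
    return {author: wins[author] / runs for author in profiles}

# Hypothetical marker rates for two candidates.
profiles = {
    "A": {"und": 0.6, "der": 0.3, "zu": 0.1},
    "B": {"und": 0.3, "der": 0.4, "zu": 0.3},
}
# Toy disputed text: 50 marker tokens plus 50 filler tokens.
tokens = ["und"] * 30 + ["der"] * 15 + ["zu"] * 5 + ["filler"] * 50

shares = bootstrap_stability(tokens, profiles)
print(shares)
```

Resampling single tokens ignores the fact that neighboring words are not independent. Resampling whole sentences or fixed-size blocks is a common alternative and generally gives a more honest picture of the uncertainty.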
Conclusion
The mystery of the Rosicrucian pamphlets remains one of the most intriguing puzzles of the 17th century. By using Bayesian inference, we transition from reading these texts as myths to analyzing them as data. This process does not strip away the romance of the era; rather, it allows us to identify the real human beings—the scholars and dreamers—who penned these revolutionary words. Through careful feature selection, rigorous normalization, and iterative Bayesian updating, we can finally separate the historical signal from the centuries of noise, bringing clarity to an age of secrets.



