I’ve worked with databases for a living for around 20 years, to varying degrees.
I’ve worked with words professionally for roughly the same amount of time (as an author, editor, and publisher).
It’s only natural that I’d be interested in ways of mashing them together.
If you’d like to learn more about text analytics, text mining, unstructured data mining, and several other synonymous terms for turning a big pile of words into meaningful data, here are some good resources.
- Predictive Analytics Today lists the top software and the top free software for text mining.
- KD Nuggets similarly lists free and paid mining tools.
- The MIT Technology Review boils the most popular fiction down to just six plots. There’s an interactive analyzer here.
If you want to sort piles of words into good/bad, happy/sad, calm/mad, and such, that’s where sentiment analysis comes into play.
- Mashape lists some great APIs to plug into your data.
- Liz Rush shares code for calling sentiment APIs (and other cool algorithms).
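To get a feel for what those services are doing under the hood, here is a toy lexicon-based scorer: count the "good" words, subtract the "bad" ones, and label the result. This is only a minimal sketch under stated assumptions. The word lists are hypothetical and tiny, and the commercial APIs above use much richer models than a simple word count.

```python
import re

# Hypothetical, deliberately tiny word lists (an assumption for illustration);
# real sentiment lexicons and the APIs listed above are far more sophisticated.
POSITIVE = {"good", "happy", "calm", "love", "joy", "peace"}
NEGATIVE = {"bad", "sad", "mad", "hate", "fear", "wrath"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> int:
    """Return (# positive words) minus (# negative words)."""
    words = tokenize(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Map the numeric score onto a positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for line in ["I love this calm, happy morning.", "A sad and bad day.", "The cat sat."]:
        print(f"{label(line):8} {sentiment_score(line):+d}  {line}")
```

A word-counting approach like this misses negation ("not happy") and context, which is exactly why dedicated sentiment APIs exist.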
SQL Server and R
Most of my database career has been spent working with Microsoft SQL Server. I’m at the beginning stages of learning R, a language for statistics and data science.
- Microsoft’s Cognitive Services (available through the Azure Data Market) includes a few text analytics APIs.
- R can be used for many text mining purposes.
- MSSQLTips has a step-by-step beginner’s guide for text mining in SQL.
- The SQL Bits conference has this video of Dejan Sarka’s walkthrough of text analysis in SQL Server 2014.
- This SQL Server Data Mining site is chock-full of links and examples.
Mining the Bible
As a case study, I’ve imported dozens of translations of the Bible into SQL Server, where I can look for correlations between them. It’s an interesting text to work with, since all these translations (a) started from the original Hebrew and Greek, (b) are written in English, and (c) have passages uniquely identified through a numbering system (book, chapter, and verse). That makes it possible to line up the same passage across every translation and compare them directly.
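Here is a rough sketch of why that shared numbering matters. It assumes a hypothetical `verses` table keyed by translation, book, chapter, and verse, and uses SQLite (via Python’s built-in `sqlite3`) purely as a stand-in for SQL Server so it runs anywhere. The table and column names are illustrative, not my actual schema, and the rows are just two verses from the public-domain KJV and World English Bible.

```python
import sqlite3

# SQLite stands in for SQL Server; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE verses (
        translation TEXT    NOT NULL,  -- e.g. 'KJV', 'WEB'
        book        TEXT    NOT NULL,
        chapter     INTEGER NOT NULL,
        verse       INTEGER NOT NULL,
        verse_text  TEXT    NOT NULL,
        PRIMARY KEY (translation, book, chapter, verse)
    )
    """
)
rows = [
    ("KJV", "Genesis", 1, 1, "In the beginning God created the heaven and the earth."),
    ("WEB", "Genesis", 1, 1, "In the beginning, God created the heavens and the earth."),
    ("KJV", "John", 11, 35, "Jesus wept."),
    ("WEB", "John", 11, 35, "Jesus wept."),
]
conn.executemany("INSERT INTO verses VALUES (?, ?, ?, ?, ?)", rows)

# Every translation shares the same (book, chapter, verse) key, so a self-join
# lines two translations up passage by passage. Books sort alphabetically here;
# a real schema would probably carry a canonical book number instead.
query = """
    SELECT a.book, a.chapter, a.verse, a.verse_text, b.verse_text,
           a.verse_text = b.verse_text AS identical
    FROM verses AS a
    JOIN verses AS b
      ON a.book = b.book AND a.chapter = b.chapter AND a.verse = b.verse
    WHERE a.translation = ? AND b.translation = ?
    ORDER BY a.book, a.chapter, a.verse
"""
aligned = conn.execute(query, ("KJV", "WEB")).fetchall()

for book, chapter, verse, kjv, web, identical in aligned:
    print(f"{book} {chapter}:{verse}  identical={bool(identical)}")
    print(f"  KJV: {kjv}")
    print(f"  WEB: {web}")
```

Once the passages are aligned like this, any per-verse measure (word counts, sentiment scores, shared vocabulary) can be compared across translations with an ordinary join.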
Here is some great work that other people have already done in that field.
- Chris Harrison has turned all the cross-references into this beautiful rainbow of links.
- Open Bible has run the Bible through sentiment analysis to find the positive / negative flow of the Bible as a whole.
- The Guardian published this graph combining holy books of several religions.