About DeepGrammar

I make a lot of dumb mistakes when I write, and I’ve always dreamed of having a smart computer that could point out the errors that escape my notice. Building such a grammar checker is hard. You can’t just write down the rules of English grammar and check that they are followed like you can when building a compiler for a programming language. Natural languages such as English have some syntactic regularity, but they are squishy, and a grammar checker needs to have some understanding of the content to see that underlying regularity. This means that a computer must understand what you intended to write to know if you have written it correctly.

Grammar checkers can’t read our minds, but they can read a lot of text written by different people to learn patterns of what we intend to write. This learning can’t happen over the words themselves as raw symbols, because then the grammar checker couldn’t generalize to unseen text or learn from related text. For instance, the word “dog” and the word “cat” are completely different symbols, but as far as grammar checking goes, they should be treated pretty much the same.

Fortunately, the relatively new field of deep learning moves beyond symbols to represent words as vectors. Vectors are sequences of numbers, such as [14.2, 17.1, 2.4], and deep learning learns a vector for each word that encodes its meaning. Unlike symbols, which are simply equal or not equal, vectors let us measure the distance between the representation of one word and the representation of another, and this distance provides a powerful mechanism for generalization. For example, deep learning may learn that the sentence “I feel worried” is closer in meaning to “I feel anxious” than to “I feel sleepy.”
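
To make the idea concrete, here is a small sketch of how the distance between word vectors can be measured with cosine similarity. The three-dimensional vectors below are made up purely for illustration; real models learn vectors with hundreds of dimensions from large amounts of text.

```python
import numpy as np

# Toy, made-up 3-dimensional vectors; real systems learn much longer
# vectors from large text corpora.
vectors = {
    "worried": np.array([0.9, 0.1, 0.0]),
    "anxious": np.array([0.8, 0.2, 0.1]),
    "sleepy":  np.array([0.1, 0.9, 0.3]),
}

def cosine_similarity(a, b):
    """Similarity between two word vectors: values near 1.0 mean the
    vectors point in nearly the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["worried"], vectors["anxious"]))  # high
print(cosine_similarity(vectors["worried"], vectors["sleepy"]))   # lower
```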

DeepGrammar is a grammar checker built on top of deep learning. It uses deep learning to learn a model of language and then uses that model to check text for errors in three steps:

  1. Compute the likelihood that someone would have intended to write the text.
  2. Attempt to generate text that is close to the written text but is more likely.
  3. If such text is found, show it to the user as a possible correction.
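
Here is a minimal sketch of that three-step loop in Python. It is not DeepGrammar's actual implementation: score_likelihood stands in for a learned language model, generate_candidates stands in for whatever routine proposes nearby rewrites, and the improvement threshold is an arbitrary illustrative value.

```python
def check_sentence(sentence, score_likelihood, generate_candidates,
                   min_improvement=2.0):
    """Three-step check: score the text, search nearby candidates,
    and return a suggestion only if one is clearly more likely.

    score_likelihood and generate_candidates are placeholders for a
    learned language model and an edit-generation routine.
    """
    original_score = score_likelihood(sentence)

    # Step 2: look at small rewrites of the original text.
    best_candidate, best_score = None, original_score
    for candidate in generate_candidates(sentence):
        score = score_likelihood(candidate)
        if score > best_score:
            best_candidate, best_score = candidate, score

    # Step 3: suggest a correction only when a candidate is
    # sufficiently more likely than what was written.
    if best_candidate is not None and best_score - original_score >= min_improvement:
        return best_candidate
    return None
```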

Let’s see DeepGrammar in action. Consider the sentence “I will tell he the truth.” DeepGrammar calculates that this sentence is unlikely, and it tries to come up with a nearby sentence that is likely. It finds “I will tell him the truth.” Since this sentence is both likely and close to the original, DeepGrammar suggests it to the user as a correction.
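
Using the sketch above, that example might play out as follows. The scoring and candidate functions here are toys with made-up values, standing in for the learned model and the real rewrite search.

```python
# Illustrative only: a toy scorer that happens to prefer "tell him",
# standing in for a learned language model.
def toy_score(text):
    return 10.0 if "tell him" in text else 1.0

def toy_candidates(text):
    # One hand-written edit; a real system explores many nearby rewrites.
    yield text.replace("tell he", "tell him")

print(check_sentence("I will tell he the truth.", toy_score, toy_candidates))
# -> "I will tell him the truth."
```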

Here are some other examples of sentences with the corrections found by DeepGrammar:

You can find a quantitative evaluation of DeepGrammar here.