January 6, 2017
Two days ago, I released
syllables, a Go package that counts the syllables in a string of text.
lynx, the static generator that builds this website, imports the readability package from
BluntSporks. It is used to calculate the “rough” Flesch-Kincaid reading level of the article text in each post. The formula that
BluntSporks’s readability package uses to count the syllables in a text string is:
letter_count / 3 + digit_count. I thought that this formula could be easily improved. I thought wrong.
It turns out that accurately counting syllables is actually quite difficult. English letter combinations contain irregular patterns. Some combinations that parse as one syllable should actually count as two. Other combinations parse as two syllables but should actually count as one. These irregularities can be more than one syllable (up to probably four). Letter combinations that parse as two and three are actually one or three and two or four, respectively.
Most syllable counters will return the same results for common words. Strings of more than 50 characters often gave different results. Therefore, the test cases that are included in the current
syllables package do not test for syllable counting accuracy over a certain input string length. Beyond that length, it can only test for regular-expression matching integrity over time with code changes.
wooorm for the syllable-counting algorithm.
regexp package. It has been replaced by the curly brace quantifier, which may need to be fine-tuned.
Swift has made me pay more attention to function calling syntax. I’ve tried to make calling the functions in
syllables as clear and expressive as possible. The exported function
func In(string) int returns the integer number of syllables in the input string. Assume a string variable
text that contains the input text and the
syllables package is imported. Calling the counting
In function looks like this:
syllables.In(text). Go does not make it easy for the implementer to use the same function name for different datatype parameters (which would be its own topic for a different day). Thus, a convenience function is given to count the syllables in a byte array. Its prototype is
func InBytes(byte) int, and
syllables.InBytes(buffer) invokes it on a byte array
buffer through the package. This convenience function just converts the bytes into a string and returns the result of invoking
In on that string. This syntax is documented at godoc.org.
After the release, I found the
syllables package written by