OK, Google. I want to find some information on a topic. Not just any information, but something that’s well-written, not too detailed, and entertaining.
Good luck with that.
As an article from Penn Engineering’s blog points out, the complex algorithms that drive search engines are limited. Search engines can easily find content that’s relevant to whatever keywords or phrase you type in. But they can’t create an accurate summary of that content, explain its tone or intent, or pick up on a variety of other subtle meanings.
At least not yet.
Subtleties Even a Computer Can Grasp
Ani Nenkova, an associate professor of computer and information science at UPenn, is working on breaking down these kinds of subtleties into terms that machines can understand. Since every code has distinct and predictable patterns, language should be no exception.
To that end, Nenkova and a former grad student, Annie Louis, pored over a trove of written material from a single genre, more than 10 years of New York Times science articles, seeking statistical relationships in word choice and sentence structure. Using writing and journalism textbooks as a guide, they identified rules that writers often follow. Because those textbooks frequently advise students to make their writing “visual,” for example, they developed a rubric for scoring visual quality. They also looked for patterns such as the frequency and complexity of the words used, the use of folksy colloquialisms, and reliance on dense academic language, among other factors.
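To make the idea of scoring text on word-level patterns concrete, here is a minimal sketch in Python. It is not the researchers' rubric: the word lists `VISUAL_WORDS` and `COLLOQUIALISMS`, the function names, and the three measurements are all hypothetical, chosen only to show how features like "visual" word use, colloquial language, and word complexity could be turned into numbers.

```python
import re

# Hypothetical word lists for illustration only; these are assumptions,
# not the lexicons used in the research described above.
VISUAL_WORDS = {"bright", "dark", "glow", "red", "blue", "shimmer", "shadow", "sparkle"}
COLLOQUIALISMS = {"folks", "gonna", "kinda", "stuff", "yeah"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def style_features(text):
    """Return a few simple per-word style measurements for a passage."""
    words = tokenize(text)
    if not words:
        # Avoid dividing by zero on empty input.
        return {"visual_density": 0.0, "colloquial_density": 0.0, "avg_word_length": 0.0}
    n = len(words)
    return {
        # Share of words that appear in the (assumed) visual word list.
        "visual_density": sum(w in VISUAL_WORDS for w in words) / n,
        # Share of words that appear in the (assumed) colloquialism list.
        "colloquial_density": sum(w in COLLOQUIALISMS for w in words) / n,
        # Average word length as a crude stand-in for word complexity.
        "avg_word_length": sum(len(w) for w in words) / n,
    }


if __name__ == "__main__":
    print(style_features("The bright red glow faded, folks."))
```

Measurements like these would only be a starting point. Relating them to writing quality would still require comparing them across a large set of articles that have already been judged, as the researchers did.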
They also created baseline “quality buckets” for the writing they analyzed, based on its relationship to the curated anthology Best American Science Writing.
Nenkova found some of the results surprising. Writing they had labeled “typical” quality used the most visual words, while “great” writing used those words more judiciously and stuck to specific categories or themes.
“In writing that stands out, there are fewer visual words but they exhibit a stronger pattern of organization,” Nenkova said.
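One way to picture “sticking to specific categories” is to ask how concentrated a passage's visual words are in a single category. The sketch below is an illustration only, not the measure used in the study: the `VISUAL_CATEGORIES` mapping and the `category_focus` function are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from visual words to categories; an assumption
# made for illustration, not taken from the research.
VISUAL_CATEGORIES = {
    "red": "color", "blue": "color", "green": "color",
    "glow": "light", "shimmer": "light", "shadow": "light",
    "round": "shape", "jagged": "shape",
}


def category_focus(words):
    """Return the share of visual words that fall in the most common category.

    A value near 1.0 means the visual words cluster in one category;
    lower values mean they are spread across several. Returns 0.0 when
    the passage contains no visual words at all.
    """
    counts = Counter(VISUAL_CATEGORIES[w] for w in words if w in VISUAL_CATEGORIES)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / total
```

Under this toy measure, a passage whose visual words are mostly colors scores higher than one that scatters the same number of visual words across colors, light, and shapes.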
Even more surprising was that notable writing appeared to be lighter on detail -- it was able to convey complex information on a particular scientific finding without “going into the weeds.”
More Than Just Searching
Nenkova’s research represents more than just a potential improvement on a search algorithm. By uncovering the hidden linguistic “rules” that expert communicators use when writing, she hopes to improve people’s lives.
Those with Autism Spectrum Disorder (ASD), for instance, can struggle daily to communicate with others. Trouble choosing the right tone or telling appropriate jokes can lead to difficulties keeping a job or developing relationships. Teaching computers to identify clear communication might lead to software for people with ASD.
Some of Nenkova’s undergraduates have also adapted her tools to create software that gauges the clarity of their writing in real time. Others have worked on applying the techniques to characterize good writing in languages other than English, which may have very different standards for how “quality” text should be organized. Internet searches can easily turn up webpages in other languages, and being able to identify hidden traits that resonate within a given culture, Nenkova points out, may help ease the spread of ideas worldwide.