Computer scientists at Stony Brook University have figured out what makes one work of fiction more successful than another.
“Predicting the success of literary works poses a massive dilemma for publishers and aspiring writers alike,” said Assistant Professor Yejin Choi. “We examined the quantitative connection between writing style and successful literature. Based on novels across different genres, we investigated the predictive power of statistical stylometry in discriminating successful literary works, and identified the stylistic elements that are more prominent in successful writings.”
Choi and colleagues found that conjunctions such as “and,” “but,” and “or” appear more often in successful books, as do prepositions, nouns, and pronouns. Verbs, adverbs, and foreign words appear more often in less successful books, as do topical, extreme, and negative words.
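To give a feel for the kind of measurement involved, here is a minimal sketch that counts how often “and,” “but,” and “or” occur per 1,000 words of a text. It is an illustration only, not the researchers' method: the function name `conjunction_rate`, the three-word list, and the simple regex tokenizer are all assumptions made for this example, and the study's actual features and tools are not described in this article.

```python
import re
from collections import Counter

# Illustrative word list only; the study's real feature set is not given here.
CONJUNCTIONS = {"and", "but", "or"}


def conjunction_rate(text: str, per: int = 1000) -> float:
    """Return how many CONJUNCTIONS occur per `per` word tokens in `text`."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in CONJUNCTIONS)
    return per * hits / len(tokens)


sample = "He ran and she walked, but neither of them stopped or looked back."
print(conjunction_rate(sample))
```

Comparing this rate across a set of more-downloaded and less-downloaded books would be one simple way to probe a pattern like the one described above. A fuller analysis would need part-of-speech tagging to cover the prepositions, nouns, pronouns, verbs, and adverbs the researchers mention.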
Success was measured by Project Gutenberg download counts. The research team studied eight genres: adventure, mystery, historical fiction, fiction, science fiction, love stories, short stories, and poetry.
“For a small number of novels, we also considered award recipients—such as Pulitzer and Nobel prizes—and Amazon sales records in order to define a novel’s success,” Choi said. “Additionally, we extended our empirical study to movie scripts, where we quantified a film’s success based on the average review scores at imdb.com.”
Choi believes the research is the first of its kind.
“To the best of our knowledge, our work is the first that provides quantitative insights into the connection between the writing style and the success of literary works,” Choi said. “Previous work has attempted to gain insights into the ‘secret recipe’ of successful books. But most of these studies were qualitative, based on a dozen books, and focused primarily on high-level content—the personalities of protagonists and antagonists and the plots. Our work examines a considerably larger collection—800 books—over multiple genres, providing insights into lexical, syntactic, and discourse patterns that characterize the writing styles commonly shared among the successful literature.”