The Telegraph reported last week that a group of computer scientists from Stony Brook University in New York have developed an algorithm which can predict with 84 per cent accuracy whether a book will be a commercial success.
As Assistant Professor Yejin Choi explained in the paper “Success with Style: Using Writing Style to Predict the Success of Novels” (by Vikas Ganjigunte Ashok, Song Feng, and Yejin Choi, published by the Association for Computational Linguistics), “Predicting the success of literary works poses a massive dilemma for publishers and aspiring writers alike.”
“To the best of our knowledge, our work is the first that provides quantitative insights into the connection between the writing style and the success of literary works. We examine the quantitative connection, if any, between writing style and successful literature. Based on novels over several different genres, we probe the predictive power of statistical stylometry in discriminating successful literary works, and identify characteristic stylistic elements that are more prominent in successful writings.”
“Our study reports for the first time that statistical stylometry can be surprisingly effective in discriminating highly successful literature from less successful counterpart, achieving accuracy up to 84%. Closer analyses lead to several new insights into characteristics of the writing style in successful literature, including findings that are contrary to the conventional wisdom with respect to good writing style and readability.”
The Telegraph reports, “Less successful work tended to include more verbs and adverbs and relied on words that explicitly describe actions and emotions such as ‘wanted’, ‘took’ or ‘promised’, while more successful books favoured verbs that describe thought processes such as ‘recognised’ or ‘remembered’.”
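To make the word-level finding concrete, here is a minimal Python sketch that counts how often the two kinds of verbs appear per 1,000 words of a text. It is an illustration only, not the researchers' method: the two word lists contain just the examples quoted above, and the function name `verb_rates` and the sample sentence are made up for this example.

```python
import re
from collections import Counter

# Example words taken from the Telegraph quote above; these short lists
# are an assumption for illustration, not the study's actual features.
ACTION_EMOTION_VERBS = {"wanted", "took", "promised"}
THOUGHT_VERBS = {"recognised", "remembered"}


def verb_rates(text):
    """Return occurrences per 1,000 words for each of the two verb groups."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    if total == 0:
        return {"action_emotion": 0.0, "thought": 0.0}
    action = sum(counts[w] for w in ACTION_EMOTION_VERBS)
    thought = sum(counts[w] for w in THOUGHT_VERBS)
    return {
        "action_emotion": 1000 * action / total,
        "thought": 1000 * thought / total,
    }


# Hypothetical sample sentence, written for this example.
sample = "She remembered the house and recognised the door, but he wanted to leave."
rates = verb_rates(sample)
print(rates)
```

A real stylometric model would combine many such measurements across a whole novel; a single rate like this says very little on its own.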
Journalist Matthew Sparkes elaborates: “The survey also found that a range of factors determine whether or not a book will enjoy success, including ‘interestingness,’ novelty, style of writing, and how engaging the storyline is, but admit that external factors such as luck can also play a role. By downloading classic books from the Project Gutenberg archive they were able to analyze texts with their algorithm and compare its predictions to historical information on the success of the work.”
The paper states, “In order to quantify the success of literary works, and to obtain corresponding gold standard labels, one needs to first define ‘success’. For practical convenience, we largely rely on the download counts available at Project Gutenberg as a surrogate to quantify the success of novels. For a small number of novels however, we also consider award recipients (e.g., Pulitzer, Nobel), and Amazon’s sales records to define a novel’s success.” It goes on to say, “In order to obtain less successful books, we consider the Amazon seller’s rank included in the product details of a book. The less successful books considered in Table 9 had an Amazon seller’s rank beyond 200k (higher rank indicating less commercial success) except Dan Brown’s The Lost Symbol, which we included mainly because of negative critiques it had attracted from media despite its commercial success.”
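The idea of using download counts as a stand-in for success can be pictured with a short Python sketch. This is an assumption-laden illustration rather than the authors' procedure: the titles and download counts are invented placeholders, the function name `label_by_downloads` is hypothetical, and taking the top and bottom `n` titles is just one simple way to turn counts into labels, not necessarily the split used in the study.

```python
def label_by_downloads(download_counts, n):
    """Label the n most-downloaded titles 'successful' and the
    n least-downloaded titles 'less successful'."""
    if 2 * n > len(download_counts):
        raise ValueError("n is too large for the number of titles")
    # Sort titles from most to least downloaded.
    ranked = sorted(download_counts, key=download_counts.get, reverse=True)
    labels = {title: "successful" for title in ranked[:n]}
    labels.update(
        {title: "less successful" for title in ranked[len(ranked) - n:]}
    )
    return labels


# Invented placeholder data, not real Project Gutenberg figures.
counts = {
    "Title A": 5200,
    "Title B": 310,
    "Title C": 48,
    "Title D": 1900,
    "Title E": 12,
}
labels = label_by_downloads(counts, n=2)
print(labels)
```

Titles in the middle of the ranking, such as Title B here, are left unlabelled, which keeps the two groups clearly separated.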
Genres including adventure, historical fiction, detective fiction, ‘love stories’, science fiction, classic literature and poetry were studied, with a selection of lower-ranking books on Amazon included so that the texts of successful and less successful books could be compared.
As the report states, “Perhaps due to its obvious complexity of the problem, there has been little previous work that attempts to build statistical models that predict the success of literary works based on their intrinsic content and quality… All these [previous] studies however, are qualitative in nature, as they rely on the knowledge and insights of human experts on literature. To our knowledge, no prior work has undertaken a systematic quantitative investigation on the overarching characterization of the writing style in successful literature.”