Author Alert: Amazon to begin screening self-published works for commercial viability

Amazon.com’s publishing arms CreateSpace and Kindle Direct Publishing will soon be more selective, à la the traditional publishers in New York.

Using software that evolved from research by a group of computer scientists from Stony Brook University in New York, CreateSpace and KDP will analyze each submission; if the submission doesn’t pass muster in terms of potential for literary or commercial success, then it will be rejected. The author then has the option of revising the manuscript and resubmitting it, or paying a fee to have it published anyway. The lower the score, the higher the fee.

This makes perfect sense.

After all, Jeff Bezos and Amazon.com are in the business of making money. So why accept all comers without any sort of screening process? With the software—tentatively dubbed “Thresher” (as in “separating the wheat from the chaff,” although some may see it in terms of the shark)—Amazon can maximize its profitability by monetizing publishing services that it currently offers free of charge.

Thresher will do in seconds what literary agents and acquisition editors at publishing houses take days, weeks and months (if ever) to accomplish. What’s more, Thresher will accomplish this with a significantly higher success rate than agents and editors can ever do—and deliver Amazon.com’s knock-out punch to the traditional publishing industry.

Unbelievable? In part. Because I’m pulling your leg.

While such a scenario is possible, and may turn out to be practical, I just made this up.

Mind you, the computer scientists at Stony Brook and their predictability algorithm do exist. That much is true.

And I did contact Amazon.com to ask about the application of these scientists’ research to screening uploads to CreateSpace and KDP. “It’s news to us,” the Amazon spokesperson said. (Ironic, because the researchers presented their study last October in Seattle.)

The broader question is whether this goal of accurately predicting a book’s success can be achieved by a computer. The Stony Brook researchers, apparently, believe it can; they described their methodology as “surprisingly effective.” Whether the concept has any commercial application or viability remains to be seen. (It turns out that Google, not Amazon, helped fund the research, in association with the Gutenberg Project.)

So why am I wasting your time by spreading a false rumor? First, to get your attention; second, to point out that others have misrepresented the report recently published by that group of computer scientists at Stony Brook, making it seem as if the researchers can actually predict whether a book will be successful.

Among those reporting on this topic, I point my accusing finger at Matthew Sparkes, Deputy Head of Technology and formerly a reporter on the City desk at The Telegraph newspaper (UK)—someone who should know better.

On January 9, the newspaper published a Sparkes missive titled “Scientists find secret to writing a best-selling novel.”

His opening statement:

 Scientists have developed an algorithm which can analyse a book and predict with 84 per cent accuracy whether or not it will be a commercial success.

That title and first sentence grabbed my attention, as I’m sure it captured the eyes of many an aspiring novelist. Wow! All I have to do is follow the Stony Brook Formula and—shazam!—my novel soars to the top of the New York Times best-seller list. I pop a cork and light cigars with $100 bills. I then take a trophy wife, buy a tropical island and live far from the madding crowd.

However, let’s now analyze the analysis. The headline and opening paragraph of Sparkes’ article are not just misleading, they are false. And anyone taking the time to actually read the report on this academic study will see this. You can read the report—“Success with Style: Using Writing Style to Predict the Success of Novels”—online.

(But before you dive in, you may want to familiarize yourself with “unigram,” “bigram,” “stylometry,” “clausal tag” and other terms tossed out by those immersed in the study of computational linguistics and predictability.)

In fact, the researchers qualified their assertion by stating: “. . . achieving accuracy up to 84% in the novel domain . . .” My emphasis, with the operative phrase being “up to.”

When one examines the table of results, one discovers that the 84 percent figure applies to a single subcategory (Adventure, Unigram) out of 120 categories they tested. In truth, many of their results were little better than the flip of a coin. And in the case of Hemingway’s The Old Man and the Sea, they employed a bit of unscientific rationalization to explain away a low score, thanks to Hemingway’s predilection for short, declarative sentences.

On average, the predictability rate landed closer to 70 percent (not bad, but nothing to bet the ranch on), and in one instance (Historical Fiction, POS) it attained a paltry 47 percent (i.e., less predictive value than a coin toss). The highest across-the-board percentage of any screening component of the study topped out at 73.5 percent, while bottoming out at a lackluster 64.5 percent. (Keep in mind the baseline is 50 percent—the coin toss—not 0 percent.)

Casino bosses in Vegas would LOL at these odds.

The authors of the report also differentiated between literary success and commercial success. They acknowledged that literary success does not necessarily mean “commercial success” and even a mediocre or poorly written book may still become popular (witness 50 Shades of Grey and The Lost Symbol) and achieve commercial success.

The study analyzed 800 books, from classic literary works and best-selling novels to some of the worst-selling books available from Amazon.com (unbeknownst, apparently, to Amazon.com). It also evaluated some movie scripts.

“To the best of our knowledge, our work is the first that provides quantitative insights into the connection between the writing style and the success of literary works,” the researchers say. They also note that the study provides “insights into lexical, syntactic, and discourse patterns that characterize the writing styles commonly shared among the successful literature,” while acknowledging that “some elements of successful styles are genre-dependent.”

Thus, the researchers concluded that “. . . deep syntactic features expressed in terms of different encodings of production rules consistently yield good performance across almost all genres.”

Sparkes translated that into layman’s terms: “They found several trends that were often found in successful books, including heavy use of conjunctions such as ‘and’ and ‘but’ and large numbers of nouns and adjectives.”

Seriously? Aren’t conjunctions, nouns and adjectives characteristics of every book? Hemingway excepted, of course, when it comes to conjunctions.

“Less successful work tended to include more verbs and adverbs,” Sparkes wrote.

Did Mr. Sparkes actually think about what he’d written? He didn’t see any point in questioning such nonsense? Mind you, most discerning critics agree that frequent use of adverbs exemplifies lower-quality writing. But verbs?

So, if I understand this correctly, successful writers use tons of conjunctions (typical of a run-on sentence—think Faulkner), and tons of nouns (generally, a critical component of a complete sentence) and adjectives (a preponderance of which is also known pejoratively as “purple prose”). Meanwhile, unsuccessful writers use a lot of verbs (another critical component of a complete sentence). Imagine that. Ever try writing a sentence, let alone a book, without a verb?

So . . . what, if any, useful information can we glean from this study? (Drum roll, please! Aspiring writers pay attention.)

Less successful books used such unigrams as:

  • Negative terms: never, risk, worse, slaves, hard, murdered, bruised, heavy, prison
  • Body Parts: face, arm, body, skins
  • Location: room, beach, bay, hills, avenue, boat, door
  • Emotional/Action Verbs: want, went, took, promise, cry, shout, jump, glare, urge
  • Extreme Words: never, very, breathless, sacred, slightest, absolutely, perfectly
  • Love Related: desires, affairs

More successful books used such unigrams as:

  • Negation: not
  • Report/Quote: said, words, says
  • Self Reference: I, me, my
  • Connectives: and, which, though, that, as, after, but, where, what, whom, since, whenever
  • Prepositions: up, into, out, after, in, within
  • Thinking Verbs: recognized, remembered

Good luck with that. (And forget about writing romance novels.)
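For the curious, the counting end of this is trivial. Here’s a toy sketch of my own (not the researchers’ classifier, which involves far more than word lists) that tallies hits from a handful of the “more successful” and “less successful” unigrams above:

```python
import re
from collections import Counter

# Toy illustration only: a few unigrams drawn from the study's published lists.
# This is NOT the Stony Brook method, just a word-counting sketch.
SUCCESS = {"not", "said", "words", "says", "i", "me", "my", "and", "which",
           "though", "that", "but", "up", "into", "recognized", "remembered"}
FAILURE = {"never", "risk", "worse", "face", "arm", "room", "want", "went",
           "cry", "shout", "very", "absolutely", "desires", "affairs"}

def toy_score(text):
    """Return (success hits) minus (failure hits) for a passage."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return (sum(counts[w] for w in SUCCESS)
            - sum(counts[w] for w in FAILURE))

print(toy_score("I said that I remembered her words."))    # positive
print(toy_score("She went to cry in the very dark room.")) # negative
```

Run your manuscript through that and see how far it gets you with an agent.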

The scientists wrapped up their report with this gem: “In sum, our analysis reveals an intriguing and unexpected observation on the connection between readability and the literary success—that they correlate into the opposite directions. Surely our findings only demonstrate correlation, not to be confused as causation, between readability and literary success. We conjecture that the conceptual complexity of highly successful literary work might require syntactic complexity that goes against readability.”

Translation: A highly successful literary work is unreadable.

Hmmm. That’s news?

My conclusion: In sum, Amazon will not be hiring these guys any time soon.

______________________

Footnote 1: Unigram. According to Wikipedia: “In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. An n-gram of size 1 is referred to as a ‘unigram’ . . .”  Got that?
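If the Wikipedia prose doesn’t do it for you, the idea fits in three lines of Python: an n-gram is just a sliding window of n tokens (this sketch is mine, not the study’s code):

```python
# An n-gram is a contiguous run of n items; n=1 gives unigrams, n=2 bigrams.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the old man and the sea".split()
print(ngrams(tokens, 1))  # unigrams: ('the',), ('old',), ('man',), ...
print(ngrams(tokens, 2))  # bigrams: ('the', 'old'), ('old', 'man'), ...
```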

Footnote 2: Grammatical irony. Incorrect usage found in the report’s Abstract: the verb “lead” rather than the correct past-tense conjugation “led.” But who needs verbs?

Footnote 3: Prediction. The report cited herein will become a highly successful literary work based on its low readability quotient.


About Polishing Your Prose

Larry M Edwards is an award-winning investigative journalist, author, editor and publishing consultant. He is the author of three books, and has edited dozens of nonfiction and fiction book manuscripts. Under Wigeon Publishing, he has produced six books. As author, "Dare I Call It Murder? A Memoir of Violent Loss" won First Place in the San Diego Book Awards in 2012 (unpublished memoir) and 2014, Best Published Memoir. The book has also been nominated for a number of awards, including: Pulitzer Prize, Benjamin Franklin Award, Washington State Book Award, and One Book, One San Diego. As Editor, "Murder Survivor’s Handbook: Real-Life Stories, Tips & Resources" won the Gold Award in the 2015 Benjamin Franklin Book Awards, Self-Help. For a sample edit and cost estimate, contact Larry: larry [at] larryedwards [dot] com -- www.larryedwards.com -- www.dareicallitmurder.com -- www.wigeonpublishing.com

6 Responses to Author Alert: Amazon to begin screening self-published works for commercial viability

  1. susanpjames says:

    You got me there. Wish it were so. Great tongue-in-cheek assessment (yours). Write a book without verbs? Or maybe adjectives will work better without verbs to restrain them. On the serious side, how viable is it that Amazon/CreateSpace develop something like this to weed out better entries?

    • I did speak (via email) to someone at Amazon, and as far as I know, this is not being considered. But that doesn’t mean it’s not, and I would expect this eventually. But the methodology has to be much better. What the researchers should do is take a bunch of texts and track them from pre- to post-publication and see how well the algorithm works. The problem with the study is that they looked at texts from the 19th century to modern times, so I question the validity of that model.

  2. Mikel miller says:

    Too funny to be true, and it turns out it wasn’t true. Thanks.

  3. robinskone says:

    So the Great American Novel is yet to be written! Whee — there’s still hope for me!! Oh, oops, probably not supposed to say “hope.” Or “Whee.” There go my chances. Drat. (Probably not supposed to say that, either.)

  4. Pingback: I Know Nothing About Writing | Polishing Your Prose
