Today I embarked on analysing Amazon’s top 50 bestselling books for each year from 2009 to 2019. It is a well-analysed dataset but, as I love books, I thought it would be fun to have a look at it myself. I actually found some of the results quite surprising.
Tools I used:
- Python (pandas)
- MS Excel
The dataset is not actually that large (550 records, duh), so a filtered and sorted spreadsheet is enough for most of the analysis. Pandas still proved useful for gathering and summarizing certain statistics.
To start, I wanted to get straight down to which authors dominated the bestseller list during the eleven-year period. The value_counts() method in pandas is a quick and easy way of obtaining this. As the name suggests, it counts how many times each value appears in the specified column and sorts the result from most to least frequent. We assign the result to ‘author_counts’, and use the head method to look at the first 10 rows:
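import pandas as pd

# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df = pd.read_csv('amazon_top50_books.csv')

# Count how many times each author appears on the list, most frequent first
author_counts = df['Author'].value_counts()
print(author_counts.head(10))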
Author
Jeff Kinney 12
Gary Chapman 11
Rick Riordan 11
Suzanne Collins 11
American Psychological Association 10
Dr. Seuss 9
Gallup 9
Rob Elliott 8
Stephen R. Covey 7
Stephenie Meyer 7
Et voilà. I actually didn’t know who half of these were! (Honestly.) In my defence, once I looked them up, I was at least aware of their work and knew it was popular in the mainstream. For anyone else who is in the dark, from the top of the list down these are:
- the author of the kids series ‘Diary of a Wimpy Kid’
- the author of the relationship guide, ‘The 5 Love Languages’
- the author of the young adult fantasy series, ‘Percy Jackson and the Olympians’
- the author of the young adult dystopian fantasy series, ‘The Hunger Games’
- the publisher of the APA style guide. Now, I was surprised by this one, as I actually studied psychology between 2011 and 2014, and I never needed to buy this! I did some research, and it turns out that an increasing number of disciplines outside of psychology use APA style for their academic publications. Many libraries and academic institutions are probably required to order it in bulk. And that got them to fifth place on the list (with a book that is probably as dry as hell). So there you go.
- Good old Dr. Seuss! It is actually one book in particular, ‘Oh, the Places You’ll Go!’, that has done him proud; however, ‘What Pet Should I Get?’ did also win a place on the bestsellers list in 2015.
- the publisher of ‘StrengthsFinder 2.0’
- the author of a couple of kids’ joke books
- the author of ‘The 7 Habits of Highly Effective People’
- the author of the ‘Twilight’ series
Now, it occurred to me that just because an author makes the most appearances on the bestsellers list, that does not necessarily make them the ‘best’ or most successful author. Many of the above are authors of a series, so they have several titles that could each earn them a place on the list.
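One quick way to check how much of an author’s tally comes from repeat appearances is to set the number of list appearances next to the number of distinct titles. This is only a minimal sketch, assuming the same ‘Author’ and ‘Name’ columns as above; the handful of rows are placeholders I made up so the snippet runs on its own, not real records from the dataset:

```python
import pandas as pd

# Placeholder rows (not real data): "Author A" has one title that charts
# three years running, "Author B" has three different titles.
sample = pd.DataFrame({
    "Author": ["Author A", "Author A", "Author A",
               "Author B", "Author B", "Author B"],
    "Name":   ["Book 1", "Book 1", "Book 1",
               "Book 2", "Book 3", "Book 4"],
    "Year":   [2009, 2010, 2011, 2009, 2010, 2011],
})

# appearances = rows per author; distinct_titles = unique 'Name' values
summary = sample.groupby("Author").agg(
    appearances=("Name", "count"),
    distinct_titles=("Name", "nunique"),
)
print(summary)
```

On the real data you would swap `sample` for `df`. Both placeholder authors appear three times, but only one of them does it with three different books.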
Going back to my spreadsheet, I determined that the highest user rating for any bestseller was 4.9/5 (no such thing as a perfect ‘5’). The analysis now turns to which books were awarded that rating. In pandas, I created a new dataframe that keeps only the rows whose user rating equals the top value of 4.9. Some books received the top rating in multiple years, so I created another subset of that dataframe that lists each top-rated title only once:
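# Keep only the rows with the top user rating
top_rated_books = df[df['User Rating'] == 4.9]
# A title can chart in several years, so keep one row per title
unique_top_rated_books = top_rated_books.drop_duplicates(subset=['Name'])
print(unique_top_rated_books[['Name', 'Author']])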
40 Brown Bear, Brown Bear, What Do You See? Bill Martin Jr.
81 Dog Man and Cat Kid: From the Creator of Capta... Dav Pilkey
82 Dog Man: A Tale of Two Kitties: From the Creat... Dav Pilkey
83 Dog Man: Brawl of the Wild: From the Creator o... Dav Pilkey
85 Dog Man: Fetch-22: From the Creator of Captain... Dav Pilkey
86 Dog Man: For Whom the Ball Rolls: From the Cre... Dav Pilkey
87 Dog Man: Lord of the Fleas: From the Creator o... Dav Pilkey
146 Goodnight, Goodnight Construction Site (Hardco... Sherri Duskey Rinker
151 Hamilton: The Revolution Lin-Manuel Miranda
153 Harry Potter and the Chamber of Secrets: The I... J.K. Rowling
155 Harry Potter and the Goblet of Fire: The Illus... J. K. Rowling
156 Harry Potter and the Prisoner of Azkaban: The ... J.K. Rowling
157 Harry Potter and the Sorcerer's Stone: The Ill... J.K. Rowling
174 Humans of New York : Stories Brandon Stanton
187 Jesus Calling: Enjoying Peace in His Presence ... Sarah Young
207 Last Week Tonight with John Oliver Presents A ... Jill Twiss
219 Little Blue Truck Alice Schertle
244 Obama: An Intimate Portrait Pete Souza
245 Oh, the Places You'll Go! Dr. Seuss
288 Rush Revere and the Brave Pilgrims: Time-Trave... Rush Limbaugh
289 Rush Revere and the First Patriots: Time-Trave... Rush Limbaugh
303 Strange Planet (Strange Planet Series) Nathan W. Pyle
420 The Legend of Zelda: Hyrule Historia Patrick Thorpe
431 The Magnolia Story Chip Gaines
476 The Very Hungry Caterpillar Eric Carle
486 The Wonderful Things You Will Be Emily Winfield Martin
521 Unfreedom of the Press Mark R. Levin
545 Wrecking Ball (Diary of a Wimpy Kid Book 14) Jeff Kinney
So J.K. Rowling is still going strong. Interesting to note that a majority of these would appear to be children’s books!
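Two small footnotes to this step. The 4.9 ceiling does not have to be read off the spreadsheet, since pandas can report it directly. Also, the output above lists J.K. Rowling under two spellings (‘J.K. Rowling’ and ‘J. K. Rowling’), which pandas treats as two different authors. Here is a rough sketch of handling both, using three of the top-rated rows from the output (titles truncated exactly as printed) plus one made-up placeholder row so there is something to filter out. The regular expression for tidying initials is my own assumption about what should count as the same name:

```python
import pandas as pd

# Three 4.9-rated rows copied from the output above, plus one invented
# placeholder row (its name, author and rating are made up).
books = pd.DataFrame({
    "Name": [
        "Harry Potter and the Chamber of Secrets: The I...",
        "Harry Potter and the Goblet of Fire: The Illus...",
        "Hamilton: The Revolution",
        "Placeholder Title",
    ],
    "Author": ["J.K. Rowling", "J. K. Rowling",
               "Lin-Manuel Miranda", "Placeholder Author"],
    "User Rating": [4.9, 4.9, 4.9, 4.0],
})

# Let pandas find the top rating instead of hard-coding it
top_score = books["User Rating"].max()
top_rated = books[books["User Rating"] == top_score]

# Remove the space between consecutive initials: "J. K." becomes "J.K."
tidy_author = top_rated["Author"].str.replace(
    r"\.\s+(?=[A-Z]\.)", ".", regex=True
)
counts = tidy_author.value_counts()
print(top_score)
print(counts)
```

With the spellings merged, both Harry Potter rows are counted under a single author. The same tidy-up would be worth applying before the value_counts() step at the start, since split spellings would lower an author’s tally there too.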
To finish, I thought it would be nice to do a visualization. I wanted to use a bar chart to investigate how the popularity of fiction vs. non-fiction has changed over the eleven-year period.
First, I grouped the data in pandas by year and then by genre, counting the number of books in each group:
genre_counts_per_year = df.groupby(['Year', 'Genre']).size().unstack(fill_value=0)
print(genre_counts_per_year)
Genre Fiction Non Fiction
Year
2009 24 26
2010 20 30
2011 21 29
2012 21 29
2013 24 26
2014 29 21
2015 17 33
2016 19 31
2017 24 26
2018 21 29
2019 20 30
Now, using this output, I created my visualization with a chart in MS Excel:

And there you go! So, with the exception of 2014, non-fiction outnumbers fiction in every year, but the difference is not that big: across all eleven years, non-fiction takes 310 of the 550 places (about 56%) to fiction’s 240.
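For anyone who would rather not hop over to Excel, roughly the same chart can be drawn from Python. This is a sketch rather than what I actually did: it assumes matplotlib is installed, retypes the counts from the table above so it runs on its own, and the output file name is arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the script also runs without a display
import pandas as pd

# Counts retyped from the groupby output above
counts = pd.DataFrame(
    {
        "Fiction": [24, 20, 21, 21, 24, 29, 17, 19, 24, 21, 20],
        "Non Fiction": [26, 30, 29, 29, 26, 21, 33, 31, 26, 29, 30],
    },
    index=pd.Index(range(2009, 2020), name="Year"),
)

# One pair of bars per year, one colour per genre
ax = counts.plot(kind="bar", figsize=(10, 5))
ax.set_ylabel("Books in the top 50")
ax.set_title("Fiction vs. non-fiction bestsellers by year")
ax.figure.tight_layout()
ax.figure.savefig("genre_counts_per_year.png")  # arbitrary file name

# Totals per genre, and the years in which fiction came out ahead
totals = counts.sum()
fiction_ahead = list(counts.index[counts["Fiction"] > counts["Non Fiction"]])
print(totals)
print(fiction_ahead)
```

With the real data loaded, `genre_counts_per_year.plot(kind="bar")` should give the same picture without retyping anything.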
There are lots of reasons why you might expect non-fiction to have the edge: people don’t just buy books for entertainment, which tends to be the primary purpose of fiction. Non-fiction books serve a variety of purposes, including entertainment but also study, instruction (food and cookery books come to mind), and self-help. A limitation of the dataset is that it only categorizes ‘genre’ in terms of whether a book is fiction or non-fiction. It would be interesting to look in depth at which subgenres (e.g., fantasy, crime, self-help, what have you) feature on the list. Part of the challenge, of course, is that identifying what genre a book falls into can be subjective, and people might have differing opinions on the matter.
Follow-up considerations:
- I thought it was interesting that there was a trend for the most highly rated books to be children’s books. Is there an explanation for that? Is it natural to assume that adult literature is just held to a higher standard, so you would expect reviewers’ responses to be more critical and harsh? Are children’s books genuinely just more enjoyable? Could it be because a lot of children’s books are bought as gifts for children, and it is not the children themselves who give the ratings but the adults, whose rating is distorted somewhat by the elevated response of their youthful recipient?
- I noticed that all the authors considered were English-language authors. It turns out that this is because the data only considers the top-selling English-language titles! There are bestseller lists for other languages as well. It might be interesting to compare, and find out what the most popular titles in other languages are.
- Could I attempt creating my own sub-categories in the original dataset, to look more specifically at what kinds of books dominated the list?
- The data only goes up to 2019. It would be interesting to scrape data up to the present day, to find out which books are now in vogue. Would we see that some of the top contenders have continued to hold their place?
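On the sub-category idea, one crude starting point would be keyword matching on titles. This is a minimal sketch: the keywords and category labels are entirely my own guesses, the function name is made up, and a real attempt would need manual checking because, as noted above, genre is subjective. The example titles are taken from the top-rated output earlier in the post:

```python
import pandas as pd

# Hypothetical keyword -> sub-category rules; the first match wins.
RULES = [
    ("wimpy kid", "Children's"),
    ("hungry caterpillar", "Children's"),
    ("portrait", "Biography / photography"),
]

def guess_subgenre(title: str) -> str:
    """Return the label of the first rule whose keyword is in the title."""
    lowered = title.lower()
    for keyword, label in RULES:
        if keyword in lowered:
            return label
    return "Unclassified"

titles = pd.Series([
    "Wrecking Ball (Diary of a Wimpy Kid Book 14)",
    "The Very Hungry Caterpillar",
    "Obama: An Intimate Portrait",
    "Hamilton: The Revolution",
])
subgenres = titles.map(guess_subgenre)
print(subgenres)
```

Anything that falls through as ‘Unclassified’ would still need a human decision, which for 550 rows is at least manageable.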