On Sunday we opened the Bookhype doors to a few alpha testers. It’s definitely been a whirlwind of emotions for me. I’ve been working on Bookhype for months and it’s weird to finally be letting other people in on it. Suddenly all the imperfections are huge and glaring at me. We have duplicated records (editions not grouped into a work correctly), missing series information, some duplicated series, missing covers, etc. We also struggle to import books people have read that are only available on Amazon (self-published Kindle books, for example).
I’ve tried really hard to do this project in what I felt was the best way: we get book data from legitimate sources that license it to us with a lot of freedom. We don’t use massively restrictive APIs (Amazon) or use APIs in a way that isn’t allowed (Goodreads). While this does give us a huge database of books, it turns out data directly from publishers is messy (see: book data parsing woes). Other sites have had years to refine their databases and fix all these messy records, but we’re just getting started. I’m feeling disappointed that I can’t offer people a more refined database. I find myself wondering: will people even want to join a site where the book data isn’t perfect?
Cleaning up our database is definitely a long-term goal, but it’s not going to happen overnight. I suppose the best we can do is keep moving forward and hope people will give us a chance.
Well, I haven’t seen the site yet, but as a librarian on both Goodreads and BookLikes, all these parsing issues are very familiar. I also do personal imports from the PRH and Macmillan APIs, so I’m familiar with the regular expression fun that comes with trying to anticipate all the ways a single piece of data can be recorded. Don’t sweat it – there’s no way to anticipate it all without driving yourself crazy and/or creating overly bloated parsing filters. Choose some trusted users and give them librarian edit access, and they can catch the strays.
I know a lot of people who aren’t at all happy with GR and are looking for an alternative now that BookLikes seems to have hit rock bottom. Once you get going, I think plenty of people will give you a chance (especially since you’re focussed on data first). Good luck.
Thanks so much for your thoughts. 🙂 A librarian program is definitely on the horizon. There are a few things I want to set up first before launching it, but it will definitely happen!