A Coding Challenge – Organize my PDF library

I have a massive library of PDF documents. Many of these are ebooks, but most are individual journal articles from my six years of enthusiastic study. I need a better way to organize them than my highly detailed file system, which I search with Google Desktop Search. (I am a librarian, after all.)

But I have seen the future, and it is… Mac only. It’s called Papers.

I’ve also discovered the world of E-Comics and this amazing application, ComicRack.

So, Papers downloads metadata about PDF files from a number of different sources, but doesn’t seem to use the file name except in very specific instances, like PubMed’s unique file naming system. ComicRack has an exceptional parser for file names, getting it right almost all the time. Papers allows you to search within files, and by author and title. ComicRack can be extended to read PDFs, but it can’t search within them, as you would expect from a piece of software set up for displaying images. Its strength is in reading and organizing.

So, this is what I need. I need a program that can parse file names, identify the ISBNs of print versions of a work, or download journal information from databases like PubMed and JSTOR. I need to be able to search within documents. I need to be able to create lists and to display PDFs with both graphics and text. Optical Character Recognition would be nice, but not essential. I would also like it to be able to name files based on rules I set up. I’ve invested a lot of time in my organizing system, and I’d like to preserve my standard of Lastname,Firstname – Title.
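To make the naming rule concrete, here is a rough Python sketch of how a tool might read and write names in the Lastname,Firstname – Title pattern. The names `PdfName`, `parse_pdf_name` and `format_pdf_name` are made up for illustration and do not come from Papers, ComicRack or any other program. The sketch assumes a single author and a separator dash with a space on each side.

```python
import re
from typing import NamedTuple, Optional


class PdfName(NamedTuple):
    """Fields recovered from a 'Lastname,Firstname – Title' file name."""
    last: str
    first: str
    title: str


# Assumption: author and title are separated by a hyphen, en dash, or em dash
# with whitespace on both sides, so hyphenated names like Jean-Paul survive.
NAME_RE = re.compile(
    r"^\s*(?P<last>[^,]+?)\s*,\s*(?P<first>.+?)\s+[-\u2013\u2014]\s+(?P<title>.+?)\s*$"
)


def parse_pdf_name(filename: str) -> Optional[PdfName]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    # Strip only a trailing .pdf so initials such as "S. R." are left intact.
    if filename.lower().endswith(".pdf"):
        filename = filename[:-4]
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return PdfName(match["last"], match["first"], match["title"])


def format_pdf_name(name: PdfName) -> str:
    """Render the fields back into the naming standard, using an en dash."""
    return f"{name.last},{name.first} \u2013 {name.title}.pdf"
```

Files that return `None` would be the ones needing a metadata lookup or a manual look, while everything that parses could be renamed with `format_pdf_name` so the existing standard is preserved.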

Anybody know of something that resembles a mashup of Papers and ComicRack? Let me know. The academic world needs it.

5 Responses

  1. Take a look at Zotero.

    It’s a plugin for Firefox and it’s pretty nifty. I don’t know about importing existing content, but it’s good for gathering papers on an ongoing basis.

  2. That’s my issue: I have a mass of existing content. I convert most of it with Mobipocket Reader on an as-needed basis, but I’m interested in purchasing additional e-books, and I’d like to be able to create lists and import information from JSTOR and PubMed.

    I’ve used Zotero, ‘fraid it doesn’t do what I need.

    I need Papers ported over to Windows.

  3. >I need Papers ported over to Windows.

    Wild idea, but it may work: search for a Mac OS emulator for Windows. After all, emulators for OSes do exist (for example, WINE is a Windows compatibility layer that runs on Linux).

    If you find a good one, you may be able to run Papers on top of it (while it’s running on your Mac). Even if you find a Mac OS emulator for Windows, it might not work well enough for your needs, since writing such software is quite difficult, but it may be worth a try.

    I don’t know if there are any such emulators; but GIYF * 🙂

    – Vasudev Ram

    * that means: Google Is Your Friend, in case you didn’t know.

  4. Clever… but no cigar.

    I don’t want to bother with Mac emulators. I’m familiar with WINE and I’ve worked with other CrossOver products. It’s just way too complicated and requires too many layers of software. I know that with modern processors you can get away with virtualization, but I’m more interested in a native program.

    And I have looked at library programs, including LibraryThing, Libra and some other open source alternatives. Not enough automation. I’ve already done enough work on the file system I have them stored in now, and I don’t want to re-catalog everything.

  5. Mendeley is designed to do exactly what you’re looking for, and it’s cross-platform. The desktop client extracts the data from the PDFs and renames them according to your specified format, kinda how MP3 renamers work.

    Give it a try (it’s a beta!) and let us know how it works for you.

