Deciphering Gmail IMAP Labels With getmail
We have outgrown Google Apps (formerly GAFYD, and before that GFYD), or something along those lines. Exporting your mail from Gmail is not a trivial problem. Advice ranges from Opera Software’s lead QA for Opera Mail posting on Gmail’s Buggy IMAP Implementation, to Matt Cutts’ post on How to back up your Gmail on Linux in four easy steps, to LifeHacker’s post on Back up Gmail on Linux with Getmail, to Wired’s recent wiki entry on Make a Local Backup Of Your Gmail Account, yet there seems to be no single definitive source on how to pull your mail down and keep your labels.
So here is what we have done to solve this problem:
- Use getmail - this has been the best archiver we have run across. Other applications (isync, OfflineIMAP, Fetchmail, etc.) probably do a decent job, but getmail is still the best in my view. There are also other hacks, such as using Mail.app to sync the Gmail IMAP directory and then converting emlx to maildir, or doing the same with Thunderbird and mbox, but we wanted something a little more straightforward. Occam’s razor, right?
- Install getmail - On my dev machine, I used MacPorts (port install python25; port install getmail) to install the latest getmail, which depends on Python 2.5. Once that was done, I set up the getmailrc config file and fired off an attempt using SimpleIMAPSSLRetriever… which failed because the newly installed Python lacked SSL support. I had to go back and install Readline (port install py25-readline), then SSL support for Python (port install py25-socket-ssl).
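Before running getmail, it can save a round trip to check whether the interpreter has SSL support at all, which is what tripped us up above. The sketch below assumes a modern Python 3 rather than the MacPorts Python 2.5 from this post. It relies on the fact that imaplib only defines IMAP4_SSL when Python was built with SSL; the function name has_imap_ssl is just an illustrative choice.

```python
import imaplib


def has_imap_ssl():
    """Return True if this interpreter can open IMAP-over-SSL connections.

    imaplib only defines the IMAP4_SSL class when Python was built with
    the ssl module, so a missing attribute means there is no SSL support.
    """
    return hasattr(imaplib, "IMAP4_SSL")


# A missing SSL module is exactly what made SimpleIMAPSSLRetriever fail above.
status = "found" if has_imap_ssl() else "MISSING"
print("IMAP SSL support:", status)
```

If this reports MISSING, install or rebuild Python with SSL support before going any further with getmail.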
- Patch Python - There is a malloc bug in imaplib when fetching large messages over SSL. So open imaplib.py in your Python lib directory (in my case /opt/local/lib/python2.5/) and replace:
data = self.sslobj.read(size-read)
with
data = self.sslobj.read(min(size-read, 16384))
so that each SSL read requests at most 16,384 bytes (16 KB) instead of the entire remaining message in one call.
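To see why the one-line patch helps, here is a standalone sketch of the same kind of read loop, assuming Python 3 and using an in-memory stream in place of the SSL socket. read_exact and RecordingStream are hypothetical names for illustration, not imaplib code: the loop keeps asking for whatever is left, but never more than 16,384 bytes per call, so a large message arrives in bounded pieces instead of one huge request.

```python
import io

MAX_CHUNK = 16384  # same cap as the patched imaplib line


def read_exact(stream, size, max_chunk=MAX_CHUNK):
    """Read exactly `size` bytes from `stream`, requesting at most
    `max_chunk` bytes per read() call."""
    parts = []
    read = 0
    while read < size:
        data = stream.read(min(size - read, max_chunk))
        if not data:
            raise EOFError(f"stream ended after {read} of {size} bytes")
        parts.append(data)
        read += len(data)
    return b"".join(parts)


class RecordingStream(io.BytesIO):
    """BytesIO that remembers how many bytes each read() asked for."""

    def __init__(self, payload):
        super().__init__(payload)
        self.requests = []

    def read(self, size=-1):
        self.requests.append(size)
        return super().read(size)


# Simulate fetching a 40,000-byte message body.
stream = RecordingStream(b"x" * 40000)
body = read_exact(stream, 40000)
print(len(body), stream.requests)  # prints: 40000 [16384, 16384, 7232]
```

Without the min(), the first call would ask for all 40,000 bytes at once; with it, no single request exceeds 16 KB, however large the message.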
- Configure getmail - Now that most of the fun is taken care of, we need to set up a configuration file for getmail (~/.getmail/getmailrc) and create the proper local destination. First, the getmailrc file:
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
mailboxes = ("[Gmail]/Starred",)
username = username@yourdomain.com
password = xxx

[destination]
type = Maildir
path = ~/Maildir/

[options]
verbose = 2
message_log = ~/.getmail/gmail.log

First of all, we are using IMAP to retrieve mail, because POP has a limit of 99 messages per access and that would take forever.
Second, we are using the Maildir format for the destination, so we need to make sure the target directories have been created (~/Maildir/cur, ~/Maildir/new, ~/Maildir/tmp).
Third, we need to specify which mailbox or mailboxes to download; otherwise getmail defaults to the INBOX.
Fourth, the mailboxes value needs a trailing comma. getmail expects this option to be a Python tuple, and in Python a one-element tuple must be written with a trailing comma: ("[Gmail]/Starred",) is a tuple, whereas ("[Gmail]/Starred") is just a string.
Fifth, we need to know how Gmail names its IMAP folders in order to pull down individual ones. Non-label folders (Starred, Sent Mail, Drafts, etc.) sit under the [Gmail] prefix, e.g. “[Gmail]/Starred” as in the config above, while labels are accessed directly by name. For example, the label “Important Project” would have this in the config:
mailboxes = ("Important Project",)
- Download your Gmail - I downloaded every Gmail folder/label into its own directory so I could import each one into dovecot IMAP without hassle. For each label, that meant changing the mailboxes option in getmailrc, running getmail, renaming ~/Maildir to the label/directory name, recreating an empty ~/Maildir (with cur, new and tmp), then rinsing and repeating.
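The edit, run, rename cycle can be scripted. The sketch below, assuming Python 3, writes one getmailrc per label, each with the trailing-comma tuple and its own pre-created Maildir (cur, new, tmp), so every label lands in its own directory without any renaming. The helper names, the folder-naming rule and the Maildir-<label> layout are all hypothetical; username and password stay as the placeholders from the config above.

```python
import ast
import os
import tempfile

# Mirrors the getmailrc above; username/password are placeholders to fill in.
RC_TEMPLATE = """\
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
mailboxes = {mailboxes}
username = username@yourdomain.com
password = xxx

[destination]
type = Maildir
path = {maildir}

[options]
verbose = 2
message_log = ~/.getmail/gmail.log
"""


def mailboxes_value(name):
    """Format one mailbox as a one-element tuple literal, e.g. ("Starred",)."""
    if '"' in name:
        raise ValueError("mailbox names containing double quotes are not handled")
    return '("%s",)' % name


def folder_name(label):
    """Hypothetical rule turning a Gmail folder/label into a directory name."""
    return label.replace("[Gmail]/", "").replace("/", "_").replace(" ", "_")


def ensure_maildir(path):
    """Create the cur/new/tmp subdirectories a Maildir destination needs."""
    for sub in ("cur", "new", "tmp"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)


def write_rc(mailbox, maildir, rc_path):
    """Create `maildir` and write a getmailrc that pulls only `mailbox` into it."""
    maildir = os.path.abspath(os.path.expanduser(maildir))
    ensure_maildir(maildir)
    with open(rc_path, "w") as fh:
        # Trailing slash on the path, as in the hand-written config above.
        fh.write(RC_TEMPLATE.format(mailboxes=mailboxes_value(mailbox),
                                    maildir=maildir + "/"))


base = tempfile.mkdtemp()  # stand-in for your home directory in this demo
for label in ["[Gmail]/Starred", "Important Project"]:
    folder = folder_name(label)
    write_rc(label, os.path.join(base, "Maildir-" + folder),
             os.path.join(base, "getmailrc-" + folder))

# Why the trailing comma matters: without it the value is a plain string.
print(type(ast.literal_eval('("[Gmail]/Starred")')).__name__)   # str
print(type(ast.literal_eval('("[Gmail]/Starred",)')).__name__)  # tuple
```

Each generated file could then be handed to getmail in turn; getmail accepts an --rcfile option, but check getmail --help on your version for how it resolves relative rc paths.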
If dovecot turns out to be a hassle, I’ll blog about that next. Or about bricking my iPhone with the 1.2 firmware because I didn’t read the instructions (yes, I got into the iPhone developer program).
Update: Because maildir readers such as dovecot use each file’s modification time as the message’s received date, every email pulled by the method above loses its sense of time: each one appears to have arrived when getmail downloaded it. The PHP script below restores the modification times from each message’s Date header:
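<?php
/* VARS ***********************************************************/
$box  = '';                                // label/folder directory to fix
$stem = SITE_DIR.'Maildir/'.$box.'/new/';  // SITE_DIR: your base directory
/******************************************************************/

$dir_contents = scandir($stem);
if ($dir_contents === false) {
    exit("Cannot read $stem\n");
}
foreach ($dir_contents as $item) {
    // skip directory entries and Finder metadata
    if (in_array($item, array('.', '..', '.DS_Store'))) {
        continue;
    }
    $file    = $stem.$item;
    $content = file_get_contents($file);
    $date    = extractText($content, "\nDate: ", "\n");
    $utime   = ($date === false) ? false : strtotime($date);
    if ($utime === false) {
        continue;  // no usable Date header: leave the file alone
    }
    // set the file's modification time to the message's Date header
    touch($file, $utime);
}

// Return the trimmed text between $start and $end, or false if $start is absent.
// stripos makes the match case-insensitive, so "date:" headers are found too.
function extractText($content, $start, $end) {
    $startpoint = stripos($content, $start);
    if ($startpoint === false) {
        return false;
    }
    $startpoint += strlen($start);
    $endpoint = strpos($content, $end, $startpoint);
    if ($endpoint === false) {
        $endpoint = strlen($content);
    }
    return trim(substr($content, $startpoint, $endpoint - $startpoint));
}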
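For anyone without PHP handy, here is a rough Python 3 equivalent of the script above. It is an alternative sketch, not the script we ran: it reads the Date header with the standard email package (whose header lookup is case-insensitive, which also covers the lowercase date: headers mentioned in the comment below), parses it with email.utils.parsedate_to_datetime, and sets the modification time with os.utime instead of shelling out to touch. The function name restore_mtimes is hypothetical.

```python
import os
import email.parser
import email.utils


def restore_mtimes(maildir_subdir):
    """Set each message file's mtime from its Date: header.

    Returns the number of files updated. Dotfiles (such as .DS_Store),
    subdirectories and messages without a parseable Date header are skipped.
    """
    parser = email.parser.BytesHeaderParser()
    updated = 0
    for name in os.listdir(maildir_subdir):
        if name.startswith("."):
            continue
        path = os.path.join(maildir_subdir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as fh:
            headers = parser.parse(fh)  # parses only the header block
        date = headers.get("Date")  # header lookup is case-insensitive
        if date is None:
            continue
        try:
            when = email.utils.parsedate_to_datetime(date)
        except (TypeError, ValueError):
            continue  # malformed Date header
        atime = os.stat(path).st_atime  # keep the access time unchanged
        os.utime(path, (atime, when.timestamp()))
        updated += 1
    return updated


# Example: restore_mtimes(os.path.expanduser("~/Maildir/new"))
```

Unlike the original PHP, files with a missing or unparseable Date header are skipped rather than stamped with the Unix epoch. A Date header with a -0000 offset yields a naive datetime (per the email.utils documentation), which timestamp() interprets as local time.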
1 Comment so far
Hi!
Thanks for your article!
Maybe you should use the stripos and strripos functions in the extractText function, because sometimes mails have “date:” rather than “Date:”.
Steph