Saturday 8 June 2013

Method for Archiving Gmail

My sister-in-law runs a fashion business and as a consequence sends and receives a lot of big emails. Even the 25GB mailbox allowance of Google Apps for Business is used up within a couple of years. I needed a way to archive emails out of Gmail and onto local storage. Here's how I do it:

  1. Install the getmail tool.  I'm running it under Cygwin on Windows, but it should run equally well on Linux.  Here are the Cygwin steps:
    1. Run the Cygwin installer and make sure that you have python installed
    2. Make a directory for the getmail install: mkdir getmail
    3. cd getmail
    4. Download the latest release: wget http://pyropus.ca/software/getmail/old-versions/getmail-4.43.0.tar.gz
    5. tar -zxvf getmail-4.43.0.tar.gz
    6. cd getmail-4.43.0
    7. python setup.py install
  2. Log into the Gmail web UI of the account you want to archive
  3. Search for the emails that you want to archive.  E.g. to find all the emails sent or received in 2011 search for:  "before:2012/01/01 after:2010/12/31" (no space after the colons)
  4. Now we need to give all the search results a unique label.  The trick here is that you need to label all the search results not just the first page, so:
    1. In the Gmail web UI click the "Select all" check box
    2. When you do this, Gmail will say "All 20 conversations on this page are selected. Select all conversations that match this search".  Click on the link to make sure you select all of them
    3. Click on the label icon
    4. Fill something unique in the "Label as:" box, e.g. "Mail2011"
    5. Click the "Create New" link
    6. The "New Label" dialogue box will pop up.  Click "Create"
    7. A warning box will explain that this will affect all conversations in the search.  Click "OK"
    8. Wait a few seconds until the "Loading..." box goes away, to give Google time to make all its changes
  5. Next we need to set up a config file that will let you download all the selected emails
    1. First make a new directory for the archive: mkdir emailarchive
    2. cd emailarchive
    3. Then edit a new config file: vi getmail.conf
    4. Adapt my config to your use:
      [retriever]
      type = SimpleIMAPSSLRetriever
      server = imap.gmail.com
      username = user@gmailaccount.com
      password = topsecret
      mailboxes = ("Mail2011",)
      port = 993


      [destination]
      type = Mboxrd
      path = /home/dave/emailarchive/Mail2011.mbox


      [options]
      received = false
      delivered_to = false
      read_all = false
      verbose = 1

  6. The next step is a gotcha: the archive file has to exist before you try to run the getmail program, so create it with: touch Mail2011.mbox
  7. You are ready to run the program now with:
    getmail --getmaildir /home/dave/emailarchive --rcfile getmail.conf
  8. Some time later you will have all the emails downloaded to a local mbox format mail store
  9. Finally, go back into Gmail, select the label you created earlier and delete all the emails.  Note that Gmail won't free up your space until the emails are emptied from your trash folder
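
If you archive a new label every year, steps 5 and 6 can be wrapped in a small shell function. This is only a rough sketch of the steps above, not part of getmail: the function name prepare_archive and the GMAIL_USER and GMAIL_PASSWORD environment variables are names I made up for illustration, and the fallback values are the same placeholders used in my config. Pass an absolute directory, as in the config above.

```shell
#!/bin/sh
# Sketch only: prepare_archive, GMAIL_USER and GMAIL_PASSWORD are
# hypothetical names, not something getmail itself provides.

prepare_archive() {
    label="$1"   # Gmail label to archive, e.g. Mail2011
    dir="$2"     # absolute path of the archive directory

    mkdir -p "$dir" || return 1

    # Write the same config as in step 5, with the label filled in.
    cat > "$dir/getmail.conf" <<EOF
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = ${GMAIL_USER:-user@gmailaccount.com}
password = ${GMAIL_PASSWORD:-topsecret}
mailboxes = ("$label",)
port = 993

[destination]
type = Mboxrd
path = $dir/$label.mbox

[options]
received = false
delivered_to = false
read_all = false
verbose = 1
EOF

    # The config holds a password, so keep it readable by the owner only.
    chmod 600 "$dir/getmail.conf"

    # Step 6: the mbox file has to exist before getmail runs.
    touch "$dir/$label.mbox"
}
```

After exporting GMAIL_USER and GMAIL_PASSWORD, calling prepare_archive Mail2011 /home/dave/emailarchive leaves you ready to run the getmail command from step 7.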

If you need help reading mbox format mail store files, take a look at these instructions for several different mail reading clients.
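
For a quick sanity check without a mail client, you can count the messages in the archive from the shell and compare the number with what Gmail shows for the label (bearing in mind that Gmail may show conversations rather than individual messages). This is a sketch that assumes the file really is in mboxrd format, as the Mboxrd destination above should produce; count_mbox_messages is just a name I picked.

```shell
#!/bin/sh
# Sketch only: count_mbox_messages is a hypothetical helper name.

count_mbox_messages() {
    # Each message in an mbox file starts with a "From " separator line.
    # In mboxrd, body lines beginning with "From " are quoted as ">From ",
    # so they are not counted here.
    grep -c '^From ' "$1"
}
```

For example, count_mbox_messages /home/dave/emailarchive/Mail2011.mbox prints the number of messages in the 2011 archive.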

Thanks to Matt Cutts for outlining the general approach to Gmail archiving that I have adapted here.


