2010-10-20

Creating an Encrypted Subversion Repository on Linux

Why?

I have my source code on a server in the cloud. That makes perfect sense - I want to have my code accessible from everywhere, even if the only person accessing the repository is my own self. Access is secured using SSH with PKI - only whoever has the private key can access the system, no passwords allowed.

While I feel pretty secure about access, it bugs me that the source code is not encrypted at rest. Whoever gains access to a copy of the repository (for instance, from a backup) has the code in cleartext. That's absolutely not good. On the other hand, setting up an encrypted repository is too much of a hassle, and I couldn't find anything online about how to do it.

One rainy day (yes, we have those in Southern California, and we look at them the way people in Hawaii look at snow) I decided I'd had enough. I wasn't going to take it anymore. I had to do it.

What Not?

When setting up my encrypted repository, I wanted to avoid the most common mistake: a repository that can be decrypted from the machine itself. You see, the problem with most drive-encryption software is that it stores the key with the hardware. If you do it that way, the encryption is pretty pointless.

You could set up encryption so that only people with login access to the machine (who also know the password) can decrypt the repository. This approach works well for encrypted home directories, but my source code access uses no password at all.

So, whatever I did, I needed to pass the credentials (or the path to them) with the request itself. The request would provide location and password, and that would be sufficient to unlock the encrypted file.

How?

My ideal scenario was simple: a Truecrypt container on the server with SVN (Subversion) access. I base the whole description on this combination, and the peculiarities of both come into play at several points.

I chose Truecrypt over, say, CryptFS because the container is a single file. It is completely opaque to the intruder, and I can even set it up so that it's not clear the file is a Truecrypt container at all. (For instance, I could call it "STARWARS.AVI" and make people think it's a bootleg copy of a movie.) With most crypto filesystems, encryption is per file, which means the file names and the existence of individual files (and directories) are visible.

I chose Subversion over, say, git because... well, because my repo is already in Subversion, and because SVN has this neat remote "protocol," which consists of creating a remote, secure connection to the server and executing the commands on the remote end, without a special network protocol involved.

Tricks

The first part is very, very simple: after installing truecrypt and subversion (as well as ssh, which you should already have), you create a Truecrypt container. Choose a file container, which we will later format with a Linux (ext3) filesystem, and make it big enough to fit the largest size your repository will ever grow to.

To create the container, simply type truecrypt -t -c on the server. That starts the interactive dialog that creates the encrypted file. Give it any name (I assume here you called it STARWARS.AVI) and location (it doesn't really matter). The defaults are fine for everything; you'll provide the file name, the file size, and "none" for the filesystem. When it comes to the password, choose something really, really good.

[Note: volume creation on the client has the advantage of being able to use the graphical interface, which helps a ton.]

Since we didn't select a filesystem, we have to create one. To do that, we need to learn a little about truecrypt internals - but we'll use that knowledge in a moment for the actual subversion trick, so it's not too bad. Here goes: truecrypt creates a series of "devices" that the system uses to talk (indirectly) to the encryption core. That's done because truecrypt lives entirely in user space, so the encryption is not available at the kernel level.

The devices are called mappers and reside in /dev/mapper/truecrypt*. To Linux, they behave like regular block devices (read: like a hard drive). Once you have a mapping in place, you can do anything with it that you would normally do with a drive, including formatting.

To map, you invoke truecrypt with the container name and the no-filesystem option:

truecrypt --text --filesystem=none --keyfiles= --protect-hidden=no STARWARS.AVI

(Long options this time to save the pain of explaining; truecrypt will prompt you for the password.)

Now you should have a mapper mounted - if you type mount, you should see one line that starts with truecrypt. Remember the number after aux_mnt (typically 1 on your first try).

Now we create the filesystem:

mkfs.ext3 /dev/mapper/truecrypt1

(You may have to be root to do that - in which case add "sudo" at the beginning of the line.)

AutoFS

The "evil" trick that we are going to use next is a dynamic filesystem mounted automatically. AutoFS is a package that allows you to declare that certain directories on your system are special and access to them requires special handling. For instance, I use autoFS to connect to my SSH servers. The directory /ssh on this machine is configured to open an sshfs connection to whichever server I name. So, if I write:

ls /ssh/secure.gawd.at

I get the listing of the home directory on the server secure.gawd.at. (The server doesn't exist, but that's beside the point.)

In this case, we will use what is called an executable map: autoFS invokes a script that you name and takes its output as the configuration options it needs to mount the filesystem. In our case, the script will first open the truecrypt container and then pass the options for the underlying filesystem back to autoFS.

Once more, the whole chain: ls /svn/magic-name -> autoFS -> map script -> truecrypt mapper -> mount options -> mount

I wrote the script in Tcl, which is still my favorite shell to use. It requires nothing but Tcl and Tcllib, both packages available in pretty much all Linux distributions (although the Debian people, noted bigots, require you to specify a version number). You can download it here.
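To illustrate what such an executable map does, here is a minimal sketch as a shell script (not the author's Tcl original). The /etc/auto.enc.d layout matches the setup described below; the fixed truecrypt1 mapper slot and the --non-interactive flag are simplifying assumptions - a real script has to discover which slot truecrypt actually used:

```shell
#!/bin/sh
# Sketch of an executable autofs map. autofs invokes it with the requested
# key, e.g. "repo@secure", and expects mount options and a location on stdout.
key="$1"

# Split "name@password" on the last "@".
name="${key%@*}"
pass="${key##*@}"

# The symlink in /etc/auto.enc.d points at the actual container file.
container=$(readlink "/etc/auto.enc.d/$name") || exit 1

# Open the container as a bare mapper device; no filesystem mount yet.
truecrypt --text --non-interactive --password="$pass" \
  --filesystem=none --keyfiles= --protect-hidden=no "$container" || exit 1

# Tell autofs to mount the mapper as ext3 (slot 1 assumed for simplicity).
echo "-fstype=ext3 :/dev/mapper/truecrypt1"
```

The real script also has to handle keys without an "@" and clean up the mapper when autofs expires the mount.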

Copy the script into the /etc directory. If you didn't already, install autofs on your machine. Now edit the /etc/auto.master file and add a line for this new file. Let's call it auto.enc and link it to the /svn directory.
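The auto.master line can look like this (a sketch, using the /svn directory and the auto.enc map name chosen above):

```
# /etc/auto.master
/svn  /etc/auto.enc
```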

A little magic setup: you have to create the directory (sudo mkdir /svn); then you have to make the map executable (sudo chmod +x /etc/auto.enc); finally, restart autoFS so that it picks up your map.

Now we have to tie containers to map names. To make my life easier, I chose to use a .d directory in /etc. If you create a directory /etc/auto.enc.d (sudo mkdir /etc/auto.enc.d), then all the symbolic links inside it will be considered maps. The name of the link is the name of the map, and the location it points to is the container file.

If you want to use the container /enc/data/STARWARS.AVI under the symbolic name repo, then you would do this:

ln -s /enc/data/STARWARS.AVI /etc/auto.enc.d/repo

Grand Finale: Credentials

Now the big question from the beginning: how does the map script find the password? The way I solved it was to add the password to the name of the directory. Crazy guy!

If you followed the instructions above, then whenever you access /svn/anything, the map is consulted. The map script looks at the key passed in and searches for an "@" sign, which is treated as the separator between map name and password. So, if you wanted to access the repository repo with the password "secure," you would type in

ls /svn/repo@secure

The script would mount the mapper and tell autoFS to mount the directory as an ext3 file system.

But, but, but!!! You are passing the credentials in cleartext! That's bound to be terribly bad!!!

Well, yes and no. The transmission between server and client is SSH, so nobody can see the password in the clear. On the server, the password is in the clear, but it is not logged anywhere (unless you tell SVN to log everything). On the other hand, someone who happens to be on the server when a request comes in can also look at the decrypted data, since it is mounted for that period of time. So if an attacker can look at the password, the attacker might as well look at the files it protects.

Epilogue

Let's just assume you got everything working - you should now be able to create a repository. One caveat: svnadmin operates on local paths only, so you run it on the server (called svn here), where the autoFS magic turns the path into a mounted container:

ssh svn "svnadmin create /svn/repo@secure"

Now you should notice that the file STARWARS.AVI has been modified. If you mount it using truecrypt, you will see a series of files in there - files that SVN will continue using from now on whenever you access the encrypted repository. Hooray!

Notes and Addenda

1. I set the expiration timeout for the directories low, but not incredibly low - at ten seconds. You do that by specifying the "timeout" parameter in the auto.master file. That way, I can do an svn update; svn commit cycle without requiring a second mount. You can play with the parameters yourself.
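For instance, with the /svn map from above, the auto.master entry becomes:

```
# /etc/auto.master
/svn  /etc/auto.enc  --timeout=10
```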

2. The encryption scheme could easily be improved by using keyfiles instead of passwords. To do so, you would place a keyfile in a remote location (a web server, maybe) and have the script fetch that resource, decrypt it using the password provided, and then use the result as the keyfile. The advantage is that an attacker would need three pieces of information - the truecrypt container, the encrypted keyfile, and the password to the keyfile - to do anything.
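A sketch of that keyfile scheme in shell - the URL, the file names, and the use of openssl for the keyfile encryption are all my assumptions, not part of the setup described above:

```shell
# Fetch the encrypted keyfile from a remote location (hypothetical URL).
wget -q -O /tmp/keyfile.enc "http://example.com/keys/repo.enc"

# Decrypt it with the password that came in with the request ($pass).
openssl enc -d -aes-256-cbc -pass "pass:$pass" \
  -in /tmp/keyfile.enc -out /tmp/keyfile

# Open the container with the keyfile instead of the bare password.
truecrypt --text --non-interactive --keyfiles=/tmp/keyfile \
  --filesystem=none --protect-hidden=no "$container"

# Wipe the plaintext keyfile as soon as the mapper is up.
shred -u /tmp/keyfile /tmp/keyfile.enc
```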

3. Disclaimer: this setup works for me, but that's because I munged around for dozens of hours until I figured out all the options and configuration items necessary. If it doesn't work for you, don't sue me. If it works, but it stops working after a while and your brilliant source code is lost forever, don't sue me. Proceed with caution, always make backups, and never rely on the advice of strangers. Mahalo.

2010-09-25

The YouTube Conspiracy in Abby/CClive

There is a command line utility available on all Ubuntu derivatives called cclive. If you install it (sudo apt-get install cclive), you can give it a YouTube URL and it will download the video on it. I love using it for backup purposes - I upload a video from the camera, perform all my changes on YouTube, and then cclive the outcome for posterity. Just in case YouTube "forgets" about my latest cam ride.
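Usage is as simple as it sounds (VIDEO_ID is a placeholder, not a real video):

```shell
# Download the video behind a YouTube URL into the current directory.
cclive "http://www.youtube.com/watch?v=VIDEO_ID"
```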

There is also a GUI for cclive, called Abby. Abby is more than just a frontend for cclive, though: it also helps with pages that have multiple videos on them - playlists or RSS feeds. Abby is hosted on googlecode and written in Qt/C++.

I started surfing, so I decided to scour YouTube for surf instruction videos. There is a set of 12 introductory videos by two Australian pros on there, so I decided to download them to take them to the beach. My N900 gladly displays YouTube videos on its gorgeous screen. Unfortunately, though, the T-Mobile coverage is fairly bad, so a downloaded video was the only real option.

BPM Detection in Linux

When you do a lot of cardio workouts, it is really good to have music that beats to your rhythm. The pulsating sound gives you energy and pace, both excellent ways to make a good workout great and to make time pass faster. When I get new music on my player (a Sansa Clip+ - the almost perfect player for a Sporty Spice), life is good. An hour of running is gone before I even know it, and when I look at the calorie count, I feel like Michael Phelps freshly crowned with Olympic gold.

At first I would stumble across music that matched the pace. I do lots of different kinds of workouts, so certain songs would work with different segments. I have a running pace, a hiking pace, a mountain climbing pace, a cycling pace, a weight lifting pace, a spinning pace, etc. I would get used to certain songs in certain parts, but that would get old, fast.

Then I got used to counting beats. I would look at the big clock in the gym and count beats for 15 seconds. That would give me a general idea of what I could use a song for. My favorite spinning song, for instance, was "Hazel Eyes," so anything that had the same beat count would be a good replacement.
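The arithmetic behind the clock trick: beats counted over 15 seconds, times four, gives beats per minute. As a sketch:

```shell
# Count beats against the gym clock for 15 seconds, then scale to a minute.
beats_in_15s=32
echo "$((beats_in_15s * 4)) BPM"   # prints "128 BPM"
```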

Then I got bored with this random approach and realized I had a library of hundreds of CDs ripped onto my computers. I just had to detect the beat automatically, and then I could search for a specific BPM count and get all possible matches.
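One way to do the detection - my choice of tool here, not something prescribed above - is the bpm analyzer from the bpm-tools package, which reads raw mono float samples, so sox can feed it a decoded track:

```shell
# Decode the track to raw 44.1 kHz mono float samples and pipe it into the
# "bpm" analyzer from the bpm-tools package; it prints the detected tempo.
sox track.mp3 -r 44100 -e float -c 1 -t raw - | bpm
```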

2010-09-24

The Rise and Fall of Internet Browsers

It amazes me how, since their very inception, Internet browsers have been subject to periodic meteoric rises and subsequent falls. They do so a lot more than other kinds of software, like operating systems or word processors. It seems people are much more willing to throw out their browsers than virtually any other kind of software.

It all started with the venerable grandfather of them all, Netscape Navigator. Marc Andreessen, the ur-type of the "smart kid with an idea brighter than even he thinks it is who goes on to believe he's the smartest person on the planet because he got lucky with his idea", and his team created the software and put it out. Instant success, huge company, enormous IPO. But it was a horrible piece of software, and it got more and more horrible as time wore on.

Netscape was mired in the conflict of the dot-com days: how do you monetize a piece of software without charging the user? It would take almost ten years for Google to show us how, but back in the day, it meant shareware. Netscape was selling servers, and the browser was a loss-leader. It got all the attention that a loss-leader gets - it got more and more bloated, supporting more and more reasons for people to upgrade their servers (and not buy them from anyone else), but in the process it got slower and slower.

Finally, in one of his last acts of Imperial Fiat, Bill Gates decreed that the Internet was not a fad and that Microsoft needed to get in on the action. A few years and a ton of lawsuits later, Netscape was dead (or bought by AOL, which is pretty much the same thing) and Internet Explorer was the only dominant figure in the landscape.

Then IE started showing problems. Not bloat and slowness, although those became more apparent. No, it was security that became the big issue. IE's security model was cooperative and not designed for the abusive exploits of Internet Mafia conglomerates. As a result, surfing certain types of "shady" sites would invariably land your machine in zombie territory, or at least get you a virus infection or two.

When the Mozilla Foundation announced it was working on a brand-new browser named Firebird, even the most hopeful were not easily convinced. Navigator was a monster, written by people who needed to get things done, no matter how unmanageable the result, and Firebird would have to be a rewrite from scratch to compete.

But it did. Renamed Firefox (sadly), it began a march of conquest that carried it to the top spot in the browser stats. Nowadays, almost half of all Internet users choose Firefox, while IE holds only a little more than a third of the market.

Firefox was helped by a series of advantages: it was much faster than IE; it was far more secure than IE; and it had an extensive extension system with loads of useful things - useful for users, whom IE had traditionally ignored in favor of usefulness for companies. Only lately has Firefox started to show weakness, and from the most unlikely of sources.

I invested much time in my Firefox setup. I have the extensions I want, synchronized across my two dozen machines (don't ask) using a sync extension. I have Firefox customizations for nearly everything, and I write my own Greasemonkey scripts. Yet, I started using Google's Chrome browser (Chromium on this laptop). Why? Because Chromium uses "one process per tab".

Why does it matter? Why is it so important to me that each tab have its own process? The answer is Flash. You see, Flash is a giant memory leak. Whenever I land on a page that has Flash on it, memory gets allocated (by the Flash plugin) and never released. After a few hours of heavy browsing, my browser slows down to a crawl. Another few hours, and it's completely unusable. After a day, I have to restart it, and the process of freeing up memory may take upwards of 10 minutes.

Flash on Linux, of course, is an afterthought. The way Adobe treats its Linux users, though, shows all the weaknesses of the technology in a merciless way. First, there is the closed nature of Flash: Linux users cannot suggest modifications or fix bugs, as they do with other software, because the plugin is closed.

Then, there is the "one-size-fits-all" approach of the plugin. I find Flash used for controls on web pages (especially ones that require notifications), for e-cards, especially of the inspirational or funny kind, and for online videos. Those three use cases are totally different, and using the same software for each of them is only in the interest of the maker of the software, Adobe, not in the interest of the user.

So, for now I am forced to leave Firefox through no fault of its own and adopt a different (and very capable) browser, simply because I can't get Flash to behave in FF.

2010-09-14

Solar-friendly Gadgets

Living in San Diego, you spend a lot of time at the beach. There, you have no outlets to recharge things, but plenty of sun. Ideally, you would use a solar charger to recharge all your devices while they are being used.

There are lots of chargers on the market. Solio makes very popular ones, but they are by far not the only vendor. Basically, all of them in one way or another let you connect a variety of devices to a solar panel that outputs the correct power (voltage).

When buying a charger, you should look mainly for one thing: that the output is USB. Dedicated plugs are nice, but ultimately you'll end up connecting most devices via USB, so anything that requires an adapter to get to USB means carrying something extra. (Solio, for instance, mysteriously has an output that looks like USB but isn't. You need an adapter to translate to USB, which is really stupid. I assume it's done to give you something else to lose and buy.)

The devices, though, are a little more challenging, since there is more than one thing you have to look at.

2010-08-22

A Tale of Broken Packages

Now, you are probably going to think this is about mishaps with a shipping company. Boohoo! My UPS package arrived damaged... Sad face... None of that: UPS is as reliable as ever, as are the other shipping companies (FedEx, DHL, USPS, what have you). No, the packages I am talking about are Linux/Ubuntu packages.

One of the major advantages of using a distro like Ubuntu is that someone else figures out what works with what and makes packages available for you to download and install semi-automatically. It's really easy and a lot more fun than the super-crappy way you install software on Windows. Instead of downloading an installer that tells you to shut down all applications before starting and then goes through a hundred screens of questions that you really don't care about (Where do you want the software installed? Do you want the software to report usage statistics?), in Ubuntu you just say, "I want Amarok," and there it is.

Sure, the whole system could use improvements. For instance, it would be more than just nice to add user ratings to the packages, so that you can see which of the zillion alternatives is rated best. Also, it would be nice to know the size of the download before you get started. Finally, it would be great if there were a meta-server listing the major available repositories, so you could just check the box next to the ones you'd like.

What I am going to be complaining about here, though, is more basic. I expect the download of a piece of software to give me a package that (a) doesn't break my system, (b) doesn't cause security problems, and (c) does something useful. While (a) and (b) have not been violated yet, I just got a package that clearly violates (c) big time.

2010-07-16

Rethinking Traditions

A confession: I don't like signing emails. I find it stupid. You know the message is from me; after all, your email client tells you that before you even open it. What's the point of a salutation and a signature? What does a "Sincerely" at the bottom tell you that you didn't already know?

Turns out there was a good reason for signing. It comes from the days of snail mail, before there was such a thing as a typewriter, when the presence of a signature was the only certain way to know who had written you a letter. I remember the days when you'd get one and turn it over to read who sent it. If I had known I would feel old just for admitting I had ever read a hand-written letter, I would have believed everything science fiction told me.

But now I write emails, and I have gotten used to doing a lot of things you couldn't really do with paper. For instance, I reply inline - breaking up long messages and answering a question right beneath it. Or I make creative use of the Subject: header. Or I BCC: myself to have a record of sending the message.

More often than not, I won't sign an email. It's not that I forget, and it's not that I am too lazy. It's that I find it a truly pointless activity. I will typically close my message with a friendly greeting to the family or coworkers, or a wish for something fun, but I rarely sign. It's a tradition we keep up just because we don't think about it.