CopyFS is now even cooler

[UPDATE: There is a newer version of CopyFS available]

Some backstory: I used to write a LOT of code in the C programming language. I used to be actively involved in numerous open-source programs that used C. Hell, I even co-authored a book that was completely about C code, written in the format of the much-loved classic Lions’ Commentary on UNIX, and available in 6 languages. That was “back then”. I’ve gotten spoiled with Perl, Java and C++ over the last half-decade and haven’t written more than trivial patches in C in … 6 years. I still read C fluently… But not so much on the writing.

So going [almost] back to my programming roots, I took on a project that forced me to dive head-first back into C. And not just a wussy application… No sir. A filesystem. Some guys beta test backup software, and some guys use their laptop as a live debugger for filesystem enhancements. Before I get into the details of the software, let me say that I’ve finally popped back into the world of seeing C solutions, and knowing exactly how to get what I want out of the code… I’m not at the level I’m at with Perl, where I can literally have a conversation with a person in the language itself, but my C-fu is, again, strong.

So enough about me. CopyFS is a FUSE filesystem that supports file versioning. v1.0 is a pure copy-on-write filesystem. Anytime you make a change, it makes a copy. If you change the metadata of a file, it makes a copy. You can list all of the versions of a file, and make any version the “current” version very easily. It really is a great tool.

I’ve been using CopyFS for a while now in various venues, the most active of which is my Eclipse development tree, where nearly all of my Perl, Python, C++, Java… and now C code is developed. It allows me to have my own little revision control system on my filesystem without the complicated mucking around with CVS/SVN/git/etc. repositories.

After using it for as long as I have, the lack of certain features became a little painful. As CopyFS is open source software, I could just complain that the software doesn’t do what I want, write a snotty post to a mailing list and assert the entitlement that others seem to believe they have just because they downloaded a piece of software… Or I could enhance it. I could solve my own problems.

Problem 1: Text Diffs

By far, the largest problem I was having was determining WHICH of the 212 versions of a source file was the one I wanted to revert to. Ok, I broke something in the latest version, where’s one that doesn’t have changes to that area? Text diffs between versions answer that, and adding them took no core changes: just 90ish lines of additions and a couple of changed lines in the fversion userspace application. All Perl.
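
As a rough illustration of the idea (this is not the fversion code, which is Perl), here is a minimal Python sketch that diffs two versions and reports which versions touched a given area of a file. The function names and the in-memory `(label, text)` version list are assumptions made up for the example; CopyFS stores its versions on disk in its own layout.

```python
import difflib


def diff_versions(old_text, new_text, old_label="old", new_label="new"):
    """Return a unified diff between two versions as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    ))


def versions_changing(versions, needle):
    """Return labels of versions that changed a line containing `needle`.

    `versions` is a hypothetical list of (label, text) pairs, oldest first.
    Each version is compared against the one before it.
    """
    hits = []
    for (_, prev_text), (label, text) in zip(versions, versions[1:]):
        old_lines = prev_text.splitlines()
        new_lines = text.splitlines()
        matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            # Lines removed from the old version plus lines added in the new one.
            changed = old_lines[i1:i2] + new_lines[j1:j2]
            if any(needle in line for line in changed):
                hits.append(label)
                break
    return hits
```

Any version that does not show up in the result of `versions_changing` left that area alone, which makes it a candidate to revert to.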

Problem 2: Way too many versions

As I mentioned previously, any change to a file will trigger a copy. I have some files with hundreds of versions. For some files, this is great. For others, a pre-linked object file for example, this is unnecessary. While the sum of the code additions necessary to add purging to CopyFS was a mere 200 lines, it took me thousands of lines of C over the past few weeks to get to that 200 lines. Using fversion, you can now tell the CopyFS daemon you want to purge the oldest N versions, or all versions, of a given file.
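
The selection logic behind "purge the oldest N" can be sketched in a few lines of Python. This is only a model over a list of version numbers, not the daemon's C code, and it makes one assumption the post does not spell out: the newest version is always kept, even when purging "all" versions. The function name is hypothetical.

```python
def purge_oldest(versions, count=None):
    """Split version numbers into (kept, purged).

    count=None purges every version except the newest one.
    Assumption for this sketch: the newest version is never purged.
    """
    ordered = sorted(versions)
    if not ordered:
        return [], []
    if count is not None and count < 0:
        raise ValueError("count must be non-negative")
    purgeable = ordered[:-1]          # everything except the newest version
    if count is None:
        count = len(purgeable)
    purged = purgeable[:count]        # oldest first
    kept = purgeable[count:] + ordered[-1:]
    return kept, purged
```

Asking for more versions than exist simply purges everything that is purgeable.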

In progress…

  • Another piece to the problem 1 puzzle is the ability to search your versions for a string or a pattern. I’m workingish on that. I have it workingish in a pure-Perl solution, but I know it would be better to do it in C… I just love regular expressions in Perl, and know that if I wanted anything remotely that powerful in C, I would have to include PCRE, which would bloat the project, something I’m not willing to do. It’s small and fast right now. I like it that way. This has been implemented: fversion -G pattern will now match whatever you throw at it. 100% Perl.
  • Another piece to problem 2 is the ability to purge individual versions, or version ranges. For example, I have a file that has 212 versions. I want to preserve v1.0 and v212.0. I want to purge 2-211. I can’t do that right now, so all 212 are there. Yes, I could lock v1.0 and then make a change to it, thus creating v213 which would essentially BE v1, and then purge 211 versions… But that’s not elegant, and I prefer to do things elegantly. I’m working on this now. It’s all C.
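
Both in-progress items are easy to model. The Python sketch below searches every version of a file for a regular expression and selects an inclusive range of versions to purge. It is a stand-in for illustration only: it does not reproduce what `fversion -G` does internally, and the function names and the `(label, text)` version list are assumptions.

```python
import re


def grep_versions(versions, pattern):
    """Return (label, line_number, line) for each matching line in any version.

    `versions` is a hypothetical list of (label, text) pairs.
    """
    regex = re.compile(pattern)
    matches = []
    for label, text in versions:
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                matches.append((label, number, line))
    return matches


def purge_range(versions, first, last):
    """Split version numbers into (kept, purged), purging first..last inclusive."""
    if first > last:
        raise ValueError("first must not be greater than last")
    kept = [v for v in versions if not first <= v <= last]
    purged = [v for v in versions if first <= v <= last]
    return kept, purged
```

With 212 versions, `purge_range(list(range(1, 213)), 2, 211)` keeps only versions 1 and 212, which is the case described above.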

Braindump

  • Setting a special xattr that could mark a file as “don’t copy”… Better yet mark it with a number that is the number of copies you want to keep of this file… So if it was 1, you’d always have the current version plus 1. If it was 12… You get the point.
  • Along the same line, having a mount-time option that sets the maximum number of versions kept for all files on the volume. This would be useless for my uses, but useful if you only wanted to keep versions in order to restore a mistakenly deleted file or something.
  • … same line, again. Maybe have a config file that lets you set certain file types differently: e.g. all .o files (object files) never keep copies. All .pl, .cpp, .c, .java, etc. keep all. Everything else keep 3.
  • Web interface to allow users who don’t have shell access an easy way to restore old versions (thinking of user-serviceable backups here)
  • Still need to implement directory handling/recursion for the purge.
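
The per-file-type idea from the list above could look something like this Python sketch. Nothing here exists in CopyFS; the policy table, names, and semantics are assumptions that mirror the example in the list (a limit of N means "the current version plus N older copies").

```python
import os

KEEP_ALL = None  # sentinel: never purge anything

# Hypothetical policy table, taken from the example in the list above.
RETENTION_BY_EXTENSION = {
    ".o": 0,            # object files: current version only
    ".pl": KEEP_ALL,
    ".cpp": KEEP_ALL,
    ".c": KEEP_ALL,
    ".java": KEEP_ALL,
}
DEFAULT_RETENTION = 3   # everything else: current version plus 3 copies


def retention_for(path):
    """Return how many old copies to keep for `path`, or KEEP_ALL."""
    extension = os.path.splitext(path)[1]
    return RETENTION_BY_EXTENSION.get(extension, DEFAULT_RETENTION)


def apply_retention(path, versions):
    """Split version numbers into (kept, purged) under the policy for `path`."""
    limit = retention_for(path)
    ordered = sorted(versions)
    if limit is KEEP_ALL or len(ordered) <= limit + 1:
        return ordered, []
    cut = len(ordered) - (limit + 1)   # number of oldest versions to drop
    return ordered[cut:], ordered[:cut]
```

The same `limit` value could just as well come from an xattr or a mount option; only the lookup in `retention_for` would change.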

2 Responses to CopyFS is now even cooler

  1. Pingback: CopyFS Update | Out Of The Clouds

  2. Hi,
    I have worked on EncFS (adding an encryption layer to a file system).
    I don’t know where to start with CopyFS, but let me know if I can help contribute to this work.
    I have just tried CopyFS and found that it can be very useful for software developers, since source code is automatically versioned in the background. This can help you get back to your previous changes if new changes are not working.
