Re: [LACloud-Computing] Elastic Filesystems

From: Jordan Mendler
Sent on: Monday, January 25, 2010 3:10 PM
You may also want to look at Sun's SAM-QFS. It has been around for ages, and I believe it was recently open-sourced. I haven't used it, but on paper it is supposed to handle distribution and replication and, like Lustre, is natively mountable through a kernel module. It also has HSM functionality built in.

Jordan

On Mon, Jan 25, 2010 at 3:02 PM, Jordan Mendler <[address removed]> wrote:
Darren,

You will likely be disappointed in that nothing perfect exists. A lot of it depends on the application, as several half-@$$ components exist that could potentially be hacked together.
 
HDFS - Hadoop FS

Designed for huge files. Small files are supposed to be slow. Dedicated master with no hot failover, last time I looked. The FUSE interface is also supposed to be slow. Ask Allen -- he worked with it. It appears to be pretty robust, but most users do batch processing on it rather than long-term archiving.

Lustre
 
Fastest I have used, but no real replication, so each data node is a single point of failure. Ideal for fast scratch, but I wouldn't trust my data to it long term unless you do replication outside of Lustre. Has occasional crashes and issues, but pretty robust overall (in a relative sense).
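
A rough sketch of what "replication outside of Lustre" could look like: copy everything from the scratch mount to a second location and verify each copy by checksum. This is only an illustration. The function names and the example paths are made up here, and a real deployment would more likely use rsync or a scheduler-driven job.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_tree(src_root, dst_root):
    """Copy files from src_root to dst_root, verifying each copy.

    Files whose destination copy already matches are skipped.
    Returns the list of relative paths that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if dst.exists() and sha256_of(dst) == sha256_of(src):
            continue  # already replicated and intact
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        if sha256_of(dst) != sha256_of(src):
            raise IOError(f"checksum mismatch after copy: {rel}")
        copied.append(rel)
    return copied
```

For example, replicate_tree("/lustre/scratch/project", "/archive/project") would mirror a scratch directory to an archive volume; both paths are hypothetical.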
 
Gluster
 
I evaluated it several times and had bad experiences with performance and reliability. I haven't tried it more recently than 6 months to a year ago, so perhaps things have changed. As of my last evaluation it was the most promising option for a mountable/replicated filesystem, but it was nowhere near ready for prime time on large deployments.
 
MogileFS

Used it at TinyTube. Had lots of little issues, but ultimately it worked as expected (though we had to implement several workarounds). Not mountable, though I'm not sure whether a FUSE extension exists. It is more of an API to do puts and gets, with a MySQL DB to track metadata and the storage nodes that files are replicated across. Nice in that files are stored on the disks of replica nodes without any special encoding, so in the worst case you can copy out the FID to recover.
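
To make the put/get-plus-tracker idea concrete, here is a toy sketch in Python. It is not the MogileFS API: the class name, the in-memory dict standing in for the MySQL metadata DB, the directories standing in for storage nodes, and the file naming are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def fid_name(fid):
    """Hypothetical on-disk name for a file ID (not MogileFS's real layout)."""
    return f"{fid:010d}.fid"


class TinyTracker:
    """Toy tracker: maps a key to a file ID and the nodes holding copies."""

    def __init__(self, node_dirs, replicas=2):
        self.node_dirs = [Path(d) for d in node_dirs]
        self.replicas = replicas  # assumed <= number of nodes
        self.index = {}           # key -> (fid, [node dirs with a copy])
        self.next_fid = 1

    def put(self, key, data):
        fid = self.next_fid
        self.next_fid += 1
        # Pick replica nodes deterministically from a hash of the key.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.node_dirs)
        chosen = [self.node_dirs[(start + i) % len(self.node_dirs)]
                  for i in range(self.replicas)]
        for node in chosen:
            # Stored as plain bytes, so a copy can be recovered by hand.
            (node / fid_name(fid)).write_bytes(data)
        self.index[key] = (fid, chosen)
        return fid

    def get(self, key):
        fid, nodes = self.index[key]
        for node in nodes:
            path = node / fid_name(fid)
            if path.exists():  # fall through to the next replica if missing
                return path.read_bytes()
        raise FileNotFoundError(key)


def demo():
    """Store one object on 2 of 3 nodes, drop a replica, then read it back."""
    root = Path(tempfile.mkdtemp())
    nodes = [root / f"node{i}" for i in range(3)]
    for node in nodes:
        node.mkdir()
    tracker = TinyTracker(nodes, replicas=2)
    fid = tracker.put("videos/clip-1", b"example bytes")
    _, holders = tracker.index["videos/clip-1"]
    (holders[0] / fid_name(fid)).unlink()  # simulate losing one replica
    return tracker.get("videos/clip-1")
```

Because each replica is an ordinary file, the surviving copy can also be read straight off the node's disk without going through the tracker, which is the recovery property mentioned above.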
 
Terrastore
Dokan - FUSE for Windows
XtreemFS

No clue. Any feedback?

Cassandra
Project Voldemort

They are supposed to be more like key-value stores than a traditional filesystem. Sort of like HBase or a redundant/persistent Memcache, from what I have heard. One of my co-workers recently looked at them and chose HBase (he was looking for more of a distributed DB), so if you are interested, I can set up a conference call.

Ceph

Last time I spoke to the developer (~1 year ago), he made it very clear that it was not recommended for production use. Not sure if this has changed.

You may also want to do some digging for the file store Caltech developed for LIGO. It looked interesting and if I recall, it was open-source with a similar architecture to MogileFS and written in Python.

Please share your notes as well. It's been about a year since I looked (and chose Lustre), so would like to find out where things are nowadays.

Jordan
