Main Page

Welcome to the "SubCloud" wiki! This wiki is intended to serve users of SubCloud as a definitive reference guide and handbook.

Back to http://www.subcloud.com

Overview

SubCloud is a shared enterprise file system built on top of Amazon S3 cloud storage. SubCloud is the fastest, easiest way to integrate your enterprise with Amazon S3.

Prerequisites

  • an Amazon S3 account!
  • a machine running a modern Linux distro
    • FUSE, at least 2.5
    • GLIBC, at least 2.3.4

Getting Started with Trial Edition

  1. Sign up for Amazon S3
  2. Download SubCloud, unpack it, and put the binary in /usr/bin
  3. Create a file called "/etc/passwd-subcloud" with the following line:
accessKeyId:secretAccessKey:licenseKey:expire

For example (these are not working credentials):

0PN5J17HBGZHT7JJ3X82:uV3F3YluFJax1cknvbcGwgjvx4QpvB+leU8dUj2o:1pq02ow93ie84ur75yt61pq02ow93ie84ur75yt6=:2005-12-31

The fields are:

  • AWS access key ID
  • AWS secret access key
  • SubCloud license key
  • SubCloud license key expiration date

Note that the above is just an example. You'll need to substitute your own AWS credentials and SubCloud license key for the fields.

(You can get a SubCloud trial license key (a) from http://www.subcloud.com/download or (b) by contacting us via http://www.subcloud.com/contact.)
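
The credentials file holds secrets, so it is worth creating it with restrictive permissions. The following is a minimal sketch and not part of SubCloud: the function name write_subcloud_credentials is hypothetical, and the 0600 mode is a precaution assumed here, not a documented SubCloud requirement.

```shell
#!/bin/sh
# Hypothetical helper: writes the one-line credentials file described above.
# Arguments: file accessKeyId secretAccessKey licenseKey [expire]
write_subcloud_credentials() {
    file=$1
    line="$2:$3:$4"
    # Append the optional fourth field (the license key expire) if given.
    if [ -n "${5:-}" ]; then
        line="$line:$5"
    fi
    # Create the file readable by its owner only (assumed precaution).
    ( umask 077 && printf '%s\n' "$line" > "$file" ) && chmod 600 "$file"
}

# Example (as root, with your real values in place of the variables):
#   write_subcloud_credentials /etc/passwd-subcloud "$ACCESS" "$SECRET" "$LICENSE" 2005-12-31
```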

Now do something like this:

% mkdir /s3
% subcloud mybucket /s3

That's it! The bucket mybucket is now available read/write in the folder /s3! (If the bucket does not exist then a US bucket will be created.)

Note- bucketName must conform to the "additional guidelines" here: http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?BucketRestrictions.html
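
A rough pre-flight check for a bucket name can be sketched as follows. It assumes the commonly cited DNS-style rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; a letter or digit at both ends; no adjacent dots or dot-hyphen pairs; not formatted like an IP address). The linked page remains the authority, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the name looks DNS-compliant, 1 otherwise.
is_dns_bucket_name() {
    name=$1
    # Length must be between 3 and 63 characters.
    if [ "${#name}" -lt 3 ] || [ "${#name}" -gt 63 ]; then
        return 1
    fi
    # Lowercase letters, digits, dots, hyphens; letter or digit at both ends.
    printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]*[a-z0-9]$' || return 1
    # No empty labels and no hyphen next to a dot.
    case $name in
        *..*|*.-*|*-.*) return 1 ;;
    esac
    # Must not look like an IP address such as 192.168.0.1.
    if printf '%s\n' "$name" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
        return 1
    fi
    return 0
}

# Example:
#   is_dns_bucket_name mybucket && echo "name looks fine"
```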

Try mkdir or vi, or try copying some files to /s3!

Getting Started with Enterprise Edition

Enterprise Edition is the exact same binary as Trial Edition. To activate Enterprise Edition, simply substitute a perpetual license key in "/etc/passwd-subcloud" and remove the expire field, e.g.,

1P0PN5J17HBGZHT7JJ3X82:uV3F3YluFJax1cknvbcGwgjvx4QpvB+leU8dUj2o:6ty57ru48ei39wo20qp16ty57ru48ei39wo20qp1=

You'll need to restart the binary for the license key to take effect, e.g.,

% umount /s3
% subcloud mybucket /s3

Note- unlike SubCloud Trial Edition, SubCloud Enterprise Edition does *not* automatically create buckets. You'll need to manually create buckets with another Amazon S3 client program, e.g., jets3t.

Options

This section describes options specific to SubCloud, in addition to the standard FUSE options.

accessKeyId

Your Amazon AWS Access Key ID, e.g., 1PQ02OW93IE84UR75YT6.

secretAccessKey

Your Amazon AWS Secret Access Key.

connect_timeout

The connect timeout is the amount of time in seconds that SubCloud waits for a connection before giving up. Default=10 (changed from 2 in 1.0.5c).

See also readwrite_timeout.

default_acl

  • the default canned ACL to use for created/updated files
    • one of private|public-read|public-read-write|authenticated-read

encoding

Set to "gzip" to enable compression on writes, e.g., "-oencoding=gzip".

Note- SubCloud always decompresses gzip encoded files on reads regardless of the encoding value.
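
Other S3 clients will not decompress these objects for you, so it can help to check whether a file fetched with another client is gzip encoded. This is a small sketch that assumes the objects are stored as standard gzip streams (magic bytes 1f 8b); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the file starts with the gzip magic bytes.
is_gzip_file() {
    # Read the first two bytes and print them as hex without spaces.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example: expand a file downloaded with another S3 client, if needed.
#   if is_gzip_file report.txt; then gzip -dc report.txt > report.expanded.txt; fi
```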

readwrite_timeout

The read/write timeout in seconds. Default=60 (changed from 10 in 1.0.5c).

For example,

% subcloud mybucket /s3 -oreadwrite_timeout=60

See also connect_timeout.

retries

The retry count. SubCloud retries S3 transactions on error conditions, using exponential backoff between attempts.

Default is 5 for a total of 6 attempts (i.e., 1 attempt and 5 retries). [1.0.5]

See also connect_timeout and readwrite_timeout.
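
To get a feel for how long a failing transaction can take, here is a worked sketch of the backoff arithmetic. It assumes the schedule given in the 1.0.5c release notes (the timer starts at 1 second and doubles after each retry) and counts only the waits between attempts, not the time spent inside each attempt. The function names are hypothetical.

```shell
#!/bin/sh
# Print the wait before each retry, in seconds, for a given retry count.
backoff_delays() {
    retries=$1
    delay=1
    out=""
    i=0
    while [ "$i" -lt "$retries" ]; do
        out="${out}${out:+ }${delay}"
        delay=$((delay * 2))   # double the wait after every retry
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Print the sum of all waits for a given retry count.
backoff_total() {
    total=0
    for d in $(backoff_delays "$1"); do
        total=$((total + d))
    done
    printf '%s\n' "$total"
}

# With the default of 5 retries:
#   backoff_delays 5   prints "1 2 4 8 16"
#   backoff_total 5    prints "31"
```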

url

(obsolete in 1.0.2, see "ssl" option)

ssl

Set to any value to use "https". Default is "" which means use "http".

See also "cainfo" option and/or "capath" option.

(since 1.0.2, obsoletes "url" option)

cainfo and capath

"Certificate authority" configuration for https. You would typically use one or the other: "cainfo" points to a certificate file, "capath" points to a certificate folder.

For example, to use https on Debian Etch:

% subcloud mybucket /s3 -ossl=1 -ocainfo=/etc/ssl/certs/ca-certificates.crt
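
Since "cainfo" expects a file and "capath" expects a folder, a wrapper script can pick the right option from the path it is given. This is only a sketch; the helper name ca_option is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the matching SubCloud option for a CA path.
ca_option() {
    if [ -d "$1" ]; then
        # A folder of certificates goes with capath.
        printf '%s\n' "-ocapath=$1"
    elif [ -f "$1" ]; then
        # A single certificate bundle file goes with cainfo.
        printf '%s\n' "-ocainfo=$1"
    else
        echo "ca_option: $1 is neither a file nor a folder" >&2
        return 1
    fi
}

# Example:
#   subcloud mybucket /s3 -ossl=1 "$(ca_option /etc/ssl/certs/ca-certificates.crt)"
```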

use_cache

Set to a local folder to use for a local file cache, e.g., "-ouse_cache=/tmp".

SubCloud uses the local file cache to minimize downloads from S3. If local file cache is enabled, when SubCloud downloads a file, it also saves the file in the local file cache. SubCloud will serve up subsequent read requests from the local file cache if their MD5 checksums match.
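
The idea of the checksum comparison can be illustrated outside of SubCloud. The sketch below is not SubCloud's implementation: it assumes you already have the object's MD5 as a hex string (for example from another S3 client), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the local copy's MD5 matches the given hex digest.
cache_is_fresh() {
    # A missing local copy can never be fresh.
    [ -f "$1" ] || return 1
    local_md5=$(md5sum < "$1" | cut -d' ' -f1)
    [ "$local_md5" = "$2" ]
}

# Example: only download again when the cached copy is stale.
#   cache_is_fresh /tmp/cache/report.txt "$remote_md5" || echo "download needed"
```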

write_behind

DO NOT USE THIS

  • set to any value to enable write behind cache

MIME types

SubCloud uses the file /etc/mime.types to determine the Content-Type when creating/updating files.
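
To preview which Content-Type a file name maps to, you can look up its extension in the same file. This is a sketch under assumptions: it relies on the usual mime.types layout (a type followed by its extensions on each line), and the application/octet-stream fallback is assumed here, not a documented SubCloud default. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the MIME type for a file name.
# Arguments: filename [mime.types file, default /etc/mime.types]
mime_type_for() {
    base=${1##*/}                    # strip any folder part
    types=${2:-/etc/mime.types}
    case $base in
        *.*) ext=${base##*.} ;;      # text after the last dot
        *)   ext="" ;;
    esac
    # Scan each non-comment line for a matching extension.
    [ -n "$ext" ] && awk -v ext="$ext" '
        /^#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == ext) { print $1; found = 1; exit } }
        END { exit !found }
    ' "$types" && return 0
    # Assumed fallback when no mapping is found.
    echo "application/octet-stream"
}

# Example:
#   mime_type_for /s3/index.html
```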

What else should I know?

  • renaming folders can take a while because SubCloud does a brute force rename for every key beneath the folder (deep folder rename)
  • gzip for reading is always "enabled", that is, SubCloud will always expand gzip encoded objects
  • gzip for writing is enabled iff -oencoding=gzip in which case SubCloud will gzip encode objects during create and/or update

Use Cases

Use cases have the following assumptions:

  • s3 buckets are mounted at /s3

MySQL Backup

This is a use case for doing a nightly full backup of a MySQL database.

Create a file in /etc/cron.daily, e.g., /etc/cron.daily/initech-mysql.cron:

#!/bin/sh
# Compute the dated folder once so both commands use the same path.
dir=/s3/initech-backup/$(date +%Y/%m/%d)
mkdir -p "$dir"
mysqldump --single-transaction -pmypassword mydatabase | gzip > "$dir/mydatabase.sql.gz"

Be sure you do a "chmod +x /etc/cron.daily/initech-mysql.cron".

That's it! A compressed full backup of your MySQL database is performed nightly, around 4 a.m. or so. Note the use of --single-transaction, which makes a consistent snapshot of the database when using InnoDB (it has no effect when using MyISAM).
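
If the bucket is not mounted when cron runs, the backup would land on the local disk under /s3 instead. A guard like the following can prevent that. It is a sketch that assumes a Linux host where /proc/mounts lists active mounts; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if something is mounted at the given mount point.
# Arguments: mountpoint [mounts file, default /proc/mounts]
is_mounted() {
    # The second field of each line in /proc/mounts is the mount point.
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example, at the top of the cron script:
#   is_mounted /s3 || { echo "/s3 is not mounted, skipping backup" >&2; exit 1; }
```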

CVSROOT Backup

This is a use case for doing a nightly full backup of a CVS repository. Assumptions:

  • CVS repository is "/usr/local/cvsroot"
  • S3 bucket is "initech-backup", mounted at /s3/initech-backup

Create the file /etc/cron.daily/initech-cvsroot.cron with the following contents:

#!/bin/sh
# Compute the dated folder once so both commands use the same path.
dir=/s3/initech-backup/$(date +%Y/%m/%d)
mkdir -p "$dir"
tar -czf "$dir/cvsroot.tar.gz" -C /usr/local cvsroot

Be sure you do a "chmod +x /etc/cron.daily/initech-cvsroot.cron".

That's it! A full backup of your CVSROOT is performed nightly, around 4 a.m. or so, organized in a tidy year/month/day folder hierarchy (which makes it easy to manually purge old backup sets).
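
Because the folders are named year/month/day, finding old backup sets is a matter of comparing dates. The following sketch only lists candidates and deletes nothing; the function name is hypothetical, and the date -d usage in the example assumes GNU date.

```shell
#!/bin/sh
# Hypothetical helper: list day folders older than a cutoff date.
# Arguments: backup root, cutoff as YYYY/MM/DD
old_backup_dirs() {
    root=$1
    # Turn 2008/10/01 into 20081001 so dates compare as integers.
    cutoff=$(printf '%s' "$2" | tr -d '/')
    for dir in "$root"/[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]; do
        [ -d "$dir" ] || continue    # skip the unexpanded pattern if nothing matches
        day=$(printf '%s' "${dir#"$root"/}" | tr -d '/')
        if [ "$day" -lt "$cutoff" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example: review backup sets older than 30 days before removing them by hand.
#   old_backup_dirs /s3/initech-backup "$(date -d '30 days ago' +%Y/%m/%d)"
```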

FAQ

General

Does the SubCloud binary "phone home"?

No. The SubCloud binary does not contact any SubCloud server for any reason whatsoever.

Can I stitch SubCloud into /etc/fstab?

Yes! This example (a) uses the bucket "mybucket" and (b) enables the FUSE option "allow_other":

subcloud#mybucket /mnt/mybucket fuse allow_other,accessKeyId=1PQ02OW93IE84UR75YT6 0 0

What do you mean by "store files natively and transparently"?

There are a number of strategies for implementing a file system atop Amazon S3, e.g., block device-based and file-based...

SubCloud is file-based therefore files are stored natively in Amazon S3. You can use other Amazon S3 clients to access the same files. Even if you stop using SubCloud you still have full access to the files in a native and transparent fashion.

What do you mean by "deep directory rename"?

Deep directory rename refers to the fact that when you rename a folder, SubCloud automatically does a brute force rename of the contents of the folder.

What Linux distributions has SubCloud run on?

  • Debian Etch
  • Fedora Core 4 [EC2]
  • Fedora Core 5
  • Fedora 7
  • Fedora 8 [EC2]
  • Fedora 8 x64 [EC2]
  • Fedora 9
  • Ubuntu 8.04
  • Ubuntu Server x64 8.04
  • Gentoo

It likely runs on others as well: SubCloud requires FUSE (at least 2.5) and GLIBC (at least 2.3.4).

Can I mount the same bucket from more than one host at the same time?

  • Yes! read/write too!

Will SubCloud clobber existing x-amz-meta- custom headers?

  • No!

How do I change the Content-Type of files?

Edit "/etc/mime.types". SubCloud uses that file to determine the Content-Type when creating/updating files.

How do I export a SubCloud file system via NFS?

How do I run encfs atop SubCloud?

disk quota?

What is the difference between s3fs and SubCloud?

SubCloud adds:

  • compression
  • EU bucket support (1.0.2)
  • Content-MD5
  • deep directory rename
  • greater posix compliance
  • various optimizations
  • better error logging
  • binary distribution
  • support

Is SubCloud compatible with s3fs?

Yes! Buckets used by s3fs can be mounted by SubCloud and vice versa. (Exception: compressed files created by SubCloud will not be uncompressed by s3fs)

Why doesn't Enterprise Edition automatically create buckets?

For robustness purposes, e.g., SubCloud can be configured in /etc/fstab so that it starts up successfully even if network connectivity is down.

What about permissions?

Use the default_permissions option.

Can I store files larger than 5GB?

Not automatically. Because SubCloud stores files natively and transparently in Amazon S3, it has the same 5GB file size limit. You can use, e.g., /usr/bin/split to split and join large files.
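
A split-and-join workflow might look like the sketch below. The function names are hypothetical, the 4G default chunk size is an assumption chosen to stay under the 5GB limit, and the size suffix relies on GNU split.

```shell
#!/bin/sh
# Hypothetical helper: split a large file into numbered parts inside a folder.
# Arguments: source file, destination folder, [chunk size, default 4G]
split_for_upload() {
    src=$1
    dest_dir=$2
    chunk=${3:-4G}
    # Produces parts named <name>.part.aa, <name>.part.ab, ...
    split -b "$chunk" "$src" "$dest_dir/$(basename "$src").part."
}

# Hypothetical helper: join the parts back into one file.
# Arguments: path prefix of the parts (without .part.*), output file
join_parts() {
    # The shell expands the pattern in sorted order, which matches split's suffixes.
    cat "$1".part.* > "$2"
}

# Example:
#   split_for_upload /data/huge.img /s3
#   join_parts /s3/huge.img /data/huge-restored.img
```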

Can SubCloud use a http proxy?

Yes. Use the "http_proxy" environment variable (it must be lower case!). For example, if your http proxy is 10.20.30.40 port 80, then do this:

export http_proxy=10.20.30.40:80

And then (re-)start SubCloud. SubCloud reads the "http_proxy" environment variable at startup only.

Troubleshooting

How do I install FUSE on CentOS?

yum install fuse dkms dkms-fuse

How do I install FUSE on Debian Etch?

apt-get update
apt-get install libfuse2
apt-get install fuse-utils
mknod /dev/fuse -m 0666 c 10 229

How do I install FUSE on Fedora?

yum install fuse

How do I install FUSE on Gentoo?

emerge fuse

How do I install FUSE on Ubuntu?

apt-get install fuse

How do I install FUSE on Red Hat Enterprise Linux?

yum install fuse

How do I troubleshoot it?

  • tail -f /var/log/messages
  • tcpdump -s 1500 -A host s3.amazonaws.com
  • use the FUSE -f switch, e.g., /usr/bin/subcloud -f my_bucket /s3

Why do I get "input/output error"?

When I mount a bucket, only the current user can see it, other users cannot. How do I fix it?

  • use allow_other, e.g., /usr/bin/subcloud -o allow_other my_bucket /s3

I renamed a folder and it was taking a long time so I stopped it and now files are gone!

  • Your files are still there, they're just "orphaned"! "Deep" renames can take a while... Do the rename again and SubCloud will continue the "deep" rename.

Release Notes

1.0.7 (2008-11-??)

  • streaming

1.0.5e (2008-11-09)

  • re-seat content-type
  • fix mknod/mkdir/chown permissions
  • removed legacy single command line argument check
  • don't retry on 401/403

1.0.5d (2008-09-23)

  • fixed timeout/retry issue
    • symptom was a bunch of "Operation was aborted by an application callback(42)" in a row in /var/log/messages, followed by "giving up"

1.0.5c (2008-09-21)

  • connect_timeout=10; readwrite_timeout=60;
  • retry back off timer goes 1s, 2s, 4s, 8s, 16s, etc...
  • changed starting back off timer from 5 seconds to 1 second
  • fixed chown bug whereby ownership was not changed if there was no id=>name mapping

1.0.5b (2008-09-17)

  • 64bit build!
  • fixed subtle timeout/retry issue
  • fixed parallel readdir/stat cache issue

1.0.5a (2008-09-17)

  • symlink permissions=0777
  • cleanup /var/log/messages

1.0.3 (2008-09-16)

  • discard "bad" http connections
  • retry all responses 400 or greater
  • changed default retries from 2 to 3

1.0.2 (2008-09-15)

  • EU bucket support
    • replaced "url" option with "ssl" option
  • discard "bad" http connections

1.0.1 (2008-09-05)

  • timeout/retry: exponential backoff

1.0.0 (2008-09-01)

  • trial/enterprise: create bucket iff trial edition

0.9.5 (2008-08-27)

  • https
  • retry on 400 Bad Request (for BadDigest)

0.9.4 (2008-08-18)

  • fixed typo in local file cache

0.9.3

  • fixed FILE* leak
  • create bucket iff personal edition

0.9.2

  • subcloud_init: create bucket
  • changed default_acl to "", i.e., "private"

0.9.1 (2008-08-01)

  • init