Main Page
Welcome to the "SubCloud" wiki! This wiki is intended to serve users of SubCloud as a definitive reference guide and handbook.
Back to http://www.subcloud.com
Overview
SubCloud is a shared enterprise file system built on top of your Amazon S3 cloud storage. SubCloud is the fastest, easiest way to integrate your enterprise with Amazon S3.
What's new in 2.0
(not yet released)
- Faster
- aggressive internal caching
- Streaming
- e.g., use darwin streaming server
- Encryption
- using AES
Prerequisites
- an Amazon S3 account!
- a machine running a modern Linux distro
- FUSE, at least 2.5
- GLIBC, at least 2.3.4
Getting Started with Trial Edition
- Sign up for Amazon S3
- Download SubCloud, unpack it, and put the binary in /usr/bin
- Create a file called "/etc/passwd-subcloud" with the following line:
accessKeyId:secretAccessKey:licenseKey:expire
e.g., (non-working example)
0PN5J17HBGZHT7JJ3X82:uV3F3YluFJax1cknvbcGwgjvx4QpvB+leU8dUj2o:1pq02ow93ie84ur75yt61pq02ow93ie84ur75yt6=:2005-12-31
The fields are:
- AWS access key ID
- AWS secret access key
- SubCloud license key
- SubCloud license key expiration date
Note the above is just an example. You'll need to substitute the fields for your AWS credentials and SubCloud license key.
(You can get a SubCloud trial license key (a) from http://www.subcloud.com/download or (b) by contacting us via http://www.subcloud.com/contact.)
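The credentials line can also be written and read back from a script. The sketch below is only an illustration and is not shipped with SubCloud: the function names `write_subcloud_passwd` and `read_subcloud_passwd` are hypothetical, and restricting the file to its owner with `umask 077` is an assumption about sensible handling of secrets, not a documented requirement.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SubCloud) for the credentials file.

# write_subcloud_passwd FILE ACCESS_KEY_ID SECRET_ACCESS_KEY LICENSE_KEY [EXPIRE]
# Writes the single colon-separated line described above.
write_subcloud_passwd() (
    file=$1; shift
    umask 077                      # assumption: keep secrets owner-readable only
    if [ -n "${4:-}" ]; then
        printf '%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" > "$file"
    else
        printf '%s:%s:%s\n' "$1" "$2" "$3" > "$file"   # no expiration field
    fi
)

# read_subcloud_passwd FILE
# Prints each field of the first line on its own labelled row.
read_subcloud_passwd() (
    IFS=: read -r access secret license expire < "$1"
    printf 'accessKeyId=%s\nsecretAccessKey=%s\nlicenseKey=%s\nexpire=%s\n' \
        "$access" "$secret" "$license" "$expire"
)
```

For example, `write_subcloud_passwd /etc/passwd-subcloud "$ACCESS_KEY_ID" "$SECRET_ACCESS_KEY" "$LICENSE_KEY" 2005-12-31` would produce a line in the format shown above (the variable names are placeholders).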
Now do something like this:
% mkdir /s3
% subcloud mybucket /s3
That's it! The bucket mybucket is now available read/write in the folder /s3! (If the bucket does not exist then a US bucket will be created.)
Note- bucketName must conform to the "additional guidelines" here: http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?BucketRestrictions.html
Try mkdir or vi, or try copying some files to /s3!
Getting Started with Enterprise Edition
Enterprise Edition is the exact same binary as Trial Edition. To activate Enterprise Edition simply substitute a permanent license key in "/etc/passwd-subcloud", e.g.,
1P0PN5J17HBGZHT7JJ3X82:uV3F3YluFJax1cknvbcGwgjvx4QpvB+leU8dUj2o:6ty57ru48ei39wo20qp16ty57ru48ei39wo20qp1=
You'll need to restart the subcloud binary for the license key to take effect, e.g.,
% umount /s3
% subcloud mybucket /s3
Note- unlike SubCloud Trial Edition, SubCloud Enterprise Edition does *not* automatically create buckets. You'll need to manually create buckets with another Amazon S3 client program, e.g., jets3t.
Encryption
New in release 2.0 (not yet released).
Note- SubCloud can be configured to perform encryption/decryption transparently in the background. This section is useful only if you wish to encrypt/decrypt s3 objects without using the subcloud binary.
To decrypt files without using the subcloud binary:
1. download the file from amazon s3 with your 2nd favorite s3 tool
2. do this:
% openssl enc -aes-256-cbc -d -pass pass:mypassphrase -in /tmp/cipher/myfile.txt -out /tmp/plain/myfile.txt
Here, mypassphrase is the passphrase, the same passphrase specified when invoking subcloud.
Compression
Note- SubCloud can be configured to perform compression/decompression transparently in the background. This section is useful only if you wish to compress/decompress s3 objects without using the subcloud binary.
SubCloud can be configured to compress files uploaded to Amazon S3 by setting the "encoding" option, e.g. -oencoding=gzip
SubCloud sets Content-Encoding: gzip when uploading gzip compressed files to Amazon S3.
To decompress files without using the subcloud binary:
1. download the file from amazon s3 using your 2nd favorite s3 tool
2. do this:
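% gzip -dc < /tmp/myfile.txt > /tmp/myfile.plain.txt
(gzip -d refuses input files that lack a .gz suffix, so the file is streamed through standard input instead.)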
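Assuming the downloaded object keeps its original name, without a .gz suffix, one way to handle it in a script is to stream it through gzip. This is a minimal sketch; the function names `gunzip_s3_object` and `is_gzip_encoded` are hypothetical and not part of SubCloud.

```shell
#!/bin/sh
# gunzip_s3_object IN OUT
# Decompresses a gzip-encoded object that was downloaded without a .gz suffix.
# Reading from stdin avoids gzip's suffix check on the input file name.
gunzip_s3_object() (
    gzip -dc < "$1" > "$2"
)

# is_gzip_encoded FILE
# Succeeds if FILE passes gzip's integrity test, i.e. it really is gzip data.
is_gzip_encoded() (
    gzip -t < "$1" 2>/dev/null
)
```

Checking with `is_gzip_encoded` first is useful because objects written without -oencoding=gzip are stored uncompressed.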
Options
This section describes options specific to SubCloud, in addition to the standard FUSE options.
accessKeyId
Your Amazon AWS Access Key ID, e.g., 1PQ02OW93IE84UR75YT6.
secretAccessKey
Your Amazon AWS Secret Access Key.
connect_timeout
The connect timeout is the amount of time in seconds that SubCloud waits for a connection before giving up. Default=10 (was 2 before 1.0.5c).
See also readwrite_timeout.
default_acl
- the default canned ACL to use for created/updated files
- one of private|public-read|public-read-write|authenticated-read
encoding
Set to "gzip" to enable compression on writes, e.g., "-oencoding=gzip".
Note- SubCloud always decompresses gzip encoded files on reads regardless of the encoding value.
readwrite_timeout
The read/write timeout in seconds. Default=60 (was 10 before 1.0.5c).
For example,
% subcloud mybucket /s3 -oreadwrite_timeout=60
See also connect_timeout.
retries
The retry count. SubCloud retries S3 transactions on error conditions, with exponentially increasing delays between attempts.
Default is 5 for a total of 6 attempts (i.e., 1 attempt and 5 retries). [1.0.5]
See also connect_timeout and readwrite_timeout.
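The retry behaviour can be pictured as a delay that doubles between attempts. The sketch below only illustrates the schedule given in the 1.0.5c release notes (1s, 2s, 4s, 8s, 16s); it is not SubCloud's implementation, and the function names `backoff_delays` and `retry_with_backoff` are hypothetical.

```shell
#!/bin/sh
# backoff_delays RETRIES
# Prints the delay in seconds before each retry, starting at 1s and doubling.
backoff_delays() (
    retries=$1 delay=1 out=
    i=0
    while [ "$i" -lt "$retries" ]; do
        out="${out:+$out }$delay"
        delay=$((delay * 2))
        i=$((i + 1))
    done
    printf '%s\n' "$out"
)

# retry_with_backoff RETRIES COMMAND [ARGS...]
# Runs COMMAND once, then retries up to RETRIES times, sleeping between tries.
retry_with_backoff() (
    retries=$1; shift
    delay=1 attempt=0
    while :; do
        "$@" && exit 0
        [ "$attempt" -ge "$retries" ] && exit 1
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
)
```

Under this schedule, the default of 5 retries adds up to 1+2+4+8+16 = 31 seconds of waiting before giving up, not counting the time spent in the requests themselves.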
url
- the url to use to access the Amazon S3 web service, default="http://s3.amazonaws.com"
- e.g., you can set it to "https://s3.amazonaws.com" to use https. Note- might need to set cainfo and/or capath
(obsolete in 1.0.2, see "ssl" option)
ssl
Set to any value to use "https". Default is "" which means use "http".
See also "cainfo" option and/or "capath" option.
(since 1.0.2, obsoletes "url" option)
cainfo and capath
"Certificate authority" configuration for https. You would typically use one or the other: "cainfo" points to a certificate file, "capath" points to a certificate folder.
For example, to use https on Debian Etch:
% subcloud mybucket /s3 -ossl=1 -ocainfo=/etc/ssl/certs/ca-certificates.crt
use_cache
Set to a local folder to use for a local file cache, e.g., "-ouse_cache=/tmp".
SubCloud uses the local file cache to minimize downloads from S3. If local file cache is enabled, when SubCloud downloads a file, it also saves the file in the local file cache. SubCloud will serve up subsequent read requests from the local file cache if their MD5 checksums match.
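The cache check amounts to comparing a checksum of the local copy with the checksum of the object in S3. Below is a minimal sketch of that comparison, assuming the remote MD5 is already available as a hex string (how SubCloud obtains it is not described here); the function name `cache_is_fresh` is hypothetical.

```shell
#!/bin/sh
# cache_is_fresh CACHED_FILE REMOTE_MD5_HEX
# Succeeds only if the cached file exists and its MD5 matches the remote one.
cache_is_fresh() (
    [ -f "$1" ] || exit 1
    local_md5=$(md5sum < "$1" | cut -d ' ' -f 1)
    [ "$local_md5" = "$2" ]
)
```

A caller would serve the cached copy when `cache_is_fresh` succeeds and download the object again otherwise.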
write_behind
DO NOT USE THIS
- set to any value to enable write behind cache
MIME types
SubCloud uses the file /etc/mime.types to determine the Content-Type when creating/updating files.
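To see which Content-Type a given file name would map to, the lookup can be approximated in a few lines of shell. This is a sketch only: the function name `content_type_for` is hypothetical, the match is by the text after the last dot, and the fallback to application/octet-stream is an assumption, not documented SubCloud behaviour.

```shell
#!/bin/sh
# content_type_for FILENAME [MIME_TYPES_FILE]
# Prints the MIME type whose extension list contains FILENAME's extension.
# Falls back to application/octet-stream (an assumption, not documented).
content_type_for() (
    name=$1 types=${2:-/etc/mime.types}
    case $name in
        *.*) ext=${name##*.} ;;
        *)   ext= ;;
    esac
    awk -v ext="$ext" '
        /^[[:space:]]*#/ { next }                  # skip comments
        ext != "" {
            for (i = 2; i <= NF; i++)
                if ($i == ext) { print $1; found = 1; exit }
        }
        END { if (!found) print "application/octet-stream" }
    ' "$types"
)
```

For example, `content_type_for report.pdf` prints whatever type /etc/mime.types lists for the pdf extension on the local machine.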
What else should I know?
- renaming folders can take a while because SubCloud does a brute force rename for every key beneath the folder (deep folder rename)
- gzip for reading is always "enabled", that is, SubCloud will always expand gzip encoded objects
- gzip for writing is enabled iff -oencoding=gzip in which case SubCloud will gzip encode objects during create and/or update
- if you are going to mount a bucket for other users, use the allow_other and default_permissions options (see http://apps.sourceforge.net/mediawiki/fuse/index.php?title=FAQ), e.g.,
% /usr/bin/subcloud mybucket /s3 -oallow_other -odefault_permissions
Use Cases
Use cases have the following assumptions:
- s3 buckets are mounted at /s3
MySQL Backup
This is a use case for doing a nightly full backup of a MySQL database.
Create a file in /etc/cron.daily, e.g., /etc/cron.daily/initech-mysql.cron:
#!/bin/sh
mkdir -p /s3/initech-backup/`date +%Y/%m/%d`
mysqldump --single-transaction -pmypassword mydatabase | gzip > /s3/initech-backup/`date +%Y/%m/%d`/mydatabase.sql.gz
Be sure you do a "chmod +x /etc/cron.daily/initech-mysql.cron".
That's it! A compressed full backup of your MySQL database is performed nightly at around 4 a.m. Note the use of --single-transaction. This makes a consistent snapshot of the database when using InnoDB (no effect when using MyISAM).
CVSROOT Backup
This is a use case for doing a nightly full backup of a CVS repository. Assumptions:
- CVS repository is "/usr/local/cvsroot"
- S3 bucket is "initech-backup", mounted at /s3/initech-backup
Create the file /etc/cron.daily/initech-cvsroot.cron with the following contents:
#!/bin/sh
mkdir -p /s3/initech-backup/`date +%Y/%m/%d`
tar -czf /s3/initech-backup/`date +%Y/%m/%d`/cvsroot.tar.gz -C /usr/local cvsroot
Be sure you do a "chmod +x /etc/cron.daily/initech-cvsroot.cron".
That's it! A full backup of your CVSROOT is performed nightly at around 4 a.m., organized in a tidy year/month/day folder hierarchy (makes it easy to manually purge old backup sets).
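Because the backups are laid out as year/month/day folders, finding old backup sets is a matter of comparing folder names with a cutoff date. The sketch below only lists candidates and deletes nothing; the function name `list_old_backups` is hypothetical.

```shell
#!/bin/sh
# list_old_backups ROOT CUTOFF
# Prints every ROOT/YYYY/MM/DD folder dated strictly before CUTOFF (YYYY/MM/DD).
# It only lists candidates; deleting them is left to the operator.
list_old_backups() (
    root=$1
    cutoff=$(printf '%s' "$2" | tr -d /)          # 2008/09/15 -> 20080915
    for dir in "$root"/[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]; do
        [ -d "$dir" ] || continue                 # glob matched nothing
        day=$(printf '%s' "${dir#"$root"/}" | tr -d /)
        if [ "$day" -lt "$cutoff" ]; then
            printf '%s\n' "$dir"
        fi
    done
)
```

For example, `list_old_backups /s3/initech-backup 2008/09/01` prints the day folders older than 1 September 2008, which can then be reviewed and removed by hand.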
FAQ
General
Does the SubCloud binary "phone home"?
No. The SubCloud binary does not contact any SubCloud server for any reason whatsoever.
Can I stitch SubCloud into /etc/fstab?
Yes! This example (a) uses the bucket "mybucket" and (b) enables the FUSE option "allow_other":
subcloud#mybucket /mnt/mybucket fuse allow_other,accessKeyId=1PQ02OW93IE84UR75YT6 0 0
What do you mean by "store files natively and transparently"?
There are a number of strategies for implementing a file system atop Amazon S3, e.g., block device-based and file-based...
SubCloud is file-based therefore files are stored natively in Amazon S3. You can use other Amazon S3 clients to access the same files. Even if you stop using SubCloud you still have full access to the files in a native and transparent fashion.
What do you mean by "deep directory rename"?
Deep directory rename refers to the fact that when you rename a folder, SubCloud automatically does a brute force rename of the contents of the folder.
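Amazon S3 stores objects under flat keys, so renaming a folder means renaming every key that begins with the folder's prefix. The sketch below shows that mapping only; it does not talk to S3 and is not SubCloud's code. The function name `plan_deep_rename` is hypothetical.

```shell
#!/bin/sh
# plan_deep_rename OLD_FOLDER NEW_FOLDER
# Reads object keys on stdin (one per line) and prints "old<TAB>new" for every
# key beneath OLD_FOLDER. Keys outside the folder are left out of the plan.
plan_deep_rename() (
    awk -v old="$1/" -v new="$2/" '
        index($0, old) == 1 {                 # key starts with "OLD_FOLDER/"
            printf "%s\t%s%s\n", $0, new, substr($0, length(old) + 1)
        }
    '
)
```

The plan has one line per key, which is why a rename of a folder holding many objects takes a while.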
What Linux distributions has SubCloud run on?
- Debian Etch
- Fedora Core 4 [EC2]
- Fedora Core 5
- Fedora 7
- Fedora 8 [EC2]
- Fedora 8 x64 [EC2]
- Fedora 9
- Ubuntu 8.04
- Ubuntu Server x64 8.04
- Gentoo
I'm sure there are others: SubCloud requires FUSE (at least 2.5) and GLIBC (at least 2.3.4).
Can I mount the same bucket from more than one host at the same time?
- Yes! read/write too!
Will SubCloud clobber existing x-amz-meta- custom headers?
- No!
How do I change the Content-Type of files?
See "/etc/mime.types" (SubCloud uses it to determine the Content-Type when creating/updating files).
How do I export a SubCloud file system via NFS?
Use the fsid option in /etc/exports; see http://article.gmane.org/gmane.comp.file-systems.fuse.devel/5310
How do I run encfs atop SubCloud?
disk quota?
What is the difference between s3fs and SubCloud?
SubCloud adds:
- compression
- EU bucket support (1.0.2)
- Content-MD5
- deep directory rename
- greater posix compliance
- various optimizations
- better error logging
- binary distribution
- support
Is SubCloud compatible with s3fs?
Yes! Buckets used by s3fs can be mounted by SubCloud and vice versa. (Exception: compressed files created by SubCloud will not be decompressed by s3fs.)
Why doesn't Enterprise Edition automatically create buckets?
For robustness purposes, e.g., SubCloud can be configured in /etc/fstab to have it startup successfully even if network connectivity is down.
What about permissions?
Use allow_other and default_permissions options, see http://apps.sourceforge.net/mediawiki/fuse/index.php?title=FAQ.
Can I store files larger than 5GB?
Not automatically. Because SubCloud stores files natively and transparently in Amazon S3, it has the same 5GB file size limit. You can use, e.g., /usr/bin/split to split and join large files.
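Splitting and joining can be wrapped in two small helpers. This is a sketch under assumptions: the function names `split_for_s3` and `join_from_s3` are hypothetical, the two-letter piece suffixes are split's default, and size suffixes such as 4G rely on GNU split.

```shell
#!/bin/sh
# split_for_s3 FILE CHUNK_SIZE PREFIX
# Splits FILE into pieces of at most CHUNK_SIZE (e.g. 4G) named PREFIXaa, PREFIXab, ...
split_for_s3() (
    split -b "$2" "$1" "$3"
)

# join_from_s3 PREFIX OUT
# Concatenates the pieces back together; the shell glob sorts them in order.
join_from_s3() (
    cat "$1"?? > "$2"
)
```

For example, `split_for_s3 /data/huge.img 4G /s3/huge.img.part-` would store pieces that each stay under the 5GB limit, and `join_from_s3 /s3/huge.img.part- /data/huge.img` would reassemble them (the paths are placeholders).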
Can SubCloud use a http proxy?
Yes. Use the "http_proxy" environment variable (must be lower case!). For example, if your http proxy is 10.20.30.40 port 80, do this:
export http_proxy=10.20.30.40:80
And then (re-)start SubCloud. SubCloud reads the "http_proxy" environment variable at startup only.
Troubleshooting
How do I install FUSE on CentOS?
Any or all of the following:
- install rpmforge (https://rpmrepo.org/RPMforge/Using)
- yum install fuse dkms dkms-fuse
How do I install FUSE on Debian Etch?
apt-get update
apt-get install libfuse2
apt-get install fuse-utils
mknod /dev/fuse -m 0666 c 10 229
How do I install FUSE on Fedora?
yum install fuse
How do I install FUSE on Gentoo?
emerge fuse
How do I install FUSE on Ubuntu?
apt-get install fuse
How do I install FUSE on Red Hat Enterprise Linux?
yum install fuse
How do I troubleshoot it?
- tail -f /var/log/messages
- tcpdump -s 1500 -A host s3.amazonaws.com
- use the FUSE -f switch, e.g., /usr/bin/subcloud -f my_bucket /s3
Why do I get "input/output error"?
- Does the bucket exist?
- Are the credentials correct?
- Is the local time set within 15 minutes of Amazon's time? (i.e., RequestTimeTooSkewed) (check /var/log/messages)
- Is the license key expired? If so, you'll get error code 999.
- Does the bucket name conform to the "additional guidelines"? http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?BucketRestrictions.html
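The clock check in the list above is simple arithmetic on two timestamps. This is a minimal sketch, assuming both times are given as seconds since the epoch; obtaining Amazon's time (for example from the Date header of an HTTP response) is left out, and the function name `clock_skew_ok` is hypothetical.

```shell
#!/bin/sh
# clock_skew_ok LOCAL_EPOCH REMOTE_EPOCH
# Succeeds if the two timestamps (seconds since the epoch) differ by
# at most 15 minutes, the window mentioned for RequestTimeTooSkewed.
clock_skew_ok() (
    diff=$(( $1 - $2 ))
    [ "$diff" -lt 0 ] && diff=$(( -diff ))
    [ "$diff" -le 900 ]
)
```

A typical call would be `clock_skew_ok "$(date +%s)" "$remote_epoch"`, where `remote_epoch` is a placeholder for the server's time.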
When I mount a bucket, only the current user can see it, other users cannot. How do I fix it?
- use allow_other, e.g., /usr/bin/subcloud -o allow_other my_bucket /s3
I renamed a folder and it was taking a long time so I stopped it and now files are gone!
- Your files are still there, they're just "orphaned"! "Deep" renames can take a while... Do the rename again and SubCloud will continue the "deep" rename.
Release Notes
2.0.0 (????-??-??)
- native encryption
- streaming
- aggressive attr/stat cache
1.0.5e (2008-11-09)
- re-seat content-type
- fix mknod/mkdir/chown permissions
- removed legacy single command line argument check
- don't retry on 401/403
1.0.5d (2008-09-23)
- fixed timeout/retry issue
- symptom was a bunch of "Operation was aborted by an application callback(42)" in a row in /var/log/messages, followed by "giving up"
1.0.5c (2008-09-21)
- connect_timeout=10; readwrite_timeout=60;
- retry back off timer goes 1s, 2s, 4s, 8s, 16s, etc...
- changed starting back off timer from 5 seconds to 1 second
- fixed chown bug whereby ownership was not changed if there was no id=>name mapping
1.0.5b (2008-09-17)
- 64bit build!
- fixed subtle timeout/retry issue
- fixed parallel readdir/stat cache issue
1.0.5a (2008-09-17)
- symlink permissions=0777
- cleanup /var/log/messages
1.0.3 (2008-09-16)
- discard "bad" http connections
- retry all responses 400 or greater
- changed default retries from 2 to 3
1.0.2 (2008-09-15)
- EU bucket support
- replaced "url" option with "ssl" option
- discard "bad" http connections
1.0.1 (2008-09-05)
- timeout/retry: exponential backoff
1.0.0 (2008-09-01)
- trial/enterprise: create bucket iff trial edition
0.9.5 (2008-08-27)
- https
- retry on 400 Bad Request (for BadDigest)
0.9.4 (2008-08-18)
- fixed typo in local file cache
0.9.3
- fixed FILE* leak
- create bucket iff personal edition
0.9.2
- subcloud_init: create bucket
- changed default_acl to "", i.e., "private"
0.9.1 (2008-08-01)
- init
