Synchronizing files to Cloud Files with Fileconveyor
Many file synchronization applications exist, but few are as versatile as the open-source File Conveyor. The File Conveyor source code is fully documented, and the tool is easy to install. In a matter of minutes, you can have the project synchronizing the local files on your server to a destination like Rackspace Cloud Files.
Using File Conveyor to synchronize files to the content delivery network (CDN) enables you to use ecommerce solutions like Magento or CMS applications like Drupal or WordPress with Cloud Files without relying on a plug-in to handle the file transfers.
This article documents File Conveyor version 0.3. You can run File Conveyor version 0.3 on Linux or Mac OS X operating systems. Microsoft Windows is not supported for this version.
Your system must have Python 2.5 or later installed.
Installation of File Conveyor requires Git and the Python package manager pip. If these are not already installed, use the following instructions:
You can download Git from the project’s website. Most Linux distributions also have Git in their main package repository, under the package name
If you don’t have pip installed, the easiest way to get it is to install the Python setuptools package. You can download the installer from its website. As an alternative you can use a Linux package manager to install setuptools. On most distributions the package name is
After you have installed setuptools, you can install pip by running the following command:
sudo easy_install pip
Install File Conveyor
Change to the directory that you want to hold the File Conveyor files, and then run the following command:
sudo pip install -e git+https://github.com/wimleers/fileconveyor@master#egg=fileconveyor
The File Conveyor source files are downloaded to the
src/fileconveyor directory, relative to where you run the
pip command. For example, if you run
pip in the
/usr/local directory, the File Conveyor script directory is in
Running the installation with sudo (or as root) lets pip handle the installation of dependencies like Django and python-cloudfiles.
Configure File Conveyor
Before running File Conveyor, you need to configure it by creating a file named
config.xml in the same directory as the
If you are starting in the directory that you were in when you started the installation, you can run the following command:
sudo nano src/fileconveyor/fileconveyor/config.xml
For a simple configuration that will synchronize the contents of a directory with a Cloud Files container, paste the following text into the
<?xml version="1.0" encoding="UTF-8"?> <config> <!-- Sources --> <sources ignoredDirs=""> <source name="test" scanPath="/var/www/html/test" /> </sources> <!-- Servers --> <servers> <server name="Rackspace Cloud Files" transporter="cloudfiles"> <username>USERNAME</username> <api_key>APIKEY</api_key> <container>CONTAINER</container> </server> </servers> <!-- Rules --> <rules> <rule for="test" label="Test"> <destinations> <destination server="Rackspace Cloud Files" path="test" /> </destinations> </rule> </rules> </config>
Note: “USERNAME” in the example above is your Rackspace Cloud account username.
Modify the configuration to fit your environment and account details:
<sources>section, change the
scanPathproperty to the directory that you want to synchronize.
api_keyto match your credentials, and set
containerto the name of the container that will hold the synchronized files.
<rules>section set the
pathproperty to the subdirectory to synchronize to in the container. Leave the value blank to synchronize to the root of the container (path=””).
You can perform more complex synchronizations by using multiple rules, synchronizing from multiple sources, or having File Conveyor change the file name or some of a file’s properties before copying it to Cloud Files (using “processors”). You can find details in the File Conveyor documentation and on the project’s website.
Run File Conveyor
With the configuration set, run File Conveyor for its initial synchronization. The
arbitrator.py script launches File Conveyor’s various components:
sudo python src/fileconveyor/fileconveyor/arbitrator.py
The File Conveyor program runs as a console script, without an included init script or means of forking the process to run as a daemon. For testing purposes, you can run the script directly from a command line. For persistent use, you should either set up an init script or run the program from a screen session, as in the following example:
screen python src/fileconveyor/fileconveyor/arbitrator.py
After the initial synchronization is complete, you should be able to see the results in the target container within the Cloud Control Panel.
The sample configuration provided in this article is simple, and you can do much more with File Conveyor to customize its operation for your needs. See the documentation in the source directory and the project web page for full details, but following are a few more options:
verify.pyto check the source directory against the Cloud Files container to confirm that the files synchronized correctly.
This article instructs you to install and run via sudo, but File Conveyor doesn’t require root privileges to run. You can also change the owner (
chown) of the
fileconveyordirectory and its contents to an unprivileged user.
The application runs off a Django back end source to connect to the various servers. You can safely disregard any DeprecationWarning entries in the log.
The less files per minute you chose to update with fileconveyor, the more efficiently fileconveyor runs. Try to stay below updating 100 files per minute for the best results.
You can edit values in the
settings.pyfile to make the locations of the SQLite databases, the
pidfile, and other system files more permanent. Following is an example:
RESTART_AFTER_UNHANDLED_EXCEPTION = True RESTART_INTERVAL = 10 LOG_FILE = '/var/log/fileconveyor.log' PID_FILE = '/var/run/fileconveyor/fileconveyor.pid' PERSISTENT_DATA_DB = '/etc/fileconveyor/persistent_data.db' SYNCED_FILES_DB = '/etc/fileconveyor/synced_files.db' WORKING_DIR = '/tmp/fileconveyor' MAX_FILES_IN_PIPELINE = 50 MAX_SIMULTANEOUS_PROCESSORCHAINS = 1 MAX_SIMULTANEOUS_TRANSPORTERS = 10 MAX_TRANSPORTER_QUEUE_SIZE = 1 QUEUE_PROCESS_BATCH_SIZE = 20 CALLBACKS_CONSOLE_OUTPUT = False CONSOLE_LOGGER_LEVEL = logging.INFO FILE_LOGGER_LEVEL = logging.DEBUG RETRY_INTERVAL = 30
Continue the conversation in the Rackspace Community.
©2016 Rackspace US, Inc.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License