Package: | Rclone |
---|---|
Description: | command line tool for managing cloud storage |
For more information: | https://rclone.org/ |
Categories: | |
License: | OpenSource (MIT) |
Sometimes called "the Swiss army knife of cloud storage", Rclone is a command line program to manage files on cloud storage. It is a feature-rich alternative to cloud vendors' web storage interfaces. Over 40 cloud storage products support rclone, including S3 object stores, business and consumer file storage services, as well as standard transfer protocols.
Rclone has powerful cloud equivalents to the Unix commands rsync, cp, mv, mount, ls, ncdu, tree, rm, and cat. Its familiar syntax includes shell pipeline support and --dry-run protection.
This module will add the rclone command to your path.
Before you can effectively start using rclone to copy files or synchronize
directories, you need to define one or more remotes which
define your storage backends. This is further explained below.
These remotes contain information about
what Cloud storage provider is being used as well as your authentication
and authorization information. This data is stored in your rclone configuration
file, normally ~/.config/rclone/rclone.conf.
The ~/.config/rclone/rclone.conf file will contain access credentials to your cloud storage. BE SURE TO PROTECT THIS FILE. Rclone will protect it by default, but be sure not to unprotect it. Anyone with read access to this file can access your cloud storage as you. You might also want to consider encrypting this config file with a master password.
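As a minimal sketch of keeping that file protected, the function below resets a config file to owner-only access and prints the resulting permissions. The function name `protect_rclone_conf` is hypothetical, and the default path is simply the location quoted above; if your configuration lives elsewhere, pass that path as the first argument.

```shell
# Restrict an rclone config file to owner-only read/write (mode 600).
# protect_rclone_conf is a hypothetical helper, not part of rclone.
protect_rclone_conf() {
    conf="${1:-$HOME/.config/rclone/rclone.conf}"
    if [ ! -f "$conf" ]; then
        echo "no config file at $conf" >&2
        return 1
    fi
    chmod 600 "$conf"
    # Print the resulting permissions so they can be confirmed by eye.
    ls -l "$conf"
}
```

Running `protect_rclone_conf` with no arguments acts on the default location. The `rclone config file` subcommand prints the path of the configuration file rclone is actually using, which is worth checking if the default does not exist.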
When you define a remote, you give it a name of your own choosing,
which you use with a colon suffix to refer to files
and directories on the associated Cloud storage provider. You can then issue
various rclone subcommands to move data back and forth. E.g., if you defined
a remote 'gdrive' to access your personal Google Drive, you could use a command
like `rclone copyto ./MyFile.txt gdrive:CopyOfMyFile.txt` to copy
the local file 'MyFile.txt' to your Google Drive as 'CopyOfMyFile.txt'.
(The plain copy subcommand treats its destination as a directory, so copyto
is the one to use when the copy should get a new name.)
This is elaborated in the usage section below.
This section lists the available versions of the package Rclone on the different clusters.
Version | Module tags | CPU(s) optimized for | GPU ready? |
---|---|---|---|
1.59.1 | rclone/1.59.1 | zen2 | Y |
1.57.0 | rclone/1.57.0 | zen2 | Y |
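A short sketch of loading one of these versions and confirming what ended up on your path follows. It assumes the cluster provides the usual `module` command; the wrapper name `load_rclone` is hypothetical, and the default tag is taken from the table above.

```shell
# Load a specific rclone module, then report the version now on the PATH.
# load_rclone is a hypothetical wrapper; the tags come from the table above.
load_rclone() {
    module load "rclone/${1:-1.59.1}" || return 1
    rclone version
}
```

For example, `load_rclone 1.57.0` would load the older version instead of the default.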
Rclone has an interactive config subcommand which allows
you to define, delete, copy, and edit "remotes". It also allows you to
password-encrypt your rclone configuration file (which contains authorization
tokens to your storage backends), which is something you might wish to consider
if any data on your Cloud providers needs additional security. Encryption
prevents anyone from using the information in the rclone config file to access
your data on the Cloud storage providers unless they know the password. Unfortunately,
that also means that you need to enter the password with every rclone invocation.
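One way to reduce the retyping, sketched below, relies on rclone reading the configuration password from the `RCLONE_CONFIG_PASS` environment variable when it is set. The helper names `rclone_unlock` and `rclone_lock` are hypothetical. Note the trade-off: while the variable is set, the password sits in your shell's environment, so this is only a convenience for a single trusted session.

```shell
# Prompt once for the rclone config password and export it for this shell
# session, so later rclone commands can read it from RCLONE_CONFIG_PASS.
# rclone_unlock / rclone_lock are hypothetical helper names.
rclone_unlock() {
    printf 'rclone config password: ' >&2
    if [ -t 0 ]; then
        stty -echo                      # hide typing on a terminal
        IFS= read -r RCLONE_CONFIG_PASS
        stty echo
        printf '\n' >&2
    else
        IFS= read -r RCLONE_CONFIG_PASS # non-interactive input (e.g. a pipe)
    fi
    export RCLONE_CONFIG_PASS
}

# Forget the password again when you are done.
rclone_lock() {
    unset RCLONE_CONFIG_PASS
}
```

These functions need to be defined in your interactive shell (for instance by sourcing a file that contains them) so that the exported variable is visible to the rclone commands you run afterwards.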
Rclone uses "remotes" to define the various Cloud storage or other storage backends
you wish to use with the rclone command, as well as authentication and authorization
information and other settings. When you define a remote, you give it a name of your
choosing, which is how you refer to files using the provider. For example, if you
defined a remote 'gdrive' to attach to your personal Google drive, then
gdrive:SomeFile.txt
would refer to a file named 'SomeFile.txt' on your
Google drive. You can define multiple 'remotes' attached to the same Cloud storage
provider backend, which can sometimes be useful (e.g. to make distinct remotes for your
personal and for team Google drives). Because you will be using the name of the
remote on the command line a lot, it is advised to keep it somewhat short and to
avoid characters which are not convenient to use in the shell (i.e. use only letters,
numbers, and maybe hyphens/underscores).
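The naming advice above can be sketched as a small check. The function `is_simple_remote_name` is hypothetical and encodes only the convention suggested here (letters, numbers, hyphens, underscores); it is not rclone's own validation rule, which may accept more.

```shell
# Return success only if the proposed remote name uses nothing but
# letters, digits, hyphens, and underscores (the convention advised above).
is_simple_remote_name() {
    case "$1" in
        ""|*[!A-Za-z0-9_-]*) return 1 ;;  # empty, or contains another character
        *) return 0 ;;
    esac
}
```

For instance, `is_simple_remote_name team-drive && echo ok` prints `ok`, while a name containing a space or a colon is rejected.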
All management of your remotes is done with the interactive rclone config
command. Rclone supports many different storage backends, and many options on all of
these backends, so the configuration of backends can be somewhat complicated. The rclone
config command will step you through it, asking questions, most of which have reasonable
defaults.
Because many backends primarily authenticate via the web, you are likely to
need a web browser to complete the authentication. The HPC cluster login nodes no
longer provide web browsers, so the procedures described below usually assume that
you are at your workstation with an ssh session from your workstation to the HPC
login node in one window, and a web browser (running on your workstation) in
another window. Some storage backends (e.g. Box) might require a third window on
your workstation running a local (workstation) command prompt, and might require
that a reasonably current version of rclone be installed on your workstation.
Binaries for installing rclone on various platforms are available from the
main Rclone site.
The rclone web documentation site details all of the backends it supports as well as how to configure remotes for these backends. We will discuss two of the Cloud providers most used at UMD below, but see the rclone documentation for more detail.
Be sure to see the UMD Box service catalog entry for more information on this service, including restrictions on what data can be stored on Box and how to set up your account if you have not already done so.
To access Box via rclone on a login node of one of the HPC clusters, you need to configure rclone first. This needs to be done once per cluster. You will need three windows on your workstation: one running your web browser of choice, one at the command prompt on your workstation, and one in which you have ssh-ed to the login node of the appropriate HPC cluster.
You will also need to have a recent version of rclone installed on your workstation in order to complete the configuration on the HPC cluster. Binaries for installing rclone on various platforms are available from the main Rclone site.
- `Box App Client Id`. Just hit return to accept the default.
- `Box App Client Secret`. Just hit return to accept the default.
- `Box App config.json location`. Just hit return to accept the default.
- `box_sub_type`, with options "user" and "enterprise". UMD users should choose "enterprise" (which is not the default).
- `Edit advanced config?` Unless you know what you are doing, I would recommend 'n'.
- `Use auto config?` Type 'n' for No (not the default).
- Rclone will instruct you to run a command on your workstation and will prompt for the `result`.
- Switch to your workstation window and run `rclone authorize box`.
- Your web browser should open a page with a `Grant access to Box` button. If that page does not appear in your browser, copy the URL printed at your workstation command prompt (something like "http://127.0.0.1:...") into the URL bar of your browser. You should then get the `Grant access to Box` button.
- Click the `Grant access to Box` button.
- Back at your workstation command prompt, `rclone authorize box` should have printed out some text and finished. Copy the text between the `Paste the following into your remote machine --->` and `<--End paste` lines into the `result` prompt of the rclone config session on your login node and hit return.
You can now use the rclone commands with this remote as discussed below.
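Before moving data, it can help to confirm that the new remote is defined and reachable. The sketch below assumes you named the remote `box` (any name works); `check_remote` is a hypothetical helper built on the `rclone listremotes` and `rclone lsd` subcommands.

```shell
# Check that a remote exists in the rclone config, then list its
# top-level folders as a quick connectivity test.
# check_remote is a hypothetical helper name.
check_remote() {
    remote="${1%:}"                       # accept "box" or "box:"
    if ! rclone listremotes | grep -qxF "${remote}:"; then
        echo "remote '${remote}' is not defined; run: rclone config" >&2
        return 1
    fi
    rclone lsd "${remote}:"
}
```

A call such as `check_remote box` should print the folders at the top of your Box account; an authentication error at this point usually means the token step above needs to be repeated.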
Be sure to see the Google drive service catalog entry for more information on this service, including restrictions on what data can be stored there and how to set up your account if you have not done so already.
To access Google Drive via rclone on a login node of one of the HPC clusters, you need to configure rclone first. This needs to be done once per cluster. You will need two windows on your workstation: one running your web browser of choice, and one in which you have ssh-ed to the login node of the appropriate HPC cluster. You do need to have rclone installed on your workstation, preferably the same version as is running on the cluster.
The configuration of a Google Drive remote for rclone proceeds as follows:
- `Google Application Client Id`. Just hit return for the default.
- `Google Application Client Secret`. If you created your own Google Application Client Id and used it in the previous step, cut and paste the Client secret from the Google API Console Credentials page. If you entered the default blank string at the previous prompt, just do so again.
- `Scope that rclone should use when requesting access from drive`. Rclone will offer several scopes for access. Common selections are:
  - `Full access all files` ("drive"), meaning rclone will have full access to all of your files on Google Drive. This is probably the best option for most people.
  - `Access to files created by rclone only` ("drive.file") only allows rclone to access files that it placed on Google Drive. This will prevent rclone from accessing files put on Google Drive by other means, which might be useful if you have some more sensitive data on Google Drive.
  - `Read-only access to file metadata and file contents` ("drive.readonly"). This will allow you to list and download files from Google Drive, but not upload or modify files.
- `ID of the root folder`. It is recommended to leave this blank (the default).
- `Service Account Credentials JSON file path`. It is recommended to leave this blank (the default).
- `Edit advanced config`. Unless you know what you are doing, I would recommend 'n'.
- `Use auto config?` Type 'n' for No (not the default).
- Rclone will then print out a command to run on your workstation, something like `rclone authorize "drive" "...(long random string)..."`.
- Run that command on your workstation. In the browser window that opens, log in to Google (with your @umd.edu account), and authorize rclone for access.
- The rclone command should then return a secret token, with instructions to paste it into your remote machine. Copy this into your clipboard and paste it at the prompt of the rclone config session on your login node.
You can now use the rclone commands with this remote as discussed below.
Rclone commands generally follow the format:
rclone SUBCOMMAND SOURCE DEST
although a few commands omit the DEST. The SUBCOMMAND tells rclone what it is you want to do, and the SOURCE and DEST, if given, are what is acted upon. SOURCE and DEST are paths to files or directories, either local or in a Cloud or other storage provider. To reference files or directories on a Cloud or storage provider, the SOURCE or DEST specification should start with the name of one of the remotes you defined followed by a colon (':') and optionally any additional path components needed.
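To make the `remote:path` form concrete, the sketch below splits a SOURCE or DEST string the way the paragraph above describes. The function `describe_location` is hypothetical and deliberately simplified: it treats anything with a '/' before the first colon as a local path, and it does not attempt to cover every form rclone itself accepts.

```shell
# Report whether a SOURCE/DEST string names a remote location or a local path.
# Simplified illustration only; describe_location is a hypothetical helper.
describe_location() {
    case "$1" in
        *:*)
            prefix="${1%%:*}"             # text before the first colon
            case "$prefix" in
                */*) echo "local path: $1" ;;   # e.g. ./notes:v2.txt
                *)   echo "remote=$prefix path=${1#*:}" ;;
            esac
            ;;
        *) echo "local path: $1" ;;
    esac
}
```

So `describe_location gdrive:MyFolder/a.txt` reports the remote `gdrive` with path `MyFolder/a.txt`, while `describe_location ./MyFile.txt` is reported as local.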
A complete description of all of the rclone subcommands, etc. can be found on the rclone documentation site. Below is a brief description of some of the more commonly used subcommands (in the examples below, we assume that 'gdrive' is a remote for Google Drive, and 'box' is one for Box):
- `rclone ls SOURCE`: this will list files in SOURCE. E.g. `rclone ls gdrive:MyFolder` will list files in MyFolder in Google Drive.
- `rclone lsd SOURCE`: this will list folders under SOURCE. E.g. `rclone lsd gdrive:` will list all folders under the root folder on Google Drive.
- `rclone copy SOURCE DEST`: this will copy SOURCE (a file, or the contents of a directory) into the directory DEST. E.g. `rclone copy box:MyFile.txt .` will copy MyFile.txt from Box to the current directory on the local system.
- `rclone sync SOURCE DEST`: this will modify the directory DEST to make it identical with SOURCE. E.g. `rclone sync gdrive:ImportantStuff box:Copy` will add/delete/copy files in the Copy folder of Box as needed to synchronize it with the ImportantStuff folder in Google Drive.
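Because sync can delete files on the destination, a cautious pattern is to preview the changes with `--dry-run` first and only then apply them. The sketch below shows one way to do that; `careful_sync` is a hypothetical wrapper, and the remotes in the usage note are the example names used above.

```shell
# Preview a sync with --dry-run, ask for confirmation, then run it for real.
# careful_sync is a hypothetical wrapper around "rclone sync".
careful_sync() {
    src="$1"
    dst="$2"
    rclone sync --dry-run "$src" "$dst" || return 1
    printf 'Apply these changes to %s? [y/N] ' "$dst" >&2
    read -r answer
    case "$answer" in
        y|Y) rclone sync "$src" "$dst" ;;
        *)   echo "sync cancelled" >&2
             return 1 ;;
    esac
}
```

For example, `careful_sync gdrive:ImportantStuff box:Copy` first reports what would be added, changed, or deleted in the Copy folder on Box, and makes no changes unless you answer 'y'.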
Many more subcommands are available; see the man page or the rclone documentation site for more information.