Rclone: command line tool for managing cloud storage

Contents

  1. Overview of package
    1. General usage
  2. Availability of package by cluster
  3. Configuring rclone for a storage backend
    1. Configuring rclone for Box
    2. Configuring rclone for Google Drive
  4. Using rclone

Overview of package

General information about package
Package: Rclone
Description: command line tool for managing cloud storage
For more information: https://rclone.org/
Categories:
License: OpenSource (MIT)

General usage information

Sometimes called "the Swiss army knife of cloud storage", Rclone is a command line program to manage files on cloud storage. It is a feature-rich alternative to cloud vendors' web storage interfaces. Rclone supports over 40 cloud storage products, including S3 object stores, business and consumer file storage services, and standard transfer protocols.

Rclone has powerful cloud equivalents to the unix commands rsync, cp, mv, mount, ls, ncdu, tree, rm, and cat. Its familiar syntax includes shell pipeline support, and --dry-run protection.

This module will add the rclone command to your path.

Before you can effectively start using rclone to copy files or synchronize directories, you need to define one or more remotes which define your storage backends. This is further explained below. These remotes contain information about what Cloud storage provider is being used as well as your authentication and authorization information. This data is stored in your rclone configuration file, normally ~/.config/rclone/rclone.conf.

WARNING
The ~/.config/rclone/rclone.conf file will contain access credentials to your cloud storage. BE SURE TO PROTECT THIS FILE. Rclone will protect it by default, but be sure not to unprotect it. Anyone with read access to this file can access your cloud storage as you. You might want to also consider encrypting this config file with a master password.
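As a minimal sketch of how you might check this yourself, the commands below look for the configuration file and make sure only your own account can read it. The path is the default location mentioned above; if your configuration lives elsewhere (the command rclone config file reports the path in use), adjust the conf variable, which is just an illustrative name.

```shell
# Default location of the rclone configuration file (adjust if yours differs;
# "rclone config file" prints the path rclone actually uses).
conf="${HOME}/.config/rclone/rclone.conf"

if [ -f "$conf" ]; then
    # Allow read/write for the owner only, then show the resulting permissions.
    chmod 600 "$conf"
    ls -l "$conf"
else
    echo "No rclone configuration found at $conf yet"
fi
```

In the ls -l output, a permission string of -rw------- means that no other user on the system can read the file.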

When you define a remote, you give it a name of your own choosing, which you use with a colon suffix to refer to files and directories on the associated Cloud storage provider. You can then issue various rclone subcommands to move data back and forth. E.g., if you defined a remote 'gdrive' to access your personal Google Drive, you could use a command like rclone copyto ./MyFile.txt gdrive:CopyOfMyFile.txt to copy the local file 'MyFile.txt' to your Google drive as 'CopyOfMyFile.txt'. (The rclone copy subcommand treats its destination as a directory, so rclone copy ./MyFile.txt gdrive: would instead upload the file under its original name.) This is elaborated in the usage section below.

Available versions of the package Rclone, by cluster

This section lists the available versions of the package Rclone on the different clusters.

Available versions of Rclone on the Deepthought2 cluster (RHEL8)

Version  Module tags    CPU(s) optimized for  GPU ready?
1.51.0   rclone/1.51.0  ivybridge             Y

Available versions of Rclone on the Juggernaut cluster

Version  Module tags    CPU(s) optimized for  GPU ready?
1.51.0   rclone/1.51.0  x86_64                Y

Available versions of Rclone on the Deepthought2 cluster (RHEL6) [DEPRECATED]

Version  Module tags    CPU(s) optimized for  GPU ready?
1.52.2   rclone/1.52.2  x86_64                N
1.47.0   rclone/1.47.0  x86_64                N
1.43.1   rclone/1.43.1  x86_64                N
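To use one of the versions listed above, load the corresponding module. The sketch below assumes the module command is available in your session, as it normally is on the cluster login nodes. The version tag is taken from the tables above, and the guard simply keeps the snippet harmless on machines without a module system.

```shell
# Module tag as listed in the tables above; pick the one for your cluster.
rclone_module="rclone/1.51.0"

if command -v module >/dev/null 2>&1; then
    module avail rclone            # show the rclone versions installed here
    module load "$rclone_module"   # add that version's rclone to your PATH
    rclone version                 # confirm which rclone is now in use
else
    echo "module command not found; run this on a cluster login node"
fi
```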

Configuring rclone for a storage backend

Rclone has an interactive config subcommand which allows you to define, delete, copy, and edit "remotes". It also allows you to encrypt your rclone configuration file (which contains authorization tokens for your storage backends) with a password, which you might wish to consider if any data on your Cloud providers needs additional security. Encryption prevents anyone from using the information in the rclone config file to access your data on the Cloud storage providers unless they know the password. Unfortunately, it also means that you need to enter the password with every rclone invocation.
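One way to soften that last point is the RCLONE_CONFIG_PASS environment variable, which rclone consults before prompting for the configuration password. The sketch below defines two small helpers (the names rclone_unlock and rclone_lock are our own, not part of rclone): the first reads the password once without echoing it and exports it for the current shell session, the second removes it again. Keep in mind that the password then lives in that shell's environment until you unset it or log out, so treat this as a convenience trade-off rather than a recommendation.

```shell
# Hypothetical helper: prompt once for the rclone configuration password and
# export it as RCLONE_CONFIG_PASS for the rest of this shell session.
rclone_unlock() {
    printf 'rclone config password: ' >&2
    stty -echo 2>/dev/null || true   # hide typing when attached to a terminal
    IFS= read -r RCLONE_CONFIG_PASS
    stty echo 2>/dev/null || true
    printf '\n' >&2
    export RCLONE_CONFIG_PASS
}

# Hypothetical helper: forget the password again when you are done.
rclone_lock() {
    unset RCLONE_CONFIG_PASS
}
```

After running rclone_unlock, later rclone commands in the same shell should be able to read the encrypted configuration without asking again; run rclone_lock when you have finished.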

Rclone uses "remotes" to define the various Cloud storage or other storage backends you wish to use with the rclone command, as well as authentication and authorization information and other settings. When you define a remote, you give it a name of your choosing, which is how you refer to files using the provider. For example, if you defined a remote 'gdrive' to attach to your personal Google drive, then gdrive:SomeFile.txt would refer to a file named 'SomeFile.txt' on your Google drive. You can define multiple 'remotes' attached to the same Cloud storage provider backend, which can sometimes be useful (e.g. to make distinct remotes for your personal and for team Google drives). Because you will be using the name of the remote on the command line a lot, it is advised to keep it somewhat short and to avoid characters which are not convenient to use in the shell (i.e. use only letters, numbers, and maybe hyphens/underscores).
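The naming advice can be turned into a quick check. The function below is purely illustrative (is_friendly_remote_name is our own name, not an rclone command): it accepts a candidate remote name only if it is non-empty and consists solely of letters, digits, hyphens, and underscores.

```shell
# Succeeds if the name uses only letters, digits, hyphens, and underscores.
is_friendly_remote_name() {
    case "$1" in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;   # empty, or contains another character
        *) return 0 ;;
    esac
}

# Try a few candidate names; the last two need quoting in the shell.
for name in gdrive umd-box "my drive" 'team$drive'; do
    if is_friendly_remote_name "$name"; then
        echo "ok:    $name"
    else
        echo "avoid: $name"
    fi
done
```

The names containing a space or a dollar sign are flagged because they would have to be quoted or escaped every time they appear on the command line.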

All management of your remotes is done with the interactive rclone config command. Rclone supports many different storage backends, and many options on all of these backends, so the configuration of backends can be somewhat complicated. The rclone config command will step you through it, asking questions, most of which have reasonable defaults.

Because many backends primarily authenticate via the web, you are likely to need a web browser to complete the authentication. The HPC cluster login nodes no longer provide web browsers, so the procedures described below usually assume that you are at your workstation with an ssh session from your workstation to the HPC login node in one window, and a web browser (running on your workstation) in another window. Some storage backends (e.g. Box) might require a third window on your workstation running a local (workstation) command prompt, and might require a reasonably current version of rclone be installed on your workstation. Binaries for installing rclone on various platforms are available from the main Rclone site.

The rclone web documentation site details all of the backends it supports as well as how to configure remotes for these backends. We will discuss two of the Cloud providers most used at UMD below, but see the rclone documentation for more detail.

Configuring rclone for Box

Be sure to see the UMD Box service catalog entry for more information on this service, including restrictions on what data can be stored on Box and how to set up your account if you have not already done so.

To access Box via rclone on a login node of one of the HPC clusters, you need to configure rclone first. This needs to be done once per cluster. You will need three windows on your workstation: one running your web browser of choice, one at the command prompt on your workstation, and one in which you have ssh-ed to the login node of the appropriate HPC cluster.

You will also need to have a recent version of rclone installed on your workstation in order to complete the configuration on the HPC cluster. Binaries for installing rclone on various platforms are available from the main Rclone site.

  1. On the login node, issue the command "module load rclone"
  2. [login node] Issue the command "rclone config". This will list any remotes you currently have configured, and leave you at an interactive rclone config prompt.
  3. [login node, rclone config prompt] Type 'n' to create a new remote
  4. [login node, rclone config prompt] Enter the name you wish to use for this remote. This will prefix all paths for the remote system, so you will probably want something somewhat short and shell friendly, e.g. 'box' or 'umdbox'
  5. [login node, rclone config prompt] Select the type of backend you wish to configure. The available options are in alphabetical order, but the numbering changes between rclone versions. Since we are configuring for the Box service, type "box"
  6. [login node, rclone config prompt] The system will then prompt for Box App Client Id. Just hit return to accept the default.
  7. [login node, rclone config prompt] The system will then prompt for Box App Client Secret. Just hit return to accept the default.
  8. [login node, rclone config prompt] The system will then prompt for Box App config.json location. Just hit return to accept the default.
  9. [login node, rclone config prompt] The system will then prompt for box_sub_type, with options "user" and "enterprise". For UMD users, you should be choosing "enterprise" (which is not the default).
  10. [login node, rclone config prompt] The system will then prompt for Edit advanced config? Unless you know what you are doing, I would recommend 'n'.
  11. [login node, rclone config prompt] The system will then prompt for Use auto config? Type 'n' for No (not the default). Rclone will instruct you to run a command on your workstation and prompt for the result. Switch to your workstation window.
  12. [workstation, command prompt] Run the command rclone authorize box.
  13. [workstation, browser] Your browser should show a new window with a button Grant access to Box. If that window does not appear in your browser, copy the URL printed in your workstation command prompt (something like "http://127.0.0.1:...") into the URL bar on your browser. You should get the Grant access to Box button.
  14. [workstation, browser] Click on the Grant access to Box button.
  15. [workstation, command prompt] The rclone authorize box command should have printed out some text and finished. Copy the text between the Paste the following into your remote machine ---> and <--End paste lines into the result prompt of rclone config on your login node and hit return.
  16. [login node, rclone config prompt] Rclone should parse the string you gave and print back some basic information (e.g. type is Box, box_sub_type is enterprise, etc.), and ask for confirmation. Type y at the prompt.
  17. [login node, rclone config prompt] Rclone should now list your configured remotes again, and your box entry (whatever you named it in step 4) should appear. You can type q to quit.
Configuration is now complete.
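A quick way to confirm that the new remote works is to check that rclone lists it, and then list the top-level folders it can see. This is a minimal sketch that assumes you named the remote 'box' in step 4; substitute your own name. The helper name check_remote is ours, not part of rclone.

```shell
# Hypothetical helper: confirm a remote is configured, then list its
# top-level folders. Usage: check_remote box
check_remote() {
    remote="$1"
    # "rclone listremotes" prints one configured remote per line, as "name:".
    if rclone listremotes | grep -qx "${remote}:"; then
        rclone lsd "${remote}:"
    else
        echo "remote '${remote}' is not configured" >&2
        return 1
    fi
}
```

On the login node, after module load rclone, running check_remote box should print the folders at the top of your Box account. An authorization error at this point usually means the token was not pasted completely in step 15.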

You can now use the rclone commands with this remote as discussed below.

Configuring rclone for Google Drive

Be sure to see the Google drive service catalog entry for more information on this service, including restrictions on what data can be stored there and how to set up your account if you have not done so already.

To access Google drive via rclone on a login node of one of the HPC clusters, you need to configure rclone first. This needs to be done once per cluster. You will need two windows on your workstation: one running your web browser of choice, and one in which you have ssh-ed to the login node of the appropriate HPC cluster. You do not need to have rclone installed on your workstation.

Before starting the rclone configuration, it is advised that you create your own Google Drive client ID for rclone. If you do not, rclone will use its own internal client ID, which is shared by everyone else using rclone who did not set up their own. Google uses the client ID when rate limiting access to Google Drive, so with rclone's internal client ID you may be rate limited because of what other users are doing. So while this step is optional, it is advised that you create your own client ID for rclone for best performance, as described below:

  1. From the web browser on your workstation, go to the Google API Console and log in.
  2. On the blue menu bar near the top, choose Select a project, and either select an existing project (if you have any), or click the NEW PROJECT button in the upper right of the popup. If creating a new project, you need to give it a name (e.g. your_username-rclone) and a parent organization or folder (e.g. Self-Service Projects). Then hit the blue CREATE button at the bottom left.
  3. From the project dashboard, select the Dashboard option under APIs & Services in the menu on the left.
  4. Near the top, there should be an option to ENABLE APIS AND SERVICES. Click that link.
  5. In the search field, enter Drive, and click on the Google Drive API option. This should open up a Google Drive API page, click on the blue ENABLE button.
  6. You should now be on a page with a CREATE CREDENTIALS button near the upper right, click on that button.
  7. The page should have a drop-down labelled Select an API; select Google Drive API. It will then ask What data will you be accessing?; you probably want to select User data. Then click the NEXT button.
  8. You will now be in a section labelled OAuth Consent Screen. Fill out the required fields:
    1. App name: rclone is fine
    2. User support email: Enter your email address
    3. App logo: leave blank
    4. Developer contact information: Enter your email address
    Then click the SAVE AND CONTINUE button.
  9. You will now be in a section labelled Scopes (optional). Just click the SAVE AND CONTINUE button.
  10. You will now be in a section labelled OAuth Client ID. In the drop-down labelled Application Type, select Desktop app. You can leave the default Name or change it to something meaningful like rclone. Click the blue CREATE button.
  11. You will now be in the Your Credentials section. It will display a Client ID, basically a string of digits followed by a hyphen and a string of alphanumerics followed by .apps.googleusercontent.com. There should also be a link to your credentials page. Follow that link.
  12. You should now be on the Credentials page on your Google Drive API page. This should list all credentials you have created. Under the OAuth 2.0 Client IDs you should see the rclone credential you just created. Click on the pencil icon to the right of it in order to edit it. This should display the client id and client secret. You will need these to provide to the rclone config command in the process below. I recommend keeping that browser tab open, or at least copying the data somewhere. NOTE: the secret code should be carefully guarded. If you copy it, make sure you do so securely.

We now proceed with the configuration of the Google Drive remote for rclone.

    1. On the login node, issue the command "module load rclone"
    2. [login node] Issue the command "rclone config". This will list any remotes you currently have configured, and leave you at an interactive rclone config prompt.
    3. [login node, rclone config prompt] Type 'n' to create a new remote
    4. [login node, rclone config prompt] Enter the name you wish to use for this remote. This will prefix all paths for the remote system, so you will probably want something somewhat short and shell friendly, e.g. 'gdrive' or 'google'
    5. [login node, rclone config prompt] Select the type of backend you wish to configure. The available options are in alphabetical order, but the numbering changes between rclone versions. Since we are configuring for the Google Drive service, type "drive"
    6. [login node, rclone config prompt] The system will then prompt for Google Application Client Id. If you created your own Google Application Client id as discussed above, now is the time to enter the Client ID created. This should be a string of digits followed by a hyphen and a string of alphanumerics followed by .apps.googleusercontent.com. You should cut and paste it from the Google API Console Credentials page. If you do not want to bother with creating your own Client ID, you can just hit return for the default, but this will be much lower performance.
    7. [login node, rclone config prompt] The system will then prompt for Google Application Client Secret. If you created your own Google Application Client id and used it in the previous step, cut and paste the Client secret from the Google API Console Credentials page. If you entered the default blank string at the previous prompt, just do so again.
    8. [login node, rclone config prompt] The system will then prompt for Scope that rclone should use when requesting access from drive. It will offer several scopes of access. Common selections are:
      • Full access all files, "drive", meaning rclone will have full access to all of your files on Google drive. This is probably the best option for most people.
      • Access to files created by rclone only. "drive.file" only allows rclone to access files that it placed on Google drive. This will prevent rclone from accessing files put on Google Drive by other means, which might be useful if you have some more sensitive data on Google Drive.
      • Read-only access to file metadata and file contents, "drive.readonly". This will allow you to list and download files from Google drive, but not upload or modify files.
    9. [login node, rclone config prompt] The system will then prompt for ID of the root folder. It is recommended to leave this blank (the default).
    10. [login node, rclone config prompt] The system will then prompt for Service Account Credentials JSON file path. It is recommended to leave this blank (the default).
    11. [login node, rclone config prompt] The system will then prompt for Edit advanced config. Unless you know what you are doing, I would recommend 'n'.
    12. [login node, rclone config prompt] The system will then prompt for Use auto config? Type 'n' for No (not the default). Rclone will then print out a long URL. Copy and paste that into the web browser running on your workstation.
    13. On your workstation web browser, go to the URL listed by rclone on the login node. Authenticate as your @umd.edu account.
    14. [workstation, web browser]: You will get a Google page stating that rclone wants to access your Google account (the application name rclone might vary). Click on the Allow button.
    15. [workstation, web browser] The Google page will then print an access code (a long string of characters). Copy this code.
    16. [login node, rclone config prompt] Paste the access code from Google into the Enter verification code prompt and hit return.
    17. [login node, rclone config prompt] Rclone will then prompt as to whether you wish to configure this as a team drive. Answer appropriately (if unsure, type No, which is the default). If you configure it as a team drive, you will be given a list of team drives and prompted to select which one to use.
    18. [login node, rclone config prompt] Rclone will then print out a summary of the configuration for this new remote (including type=drive, the client id you selected, the scope you selected, and a long string for token), and ask for confirmation. Enter y for Yes.
    19. [login node, rclone config prompt] Rclone should now list your configured remotes again, and your gdrive entry (whatever you named it in step 4) should appear. You can type q to quit.
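Before moving on, you may want to confirm that the new remote can actually reach Google Drive. The sketch below assumes the remote was named 'gdrive' in step 4 (change the remote variable otherwise). It uses rclone about to report the storage quota and rclone lsd to list top-level folders, and the guard keeps it harmless if rclone is not loaded or the remote does not exist.

```shell
remote="gdrive"   # the name you chose in step 4

if command -v rclone >/dev/null 2>&1 && rclone listremotes | grep -qx "${remote}:"; then
    rclone about "${remote}:"   # storage quota and usage for the account
    rclone lsd "${remote}:"     # top-level folders visible to rclone
else
    echo "rclone is not loaded, or remote '${remote}' is not configured"
fi
```

If you selected the drive.file scope in step 8, expect the folder listing to show only folders that rclone itself created.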

You can now use the rclone commands with this remote as discussed below.

Using rclone

Rclone commands generally follow the format:

rclone SUBCOMMAND SOURCE DEST

although a few commands omit the DEST. The SUBCOMMAND tells rclone what you want to do, and the SOURCE and DEST, if given, are what is acted upon. SOURCE and DEST are paths to files or directories, either local or on a Cloud or other storage provider. To reference files or directories on a Cloud or other storage provider, the SOURCE or DEST specification should start with the name of one of the remotes you defined, followed by a colon (':') and optionally any additional path components needed.

A complete description of all of the rclone subcommands can be found on the rclone documentation site. Below is a brief description of some of the more commonly used subcommands (in the examples, we assume that 'gdrive' is a remote for Google drive, and 'box' one for Box):

  • rclone ls SOURCE: this will list files in SOURCE, including files in subfolders. E.g. rclone ls gdrive:MyFolder will list files in MyFolder in Google drive.
  • rclone lsd SOURCE: this will list folders under SOURCE. E.g. rclone lsd gdrive: will list all folders under the root folder on Google drive.
  • rclone copy SOURCE DEST: this will copy the file SOURCE (or the contents of the directory SOURCE) into the directory DEST. E.g. rclone copy box:MyFile.txt . will copy MyFile.txt from Box to the current directory on the local system.
  • rclone sync SOURCE DEST: will modify the directory DEST to make it identical with SOURCE. E.g. rclone sync gdrive:ImportantStuff box:Copy will add/delete/copy files in the Copy folder of Box as needed to synchronize it with the ImportantStuff folder in Google drive. Because sync can delete files in DEST, consider testing the command with the --dry-run flag first.

Many more subcommands are available; see the man page or the rclone documentation site for more information.
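Since sync removes files from DEST that are not present in SOURCE, one cautious pattern is to preview the changes with --dry-run before running the real command. The wrapper below is a sketch of that pattern, not an rclone feature: the function name preview_then_sync is ours, and the remote and folder names in the usage line are the example names used in this section.

```shell
# Hypothetical wrapper: show what sync would change, ask, then do it.
# Usage: preview_then_sync gdrive:ImportantStuff box:Copy
preview_then_sync() {
    src="$1"
    dest="$2"
    # --dry-run reports the planned copies and deletions without making them.
    rclone sync --dry-run "$src" "$dest" || return 1
    printf 'Apply these changes to %s? [y/N] ' "$dest" >&2
    IFS= read -r answer
    case "$answer" in
        y|Y) rclone sync "$src" "$dest" ;;
        *)   echo "Nothing changed." ;;
    esac
}
```

Anything other than y or Y at the prompt leaves DEST untouched, which makes a mistyped SOURCE or DEST much less costly.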




