SaltyCrane Blog — Notes on JavaScript and web development

A hack to copy files between two remote hosts using Python

I sometimes need to copy a file (such as a database dump) between two remote hosts on EC2. Normally this involves a few steps: scp'ing the ssh keyfile to Host 1, ssh'ing to Host 1, looking up the address for Host 2, then scp'ing the desired file from Host 1 to Host 2.

I was excited to read in the man page that scp can copy files between two remote hosts directly. However, it didn't work for me. Apparently, running scp host1:myfile host2: is like running ssh host1 scp myfile host2: so I still need the address of host2 and my ssh keyfile on host1.
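
To make that equivalence concrete, here is a small Python 3 sketch that rewrites a remote-to-remote scp invocation into the ssh command it roughly corresponds to. The function name as_ssh_command is made up for illustration, and the real scp adds further ssh options of its own, so treat this as a picture of the idea rather than what scp literally executes.

```python
def as_ssh_command(source, destination):
    """Return the argv that `scp source destination` roughly corresponds to
    when both arguments are remote (host:path) specifications."""
    # Split only on the first colon so paths containing colons survive.
    host1, path1 = source.split(':', 1)
    # host1 runs scp itself, so it must be able to resolve and log in to
    # the destination host on its own.
    return ['ssh', host1, 'scp', path1, destination]


print(' '.join(as_ssh_command('host1:myfile', 'host2:')))
```

The destination is passed through untouched, which is why host1 needs its own way to find and authenticate to host2.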

My inability to let go of this small efficiency increaser led me to (what else?) write a Python script. I know this is a hack, so if you know of a better way of doing this, let me know.

The script parses my ~/.ssh/config file to find the ssh keyfile and address for host 2, uses scp to copy the ssh keyfile to host 1, then runs the ssh host1 scp ... command with the appropriate options filled in. The script captures all of the ssh options for host 2 and passes them to scp via the -o command-line option. Note: I only tested this with the User option; I don't know whether all ssh options will work.

Warning: the script disables the StrictHostKeyChecking SSH option, so you are more vulnerable to a man-in-the-middle attack.

Update 2010-02-16: I've found there is already an SSH config file parser in the paramiko library. The source can be viewed on github.

Update 2010-05-04: I modified my code to use the paramiko library and also allow command line options to be passed directly to the scp command. The latest code is available in my github repository remote-tools.

import itertools
import os
import re
import sys

SSH_CONFIG_FILE = '/home/saltycrane/.ssh/config'

def main():
    host1, path1 = sys.argv[1].split(':', 1)
    host2, path2 = sys.argv[2].split(':', 1)

    o = get_ssh_options(host2)
    keyfile_remote = '/tmp/%s' % os.path.basename(o['identityfile'])
    ssh_options = ' -o'.join(['='.join([k, v]) for k, v in o.items()
                              if k != 'hostname' and k != 'identityfile'])

    run('scp %s %s:%s' % (o['identityfile'], host1, keyfile_remote))
    run('ssh %s scp -p -i %s -oStrictHostKeyChecking=no -o%s %s %s:%s' % (
            host1, keyfile_remote, ssh_options, path1, o['hostname'], path2))

def get_ssh_options(host):
    """Parse ~/.ssh/config file and return a dict of ssh options for host
    Note: dict keys are all lowercase
    """
    def remove_comment(line):
        # Also drop whitespace before the '#' so values carry no trailing spaces
        return re.sub(r'\s*#.*$', '', line)
    def get_value(line, key_arg):
        m = re.search(r'^\s*%s\s+(.+)\s*$' % key_arg, line, re.I)
        if m:
            return m.group(1)
        else:
            return ''
    def not_the_host(line):
        return get_value(line, 'Host') != host
    def not_a_host(line):
        return get_value(line, 'Host') == ''

    with open(SSH_CONFIG_FILE) as f:
        lines = [line.strip() for line in f]
    comments_removed = [remove_comment(line) for line in lines]
    blanks_removed = [line for line in comments_removed if line]
    top_removed = list(itertools.dropwhile(not_the_host, blanks_removed))[1:]
    goodpart = itertools.takewhile(not_a_host, top_removed)
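    # Lowercase only the option names; values such as file paths are case-sensitive
    pairs = [line.split(None, 1) for line in goodpart]
    return dict([(key.lower(), value) for key, value in pairs])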

def run(cmd):
    print(cmd)
    os.system(cmd)

if __name__ == '__main__':
    main()
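
The ' -o'.join(...) expression combined with the -o%s in the format string is easy to misread, and it leaves a bare -o on the command line when host 2 has no options besides Hostname and IdentityFile. As a sketch only, written for Python 3, the same step could build one -o flag per option. The helper name build_o_flags is hypothetical and not part of the script above.

```python
import shlex


def build_o_flags(options, skip=('hostname', 'identityfile')):
    """Turn a dict of ssh options into a list of -okey=value flags."""
    flags = []
    for key, value in sorted(options.items()):
        if key in skip:
            continue  # these two are passed to scp separately
        # Quote each flag because the remote shell on host 1 re-parses it.
        flags.append(shlex.quote('-o%s=%s' % (key, value)))
    return flags


options = {
    'user': 'root',
    'hostname': '56.384.58.212',
    'identityfile': '/home/saltycrane/.ssh/test_keyfile',
}
print(' '.join(build_o_flags(options)))  # -ouser=root
```

An empty list simply contributes nothing to the command, which avoids the dangling -o case.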

Here is an example ~/.ssh/config file:

Host testhost1
  User root
  Hostname 48.879.24.567
  IdentityFile /home/saltycrane/.ssh/test_keyfile

Host testhost2
  User root
  Hostname 56.384.58.212
  IdentityFile /home/saltycrane/.ssh/test_keyfile
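
To make the parsing concrete, here is a self-contained Python 3 sketch of the same idea applied to the sample config above, read from a string instead of a file. It is a simplification under stated assumptions: one host name per Host line, whitespace-separated key and value, comments starting at '#', and no wildcards or Match blocks. The function name options_for_host is hypothetical. Keys are lowercased; values keep their case.

```python
SAMPLE_CONFIG = """\
Host testhost1
  User root
  Hostname 48.879.24.567
  IdentityFile /home/saltycrane/.ssh/test_keyfile

Host testhost2
  User root
  Hostname 56.384.58.212
  IdentityFile /home/saltycrane/.ssh/test_keyfile
"""


def options_for_host(config_text, host):
    """Return a dict of the options listed under `Host <host>`."""
    options = {}
    in_block = False
    for raw_line in config_text.splitlines():
        line = raw_line.split('#', 1)[0].strip()  # drop comments and padding
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank lines and keys without a value
        key, value = parts[0].lower(), parts[1].strip()
        if key == 'host':
            # A new Host line starts a block; track whether it is ours.
            in_block = (value == host)
        elif in_block:
            options[key] = value
    return options


print(options_for_host(SAMPLE_CONFIG, 'testhost2'))
```

For testhost2 this returns the user, hostname and identityfile entries; for a host that is not in the config it returns an empty dict.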

Here is an example run. It copies /tmp/testfile from testhost1 to the same path on testhost2.

python scp_r2r.py testhost1:/tmp/testfile testhost2:/tmp/testfile

Here is the console output:

scp /home/saltycrane/.ssh/test_keyfile testhost1:/tmp/test_keyfile
test_keyfile                                              100% 1674     1.6KB/s   00:00
ssh testhost1 scp -p -i /tmp/test_keyfile -oStrictHostKeyChecking=no -ouser=root /tmp/testfile 56.384.58.212:/tmp/testfile

One inconvenience is that it doesn't show the progress for the main transfer. If anyone knows how I can fix this, please let me know.
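
One direction worth trying, untested here: scp only draws its progress meter when its output is a terminal, and ssh -t asks for a pseudo-terminal on the remote side. The Python 3 sketch below builds the second command with -t and lets the child process write straight to the local terminal. The function name remote_scp_command and the host and path values are placeholders, and the -o options are left out to keep it short.

```python
import subprocess


def remote_scp_command(host1, keyfile_remote, path1, host2_address, path2):
    """Build the argv for running scp on host1 inside a pseudo-terminal."""
    remote_cmd = 'scp -p -i %s %s %s:%s' % (
        keyfile_remote, path1, host2_address, path2)
    # -t forces pseudo-terminal allocation so the remote scp sees a tty.
    return ['ssh', '-t', host1, remote_cmd]


def run(argv):
    print(' '.join(argv))
    # With no stdout redirection the child inherits this process's terminal.
    return subprocess.call(argv)


cmd = remote_scp_command('testhost1', '/tmp/test_keyfile',
                         '/tmp/testfile', '56.384.58.212', '/tmp/testfile')
print(cmd)
```

Calling run(cmd) would then execute the transfer; the sketch only prints the command so it can be inspected without touching any hosts.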

Comments


#1 anonymous commented on :

Was this for a contest to see how many compound statements could be used while obfuscating a very simple idea inside as many iterators as humanly possible? At times I am very glad I do not think like an Engineer. :)

Neat concept though, I can see the need for this at times.


#2 gwyn commented on :

Hi Eliot, thanks for your post and especially the paramiko hint. I ran into the same problems and wrote an improved scp_r2r script. It uses SSH agent forwarding so the private key remains safe on the local computer. It also fixes some of the issues you mentioned:

  • StrictHostKeyChecking is enabled and the user can type "yes" (interactively)
  • the progress bar is shown (using ssh's -t option and starting the process with stdout=sys.stdout)

http://gwynconnor.blogspot.com/2010/04/copying-files-between-two-remote-hosts.html

gwyn


#3 Eliot commented on :

gwyn: This is awesome! Thank you for figuring this out and posting it. You have made a really nice tool instead of my hack. I will definitely give this a try. I don't know if you think you'll make any updates, but if you do, it'd be great if it were on github or PyPI or something. (Again sorry about your comment getting marked as spam.)