What I’ve learned: setting up Bash/Ubuntu/Win10 for Ansible + Vagrant + VirtualBox

My Goal: test the use of this Ansible Role from Windows 10, using a combination of Windows and Bash for Ubuntu on Windows 10 tools.  Favour the *nix tools wherever possible, for maximum compatibility with the all-Linux production environment.

Preconditions

Here is the software/shell arrangement that worked for me in my Win10 box:

  • Runs in Windows: VirtualBox, Vagrant
  • Runs in Bash/Ubuntu: Ansible (in part because of this)

In this setup, I’m using a single VirtualBox VM in the default network configuration, whereby Vagrant ends up reporting the host listening on 127.0.0.1 and SSH listening on TCP port 2222.  Substitute your actual values as required.

Also note the versions of software I’m currently running:

  • Windows 10: Anniversary Update, build 14393.51
  • Ansible (*nix version in Bash/Ubuntu/Win10): 1.5.4
  • VirtualBox (Windows): 5.0.26
  • Vagrant (Windows): 1.8.1

Run the Windows tools from a Windows shell

  • C:\> vagrant up
  • (or launch a Bash shell with cbwin support:  C:\>outbash, then try running /mnt/c/…/Vagrant.exe up from the bash environment)

Start the VirtualBox VMs using Vagrant

  • Vagrant (Bash) can’t just do vagrant up where VirtualBox is installed in Windows – it depends on being able to call the VBoxManage binary
    • Q: can I trick Bash to call VBoxManage.exe from /mnt/c/Program Files/Oracle/VirtualBox?
    • If not, is it worth messing around with Vagrant (Bash)?  Or should I relent and try Vagrant (Windows), either using cbwin or just running from a different shell?
  • Vagrant (Windows) runs into the fscking rsync problem (as always)
    • Fortunately you can disable rsync if you don’t need the sync’d folders
    • Disabling the synced_folder requires editing the Vagrantfile to add this in the Vagrant.configure section:
      config.vm.synced_folder ".", "/vagrant", disabled: true
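
For context, here’s a minimal sketch of where that line lands in the Vagrantfile (the box name is a placeholder – use whatever your Vagrantfile already declares):

    Vagrant.configure("2") do |config|
      config.vm.box = "debian/jessie64"   # placeholder box name
      # Skip the default synced folder entirely, sidestepping the rsync problem
      config.vm.synced_folder ".", "/vagrant", disabled: true
    end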

Setup the inventory for management

  • Find the IPs for all managed boxes
  • Organize them (in one group or several) in the /etc/ansible/hosts file
  • Remember to specify the SSH port if non-22:
    [test-web]
    127.0.0.1 ansible_ssh_port=2222
    # 127.0.0.1 ansible_port=2222 when Ansible version > 1.9
    • While “ansible_port” is said to be the supported parameter as of Ansible 2.0, my own experience with Ansible under Bash on Windows was that ansible wouldn’t connect properly to the server until I changed the inventory configuration to use “ansible_ssh_port”, even though ansible --version reported itself as 2.1.1.0
    • Side question: is there some way to predictably force the same SSH port every time for the same box?  That way I can set up an inventory in my Bash environment and keep it stable.
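
On that side question: one approach I believe works, though I haven’t verified it in this setup, is to pin the forwarded SSH port in the Vagrantfile by overriding the built-in “ssh” port forward, so the host port stops drifting:

    Vagrant.configure("2") do |config|
      # Override the default SSH forward (its id is "ssh") to keep the host port stable
      config.vm.network "forwarded_port", guest: 22, host: 2222, id: "ssh"
    end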

Getting SSH keys on the VMs

  • (Optional) If you don’t already have keys, run ssh-keygen -t rsa
  • (Optional) If you’ve destroyed and re-created the VM with vagrant destroy/up, wipe the stale key for the host:port combination with the command ssh-copy-id recommends when it fails: ssh-keygen -f "/home/mike/.ssh/known_hosts" -R [127.0.0.1]:2222
  • Run ssh-copy-id vagrant@127.0.0.1 -p 2222 to push the public key to the target VM’s vagrant account

Connect to the VMs using Ansible to test connectivity

  • [from Windows] vagrant ssh-config will tell you the IP address and port of your current VM
  • [from Bash] ansible all -u vagrant -m ping will check basic Ansible connectivity
    • (ansible all -c local -m ping will go even more basic, testing Ansible itself)

Run the playbook

  • Run ansible-playbook [playbook_name.yml e.g. playbook.yml] -u vagrant
    • If you receive an error like “SSH encountered an unknown error” with details that include “No more authentication methods to try.  Permission denied (publickey,password).”, make sure to remember to specify the correct remote user (i.e. one that trusts your SSH key)
    • If you receive an error like “stderr: E: Could not open lock file /var/lib/dpkg/lock – open (13: Permission denied)”, make sure your remote user runs with root privilege – e.g. in playbook.yml, ensure sudo: true is included (see the sketch after this list)
  • Issue: if you receive an error like fatal: [127.0.0.1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}, check that your SSH keys are trusted by the remote user you’re using (e.g. “-u vagrant” may not have the SSH keys already trusted)
  • If you wish to target a subset of servers in your inventory (e.g. using one or more groups), add the “-l” parameter and name the inventory group, IP address or hostname you wish to target
    e.g. ansible-playbook playbook.yml -u vagrant -l test-web
    or ansible-playbook playbook.yml -u vagrant -l 127.0.0.1
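
For reference, here’s a minimal playbook.yml sketch consistent with the flags above (the role name is borrowed from the repo I’m testing; sudo: true is the pre-1.9 spelling, become: true thereafter):

    - hosts: test-web
      remote_user: vagrant
      sudo: true   # use 'become: true' on Ansible 1.9+
      roles:
        - ansible-role-unattended-upgrades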

Protip: remote_user

If you want to stop having to add -u vagrant to all the fun ansible commands, then go to your /etc/ansible/ansible.cfg file and add remote_user = vagrant under the [defaults] section.
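
A minimal example:

    # /etc/ansible/ansible.cfg
    [defaults]
    remote_user = vagrant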

Rabbit Hole Details for the Pedantically-Inclined


Great Related Lesson: know the difference between vagrant commands

  • Run vagrant ssh to connect to the VM [note: requires an SSH app installed in Windows, under this setup]
  • Run vagrant status to check what state the VM is in
  • Run vagrant reload to restart the VM
  • Run vagrant halt to stop the VM
  • Run vagrant destroy to wipe the VM

Ansible’s RSA issue when SSH’ing into a non-configured remote user

  • The following issue occurs when running ansible commands to a remote SSH target
    e.g. ansible all -m ping
  • This occurs even when the following commands succeed:
    • ansible -c local all -m ping
    • ssh vagrant@host.name [port #]
    • ssh-copy-id -p [port #] vagrant@host.name
  • Also note: prefixing with “sudo” doesn’t seem to help – just switches whose local keys you’re using
  • I spent the better part of a few hours (spaced over two days, due to rage quit) troubleshooting this situation
  • Troubleshooting this is challenging to say the least, as ansible doesn’t intelligently hint at the source of the problem, even though this must be a well-known issue
    • There’s nothing in the debug output of ssh/(openssl?) that indicates that there are no trusted SSH keys in the account of the currently-used remote user
    • Nor is it clear which remote user is being impersonated – sure, I’ll bet someone who fights with SSH & OpenSSL all day would have noticed the subtle hints, but for those of us just trying to get a job done, it’s like looking through foggy glass
  • Solution: remember to configure the remote user under which you’re connecting (i.e. a user with the correct permissions *and* who trusts the SSH keys in use)
    • Solution A: add the -u vagrant parameter
    • Solution B: specify remote_user = vagrant in the ansible.cfg file under [defaults]
Here’s the full debug output from one of those failed runs:

mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ansible-playbook role.yml -vvvv

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
<127.0.0.1> ESTABLISH CONNECTION FOR USER: mike
<127.0.0.1> REMOTE_MODULE setup
<127.0.0.1> EXEC ['ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/mike/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=2222', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', '127.0.0.1', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1471378875.79-237810336673832 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1471378875.79-237810336673832 && echo $HOME/.ansible/tmp/ansible-tmp-1471378875.79-237810336673832'"]
fatal: [127.0.0.1] => SSH encountered an unknown error. The output was:
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: auto-mux: Trying existing master
debug1: Control socket "/home/mike/.ansible/cp/ansible-ssh-127.0.0.1-2222-mike" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2222.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 10000 ms remain after connect
debug3: Incorrect RSA1 identifier
debug3: Could not load "/home/mike/.ssh/id_rsa" as a RSA1 public key
debug1: identity file /home/mike/.ssh/id_rsa type 1
debug1: identity file /home/mike/.ssh/id_rsa-cert type -1
debug1: identity file /home/mike/.ssh/id_dsa type -1
debug1: identity file /home/mike/.ssh/id_dsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ecdsa type -1
debug1: identity file /home/mike/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/mike/.ssh/id_ed25519 type -1
debug1: identity file /home/mike/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.7p1 Debian-5+deb8u1
debug1: match: OpenSSH_6.7p1 Debian-5+deb8u1 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug3: put_host_port: [127.0.0.1]:2222
debug3: load_hostkeys: loading entries for host "[127.0.0.1]:2222" from file "/home/mike/.ssh/known_hosts"
debug3: load_hostkeys: found key type ECDSA in file /home/mike/.ssh/known_hosts:2
debug3: load_hostkeys: loaded 1 keys
debug3: order_hostkeyalgs: prefer hostkeyalgs: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-dss-cert-v00@openssh.com,ssh-ed25519,ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none
debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ssh-ed25519
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com
debug2: kex_parse_kexinit: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: kex_parse_kexinit: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: setup hmac-sha1-etm@openssh.com
debug1: kex: server->client aes128-ctr hmac-sha1-etm@openssh.com zlib@openssh.com
debug2: mac_setup: setup hmac-sha1-etm@openssh.com
debug1: kex: client->server aes128-ctr hmac-sha1-etm@openssh.com zlib@openssh.com
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 07:f3:2f:b0:86:b5:b6:2b:d9:f5:26:71:95:6e:d9:ce
debug3: put_host_port: [127.0.0.1]:2222
debug3: put_host_port: [127.0.0.1]:2222
debug3: load_hostkeys: loading entries for host "[127.0.0.1]:2222" from file "/home/mike/.ssh/known_hosts"
debug3: load_hostkeys: found key type ECDSA in file /home/mike/.ssh/known_hosts:2
debug3: load_hostkeys: loaded 1 keys
debug1: Host '[127.0.0.1]:2222' is known and matches the ECDSA host key.
debug1: Found key in /home/mike/.ssh/known_hosts:2
debug1: ssh_ecdsa_verify: signature correct
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /home/mike/.ssh/id_rsa (0x7fffbdbd5b80),
debug2: key: /home/mike/.ssh/id_dsa ((nil)),
debug2: key: /home/mike/.ssh/id_ecdsa ((nil)),
debug2: key: /home/mike/.ssh/id_ed25519 ((nil)),
debug1: Authentications that can continue: publickey,password
debug3: start over, passed a different list publickey,password
debug3: preferred gssapi-with-mic,gssapi-keyex,hostbased,publickey
debug3: authmethod_lookup publickey
debug3: remaining preferred: ,gssapi-keyex,hostbased,publickey
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/mike/.ssh/id_rsa
debug3: send_pubkey_test
debug2: we sent a publickey packet, wait for reply
debug1: Authentications that can continue: publickey,password
debug1: Trying private key: /home/mike/.ssh/id_dsa
debug3: no such identity: /home/mike/.ssh/id_dsa: No such file or directory
debug1: Trying private key: /home/mike/.ssh/id_ecdsa
debug3: no such identity: /home/mike/.ssh/id_ecdsa: No such file or directory
debug1: Trying private key: /home/mike/.ssh/id_ed25519
debug3: no such identity: /home/mike/.ssh/id_ed25519: No such file or directory
debug2: we did not send a packet, disable method
debug1: No more authentication methods to try.
Permission denied (publickey,password).

TASK: [ansible-role-unattended-upgrades | add distribution-specific variables] ***
FATAL: no hosts matched or all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/mike/role.retry

127.0.0.1                  : ok=0    changed=0    unreachable=1    failed=0

Ansible’s permissions issue when trying to run non-trivial commands without sudo

  • ansible -m ping will work fine without local root permissions, making you think that you might be able to do other ansible operations without sudo
  • Haha! You would be wrong, foolish apprentice
  • Thus, the SSH keys for enabling ansible to work will have to be (a) generated for the local root user and (b) copied to the remote vagrant user
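
A sketch of that workaround, assuming you really do end up driving ansible through sudo (host and port are from the inventory above):

    sudo -H ssh-keygen -t rsa                     # generate a keypair in /root/.ssh
    sudo -H ssh-copy-id -p 2222 vagrant@127.0.0.1 # trust root's key on the VM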
And here’s the run that gets past SSH (thanks to -u vagrant) but then hits the dpkg lock:

mike@MIKE-WIN10-SSD:~/code/ansible-role-unattended-upgrades$ ansible-playbook -u vagrant role.yml

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
ok: [127.0.0.1]

TASK: [ansible-role-unattended-upgrades | add distribution-specific variables] ***
ok: [127.0.0.1]

TASK: [ansible-role-unattended-upgrades | install unattended-upgrades] ********
failed: [127.0.0.1] => {"failed": true, "item": ""}
stderr: E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

msg: 'apt-get install 'unattended-upgrades' ' failed: E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?


FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/mike/role.retry

127.0.0.1                  : ok=2    changed=0    unreachable=0    failed=1

 

Articles I reviewed while doing the work outlined here

https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2

http://blog.publysher.nl/2013/07/infra-as-repo-using-vagrant-and-salt.html

https://github.com/devopsgroup-io/vagrant-digitalocean

https://github.com/mitchellh/vagrant/issues/4073

http://stackoverflow.com/questions/23337312/how-do-i-use-rsync-shared-folders-in-vagrant-on-windows

https://github.com/mitchellh/vagrant/issues/3230

https://www.vagrantup.com/docs/synced-folders/basic_usage.html

http://docs.ansible.com/ansible/intro_inventory.html

http://stackoverflow.com/questions/36932952/ansible-unable-to-connect-to-aws-ec2-instance

http://serverfault.com/questions/649659/ansible-try-to-ping-connection-between-localhost-and-remote-server

http://stackoverflow.com/questions/22232509/vagrant-provision-works-but-i-cannot-send-an-ad-hoc-command-with-ansible

http://stackoverflow.com/questions/21670747/what-user-will-ansible-run-my-commands-as#21680256

Windows pre-WSL: fighting with Vagrant, VirtualBox and Cygwin

These are notes I kept for myself while struggling to find a way to use Vagrant and Virtualbox (among other tools like git, ansible and ssh) to test and debug virtual server builds without having to dual-boot into a Linux box.
Note to DavidP: you’ll see throughout the rsync errors that made this whole thing not *quite* a functional vagrant environment.
Obligatory: “this was of course a foolish mission that many loudly warned me against”.
Obligatory: “I thought I was smarter than most, and could probably decipher the grotty issues” arrogance.
Futile: “I hope these notes help someone else somewhere down the line with a similar issue”.
Important: I was liberally stealing great ideas from Stefan van Essen.

Basic Setup

Details of what I did to make sure I can recreate my environment later:
  • Installed Cygwin64 with tools like bash, curl, gcc-core, make, nano, openssh, openssl, python, wget (and keep uninstalling python-cryptography because it interferes with my almost-working “ansible-under-cygwin” setup)
  • Installed apt-cyg to make cygwin package management easier
  • Installed VirtualBox for Windows with default paths (leaving the VMs in %USERPROFILE%\VirtualBox VMs)
  • Installed Vagrant for Windows with no changes
  • Created a symlink in the cygwin home directory to the VMs for ease of access (ln -s "/cygdrive/c/Users/Mike/VirtualBox VMs" ~/VMs)

First sign of trouble, selection of build target

  • Problem: “vagrant up” consistently runs into a couple of issues, which might have something to do with me instantiating the vagrant box under Windows (where there wasn’t an available rsync program to enable Vagrant to do whatever it needs when first “up”-ing a new Vagrant box)
    • There seemed to be an SSH issue: 
      ==> default: Waiting for machine to boot. This may take a few minutes...
         default: SSH address: 127.0.0.1:2222
         default: SSH username: vagrant
         default: SSH auth method: private key
         default: Warning: Remote connection disconnect. Retrying...
         default: Warning: Remote connection disconnect. Retrying...
      ==> default: Machine booted and ready!
    • Then appeared a “hangover” rsync issue:
      ==> default: Rsyncing folder: /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/ => /vagrant
      There was an error when attempting to rsync a synced folder.
      Please inspect the error message below for more info.
      Host path: /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/
      Guest path: /vagrant
      Command: rsync --verbose --archive --delete -z --copy-links --chmod=ugo=rwX --no-perms --no-owner --no-group --rsync-path sudo rsync -e ssh -p 2222 -o ControlMaster=auto -o ControlPath=C:/cygwin64/tmp/ssh.570 -o ControlPersist=10m -o StrictHostKeyChecking=no -o IdentitiesOnly=true -o UserKnownHostsFile=/dev/null -i 'C:/Users/Mike/VirtualBox VMs/BaseDebian/.vagrant/machines/default/virtualbox/private_key' --exclude .vagrant/ /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/ vagrant@127.0.0.1:/vagrant
      Error: Warning: Permanently added '[127.0.0.1]:2222' (ECDSA) to the list of known hosts.
      mm_receive_fd: no message header
      process_mux_new_session: failed to receive fd 0 from slave
      mux_client_request_session: read from master failed: Connection reset by peer
      Failed to connect to new control master
      rsync: connection unexpectedly closed (0 bytes received so far) [sender]
      rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.2]
    • I figured rather than treat this box as a pet, I’d just destroy it and try over again [good practice for breaking my old habits of trying to root cause catastrophic, but sometimes irreproducible, issues]
    • But no – these issues reappear for a freshly-init’d instance of the same vagrant box (debian/jessie64)
    • However, the next vagrant box I tried (hashicorp/precise64) (where I’ll put my dev tools) seemed close enough to fine, no red errors:
      ==> default: Machine booted and ready!
      ==> default: Checking for guest additions in VM...
         default: The guest additions on this VM do not match the installed version of
         default: VirtualBox! In most cases this is fine, but in rare cases it can
         default: prevent things such as shared folders from working properly. If you see
         default: shared folder errors, please make sure the guest additions within the
         default: virtual machine match the version of VirtualBox you have installed on
         default: your host and reload your VM.
         default:
         default: Guest Additions Version: 4.2.0
         default: VirtualBox Version: 5.0
      ==> default: Mounting shared folders...
         default: /vagrant => C:/Users/Mike/VirtualBox VMs/UbuntuDev
    • I also tried with (box-cutter/ubuntu1404-desktop) – worked great!
      ==> default: Machine booted and ready!
      ==> default: Checking for guest additions in VM...
      ==> default: Mounting shared folders...
         default: /vagrant => C:/Users/Mike/VirtualBox VMs/Ubuntu-box-cutter
    • Conclusion: using the box-cutter/ubuntu1404 box for my dev work

Configuration woes

  • Installed git (with suggested additions)
  • Installed ansible
  • Generated ssh keys, ssh-copy-id copied to mike@192.168.1.14 (debian1sttry), then ran “cp ~/.ssh/authorized_keys /root/.ssh” as root
  • Next I wanted to share a single Code folder between Windows and the Ubuntu VM:
    • 1st idea: create a symlink in the shared “vagrant” folder (C:\Users\Mike\VirtualBox VMs\Ubuntu-box-cutter) that would be addressable from inside the Ubuntu VM:
      [from Cygwin on host] ln -s /cygdrive/c/Users/Mike/Code Code
      [from Ubuntu VM] 
      vagrant@vagrant:~$ ls /vagrant/Code
      /vagrant/Code
      vagrant@vagrant:~$ cd /vagrant/Code
      bash: cd: /vagrant/Code: Not a directory
    • 2nd idea: try a native-Windows symlink, not a Cygwin version
      [from CMD on host] 
      C:\Users\Mike\VirtualBox VMs\Ubuntu-box-cutter>mklink /D Code C:\Users\Mike\Code
      symbolic link created for Code <<===>> C:\Users\Mike\Code
      [from Ubuntu VM]
      vagrant@vagrant:/vagrant$ cd /vagrant/Code
      bash: cd: /vagrant/Code: Not a directory
    • 3rd idea: create a new synced folder via vagrant (see the Vagrantfile sketch after this list)
  • Meanwhile, I started out a new repo (https://github.com/MikeTheCanuck/jQuery-infra-update) to contain the suggested Ansible code (using the Git Bash shell from Git for Windows):
    • mkdir code
      cd code
      touch 50unattended-upgrades
      touch 02periodic
      git add 50unattended-upgrades
      git add 02periodic
    • [then tried git commit but got prompted for identity]
    • git config --global user.email "mikethecanuck@gmail.com"
      git config --global user.name "Mike Lonergan"
      git commit -m "First commit"
      git push origin master
    • [fun: Windows Credential manager prompted for my Github creds]
  • Now to follow the canonical git-flow http://nvie.com/posts/a-successful-git-branching-model/:
    • git checkout -b develop
      git checkout -b default-file-contents develop
      git commit -m "populated default content" code/02periodic code/50unattended-upgrades
      git push
    • [I didn’t think it worked – I didn’t see any changes on the upstream repo – so I tried again]
    • git reset
      git reset --hard
      git commit -m "Added default config file contents" -a
      git push
    • [which resulted in the warning “fatal: The current branch default-file-contents has no upstream branch.  To push the current branch and set the remote as upstream, use…(command)”, so I did]
    • git push --set-upstream origin default-file-contents
  • Result: I have two commits to the branch on my Git repo
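
And here’s the Vagrantfile sketch for that 3rd idea (the paths are the ones from this machine; adjust to taste):

    Vagrant.configure("2") do |config|
      # Mount the Windows-side Code folder directly in the guest
      config.vm.synced_folder "C:/Users/Mike/Code", "/home/vagrant/Code"
    end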

Ansible testing environment setup

  • 1st: setup my inventory in /etc/ansible/hosts
  • 2nd: test connectivity with ansible dev -m ping -u root [because my target doesn’t have a vagrant account – yeah yeah, I’ll get to building a target using vagrant soon]
  • 3rd: move the files over for testing:
     ansible dev -u root -m copy -a "backup=yes src=~/code/jQuery-infra-update/code/50unattended-upgrades dest=/etc/apt/apt.conf.d/"
     ansible dev -u root -m copy -a "backup=yes src=~/code/jQuery-infra-update/code/02periodic dest=/etc/apt/apt.conf.d/"

Testing vagrant on Debian clones

  • I installed a base Debian box and ignored this error during “vagrant up” that I hoped wouldn’t be a problem:
    ==> default: Checking for guest additions in VM...
       default: No guest additions were detected on the base box for this VM! Guest
       default: additions are required for forwarded ports, shared folders, host only
       default: networking, and more. If SSH fails on this machine, please install
       default: the guest additions and repackage the box to continue.
       default:
       default: This is not an error message; everything may continue to work properly,
       default: in which case you may ignore this message.
    ==> default: Rsyncing folder: /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/ => /vagrant
    There was an error when attempting to rsync a synced folder
    Please inspect the error message below for more info.
    Host path: /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/
    Guest path: /vagrant
    Command: rsync --verbose --archive --delete -z --copy-links --chmod=ugo=rwX --no-perms --no-owner --no-group --rsync-path sudo rsync -e ssh -p 2222 -o ControlMaster=auto -o ControlPath=C:/cygwin64/tmp/ssh.275 -o ControlPersist=10m -o StrictHostKeyChecking=no -o IdentitiesOnly=true -o UserKnownHostsFile=/dev/null -i 'C:/Users/Mike/VirtualBox VMs/BaseDebian/.vagrant/machines/default/virtualbox/private_key' --exclude .vagrant/ /cygdrive/c/Users/Mike/VirtualBox VMs/BaseDebian/ vagrant@127.0.0.1:/vagrant
    Error: Warning: Permanently added '[127.0.0.1]:2222' (ECDSA) to the list of known hosts.
    mm_receive_fd: no message header
    process_mux_new_session: failed to receive fd 0 from slave
    mm_send_fd: sendmsg(2): Broken pipe
    mux_client_request_session: send fds failed
    rsync: connection unexpectedly closed (0 bytes received so far) [sender]
    rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.2]
  • Now I’m working from the BaseDebian box I’d set up, and tried to clone a couple of boxes from it, but found that they complained that the VBox Additions weren’t installed and wouldn’t allow me to proceed:
    ==> default: Checking for guest additions in VM...
        default: No guest additions were detected on the base box for this VM! Guest
        default: additions are required for forwarded ports, shared folders, host only
        default: networking, and more. If SSH fails on this machine, please install
        default: the guest additions and repackage the box to continue.
        default:
        default: This is not an error message; everything may continue to work properly,
        default: in which case you may ignore this message.
    ==> default: Mounting shared folders...
        default: /vagrant => C:/Users/Mike/VirtualBox VMs/DebianTest1
    Failed to mount folders in Linux guest. This is usually because
    the "vboxsf" file system is not available. Please verify that
    the guest additions are properly installed in the guest and
    can work properly. The command attempted was:
    
    mount -t vboxsf -o uid=`id -u vagrant`,gid=`getent group vagrant | cut -d: -f3` vagrant /vagrant
    mount -t vboxsf -o uid=`id -u vagrant`,gid=`id -g vagrant` vagrant /vagrant
    
    The error output from the last command was:
    
    stdin: is not a tty
    mount: unknown filesystem type 'vboxsf'
  • OK, so I followed this article and have pulled together these commands to try to get the Additions installed on the base box (here, I destroyed the clones and am working to repair the box from which I cloned)
     sudo apt-get install gcc
     sudo apt-get install make

    (reboot the box)

     sudo mount /dev/cdrom /media/cdrom
     cd /media/cdrom
     sudo ./VBoxLinuxAdditions.run
  • Sadly, this procedure did *not* repair the rsync issue I was seeing originally before packaging from that box, but I tried vagrant package and cloned from this box again, in case what I did happened to solve the “unknown filesystem type ‘vboxsf’” error
  • Nope, same problem on the cloned box – “unknown filesystem type ‘vboxsf'”
    • Just in case, I tried installing the additions in the cloned box
    • They won’t install – complains about the following:
      Building the VirtualBox Guest Additions kernel modules
      The headers for the current running kernel were not found. If the following
      module compilation fails then this could be the reason.
      
      Building the main Guest Additions module ...fail!
      (Look at /var/log/vboxadd-install.log to find out what went wrong)
      Doing non-kernel setup of the Guest Additions ...done.
    • And that log file says:
      /tmp/vbox.0/Makefile.include.header:97: *** Error: unable to find the sources of your current Linux kernel. Specify KERN_DIR=<directory> and run Make again.  Stop.
      Creating user for the Guest Additions.
      Creating udev rule for the Guest Additions kernel module.
    • So I tried this article next, but that didn’t entirely help – the headers still aren’t available (see the consolidated sketch after this list)
    • However, instead of erroring out, the VBox Guest Additions installation said this at the same place:
      Building the VirtualBox Guest Additions kernel modules
      The headers for the current running kernel were not found. If the following
      module compilation fails then this could be the reason.
      
      Building the main Guest Additions module ...done.
      Building the shared folder support module ...done.
      Building the graphics driver module ...done.
      Doing non-kernel setup of the Guest Additions ...done.
      Starting the VirtualBox Guest AdditionsInstalling the Window System drivers
      Could not find the X.Org or XFree86 Window System, skipping.
      ...done.
    • HOLY SHIT GUYS, it actually worked:
       default: Mounting shared folders...
         default: /vagrant => C:/Users/Mike/VirtualBox VMs/DebianTest1
    • So I rebuilt the base Debian box with this procedure – sadly it didn’t address the rsync issue (that was a valiant wish), but at least I should be able to clone this box and avoid the ‘vboxsf’ issue
    • To address the rsync issue, I tried a variant on this article’s top answer – placed the cwrsync folder on my C:\ root, then copied the script to an rsync.bat file
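
Pulling the guest-additions steps above into one place, this is the sequence that eventually produced a working build (the linux-headers package is my guess at what that second article added; package names are Debian’s):

    sudo apt-get install gcc make linux-headers-$(uname -r)
    # reboot, then with the Guest Additions ISO inserted via the VirtualBox UI:
    sudo mount /dev/cdrom /media/cdrom
    cd /media/cdrom
    sudo ./VBoxLinuxAdditions.run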

Issues I never solved

  • Q: what’s the source of the actual problem with the rsync on my BaseDebian box?
    • TODO: Follow this article for some ideas on troubleshooting/narrowing down the problem
  • Q: what’s the source of the problem that causes my “vagrant up” to fail twice on the SSH auth?


Hashicorp Vault + Ansible + CD: open source infra, option 2

“How can we publish our server configuration scripts as open source code without exposing our secrets to the world?”

In my first take on this problem, I fell down the rabbit hole of Ansible’s Vault technology – a single-password-driven encryption implementation that encrypts whole files and demands they be decrypted by interactive input or static filesystem input at runtime. Not a bad first try, but feels a little brittle (to changes in the devops team, to accidental inclusion in your git commits, or to division-of-labour concerns).

There’s another technology actively being developed for the devops world, by the Hashicorp project, also (confusingly/inevitably) called Vault. [I’ll call it HVault from here on, to distinguish from Ansible Vault >> AVault.]

HVault is a technology that (at least from a cursory review of the intro) promises to solve the brittle problems above. It’s an API-driven lockbox and runtime-proxy for all manner of secrets, making it possible to store and retrieve static secrets, provision secrets to some roles/users and not others, and create limited-time-use credentials for applications that have been integrated with HVault.

Implementation Options

So for our team’s purposes, we only need to worry about static secrets so far. There are two possible ways I can see us trying to integrate this:

  1. retrieve the secrets (SSH passphrases, SSL private keys, passwords) directly and one-by-one from HVault, or
  2. retrieve just an AVault password that then unlocks all the other secrets embedded in our Ansible YAML files (using reinteractive’s pseudo-leaf indirection scheme).
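
For concreteness, option (1) at deploy time might look something like this with the HVault CLI (the address, path and field name are all invented for illustration):

    export VAULT_ADDR=https://vault.example.org:8200
    export VAULT_TOKEN=xxxx   # see Residual Security Issues below
    vault read -field=password secret/ansible/db_root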

(1) has the advantage of requiring one fewer technology, which is a tempting decision factor – but it comes at the expense of creating a dependency/entanglement between HVault and our Ansible code (in naming and managing the key-value pairs for each secret) and of having to find/use a runtime solution for injecting each secret into the appropriate file(s).

(2) simplifies the problem of injecting secrets at runtime to a single secret (i.e. AVault can accept a script to insert the AVault password) and enables us to use a known quantity (AVault) for managing secrets in the Ansible YAMLs, but also means that (a) those editing the “secret-storing YAMLs” will still have to have access to a copy of the AVault password, (b) we face the future burden to plan for breaking changes introduced by both AVault and HVault, and (c) all secrets will be dumped to disk in plaintext on our continuous deployment (CD) server.

Thoughts on Choosing For Our Team

Personally, I favour (1) or even just using AVault alone. While the theoretical “separation of duties” potential for AVault + HVault is supposed to be more attractive to a security geek like me, this just seems like needless complexity for effectively very little gain. Teaching our volunteers (now and in the future) how to manage two secrets-protecting technologies would be more painful, and we double the risks of dealing with a breaking change (or loss of active development) for a necessary and non-trivially-integrated technology in our stack.

Further, if I had to stick with one, I’d stay “single vendor” and use AVault rather than spread us across two projects with different needs & design philosophies. Once we accept that there’s an occasional “out of band initialization” burden for setting up either vault, and that we’d likely have to share access to larger numbers of secrets with a wider set of the team than ideal, I think the day-to-day management overhead of AVault is no worse (and possibly lighter) than HVault.

Pseudo-Solution for an HVault-only Implementation

Assuming for the moment that we proceed with (1), this (I think) is the logical setup to make it work:

  • Setup an HVault instance
  • Design a naming scheme for secrets
  • Populate HVault with secrets
  • Install Consul Template as a service
  • Rewrite all secret-containing Ansible YAMLs with Consul Template templating variables (matching the HVault naming)
  • Rewrite CD scripts to pull HVault secrets and rewrite all secret-containing Ansible YAMLs
  • Populate the HVault environment variables to enable CD scripts to authenticate to HVault
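
As a rough idea of the Consul Template step (the secret path and key are invented; the {{ }} syntax is Consul Template’s):

    # db_vars.yml.ctmpl – rendered to db_vars.yml before each ansible-playbook run
    db_root_password: "{{ with secret "secret/ansible/db" }}{{ .Data.password }}{{ end }}"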

Operational Concerns

If the HVault instance is running on a server in the production infrastructure, can HVault be configured to only allow connections from other servers that require access to the HVault secrets? This would reduce the risk that knowledge of the HVault authentication token and address (as used here) would provide instant access to the secrets from anywhere on the Internet. Consider it a defense-in-depth measure in case the iptables and SSH protections could be circumvented to allow incoming traffic at the network level.

The HVault discussions about “flexibility” and “developer considerations” lead me to conclude that – for a volunteer team using part-time time slivers to manage an open source project’s infrastructure – HVault Cubbyhole just isn’t low-impact or fully-baked enough at this time to make it worth the extra development effort to create a full solution for our needs. While Cubbyhole addresses an interesting edge case in making on-the-wire HVault tokens less vulnerable, it doesn’t substantially mitigate (for us, at least) the bootstrapping problem, especially when it comes to a single-server HVault+deployment service setup.

Residual Security Issues

  • All this gyration with HVault is meant to help solve the problems of (a) storing all Ansible YAML-bound secrets in plaintext, (b) storing a static secret (the AVault password) in plaintext on our CD server, and (c) finding some way to keep any secrets from showing up in our github repo.
  • However, there’s still the problem of authenticating a CD process to HVault to retrieve secret(s) in the first place
  • We’re still looking to remove human intervention from standard deployments, which means persisting the authentication secret (token, directory-managed user/pass, etc) somewhere on disk (e.g. export VAULT_TOKEN=xxxx)
  • Whatever mechanism we use will ultimately be documented – either directly in our github repo, or in documentation we end up publishing for use by other infrastructure operators and those who wish to follow our advice

 

This is not the final word – these are merely my initial thoughts, and I’m looking forward to members of the team bringing their take to these technologies, comparisons and issues.  I’m bound to learn something and we’ll check back with the results.

Reading List

Intro to Hashicorp Vault: https://www.vaultproject.io/intro/

Blog example using HVault with Chef: https://www.hashicorp.com/blog/using-hashicorp-vault-with-chef.html

Example Chef Recipe for using HVault https://gist.github.com/sethvargo/6f1a315094fbd1a18c6d

Ansible lookup module to retrieve secrets from HVault https://github.com/jhaals/ansible-vault

Ansible modules for interacting with HVault https://github.com/TerryHowe/ansible-modules-hashivault

Ansible Vault for an open source project: adventures in simplified indirection

“How can we publish our server configuration scripts as open source code without exposing our secrets to the world?”

It seemed like a simple enough mission. There are untold numbers of open source projects publishing directly to github.com; most large projects have secrets of one form or another. Someone must have figured out a pattern for keeping the secrets *near* the code without actually publishing them (or a key leading to them) as plaintext *in* the code, yes?

However, a cursory examination of tutorials on Ansible Vault left me with an uneasy feeling. It appears that a typical pattern for this kind of setup is to partition your secrets as variables in an Ansible Role, encrypt the variables, and unlock them at runtime with reference to a password file (~/.vault_pass.txt) [or an interactive prompt at each Ansible run *shudder*]. The encrypted content is available as an AES256 blob, and the password file… well, here’s where I get the heebie-jeebies:

  1. While AES256 is a solid algorithm, it still feels…weird to publish such files to the WORLD. Distributed password cracking is quite a thing; how ridiculous of a password would we need to have to withstand an army of bots grinding away at a static password, used to unlock the encrypted secrets? Certainly not a password that anyone would feel comfortable typing by hand every time it’s prompted.
  2. Password files need to be managed, stored, backed up and distributed/distributable among project participants. Have you ever seen the docs for PGP re: handling the master passphrase? Last time I remember looking with a friend, he showed me four places where the docs said “DON’T FORGET THE PASSPHRASE”. [Worst case, what happens if the project lead gets hit by a bus?]

I guess I was expecting some kind of secured, daemon-based query-and-response RPC server, the way Jan-Piet Mens envisioned here.

Challenges

  • We have a distributed, all-volunteer team – hit-by-a-bus scenarios must be part of the plan
  • (AFAIK) We have no permanent “off-the-grid” servers – no place to stash a secret that isn’t itself backed up on the Internet – so there will have to be at least periodic bootstrapping, and multiple locations where the vault password will live

Concerns re: Lifecycle of Ansible Vault secrets:

  1. Who should be in possession of the master secret? Can this be abstracted or does anyone using it have to know its value?
  2. What about editing encrypted files? Do you have to decrypt them each time and re-encrypt, or does “ansible-vault edit” hand-wave all that for you?
    • Answer: you don’t have to do it by hand – “ansible-vault edit” doesn’t persist the decrypted contents to disk; it just sends them to your editor and transparently re-encrypts on save.
  3. Does Ansible Vault use per-file AES keys or a single AES key for all operations with the same password (that is, is the vault password a seed for the key or does it encrypt the key)?
    • Answer: not confirmed, but the source code and docs never mention per-file encryption, and the encrypted contents do not appear to store an encrypted AES key, so it looks like one AES key per vault password.
  4. Where to store the vault password if you want to integrate it into a CD pipeline?
    • Answer: --vault-password-file ~/.vault_pass.txt, or even --vault-password-file ~/.vault_pass.py, where the script sends the password to stdout (see the sketch after this list)
  5. Does anyone have a viable scheme that doesn’t require a privileged operator to be present during every deployment (--ask-vault-pass)?
    • i.e. doesn’t that mean you’re in danger of including ~/.vault_pass.txt in your git commit at some point? If not, where does that secret live?
  6. If you incorporate LastPass into your workflow to keep a protected copy of the vault password, can *that* be incorporated into the CD pipeline somehow?
  7. Are there any prominent OSS projects that have published their infrastructure and used Ansible Vault to publish encrypted versions of their secrets?
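
On the .vault_pass.py idea from question 4: anything that prints the password to stdout works, so long as the file is marked executable (otherwise ansible treats it as a plain password file). A hypothetical sketch that pulls the password from an environment variable (the variable name is my invention):

    #!/usr/bin/env python
    # ~/.vault_pass.py – ansible-vault executes this and reads the password from stdout
    import os, sys
    sys.stdout.write(os.environ["ANSIBLE_VAULT_PASS"])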

Based on my reading of the docs and blogs, it seems like this is the proffered solution for maximum automation and maintainability:

  • Divvy up all your secrets as variables and use pseudo-leaf indirection (var files referencing prefixed variables in a separate file) as documented here (illustrated after this list).
  • Encrypt the leaf-node file(s) using a super-complex vault password
  • Store the vault password in ~/.vault_pass.txt
  • Call all ansible and ansible-playbook commands using the –vault-password-file option
  • Smart: wire up a pre-commit step in git to make sure the right files are always encrypted, as documented here (a minimal hook sketch appears after this list).
  • Backup the vault password in a password manager like LastPass (so that only necessary participants get access to that section)
  • Manually deploy the .vault_pass.txt file to your Jenkins server or other CI/CD master and give no one else access to that server/root/file.
  • Limit the number of individuals who need to edit the encrypted file(s), and make sure they list .vault_pass.txt in their .gitignore file.
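
Two sketches to make the above concrete. First, the pseudo-leaf indirection from the first bullet (file names and variable names are illustrative):

    # group_vars/all/vars.yml – committed in plaintext, referenced by roles
    db_password: "{{ vault_db_password }}"

    # group_vars/all/vault.yml – encrypted with ansible-vault, holds the real value
    vault_db_password: "not-actually-this"

And a minimal pre-commit hook for the “keep the right files encrypted” step (the filename pattern is an assumption – match it to your layout):

    #!/bin/sh
    # .git/hooks/pre-commit – refuse to commit vault files that aren't encrypted
    # (note: this simple loop assumes no spaces in the staged filenames)
    for f in $(git diff --cached --name-only | grep 'vault\.yml$'); do
      if ! head -n 1 "$f" | grep -q '^\$ANSIBLE_VAULT'; then
        echo "ERROR: $f is not ansible-vault encrypted" >&2
        exit 1
      fi
    done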

P.S. Next up – look into the use of Hashicorp’s Vault project.

Reading List

Ansible Vault Docs:
http://docs.ansible.com/ansible/playbooks_vault.html

This is an incredibly useful article of good practices for using Ansible (and Ansible Vault) in a reasonably productive way:
https://www.reinteractive.net/posts/167-ansible-real-life-good-practices