Hashicorp Vault + Ansible + CD: open source infra, option 2

“How can we publish our server configuration scripts as open source code without exposing our secrets to the world?”

In my first take on this problem, I fell down the rabbit hole of Ansible’s Vault technology – a single-password encryption scheme that encrypts whole files and demands they be decrypted at runtime via interactive input or a static file on disk. Not a bad first try, but it feels a little brittle (to changes in the devops team, to accidental inclusion in your git commits, and to division-of-labour concerns).

There’s another technology actively being developed for the devops world, by HashiCorp, also (confusingly/inevitably) called Vault. [I’ll call it HVault from here on, to distinguish it from Ansible Vault, hereafter AVault.]

HVault is a technology that (at least from a cursory review of the intro) promises to solve the brittleness problems above. It’s an API-driven lockbox and runtime proxy for all manner of secrets, making it possible to store and retrieve static secrets, provision secrets to some roles/users and not others, and create limited-time-use credentials for applications that have been integrated with HVault.
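To give a flavour of that API, here’s a minimal sketch using the community hvac Python client – the address, token, secret path and key names are all hypothetical:

```python
# pip install hvac -- a community Python client for HVault
import hvac

# Authenticate to a (hypothetical) HVault instance with a token
client = hvac.Client(url="https://vault.internal.example:8200", token="xxxx")

# Store a static secret under a path of our choosing...
client.write("secret/ourproject/db", password="s3cret")

# ...and read it back later from any authorized client
print(client.read("secret/ourproject/db")["data"]["password"])
```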

Implementation Options

For our team’s purposes, we only need to worry about static secrets so far. There are two ways I can see us trying to integrate this:

  1. retrieve the secrets (SSH passphrases, SSL private keys, passwords) directly and one-by-one from HVault, or
  2. retrieve just an AVault password that then unlocks all the other secrets embedded in our Ansible YAML files (using reinteractive’s pseudo-leaf indirection scheme).

(1) has the advantage of requiring one fewer technology, which is a tempting decision factor – but it comes at the expense of creating a dependency/entanglement between HVault and our Ansible code (in naming and managing the key-value pairs for each secret), and of having to find a runtime mechanism for injecting each secret into the appropriate file(s).

(2) reduces the runtime-injection problem to a single secret (AVault can accept a script that supplies the AVault password) and lets us use a known quantity (AVault) for managing the secrets in the Ansible YAMLs, but it also means that (a) those editing the “secret-storing YAMLs” will still need access to a copy of the AVault password, (b) we take on the future burden of planning for breaking changes introduced by both AVault and HVault, and (c) all secrets will be dumped to disk in plaintext on our continuous deployment (CD) server.

Thoughts on Choosing For Our Team

Personally, I favour (1) – or even just using AVault alone. While the theoretical “separation of duties” of AVault + HVault is supposed to be attractive to a security geek like me, it just seems like needless complexity for very little gain. Teaching our volunteers (now and in the future) to manage two secrets-protecting technologies would be more painful, and we’d double the risk of a breaking change (or loss of active development) in a necessary, non-trivially-integrated technology in our stack.

Further, if I had to stick with one, I’d stay “single vendor” and use AVault rather than spread us across two projects with different needs & design philosophies. Once we accept that there’s an occasional “out of band initialization” burden for setting up either vault, and that we’d likely have to share access to larger numbers of secrets with a wider set of the team than ideal, I think the day-to-day management overhead of AVault is no worse (and possibly lighter) than HVault.

Pseudo-Solution for an HVault-only Implementation

Assuming for the moment that we proceed with (1), this (I think) is the logical setup to make it work:

  • Set up an HVault instance
  • Design a naming scheme for secrets
  • Populate HVault with secrets
  • Install Consul Template as a service
  • Rewrite all secret-containing Ansible YAMLs with Consul Template templating variables (matching the HVault naming)
  • Rewrite CD scripts to pull HVault secrets and rewrite all secret-containing Ansible YAMLs (see the sketch after this list)
  • Set the HVault environment variables so the CD scripts can authenticate to HVault
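To make the last two steps concrete: Consul Template renders files from templates that reference HVault paths (along the lines of {{ with secret "secret/ourproject/db" }}{{ .Data.password }}{{ end }}), but if that tooling ends up feeling heavyweight, the CD-side rewrite could be a few lines of Python instead. A hedged sketch, with all paths and secret names hypothetical:

```python
# Hypothetical CD step: pull static secrets from HVault and render them into
# the vars file our Ansible playbooks expect
import os
import hvac

# VAULT_ADDR / VAULT_TOKEN are the environment variables from the last step
client = hvac.Client(url=os.environ["VAULT_ADDR"],
                     token=os.environ["VAULT_TOKEN"])

# Key names follow whatever naming scheme we design in step 2
secrets = {
    "vault_db_password": client.read("secret/ourproject/db")["data"]["password"],
    "vault_ssh_passphrase": client.read("secret/ourproject/ssh")["data"]["passphrase"],
}

with open("group_vars/all/vault.yml", "w") as out:
    for name, value in sorted(secrets.items()):
        # NB: multi-line values (e.g. SSL private keys) would need YAML block scalars
        out.write('%s: "%s"\n' % (name, value))
```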

Operational Concerns

If the HVault instance is running on a server in the production infrastructure, can HVault be configured to only allow connections from the servers that actually require access to its secrets? That would reduce the risk that knowledge of the HVault authentication token and address (as used here) grants instant access to the secrets from anywhere on the Internet – a defense-in-depth measure in case the iptables and SSH protections are ever circumvented at the network level.

The HVault discussions about “flexibility” and “developer considerations” lead me to conclude that – for a volunteer team using part-time slivers of time to manage an open source project’s infrastructure – HVault Cubbyhole just isn’t low-impact or fully-baked enough at this time to be worth the extra development effort of building a full solution for our needs. While Cubbyhole addresses an interesting edge case in making on-the-wire HVault tokens less vulnerable, it doesn’t substantially mitigate (for us, at least) the bootstrapping problem, especially in a single-server HVault + deployment-service setup.

Residual Security Issues

  • All this gyration with HVault is meant to help solve the problems of (a) storing all Ansible YAML-bound secrets in plaintext, (b) storing a static secret (the AVault password) in plaintext on our CD server, and (c) finding some way to keep any secrets from showing up in our github repo.
  • However, there’s still the problem of authenticating a CD process to HVault to retrieve secret(s) in the first place
  • We’re still looking to remove human intervention from standard deployments, which means persisting the authentication secret (token, directory-managed user/pass, etc.) somewhere on disk (e.g. export VAULT_TOKEN=xxxx) – one mitigation is sketched below
  • Whatever mechanism we use will ultimately be documented – either directly in our github repo, or in documentation we end up publishing for use by other infrastructure operators and those who wish to follow our advice
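One hedged mitigation for those middle bullets: keep the token out of dotfiles and out of the repo, in a tightly-permissioned file that only the CD user can read, loaded just-in-time at deploy. A sketch, with the path being my invention:

```python
# Load the HVault token from a mode-0600 file at deploy time, rather than
# exporting it in a shell profile that might get copied or committed
import os
import stat

TOKEN_PATH = "/etc/cd/vault-token"  # hypothetical location on the CD server

mode = stat.S_IMODE(os.stat(TOKEN_PATH).st_mode)
if mode != 0o600:
    raise RuntimeError("refusing %s: permissions are %o, expected 600"
                       % (TOKEN_PATH, mode))

os.environ["VAULT_TOKEN"] = open(TOKEN_PATH).read().strip()
# ...now invoke ansible-playbook / the HVault client from this same process
```

This doesn’t make the residual problem go away – the secret still lives on disk – but it narrows who and what can read it.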


This is not the final word – these are merely my initial thoughts, and I’m looking forward to members of the team bringing their own take on these technologies, comparisons and issues. I’m bound to learn something, and we’ll check back with the results.

Reading List

Intro to Hashicorp Vault: https://www.vaultproject.io/intro/

Blog example using HVault with Chef: https://www.hashicorp.com/blog/using-hashicorp-vault-with-chef.html

Example Chef recipe for using HVault: https://gist.github.com/sethvargo/6f1a315094fbd1a18c6d

Ansible lookup module to retrieve secrets from HVault: https://github.com/jhaals/ansible-vault

Ansible modules for interacting with HVault: https://github.com/TerryHowe/ansible-modules-hashivault

Ansible Vault for an open source project: adventures in simplified indirection

“How can we publish our server configuration scripts as open source code without exposing our secrets to the world?”

It seemed like a simple enough mission. There are untold numbers of open source projects publishing directly to github.com; most large projects have secrets of one form or another. Someone must have figured out a pattern for keeping the secrets *near* the code without actually publishing them (or a key leading to them) as plaintext *in* the code, yes?

However, a cursory examination of tutorials on Ansible Vault left me with an uneasy feeling. It appears that a typical pattern for this kind of setup is to partition your secrets as variables in an Ansible Role, encrypt the variables, and unlock them at runtime with reference to a password file (~/.vault_pass.txt) [or an interactive prompt at each Ansible run *shudder*]. The encrypted content is available as an AES256 blob, and the password file… well, here’s where I get the heebie-jeebies:

  1. While AES256 is a solid algorithm, it still feels…weird to publish such files to the WORLD. Distributed password cracking is quite a thing; how ridiculous a password would we need to withstand an army of bots grinding away at the static password used to unlock the encrypted secrets? Certainly not a password anyone would feel comfortable typing by hand every time it’s prompted.
  2. Password files need to be managed, stored, backed up and distributed/distributable among project participants. Have you ever seen the PGP docs on handling the master passphrase? Last time I remember looking with a friend, he showed me four places where the docs said “DON’T FORGET THE PASSPHRASE”. [Worst case: what happens if the project lead gets hit by a bus?]

I guess I was expecting some kind of secured, daemon-based query-and-response RPC server, the way Jan-Piet Mens envisioned here.

Challenges

  • We have a distributed, all-volunteer team – hit-by-a-bus scenarios must be part of the plan
  • (AFAIK) We have no permanent “off-the-grid” servers – no place to stash a secret that isn’t itself backed up on the Internet – so there will have to be at least periodic bootstrapping, and multiple locations where the vault password will live

Concerns re: Lifecycle of Ansible Vault secrets:

  1. Who should be in possession of the master secret? Can this be abstracted or does anyone using it have to know its value?
  2. What about editing encrypted files? Do you have to decrypt and re-encrypt them by hand each time, or does “ansible-vault edit” hand-wave all that for you?
    • Answer: the latter – “ansible-vault edit” doesn’t leave the decrypted contents sitting on disk; it just sends them to your editor and transparently re-encrypts on save.
  3. Does Ansible Vault use per-file AES keys, or a single AES key for all operations under the same password (that is, is the vault password a seed for the key, or does it encrypt the key)?
    • Answer: not confirmed, but a skim of the source code and docs turns up no mention of per-file encryption, and the encrypted contents don’t appear to store an encrypted AES key – so it looks like one AES key per vault password.
  4. Where to store the vault password if you want to integrate it into a CD pipeline?
    • Answer: --vault-password-file ~/.vault_pass.txt, or even --vault-password-file ~/.vault_pass.py, where the script sends the password to stdout (see the sketch after this list)
  5. Does anyone have a viable scheme that doesn’t require a privileged operator to be present at every deployment (--ask-vault-pass)?
    • i.e. doesn’t that mean you’re in danger of including ~/.vault_pass.txt in your git commit at some point? If not, where does that secret live?
  6. If you incorporate LastPass into your workflow to keep a protected copy of the vault password, can *that* be incorporated into the CD pipeline somehow?
  7. Are there any prominent OSS projects that have published their infrastructure and used Ansible Vault to publish encrypted versions of their secrets?
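As promised in (4): any executable that prints the vault password to stdout works with --vault-password-file. Here’s a minimal sketch – the environment variable is my invention, and a call out to LastPass or any other store could slot in just as easily:

```python
#!/usr/bin/env python
# ~/.vault_pass.py -- ansible-playbook runs this (when passed via
# --vault-password-file) and reads the vault password from its stdout
import os
import sys

# Hypothetical variable, set only on the CD server (never committed to git)
password = os.environ.get("ANSIBLE_VAULT_PASSWORD")
if not password:
    sys.stderr.write("ANSIBLE_VAULT_PASSWORD is not set\n")
    sys.exit(1)

print(password)
```

Mark it executable (chmod +x ~/.vault_pass.py) and run ansible-playbook site.yml --vault-password-file ~/.vault_pass.py.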

Based on my reading of the docs and blogs, this seems to be the proffered solution for maximum automation and maintainability:

  • Divvy up all your secrets as variables and use pseudo-leaf indirection (var files referencing prefixed variables in a separate file) as documented here.
  • Encrypt the leaf-node file(s) using a super-complex vault password
  • Store the vault password in ~/.vault_pass.txt
  • Call all ansible and ansible-playbook commands using the --vault-password-file option
  • Smart: wire up a pre-commit step in git to make sure the right files are always encrypted, as documented here (a sketch follows this list).
  • Backup the vault password in a password manager like LastPass (so that only necessary participants get access to that section)
  • Manually deploy the ~/.vault_pass.txt file to your Jenkins server or other CI/CD master, and give no one else access to that server/root/file.
  • Limit the number of individuals who need to edit the encrypted file(s), and make sure they list .vault_pass.txt in their .gitignore file.
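For that pre-commit step, the check can be as simple as verifying that the staged copy of each secret-bearing file begins with the header ansible-vault writes ($ANSIBLE_VAULT;…). A hedged sketch – the file list is obviously specific to your repo:

```python
#!/usr/bin/env python
# .git/hooks/pre-commit -- refuse to commit vault files that aren't encrypted
import subprocess
import sys

VAULT_FILES = ["group_vars/all/vault.yml"]  # hypothetical path(s)

for path in VAULT_FILES:
    # Inspect the *staged* copy of the file, not the working tree
    staged = subprocess.check_output(["git", "show", ":" + path])
    if not staged.startswith(b"$ANSIBLE_VAULT;"):
        sys.stderr.write("%s is not ansible-vault encrypted; aborting commit\n"
                         % path)
        sys.exit(1)
```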

P.S. Next up – look into the use of Hashicorp’s Vault project.

Reading List

Ansible Vault Docs:
http://docs.ansible.com/ansible/playbooks_vault.html

This is an incredibly useful article on good practices for using Ansible (and Ansible Vault) in a reasonably productive way:
https://www.reinteractive.net/posts/167-ansible-real-life-good-practices

Occupied Neurons, early July 2016: security edition

Who are you, really: Safer and more convenient sign-in on the web – Google I/O 2016

Google shared some helpful tips for web developers to make it as easy as possible for users to securely sign in to your web site, from the Google Chrome team:

  • simple-if-annoying-that-we-still-have-to-use-these attributes to add to your forms to assist Password Manager apps
  • A Credential Management API that (though cryptically explained) smooths out some of the steps in retrieving creds from the Chrome Credential Manager
  • This API also addresses some of the security threats (plaintext networks, Javascript-in-the-middle, XSS)
  • Then they discuss the FIDO UAF and U2F specs – where the U2F “security key” signs the server’s secondary challenge with a private key whose public key is already enrolled with the online identity the server is authenticating

The U2F “security key” USB dongle idea is cute and useful – it requires the user’s interaction with the button (so it can’t be silently scraped by malware), uses public-key signatures to provide strong proof of possession, and can’t be duplicated. But as with any physical “token”, it can be lost, and it requires a physical interface (e.g. USB) that not all devices have. Smart cards and RSA tokens (the one-time key generators) never entirely caught on either, despite their security merits.

The Credential Manager API discussion reminds me of the Internet Explorer echo chamber from 10-15 years ago – Microsoft browser developers adding in all these proprietary hooks because they couldn’t imagine anyone *not* fully embracing IE as the one and only browser they would use everywhere. Disturbing to see Google slip into that same lazy arrogance – assuming that web developers will assume that their users will (a) always use Chrome and (b) be using Chrome’s Credential Manager (not an external password manager app) to store passwords.

Disappointing navel-gazing for the most part.

Google’s password-free logins may arrive on Android apps by year-end

Project Abacus creates a “Trust Score API” – an interesting concept that intends to supplant the need for passwords or other explicit authentication demands by taking ambient readings from sensors and user-interaction patterns on the device, to determine how likely it is that the current holder/user matches the identity being asserted/authenticated.

This is certainly more interesting technology, if only because it allows any organization/entity to set its own per-usage tolerance/threshold, demanding different “Trust Scores” depending on how valuable the data/API/interaction is that the user is attempting. A simple lookup of a bank balance could require a lower score than a transfer of money out of the account, for example.

The only trick is that the user must allow Google to continuously measure All The Thingz from the device – listen on the microphone, watch all typing, observe all location data, see what’s in front of the camera lens. Etc. Etc. Etc.

If launched today, I suspect this would trip most users’ “freak-out” instinct and fail, so kudos to Google for taking it slow. They’re going to need to shore up the reputation of Android phones – their inscrutably cryptic (if comprehensive) permissions model, and how well apps are sandboxed – before users will grant Google widespread trust to watch everything they’re doing.

MICROSOFT WANTS TO PROTECT US FROM OUR OWN STUPID PASSWORDS

Looks like Microsoft is incorporating “widely-used hacked passwords” into the set of password rules that Active Directory can enforce against users trying to establish a weak password. Hopefully this’ll be less frustrating than the “complex passwords” rules that AD and some of Microsoft’s more zealous customers like to enforce – rules that make it nigh-impossible to know what the rules even are, let alone give a sentient human a chance at a password you might want to type 20-50 times a day. [Not that I have any PTSD from that…]

Unfortunately, they do a piss-poor job of explaining how “Smart Password Lockout” works. I’m going to take a guess at how it works, and hopefully someday it’ll be spelled out. It appears they’ve added some extra smarts to the server-side AD password authentication routine: it can effectively determine whether a bad password attempt came from an already-known device or not. That means AD is keeping a rolling cache of “familiar environments” – likely one that ages out older records (e.g. flushing anything older than 30 days). What’s unclear is whether they’re recording remote IP addresses, remote computer names/identities, remote IP subnets, or some new “cookie”-like data that wasn’t traditionally sent with the authentication stream.

If this is based on Kerberos/SAML exchanges, then it’s quite possible to capture the remote identity of the computer from which the exchange occurred (at least for machines that are part of the Active Directory domain). However, if this is meant as a more general-purpose mitigation for accounts used in Internet (not Active Directory domain) settings, then unless Active Directory has added cookie-tracking capabilities it didn’t have a decade ago, I’d imagine they’re operating strictly on the remote IP address enveloped around any authentication request (Kerberos, NTLM, Basic, Digest).

Still, it seems a worthwhile effort – if it allows AD to lock out attackers trying to brute-force my account from locations where no successful authentication has taken place, AND continues to allow me to proceed past the “account lockout” at the same time, this is a big win for end users, especially where AD is used in Internet-facing settings like Azure.

Do You Demo? Do you act on the feedback? No? Then you ain’t agile

I am convinced that there are few practices in Agile (aka SCRUM to most people) that can’t be revised, bent, paused or outright abandoned – in the pursuit of a healthy, adaptive and productive engineering squad.

However, the end-of-iteration demo is one I am vehement about. If you aren’t doing demos well, it’s my belief that you shouldn’t be calling yourself agile (let alone Agile(tm)).

Aside: AgilePDX is a local meetup community of people zealous about improving our employers’ ability to deliver better product – faster, more transparently and most importantly, with more value to our customers.

Our last pub lunch roundtable discussion was “The End of Agile?”, and Billy McGee posed a great question that still rings in my head:

Which of the rituals/rules/ceremonies can we abandon and still call ourselves Agile?

My initial (silent) response to this was “almost all of them”.

When I actually contributed to the discussion, I claimed – and still claim – that one of The Most Important ceremonies to stick with is the end-of-iteration DEMO. Get your feedback early and often (if more frequently than end-of-iteration, even better), and for dogs’ sake get at least *one* person from *outside* the team to say something. Then take an action on that feedback.

[image: code demo]

Until you get outside-the-team inspection, you can’t hope to truly adapt to the trouble you’ve just gotten into by being away from outsiders for that long.  Groupthink, too-close-to-the-problem, love for the solution you derived – it all clouds the objective reality that you’ve almost certainly done it wrong.

If you do this one thing [demo/feedback/react] consistently (more than haphazardly), and *act* on the feedback (discuss, change a story, throw away code, replan your roadmap), you will have done more than anything else to build the rest of the organization’s confidence in your team, and they’ll give you a lot more room to be experimental rather than drowning you in “plan-to-perfection” demands.

The most frustrating thing for management/leadership who aren’t there every day is feeling like they have no CONTROL over getting the right things delivered more effectively. Without seeing the work that goes into delivering the product, the engineering process becomes just a black box – and even for those who’ve been in the trenches, distance + time just leads the mind to question.

When things feel out of control, leaders use what few tools they have to keep things steered correctly – demanding reports and metrics to give them *something* to wrap their hands around, dictating process to get more “structure” around this messy business, and asking for more detailed plans.

These are all proxies for “how can I help make sure we’re doing the right things?”  And like they say, a picture’s worth a thousand words.  Heck, a customer’s reaction is worth at least that much too.

I’ve seen myself react *far* better to the growing uncertainties when I get to see what’s been delivered so far – live UI, screenshots, even a tour of the source code.  Takes away so much anxiety just to *see* something is there, let alone whether it truly meets my personal objectives.  Give me something – anything – to comment on, and the fact that I engage in comments on the thing means (a) I’m asking for help to buy in, (b) I’m getting committed to the thing in front of me and (c) you’re getting early clues what I’ll want to see before the last responsible moment.

Even this is better than no feedback, no matter how painful it might be:

[image: code quality]

Coda: one of my colleagues asked me, “So, abandon retrospectives?”  It’s a good question.  They’re one of the few rituals that’s meant to help the team evolve to better performance and outcomes.  And here’s where I’m going to take a radical/lazy/wait-and-see position:

Personally, I’m inclined to skip (the process part of) the retro until and unless the team gets behind the “inspect and adapt” angle on the product, in response to what happens during demo. In my experience, engineers are more likely to start that habit if it’s focused on their code; if they’re not even willing to engage on the technology, I’m less inclined to skate uphill on the process side of things. As a PO/PM participant/observer, I’ve often seen retros degrade into a “here’s what happened”, “good/bad/ugly” historical review, losing sight of “what would you like to try changing next time?”

Occupied Neurons, late May 2016

Understanding Your New Google Analytics Options – Business 2 Community

Here’s where the performance analytics and “business analytics” companies need to keep an eye or two over their shoulder. This sounds like a serious play for the high-margin customers – a big capital “T” on your SWOT analysis, if you’re one of the incumbents Google’s threatening.

10 Revealing Interview Questions from Product Management Executives

Prepping for a PM/PO job interview? Here are some thought-provoking questions you should think about ahead of time.

When To Decline A Job Offer

The hardest part of a job search (at least for me) is trying to imagine how I would walk away from a job offer, even one that didn’t suit my needs or career aspirations. Beyond the obvious red flags (dark/frantic mood around the office, terrible personality fit with the team/boss), it feels ungrateful to say “no” based on a gut feel or a sense that “there’s something better”. Here are a few perspectives to bolster your self-worth algorithm.

The Golden Ratio: Design’s Biggest Myth

I’m one of the many who fell for this little mental sleight-of-hand. Sounds great, right? A magic proportion that will make any design look “perfect” without being obvious, and will help elevate your designs to the ranks of all the other design geeks who must also be using the golden ratio.

Except it’s crap, as much a fiction and a force-fit as vaccines and autism or oat bran and heart disease (remember that old saw?). Read the well-researched discussion.

Agile Is Dead

This well-meaning dude fundamentally misunderstands Agile, and yet is so expert that he knows how to improve on it. “Shuffling Trello cards” and “shipping often” doesn’t even begin…

Not even convinced *he* has read the Manifesto. Gradle is great, CD is great, but if you have no strategy for Release Management or you’re so deep in the bowels of a Microservices forest that you don’t have to worry about Forestry Management, then I’d prefer you step back and don’t confuse those chainsaw-wielders who I’m trying to keep from cutting off their limbs (heh, this has been brought to you by the Tortured Analogies Department).

Perspectives on Product Management (if you’re asking)

As part of a recent job application, I was asked for my responses to a number of interesting questions regarding my approach to Product Management. In the spirit of Scott Hanselman’s “don’t waste your keystrokes”, I’m sharing my thoughts to give more folks the benefit of my perspective.

As a “Product Manager”, what are the product management challenges in a Start-Up (Private) company environment?

The key is determining which of the possible ideas and market gaps you’ve identified are real winners with significant, long-term revenue opportunity – without the benefit that larger, older companies have of market/revenue history to guide your guesses.

Choosing among an infinite range of new product ideas is much harder and feels more arbitrary than choosing among the more focused features and enhancements that an established customer base can provide you.

How do the product management challenges of a Start-Up (Private) company differ/compare to an established Fortune 500 environment?

In the established Fortune 500 companies I worked for, the challenges included weighing the benefits/risks of cannibalizing existing products, making incremental market share improvements in mature/low-growth markets, and how to encourage existing customers to buy more of the products you’re offering when you’ve maxed out their capacity to buy the ones they already have.

Start-up companies have the opposite challenges: establishing *any* market share in pre-existing markets (gaining visibility and credibility with target customers), making the “first sale”, and determining the actual, actionable barriers-to-purchase in a market where you have few customers to quiz for leading/lagging indicators.

Please describe (2-3 sentences) your experience developing a software product or service in a product manager role.

I’ve managed a range of software opportunities, from those I’ve birthed from scratch myself (and managed through many major releases and business needs changes over the years), to managing a pair of employee-focused productivity solutions, to juggling a wide range of developer-focused software solutions that had competing and sometimes conflicting customer requirements.

I’ve always managed teams “too small for the job”, and always focused on making sure they are more confident and prepared to deliver the software their customers actually need (no matter how unclear the initial requirements may have been).

My “business value” focus is weighted towards ensuring that the primary use case is never difficult to follow, that we design for the user with the least experience with/attention to the system, and that we’re always focused on making incremental improvements based on actual customer feedback rather than infinite analysis paralysis that halts good experiment-driven development.

Please describe your product management experience where “need” has been identified, but not “demand”.

The product I managed the longest was a set of business applications that I led because I was tired of seeing my colleagues sending around spreadsheets, and knew they would be much better prepared and more effective with a centralized, real-time solution. I further determined that the engineers who were the day-to-day users of the system needed to know not only what they were expected to do, but also how they would know when they had successfully completed the tasks prescribed by my solution.

Neither of these focuses was requested by my stakeholders and customers – in fact, the former was something my management actively encouraged me *not* to pursue, and the latter was something my management believed was irrelevant to the purpose.

In the end, this solution went from a system no one asked for or cared about to the most critical piece of infrastructure that measured and enabled the Security Development Lifecycle across Intel.

How do you deal with frequent product goal changes?

I have two main strategies I pursue:

I reduce the amount of time I invest in “gold-plating” (grooming, refining, updating) the roadmap or the product Backlog artifacts. Up front I’ll define their “why” and primary goals, but I spend as little time as possible (sometimes just a few minutes, based on my intuition and initial impressions) refining those artifacts from e.g. “could be 10-40 points of work” to “I’m pretty confident this is 10-15 points of work”. I take this approach knowing that, as the timetable for actually delivering the items approaches, (a) many of them will have been discarded [so any refining effort would have been entirely wasted], (b) we’ll usually have to significantly revise what we were initially focused on as new market demands and insights become available, and (c) by delaying the detailed investigation until just before the items need to be delivered, we’ll waste the fewest engineering cycles on the final evaluation & estimation of the work.

I stay tuned into all market/customer feedback channels – listening closely to Sales, Support and my direct customer interactions, effectively “leaning in” to the market/customer volatility to get a clear idea of how the fluctuations at our customers turn into fluctuations in their requirements of us. For example, when a customer’s business is radically changing, or they’re subject to significant changes in what they’re expected to deliver (e.g. new business, losing their old business, changes in management or *their* customer base), that has significant downstream impact – they’ll frequently change their mind, or forget the last thing they requested. Where we see that kind of apparent chaos in the signals from significant customers, I’ve made the effort to get directly in touch with the customer and have a longer conversation, to help us understand what’s going on behind the scenes – and to help them prioritize among the stream of conflicting requests. This effort to “lean in” and engage the customer directly also has the beneficial effect of helping me determine which channels and individuals (e.g. sales, support) are reliable sources of information, and which warrant fewer immediate “drop everything” reactions from us.

Product Management when the customer’s problem/pain is identified is easy. How do you manage a product when the customer has not identified the problem/pain?

The classic answer is “ask them ‘Why’ five times until you get to the root of their problem”. However, it’s rarely that simple – some customers aren’t able to articulate it, some get defensive, and sometimes it takes a few rounds of conversation (with thinking in between) for them to articulate/admit what’s really going on. Some customers can own up immediately, if only you ask directly.

In my experience, I have learned to ask the following question, when they demand a specific change or enhancement to the software I’ve helped deliver: “What decisions will you be able to make with this new information, or actions will you be able to take with this change in the system, that you aren’t currently able to make without it?” That, or variations on this question, usually helps me and the customer sort between “ideas that sound good or that make me feel better” and “those that will have a material impact on our business”.

The former are worth considering too – sometimes the usability, the pleasure evoked by a smoother experience, makes for a much more ‘sticky’ product (i.e. one the customer is more likely to renew/purchase again). However, in my experience, if you can’t identify the latter – or you ignore it in favor of the former – you risk allowing frustration and dissatisfaction to fester and ultimately doom your relationship with the customer.

Occupied Neurons, early May 2016

The continuing story of the intriguing ideas and happenings that I can’t shake off…

[image: Pigs In Space]

(Have you ever seen an episode of Pigs In Space?  If not, go sample one now, and you’ll get my droll reference)

Infinite Scrolling, Pagination or “Load More” Buttons? Usability Findings in eCommerce

https://www.smashingmagazine.com/2016/03/pagination-infinite-scrolling-load-more-buttons/

Summary (and something I plan to bias towards in future designs, under similar conditions): The “Load More” design pattern is the most well-received by users and creates a minimum of friction while still enabling access to the page footer.

How Spotify’s Poor API Hygiene Broke a Bunch of Hardware and Software

http://www.programmableweb.com/news/how-spotifys-poor-api-hygiene-broke-bunch-hardware-and-software/analysis/2016/02/23

This is a pretty epic rant on the fallout for independent Spotify developers from a haphazard approach to managing the APIs this consumer entertainment service has offered over the years. Having worked on the other side of these kinds of decisions, I can well imagine how this came to be: thin staffing levels keeping the team from putting adequate attention on developer communications and engineering maintenance, plus distracted PMs (or possibly even frequent PM turnover), such that late in the game no one even remembers, let alone still believes in, the original value prop behind the original APIs.

It doesn’t excuse the broken promises behind the APIs, and especially not the lack of communication in obvious channels when changes were made (or APIs eliminated), but I’ve been in such positions as a Product guy and found myself making decisions that felt just as compromised – trading off one disappointment for a better-mitigated disappointment elsewhere. It happens, especially when the product being extended through those APIs has a pretty low profit margin, and when the staff devoted to managing those concerns are stretched terribly thin (higher priorities and all).

Theory of Constraints

https://en.m.wikipedia.org/wiki/Theory_of_constraints

At the Intel-sponsored Accelerate Results gathering, a few themes/durable concepts kept coming up (and have come up in this community repeatedly over the years). One is the Theory of Constraints, which seems popular among all systems thinkers, even in big software design (at least in concept if not in execution).

I firmly believe we have a duty to consider outside perspectives on our industry, even when they appear to have no direct applicability – myopia, tools bias and fad-driven design/execution are the constraints I make a deliberate effort to resist in my own practices.

Standing on the Shoulders of Giants

http://www.business-improvement.eu/toc/Goldratt_Standing_On_The_Shoulders_Of_Giants.php

Eliyahu Goldratt is a huge influence on the thought leaders at the Accelerate Results conference, and many made reference to his seminal essay that seems to have kicked off this whole revolution. Worth a skim, even if it’s only to be able to nod thoughtfully when others keep talking about this.

Everyday Internet Users Can Stand Up for Encryption — Here’s How

https://blog.mozilla.org/blog/2016/03/30/everyday-internet-users-can-stand-up-for-encryption-heres-how/

I worked with Mark Surman a long time ago back in Toronto for a non-profit Internet Service Provider. It’s more than a little amazing to me to see how our paths have diverged and yet how he’s speaking about issues today that are very near and dear to my heart.

Occupied Neurons, April 2016

https://medium.com/@sproutworx/six-templates-for-aspiring-product-managers-a568d3115cfe#.swkk52f58
So many Product Managers are making it up as they go along – generating whatever kinds of artifacts will get them past the next checkpoint and keep all the spinning plates from veering off into the ether. This is the first time in a long while I’ve seen someone propose viable, usable and not totally generic tools for capturing their PM thinking. Well worth a look.

https://medium.com/swlh/mvpm-minimum-viable-product-manager-e1aeb8dd421
The “BUT” model for Product Management is a hot topic, and there are a number of folks taking a kick at deciphering it in their own context. I’ve got a spin on it that I’ll write about soon, but this is a great take on the model too.

https://schloss.quora.com/Design-doesnt-deserve-a-seat-at-the-table
Captures all my feelings about the complaint from Designers (and Security reviewers, and all others in the “product quality” disciplines) that they get left out of discussions they *should* be part of. My own rant doesn’t do the subject justice, but I’m convinced that we *earn* our right to a seat by helping steer – working through the messy quagmire that is real software delivery (not just throwing pixel-perfect portfolio fodder over the wall).

http://www.eventbrite.com/e/resilience-and-the-future-of-work-responsiveorg-un-conference-tickets-24045089510
An unconference to expand awareness of a movement among leading thinkers on how to organize work in the 21st century. Looks fascinating – the unconference format is dense and high-learning, the subject is still pretty fresh and new (despite the myriad books building up to this over the last decade), and the energy in the Portland community is bursting.

Meetups where you’ll find Mike’s hat, Spring 2016 edition

Occasionally I’ll tell people I meet about all the meetups I have so much fun at.

Or rather, I’ll try to enumerate them all, and fail each and every time.

Primarily because there’s so many meetups I like to check in on.

So occasionally I’ll enumerate them like this, so that my friends have a valiant hope of crossing paths with me before the amazing event has passed.

Meetups I’m slavishly devoted to

Meetups I’ll attend anytime they’re alive

Meetups I sample like caviar – occasionally and cautiously

Recent additions that may soon pass the test of my time


Pruning features via intelligent, comprehensive instrumentation

Today’s adventures in the LinkedIn Product Management group gave us this article:
http://product.hubspot.com/blog/the-5-whys-of-feature-bloat

The critical statement (i.e. the most/only actionable information) in the article is this:

Decide a “minimum bar of usage/value” that every feature must pass in order for it to remain a feature. If a new feature doesn’t hit that bar in some set period of time, prune it.

I’d love to hear from folks who have been able to prove with data that a feature is not getting the level of usage needed to justify its continued existence. AFAIK, whether it be a desktop, mobile or web/cloud app, instrumenting all the things so that we have visibility into the usage of every potentially-killable feature is a non-trivial (and sometimes impractical) investment in itself.

I’m not even arguing for getting that work prioritized enough to put it in up front – it’s just something that, where technically feasible, we should *all* do, to turn us from cavemen wondering what the stars mean into explorers actually measuring and testing hypotheses about our universe.

I’m specifically inquiring how it’s actually *done* in our typical settings. I know from having worked at New Relic what’s feasible and what the limits are of doing this in web/cloud and mobile settings, and it’s definitely a non-trivial exercise to instrument *all* the things (especially when we’re talking about UI features like buttons, configuration settings and other directly-interactive controls). It’s far harder in a desktop setting (does anyone still have a “desktop environment”? Maybe it’s just my years of working at Microsoft talking…).

And I can see how hard it is not only to *instrument* the features but also to gather and catalogue the resulting data in a way that characterizes both (a) the actual feature that was used and, even better, (b) the intended result the user was trying to achieve [i.e. not just the what or how, but the *why*].

Developers think one way about naming the internals of their applications – MVC patterns, stackoverflow examples, vendor cultures – and end users (and likely/often, we product managers) think another way.  Intuitive alignment is great, but hardly likely and usually not there.  For example, something as simple as a “lookup” or “query” function (from the engineering PoV) is likely thought of as a “search” function by the users.  I’ve seen far more divergent examples – enough to assume I won’t follow along if I’m just staring at the route/controller names.

If I’m staring at the auto-instrumented names of an APM vendor’s view into my application, I’m likely looking at the lightly-decorated functions/classes/methods as named by the engineers – in my experience, these are terribly cryptic to a non-engineer.  And for all of our custom code that wove the libraries together, I’m almost certainly going to have to have the engineers add in custom tracers to annotate all the really cool, non-out-of-the-box features we added to the application.  Those custom tracers, unless you’ve got an IA (information architecture) nut on the team to get involved in the naming, will almost certainly look like a foreign language.

Does that make it easy for me to find the traces of usage by the end users of a specific feature (e.g. an advanced filtering textbox in my search function)?  Nope, not often, but it’s sure a start.

So what do you do about this, to make it less messy down the road when you’re dying to know if anyone’s actually using those advanced filtering features?

  1. Start now with the instrumentation and the naming.  Add the instrumentation as a new set of acceptance criteria to your user stories/requirements/tickets.  If the app internals have been named in a way that you understand at a glance, awesome – encourage more of the same from the engineers, and codify those approaches into a naming guideline if possible.  Then, if you’re really lucky, just derive the named instrumentation from the beautiful code.
  2. If not, start the work of adding the mapped names in your custom instrumentation now – i.e. if they called it “query”, make sure the custom instrumentation names it “search” (see the sketch after this list).
  3. Next up: add this instrumentation for all your existing features.  Here, you have some interesting decisions:
    • Do you instrument the most popular and baseline features? (If so, why?  What will you do with that data?)
    • Do you instrument the features that are about to be canned? (If so, will this be there to help you understand which of your early adopter customers are still using the features – and do you believe that segment of your market is predictive of the usage by the other segments?)
    • Or do you just pick on the lesser-known features?  THESE ARE THE ONES I’D RECOMMEND for the most benefit for the invested energy – the work to add and continue to support this instrumentation is the most likely to be actionable at a later date – assuming you’ve got the energy to invest in that tension-filled EOL plan (as the above article beautifully illustrates).
  4. Finally, all of this labour should have convinced you to be a little more judicious about how many of these dubious features you add to your product in the first place.
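To make (1) and (2) concrete, here’s the kind of wrapper I have in mind – the event sink and every name here are hypothetical; the point is just that the recorded name is the user’s word for the feature, not the engineer’s:

```python
import functools
import logging

usage_log = logging.getLogger("feature_usage")  # stand-in for your APM/analytics sink

def track_feature(user_facing_name):
    """Record one usage event under the *user-facing* feature name."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            usage_log.info("feature=%s", user_facing_name)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

# Engineers called it a "query"; users (and this PM) call it "search"
@track_feature("search.advanced_filter")
def run_query_with_predicates(filters):
    ...
```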

Enhancing your ability to correct for these mistakes later is great; factoring in the extra cost up front, and helping justify why you’re not doing it now is even better.

And all that said?  Don’t get too hung up on the word “mistakes”.  We’re learning, we’re moving forward, and some of us are learning that Failure Is An Option.  But mostly, we’re living life the only way it’s able to be lived.

[image: success]