The Challenges of Customer Feedback Curation: A Guide for Product Managers

You’re one of a team of PMs, constantly firehosed by customer feedback (the terribly-named “feature request”**), and you even have a system to stuff that feedback into so you don’t lose it, can cross-reference it against similar patterns, and are ready to start pulling a PRD out of the gems of problems that hit the Desirability, Feasibility and Viability triad.

And then you get pulled into a bunch of customer escalations (whose notes you intend to transform into the River of Feedback system), haven’t checked in on the feedback backlog for a few weeks (“I’m gonna have to wait til I’ve got a free afternoon to really dig in again”), and can’t remember whether you’ve updated that delayed PRD with the latest competitive insights from that customer-volunteered win/loss feedback.

Suddenly you realise your curation efforts – constantly transforming free-form inputs into well-synthesised insights – are falling behind what your peers *must* be doing better than you. 

You suck at this. 

Don’t be like Lucy

Don’t feel bad. We all suck at this. 

Why? Curation is rewarding and ABSOLUTELY necessary, but that doesn’t mean it isn’t hard:

  • It never ends (until your products are well past time to retire)
  • It’s yet one more proactive, put-off-able interruption in a sea of reactive demands
  • It’s filled with way more noise than signal (“Executive reporting is a must-have for us”)
  • You can bucket hundreds of ideas in dozens of classification systems (you ever tried card-sorting navigation menus with independent groups of end users, only to realise that they *all* have an almost-right answer that never quite lines up with the others?), and it’s oh-so-tempting to throw every vaguely-related idea into the upcoming feature bucket (cause maybe those customers will be satisfied enough to stop bugging you even though you didn’t address their core operational problem)

What can you do?

  1. Take the River of Feedback approach – dip your toes in as often as your curiosity allows
  2. Don’t treat this feedback as the final word, but breadcrumbs to discovering real, underlying (often radically different) problems
  3. Schedule regular blocks of time to reach out to the customers behind one of the most recent inputs (do it soon after, so they still have a shot at remembering the original context that spurred the Feature Request, and won’t just parrot the words because they forgot why it mattered in the first place)
  4. Spend enough time curating the feedback items so that *you* can remember how to find it again (memorable keywords as labels, bucket as high in the hierarchy as possible), and stop worrying about whether anyone else will completely follow your classification logic.
  5. Treat this like the messy black box it inevitably is, and don’t try to wire it into every other system. “Fully integrated” is a cute idea – integration APIs, customer-facing progress labels, pretty pictures – but it creates so much “initialisation” friction that every time you want to satisfy your curiosity about what’s new, it means an hour or three of labour to perfectly “metadata-ise” every crumb of feedback.
Be like Skeletor

NECESSARY EMPHASIS: every piece of customer input is absolutely a gift – they took time they didn’t need to spend, letting the vendor know the vendor’s stuff isn’t perfect for their needs. AND every piece of feedback is like a game of telephone – warped and mangled in layers of translation that you need to go back to the source to validate.

Never rely on Written Feature Requests as the main input to your sprints. Set expectations accordingly. And don’t forget the “97% of all tickets must be rejected” rule coined by Rich Mironov.

**Aside: what the hell do you mean that “Feature Request” is misnamed, Mike?

Premise: customers want us to solve their problems, make them productive, understood and happy. 

Problem: we have little to no context for where the problem exists, what the user is going to do with the outcome of your product, and why they’re not seeking a solution elsewhere. 

Many customers (a) think they’re smarty pants, (b) hate the dumb uncooperative vendor and (c) are too impatient to walk through the backstory. 

So they (a) work through their mental model of our platform to figure out how to “fix” it, (b) don’t trust that we’ll agree with the problem and (c) have way more time to prep than we have to get on the Zoom with them. 

And they come up with a solution and spend the entire time pitching us on why theirs is the best solution that every other customer critically needs. Which we encourage by talking about these as Feature Requests (not “Problem Ethnographic Studies”) – and since they’ve put in their order at the Customer Success counter, they then expect that this is *going* to come out of the kitchen any time now (and is frankly overdue by the time they check back). Which completely contradicts Mironov’s “95% still go into the later/never pile”.

Speed, Quality or Cost: Choose One

PM says: “The challenge is our history of executing post-mvp. We get things out the door and jump onto the next train, then abandon them.”

UX says: “We haven’t found the sweet spot between innovation speed & quality, at least in my 5 years.”

Customer says: “What’s taking so long? I asked you for 44 features two years ago, and you haven’t given me any of the ones I really wanted.”

Sound familiar? I’m sure you’ve heard variations on these themes – hell, I’ve heard these themes in every tech firm I’ve worked at.

One of the most humbling lessons I keep learning: nothing is ever truly “complete”, but if you’re lucky some features and products get shipped.

I used to think this was just a moral failing of the people or the culture, and that there *had* to be a way this could get solved. Why can’t we just figure this shit out? Aren’t there any leaders and teams that get this right?

It’s Better for Creatives, Innit?

I’m a comics reader, and I like to peer behind the curtain and learn about the way that creators succeed. How do amazing writers and artists manage to ship fun, gorgeous comics month after month?

Some of the creators I’ve paid close attention to say the same thing as even the most successful film & TV professionals, theatre & clown types, painters, potters and anyone creating discrete things for a living:

Without a deadline, lots of great ideas never quite get “finished”. And with a deadline, stuff (usually) gets launched, but it’s never really “done”. Damned if you do, damned if you don’t. Worst of both worlds.

In commercial comics, the deal is: we ship monthly, and if you want a successful book, you gotta get the comic to print every month on schedule. Get on the train when it leaves, and you’re shipping a hopefully-successful comic. And getting that book to print means having to let go even if there’s more you could do: more edits to revise the words, more perfect lines, better colouring, more detailed covers.

Doesn’t matter. Ship it or we don’t make the print cutoff. Get it out, move on to the next one.

Put the brush down, let the canvas dry. Hang up the painting.

No Good PM Goes Unpunished

I think about that a lot. Could I take another six months, talk to more research subjects, rethink the UX flow, wait til that related initiative gets a little more fleshed out, re-open the debate about the naming, work over the GTM materials again?

Absolutely!

And it always feels like the “right” answer – get it finished for real, don’t let it drop at 80%, pay better attention to the customers’ first impressions, get the launch materials just right.

And if there were no other problems to solve, no other needs to address, we’d be tempted to give it one more once-over.

But.

There’s a million things in the backlog.

Another hundred support cases that demand a real fix to another even more problematic part of the code.

Another rotting architecture that desperately needs a refactor after six years of divergent evolution from its original intent.

Another competitive threat that’s eating into our win-loss rate with new customers.

We don’t have time to perfect the last thing, cause there’s a dozen even-more-pressing issues we should turn our attention to. (Including that one feature that really *did* miss a key use case, but also another ten features that are getting the job done, winning over customers, making users’ lives better EVEN IN THEIR IMPERFECT STATE.)

Regrats I’ve Had a Few

I regret a few decisions I wish I’d spent more time perseverating on. There’s one field name that still bugs me every time I type it in, a workflow I wish I’d fought harder to make more intuitive, and an analytic output I wish we’d stuck to our guns on and reported as it comes out of the OS.

But I *more* regret the hesitations that have kept me from moving on, cutting bait, and getting 100% committed to the top three problems – the ones about which I’m too often saying “Those are key priorities that are top of the list, we should get that kicked off shortly,” and then somehow letting slip til next quarter, or addressing six months later than a rational actor would have.

What is it he said? “Let’s decide on this today as if we had just been fired, and now we’re the cleanup crew who stepped in to figure out what those last clowns couldn’t get past.”

Lesson I Learned At Microsoft

Folks used to say “always wait for version 3.0 for new Microsoft products” (back in the packaged-binaries days – hah). And I bought into it. Years later I learned what was going on: Microsoft deliberately shipped v1.0 to gauge any market interest (and sometimes abandoned the product there), 2.0 to start refining the experience, and got things mostly “right” and ready for mass adoption by 3.0.

If they’d waited to ship until they’d completed the 3.0 scope, they’d have way overinvested in some market dead-ends, built features that weren’t actually crucial to customers’ success, and missed the opportunity to listen to how folks responded to the actual (incomplete, hardly perfect) product in situ.

What Was The Point Again?

Finding the sweet spot between speed and quality strikes me as trying to beat the Heisenberg Uncertainty Principle: the more you refine your understanding of position, the less sure you are about momentum. It’s not that you’re not trying hard to get both right: I have a feeling that trying to find the perfect balance is asymptotically unachievable, in part because that balance point (fulcrum) is a shifting target: market/competition forces change, we build better core competencies and age out others, we get distracted by shinies and we endure externalities that perturb rational decision-making.

We will always strive to optimize, and that we don’t ever quite get it right is not an individual failure but a consequence of Dunbar’s number, imperfect information flows, local-vs-global optimization tensions, and incredible complexity that will always challenge our desire to know “the right answer”. (Well, it’s “42” – but then the immediate next problem is figuring out the question.)

We’re awesome and fallible all at the same time – resolving such dualities is considered enlightenment, and I envy those who’ve gotten there. Keep striving.

(TL;DR don’t freak out if you don’t get it “right” this year. You’re likely to spend a lot of time in Cynefin “complex” and “chaos” domains for a while, and it’s OK that it won’t be clear what “right” is. Probe/Act-Sense-Respond is an entirely valid approach when it’s hard-to-impossible to predict the “right” answer ahead of time.)

Curation as Penance

Talking to one of my colleagues about a content management challenge, we arrived at the part of the conversation where I fixated on the classic challenge.

We’re wrangling inputs from customers and colleagues into our Feature Request system (a challenging name for what boils down to qualitative research) and trying to balance the question of how to make it easy to find the feedback we’re looking for among thousands of submissions.

AI art is a wonder – is that molten gold pouring from his nose?

The Creator’s Indifference

It’d be easy to find the desired inputs (such as all customers who asked for anything related to “provide sensor support for Windows on Apple silicon” – clearly an artificial example eh?) if the people submitting the requests knew how we’d categorise and tag them.

But most outsiders don’t have much insight into the cultural black box that is “how does one collection of humans, indoctrinated to a specific set of organisational biases, think about their problem space?” – let alone, those outsiders having the motivation or incentive to put in that extra level of metadata decorations.

Why should the Creators care how their inputs are classified? Their motivation as customers of a vendor is “let the vendor know what we need” – once the message has been thrown over the wall, that’s as much energy as any customer frankly should HAVE to expend. Their needs are the vendor’s problem to grok, not a burden for the customer to carry.

Heck, the very fact of any elucidated input the customer offers to the vendor is a gift. (Not every customer – especially the ones who are tired of sending feedback into a black hole – is in a gift-giving mood.)

The Seeker’s Pain

Without such detailed classifications, those inputs become an undifferentiated pile. In Productboard (our current feedback collection tool of choice) they’re called Insights, and there’s a linear view of all Insights that’s not very…insightful. (Nor is it intended to be – searching is free text but often means scrutinising every one of dozens or hundreds of records, which is time-consuming.)

This makes the process of taking considered and defensible actions based on this feedback hard to scale, and the Seeker’s job quite tedious – in the past, when I’ve faced that task, I put it off far too often and for far too long.

The Curator’s Burden

Any good Product Management discipline regularly curates such inputs: assigns them weights, ties them to normalised descriptors like customer name, size and industry, and groups them with similar requests to help find repeating patterns of problems-to-solve.
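
To make that concrete, here’s a minimal sketch of the grouping-and-weighting idea – the field names, labels and weights are entirely hypothetical, not how Productboard (or any particular tool) actually models it:

from collections import defaultdict

# Hypothetical feedback records – in our real tool these live in Productboard as Insights
insights = [
    {"customer": "Acme", "segment": "Enterprise", "label": "reporting", "weight": 3},
    {"customer": "Initech", "segment": "SMB", "label": "reporting", "weight": 1},
    {"customer": "Globex", "segment": "Enterprise", "label": "sso", "weight": 2},
]

# Group similar requests and sum their weights so repeating patterns float to the top
patterns = defaultdict(lambda: {"count": 0, "weight": 0, "customers": set()})
for item in insights:
    bucket = patterns[item["label"]]
    bucket["count"] += 1
    bucket["weight"] += item["weight"]
    bucket["customers"].add(item["customer"])

for label, stats in sorted(patterns.items(), key=lambda kv: kv[1]["weight"], reverse=True):
    print(label, stats["count"], stats["weight"], sorted(stats["customers"]))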

A little better from the AI – but what the heck is that franken-machine in the background?

A well-curated feedback system is productive – insightful – even correlated to better ROI of your spend of engineering time.

BUT – it costs. If the Creator and the Seeker have little incentive to do that curation, who exactly takes it on? And even if the CMS (content management system) has a well-architected information model up front, who is there to ensure

  • items are assigned to appropriate categories?
  • categories are added and retired as the product, business and market change?
  • supporting metadata is consistently added to group like with like along many dimensions?

The Curator role is crucial to an effective CMS – whether for product feedback (Productboard), or backlog curation (Jira) or customer documentation (hmm, we don’t use WordPress – what platform are we on this time?)

What’s most important about the curation work – whether performed by one person (some fool like me in its early days), or by the folks most likely to benefit (the whole PM team today) – is not that it happens with speed, but that it happens consistently over the life of the system.

Biggest challenge I’ve observed? In every CMS I’ve used or built, it’s ensuring adequate time and attention is spent consistently organising the content (as friction-free as it should be for the Creator) so that it can be efficiently and effectively consumed by the Seeker.

That Curator role is always challenging to staff or “volunteer”. It’s cognitively tiring work, doing it well rarely benefits the Curator, and the only time most Curators hear about it is when folks complain what a terrible tool it is for ever finding anything.

Best case it’s finding gems among more gems…
…worst case it’s some Kafkaesque fever dream

(“Tire fire” and “garbage dump” are common epithets Creators and Seekers apply to most mature, enterprise systems like Jira – except in the rare cases where the system is zealously, jealously locked down and demands heavy effort for any input from the griping Creators.)

In our use of Productboard and Jira (or any other tool for grappling with the feedback tsunami) we’re in the position most of my friends and colleagues across the industry find themselves in – doing a decent job finding individual items, mostly good at having them categorised for most Seekers’ daily needs, and wondering if there’s a better technology solution to a people & process problem.

(Hint: there isn’t.)

Curation is the price we need to pay to make easy inputs turn into effective outputs. Penance for most of us who’ve been around long enough to complain how badly organised things are, and who eventually recognise that we need to be the change we seek in the world.

“You either die a hero or live long enough to become the villain.” — Harvey Dent

DevOps status report: HackOregon 2019 season

One of my colleagues on the HackOregon project this year sent around “Nice post on infrastructure as code and what getting solid infra deploys in place can unlock” https://www.honeycomb.io/blog/treading-in-haunted-graveyards/

I felt immediately compelled to respond, saying:

Provocative thinking, and we are well on our way I’d say.

I’ve been the DevOps lead for HackOregon for three years now, and more often than not delivering 80% of the infrastructure each year – the CI/CD pipeline, the automation scripts for standardizing and migrating configuration and data into the AWS layers, and the troubleshooting and white-glove onboarding of each project’s teams where they touch the AWS infrastructure.

There’s great people to work with too – on the occasions when they’ve got the bandwidth to help debug some nasty problem, or see what I’ve been too bleary-eyed to notice is getting in our way, it’s been gratifying to pair up and work these challenges through to a workable (if not always elegant) solution.

My two most important guiding principles on this project have been:

  • Get project developers productive as soon as possible – ensure they have a Continuous Deployment pipeline that gets their project into the cloud, and allows them to see that it works so they can quickly see when a future commit breaks it
  • “working > good > fast” – get something working first, make it “good” (remove the hard-coding, the quick workarounds) second, then make it automated, reusable and documented

We’re married pretty solidly to the AWS platform, and to a CloudFormation-based orchestration model.  It’s evolved (slowly) over the years, as we’ve introspected the AWS Labs EC2 reference architecture, and as I’ve pulled apart the pieces of that stack one by one and repurposed that architecture to our needs.

Getting our CloudFormation templates to a place where we can launch an entirely separate test instance of the whole stack was a huge step forward from “welp, we always gotta debug in prod”. That goal was met about a month ago, and the stack went from “mysterious and murky” to “tractably refactorable and extensible”.

Stage two was digging deep enough into the graveyard to understand how the ECS parts fit together, so that we could swap EC2 for Fargate on a container-by-container basis. That was a painful transition but ultimately paid off – we’re well on our way, and can now add containerised tasks without also having to juggle a whole lot of maintenance of the EC2 boxes that are a velocity-sapping drag on our progress.

Stage three has been refactoring our ECS service templates into a single standardised template used by whole families of containerised tasks, away from a spray of copypasta hard-coded replicas that (a) had to be curated by hand (much like our previous years’ containerised APIs had to be maintained one at a time), and (b) buried the lede on what unique configuration was being used in each service. Any of the goofy bits you need to know ahead of deploying the next container are now obvious and all in one place, the single master.yaml.
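
To illustrate the “one shared template, many parameterised services” idea – this is a rough sketch, not our actual master.yaml mechanics, and the template URL, stack names and parameter keys below are made up – here’s roughly what it looks like if you drive it from boto3:

import boto3

cloudformation = boto3.client("cloudformation")

# One shared ECS service template; only the parameters differ per containerised task.
# The template URL and parameter keys are illustrative, not the real HackOregon ones.
TEMPLATE_URL = "https://s3.amazonaws.com/example-bucket/ecs-service.yaml"

services = {
    "budget-api":    {"ContainerPort": "8000", "DesiredCount": "2"},
    "housing-api":   {"ContainerPort": "8001", "DesiredCount": "1"},
    "transport-api": {"ContainerPort": "8002", "DesiredCount": "1"},
}

for name, params in services.items():
    cloudformation.create_stack(
        StackName=f"hackoregon-{name}",
        TemplateURL=TEMPLATE_URL,
        Parameters=[
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in params.items()
        ],
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )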

I can’t speak for everyone, but I’ve been pretty slavish about pushing all CF changes to the repo in branches and merging when the next round of stable/working infra has been reached. There’s always room for improvement, however:

  • smaller changes are always better
  • we could afford more folks who are trained and comfortable with the complex orchestration embedded in our infrastructure-as-code
  • which would mean being able to conduct good reviews before merge-to-master
  • I’d be interested in how we can automate the validation of commit-timed-upgrades (though that would require more than a single mixed-use environment).

Next up for us are tasks like:

  • refactoring all the containers into a separate stack (out of master.yaml)
  • parameterising the domains used for ALB routing
  • separating production assets from the development/staging environment
  • separating a core infra layer from the staging vs production side-by-side assets
  • refactoring the IAM provisions in our deployment (policies and attached roles)
  • pulling in more of the coupled resources such as DNS, certs and RDS into the orchestration source-controlled code
  • monitoring and alerting for real-time application health (not just infra-delivery health)
  • deploying *versioned* assets (not just :latest which becomes hard to trace backwards) automatically and version-locking the known-good production configuration each time it stabilises
  • upgrading all the 2017 and 2018 APIs to current deployment compatibility (looking for help here!)
  • assessing orchestration tech to address gaps or limitations in our current tools (e.g. YAML vs. JSON or TOML, pre-deploy validation, CF-vs.-terraform-vs-Kubernetes)
  • better use of tagging?
  • more use of delegated IAM permissions to certain pieces of the infra?

This snapshot of where we’re at doesn’t capture the full journey of all the late nights, painful rabbit holes and miraculous epiphanies.

Occupied Neurons, Santa edition: lessons for software engineering

How to Be An Insanely Successful Software Manager

https://hackernoon.com/how-to-be-an-insanely-successful-software-manager-13efe08fd890

Aside from a few dubious quotes and phrasings, I believe someone channeled my life when they wrote this.  (Is that why I smelled burning sulfur recently?)  When the goal of a software org is getting the most value into customers’ hands as quickly as possible, shaving down every point of friction between “User Story” and “running in production” is an obsessive mission.

No one does phrasing better than Sterling Archer

Benefit vs Cost: How to Prioritize Your Product Roadmap

https://www.productplan.com/how-to-prioritize-product-roadmap/

I’ve been a data-driven, quantitative prioritization junkie in my Product work for years. When you want to have a repeatable, defensible, consensus-able (?) way for everyone to see what are the most valuable items in your Backlog, you ought to invest in a way to estimate Business Value just as much as you need the engineering team to estimate Effort.  Makes planning and communicating much easier in the long run.

Specific methodologies are a reflection of the rigour and the particular ways an organization derives value for its customers and shareholders.  A heavily regulated industry might use “Regulatory Compliance” as a double-weighted factor in their ‘algorithm’; an internal IT team might focus on something else entirely.  Many teams put emphasis on Estimated Revenue Impact and Reducing Customer Churn, and I’ve personally ensured that UX (“Expected Frustration Reduction”) has a place at the table.  Numeric scales, “high-medium-low”, “S-M-L-XL” or “Y/N” can all factor in, to whatever degree of rigour is necessary to sufficiently order and prioritize your backlog – don’t overengineer a system when half as much effort will get you a useful starting place for the final “sorting negotiation” among stakeholders.
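
As a toy illustration of such an ‘algorithm’ – the factors, weights and scores here are invented, not a recommendation:

# Hypothetical weighted-scoring sketch: rank backlog items by weighted business value
# divided by estimated effort. Factors, weights and scores are all illustrative.
WEIGHTS = {"revenue_impact": 3, "churn_reduction": 2, "frustration_reduction": 2, "regulatory": 1}

backlog = [
    {"name": "SSO support",      "scores": {"revenue_impact": 3, "churn_reduction": 2, "frustration_reduction": 1, "regulatory": 0}, "effort": 8},
    {"name": "Exec reporting",   "scores": {"revenue_impact": 2, "churn_reduction": 1, "frustration_reduction": 2, "regulatory": 0}, "effort": 5},
    {"name": "Audit log export", "scores": {"revenue_impact": 1, "churn_reduction": 1, "frustration_reduction": 1, "regulatory": 3}, "effort": 3},
]

def value(item):
    # Weighted sum of the business-value factors for one backlog item
    return sum(WEIGHTS[factor] * score for factor, score in item["scores"].items())

# Sort by value-per-effort – a starting point for the final "sorting negotiation"
for item in sorted(backlog, key=lambda i: value(i) / i["effort"], reverse=True):
    print(f'{item["name"]}: value={value(item)}, effort={item["effort"]}, ratio={value(item) / item["effort"]:.2f}')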

 

Introduction to ES6 Promises

http://jamesknelson.com/grokking-es6-promises-the-four-functions-you-need-to-avoid-callback-hell/

Been hearing about Promises and async/await from my engineering colleagues for ages.  Conceptually they’re a great advancement – making code more efficient, reflecting the unpredictable nature of distributed software systems, breaking the serialization bias of every new programmer yet again.

However, the truth of implementing Promises, for those who have never wrangled such code, is far more complex than I expected.  Just reading the explanations by accomplished programmers, with all their multi-layered assumptions and skipping-ahead-without-clarification, makes me feel dense in a way that doesn’t have to be true.  I’m sure if every element of the canonical use of a Promise object was explained (where it’s used, in what order, by what consumer), it would be much easier to get it to work.  I’ll keep hunting for that pedagogical example.

GraphQL vs REST Overview

https://philsturgeon.uk/api/2017/01/24/graphql-vs-rest-overview/

I’m hearing a lot of developers extoll the virtues of GraphQL in their side projects (and professional work, where they have room to advocate up).  I haven’t managed a Product shipping GraphQL services yet, so I’ve been curious what the folks already implementing these are learning.

One problem this article highlights is “deprecation” – when is it time to no longer support a field or endpoint in your API?  For endpoints, it’s easy to see how many requests you’re currently receiving; for fields, it’s trickier in a REST environment, and GraphQL’s support for “sparse field sets” is what helps there.

The question I don’t see addressed here is: does GraphQL require every request to specify every field it’s going to obtain?  Or is there also support to request all fields (cf. SELECT * FROM TABLE), in which case that benefit quickly vanishes?  If only some of your requests specify which fields they’re using, and the rest just demand them all, then you still don’t know whether that field you want to deprecate is up for grabs, nor which users are still using it.  You can infer some educated guesses based on the data you do have, but it’s still down to guesswork.

(Edit: I’ve concluded that fields must be explicit in GraphQL requests)
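
For illustration: a GraphQL query names every field it wants back (there’s no SELECT *-style wildcard in the core spec), which is what makes field-level usage visible to the server. A minimal sketch against a hypothetical endpoint:

import requests

# Hypothetical GraphQL endpoint and schema – for illustration only.
GRAPHQL_URL = "https://api.example.com/graphql"

# Every field the client wants must be named explicitly; omitting a field like
# "legacyScore" here lets the server observe that this client no longer depends on it.
query = """
{
  customer(id: "42") {
    name
    industry
  }
}
"""

response = requests.post(GRAPHQL_URL, json={"query": query})
print(response.json()["data"]["customer"])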

REST is the New SOAP

https://medium.com/@pakaldebonchamp/rest-is-the-new-soap-97ff6c09896d

OK OK I get it – REST is challenging in many ways when trying to deal with the reality of API behaviours.  Thank you for writing an article that outlines in specific what your problems are.  (OTOH, wouldn’t it be nice to see an article that acknowledges the problems *and* extolls the remaining virtues – see author’s own words, “I don’t doubt that some smart people out there will provide cases where REST shines”?  Or even better, talks about when to use this solution and when to use a solution that works better for a specified scenario/architecture – and not just offhandedly mention something they “heard that one time”?)

And why do I have that itchy feeling in the back of my brain that newer alternatives like GraphQL will put us in a state of complexity that we ran from the last time we did this to ourselves in the name of “giving ourselves all the tools we might ever need” (aka SOAP)?

It’s smart to select a relevant architecture for the problem space – it just makes me worry every time I watch someone put in place all sorts of “just in case” features that they have no need for now – and can’t even articulate a specific problem for which this is a great solution – but are sure there’ll be some use for in the future.  I haven’t delved deeply enough into GraphQL (obviously) but my glancing analysis made it seem much more flexible – and the last time I saw “eminently flexible” was when OAuth 2 was described to me as “puts all the grenades you’ll ever need in your hands and pulls the pins”.

Kitty Quinn captures my unease very well here, and quotes Allen Sherman to boot:

‘Cause they promise me miracles, magic, and hope,
But, somehow, it always turns out to be SOAP

When will DevSecOps resemble DevOps?

https://www-forbes-com.cdn.ampproject.org/c/s/www.forbes.com/sites/jasonbloomberg/2017/11/20/mitigate-digital-transformation-cybersecurity-risk-with-devsecops/amp/

Another substance-free treatise on the glories of DevSecOps.

“Security is everyone’s job”, “everyone should care about security” and “we can’t just automate this job” seems to be the standard mantra, a decade on.

Which is entirely frustrating to those of us who are tired of security people pointing out the problems and then running as soon as there’s talk of the backbreaking labour of actually fixing the security issues, let alone making substantive system improvements that reduce their frequency in the future.

Hell, we even get a subheading that implies it’ll advance security goals in a CI/CD world: “The Role of Tooling in DevSecOps”. Except that there’s nothing more than a passing wave hello to Coverity (a decent static analysis vendor, but not the start nor the finish of the problem space) and more talk of people & process.

Where’s the leading thinkers on secure configuration of your containers? Where’s the automated injection of tools that can enforce good security IAM and correct for the bad?

I am very tired of chasing Lucy’s football:

lucy-football

I’m tired of going out to DevSecOps discussions at meetups and conferences and hearing nothing that sounds like they “get” DevOps.

DevOps works in service of the customers, developers and the business, helping to streamline, reduce the friction of release and make it possible to get small changes out as fast and frequently as possible.

I’ve asked at each of those discussions, “What tools and automation can you recommend that gets security integrated into the CI/CD chain?”

And I’ve heard a number of unsatisfying answers, from “security is everyone’s job and they should be considering it before their code gets committed” all the way through to “we can’t talk about tools until we get the culture right”. Which are all just tap-dancing dodges around the basic principle: the emperor has no clothes.

If DevSecOps is nothing more than “fobbing the job off on developers” and “we don’t recommend or implement any tools in the CI/CD chain”, then you have no business jumping on the DevOps bandwagon as if you’re actively participating in the process.

If you’re reliant merely on the humans (not the technology) to improve security, and further that you’re pushing the problem onto the people *least* expert in the problem space, how can you possibly expect to help the business *accelerate* their results?

Yes I get that DevOps is more than merely tools, but if you believe Gene Kim (as I’m willing to do), it’s about three principles for which tools are an essential component:

  1. Flow (reduce the friction of delivery) and systems thinking (not kicking the can down to some other poor soul)
  2. Amplify feedback loops (make it easy and obvious to learn from mistakes)
  3. Create a culture of learning from failure.

Now, which of those does your infosec approach support?

Hell, tell me I’m wrong and you’ve got a stack of tooling integrated into your DevOps pipeline. Tell me what kinds of tools/scripts/immutable infrastructure you’ve got in that stack. I will kiss your feet to find out what the rest of us are missing!

Edit: thoughts

  • Obviously I’m glossing over some basic tools everyone should be using: linters.  Not that your out-of-the-box linter is going to directly catch any significant security issues, no – but that if you don’t even have your code following good coding standards, how the hell will your senior developers have the attention and stamina to perform high-quality, rapid code reviews when they’re getting distracted by off-pattern code constructions?
  • Further, all decent linters will accept custom rules and disabled/info-only settings for existing rules – giving you the ability to converge on an accepted baseline that all developers can agree to follow, and then slowly expand the footprint of those rules as the obvious issues get taken care of in early rounds.
  • Oh, and I stumbled across the DevSecCon series, where there are likely a number of tantalizing tidbits

Edit: found one!

Here’s a CI-friendly tool: Peach API Security

  • Good news: built to integrate directly into the DevOps CI pipeline, testing the OWASP Top Ten against your API.
  • Bad news: I’d love to report something good about it, but the evaluation experience is frustratingly disjointed and incomplete.  I’m guessing they don’t have a Product Manager on the job, because there are a lot of missing pieces in the sales-evaluation-and-adoption pipeline:
    • Product Details are hosted in a PDF file (rather than online, as is customary today), linked as “How to Download” but titled “How to Purchase”
    • Most “hyperlinks” in the PDF are non-functional
    • Confusing user flow to get to additional info – “Learn More” next to “How to Download” leads to a Data Sheet, the footer includes a generic “Datasheets” link that leads to a jumbled mass of overly-whitespaced links to additional documents on everything from “competitive cheatsheets” to “(randomly-selected-)industry-specific discussion” to “list of available test modules”
    • Documents have no common look-and-feel, layout, topic flow or art/branding identity (almost as if they’re generated by individuals who have no central coordination)
    • There are no browseable/downloadable evaluation guides to explain how the product works, how to configure it, what commands to use to integrate it into the various CI pipelines, how to read the output, example scripts to parse and alert on the output – lacking this, I can’t gain confidence that this tool is ready for production usage
    • No running/interrogable sample by which to observe the live behaviour (e.g. an AWS instance running against a series of APIs, whose code is hosted in public GitHub repos)
  • I know the guys at Deja Vu are better than this – their security consulting services are awesome – so I’m mystified why Peach Tech seems the forgotten stepchild.

Edit: found another!

Neuvector is fielding a “continuous container security” commercial tool.  This article is what tipped me off about them, and it happens to mention a couple of non-commercial ideas for container security that are worth checking out as well.

Edit: and an open source tool!

Zed Attack Proxy (ZAProxy), coordinated by OWASP and hosted on GitHub.  Many automatable, scripted capabilities to search for security vulnerabilities in your web applications.
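
Here’s a sketch of the kind of CI integration I keep asking for – it assumes the zap-baseline.py script shipped in the official ZAP Docker image (owasp/zap2docker-stable at the time of writing) and a non-zero exit code when the scan raises alerts; check the current ZAP docs before relying on either assumption:

import subprocess
import sys

# Hypothetical CI step: run ZAP's baseline scan against a staging deployment
# and fail the pipeline if the scan reports warnings/failures (non-zero exit code).
TARGET = "https://staging.example.com"

result = subprocess.run([
    "docker", "run", "--rm", "-t",
    "owasp/zap2docker-stable",          # assumed ZAP image name at the time of writing
    "zap-baseline.py", "-t", TARGET,    # passive baseline scan of the target URL
])

if result.returncode != 0:
    print("ZAP baseline scan reported issues - failing the build")
    sys.exit(result.returncode)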

 

 

The Equifax breach – reckless endangerment of the US citizenry

UN-fucking-believable. I was hoping that this would turn out to be a situation where at the very least, Equifax had built defense-in-depth measures to limit the amount or type of information someone *could* get if an attacker exploited one of the innumerable vulnerabilities that exist on every modern software platform.

Nope – pretty much EVERY piece of sensitive personal data they have on more than half the US adult population was exposed as a result of this attack. Everything that any reasonable check of your identity or financial fitness would use to verify someone is you. Pretty nearly all the info a malicious individual would use to impersonate you, to obtain loans in your name, or file a tax return to get a refund, or screw with your life in many other highly-damaging ways.

Some choice quotes from https://arstechnica.com/information-technology/2017/09/why-the-equifax-breach-is-very-possibly-the-worst-leak-of-personal-info-ever/:

By providing full names, Social Security numbers, birth dates, addresses, and, in some cases, driver license numbers, it provided most of the information banks, insurance companies, and other businesses use to confirm consumers are who they claim to be.

That means well more than half of all US residents who rely the most on bank loans and credit cards are now at a significantly higher risk of fraud and will remain so for years to come.

Meanwhile, in the hours immediately following the breach disclosure, the main Equifax website was displaying debug codes, which for security reasons, is something that should never happen on any production server, especially one that is a server or two away from so much sensitive data. A mistake this serious does little to instill confidence company engineers have hardened the site against future devastating attacks [editorializing: …or even that the company’s engineers have half a clue what they can do to prevent the rest of the US’ personal data from leaking – if there’s even any left in their databases to find].

The management and executives of this company should not only resign, but be brought up on charges of criminal, reckless negligence on behalf of all Americans. They (along with the other two credit reporting agencies, and dozens of grey-market data hoarders) are stewards and power brokers over our lives, central/single points of failure in an economy that is nearly all digital, and which transacts so fragilely on such thin premises of trust and explicit, positive assertions of identity.

We should not only be scared of how terribly their negligence endangers our lives for the rest of our lives, but be rationally and irrationally angry that the lobbyists and oligarchs have set up a system where these careless morons can and will walk away with a slap on the wrists, a cost-of-doing-business fine and strictures, for foreseeably ruining millions of lives and livelihoods.

What to do

I froze my credit after one of the big health insurer breaches a while back, and so far my life hasn’t been significantly inconvenienced – but the very fact that we each are forced to opt in to this measure, and insult-to-injury forced to pay for the privilege of preventing something none of us asked for, is just downright Mafia tactics.

You should probably freeze your credit too, ASAP, because even if you weren’t affected this time, inevitably you were in the past or will be in the future. This brittle negligence and lack of accountability is what the US economy runs on.

ImportError: No module named ‘rest_framework_swagger’

Summary

Building our Django app locally (i.e. no Docker container wrapping it) works great. Building the same app in Docker fails. Hint: make sure you know which requirements.txt file you’re using to build the app.  (And get familiar with the -f parameter for Docker commands.)

Problem

When I first started building the Docker container, I was getting this ImportError after the container successfully built:

ImportError: No module named 'rest_framework_swagger'

Research

The only half-useful hit on StackOverflow was this one, and it didn’t seem like it explicitly addressed my issue in Docker:

http://stackoverflow.com/questions/27369314/django-rest-framework-swagger-ui-importerror-no-module-named-rest-framework

…And The Lightning Bolt Struck

However, with enough time and desperation I finally understood that that article wasn’t wrong either.  I wasn’t using the /requirements.txt that contained all the dependencies – I was using the incomplete/abandoned /budget_proj/requirements.txt file, which lacked a key dependency.

Aside

I wasn’t watching the results of pip install closely enough – and when running docker-compose up --build multiple times, the layer of interest won’t rebuild if there are no changes to that layer’s inputs. (Plus this is a case where there’s no error message thrown, just one or two fewer pip installs – and who notices that until they’ve spent the better part of two days on the problem?)

Detailed Diagnostics

If you look closely at our project from that time, you’ll notice there are actually two copies of requirements.txt – one at the repo root and one in the /budget_proj/ folder.

Developers who are just testing Django locally will simply launch pip install -r requirements.txt from the root directory of their clone of the repo.  This is fine and good.  This is the result of the pip install -r requirements.txt when using the expected file:

$ pip install -r requirements.txt 
Collecting appdirs==1.4.0 (from -r requirements.txt (line 1))
 Using cached appdirs-1.4.0-py2.py3-none-any.whl
Collecting Django==1.10.5 (from -r requirements.txt (line 2))
 Using cached Django-1.10.5-py2.py3-none-any.whl
Collecting django-filter==1.0.1 (from -r requirements.txt (line 3))
 Using cached django_filter-1.0.1-py2.py3-none-any.whl
Collecting django-rest-swagger==2.1.1 (from -r requirements.txt (line 4))
 Using cached django_rest_swagger-2.1.1-py2.py3-none-any.whl
Collecting djangorestframework==3.5.4 (from -r requirements.txt (line 5))
 Using cached djangorestframework-3.5.4-py2.py3-none-any.whl
Requirement already satisfied: packaging==16.8 in ./budget_venv/lib/python3.5/site-packages (from -r requirements.txt (line 6))
Collecting psycopg2==2.7 (from -r requirements.txt (line 7))
 Using cached psycopg2-2.7-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting pyparsing==2.1.10 (from -r requirements.txt (line 8))
 Using cached pyparsing-2.1.10-py2.py3-none-any.whl
Collecting requests==2.13.0 (from -r requirements.txt (line 9))
 Using cached requests-2.13.0-py2.py3-none-any.whl
Requirement already satisfied: six==1.10.0 in ./budget_venv/lib/python3.5/site-packages (from -r requirements.txt (line 10))
Collecting gunicorn (from -r requirements.txt (line 12))
 Using cached gunicorn-19.7.0-py2.py3-none-any.whl
Collecting openapi-codec>=1.2.1 (from django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Collecting coreapi>=2.1.1 (from django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Collecting simplejson (from django-rest-swagger==2.1.1->-r requirements.txt (line 4))
 Using cached simplejson-3.10.0-cp35-cp35m-macosx_10_11_x86_64.whl
Collecting uritemplate (from coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
 Using cached uritemplate-3.0.0-py2.py3-none-any.whl
Collecting coreschema (from coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Collecting itypes (from coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Collecting jinja2 (from coreschema->coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
 Using cached Jinja2-2.9.5-py2.py3-none-any.whl
Collecting MarkupSafe>=0.23 (from jinja2->coreschema->coreapi>=2.1.1->django-rest-swagger==2.1.1->-r requirements.txt (line 4))
Installing collected packages: appdirs, Django, django-filter, uritemplate, requests, MarkupSafe, jinja2, coreschema, itypes, coreapi, openapi-codec, simplejson, djangorestframework, django-rest-swagger, psycopg2, pyparsing, gunicorn
 Found existing installation: appdirs 1.4.3
 Uninstalling appdirs-1.4.3:
 Successfully uninstalled appdirs-1.4.3
 Found existing installation: pyparsing 2.2.0
 Uninstalling pyparsing-2.2.0:
 Successfully uninstalled pyparsing-2.2.0
Successfully installed Django-1.10.5 MarkupSafe-1.0 appdirs-1.4.0 coreapi-2.3.0 coreschema-0.0.4 django-filter-1.0.1 django-rest-swagger-2.1.1 djangorestframework-3.5.4 gunicorn-19.7.0 itypes-1.1.0 jinja2-2.9.5 openapi-codec-1.3.1 psycopg2-2.7 pyparsing-2.1.10 requests-2.13.0 simplejson-3.10.0 uritemplate-3.0.0

However, because our Django application (and the related Docker files) is contained in a subdirectory off the repo root (i.e. in the /budget_proj/ folder) – and because I was an idiot at the time and didn’t know about the -f parameter for docker-compose, I was convinced I had to run docker-compose from the same directory as docker-compose.yml – docker-compose didn’t have access to files in the parent directory of wherever it was launched.  Apparently Docker effectively “chroots” its commands, so it doesn’t have access to ../bin/requirements.txt, for example.

So when docker-compose launched pip install -r requirements.txt, it could only access that one, and gave us this result instead:

Step 12/12 : WORKDIR /code
 ---> 8626fa515a0a
Removing intermediate container 05badf699f66
Successfully built 8626fa515a0a
Recreating budgetproj_budget-service_1
Attaching to budgetproj_budget-service_1
web_1 | Running docker-entrypoint.sh...
web_1 | [2017-03-16 00:31:34 +0000] [5] [INFO] Starting gunicorn 19.7.0
web_1 | [2017-03-16 00:31:34 +0000] [5] [INFO] Listening at: http://0.0.0.0:8000 (5)
web_1 | [2017-03-16 00:31:34 +0000] [5] [INFO] Using worker: sync
web_1 | [2017-03-16 00:31:34 +0000] [8] [INFO] Booting worker with pid: 8
web_1 | [2017-03-16 00:31:35 +0000] [8] [ERROR] Exception in worker process
web_1 | Traceback (most recent call last):
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 578, in spawn_worker
web_1 | worker.init_process()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 126, in init_process
web_1 | self.load_wsgi()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 135, in load_wsgi
web_1 | self.wsgi = self.app.wsgi()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/app/base.py", line 67, in wsgi
web_1 | self.callable = self.load()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 65, in load
web_1 | return self.load_wsgiapp()
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
web_1 | return util.import_app(self.app_uri)
web_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/util.py", line 376, in import_app
web_1 | __import__(module)
web_1 | File "/code/budget_proj/wsgi.py", line 16, in <module>
web_1 | application = get_wsgi_application()
web_1 | File "/usr/local/lib/python3.5/site-packages/django/core/wsgi.py", line 13, in get_wsgi_application
web_1 | django.setup(set_prefix=False)
web_1 | File "/usr/local/lib/python3.5/site-packages/django/__init__.py", line 27, in setup
web_1 | apps.populate(settings.INSTALLED_APPS)
web_1 | File "/usr/local/lib/python3.5/site-packages/django/apps/registry.py", line 85, in populate
web_1 | app_config = AppConfig.create(entry)
web_1 | File "/usr/local/lib/python3.5/site-packages/django/apps/config.py", line 90, in create
web_1 | module = import_module(entry)
web_1 | File "/usr/local/lib/python3.5/importlib/__init__.py", line 126, in import_module
web_1 | return _bootstrap._gcd_import(name[level:], package, level)
web_1 | ImportError: No module named 'rest_framework_swagger'
web_1 | [2017-03-16 00:31:35 +0000] [8] [INFO] Worker exiting (pid: 8)
web_1 | [2017-03-16 00:31:35 +0000] [5] [INFO] Shutting down: Master
web_1 | [2017-03-16 00:31:35 +0000] [5] [INFO] Reason: Worker failed to boot.
budgetproj_web_1 exited with code 3

Coda

It has been pointed out that not only is it redundant for the project to have two requirements.txt files (I agree, and when we find the poor soul who inadvertently added the second file, they’ll be sacked…from our volunteer project ;)…

…but also that if we’re encapsulating our project’s core application in a subdirectory (called budget_proj), then logically that is where the “legit” requirements.txt file belongs – not at the project’s root, just because that’s where you normally find requirements.txt in a repo.

Notes to self: merging my fork with upstream

It’s supposed to be as natural as breathing, right?  See a neat repository on GitHub, decide you want to use the code and make some minor changes to it, right?  So you fork the sucker, commit some changes, maybe push a PR back to the original repo?

Then, you want to keep your repo around – I dunno, maybe it’s for vanity, or maybe you’re continuing to make changes or use the project (and maybe, just maybe, you’ll find yourself wanting to push another PR in the future?).  Or maybe messages like this just bother your OCD:

github-branch-is-xx-commits-behind

Eventually, most developers will run into a situation in which they wish to re-sync their forked version of a project with the updates that have been made in “upstream”.

Should be dead easy, yes?  People are doing this all the time, yes?  Well, crap.  If that’s the case, then I’m an idiot, because I’d tried this a half-dozen times and never before arrived at the beautiful message “This branch is even with…”.  So I figured I’d write it out (talk to the duck), and in so doing, stumbled on the solution.

GitHub help is supposed to help, e.g. Syncing a fork.  Which depends on Configuring a remote for a fork, and which is followed by Pushing to a remote.

Which for a foreign repo named e.g. “hackers/hackit” means the following stream of commands (after I’ve Forked the repo in GitHub.com and git clone‘d the repo on my local machine):

git remote add upstream git@github.com:hackers/hackit.git
git fetch upstream
git checkout master
git merge upstream/master

That last command will often result in a bunch of conflicts, if you’ve made any changes, e.g.:

git merge upstream/master
Auto-merging package.json
CONFLICT (content): Merge conflict in package.json
Auto-merging README.md
Auto-merging .travis.yml
CONFLICT (content): Merge conflict in .travis.yml
Auto-merging .babelrc
Automatic merge failed; fix conflicts and then commit the result.

At this point I temporarily abandon the command line and dive into my favourite editor (Visual Studio Code with a handful of extensions) to resolve the conflicting files.

Once I’d merged changes from both sources (mine and upstream), then it was a simple matter of the usual commands:

git add .
git commit -m "merged changes from upstream"
git push

And the result is…

github-branch-is-xx-commits-ahead

(No it wasn’t quite the “even” paradise, but I’ll take it.)

Aside

I somehow got myself into a state where I couldn’t get the normal commands to work.  For example, when I ran git push origin master, I got nowhere:

git push origin master
fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

Or git push:

git push
ERROR: Permission to hackers/hackit.git denied to MikeTheCanuck.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

Then when I added upstream…:

git remote add upstream git@github.com:hackers/hackit.git

…and ran git remote -v…:

git remote -v
upstream git@github.com:hackers/hackit.git (fetch)
upstream git@github.com:hackers/hackit.git (push)

…it appears I no longer had a reference to origin. (No idea how that happened, but hopefully these notes will help me not go astray again.)  Adding back the reference to origin seemed the most likely solution, but I didn’t get the kind of results I wanted:

git remote add origin git@github.com:mikethecanuck/hackit.git
git remote -v
origin git@github.com:mikethecanuck/hackit.git (fetch)
origin git@github.com:mikethecanuck/hackit.git (push)
upstream git@github.com:hackers/hackit.git (fetch)
upstream git@github.com:hackers/hackit.git (push)
git push origin master
To github.com:mikethecanuck/hackit.git
 ! [rejected]        master -> master (fetch first)
error: failed to push some refs to 'git@github.com:mikethecanuck/hackit.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

 

And when I pushed with no params, I went right back to the starting place:

git push
ERROR: Permission to hackers/hackit.git denied to MikeTheCanuck.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

(I finally rm -rf‘d my forked repo, cloned it again, and started over – that’s how I got to the first part of the article.)

How Do I Know What Success Looks Like?

I was asked recently what I do to ensure my team knows what success looks like.  I generally start with a clear definition of done, then factor usage and satisfaction into my evaluation of success-via-customers.

Evaluation Schema

Having a clear idea of what “done” looks like means having crisp answers to questions like:

  • Who am I building for?
    • Building for “everyone” usually means it doesn’t work well for anyone
  • What problem is it fixing for them?
    • I normally evaluate problems-to-solve based on the new actions or decisions the user can take *with* the solution that they can’t take *without* it
  • Does this deliver more business value than other work we’re considering?
    • Delivering value we can believe in is great, and obviously we ought to have a sense that this has higher value than the competing items on our backlog

What About The Rest?

My backlog of “ideas” is a place where I often leave things to bake.  Until I have a clear picture in my mind who will benefit from this (and just as importantly, who will not), and until I can articulate how this makes the user’s life measurably better, I won’t pull an idea into the near-term roadmap let alone start breaking it down for iteration prioritization.

In my experience there are lots of great ideas people have that they’ll bring to whoever they believe is the authority for “getting shit into the product”.  Engineers, sales, customers – all have ideas they think should get done.  One time my Principal Engineer spent an hour talking me through a hyper-normalized data model enhancement for my product.  Another time, I heard loudly from many customers that they wanted us to support their use of MongoDB with a specific development platform.

I thanked them for their feedback, and I earnestly spent time thinking about the implications – how do I know there’s a clear value prop for this work?

  • Is there one specific user role/usage model that this obviously supports?
  • Would it make users’ lives demonstrably better in accomplishing their business goals & workflows with the product as they currently use it?
  • Would the engineering effort support/complement other changes that we were planning to make?
  • Was this a dealbreaker for any user/customer, and not merely an annoyance or a “that’s something we *should* do”?
  • Is this something that addresses a gap/need right now – not just “good engineering that should become useful in the future”?  (There’s lots of cool things that would be fun to work on – one time I sat through a day-long engineering wish list session – but we’re lucky if we can carve out a minor portion of the team’s capacity away from the things that will help right now.)

If I don’t get at least a flash of sweat and “heat” that this is worth pursuing (I didn’t with the examples mentioned), then these things go on the backlog and they wait.  Usually the important items will come back up, again and again.  (Sometimes the unimportant things too.)  When they resurface, I test them against product strategy, currently-prioritized (and sized) roadmap and our prioritization scoring model, and I look for evidence that shows me this new idea beats something we’re already planning on doing.

If I have a strong impression that I can say “yes” to some or all of these, then it also usually comes along with a number of assumptions I’m willing to test, and effort I’m willing to put in to articulate the results this needs to deliver [usually in a phased approach].

Delivery

At that point we switch into execution and refinement mode – while we’ve already had some roughing-out discussions with engineering and design, this is where backlog grooming hammers out the questions and unknowns that bring us to a state where (a) the delivery team is confident what they’re meant to create and (b) estimates fall within a narrow range of guesses [i.e. we’re not hearing “could take a day, could take a week” – that’s a code smell].

Along the way I’m always emphasizing what result the user wants to see – because shit happens, surprises arise, priorities shift, the delivery team needs a solid defender of the result we’re going to deliver for the customer.  That doesn’t mean don’t flex on the details, or don’t change priorities as market conditions change, but it does mean providing a consistent voice that shines through the clutter and confusion of all the details, questions and opinions that inevitably arise as the feature/enhancement/story gets closer to delivery.

It also means making sure that your “voice of the customer” is actually informed by the customer. So as I’m developing the definition of Done, mockups, prototypes and alpha/beta versions, I’ve made a point of taking the opportunity where it exists to pull in a customer or three for a usability test, or a customer proxy (TSE, consultant, success advocate), to give me their feedback, reaction and thinking in response to whatever deliverables we have available.

The most important part of putting in this effort to listen, though, is learning and adapting to the feedback.  It doesn’t mean rip-sawing in response to any contrary input, but it does mean absorbing it and making sure you’re not being pig-headed about the up-front ideas you generated that are more than likely wrong in small or big ways.  One of my colleagues has articulated this as Presumptive Design, whereby your up-front presumptions are going to be wrong, and the best thing you can do is to put those ideas in front of customers, users, proxies as fast and frequently as possible to find out how wrong you are.

Evaluating Success

Up front and along the way, I develop a sense of what success will look like when it’s out there, and that usually takes the form of quantity and quality – usage of the feature, and satisfaction with the feature.  Getting instrumentation of the feature in place is a brilliant but low-fidelity way of understanding whether it was deemed useful – if numbers and ratios are high in the first week and then steadily drop off the longer folks use it, that’s a signal to investigate more deeply.  The user satisfaction side – post-hoc surveys, customer calls to get a sense of NPS-like confidence and “recommendability” – is a higher-fidelity means of validating how it’s actually impacting real humans.
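
A toy sketch of the quantity side – the event data and week-over-week check are hypothetical, just to show the shape of the signal I’m looking for:

from collections import defaultdict

# Hypothetical instrumentation events for one feature: (ISO week, user id) pairs
events = [
    ("2019-W01", "u1"), ("2019-W01", "u2"), ("2019-W01", "u3"),
    ("2019-W02", "u1"), ("2019-W02", "u2"),
    ("2019-W03", "u1"),
]

# Weekly active users of the feature – a steady drop-off after the first week
# is the signal to dig deeper with the higher-fidelity (satisfaction) methods.
weekly_active = defaultdict(set)
for week, user in events:
    weekly_active[week].add(user)

for week in sorted(weekly_active):
    print(week, len(weekly_active[week]))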