Categorizing Products the STYLIGHT Way


Machine learning is quite a hot topic nowadays, used for a variety of purposes, from recommending products and movies to analyzing business data. Here at STYLIGHT, we use machine learning for a completely different purpose.
Our partner shops send us thousands of products every day, and we sort them into our own fashion categories, ranging from very broad ones, e.g. clothing, to quite detailed ones, e.g. mini skirts. With almost 800 classes, this task is very time-consuming when done manually; therefore, we have implemented a machine learning system which lets us predict the fashion categories of products in a fast, scalable and automated way.

System Overview

A classical machine learning architecture consists of the steps shown in Figure 1. For each block, an individually specialized algorithm can be chosen depending on which type of data is available (images, documents, numbers, etc.).

Machine learning @STYLIGHT

Figure 1: Classical machine learning architecture

Feature Extraction

This is the process of gathering features which might help to classify the given product. For example, if one wanted to predict the size of a T-shirt, one would collect its name, brand, width, height, color and price. These attributes form a feature vector, as can be seen in Figure 2. However, some of this information might be completely unnecessary, and getting rid of it is part of the next step.

[name, brand, width, height, color, price]ᵀ

Figure 2: Feature vector of a T-shirt
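Such a feature vector can be built with a few lines of code. The sketch below uses hypothetical attribute values and a made-up `extract_features` helper; it is only meant to illustrate the idea, not our actual pipeline:

```python
def extract_features(product):
    """Turn a product record into a flat feature vector,
    in the order shown in Figure 2."""
    return [
        product["name"],
        product["brand"],
        product["width"],
        product["height"],
        product["color"],
        product["price"],
    ]

# A hypothetical T-shirt record with made-up values.
tshirt = {"name": "Basic Tee", "brand": "AcmeWear", "width": 50,
          "height": 70, "color": "blue", "price": 19.99}
features = extract_features(tshirt)
print(features)  # ['Basic Tee', 'AcmeWear', 50, 70, 'blue', 19.99]
```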


Feature Selection

As stated above, in this step only the important features are selected, meaning features that carry no information are discarded. Imagine we want to predict the sizes of a T-shirt (S, M, L), and let us assume the size depends only on the width and length of the shirt. All additional information, like the price or brand, is then completely unnecessary. Fewer features usually lead to less noise, less memory usage and faster classification. However, the tricky part is finding these relevant features.
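One of the simplest selection criteria is variance: a feature that has the same value for every sample cannot help distinguish classes. The sketch below (with made-up measurements, and a price column that is identical everywhere) drops zero-variance columns; libraries like scikit-learn offer the same idea as `VarianceThreshold`:

```python
def select_features(samples, threshold=0.0):
    """Keep only feature columns whose variance exceeds `threshold`."""
    n = len(samples)
    keep = []
    for col in range(len(samples[0])):
        values = [s[col] for s in samples]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        if var > threshold:
            keep.append(col)
    return [[s[col] for col in keep] for s in samples]

# Columns: width, length, price. The price is identical everywhere,
# so it carries no information and gets dropped.
shirts = [[46, 66, 20], [50, 70, 20], [54, 74, 20]]
print(select_features(shirts))  # [[46, 66], [50, 70], [54, 74]]
```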


Classification

Well-known classifiers which are often used include k-NN, Naive Bayes, Decision Trees, Random Forests, Support Vector Machines and Neural Networks, just to name a few. Each of these classifiers has its own strengths and weaknesses. Some have long training phases, some cannot deal with high-dimensional feature spaces and others make assumptions about the a priori distribution of the data points. The selection of the classifier is a crucial part of the machine learning pipeline and can greatly influence the final outcome. Therefore, one should think carefully before choosing one.
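To make one of these concrete, here is a minimal k-NN classifier for the T-shirt size example, with made-up measurements: a new shirt gets the majority label of its k nearest training points. In practice one would use a library implementation (e.g. scikit-learn's `KNeighborsClassifier`); this sketch only shows the principle:

```python
def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points
    (squared Euclidean distance)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical width/length measurements with known sizes.
X = [[44, 64], [46, 66], [50, 70], [52, 72], [56, 76], [58, 78]]
y = ["S", "S", "M", "M", "L", "L"]
print(knn_predict(X, y, [51, 71]))  # M
```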



Figure 3: Example of a classifier


While the basic principles of machine learning tasks shown above are quite straightforward, the devil is, as always, in the details.

For example, the outcome of a machine learning experiment is hardly predictable, meaning you have to run an actual experiment in order to know how well your system will perform. Designing and conducting such tests is very time-consuming, but absolutely necessary for proper model selection and cross-validation. It is especially important for making assumptions about the generalizability of the machine learning system.

Here at STYLIGHT, we perform a nested cross-validation to find the best model and estimate its likely performance on new, unseen data. We tried different splits for training and testing sets and found that a classical 80/20 ratio works best for us. This number can vary depending on the data, but it can serve as a rule of thumb.
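The 80/20 split itself is simple to sketch. The snippet below is not our production code, just a minimal, seeded shuffle-and-split (libraries like scikit-learn provide the same as `train_test_split`):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle and split data into training and test sets (80/20 by default)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

products = list(range(100))            # stand-ins for product records
train, test = train_test_split(products)
print(len(train), len(test))           # 80 20
```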

As we have an enormous number of products to process, we developed our own fully automated testing and validation environment for generating new models. It is heavily parallelized and has a low memory footprint, which allows us to test new ideas very quickly. In addition, we use confusion matrices, amongst other things, to visualize our results. There we can directly see the effect of the changes we make and identify fashion categories which might be harder to predict than others.
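A confusion matrix simply counts, for every actual category, how the classifier labeled it. The toy example below uses three hypothetical fashion categories; off-diagonal entries (here, skirts and dresses confused with each other) point at categories that are harder to tell apart:

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows: actual category; columns: predicted category."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ["skirt", "skirt", "dress", "dress", "dress", "top"]
predicted = ["skirt", "dress", "dress", "dress", "skirt", "top"]
for row in confusion_matrix(actual, predicted, ["skirt", "dress", "top"]):
    print(row)
```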
Last but not least, our attempts with machine learning are quite promising and we definitely want to do more in this direction. This is the reason why we are continually exploring new ways in which we can apply machine learning to make our daily work more efficient.


The Millennial Woman Codes: STYLIGHTers take on Rails Girls Munich

Rails Girls is a global, non-profit volunteer community which hosts one-day workshops all over the world for women to learn Ruby on Rails, with the aim to “give tools and a community for women to understand technology and to build their ideas.” When we at STYLIGHT found out about the Rails Girls event taking place dahoam in Munich, we immediately felt compelled to get involved, as it goes perfectly hand in hand with STYLIGHT’s vision to support the millennial woman and our values as an engineering company focused on diversity. We therefore became sponsors of this great event, and in addition to supplying customized STYLIGHT “Hallo World!” goodie bags and STYLIGHT cake pops, we supported it as coaches and attendees.


My fellow STYLIGHT techie María and I decided to be coaches at Rails Girls, and we had five ambitious STYLIGHTers from the HR, PR, and QA departments who joined as eager-to-learn participants! Additionally, after finding out about STYLIGHT’s focus on engineering and that we had quite a few women in our IT department, the organizer asked that one of us present at the workshop about our experiences as women in tech.


STYLIGHT girl developers ready to go!


By 9am(!) Saturday morning, all the Rails Girls attendees, from high school girls to women in their thirties and forties, had arrived at the Wayra office in Munich, and with their STYLIGHT goodie bags in hand, munched on some Brezn and installed Ruby on Rails on their laptops.  After introductions and some beginner exploratory exercises, the girls were split into groups and coaches were assigned. Later in the morning, the coaches helped guide the girls through the beginning of the Rails Girls App Tutorial and answered their many insightful questions along the way!  By lunch many of the girls had already made good progress on the tutorial and were beginning to branch out with their own ideas.


STYLIGHT’s Software Developer Julie


During the lightning talks I was able to speak to the girls about my own late transition into computer science halfway through university and how my experiences had ultimately led me to STYLIGHT. I was also able to share with them what a day in the life of a STYLIGHT software developer is like and why at the end of the day, I can proudly say I love my job.

María and I held a short Q&A afterwards, and we were so happy to have had girls and other coaches coming up to us throughout the rest of the day to ask us more about our general experiences in engineering as women and also more specific questions about our jobs at STYLIGHT.  It was awesome to help the girls finish up their apps: many of them even got their apps deployed online– amazing to think girls with no prior programming experience could make their own live website in one short day!


Training session with Maria, Junior Android Developer at STYLIGHT


The conclusion of the workshop involved the presentations of the finished apps and an after party where the girls got to enjoy champagne and STYLIGHT cake pops!


Rails Girls want sweet moments, too!

We at STYLIGHT were so proud to be involved in the success of the Rails Girls Munich workshop. We feel so lucky as employees to have such supportive mentorship at STYLIGHT, and we were humbled to have had the opportunity to give back to the community. It was an inspiration to meet so many amazing and ambitious women and other female developers at companies based in Munich, and we are further encouraged that events like these do so much to inspire women to pursue careers often seen as inaccessible or traditionally male-dominated. Just consider the fact that in the US the percentage of computing jobs held by women has actually fallen over the past 23 years, according to a new study by the American Association of University Women: in 2013, just 26 percent of computing jobs were held by women.

That’s something we at STYLIGHT are definitely working on! And maybe we’ll even see some of the girls who joined the event as developers at STYLIGHT in the next few years!

In the meantime, we still have many open engineering positions.  Know any ladies up for the challenge?


Velocity vs. Cycle time – or ‘How to predict how much work will get done by a given time’

Both velocity and cycle time make predictions based on completed work in the past. This approach is sometimes referred to as ‘yesterday’s weather’. For the remainder of this post I will assume that this work is chunked into user stories. Basing predictions on past events assumes a stable (enough) context. In our case the most important factor here is a stable team.

In either case these user stories can be estimated – usually with story points – or split to roughly the same size and then simply counted. The latter is faster and more outcome-focused. Assuming same-size stories simplifies measurement and calculation in both cases and is just as accurate as using more precise estimates, but it requires more care when creating stories. For the remainder of this post I will assume that user stories are roughly the same size and counted instead of estimated in more detail. If you are a firm believer in estimates, the following also holds true, but requires an extra step of converting story points into numbers of stories.

Velocity measures completed stories per iteration. The unit is work per time, e.g. 4 stories per 2 week iteration.

Cycle time measures the amount of time passed working on a story. The unit is time per work, e.g. 2.5 days per story.

Velocity and cycle time can be (sorta) converted into each other (neglecting the time between finishing one user story and starting the next, and in the simplified case of WIP = 1) and are therefore (kinda) equivalent in the information they carry. They merely represent different points of view on the same thing – how long it takes to get chunks of work done. For a more in-depth discussion of the actual math, check out the comments on this blog post.
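Under those simplifying assumptions (WIP = 1, no idle time between stories), the conversion is just division, and it reproduces the two example numbers above: 4 stories per 2-week iteration (10 working days) corresponds to 2.5 days per story. A sketch:

```python
def cycle_time_from_velocity(stories_per_iteration, iteration_days, wip=1):
    """With WIP = 1 and no gaps between stories, cycle time is the
    iteration length divided by the number of stories finished in it."""
    return wip * iteration_days / stories_per_iteration

def velocity_from_cycle_time(cycle_time_days, iteration_days, wip=1):
    """The inverse: stories completed per iteration."""
    return wip * iteration_days / cycle_time_days

# 4 stories per 2-week iteration (10 working days) ~ 2.5 days per story.
print(cycle_time_from_velocity(4, 10))    # 2.5
print(velocity_from_cycle_time(2.5, 10))  # 4.0
```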


Another planning concept is commonly mangled up in discussions about predictions, but is actually orthogonal to them: iteration planning vs. variable input queues.

Iteration planning fixes the scope for the next iteration, e.g. 4 stories in the next 2 weeks. Velocity is often used in this context, since the unit of velocity, work per time, can be used to support the decision how much to forecast for the next iteration.

A variable input queue provides a steady stream of work. User stories can be re-sorted, added or removed anytime as long as work on them hasn’t started yet. This provides more flexibility compared to iteration planning. Measuring time per user story (cycle time) matches this well.

When using a variable input queue and no iteration planning, iteration goals and the associated commitment also disappear. This can be replaced with commitments to OKRs, which we do quarterly. Three months are long enough to accomplish significant things, while still short enough to have a sense of urgency.

Fixing the length of the input queue provides a mechanism to trigger refilling the queue as well as to prevent planning too far into the future. The optimal queue length depends on story size, team capacity and the predictability of stories. A number of stories that everyone can easily keep in their heads (around 7) prevents overhead in reviewing and re-prioritizing the queue. With a fixed input queue, planning is triggered by an empty slot in the queue rather than at specific intervals. It is also finer-grained – possibly one story at a time. When integrated into the daily standup, an extra planning meeting can be avoided.

In addition to tracking the time a story is worked on (cycle time), one can easily also measure the time from when a story is added to the queue until it is done (lead time). This makes it easier to get an idea of when a story will be finished once it is added. This is very effective if visualized at the end of the queue with something like “Your expected wait time today from this point is between x and y days.”, aka Disneyland wait time.
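A naive version of that Disneyland estimate can be computed from the queue length and historically observed cycle times: a story joining the end of the queue waits roughly queue-length times the fastest to slowest cycle time seen recently (again assuming WIP = 1). The numbers below are made up:

```python
def expected_wait(queue_length, cycle_times_days):
    """Disneyland-style range for a story added at the end of the queue:
    between queue_length * fastest and queue_length * slowest
    recently observed cycle time (WIP = 1 assumed)."""
    return (queue_length * min(cycle_times_days),
            queue_length * max(cycle_times_days))

history = [2.0, 2.5, 3.5, 2.5, 3.0]   # made-up recent cycle times
lo, hi = expected_wait(7, history)
print(f"Expected wait from this point: between {lo} and {hi} days")
```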


Velocity is usually used with iteration planning, and cycle time with variable input queues. This feels natural because measuring work per time is a good match for iterations, just as measuring time per work matches a variable input queue. But there is nothing preventing you from combining them differently if it makes more sense in your context.

Discussions about velocity vs cycle time are a substitute discussion for iteration planning vs variable input queues. This is where the real differences are. Velocity and cycle time are merely different points of view on the same thing. They just happen to match one approach more than the other.


Two lessons the Infrastructure team stole from Ops, Dev and other departments at STYLIGHT

Posts by SysAdmins are usually rants. Or patronize you about someone’s philosophy. This blogpost won’t be either. It definitely could be both, but I will instead admit to shamelessly stealing from my colleagues. In the end, we use some pretty nice technologies and I’d like to give you a glimpse of what we’re doing at STYLIGHT Infra. These are two of the lessons I learned (stole and dubbed to be my own) from my peers.

Lesson 1: Configuration management is not exclusively for servers.

In Ops, our colleagues used Puppet for quite a while. They still use it, like it and will promote it if asked. However, as powerful as Puppet is, it has an equally steep learning curve and took some time until support for Windows arrived. Taking a closer look at the competition, Chef might have set the bar for integrating Windows by collaborating with Microsoft on their Desired State Configuration System for Powershell, but it is still to be considered a static Configuration Management System.

Being more recent players in the game, Ansible and Salt created some buzz as they took what was good from Puppet and Chef and extended it with a remote execution engine. In a scenario that requires you not only to maintain reliable configurations but also to do ad hoc troubleshooting and get fast access to remote machines, this comes in more than handy, and it seemed right away perfectly suited for our environment. For reasons I can’t fully remember anymore, I ignored Ansible at the time entirely, spun up a Salt master and pushed the first Salt minions to my testing OU (GPO + script = wohoo). After the promising testing phase, I pushed it to all clients in the Active Directory – even our (ever increasing) fleet of Macs are now minions.

Right now Salt serves us in two ways:

  1.  Leveraging Salt States, we push custom applications to people and apply base sets of software to certain departments on login. We also set up a couple neat helpers such as an ELK stack or prototypes of potential tools.
  2. The real juice, however, lies in the remote execution engine. Troubleshooting in a Windows world usually involves the GUI, and while it is nice to stay in touch with your colleagues by simply swinging by their desks, sometimes you just don’t want to or can’t afford the time. Powershell is neat for automating repetitive troubleshooting, and the Salt master serves as the central storage for the scripts. When necessary, we simply use Salt to run scripts on client machines to fix problems of all sorts: installing fonts (and every Windows admin can tell you about the pain of doing so remotely!), or installing and uninstalling Office and other applications. Even fine-grained configuration changes can be achieved either out of the box using one of the (great) Salt modules or by writing a Powershell script.

Admittedly, our usage of Salt is not yet as extensive as it could be. The possibilities are there, and we are looking to migrate step by step from GPOs to Salt states where possible.

Lesson 2: Agile, SCRUM and Kanban preserve your sanity.

Given their origins, agile principles and SCRUM are nothing SysAdmins, especially those working on help desk or infrastructure tasks, would naturally see as tailored to their daily work. You would see the developers at that fancy startup across the street sticking Post-it after Post-it to their walls and windows and keep wondering what the hell they are doing.

Well, not for us. The nature of the job dictates the distinction between projects and help desk, and the split is fairly easy: project-related work screams to be tracked and managed using SCRUM (our awesome analog corner board is pictured below), whereas help desk work begs to be handled on a (digital) Kanban board.

awesome analog board

For both we use JIRA. Emails from our colleagues requesting help are automatically converted to tickets in our help desk Kanban, and all projects are handled using some parts of the SCRUM tool palette. To be honest with you, our Agile coach has told me multiple times already that we created some mutated and only remotely related offspring of SCRUM, but it still works for us and has had a couple of positive effects:

  • The split solves the problem of justifying your attention. Should you handle help desk tickets first? Or is project work more important? Help desk tickets can now be tracked using an SLA, and your project work is prioritized, so you should always be in the clear about which fire to extinguish first.
  • Transparency and awe for users. Walk-ins are often baffled by the projects currently displayed on our physical board and show appreciation.
  • Staying in sync with the business. A sprint planning every two weeks helps the product owner and other stakeholders to get their projects reprioritized when needed and still allow the bigger picture to be kept in mind.
  • Getting rid of hazardous processes. Even though the help desk runs as Kanban, the periodic retrospectives are used to reflect on the last two weeks.

As a final note I want to put a strong emphasis on something often simply forgotten by a lot of people working in the industry. As a SysAdmin your colleagues are a constant source of work. They might not understand networks (at all), probably bitch about the WiFi and crash your file server. However, don’t doubt their competence in their field of expertise – learn best practices and processes from your surrounding departments and evaluate them. Chances are the silver bullet is still for you to forge, but they might give you all the resources you will need.

Cross-functionality is a function over time

In my last blog post I described why we formed cross-functional business teams. In this blog post I write about team composition, how it changes over time, and the consequences of that.

When we talk about the composition of cross-functional teams we usually have something like this in mind. The labels usually read developer, tester, designer and UX researcher or something along those lines. For the sake of this article we abstract from the specific role and just call them red, orange, yellow and brown experts.

constant expertise

This visualisation of the team composition is ignoring the fact that the amount of needed expertise to build a product changes over time. In reality it looks something more like this. 

needed expertise varies

During product development there might be a phase where there is a lot of yellow work needed (Feb) while sometime later there is almost none (Apr) and then it’s picking up again. 

There is a certain threshold above which it makes sense to have someone with a specific expertise full-time on the team. If the needed expertise falls below that threshold, that expert won’t be fully utilised. That is OK if it’s just a dip, but it will get boring and frustrating if it persists.


Looking at this, one might argue that the yellow expert should leave the team by mid-February and just be available to the team as needed. The orange expert joins the team around that time. The brown expert would leave sometime later, around mid-March. This makes it effectively impossible to form a stable team that has the chance to gel and perform at its peak effectiveness.

Having T-shaped people on the team helps with this since they can help out in other disciplines than their own. This lowers the threshold in our graphical visualisation.

T-shaped: T-shaped people have two kinds of characteristics, hence the use of the letter “T” to describe them. The vertical stroke of the “T” is a depth of skill that allows them to contribute to the creative process. That can be from any number of different fields: an industrial designer, an architect, a social scientist, a business specialist or a mechanical engineer. The horizontal stroke of the “T” is the disposition for collaboration across disciplines.
IDEO CEO Tim Brown 

lower threshold

Now it makes sense to keep the yellow and brown experts for longer and bring on the orange one sooner.

Having M-shaped people on the team helps even more since they combine two or more needed disciplines. This makes it easier to stay above the threshold.

M-shaped: Building on top of the metaphor of T-shaped persons, M-shaped persons have expertise in two or more fields.

So, if our yellow expert was also an expert in the brown discipline, she would combine the areas below both of these lines resulting in the green line.

combined disciplines

Now, leaving the team because of under-utilisation is out of the picture.

Apart from having people on the team who are valuable in more than one discipline, there are of course other ways to deal with slack than leaving the team. How about some Kaizen? Helping someone else with a stuck task, going to that conference, reading that book, finally doing that refactoring or writing that blog post are just a few of them.

Specialist teams are another option for experts who are needed here and there, but not constantly on one team. In order not to create dependencies and thereby cripple the autonomy of teams, these specialist teams should be enablers and teachers helping the teams. This means ownership stays with the teams, not with the specialists. At STYLIGHT we have, for instance, a platform team.


Forming stable cross-functional teams in the face of changing needs for expertise over time is not trivial. Being aware of this and having strategies for dealing with slack in a specific discipline (T-shaped people, Kaizen) still makes it a viable approach though. For us the advantages of cross-functional teams outweigh these difficulties.

Batch Size DOES matter

How does the batch size of work influence the performance of a (production) process? As a child I wanted to be a mad scientist, so now as an agile coach I conducted a little experiment to find out. Repeat it on your own and post us your findings.

I have seen it at one of the numerous agile gatherings I attended and have ever since repeated it several times.

Experiment Setup

Here is the list of things and people you need for the experiment.

  • a table
  • 20 coins
  • some cardboard
  • 4 people (the workers)
  • either
    • a video camera (a mobile phone would do) or
    • another 4 people (managers) with stopwatches

So here is our setup. I asked Julie (developer) and Marina (office management), Ben (our infrastructure guy), and Anselm (one of our founders) if they wanted to be part of a little game of coin flipping. Thanks again for your 10 minutes!

Our setup

The Coin flipping Rules

There are just three simple rules.

  • Work in batches
  • Flip every coin of the batch
  • Pass on flipped batch to the next worker

An experiment is no experiment without collection of some data. :-)

What to Measure?

Let’s face it, the only thing that counts is value to the customer.

  • first value delivered to the customer: the time between the first coin entering the system and the first coin coming out
  • full value delivered to the customer: the time between the first coin in and the last coin out

And because we are a business we like to know the utilization of our workers.

  • Utilization: the time from the first coin in to the last coin out for each worker

All set?


Just watch the videos for an impression. The hard facts I extracted from the videos come later.

What do you think? Just by looking at it, I was amazed by the tremendous increase in throughput as batch size is reduced!

Hard Facts

Here are the results extracted from the videos by watching them over and over and over again.

It took 48.9 seconds to deliver the single batch of 20 coins. At the other extreme, it only took 24 seconds for the full value (20 coins) to be delivered when working in batches of 2. And the first value (2 coins) was delivered after 6 short seconds!
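These measurements roughly follow from a simple pipeline model: each worker needs batch-size times the per-coin flip time to process a batch, and batches flow through four worker stages. The sketch below assumes a fixed flip time of one second and instantaneous hand-offs (so it won't match the stopwatch numbers exactly), but it reproduces the shape of the results:

```python
def pipeline_times(n_coins, batch_size, n_workers=4, flip_secs=1.0):
    """Model the coin game as a pipeline: each worker spends
    batch_size * flip_secs per batch, and batches pass through
    n_workers stages (hand-offs assumed instantaneous)."""
    stage = batch_size * flip_secs
    n_batches = n_coins // batch_size
    first_value = n_workers * stage                    # first batch out
    full_value = (n_workers + n_batches - 1) * stage   # last batch out
    return first_value, full_value

for batch in (20, 10, 5, 2):
    first, full = pipeline_times(20, batch)
    print(f"batch {batch:2d}: first value after {first:4.0f}s, "
          f"all coins after {full:4.0f}s")
```

With a batch of 20 the model gives 80 seconds for everything; with batches of 2 the first coins appear after 8 seconds and the last after 26, mirroring the measured speed-up.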


How much faster were we compared with the single batch of 20 coins?


That means the first batch of 2 coins was delivered in only 12.3% of the time it took to deliver value in the 20-coin batch. And it took less than half the time (49.1%) to deliver the full value in batches of 2 coins.


Now look at it as value over time. In the 20-coin batch, over the course of 50 seconds you would have 20 coins for 2 seconds (20*2 value over time). At the other extreme you would have the same value over time (even more) after 14 seconds in the batches of 2!


So, knowing that time is money :-), how much more value would you have over time with the different batch sizes? Hold your breath!


OK, with the single batch of 20 we had 20 coins for 2 seconds after 50 seconds. By reducing the batch size to 10 we got 10 coins for 8 seconds and 20 coins for 13 seconds, so the value over time cumulates to 340. By cutting the big batch in half we thus produced a gain in value over time of 850%! The smallest batch size of 2 would bring us to a plus of 1760%. Are you still a friend of big, long-running projects?

It Comes at a Cost


The cost is a higher utilization of the individual worker. But from a company’s perspective this is not really bad, is it? I’d rather concentrate on a single task and get work done than be idle for 75% of my time.

Bottom Line

Smaller batches deliver value to the customer faster. Much faster. As a result there is more value over time for the customer. The utilization is also much better. So smaller batch sizes serve both your company and your customer.

The 3 Commandments of UX Research


Earlier this week, we decided to remove our UX researcher from the Scrum teams she was on. The primary reason was that the main person in charge of research (me!) is working with three teams: the Magazine, Shopping, and Mobile teams. Joining the Scrum meetings of all three teams is time-consuming and unnecessary, unless we are working directly together on a feature.
This will also free up time for improving the quality of the research. Which brings me to the point of this post: Having time to reflect on my experience in research so far, I’ve come up with 3 commandments that we will apply to UX research from now on:

1- We don’t do unplanned research, and we don’t ask unstudied questions. Research is there to help us inform design decisions (for example: Does the text need to be highlighted? Should there be a time stamp?). Therefore, we need to plan the research at least 2 days ahead of time, to design better questions and recruit participants.

2- Research should not answer “Like/hate” questions. People constantly engage in things they claim to hate. Instead, research should focus on gathering useful insights.

3- The researcher does not work in isolation. Research should be conducted in tandem with a designer, developer, or product owner from the team. Because A) the interview partner allows the researcher to focus on what the user is saying (or not saying) and to let the conversation flow naturally, rather than running through a list of questions, half-writing, half-listening; without that focus, the researcher is likely to miss out on some really valuable insights. And B) people who have a hand in collecting the insights will look for opportunities to apply them.

We have a responsibility to the people we’re designing for, and this starts with asking better questions. Then listening, really listening, to those who give us their time and feedback.

Why we formed cross-functional business teams

Late last year we decided to divide our three development teams and have them join the six business teams. Previously we had a hard time coming up with meaningful OKRs for the development teams. We figured only goals with business outcomes would make sense. Having the engineers work together with business experts in the same teams seemed like the logical consequence.

The previous setup made it hard to prioritize the requests coming to the development teams from the different business teams. This got frustrating for pretty much everyone involved — especially those whose requests didn’t make the top of the backlog. Today this issue has all but disappeared.

We also hoped this would make the business teams more independent and thus allow them to move faster while at the same time allowing the engineers to focus on one area of our business model. This feels a whole lot more like the spirit from the founding days.

The former teams included developers (backend, frontend, iOS, Android), designers and UX experts. One might call those in themselves cross-functional development teams. Including business experts seemed again like the logical next step — effectively turning them into cross-functional business teams. From my experience working with teams in several companies, a common progression for cross-functionality tends to be: developers + testers + designers + UX experts + business experts — in that order.

Having gained some experience with the new setup, we realised that the choice between specialist teams and cross-functional teams is not black or white, but rather a tradeoff — as per usual.

Pro cross-functional business teams

  • no handovers
  • faster learning about business
  • more innovation
  • shorter development cycles
  • broadened perspectives through diversity of experiences, expertise and knowledge
  • greater sense of purpose by working on the full (or at least a greater part) of the value stream

Pro specialist teams (aka silos)

  • get work done more efficiently when it can be described precisely and handovers are cheap
  • learning from specialists in same field
  • higher consistency of outcomes within silos
  • easier agreement with people that speak the same lingo

How to remedy the shortcomings of cross-functional feature teams

  • use communities of practice (CoP) for knowledge sharing amongst specialists
  • express yourself in the lingo of the addressed person when talking to a specialist in another field
  • get to a novice level of understanding in the specialist fields of your team mates (“become” T-shaped)

Cross-functional teams rock!
Christina, Online Marketing Manager SEA

By now, nobody questions the general setup of having dedicated developers working with business teams anymore. What we have realized, though, is that the skills needed within a team are not constant but vary. The need for a UX expert is much higher at the beginning of an epic (a bunch of coherent user stories) than it is towards the end; therefore, the concrete team composition is a choice we have to keep making. As we believe that permanent teams outperform short-lived ones, we try to make as few changes in team membership as possible. Having T- or M-shaped people helps with that. But more on that in another blog post.

Image by James Case

Enable Your Teams to Rapidly Ship and Operate Quality Software

How often do your development teams release to production? Who gets the alert in the middle of the night when everything crashes and burns? Do these questions make you uncomfortable, or rather their answers? Or maybe you are already discussing changes to your current deploy process? Because it sucks, right? If you’re honest, it will always suck, because it constantly needs to be adapted to the current business requirements.

Enter the “Platform Team”: a group of build & deploy experts that jumpstart your teams down the road to operational success while providing a safety net. And, no, I’m not referring to a System Administrator with a pager. Instead, I’m suggesting a three-ply construction of automation, containerization and monitoring.

But wouldn’t hiring a full stack developer be cheaper?

The number of job ads for “Full-Stack Developers” is insane. Hunting for them is like searching for unicorns – the closest you’ll get is an interview with a deranged, cone-ornamented horse telling you your search is at an end.

Take the time and effort to teach the folks you have right now. Let them know the most important job is customer satisfaction and this means much more than just writing code.

A platform team is composed of great teachers: test managers who understand ‘release-ready’ test coverage and how to automate it; sysadmins that can point out application bottlenecks and suggest monitoring tools.

How to infect cross-functional teams with the DevOps bug?

The culture of DevOps is founded upon interdisciplinary communication. What was once just a developer and a sysadmin talking about how to deploy and operate an application now includes product owners and UX folks in the conversation. MVPs have replaced monthly releases, and the pace only quickens.

With automation becoming ever easier to employ and virtual servers costing pennies during a workday, now’s the time to ready teams for self-service. Here are the first three Self-Service Mantras:

Self-Reliance (aka “We can build it”)

Work with the developer – your customer – to get their requirements right (spoiler: they’ll change during implementation). Make the build and deploy commands executable in the online chat room; we use hubot. And if you don’t use an online chat, go check out HipChat or Slack. These rooms are essential for improving situational awareness + simplicity + remote deploys.

Attach a minimal test harness to ensure the build really did what you wanted. Is it really the right code version? Does it start correctly and fulfill its primary goals? Jenkins + Saucelabs is an inexpensive 1-2 knockout punch for this!

Finally, get those builds auto-deployed to your testing environment. A simple, post-build hook from Jenkins in your chat room with the corresponding URL is a honeypot for product owners keen on signing off features for release. “Houston, we are ‘GO’ for launch!”
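As a concrete illustration of making deploy commands chat-executable, here is a minimal sketch of a hubot-style command parser. Everything in it is an assumption for illustration: the app names, the environments, and the function name `parse_deploy_command` are hypothetical, not STYLIGHT’s actual setup.

```python
import shlex

# Hypothetical whitelist of deployable apps and target environments.
# In a real chat bot these would come from configuration.
ALLOWED_APPS = {"stylight-web", "stylight-api"}
ALLOWED_ENVS = {"staging", "production"}

def parse_deploy_command(message):
    """Return (app, env) for a valid 'deploy <app> <env>' chat message, else None."""
    parts = shlex.split(message)
    if len(parts) != 3 or parts[0] != "deploy":
        return None
    app, env = parts[1], parts[2]
    if app not in ALLOWED_APPS or env not in ALLOWED_ENVS:
        return None
    return app, env

print(parse_deploy_command("deploy stylight-web staging"))  # ('stylight-web', 'staging')
print(parse_deploy_command("rm -rf /"))                     # None: rejected, not a deploy command
```

The whitelist check is the important part: because the chat room is a shared, visible place, the bot should only ever execute a small, explicit set of commands, and everything else falls through harmlessly.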

Self-Confidence (aka “We know what’s going on”)

  1. It takes:
    • 20-30 builds to work out build bugs (git commit hooks, right branches, etc.)
    • 20-30 more builds to perfect the test harness (those damn outliers)
    • 20-30 more builds to prove auto deployment (yes, this is the right version)
  2. After ~100 builds, the team is confident: they know how it works, why it breaks and how to fix it.

Self-Responsibility (aka “I care about my users”)

Monitoring used to be black magic, but today, with tools like loggly, datadog and newrelic, it’s incredibly easy to get deep insight into your applications and servers. Most of these support direct alerts into your online chat room too, so the entire team can feel the pain they inflict upon their hapless users. Seeing a long list of disturbing, red graphs with the first cup of coffee in the morning turns the daily standup into a Q&A session.

To provide first-level support, cross-functional teams need to be able to dig for root causes. Gathering data from all your monitoring services (you are using more than one, right?), you can quickly narrow down potential problems until you’ve found what’s not working. This takes time and experience, so be patient and always available for pair-troubleshooting!

These mantras close the gap between developers and sysadmins and can be unified with the overarching principle “You build it. You run it.” Empowered developers take the time to build higher quality products, because they’re the ones getting the call during the night.

Scarcity lends Focus

We don’t have enough Platform Engineers to embed in each product team, so we loan them out for new projects as “on-site consultants”. Once they’ve gotten the engineers through the three phases above, they move on to the next construction site, always with an eye on improving and adding to the company’s architectural platform and engineering excellence. This style of teaching full-stack development is much more rewarding and infinitely more successful.

There’s something to be said for time-boxing new technical initiatives. They tend to be streamlined applications that only do what’s needed. Since there’s not as much code or moving pieces involved, they’re also easier to maintain.

As you widen developers’ horizons to the world of automation and virtualization, a few may decide to pick up the standard and become platform engineers themselves. It’s a win-win situation for your employees and your business.

And what about those unsettling questions about your release process? With development teams responsible for building, shipping and operating their applications, it will suck a whole lot less!

Do the Right Things First

As an agile coach I keep telling people that they should do one thing at a time. Stop starting, start finishing is the mantra. But what should be the first thing to start with? What is important? What delivers the most value? Here are two simple techniques that could help.

Two Dimensions: Eisenhower Matrix

The US general and 34th president, Dwight D. Eisenhower, is said to have used the following matrix to decide what to do now, what to do next, what to delegate, and what to drop.

Eisenhower Matrix

To do it like “Ike”, just go through all the tasks/things on your desk and put each of them into one of the four quadrants of the matrix.

So you have two dimensions to categorize your tasks. A very simple and handy method, for starters. But what if the world around you is a bit more complicated? Why not try a more sophisticated but still handy approach?
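The four-quadrant sorting described above can be sketched in a few lines. The task names and the quadrant labels (“do now”, “decide”, “delegate”, “drop”) are illustrative assumptions; the matrix itself only fixes the two axes, urgent and important.

```python
# Minimal sketch of the Eisenhower matrix: each task is tagged as
# urgent/not urgent and important/not important, then bucketed.
def quadrant(urgent, important):
    """Map the two yes/no dimensions onto the four Eisenhower quadrants."""
    if important and urgent:
        return "do now"
    if important:
        return "decide (schedule it)"
    if urgent:
        return "delegate"
    return "drop"

# Hypothetical example tasks: (name, urgent, important)
tasks = [
    ("fix production outage", True, True),
    ("plan next quarter", False, True),
    ("answer routine mail", True, False),
    ("sort old bookmarks", False, False),
]

for name, urgent, important in tasks:
    print(f"{name}: {quadrant(urgent, important)}")
```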

Weighted Shortest Job First

I stumbled across this idea (again) when I had a workshop on the Scaled Agile Framework (SAFe). It all starts with a form like this.

The Form


Write your things to do into the task column. Make a brain dump. That’s it for now. Let’s focus on the next columns. What do they all mean?

User/Business Value

What is the user or business value generated by doing the task? As lean thinkers we always focus on added value to the customer, right?

Time Criticality

Is the task time critical? Do I need to do it now, or can it wait?


Risk Reduction/Opportunity Enablement (RR/OE)

RR is risk reduction, OE means opportunity enablement. By doing the task, do I reduce any risk to myself, the customer, my team, my company? By doing the task, do I open another window of opportunity?

Job Size

No explanation needed, right?

Filling the Columns

You fill in the rest of the form column by column (this is important; otherwise it is more likely that you cheat on yourself). Let’s start with the user or business value and take it as an example.

Identify the task with the least user/business value and assign it a 1. Then rank the other tasks relative to it, using the modified Fibonacci sequence 1, 2, 3, 5, 8, 13, 20, 40, 100. If another task provides five times the user value, give it a 5. Fill the whole column. These values are only estimates; don’t hold a meeting with yourself while filling in the column. Use your gut feeling. Then pick the next column: assign the least time-critical task a 1, and fill all the remaining columns accordingly.

Calculate the WSJF Value

To calculate the WSJF, add the user/business value, the time criticality, and the RR/OE of a task, and divide the sum by its job size.

WSJF = (User or Business Value + Time Criticality + RR or OE) / Job Size

What Should I Do Next?

Pick the task with the highest WSJF and get it done, tick it off your list, continue with the next-highest WSJF and so forth. Yeah, right. As if I could work through my list without new things coming in. When new tasks arrive, you just add them to your list and reassess the whole list. That means deleting all the values and filling in the columns again.
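The scoring and ranking just described fits in a few lines of code. The task names and scores below are made-up examples on the modified Fibonacci scale; only the formula itself comes from the text.

```python
# WSJF = (user/business value + time criticality + RR/OE) / job size
def wsjf(value, time_criticality, rr_oe, job_size):
    return (value + time_criticality + rr_oe) / job_size

# Hypothetical tasks: name -> (value, time criticality, RR/OE, job size),
# each scored on the modified Fibonacci scale 1, 2, 3, 5, 8, 13, 20, 40, 100.
tasks = {
    "redesign checkout": (13, 8, 5, 8),
    "fix broken tracking": (8, 13, 8, 2),
    "new logo": (3, 1, 1, 5),
}

# Highest WSJF first: that is the order to work the list in.
ranked = sorted(tasks, key=lambda name: wsjf(*tasks[name]), reverse=True)
for name in ranked:
    print(name, round(wsjf(*tasks[name]), 2))
```

Note how the small but urgent tracking fix (WSJF 14.5) beats the big checkout redesign (WSJF 3.25): dividing by job size is exactly what pushes short, valuable jobs to the front.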

Is it Really a Good Idea?

Let’s put it to the test. For this little experiment we reduce it to just two dimensions (value and time). Let’s assume we have a small task (pink) that would provide a feature worth $200 every day once rolled out. It would take a day to complete. Now we have a second feature worth $400 a day (red). As we all know, bigger things tend to be more risky and take more coordination, so let’s assume it will take 3 days to complete. Our intuition tells us to tackle the big feature first because it provides more value. Well, see (a.k.a. count) for yourself!

The two ways to tackle both features: shortest first and biggest first.
Money made after both features are delivered.

What is the picture one day after we have put both features into production? How much did we earn?

Big first: $1,000
Short first: $1,200

By acting counterintuitively and doing the small feature first, we made $200 more!
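The arithmetic behind the experiment can be checked directly. The helper name `earnings` and the back-to-back scheduling are assumptions for illustration; the horizon is day 5, one full day after the second feature ships in either order.

```python
# Pink earns $200/day after a 1-day build, red earns $400/day after a
# 3-day build; features are built back to back and each one earns on
# every full day after its completion, up to the horizon.
def earnings(schedule, horizon=5):
    """schedule: list of (build_days, value_per_day) in build order."""
    total, day_done = 0, 0
    for build_days, value_per_day in schedule:
        day_done += build_days
        total += value_per_day * (horizon - day_done)
    return total

print(earnings([(3, 400), (1, 200)]))  # big first:   1000
print(earnings([(1, 200), (3, 400)]))  # short first: 1200
```

The gap comes entirely from the small feature earning for three extra days while the big one is still being built; the two schedules deliver the same features, only in a different order.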
