Agile Community

All About Agile | Agile Development Made Easy!

Hi all. One area I have always struggled with over the years is metrics. Since doing agile development, Velocity has given me an indicative measure of output, but it's not really possible to use as a benchmark between teams, so it gives me no real sense of whether the output is good or bad, just whether it's better or worse than before, which admittedly is still very useful. We've also been using a metric we call 'reliability'. This is the number of story points delivered as a percentage of the number of points committed to in sprint planning. This gives us a reasonable gauge of how good the team is at delivering on their commitments. This is a major factor in stakeholder satisfaction because it measures how well we meet expectations, at least in terms of volume of work delivered. However it doesn't help at all to measure quality, either in terms of how well the software works or whether it's what the users really expected. This kind of qualitative measure has always been very tricky in software development I think and I've never quite managed to crack it.
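As a rough illustration of the 'reliability' figure described above, here is a minimal Python sketch. The function name and the sample numbers are made up for the example and are not taken from any real tool or team.

```python
def reliability(delivered_points: float, committed_points: float) -> float:
    """Story points delivered as a percentage of points committed in sprint planning."""
    if committed_points <= 0:
        # Hypothetical guard: a sprint with nothing committed has no meaningful score.
        raise ValueError("committed_points must be greater than zero")
    return 100.0 * delivered_points / committed_points


# Illustrative sprint: 30 points committed, 27 delivered.
print(reliability(27, 30))
```

A score of 100 means the team delivered exactly what it committed to; it says nothing about the quality of what was delivered.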

I would be very interested to understand what other people have done with metrics and what you have found to be effective or ineffective, whether it relates to quality or any other aspect of software development?

Kelly.


Replies to This Discussion

Hi Mark. Me too. We were calling it 'predictability' at first but somehow 'reliability' just seemed to stick. I think the term reliability has some negative connotations, which is why I initially preferred predictability. The term predictability is also more in line with traditional project management principles. On the other hand, calling it reliability might be more motivating, as people don't want to be seen as unreliable, and many people (myself included) don't really want to be known as predictable!

About the idea of estimating tasks in the second part of Sprint Planning being waste, in some circumstances I would agree with you. You should only really plan what your Velocity suggests you can usually deliver. If that doesn't fill the available time when it comes to estimating the hours, it's because in reality the available time is really needed to deal with unidentified tasks, support coming in sideways, meetings, interruptions, problems, and under-estimated stories, for example. In which case estimating the tasks in hours doesn't really serve much purpose. I do also agree with you though, that discussing the stories, clarifying requirements, breaking down the tasks to invoke discussion about design, are in themselves valuable. It's just assigning hours to tasks that may be counter-productive in practice.

I blogged about this a little while ago:

Burndown User Stories Rather Than Tasks

Kelly.
The value of tasking is in learning and communication, not prediction. New teams use tasking to be more realistic about what the work for a story really entails. That said, a mature team often does not need to task. When teams are having trouble with over-committing, I find that tasking can help eliminate the problem. Teams that understand their architecture and the problem can usually task very quickly.
I agree that teams that take too much time tasking can often fool themselves regarding how much they can do, as you noted. Story velocity is a better predictor of what they will be able to finish.
(I think of tasks as what individuals do, stories as what teams do.)
Thanks for your reply Jeff. I would agree with that. In my blog post, I acknowledge there is definitely value in the discussion. I have thought about this issue quite a bit over the last few years. This is the one part of Scrum that I've never quite reconciled in my mind. The second part of Sprint Planning that entails estimating the tasks in hours seems to contradict all the good things about Velocity and the self-correcting value of estimating in points.

I think my view is finally crystallising on this. There is value in discussing the user stories to clarify requirements (Sprint Planning part 1), of course, and there is value in breaking the stories into tasks (Sprint Planning part 2), because both of these discussions reveal important information about the user stories and the possible or intended design. However, there is little value in estimating the tasks in hours, as this can't realistically be used to predict what can be delivered and it just creates the urge to reconcile the available time with the planned work, which may (and often does in my experience) contradict what your Velocity tells you is actually realistic.

Kelly.
A task burndown, which is initially derived from tasks and estimates, is a useful tool for the team to use during the sprint. It helps them recognize what they are actually doing. Task burndowns give the team a way to recognize, say, that they need to focus on fewer stories in order to complete more than they otherwise might.
I had a team member who consistently did not 'have enough time' to complete her part of the work. By tasking she recognized that she was spending over half her time offering to do things for other people. She discovered this by noticing that during the daily standup she could not report on tasks that were part of stories the team had committed to.
One last point is that tasking is not to be used to compare against actuals. It is a quick form of determining how much work is left to do in the sprint. The actual work of tasking and updating the time remaining should not take more than an hour or two in the Sprint Planning meeting and 5 minutes per team member during the sprint.
Kelly,

You hit a good point: measuring productivity alone isn't worth the effort, because a team can always hit the desired velocity by dropping some quality. I think there are two parts to quality, which you touch upon: quality of the delivered software and fitness for use of that same software. The team owns the quality of what is delivered, and measurements you can use are the number of bugs discovered after go-live, or after the iteration in which it was delivered. Build problems and the number of open bugs are further examples; all these metrics help a team understand how well they're doing. Fitness for use is owned by the product owner: a team can build flawless software that the customer doesn't like. Measures are sales figures, complaints from users, ROI, etc.

Makes sense?

Hubert
Thanks Hubert, makes sense. You have made some great points there which I think have added some real clarity. Like the balanced scorecard concept, you always need to have counter-measures to alleviate the problem of people 'playing the system' and delivering one metric at the expense of another aspect that isn't measured. I think that's why I've never quite cracked the metrics problem as well as I'd like. It's a complex issue.

Kelly.
It's a complex issue, but a complex answer often doesn't work. Keep it simple: find two or three metrics that help the team. And recognize whether the team wants to take this into their own hands; don't manage them into doing so. If you have to manage your team to work on quality then you have a motivation problem at hand, which isn't solved by more metrics.

--Hubert
Actually, software development is complex enough that you can't prevent dysfunction by adding another metric. The system of metrics will always be incomplete, and using that system as a primary measure of success will therefore lead to dysfunction.

The only way to prevent dysfunction is to not use metrics in a motivational way, but just as data points that are used to start, or to contribute to, a discussion.

"Measuring and Managing Performance in Organizations" is a good book on the topic.
Mark,

That is where agile, with its short iterations, comes into play. It gives one the opportunity to build what the product owner needs. Fast releases allow fast feedback from the market / customer, which is another feedback loop. Also, the team (often) knows a lot about how the product can be optimized, so they are not without responsibilities here. But ultimately the product owner makes the hard choices as to where the scarce resources are spent.

--Hubert
Having early stories to flesh out the use cases is a common approach. Not to the level of detail that some use case folks do, but to mock up the IF and clarify the flow. This also helps identify the steps/stages and allows for some naming to occur. This improves communication in the future.
I use a metric called Velocity Variation. It looks at the difference in actual vs planned story points. The exact formula would be
(Actual story points - Planned story points) / Planned story points

Actual story points are the story points for user stories completed as per the “Done” criteria. Planned story points are the number of story points planned for the sprint at the beginning of the sprint in the sprint planning meeting. This data is captured for each sprint. The value could be positive or negative. Negative values are an area of concern; however, high positive values could also be a concern, since they could be the result of under-estimation rather than a productivity improvement.
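To make the arithmetic concrete, here is a small Python sketch of the formula. The function name and the sprint figures are invented purely for illustration.

```python
def velocity_variation(actual_points: float, planned_points: float) -> float:
    """(Actual story points - Planned story points) / Planned story points."""
    if planned_points <= 0:
        # Hypothetical guard: the ratio is undefined when nothing was planned.
        raise ValueError("planned_points must be greater than zero")
    return (actual_points - planned_points) / planned_points


# Made-up sprint data as (actual, planned) pairs.
sprints = [(27, 30), (30, 30), (36, 30)]
for actual, planned in sprints:
    variation = velocity_variation(actual, planned)
    print(f"actual={actual} planned={planned} variation={variation:+.0%}")
```

Zero means the sprint landed exactly on plan; the sign shows whether the team fell short of the plan or exceeded it.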

-Sowmya
Hi Sowmya. I like this metric. It's effectively a different flavour of our 'reliability' or predictability metric. In your case, zero is perfect, in ours 100% is perfect. A range of + or - 10% might be considered reasonable; as you say too high is as bad as too low and may also represent under-committing and a lack of ambition. In other words, it's easy to meet expectations if you set them low enough! Thanks for sharing...

Kelly.


© 2012   Created by Kelly Waters.
