Find and measure what’s important (to you), only call it a KPI afterwards
Key Performance Indicators are a funny topic. Okay, funny for my sense of humor I guess, especially because people keep coming to me and asking “Jörg, what should we measure, which KPIs do we need?” Normally I can’t give them an answer, and replying “Well, what’s your business objective?” does not seem to do the trick.
Well, KPI is a three-word acronym, so that should make it easy to get to the bottom of this:
- Key: Basically a central thing, as in “Key to the Kingdom”, which also emphasises how important that thing is. It also means it is pretty atomic: there is nothing you can split a key into.
- Performance: Performance tells you how well or badly you did something. It’s basically what you discuss in your performance review meetings with your boss (or, if you are the boss, what you discuss with your employees).
- Indicators: Well, most likely a number, more precisely a number which indicates your key performance in this case.
Okay, all jokes aside, Key Performance Indicators are really unique to your company. Yes, sorry, I know. A KPI should reflect whatever goal or objective you want to achieve. Let’s take a look at some examples. They come from the Security Operations world, well, because that’s what I know best I guess.
Let’s assume you run a SOC and one of your key goals is an SLA with a 1h response time to incidents reported by customers or the like. In this case your indicators might cover areas such as these:
- Mean Time to Respond (MTTR): Always a good start. Basically you measure the time between “Time of Detection” and “Time of Remediation”, so you know that on average it takes you this long to remediate an incident. If it gets very close to 1h, that is a red flag.
- There might be a taxonomy problem, because your definition of “MTTR” could be different. I have seen companies where Response really meant the first response to an inquiry, so as soon as the ticket creation message is sent, the Response Time is complete. In such a case MTTR may mean Mean Time to Remediate for you.
- Mean Time to Detect (MTTD): In QRadar terms, this is basically the time between the first indicator and Offense creation.
All of these are arguably Service Level Agreements rather than Key Performance Indicators. Nonetheless, these SLA data points can contribute to your overall KPIs.
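To make MTTD and MTTR a bit more tangible, here is a minimal Python sketch. It assumes you can export three timestamps per incident. The `Incident` fields, the sample data and the `WARN_AT` threshold (flag at 80% of the SLA) are my own assumptions for illustration and not something QRadar or any other tool hands you under these names. MTTR is computed here as Mean Time to Remediate, so swap in your own definition if "Response" means something else for you.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical field names; map them to whatever your SIEM/ticketing exports.
    first_indicator: datetime  # first sign of the activity
    detected: datetime         # alert / offense created
    remediated: datetime       # incident remediated


def mean_delta(deltas):
    """Average a list of timedeltas."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def mttd(incidents):
    """Mean time from first indicator to detection."""
    return mean_delta([i.detected - i.first_indicator for i in incidents])


def mttr(incidents):
    """Mean time from detection to remediation."""
    return mean_delta([i.remediated - i.detected for i in incidents])


SLA = timedelta(hours=1)
WARN_AT = 0.8  # assumption: raise the red flag at 80% of the SLA

# Made-up sample data, purely for illustration.
incidents = [
    Incident(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 10),
             datetime(2024, 1, 1, 9, 50)),
    Incident(datetime(2024, 1, 1, 13, 0), datetime(2024, 1, 1, 13, 20),
             datetime(2024, 1, 1, 14, 20)),
]

print(mttd(incidents))  # 0:15:00
print(mttr(incidents))  # 0:50:00
red_flag = mttr(incidents) >= SLA * WARN_AT
print(red_flag)         # True, 50 minutes is past the 48-minute warning mark
```

Where exactly you put the warning mark is up to you; the point is only that "very close to 1h" becomes something you can alert on.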
When it comes to “real” KPIs I like to use these instead:
- Incidents per hour/day/week/month/year
It is important to know how busy your SOC is. With this data you can identify trends and optimize the SOC performance. If incidents per day exceed the average daily incident count of the last months, there might be something going on and you may consider scaling the SOC up.
- Number of free analysts
If your SOC isn’t fully automated, and no SOC actually is, the analyst is your most meaningful resource. Knowing how much buffer you have for dealing with incidents is crucial knowledge.
- Incidents per Incident type
This lets you analyse whether your detection rules are good or bad, and gives you an overview of whether you need to spend time on rule tuning. If a certain Incident type makes up 90% of your alerts, you might want to check why.
- Time spent on an incident
Well, same as for Incidents per Incident type. This is a good starting point to prioritize rule tuning and add automation into the process.
- False Positive ratio
Last but not least, how valuable are your detection rules? If an analyst closes 90% of the incidents per day as false positives, that is great from a security point of view, but it really leads to alert fatigue. You might also consider auto-closing such incidents.
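Most of the KPIs in this list boil down to counting and averaging, so here is a rough Python sketch of how they could be computed. It assumes closed incidents are available as simple `(type, minutes_spent, closed_as_false_positive)` tuples. That record layout, the sample data and the `factor` tolerance are hypothetical and only there for illustration, so adapt them to whatever your ticketing system exports.

```python
from collections import Counter
from statistics import mean

# Made-up closed incidents: (type, minutes_spent, closed_as_false_positive)
closed = [
    ("phishing", 30, False),
    ("phishing", 10, True),
    ("brute_force", 5, True),
    ("phishing", 20, True),
    ("malware", 60, False),
]


def share_per_type(incidents):
    """Incidents per incident type, as a fraction of all incidents."""
    counts = Counter(kind for kind, _, _ in incidents)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


def false_positive_ratio(incidents):
    """Fraction of incidents that were closed as false positives."""
    if not incidents:
        return 0.0
    return sum(1 for _, _, fp in incidents if fp) / len(incidents)


def mean_minutes_per_type(incidents):
    """Average time spent on an incident, grouped by incident type."""
    by_type = {}
    for kind, minutes, _ in incidents:
        by_type.setdefault(kind, []).append(minutes)
    return {kind: mean(values) for kind, values in by_type.items()}


def above_baseline(today, previous_days, factor=1.0):
    """True if today's count exceeds the average of previous days.

    `factor` is a hypothetical tolerance, e.g. 1.5 to only react to a 50% jump.
    """
    return today > factor * mean(previous_days)


print(share_per_type(closed))         # {'phishing': 0.6, 'brute_force': 0.2, 'malware': 0.2}
print(false_positive_ratio(closed))   # 0.6
print(mean_minutes_per_type(closed))  # {'phishing': 20, 'brute_force': 5, 'malware': 60}
print(above_baseline(40, [20, 22, 18], factor=1.5))  # True
```

Which thresholds are worth acting on (90% of one type, a 50% jump in volume) depends on your SOC; the numbers here are only placeholders.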
I hope that gets you going for a while. Looking around in your organisation will lead to many more, and more specific, things to record and report on.
That’s it again, please leave a 👏 and like always, be excellent to each other