Monitoring against required behavior
The goal of most monitoring systems is (or should be) to report on a system's behavior compared to its business requirements, and to help you avoid digging where you don't need to (no need to look into the "cause" of things being "just fine"). Drilling into details is what you do when you know a problem exists. But first you need to know that a problem exists, which starts with knowing that you are failing to meet some required behavior.
The reason everyone should be measuring and plotting max values is simple: I don't know of any applications that don't actually have a requirement for Max response time or latency. I do know of many teams that don't know what the requirement is, have never talked about it, think the requirement doesn't exist, and don't test for it. But the requirement is always there.
"But we don't have a max requirement"
Whenever someone tells me "we don't have a max time requirement", I answer with "so it's ok for your system to be completely unresponsive for 3 days at a time, right?".
When they say no (they usually use something more profane than a simple "no"), I calmly say "so your requirement is to never have response time be longer than 3 days then..."
They will usually "correct me" at that point, and eventually come up with some number that is reasonable for their business needs. At which point they discover that they are not watching the numbers for that requirement.
So if you have the power to measure this Max latency or response time stuff yourself, or to require it from others, start doing so right away, and start looking at it.
Beyond being a universally useful and critical-to-watch requirement, Max is also a great sanity checker for other values. It's harder to measure Max wrong (although the thing many tools report and display as "max" is a bogus form of "sampled" max), and it's really hard to hide from it once you plot or display it.
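To see why a "sampled" max is bogus, here is a minimal sketch with entirely made-up numbers: an agent that only inspects one request out of every hundred (as a polling collector effectively does) can report a "max" that misses every real stall. All names and values below are hypothetical, for illustration only.

```python
import random

random.seed(42)

# Simulate 10,000 request latencies in msec: mostly ~70 msec,
# plus a few rare 7-second stalls at arbitrary positions.
latencies = [random.gauss(70, 10) for _ in range(10_000)]
for i in (137, 2_638, 5_301, 7_777):   # stall positions (hypothetical)
    latencies[i] = 7_000.0

# The true max is what the requirement is actually about.
true_max = max(latencies)

# A "sampled max" looks at only every 100th request and takes the
# max of those samples -- it never sees any of the stalls above.
sampled_max = max(latencies[::100])

print(f"true max:    {true_max:.0f} msec")     # 7000 msec
print(f"sampled max: {sampled_max:.0f} msec")  # roughly 100 msec
```

The gap between the two numbers is the whole point: the sampled value looks healthy while the system was unresponsive for 7 seconds at a time.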
The first conversations that happen when people start to look at max values after monitoring their stuff for a while without them often start with: "I understand that the pretty lines showing 95%'lie, average, and median all appear great, and hover around 70msec +/- 100msec, and that we've been making them better for months..., but if that's really the case, what the &$#^! is this 7 second max time doing here, and how come it happens several times an hour? And why has nobody said anything about this before? ..."
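That conversation is easy to reproduce with synthetic data. The sketch below uses invented numbers (a steady 70 msec with six 7-second stalls spread across an hour of traffic at ~100 requests/sec) purely to show how the average, median, and 95th percentile can all hover around 70 msec while the max sits at 7 seconds.

```python
import statistics

# One hour of traffic at ~100 requests/sec (hypothetical numbers):
# almost every request takes 70 msec...
latencies = [70.0] * 360_000

# ...except six 7-second stalls scattered through the hour.
for i in range(0, len(latencies), 60_000):
    latencies[i] = 7_000.0

avg = statistics.mean(latencies)
median = statistics.median(latencies)
p95 = statistics.quantiles(latencies, n=100)[94]  # 95th percentile
worst = max(latencies)

print(f"average: {avg:.1f} msec")   # ~70.1 msec -- looks great
print(f"median:  {median:.1f} msec")  # 70.0 msec -- looks great
print(f"95%:     {p95:.1f} msec")     # 70.0 msec -- looks great
print(f"max:     {worst:.0f} msec")   # 7000 msec -- the ugly truth
```

Six multi-second stalls per hour vanish entirely from the average, median, and 95th percentile; only the max surfaces them.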
Who knows? You may find that those nice fuzzy feelings the 95%'lie charts have been giving you are well justified. So no harm then.
For those of you who find an uglier truth because I made you look, and don't like what you see, I sincerely pre-apologize for having written this...