Spark is currently the most popular tool for identifying performance issues on a Minecraft server, even recently replacing the old Timings system in Paper. While it’s more accurate than Timings, it can be more difficult to read and can lead to some of the same misinterpretations if you don’t fully understand how it works. If you need help with installing Spark, or the process of actually getting a report, Spark has excellent documentation.
How is it better than Timings?
Spark is a CPU sampler. As explained in my previous post on Timings, a sampler measures the time between different “sample points” in the code. The difference between Spark and Timings, however, is that Spark automatically handles these sample points. Timings is extremely limited in comparison, because all of its sample points must be manually written into the code. This means that while Timings can tell you what events on the server are slow, Spark can, to an extent, tell you why they are slow.
The significantly larger number of sample points also means that Spark’s reports are more complex. Rather than just showing what percentage of time each plugin takes up per event, Spark lets you step through how much time each part of the server’s code takes. This gives you a much more fine-grained view of what’s slow, but also means you need more contextual understanding to interpret it.
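To make this more concrete, here is a toy sketch of how a sampler could turn snapshots of the call stack into the per-method percentages a report shows. It assumes, as a simplification, that each sample is just the list of methods on the stack at one moment. The sample data, the method names, and the `inclusive_percentages` helper are all made up for illustration; this is not Spark's actual implementation.

```python
from collections import Counter

# Hypothetical stack samples, outermost call first. A real sampler would
# capture these automatically from the running server.
samples = [
    ["tickServer", "tickChildren", "ServerLevel.tick"],
    ["tickServer", "tickChildren", "ServerLevel.tick"],
    ["tickServer", "tickChildren", "CraftScheduler.mainThreadHeartbeat"],
    ["tickServer", "waitForNextTick"],
]


def inclusive_percentages(samples):
    """Return the share of samples in which each call path appears."""
    counts = Counter()
    for stack in samples:
        # Every prefix of a stack is one node in the call tree.
        for depth in range(1, len(stack) + 1):
            counts[tuple(stack[:depth])] += 1
    return {path: 100 * n / len(samples) for path, n in counts.items()}


percentages = inclusive_percentages(samples)
for path, pct in sorted(percentages.items()):
    # Indent by depth to mimic an expandable tree.
    print("  " * (len(path) - 1) + f"{path[-1]}: {pct:.1f}%")
```

Because nobody has to decide in advance which methods to measure, every method that shows up on the stack gets a line in the output.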
Reading the Spark report
At its core, Spark is made up of a waterfall graph that you can expand to show more details. The further right you go, the deeper in the code hierarchy you’re looking. If you expand a line, you’re viewing the code “inside” the line you’ve expanded.
For example, in the above image, the CraftScheduler.mainThreadHeartbeat() code contains CraftTask.run() and CraftFuture.run(). If you add up the 58.80% and 7.70%, you get 66.50%, which is 0.05% shy of the total time (66.55%) that CraftScheduler.mainThreadHeartbeat() takes. This extra 0.05% is what we’d refer to as “self-time,” or extra time spent within that code itself rather than other code that’s triggered by it. Using this basic technique, you can narrow down the causes of lag to specific bits of code.
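The same arithmetic can be written as a small sketch. The percentages are the ones quoted above; the `self_time` helper is a hypothetical name used only for this example.

```python
from decimal import Decimal

# Percentages quoted in the example above, kept as Decimal to avoid
# floating-point rounding noise in the subtraction.
total = Decimal("66.55")  # CraftScheduler.mainThreadHeartbeat()
children = {
    "CraftTask.run()": Decimal("58.80"),
    "CraftFuture.run()": Decimal("7.70"),
}


def self_time(total, children):
    """Time spent in a method itself, excluding the methods it calls."""
    return total - sum(children.values(), Decimal("0"))


print(self_time(total, children))  # 0.05
```

A large self-time on a line means the slow code is in that method itself, while a small one means you should keep expanding its children.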
View Modes
Like Timings, Spark lets you choose between three view modes: “all,” “flat,” and “plugins.” There’s also a flame icon, which lets you view the data as a flamegraph rather than the waterfall graph used by the main view modes. The “all” view gives you a complete waterfall graph of the entire server, letting you drill down from the top. This was the view used in the above image.
Flat View
The flat view is similar to the “all” view in that it doesn’t do any filtering. However, rather than making you dig through each line of the graph, it flattens out all lines and sorts them. While all lines have been flattened, you’re still able to dig deeper into each individual one as necessary.
The flat view also includes a number of different options. For example, by default it sorts them by the total time a line, and every line under it, takes. This sorting can instead be changed to sort by the amount of time that line itself takes. This is useful for finding cases where one specific part of the code is very slow, usually in cases where you have one very prominent source of lag. Generally, however, server slowdowns are a case of “death by a thousand cuts,” rather than one very slow thing.
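The difference between the two sort orders can be sketched with a tiny made-up call tree. The `Node` class, the line names, and the numbers are hypothetical and only illustrate the idea, not how Spark stores its data.

```python
class Node:
    """One line of the report: a name, its total time, and its children."""

    def __init__(self, name, total, children=()):
        self.name = name
        self.total = total  # includes everything this line calls
        self.children = list(children)

    @property
    def self_time(self):
        # Time spent in this line itself, excluding its children.
        return self.total - sum(child.total for child in self.children)


def flatten(node):
    """Yield the node and every node beneath it."""
    yield node
    for child in node.children:
        yield from flatten(child)


# Made-up numbers (percent of the sampled time).
tree = Node("tick", 100, [
    Node("scheduler", 60, [Node("pluginTask", 55)]),
    Node("worldTick", 35, [Node("hoppers", 10), Node("entities", 5)]),
])

by_total = sorted(flatten(tree), key=lambda n: n.total, reverse=True)
by_self = sorted(flatten(tree), key=lambda n: n.self_time, reverse=True)

print([n.name for n in by_total])
print([n.name for n in by_self])
```

Sorting by total time puts the top-level lines first, since they include everything beneath them. Sorting by self-time pushes the made-up `pluginTask` line to the top, which is the “one very slow thing” case described above.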
Plugins View
The final view is the plugins view. While this view is very useful at a quick glance, it’s important to keep in mind that it has a significant limitation. Like Timings, the plugins that Spark lists here are categorised based on whether that plugin is mentioned in that branch of the graph. This means it shares an issue with Timings: it’ll “blame” a plugin (pictured below) that sets a lever, triggering a large redstone contraption, for the time that the redstone contraption takes.
Luckily, Spark provides more information than Timings. While Timings would just provide that basic level of information, with Spark you can dig deeper. If you dig down and start seeing something that appears entirely unrelated to the plugin, such as redstone updates, chunk generation, or hoppers, that can indicate that it’s not necessarily the plugin that’s directly causing issues, but something downstream from the plugin. That’s not to say that the plugin is definitely not at fault, but it provides significantly more context as to where the slowdown is actually taking place.
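The lever example can be sketched as follows. It assumes that a branch is attributed to a plugin whenever any frame in the stack belongs to that plugin, as described above. “LeverPlugin” and all of the frame names are made up for illustration.

```python
from collections import Counter

# Hypothetical stack samples, outermost call first.
samples = [
    ["Server.tick", "LeverPlugin.onCommand", "Block.setPowered", "Redstone.update"],
    ["Server.tick", "LeverPlugin.onCommand", "Block.setPowered", "Redstone.update"],
    ["Server.tick", "LeverPlugin.onCommand", "Block.setPowered", "Redstone.update"],
    ["Server.tick", "LeverPlugin.onCommand", "LeverPlugin.parseArguments"],
    ["Server.tick", "Entity.tick"],
]


def mentions(stack, plugin):
    """True if any frame in the stack belongs to the given plugin."""
    return any(frame.startswith(plugin + ".") for frame in stack)


# The coarse view: every sample that mentions the plugin is blamed on it.
blamed = [stack for stack in samples if mentions(stack, "LeverPlugin")]
blamed_share = 100 * len(blamed) / len(samples)

# Digging deeper: where were those samples actually executing?
leaves = Counter(stack[-1] for stack in blamed)

print(f"attributed to LeverPlugin: {blamed_share:.0f}%")
print(leaves.most_common())
```

In this made-up data the plugin is blamed for 80% of the samples, yet three of those four samples were actually executing redstone code that the plugin merely triggered.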
Flamegraph
The flamegraph mode, selected with the flame icon next to the main view mode selector, allows visualising the data in a different format. Rather than the waterfall graph that shows the hierarchy of the code and allows you to dig deeper, the flamegraph shows everything at once. A flamegraph is good for getting a quick overview of what's slow, but isn't as useful for digging into the details. However, in some situations a quick glance might be all you need.
Each line in the flamegraph represents a different part of the code, with the width (and colour) representing how much time that part took. The wider the line and hotter the colour, the more time it took. Each following line then allows you to see what other parts of code that part called, and how much time they took. For example, in the above image the MinecraftServer.tickChildren() box took up most of the time on the 8th line, with the CraftScheduler.mainThreadHeartbeat() and ServerLevel.tick() boxes on the 9th line being the source of most of it. If you want to focus on a very specific area, clicking on a box will focus it and make it take the full width of the chart, allowing you to see what's below it more easily.
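As a rough sketch of the idea, box widths can be thought of as proportional to time, with focusing a box rescaling everything relative to it. The percentages, the chart width, and the `box_width` helper below are assumptions for illustration and are not taken from the report pictured above.

```python
CHART_WIDTH = 60  # characters available for the widest box


def box_width(percent, focused_percent=100):
    """Width of a box when the focused box spans the full chart."""
    return round(CHART_WIDTH * percent / focused_percent)


# Made-up percentages of the total sampled time.
rows = {
    "MinecraftServer.tickChildren()": 100,
    "CraftScheduler.mainThreadHeartbeat()": 60,
    "ServerLevel.tick()": 30,
}

for name, percent in rows.items():
    print("#" * box_width(percent), name)

# Focusing the 60% box makes it span the whole chart, so a child that
# takes 45% of the total time grows from 27 to 45 characters wide.
unfocused = box_width(45)
focused = box_width(45, focused_percent=60)
print(unfocused, focused)
```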
Limitations
While it has significantly fewer than Timings, Spark does sadly have a few limitations.
Event counts
Spark doesn’t measure how many times something was called, only how much time was spent doing it. This means that if a giant hopper lag machine is passing thousands of items around, it might spam a protection plugin like WorldGuard with item movement events. Each individual item check might be taking far less than a millisecond but doing it thousands of times per tick still ends up causing a slowdown. As you can’t see how many times this was called, you have no way to tell if something is actually slow or whether it’s just being inundated.
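A quick back-of-the-envelope sketch shows why the call count matters. The call counts and per-call costs below are invented; times are in microseconds so the arithmetic stays exact.

```python
TICK_BUDGET_US = 1_000_000 // 20  # 50 ms per tick at 20 ticks per second


def time_per_tick_us(calls_per_tick, us_per_call):
    """Total time spent per tick on one kind of check."""
    return calls_per_tick * us_per_call


# Hypothetical scenario A: thousands of very cheap item movement checks.
many_cheap = time_per_tick_us(calls_per_tick=5000, us_per_call=4)

# Hypothetical scenario B: a single genuinely slow check.
one_slow = time_per_tick_us(calls_per_tick=1, us_per_call=20_000)

for label, total in (("many cheap calls", many_cheap), ("one slow call", one_slow)):
    share = 100 * total / TICK_BUDGET_US
    print(f"{label}: {total / 1000:.0f} ms per tick ({share:.0f}% of the budget)")
```

Both scenarios add up to the same time per tick, so they would look the same in a report that only records time spent, even though the fix for each is very different.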
Lag spikes
If you’re dealing with intermittent lag spikes, Spark’s default behaviour can make them fairly difficult to see in the report. The results that Spark shows you are the amount of time each line of the graph took across the entire duration that the sampler was running. This means that if you run it for one minute and there’s a lag spike during only one tick, the data from that one tick will likely be drowned out by all the other data.
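To get a feel for how much a single spike is diluted, here is a small sketch with invented tick durations: a one-minute profile in which every tick takes 10 ms except one 500 ms spike.

```python
TICKS_PER_SECOND = 20
PROFILE_SECONDS = 60

# Made-up durations: every tick takes 10 ms except a single 500 ms spike.
ticks_ms = [10] * (TICKS_PER_SECOND * PROFILE_SECONDS)
ticks_ms[600] = 500

total_ms = sum(ticks_ms)
average_ms = total_ms / len(ticks_ms)
spike_share = 100 * max(ticks_ms) / total_ms

print(f"ticks profiled: {len(ticks_ms)}")
print(f"average tick: {average_ms:.2f} ms")
print(f"spike share of sampled time: {spike_share:.1f}%")
```

With these numbers the average tick is only about 10.4 ms, and the spike accounts for roughly 4% of the sampled time, so the report as a whole would look healthy even though one tick took ten times the 50 ms budget.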
The generally recommended way to alleviate this is to re-run Spark with the --only-ticks-over [time] flag. With this flag, results are only included in the report if the tick they were part of took longer than the given number of milliseconds. The Spark docs provide some good information on how to find a good time value to use, but there are also a few good rules of thumb to go by to see only ticks slower than average. As the Minecraft server ideally runs at 20 ticks per second, the average tick should take at most 1000/20, or 50, milliseconds.
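The effect of such a threshold can be sketched as a simple filter. This only mimics the idea behind the flag and is not how Spark implements it; the `ticks_over` helper and the tick durations are made up.

```python
TICKS_PER_SECOND = 20
TICK_BUDGET_MS = 1000 / TICKS_PER_SECOND  # 50.0 ms per tick


def ticks_over(tick_durations_ms, threshold_ms=TICK_BUDGET_MS):
    """Keep only the ticks that took longer than the threshold."""
    return [d for d in tick_durations_ms if d > threshold_ms]


# Made-up tick durations in milliseconds.
durations = [12, 48, 50, 51, 230, 9]

print(TICK_BUDGET_MS)         # 50.0
print(ticks_over(durations))  # [51, 230]
```

Only the ticks that went over budget are kept, so whatever made those ticks slow is no longer averaged away by the healthy ones.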
Contextual knowledge
Another limitation is less to do with Spark, and more to do with reading it. While Spark provides more information, this information might not actually be useful if you’re not a developer. If you don’t have contextual knowledge of what different parts of the Minecraft server, as well as plugins, are doing, it’s harder to actually interpret the report and connect the dots. While it’s very useful to be able to do baseline interpretation of the report yourself so that you know who to talk to about it, it’s still a task that is better suited for people familiar with the code showing up in the report.
If you’re seeing a plugin mentioned a lot, the best people to interpret that report are the developers of that plugin. If not, you should reach out to your server software’s support instead.
Conclusion
Spark is a significant improvement over Timings and has even replaced it as the performance tool bundled with Paper. It’s important to be able to interpret Spark reports to an extent. However, the developers behind the relevant software are better placed to interpret the context around those areas of the code.
Hi, I'm Maddy Miller, a Senior Software Engineer at Clipchamp at Microsoft. In my spare time I love writing articles, and I also develop the Minecraft mods WorldEdit, WorldGuard, and CraftBook. My opinions are my own and do not represent those of my employer in any capacity.