Acronis True Image Costs Performance When Not Used

Over two years ago I installed Acronis True Image for Crucial in order to migrate my data to a new SSD I had just purchased. It worked. I then left True Image installed “just in case”. What harm could that possibly cause?

Well, funny you should ask.

I recently noticed that whenever I plugged or unplugged my external monitor Explorer.exe would consume a lot of CPU time – dozens of seconds of it. It was enough CPU time to make my computer noticeably sluggish until things calmed down which could take 15+ seconds. “That’s odd” is how most of my investigative reporting starts so I grabbed an ETW trace and drilled in. It didn’t take long to find the culprit.

Aside: I have worked with Acronis to help them understand this issue and they have provided a mitigation and have said that they plan to address the problem in the next release of their software. See “Workarounds and fixes” for details.

In the trace Explorer.exe was using about 42 s of CPU time over a 16 s time period (from 7.0 s to 23.0 s in the trace), which is way too much:

image

I opened up CPU Usage (Sampled) to investigate. The CPU usage was distributed across dozens of unnamed threads so I hid the Thread ID column and the Thread Name column in order to group all the threads together and drilled down:

image

I quickly found that windows.storage.dll!CFSFolder::_GetOverlayInfo was consuming a large chunk of the time (20,191 of the 42,299 samples), and most of that was in a call to an unknown function in tishell64_26_0_39450.dll. I temporarily ignored the question of who owned that DLL while I first tried to understand what it was doing.

If you want to follow along you can download the trace I’m looking at and load it into Microsoft’s Windows Performance Analyzer (WPA).

The CPU Usage (Sampled) data works by interrupting all running CPUs 1,000 times a second (by default) and grabbing call stacks. This makes it a powerful tool for understanding where CPU time is being spent. You can read more about how to use this information in Xperf for Excess CPU Consumption.

The 20,191 samples with CFSFolder::_GetOverlayInfo on the stack suggest that approximately 20 s of CPU time was consumed inside that function and its descendants (on that call stack). Approximately 6.6 s of that is in Process32NextW (and its descendants) and approximately 3.1 s in CreateToolhelp32Snapshot (and its descendants). I don’t have symbols or source for the tishell64 DLL but I know what those two Windows functions do so I’ll start with those.
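As a rough illustration of that conversion, here is a minimal Python sketch that turns sample counts into approximate CPU time. It assumes the default 1 kHz sampling rate described above; the helper name `samples_to_cpu_seconds` is made up for this example and is not part of WPA.

```python
# Sketch: convert ETW sample counts into approximate CPU time.
# Assumes the default 1 kHz sampling rate; the helper name is hypothetical.
SAMPLE_RATE_HZ = 1000


def samples_to_cpu_seconds(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Each sample stands for roughly 1/sample_rate_hz seconds of CPU time."""
    return samples / sample_rate_hz


explorer_samples = 42_299  # Explorer.exe samples in the selected range
overlay_samples = 20_191   # samples with CFSFolder::_GetOverlayInfo on the stack

print(f"_GetOverlayInfo: ~{samples_to_cpu_seconds(overlay_samples):.1f} s")
print(f"share of Explorer.exe samples: {overlay_samples / explorer_samples:.0%}")
```

With the numbers from this trace that works out to about 20.2 s, or roughly 48% of the Explorer.exe samples.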

CreateToolhelp32Snapshot grabs a snapshot of system data that could include a list of processes, threads, modules, heaps, etc. Process32NextW is one of the functions used to iterate through the snapshot and its presence tells us that TH32CS_SNAPPROCESS was specified. So, the tishell64 DLL is grabbing a list of running processes and iterating through that list.
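For readers who have not used the Tool Help API, the snapshot-then-iterate pattern looks roughly like the Python/ctypes sketch below. This is only an illustration of the documented Win32 calls, not a reconstruction of what the tishell64 DLL does; the function name `list_process_names` is made up, and the code only does real work on Windows.

```python
import ctypes
import sys

TH32CS_SNAPPROCESS = 0x00000002  # include all processes in the snapshot
MAX_PATH = 260
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value


class PROCESSENTRY32W(ctypes.Structure):
    # Field layout follows the documented PROCESSENTRY32W structure.
    _fields_ = [
        ("dwSize", ctypes.c_uint32),
        ("cntUsage", ctypes.c_uint32),
        ("th32ProcessID", ctypes.c_uint32),
        ("th32DefaultHeapID", ctypes.c_size_t),
        ("th32ModuleID", ctypes.c_uint32),
        ("cntThreads", ctypes.c_uint32),
        ("th32ParentProcessID", ctypes.c_uint32),
        ("pcPriClassBase", ctypes.c_int32),
        ("dwFlags", ctypes.c_uint32),
        ("szExeFile", ctypes.c_wchar * MAX_PATH),
    ]


def list_process_names():
    """Return the image names of all running processes (Windows only)."""
    if sys.platform != "win32":
        raise OSError("The Tool Help API exists only on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateToolhelp32Snapshot.restype = ctypes.c_void_p
    kernel32.CreateToolhelp32Snapshot.argtypes = [ctypes.c_uint32, ctypes.c_uint32]
    entry_ptr = ctypes.POINTER(PROCESSENTRY32W)
    kernel32.Process32FirstW.argtypes = [ctypes.c_void_p, entry_ptr]
    kernel32.Process32NextW.argtypes = [ctypes.c_void_p, entry_ptr]
    kernel32.CloseHandle.argtypes = [ctypes.c_void_p]

    snapshot = kernel32.CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0)
    if snapshot == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    names = []
    try:
        entry = PROCESSENTRY32W()
        entry.dwSize = ctypes.sizeof(PROCESSENTRY32W)  # required before the first call
        more = kernel32.Process32FirstW(snapshot, ctypes.byref(entry))
        while more:
            names.append(entry.szExeFile)
            more = kernel32.Process32NextW(snapshot, ctypes.byref(entry))
    finally:
        kernel32.CloseHandle(snapshot)
    return names
```

Every call to a function like this builds a fresh snapshot and walks the whole process list, which is why the number of calls matters so much.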

The CPU Usage (Sampled) data gives you an approximation of how much time is spent in different call stacks but it cannot differentiate between a small number of expensive function calls and a large number of cheap calls. That is, I couldn’t tell whether CreateToolhelp32Snapshot and Process32NextW were expensive, being called too frequently, or a bit of both.

I decided to investigate this by attaching the Visual Studio debugger to Explorer.exe and setting a breakpoint on kernel32.dll!CreateToolhelp32Snapshot. I set this as a conditional breakpoint that would only halt after being hit one billion times because I didn’t actually want Explorer.exe to halt in the debugger – I just wanted Visual Studio to count how many times the breakpoint was hit. The breakpoint settings looked like this:

image

Debugging Explorer.exe made me nervous because if Visual Studio’s debugger tried to invoke some Explorer.exe functionality while Explorer was halted at a breakpoint then I could end up with a deadlock. But, it worked! I had to tell Visual Studio not to stop on some sort of COM exception that Explorer.exe was throwing, but after that things went smoothly. Visual Studio doesn’t update the hit count while the debuggee is running so after plugging or unplugging my external monitor I would use Debug -> Break All to temporarily break into Explorer.exe to see the count.

Results varied but with my setup (three Explorer windows open) I would see anywhere from 1,200 to 3,000 hits on the CreateToolhelp32Snapshot breakpoint:

image

With no Explorer windows open I would still see 44 hits on the breakpoint, so 44 calls to CreateToolhelp32Snapshot. Now, without symbols or source code for the tishell64 DLL I can’t say what is going on but I will say that I don’t understand why a shell extension would need to get a list of running processes even a single time. That sort of functionality is useful for debugging and development tools but it seems unusual – downright strange in fact – for it to be called in this context.

Calling CreateToolhelp32Snapshot once is strange. Calling it up to 3,000 times because a monitor is plugged or unplugged is the problem.
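Combining the sampled CPU times with the breakpoint counts gives a very rough per-call cost. The sketch below is back-of-envelope arithmetic only: the CPU times come from the ETW trace and the call counts from separate debugger runs, so the result is an order-of-magnitude estimate at best. The helper name `ms_per_call` is made up for this example.

```python
# Back-of-envelope only: CPU times are from the ETW trace, call counts are
# from separate debugger runs, so treat the result as an order of magnitude.
snapshot_cpu_s = 3.1   # CreateToolhelp32Snapshot and descendants
iteration_cpu_s = 6.6  # Process32NextW and descendants
observed_call_counts = (1_200, 3_000)


def ms_per_call(cpu_seconds, calls):
    """Average CPU milliseconds per call, assuming cost is spread evenly."""
    return cpu_seconds / calls * 1000.0


for calls in observed_call_counts:
    cost = ms_per_call(snapshot_cpu_s + iteration_cpu_s, calls)
    print(f"{calls:>5} calls -> about {cost:.1f} ms per snapshot-and-walk")
```

Under those assumptions each enumeration costs a few milliseconds, which suggests the answer to the earlier question is "a bit of both": the calls are not free, and there are far too many of them.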

How many times CreateToolhelp32Snapshot is called seems to depend on how many Explorer windows are open (I use three) and perhaps on how many icons are visible (my Downloads folder shows many), and then the cost of CreateToolhelp32Snapshot presumably depends on how many processes are running. With my system under its normal load of processes I saw this path consuming up to 20 CPU seconds in explorer.exe when I unplugged my external monitor.

The total cost from the tishell64 DLL is greater than this, however. I noticed the tishell64 DLL on some other call stacks so I used WPA’s View Callers By Module feature to group all samples with the tishell64 DLL present on the stack:

image

This showed that the actual cost from the tishell64 DLL was somewhere around 26 CPU seconds in my initial trace. In one torture-test trace the total cost from the tishell64 DLL was more than 60 CPU seconds! That is an enormous amount of CPU time for just unplugging my external monitor.
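The idea behind grouping by module can be sketched in a few lines of Python: a sample counts toward a module if that module appears anywhere on the sampled stack, regardless of which call path led there. This is only an illustration of the concept, not how WPA is implemented, and the stacks and sample counts below are toy data invented for the example.

```python
from collections import Counter


def samples_by_module(weighted_stacks):
    """Sum samples per module, counting each module at most once per stack."""
    totals = Counter()
    for frames, samples in weighted_stacks:
        modules_on_stack = {frame.split("!", 1)[0] for frame in frames}
        for module in modules_on_stack:
            totals[module] += samples
    return totals


# Toy data: (frames as "module!function", sample count). The second stack and
# all of the counts are hypothetical.
toy_stacks = [
    (["windows.storage.dll!CFSFolder::_GetOverlayInfo",
      "tishell64_26_0_39450.dll!<unknown>",
      "kernel32.dll!Process32NextW"], 5),
    (["shell32.dll!<hypothetical caller>",
      "tishell64_26_0_39450.dll!<unknown>"], 3),
    (["windows.storage.dll!CFSFolder::_GetOverlayInfo"], 2),
]

totals = samples_by_module(toy_stacks)
for module, samples in totals.most_common():
    print(f"{samples:>3}  {module}")
```

In the toy data the tishell64 DLL is charged for both stacks it appears on, which is why a per-module view can report more time than any single call path.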

Who dunnit?

Now it’s time to find out who owns the tishell64 DLL, although the title of this blog post is a bit of a spoiler.

In WPA’s Graph Explorer I expanded System Activity and then Images and then double clicked on Lifetime By Process, Image, which gives me this view:

image

There are 370 (!!!) DLLs loaded into Explorer.exe so I dragged the File Version column to the left of the Image Name column, sorted by that column, and then expanded explorer.exe and the blank and <Unknown> file versions to get this view:

image

That’s 11 DLLs that are lacking file version information. This strikes me as very sloppy – who would ship all these DLLs with this important information missing? I added the Image Path column and now we can see where all of these DLLs live:

image

In particular the tishell64 DLL is located in “C:\Program Files (x86)\Acronis\TrueImageHome” and we can therefore assume that it is published by Acronis as part of Acronis True Image, and running sigcheck on it verifies this.

Why is Process32NextW expensive?

The ETW profiling data that showed me that Process32NextW() is consuming lots of CPU time also lets me see exactly where it is spent. It’s mostly in mapping and unmapping sections, and some page faults (probably from this). Maybe it could be faster, but optimizing it is almost certainly the wrong place to spend resources. It just shouldn’t be called this frequently, and if it was called a thousand times less frequently (very practical) then its performance wouldn’t matter.

In other words, I don’t care why it is expensive.

Workarounds and fixes

I was able to reach out to Acronis and talk to one of their representatives about this issue. It was a slow process (time zones!) but they shared symbols for the tishell64 DLL to help us understand what was going on. Now that they are aware of the process-enumeration issues they plan to address the problem in the next release.

Until then the process-enumeration code can be disabled by deleting the following registry key:

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\
Explorer\ShellIconOverlayIdentifiers\     AcronisDrive

Note that the key name has five spaces at the start of it.
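Because of those leading spaces it is easy to mistype the key name. A minimal Python sketch for checking whether the key is present (read-only, nothing is deleted) might look like this. The function name `overlay_key_exists` is made up for this example, and the registry lookup only works on Windows.

```python
import sys

OVERLAY_KEY_PATH = (
    r"SOFTWARE\Microsoft\Windows\CurrentVersion"
    r"\Explorer\ShellIconOverlayIdentifiers"
)
# The subkey name really does begin with five spaces.
ACRONIS_SUBKEY = " " * 5 + "AcronisDrive"


def overlay_key_exists(subkey=ACRONIS_SUBKEY):
    """Return True if the overlay handler subkey exists under HKLM (Windows only)."""
    if sys.platform != "win32":
        raise OSError("The Windows registry is only available on Windows")
    import winreg  # imported here because the module does not exist elsewhere

    # KEY_WOW64_64KEY asks for the 64-bit registry view even from 32-bit Python.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    try:
        with winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE, OVERLAY_KEY_PATH + "\\" + subkey, 0, access
        ):
            return True
    except FileNotFoundError:
        return False


print(repr(ACRONIS_SUBKEY))
```

Deleting the key requires administrator rights, so this sketch deliberately stops at checking for it.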

The other tishell64 costs seem to have already been fixed in the latest versions of Acronis True Image. However if you are using the version from Crucial then all of the performance issues are still there. I tested with the most recent version on Crucial’s website and all of the issues are present. I have reached out to Crucial. Their first response: “We have not encountered any customers experiencing issues with the Acronis free version available on the Crucial website.” After a bit more pushing I got “Our team is looking into this matter, and we will provide any relevant updates as soon as possible.” – here’s hoping.

Personally I have mitigated these performance issues in the simplest and most effective way – I uninstalled Acronis True Image. If you are running the version distributed by Crucial or some other potentially out-of-date version then I recommend this.

Missing metadata

I ran sysinternals’ sigcheck on the 11 DLLs with no version listed and… 10 of them have the Publisher listed as Microsoft, and one as Acronis International GmbH.

These files are also missing Product Name, Company Name, and Product Version in the ETW fields and much of this information is also missing from the sigcheck output. My tests were on Windows 10 but Windows 11 still shows 8 Microsoft DLLs in Explorer.exe that don’t have File Version filled out for ETW to record. There really should be automated checks to make sure that appropriate metadata is added to the bits that Microsoft ships. I’ve reported this before but it’s not clear that there has been any progress.

Acronis also needs to fix this missing metadata, but really, that is the least of their problems. What Acronis needs to do is to either iterate through the list of running processes orders of magnitude less frequently or, better yet, not at all.
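One way to iterate "orders of magnitude less frequently" is to cache the process list for a short time instead of rebuilding it on every request. The Python sketch below shows the general idea only; it is not Acronis's code or their planned fix, and the class names and the five-second lifetime are assumptions chosen for illustration.

```python
import time


class CachedProcessList:
    """Reuse one enumeration result until it is older than max_age_seconds."""

    def __init__(self, enumerate_processes, max_age_seconds=5.0, clock=time.monotonic):
        self._enumerate = enumerate_processes
        self._max_age = max_age_seconds
        self._clock = clock
        self._names = None
        self._fetched_at = 0.0

    def get(self):
        now = self._clock()
        if self._names is None or now - self._fetched_at >= self._max_age:
            self._names = self._enumerate()  # the expensive part
            self._fetched_at = now
        return self._names


class CountingEnumerator:
    """Stand-in for a real process enumeration; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return ["explorer.exe", "hypothetical_service.exe"]


enumerator = CountingEnumerator()
cache = CachedProcessList(enumerator)
for _ in range(3_000):  # e.g. one lookup per overlay query
    cache.get()
print(f"{enumerator.calls} enumeration(s) for 3,000 lookups")
```

The trade-off is staleness: a process that starts or exits within the cache lifetime is not noticed until the next refresh, which may or may not be acceptable depending on what the list is used for.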

Conclusion

Acronis True Image is iterating through a list of running processes many times – sometimes thousands of times – whenever my monitor is plugged or unplugged. It probably does this same wasteful iteration in other situations as well. This iterating wastes dozens of seconds of CPU time, wasting battery life and making my computer sluggish while this is happening. That’s the bug.

Don’t look behind the curtain

Attaching a debugger to Explorer.exe feels like looking into a filthy basement that is being used for packaging of medical supplies. It seems like Explorer should be clean and tidy because otherwise we risk having our computers be unstable but the reality is a busy stream of C++ and COM exceptions and warnings about unsupported interfaces and invalid window handles. The exceptions may be working-as-intended but I am dubious about the other errors. Here is some typical debug spew that I saw when I had the debugger connected:

Exception thrown at 0x00007FFF89C4B699 in explorer.exe: Microsoft C++ exception: Platform::COMException ^ at memory location 0x0000000002CED670.
Exception thrown at 0x00007FFF89C4B699 in explorer.exe: Microsoft C++ exception: [rethrow] at memory location 0x0000000000000000.
Exception thrown at 0x00007FFF89C4B699 in explorer.exe: Microsoft C++ exception: [rethrow] at memory location 0x0000000000000000.00007FFF53D62633: (caller: 00007FF7BC39E620) ReturnHr(159) tid(b138) 80070578 Invalid window handle.
     Msg:[Platform::Exception^: Invalid window handle.
pcshell\shell\applicationframe\frame\lib\titlebar.cpp(5549)\ApplicationFrame.dll!00007FFF533D74AE: (caller: 00007FFF533EEA09) ReturnHr(10) tid(4ae4) 80070490 Element not found.
shell\lib\gitreglist.cpp(53)\twinui.dll!00007FFF534B6AEA: (caller: 00007FFF534FD9B8) ReturnHr(133) tid(678) 80004002 No such interface supported

I also noticed in the File I/O graph that Explorer.exe creates (opens) “C:\Program Files (x86)\Acronis\TrueImageHome\ti_managers_proxy.dll” 4,663 times during one of its busy times, representing 79% of its Create calls. I’m not sure what’s going on here and I’m not sure it actually “matters” but this seems wasteful. I am guessing that this is being done by the Acronis code but I didn’t actually check.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048

12 Responses to Acronis True Image Costs Performance When Not Used

  1. wim colgate says:

    I always find your insights and analysis so very interesting. Thanks for the post!

  2. Paul says:

    You did 99% of the work, how about a fix? You can probably hook CreateToolhelp32Snapshot or Process32NextW and either return an error, an empty list, or the previous result (more effort) depending on what’s the behavior in each case and how much this info is really needed. If on error they just skip it and no significant functionality is lost, that’d be simplest. It’d be really simple to write such Windhawk mod, and easy to publish it for others to enjoy the fix.

    • brucedawson says:

      I have no interest in creating a fix. The best solution for me is to uninstall Acronis True Image. Done. The best solution for others is for Acronis to release a fixed version. Hooking CreateToolhelp32Snapshot is a hack and it would only benefit the tiny proportion of people who find the hack and install it, and it could easily break some legitimate functionality in explorer.exe.

      • Paul says:

        Yes, you’re right, a fix in this case would be for a small population who uses the variant that’s not being updated, and who have no option to uninstall it. It’s just that every time I read such an investigation piece from you, I can’t help but think, damn, the fix is right there and requires a couple of code line tweaks. Not all vendors are responsive (hello Microsoft), and so other fixes can benefit a large amount of users.

        In any case, I enjoyed the read, thanks

        • brucedawson says:

          I’m not familiar with Windhawk so I’d have to learn how to use it. That could be fun, but I probably spent too much time on this blog post as it is – discussions with Acronis went on for almost a month, for instance. Maybe somebody else will create such a patch

  3. Koby Kahane says:

    It’s unfortunate that the native NtGetNextProcess and NtGetNextThread APIs are not documented (or available via Win32 wrappers) – they are far more efficient than the terrible CreateToolhelp32Snapshot.

  4. fishthss says:

    Thank you for the in-depth analysis, as always!

    I found this exact issue about three years ago with a slightly different symptom: UAC dialogs would take longer to trigger on my system. I profiled it, and turned out that tishell64_xxx.dll was busy enumerating all processes before the UAC dialog pops up. I don’t remember the details, but after reversing the relevant code, the logic seems to be detecting if another Acronis daemon process was running by checking the image name of another process.

    I found the root cause and proposed a fix to Acronis, but I could not get through their customer support (just like with many other software vendors when I report non-security bugs). So I had to solve the problem in a dirty way: Unregister tishell64_xxx.dll as a shell extension (iirc it was a shell extension) and make sure it does not come back when Acronis True Image upgrades itself.

    Thanks again for your effort in pushing Acronis to solve this annoying problem!

    • brucedawson says:

      Ooh – fascinating! I always figured that there were probably other ways to trigger this behaviour but I didn’t put in any effort to find out what they might be. Bringing up a UAC dialog is a more common operation on some systems.

      Thanks for the information about what the code might be doing – Acronis was not forthcoming with any details and even with their symbols (thanks Acronis!) I couldn’t really tell what was happening.

      If I get energetic I might reinstall Acronis True Image just to double check this, but trusting your reporting is simpler and I am lazy so I will probably leave it at that.

  5. WS says:

    The number of overlay images in Windows is limited to 16. Using spaces in the registry is an evil way to give yourself higher priority.
