[lldb] refactor PlatformAndroid #145382
@llvm/pr-subscribers-lldb
Author: Chad Smith (cs01)
Changes
AdbClient::GetSyncService and PlatformAndroid::GetSyncService are not thread-safe. This was not an issue because they were (apparently) not used in a threaded environment. However, when the new setting was added, GetSyncService began being called from multiple threads to load modules simultaneously. Setting the option to true triggers crashes; setting it to false avoids them. The top of the stack in the crash looked something like this:
Our workaround was to set it to false, but that did not fix the root cause. This PR fixes the root cause by creating a new connection for each AdbClient::SyncService.
Full diff: https://github.com/llvm/llvm-project/pull/145382.diff
5 Files Affected:
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.cpp b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
index a179260ca15f6..4c5342f7af60b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.cpp
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.cpp
@@ -417,13 +417,30 @@ Status AdbClient::ShellToFile(const char *command, milliseconds timeout,
return Status();
}
+// Returns a sync service for file operations.
+// This operation is thread-safe - each call creates an isolated sync service
+// with its own connection to avoid race conditions.
std::unique_ptr<AdbClient::SyncService>
AdbClient::GetSyncService(Status &error) {
- std::unique_ptr<SyncService> sync_service;
- error = StartSync();
- if (error.Success())
- sync_service.reset(new SyncService(std::move(m_conn)));
+ std::lock_guard<std::mutex> lock(m_sync_mutex);
+
+ // Create a temporary AdbClient with its own connection for this sync service
+ // This avoids the race condition of multiple threads accessing the same
+ // connection
+ AdbClient temp_client(m_device_id);
+
+ // Connect and start sync on the temporary client
+ error = temp_client.Connect();
+ if (error.Fail())
+ return nullptr;
+ error = temp_client.StartSync();
+ if (error.Fail())
+ return nullptr;
+
+ // Move the connection from the temporary client to the sync service
+ std::unique_ptr<SyncService> sync_service;
+ sync_service.reset(new SyncService(std::move(temp_client.m_conn)));
return sync_service;
}
@@ -487,7 +504,9 @@ Status AdbClient::SyncService::internalPushFile(const FileSpec &local_file,
error.AsCString());
}
error = SendSyncRequest(
- kDONE, llvm::sys::toTimeT(FileSystem::Instance().GetModificationTime(local_file)),
+ kDONE,
+ llvm::sys::toTimeT(
+ FileSystem::Instance().GetModificationTime(local_file)),
nullptr);
if (error.Fail())
return error;
diff --git a/lldb/source/Plugins/Platform/Android/AdbClient.h b/lldb/source/Plugins/Platform/Android/AdbClient.h
index 851c09957bd4a..9ef780bc6202b 100644
--- a/lldb/source/Plugins/Platform/Android/AdbClient.h
+++ b/lldb/source/Plugins/Platform/Android/AdbClient.h
@@ -14,6 +14,7 @@
#include <functional>
#include <list>
#include <memory>
+#include <mutex>
#include <string>
#include <vector>
@@ -135,6 +136,7 @@ class AdbClient {
std::string m_device_id;
std::unique_ptr<Connection> m_conn;
+ mutable std::mutex m_sync_mutex;
};
} // namespace platform_android
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
index 5bc9cc133fbd3..68e4886cb1c7a 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
@@ -474,13 +474,11 @@ std::string PlatformAndroid::GetRunAs() {
return run_as.str();
}
-AdbClient::SyncService *PlatformAndroid::GetSyncService(Status &error) {
- if (m_adb_sync_svc && m_adb_sync_svc->IsConnected())
- return m_adb_sync_svc.get();
-
+std::unique_ptr<AdbClient::SyncService>
+PlatformAndroid::GetSyncService(Status &error) {
AdbClientUP adb(GetAdbClient(error));
if (error.Fail())
return nullptr;
- m_adb_sync_svc = adb->GetSyncService(error);
- return (error.Success()) ? m_adb_sync_svc.get() : nullptr;
+
+ return adb->GetSyncService(error);
}
diff --git a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
index 5602edf73c1d3..3140acb573416 100644
--- a/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
+++ b/lldb/source/Plugins/Platform/Android/PlatformAndroid.h
@@ -80,9 +80,7 @@ class PlatformAndroid : public platform_linux::PlatformLinux {
std::string GetRunAs();
private:
- AdbClient::SyncService *GetSyncService(Status &error);
-
- std::unique_ptr<AdbClient::SyncService> m_adb_sync_svc;
+ std::unique_ptr<AdbClient::SyncService> GetSyncService(Status &error);
std::string m_device_id;
uint32_t m_sdk_version;
};
diff --git a/lldb/unittests/Platform/Android/AdbClientTest.cpp b/lldb/unittests/Platform/Android/AdbClientTest.cpp
index 0808b96f69fc8..c2f658b9d1bc1 100644
--- a/lldb/unittests/Platform/Android/AdbClientTest.cpp
+++ b/lldb/unittests/Platform/Android/AdbClientTest.cpp
@@ -6,9 +6,13 @@
//
//===----------------------------------------------------------------------===//
-#include "gtest/gtest.h"
#include "Plugins/Platform/Android/AdbClient.h"
+#include "gtest/gtest.h"
+#include <atomic>
#include <cstdlib>
+#include <future>
+#include <thread>
+#include <vector>
static void set_env(const char *var, const char *value) {
#ifdef _WIN32
@@ -31,14 +35,14 @@ class AdbClientTest : public ::testing::Test {
void TearDown() override { set_env("ANDROID_SERIAL", ""); }
};
-TEST(AdbClientTest, CreateByDeviceId) {
+TEST_F(AdbClientTest, CreateByDeviceId) {
AdbClient adb;
Status error = AdbClient::CreateByDeviceID("device1", adb);
EXPECT_TRUE(error.Success());
EXPECT_EQ("device1", adb.GetDeviceID());
}
-TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
+TEST_F(AdbClientTest, CreateByDeviceId_ByEnvVar) {
set_env("ANDROID_SERIAL", "device2");
AdbClient adb;
@@ -47,5 +51,116 @@ TEST(AdbClientTest, CreateByDeviceId_ByEnvVar) {
EXPECT_EQ("device2", adb.GetDeviceID());
}
+TEST_F(AdbClientTest, GetSyncServiceThreadSafe) {
+ // Test high-volume concurrent access to GetSyncService
+ // This test verifies thread safety under sustained load with many rapid calls
+ // Catches race conditions that emerge when multiple threads make repeated
+ // calls to GetSyncService on the same AdbClient instance
+
+ AdbClient shared_adb_client("test_device");
+
+ const int num_threads = 8;
+ std::vector<std::future<bool>> futures;
+ std::atomic<int> success_count{0};
+ std::atomic<int> null_count{0};
+
+ // Launch multiple threads that all call GetSyncService on the SAME AdbClient
+ for (int i = 0; i < num_threads; ++i) {
+ futures.push_back(std::async(std::launch::async, [&]() {
+ // Multiple rapid calls to trigger the race condition
+ for (int j = 0; j < 20; ++j) {
+ Status error;
+
+ auto sync_service = shared_adb_client.GetSyncService(error);
+
+ if (sync_service != nullptr) {
+ success_count++;
+ } else {
+ null_count++;
+ }
+
+ // Small delay to increase chance of hitting the race condition
+ std::this_thread::sleep_for(std::chrono::microseconds(1));
+ }
+ return true;
+ }));
+ }
+
+ // Wait for all threads to complete
+ bool all_completed = true;
+ for (auto &future : futures) {
+ bool thread_result = future.get();
+ if (!thread_result) {
+ all_completed = false;
+ }
+ }
+
+ // This should pass (though sync services may fail
+ // to connect)
+ EXPECT_TRUE(all_completed) << "Parallel GetSyncService calls should not "
+ "crash due to race conditions. "
+ << "Successes: " << success_count.load()
+ << ", Nulls: " << null_count.load();
+
+ // The key test: we should complete all operations without crashing
+ int total_operations = num_threads * 20;
+ int completed_operations = success_count.load() + null_count.load();
+ EXPECT_EQ(total_operations, completed_operations)
+ << "All operations should complete without crashing";
+}
+
+TEST_F(AdbClientTest, ConnectionMoveRaceCondition) {
+ // Test simultaneous access timing to GetSyncService
+ // This test verifies thread safety when multiple threads start at exactly
+ // the same time, maximizing the chance of hitting precise timing conflicts
+ // Catches race conditions that occur with synchronized simultaneous access
+
+ AdbClient adb_client("test_device");
+
+ // Try to trigger the exact race condition by having multiple threads
+ // simultaneously call GetSyncService
+
+ std::atomic<bool> start_flag{false};
+ std::vector<std::thread> threads;
+ std::atomic<int> null_service_count{0};
+ std::atomic<int> valid_service_count{0};
+
+ const int num_threads = 10;
+
+ // Create threads that will all start simultaneously
+ for (int i = 0; i < num_threads; ++i) {
+ threads.emplace_back([&]() {
+ // Wait for all threads to be ready
+ while (!start_flag.load()) {
+ std::this_thread::yield();
+ }
+
+ Status error;
+ auto sync_service = adb_client.GetSyncService(error);
+
+ if (sync_service == nullptr) {
+ null_service_count++;
+ } else {
+ valid_service_count++;
+ }
+ });
+ }
+
+ // Start all threads simultaneously to maximize chance of race condition
+ start_flag.store(true);
+
+ // Wait for all threads to complete
+ for (auto &thread : threads) {
+ thread.join();
+ }
+
+ // The test passes if we don't crash
+ int total_results = null_service_count.load() + valid_service_count.load();
+ EXPECT_EQ(num_threads, total_results)
+ << "All threads should complete without crashing. "
+ << "Null services: " << null_service_count.load()
+ << ", Valid services: " << valid_service_count.load();
+}
+
} // end namespace platform_android
} // end namespace lldb_private
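A self-contained sketch of the idea behind this PR, one isolated connection per sync service. It uses hypothetical FakeConnection and SyncService types rather than lldb's real classes, does no networking, and only simulates StartSync with a flag. Because every call builds and returns its own connection, concurrent callers never touch shared connection state.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an adb connection; not lldb's Connection class.
struct FakeConnection {
  explicit FakeConnection(int id) : id(id) {}
  int id;
  bool in_sync_mode = false;
};

// Hypothetical stand-in for AdbClient::SyncService. It exclusively owns the
// connection it is given, so no other object can move or close it.
class SyncService {
public:
  explicit SyncService(std::unique_ptr<FakeConnection> conn)
      : m_conn(std::move(conn)) {}
  int ConnectionId() const { return m_conn->id; }

private:
  std::unique_ptr<FakeConnection> m_conn;
};

// Each call opens a fresh connection, mirroring "one connection per
// SyncService". The atomic counter only hands out distinct ids for the demo.
std::unique_ptr<SyncService> GetSyncService() {
  static std::atomic<int> next_id{0};
  auto conn = std::make_unique<FakeConnection>(next_id.fetch_add(1));
  conn->in_sync_mode = true; // stands in for StartSync()
  return std::make_unique<SyncService>(std::move(conn));
}

// Calls GetSyncService from several threads at once and records which
// connection each thread received.
std::vector<int> CollectConnectionIds(int num_threads) {
  std::vector<int> ids(num_threads, -1);
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i)
    threads.emplace_back([&ids, i] { ids[i] = GetSyncService()->ConnectionId(); });
  for (auto &t : threads)
    t.join();
  return ids;
}
```

CollectConnectionIds(8) should yield eight distinct ids. In the pre-PR code, by contrast, every caller went through the single m_conn member that AdbClient::GetSyncService moved out.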
That's one way to fix the problem. Another (maybe even more obvious) would be to use a single SyncService object but synchronize all access to it. I think your approach makes sense -- it unlocks more parallelism, though that parallelism may be a mirage if the device connection is going to be the bottleneck. And all the extra connections may add some overhead. Have you looked into how the two approaches compare?
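For comparison, the alternative mentioned here (a single SyncService with all access synchronized) might look roughly like the following. FakeSyncService, SerializedSyncService, and WithService are hypothetical names; the point is only that one mutex serializes every use of the one underlying service.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical single sync session; not lldb's AdbClient::SyncService.
struct FakeSyncService {
  int files_pulled = 0;
  void PullFile() { ++files_pulled; } // not thread-safe on its own
};

// Owns one service and runs every operation on it under a mutex, so only one
// thread talks over the underlying connection at a time.
class SerializedSyncService {
public:
  void WithService(const std::function<void(FakeSyncService &)> &fn) {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (!m_service)
      m_service = std::make_unique<FakeSyncService>(); // lazily "connect"
    fn(*m_service);
  }

private:
  std::mutex m_mutex;
  std::unique_ptr<FakeSyncService> m_service;
};

// Hammers the shared service from several threads and returns the total
// number of pulls it recorded.
int PullFromManyThreads(int num_threads, int pulls_per_thread) {
  SerializedSyncService svc;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i)
    threads.emplace_back([&] {
      for (int j = 0; j < pulls_per_thread; ++j)
        svc.WithService([](FakeSyncService &s) { s.PullFile(); });
    });
  for (auto &t : threads)
    t.join();
  int total = 0;
  svc.WithService([&](FakeSyncService &s) { total = s.files_pulled; });
  return total;
}
```

This keeps a single connection but serializes every transfer: it avoids the extra connections of the per-call approach at the cost of any parallelism on the device link.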
As for the patch itself, the code makes sense, but the result looks very path-dependent (meaning: you'd never write code like this -- creating a temporary AdbClient object -- if you were writing this from scratch). It's also not optimal, as you're creating a temporary AdbClient in PlatformAndroid::GetSyncService and then creating another temporary object inside AdbClient::GetSyncService.
I think this would look better if PlatformAndroid::GetSyncService constructed a (new) SyncService directly. And the SyncService could create a temporary AdbClient object as an implementation detail (or not -- maybe it could handle all of the connection setup internally).
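A rough sketch of that shape, assuming hypothetical Status, FakeConnection, and SyncService::Create names (lldb's real Status and Connection classes are richer): the sync service performs its own connect and start-sync steps, so a caller such as PlatformAndroid::GetSyncService can construct it directly without any temporary AdbClient.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal status type; an empty message means success.
struct Status {
  std::string message;
  bool Fail() const { return !message.empty(); }
};

// Hypothetical connection; Connect() fails for an empty device id to mimic
// "no device".
struct FakeConnection {
  bool Connect(const std::string &device_id) { return !device_id.empty(); }
  bool StartSync() {
    in_sync = true;
    return true;
  }
  bool in_sync = false;
};

class SyncService {
public:
  // Opens and prepares its own connection; callers never see a temporary
  // client object.
  static std::unique_ptr<SyncService> Create(const std::string &device_id,
                                             Status &error) {
    auto conn = std::make_unique<FakeConnection>();
    if (!conn->Connect(device_id)) {
      error.message = "failed to connect to device";
      return nullptr;
    }
    if (!conn->StartSync()) {
      error.message = "failed to start sync";
      return nullptr;
    }
    // The constructor is private, so std::make_unique cannot be used here.
    return std::unique_ptr<SyncService>(new SyncService(std::move(conn)));
  }

  bool IsInSyncMode() const { return m_conn->in_sync; }

private:
  explicit SyncService(std::unique_ptr<FakeConnection> conn)
      : m_conn(std::move(conn)) {}
  std::unique_ptr<FakeConnection> m_conn;
};

// What a caller like PlatformAndroid::GetSyncService could reduce to.
std::unique_ptr<SyncService> GetSyncServiceFor(const std::string &device_id,
                                               Status &error) {
  return SyncService::Create(device_id, error);
}
```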
- error = StartSync();
- if (error.Success())
- sync_service.reset(new SyncService(std::move(m_conn)));
+ std::lock_guard<std::mutex> lock(m_sync_mutex);
What is this actually protecting?
std::unique_ptr<SyncService> sync_service;
sync_service.reset(new SyncService(std::move(temp_client.m_conn)));
return sync_service;
Suggested change
- std::unique_ptr<SyncService> sync_service;
- sync_service.reset(new SyncService(std::move(temp_client.m_conn)));
- return sync_service;
+ return std::make_unique<SyncService>(std::move(temp_client.m_conn));
@cs01, can you get a more symbolicated stacktrace for the crash with debug info and line info?
"use after free" and "race condition" aren't mutually exclusive. The backtrace is consistent with one thread destroying (moving from) the
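A deterministic, single-threaded illustration of the hazard in moving a connection out of a shared object, as the original std::move(m_conn) in AdbClient::GetSyncService did. FakeClient and TakeConnectionForSync are hypothetical names, not lldb's API. The move leaves the member null, so whoever uses the object next finds no connection; with multiple threads, that next use can also overlap the move itself.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct FakeConnection {
  bool connected = true;
};

// Hypothetical client that hands its own connection to a sync service by
// moving it out, like the pre-PR AdbClient::GetSyncService.
class FakeClient {
public:
  FakeClient() : m_conn(std::make_unique<FakeConnection>()) {}

  std::unique_ptr<FakeConnection> TakeConnectionForSync() {
    return std::move(m_conn); // a moved-from unique_ptr is guaranteed null
  }

  bool HasConnection() const { return m_conn != nullptr; }

private:
  std::unique_ptr<FakeConnection> m_conn;
};
```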
Thank you both for the feedback. I started working on changes that add more mutexes to AdbClient methods, use a shared pointer for the sync service, and make the creation of the SyncService cleaner and less path-dependent (I agree with your comments, labath). I will push an update either today or sometime next week.
After more analysis, I think a shared pointer will be too difficult to implement, since the connection and client manage their lifecycle assuming they are the only client/thread using it. It will be challenging to ensure the connection doesn't get disconnected or freed while other threads are using it. Establishing a new connection for each SyncService should have negligible overhead (several bytes), especially in comparison to pushing or pulling a file. I tested a build of this in a real-world scenario and the performance seemed unchanged.
It has been a week since the last update was published. Friendly ping to @labath @jeffreytan81 (or anyone else who wants to review)
I'm not very fond of the number of validity checks this PR is adding. What's up with that? Is it somehow related to you passing a non-nullptr-but-uninitialized Connection object (std::make_unique<ConnectionFileDescriptor>()) into the AdbClient? Any chance to get rid of that? Maybe by using a nullptr to mean "no connection"? Or by reducing the amount of moving around?
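A minimal sketch of the nullptr suggestion, with hypothetical FakeClient and FakeConnection types: if a null unique_ptr is the only representation of "no connection", one null check covers every case, instead of checking both that the pointer is set and that the object it points to is actually open.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical connection type; an instance only exists while connected.
struct FakeConnection {};

// Hypothetical client in which a null m_conn is the single, unambiguous way
// to say "no connection".
class FakeClient {
public:
  FakeClient() = default; // starts with no connection
  explicit FakeClient(std::unique_ptr<FakeConnection> conn)
      : m_conn(std::move(conn)) {}

  bool IsConnected() const { return m_conn != nullptr; }

  void Connect() {
    if (!m_conn)
      m_conn = std::make_unique<FakeConnection>();
  }

  void Disconnect() { m_conn.reset(); }

private:
  std::unique_ptr<FakeConnection> m_conn;
};
```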
@@ -665,3 +655,40 @@ Status AdbClient::SyncService::PullFileChunk(std::vector<char> &buffer,
Status AdbClient::SyncService::ReadAllBytes(void *buffer, size_t size) {
return ::ReadAllBytes(*m_conn, buffer, size);
}

Status AdbClient::SyncService::SetupSyncConnection(const std::string &device_id) {
if (!m_conn) {
This moving of connections around is very inelegant. Could you at least make it so that the connection is moved at most once, e.g. created inside AdbClient and then moved into this object?
Though ideally, I would try to avoid using AdbClient here at all by making any methods that need to operate on the connection static (something similar to ReadAllBytes).
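A sketch of the "ideally" option, using hypothetical names (FakeConnection, WriteAll): helpers that only need a connection become free or static functions taking a connection reference, in the spirit of ::ReadAllBytes, and the sync service receives its connection exactly once, at construction, with no AdbClient involved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical in-memory connection standing in for lldb's Connection.
struct FakeConnection {
  std::string wire; // everything written so far
  std::size_t Write(const void *data, std::size_t len) {
    wire.append(static_cast<const char *>(data), len);
    return len;
  }
};

// Free helper that needs only a connection, so neither a client nor a sync
// service has to own the other in order to use it.
static bool WriteAll(FakeConnection &conn, const std::string &msg) {
  return conn.Write(msg.data(), msg.size()) == msg.size();
}

// Hypothetical sync service: gets its connection once, at construction, and
// does all I/O through the free helper.
class SyncService {
public:
  explicit SyncService(std::unique_ptr<FakeConnection> conn)
      : m_conn(std::move(conn)) {}
  bool SendRequest(const std::string &request) {
    return WriteAll(*m_conn, request);
  }
  const std::string &Wire() const { return m_conn->wire; }

private:
  std::unique_ptr<FakeConnection> m_conn;
};
```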
Thanks for the feedback; I agree with you, it was confusing. I was trying to make the minimal change needed to make it work without too big of a refactor, since this is my first PR to lldb. Based on your feedback, I did a more extensive refactor.
In lldb, turn on logs
then attach to an Android process (e.g.
In lldb, get a file to confirm the sync service works:
shows
I also added an implementation for
Friendly ping @labath and anyone else who has any thoughts
@JDevlieghere @adrian-prantl, would you review this diff and unblock @cs01?
Anything we can do to help move this along? I would love for people with Android expertise to comment, or I can approve early next week if there are no further comments.
Problem
When the new setting was added, lldb began fetching modules from the device from multiple threads simultaneously. This caused lldb to crash when debugging on Android devices.
The top of the stack in the crash looked something like this:
Our workaround was to set target.parallel-module-load to false to avoid the crash.
Background
PlatformAndroid uses two different classes, AdbClient and AdbClient::SyncService, that share one stateful adb connection. The connection management and state are complex and seem to be responsible for the segfault we are seeing. The AdbClient code resets these connections at times and re-establishes them if they are not active. Similarly, PlatformAndroid caches its SyncService, which uses an AdbClient, but the SyncService puts its connection into a separate 'sync' state that is incompatible with a standard connection.
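A toy model of the incompatibility described above, assuming a simplified two-mode protocol rather than adb's actual wire format (FakeAdbConnection and its methods are hypothetical). Once the shared connection is switched into sync mode, a regular request issued on that same connection is refused, which is why sharing one connection between the two classes is fragile.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the two protocol modes: once switched into sync
// mode, the connection only accepts sync requests.
enum class Mode { Normal, Sync };

struct FakeAdbConnection {
  Mode mode = Mode::Normal;

  void StartSync() { mode = Mode::Sync; }

  // A regular (non-sync) request; refused once the connection is in sync mode.
  bool SendHostRequest(const std::string &) { return mode == Mode::Normal; }

  // A file-transfer request; only valid after StartSync().
  bool SendSyncRequest(const std::string &) { return mode == Mode::Sync; }
};
```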
Changes in this diff
PlatformAndroid::FindProcesses was implemented.