Add a standalone reducer and serialization support #1252

jinhuix · 2025-06-30T13:37:33Z

Resolves #1241

This PR introduces a standalone SQL reducer mechanism to SQLancer. It enables automated simplification of bug-inducing SQL queries in an offline manner via serialized context. The PR is split into 5 commits for clarity:

Serializable context: Add a SerializableReducerContext class to store the reproducible context (SQL queries, expected errors, oracle type, etc.).
SimpleReducer: Implement a lightweight SimpleReducer engine that uses delta-debugging-like reduction to minimize failing SQL inputs. It supports both exception-based and oracle-based reduction (NoRECOracle, TLPWhereOracle).
Integration Hooks: Add minimal changes to the SQLancer core to support optional serialization of reducer context data (via CLI flags).
Testing: Add a TestSimpleReducer.java class for validation, covering Exception, NoREC, and TLPWhere bugs.
Documentation: Add usage guide with examples and more details in README.md.
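
For readers unfamiliar with the term, here is a minimal sketch of what "delta-debugging-like reduction" can look like at statement granularity. It is not the PR's SimpleReducer: the class `StatementReducerSketch`, the method `reduce`, and the `stillFails` predicate are hypothetical names, and the predicate stands in for "re-run the statements and check that the exception or oracle mismatch still occurs".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class StatementReducerSketch {

    // Hypothetical helper: tries to drop chunks of statements, keeping a
    // smaller candidate whenever the caller-supplied predicate still fails.
    static List<String> reduce(List<String> statements, Predicate<List<String>> stillFails) {
        List<String> current = new ArrayList<>(statements);
        int chunk = Math.max(1, current.size() / 2);
        while (true) {
            boolean removedAny = false;
            int i = 0;
            while (i < current.size()) {
                int end = Math.min(current.size(), i + chunk);
                // Candidate = current input without the statements in [i, end).
                List<String> candidate = new ArrayList<>(current.subList(0, i));
                candidate.addAll(current.subList(end, current.size()));
                if (stillFails.test(candidate)) {
                    current = candidate; // chunk was irrelevant; retry at the same index
                    removedAny = true;
                } else {
                    i = end; // chunk is needed to reproduce; move past it
                }
            }
            if (chunk > 1) {
                chunk /= 2; // refine granularity
            } else if (!removedAny) {
                break; // no single statement can be removed any more
            }
        }
        return current;
    }
}
```

A caller would pass the logged statements and a predicate that replays a candidate against a fresh database; the result is a list from which no single statement can be removed without losing the failure.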

@mrigger @KabilanMA Happy to hear any thoughts or suggestions!

mrigger · 2025-06-30T14:50:41Z

Thanks, this looks like a great start, and it's great to see that this also works!

The information you save in the reducer context is quite low level (i.e., strings). Do you think we can save more high-level information? This would make the approach more robust, as the current parsing seems quite complex. We could also do this for the expected errors. Rather than checking for every statement whether an error matches any expected error of the system, we could store a detailed statement-to-expected-error mapping. When reading back the reduced statements, this would require doing some kind of best-effort alignment. For now, with the line granularity, this could be a 1:1 match per statement.
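
To make the suggestion concrete, here is a rough sketch of a statement-to-expected-error mapping with a 1:1 alignment after reduction. Everything in it is an assumption for illustration: `ErrorPatterns` is a stand-in and not SQLancer's ExpectedErrors class, statements are keyed by their SQL text, and `align` relies on line-granularity reduction only deleting statements, so every surviving statement matches an original one verbatim.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExpectedErrorAlignmentSketch {

    // Stand-in for a per-statement set of expected error substrings.
    static final class ErrorPatterns {
        private final List<String> substrings;

        ErrorPatterns(List<String> substrings) {
            this.substrings = new ArrayList<>(substrings);
        }

        boolean isExpected(String message) {
            if (message == null) {
                return false;
            }
            for (String s : substrings) {
                if (message.contains(s)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Hypothetical alignment: look up each reduced statement in the original
    // mapping; a statement with no entry gets an empty pattern set.
    static Map<String, ErrorPatterns> align(List<String> reduced, Map<String, ErrorPatterns> original) {
        Map<String, ErrorPatterns> aligned = new LinkedHashMap<>();
        for (String statement : reduced) {
            aligned.put(statement, original.getOrDefault(statement, new ErrorPatterns(List.of())));
        }
        return aligned;
    }
}
```

With this shape, an error raised by one statement is only checked against that statement's own patterns, rather than against every expected error of the system.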

jinhuix · 2025-07-01T05:59:04Z

Thanks a lot for the feedback! I'll revise the implementation accordingly:

Structured serialization: I'll split the reducer context into general base fields and oracle-specific ones. For example, NoRECReproducerData will contain optimizedQueryString, unoptimizedQueryString, and shouldUseAggregate.
Direct reproducer info: The serializer will retrieve information directly from the Reproducer, rather than parsing query strings.
Error mapping: For expected errors, I plan to switch to a structure like Map<String, String> statementToExpectedError. Since this would change how expectedErrors are checked, I’m considering re-implementing the relevant parts of SQLQueryAdapter inside the new reducer to keep SQLancer core untouched.

Before I proceed with the refactor, do you think this direction makes sense? @mrigger @KabilanMA

mrigger · 2025-07-02T15:29:53Z

> Structured serialization: I'll split the reducer context into general base fields and oracle-specific ones. For example, NoRECReproducerData will contain optimizedQueryString, unoptimizedQueryString, and shouldUseAggregate.

I am a bit unclear about that one. If we serialize the reproducer, I assume we will not need this information?

> Direct reproducer info: The serializer will retrieve information directly from the Reproducer, rather than parsing query strings.

Sounds good.

> Error mapping: For expected errors, I plan to switch to a structure like Map<String, String> statementToExpectedError.

I think this should probably be a Map<SQLQuery, ExpectedErrors> mapping.

jinhuix · 2025-07-04T08:54:16Z

> If we serialize the reproducer, I assume we will not need this information?

My original concern was that Reproducer is an interface and may have different implementations for each oracle, which could make direct serialization difficult. So instead, we can create oracle-specific ReproducerData classes (e.g., NoRECReproducerData) that store only the necessary fields for reproduction.
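
A minimal sketch of what such an oracle-specific data class could look like, assuming plain Java serialization is used. The field names come from this thread; the class names carry a `Sketch` suffix because they are hypothetical, and the `ReproducerDataIO` helper is an illustration only, not code from the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical common base type for all oracle-specific data holders.
abstract class ReproducerDataSketch implements Serializable {
    private static final long serialVersionUID = 1L;
}

// Stores only what is needed to re-run a NoREC check.
final class NoRECReproducerDataSketch extends ReproducerDataSketch {
    private static final long serialVersionUID = 1L;

    final String optimizedQueryString;
    final String unoptimizedQueryString;
    final boolean shouldUseAggregate;

    NoRECReproducerDataSketch(String optimizedQueryString, String unoptimizedQueryString,
            boolean shouldUseAggregate) {
        this.optimizedQueryString = optimizedQueryString;
        this.unoptimizedQueryString = unoptimizedQueryString;
        this.shouldUseAggregate = shouldUseAggregate;
    }
}

final class ReproducerDataIO {

    static byte[] toBytes(ReproducerDataSketch data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static ReproducerDataSketch fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (ReproducerDataSketch) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the design is that the Reproducer implementations themselves need not be serializable; each oracle only has to be able to produce, and be rebuilt from, its own data class.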

> I think this should probably be a Map<SQLQuery, ExpectedErrors> mapping.

Agreed — I'll change the structure accordingly.

KabilanMA

I checked the PR changes to some extent and left a few comments. Please make the changes accordingly, if needed.

KabilanMA · 2025-07-15T00:34:03Z

src/sqlancer/Main.java

+                }
+            }
+
+            private static boolean hasAggregateFunction(String query) {


This is not scalable

KabilanMA · 2025-07-15T00:44:39Z

src/sqlancer/Main.java

+                    if (statement.contains("the counts mismatch")) {
+                        parseNoRECOracle(context, statement);
+                        return;
+                    } else if (statement.contains("result sets mismatch")


This type of string comparison can make it difficult to maintain the source code when there are rapid changes; at the very least, perform a case-insensitive check. I think parsing the context based on a string is not a good idea. Is it possible to determine which Oracle parser to use from somewhere else and use it here?

KabilanMA · 2025-07-15T00:50:43Z

src/sqlancer/Main.java

+                    if (line.startsWith("-- Query: \"") && line.endsWith("\"")) {
+                        String query = line.substring(11, line.length() - 1);


I am not sure, but it seems that this cannot produce consistent results when other parts of the codebase change. I am also not sure whether this is the right way to extract the query string.

KabilanMA · 2025-07-15T00:58:48Z

src/sqlancer/SerializableReducerContext.java

@@ -0,0 +1,210 @@
+package sqlancer;


I feel like the use of strings throughout the workflow is hanging by a thread. However, the string information, or any other information, can be converted into something consistent in this file for serialization. Then, when you parse, you don't have to depend on string comparison again; instead, use the consistent values (e.g., an enum) you used here. Also, if something stops working after changes to other source files, you know this is the only file you have to modify.
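
A small sketch of the enum idea, assuming the oracle kind is written into the serialized context once and dispatched on when reading it back. The names `OracleKindSketch`, `fromSerialized`, and `checkerFor` are hypothetical, and the returned descriptions merely stand in for the real per-oracle check.

```java
import java.util.Locale;

// Hypothetical enum covering the three cases the PR description mentions.
enum OracleKindSketch {
    EXCEPTION, NOREC, TLP_WHERE;

    // Parses the serialized name; tolerant of case and surrounding whitespace.
    static OracleKindSketch fromSerialized(String name) {
        return valueOf(name.trim().toUpperCase(Locale.ROOT));
    }
}

final class OracleDispatchSketch {

    // Dispatches on the enum instead of searching log text for phrases such as
    // "the counts mismatch".
    static String checkerFor(OracleKindSketch kind) {
        switch (kind) {
        case NOREC:
            return "compare optimized and unoptimized counts";
        case TLP_WHERE:
            return "compare original and partitioned result sets";
        case EXCEPTION:
            return "re-run statements and look for the unexpected error";
        default:
            throw new IllegalArgumentException("unknown oracle kind: " + kind);
        }
    }
}
```

An unknown name fails fast with an IllegalArgumentException from `valueOf`, so a mismatch between writer and reader surfaces at load time rather than as a silently skipped branch.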

jinhuix · 2025-07-17T02:21:12Z

Updated the PR:

Made the data in ReducerContext.java more high-level (e.g., using enums).
Removed the manual parsing logic from Main.java.
Added one-to-one serialization of <SQLQuery, ExpectedErrors> mappings.

KabilanMA · 2025-07-18T11:54:28Z

src/sqlancer/postgres/PostgresProvider.java

@@ -279,14 +279,14 @@ public SQLConnection createDatabase(PostgresGlobalState globalState) throws SQLE
        }
        Connection con = DriverManager.getConnection("jdbc:" + entryURL, username, password);
        globalState.getState().logStatement(String.format("\\c %s;", entryDatabaseName));
-


Don't push these kinds of format changes into the PR.

KabilanMA · 2025-07-18T12:26:46Z

src/sqlancer/common/query/SQLQueryAdapter.java

+    public boolean checkException(Exception e) throws AssertionError {
        Throwable ex = e;

        while (ex != null) {
            if (expectedErrors.errorIsExpected(ex.getMessage())) {
-                return;
+                return true;


This function's logic seems to always return inside the while loop, or run infinitely inside the while loop (theoretically), or throw an AssertionError. I don't understand the logic behind changing it to return boolean. After the modification, it returns true or throws an exception.
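
The control flow in question can be sketched in isolation as follows. This is a reconstruction under assumptions, not the file's actual contents: the diff above shows only the start of the loop, so the `getCause()` step and the trailing `throw` are inferred from the comment, and the `errorIsExpected` predicate stands in for the `expectedErrors` field.

```java
import java.util.function.Predicate;

final class CheckExceptionSketch {

    // Walks the cause chain; returns true on the first expected message and
    // throws if none matches. No path returns false.
    static boolean checkException(Exception e, Predicate<String> errorIsExpected) {
        Throwable ex = e;
        while (ex != null) {
            if (errorIsExpected.test(ex.getMessage())) {
                return true;
            }
            ex = ex.getCause(); // assumed: advance to the wrapped cause
        }
        throw new AssertionError(e); // assumed: unexpected errors are fatal
    }
}
```

Under these assumptions a caller can never observe `false`, which is the concern raised here: the boolean return type carries no more information than the original `void` signature unless the unexpected case is changed to return `false` instead of throwing.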

jinhuix added 6 commits June 30, 2025 20:41

Add serializable context class

b122d5e

Implement SimpleReducer for SQL reduction

bb0fb42

Support saving reducer context in SQLancer

4b4c749

Add tests for SimpleReducer

0296142

Add usage guide and examples for SimpleReducer

798c11a

Fix formatter

41336d9

KabilanMA suggested changes Jul 15, 2025

jinhuix added 3 commits July 16, 2025 14:00

Make ReducerContext more high-level and improve serialization extraction

b9511d2

Serialize expected errors in one-to-one mapping

5e8c819

Fix formatter

aec4a28

KabilanMA suggested changes Jul 18, 2025
