-
Notifications
You must be signed in to change notification settings - Fork 366
Add a standalone reducer and serialization support #1252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks, this looks like a great start, and it's great to see that this also works! The information you save in the reducer context is quite low level (i.e., strings). Do you think we can save more high-level information? This would make the approach more robust, as the current parsing seems quite complex. We could also do this for the expected errors. Rather than checking for every statement whether an error matches any expected error of the system, we could store a detailed statement-to-expected-error mapping. When reading back the reduced statements, this would require doing some kind of best-effort alignment. For now, with the line granularity, this could be a 1:1 match per statement. |
Thanks a lot for the feedback! I'll revise the implementation accordingly:
Before I proceed with the refactor, do you think this direction makes sense? @mrigger @KabilanMA |
I am a bit unclear about that one. If we serialize the reproducer, I assume we will not need this information?
Sounds good.
I think this should probably be a |
My original concern was that
Agreed — I'll change the structure accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked the PR changes to some extend and left a few comments, please make the changes accordingly, if needed.
src/sqlancer/Main.java
Outdated
} | ||
} | ||
|
||
private static boolean hasAggregateFunction(String query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not scalable
src/sqlancer/Main.java
Outdated
if (statement.contains("the counts mismatch")) { | ||
parseNoRECOracle(context, statement); | ||
return; | ||
} else if (statement.contains("result sets mismatch") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of string comparison can make it difficult to maintain the source code when there are rapid changes; at the very least, perform a case-insensitive check. I think parsing the context based on a string is not a good idea. Is it possible to determine which Oracle parser to use from somewhere else and use it here?
src/sqlancer/Main.java
Outdated
if (line.startsWith("-- Query: \"") && line.endsWith("\"")) { | ||
String query = line.substring(11, line.length() - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, but it seems like, this cannot produce consistent results with changes to the other codebase. Still, I am not sure whether it is the right way to extract the string query.
@@ -0,0 +1,210 @@ | |||
package sqlancer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the use of strings throughout the workflow is hanging by a thin thread. But, the string information or any other information can be converted to something consistent in this file to serialize. Then, when you parse, you don't have to depend on the string comparison again; instead, use the consistent values (e.g., enum) you used here. Also, when there is any change or it isn't working after some changes to other source files, you know this is the only file you have to make the modification.
Updated the PR:
|
@@ -279,14 +279,14 @@ public SQLConnection createDatabase(PostgresGlobalState globalState) throws SQLE | |||
} | |||
Connection con = DriverManager.getConnection("jdbc:" + entryURL, username, password); | |||
globalState.getState().logStatement(String.format("\\c %s;", entryDatabaseName)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't push these kinds of format changes into the PR.
public boolean checkException(Exception e) throws AssertionError { | ||
Throwable ex = e; | ||
|
||
while (ex != null) { | ||
if (expectedErrors.errorIsExpected(ex.getMessage())) { | ||
return; | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function's logic seems to always return inside the while
loop, or run infinitely inside the while
loop (theoretically), or throw an AssertionError
. I don't understand the logic behind changing it to return boolean
. After the modification, it returns true or throws an exception.
Resolves #1241
This PR introduces a standalone SQL reducer mechanism to SQLancer. It enables automated simplification of bug-inducing SQL queries in an offline manner via serialized context. The PR is split into 5 commits for clarity:
SerializableReducerContext
class to store reproducible context (SQL queries, expected errors, and oracle type, etc).SimpleReducer
engine that uses delta-debugging-like reduction to minimize failing SQL inputs. It supports both exception-based and oracle-based reduction (NoRECOracle
,TLPWhereOracle
).TestSimpleReducer.java
class for validation, coveringException
,NoREC
, andTLPWhere
bugs.README.md
.@mrigger @KabilanMA Happy to hear any thoughts or suggestions!