bad performance and high memory usage when returning large array of nested objects #2605

@Felk

This is essentially a copy of smallrye/smallrye-graphql#930, which I originally opened against SmallRye GraphQL. The suspicion there was that the problem lies in graphql-java itself.

I encountered a performance issue when returning a large array of nested objects in response to a GraphQL query, and I am wondering whether this would be considered an actual performance issue or out of scope. Here is a minimal test class that illustrates the issue, using graphql-java 17.3 with Java 17:

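// Imports assumed for this snippet: JUnit 5 for @Test and Lombok for @Value.
import static graphql.schema.idl.TypeRuntimeWiring.newTypeWiring;

import graphql.GraphQL;
import graphql.schema.GraphQLSchema;
import graphql.schema.idl.RuntimeWiring;
import graphql.schema.idl.SchemaGenerator;
import graphql.schema.idl.SchemaParser;
import graphql.schema.idl.TypeDefinitionRegistry;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import lombok.Value;
import org.junit.jupiter.api.Test;

class GraphQLPerformanceTest {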
    private static final List<SomeWrapper> objects = IntStream
            .range(0, 10_000_000)
            .mapToObj(i -> new SomeWrapper("value #" + i))
            .collect(Collectors.toList());

    @Value
    public static class SomeWrapper {
        String someValue;
    }

    private final GraphQL graphQL;

    public GraphQLPerformanceTest() {
        TypeDefinitionRegistry typeDefinitionRegistry = new SchemaParser().parse(
                """
                type Query {
                    giveMeLargeResponse: [SomeWrapper]
                }
                type SomeWrapper {
                    someValue: String
                }
                """
        );
        RuntimeWiring wiring = RuntimeWiring.newRuntimeWiring()
                .type(newTypeWiring("Query")
                        .dataFetcher("giveMeLargeResponse", env -> objects))
                .build();
        GraphQLSchema schema = new SchemaGenerator().makeExecutableSchema(typeDefinitionRegistry, wiring);
        this.graphQL = GraphQL.newGraphQL(schema).build();
    }

    @Test
    void testPerformance() {
        graphQL.execute(
            """
            query {
                giveMeLargeResponse {
                    someValue
                }
            }
            """
        );
    }
}

The request takes roughly 12 s to return on my Dell Precision 7560 notebook with an Intel(R) Core(TM) i9-11950H CPU @ 2.60GHz, and it intermittently uses over 4 GB of RAM.

I know 10 million entries is a lot, but I expected the query to return much faster, given that it only needs to serialize and output the data. The majority of the time seems to be spent within completeValueForList:

[screenshot]

The flame graph in IntelliJ looks something like this; I couldn't see any obvious low-hanging fruit:
[screenshot]
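
As a rough point of comparison for the "it only needs to serialize and output the data" expectation, the following is a minimal sketch of a hand-written baseline. It is not part of the original measurements and does not use graphql-java at all: it only builds one map per list element, which is roughly the shape of the data a GraphQL result carries. `BaselineSketch`, `toResultShape`, and the record standing in for the Lombok `SomeWrapper` class are hypothetical names introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BaselineSketch {
    // Hypothetical stand-in for the Lombok-based SomeWrapper in the test class.
    record SomeWrapper(String someValue) {}

    // Builds one map per element, keyed by field name. This is an assumed
    // approximation of the result shape, not graphql-java's actual code path.
    static List<Map<String, Object>> toResultShape(List<SomeWrapper> objects) {
        List<Map<String, Object>> result = new ArrayList<>(objects.size());
        for (SomeWrapper o : objects) {
            Map<String, Object> fields = new LinkedHashMap<>();
            fields.put("someValue", o.someValue());
            result.add(fields);
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults to 1 million entries; pass 10000000 to match the test above
        // (that size may need a larger heap, e.g. via -Xmx).
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        List<SomeWrapper> objects = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            objects.add(new SomeWrapper("value #" + i));
        }

        long start = System.nanoTime();
        List<Map<String, Object>> result = toResultShape(objects);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(result.size() + " entries in " + elapsedMs + " ms");
    }
}
```

Comparing this number with the `graphQL.execute(...)` time on the same machine gives an approximate idea of how much of the 12 s is overhead beyond building the result itself. A single unwarmed run like this is only indicative; a proper benchmark harness would be needed for reliable figures.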

I have attached the IntelliJ profiling data and the VisualVM snapshot data:

GraphQLPerformanceTest_testPerformance_2021_11_04_111402.zip
graphql-java-performance-snapshot.zip
