GraphQL Cyclic Queries and Depth Limiting

The relational nature of GraphQL can be turned against you: deep, cyclic queries can make your API crawl under the load and eventually crash. That is a Denial of Service. Learn how it works and how you can protect your API!


You get back to work on a Monday to find that the GraphQL-based social network you have been developing, and have just opened to the public, is extremely slow to process requests. Yet everything was working fine until now: response times were low and users were sending very positive feedback. According to your monitoring platform, there are no more requests than usual, and possibly fewer. This is very strange: with fewer requests, the backend should have an easier time handling them all.

Looking at the requests themselves, it turns out that they differ from those normally sent by your Progressive Web Application. They were made through the API you just opened to the public. The requests generated by your application's frontend are now a minority, probably because users fed up with the long loading times have abandoned your application.

Investigating the problem

The queries in question have a very unusual structure: they reuse the same objects many times in a cyclical manner. Many selections are nested within each other, starting from a search function called with an empty search string.

Here is one such query:

query UnknownQuery {
    searchGroups(name: "", limit: 1000000) {
        users {
            groups {
                users {
                    groups {
                        users {
                            id
                        }
                    }
                }
            }
        }
    }
}

This query searches for groups matching an empty string, which returns all groups. It then fetches the users of each of those groups, the groups each of those users belongs to, the members of those groups, the groups of those members, and finally the members of those groups, whose IDs are returned. If there are 1,000 groups with an average of 20 members each, and each user belongs to an average of 5 groups, this gives about 1000 x 20 x 5 x 20 x 5 x 20 = 200 million identifiers returned per query. When such requests multiply, the application needs a great deal of computing power and bandwidth to keep up with the load, which significantly slows down the company's application.
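The estimate above is simply the product of the average fan-out at each nesting level. A minimal sketch of that arithmetic, using the averages assumed in the text (the function name `estimate_leaf_count` is hypothetical):

```python
from math import prod

def estimate_leaf_count(fan_outs):
    """Multiply the average fan-out of each nesting level to estimate
    how many leaf values a nested query returns."""
    return prod(fan_outs)

# Averages from the text:
# 1000 groups -> 20 users -> 5 groups -> 20 users -> 5 groups -> 20 users
levels = [1000, 20, 5, 20, 5, 20]
total_ids = estimate_leaf_count(levels)
print(f"{total_ids:,} identifiers")  # 200,000,000 identifiers

# One more groups/users cycle multiplies the result by another 5 x 20 = 100.
one_more_cycle = estimate_leaf_count(levels + [5, 20])
print(f"{one_more_cycle:,} identifiers")  # 20,000,000,000 identifiers
```

Each extra cycle multiplies the size of the response, which is why a handful of nesting levels is enough to overwhelm a backend.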

What is going on?

Malicious queries take advantage of a feature of many graph databases: their cyclical nature. Cycles arise when two or more object types reference each other. In our case study, a group is a collection of users, and the GraphQL API lets us look up the users of each group. It is also useful to know which groups a user belongs to, so the API exposes the groups of a given user as well.

The problem arises when these two relationships are combined in a single query. In our situation, this happens when a traversal that starts from a “group” node leads back to other “group” nodes. The cycle can then be repeated many times to build queries that both require a large graph traversal and return a huge number of elements. The database engine that runs these queries is put under heavy load by the traversal and, provided it manages to complete the query, the network is then strained by the large amount of data to send back to the client.
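To make the cycle visible, here is a small sketch that models the schema as a graph of types and looks for a path that loops back on itself. The `SCHEMA` dictionary is a hypothetical, simplified view of the social network's types (scalar fields such as `id` and `name` are left out), and `find_cycle` is an illustrative helper rather than part of any GraphQL library.

```python
# Hypothetical, simplified schema: type name -> {field name: target type}
SCHEMA = {
    "Query": {"searchGroups": "Group", "searchUsers": "User"},
    "Group": {"users": "User"},
    "User": {"groups": "Group"},
}

def find_cycle(schema, start):
    """Depth-first search that returns the first chain of types
    looping back on itself, or None if there is no cycle."""
    def visit(type_name, path):
        if type_name in path:
            # We came back to a type already on the current path: cycle found.
            return path[path.index(type_name):] + [type_name]
        for target in schema.get(type_name, {}).values():
            cycle = visit(target, path + [type_name])
            if cycle:
                return cycle
        return None
    return visit(start, [])

print(find_cycle(SCHEMA, "Query"))  # ['Group', 'User', 'Group']
```

The `Group -> User -> Group` loop is exactly what the malicious query walks through again and again.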

However, this behavior is not wrong in itself, since graph databases are precisely designed to represent data as nodes and relationships. Cyclic queries can be perfectly legitimate. In our case, it could be interesting to know which groups are most represented among the members of a cooking group, with a query like the following:

query GroupsLinkedToCookingGroup {
    searchGroups(name: "Cooking", limit: 1) {
        users {
            groups {
                name
            }
        }
    }
}

For each user of the "Cooking" group, this query returns the names of the groups to which they belong.

We could also be interested in members having more than one group in common, with a query such as the following:

query UsersLinkedToJohnDoe {
    searchUsers(name: "John Doe", limit: 1) {
        groups {
            users {
                id
                name
            }
        }
    }
}

It would be counterproductive to prevent these cycles by removing the ability to look up a member's groups or a group's members. Fortunately, most GraphQL engines can restrict the depth of the queries they accept, so the notion of query depth is the one to focus on.

Moreover, the search functions exposed here place no particular limit on the volume of data they can return. With a large enough result set, the GraphQL engine can also saturate very quickly.
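One common way to bound the volume of data is to cap the `limit` argument inside the resolver. The sketch below is only an illustration under stated assumptions: `MAX_PAGE_SIZE` and `clamp_limit` are hypothetical names, and the ceiling of 100 is an arbitrary value to adapt to your own use cases.

```python
MAX_PAGE_SIZE = 100  # hypothetical ceiling; tune it to your own use cases

def clamp_limit(requested, max_page_size=MAX_PAGE_SIZE):
    """Return a safe page size: at least 1, at most max_page_size.
    A missing limit falls back to the ceiling instead of 'everything'."""
    if requested is None:
        return max_page_size
    return max(1, min(requested, max_page_size))

print(clamp_limit(1_000_000))  # 100
print(clamp_limit(20))         # 20
print(clamp_limit(None))       # 100
```

A resolver such as `searchGroups` would call `clamp_limit` on its `limit` argument before querying the database. Rejecting oversized limits with an error is an equally valid design; clamping just keeps well-behaved clients working.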




Remediation

Depth is indeed an interesting aspect to investigate since it can be limited in GraphQL. Let's take our malicious query again:

query DDoS {                                  # Depth Level = 0
    searchGroups(name: "", limit: 1000000) {  # Depth Level = 1
        users {                               # Depth Level = 2
            groups {                          # Depth Level = 3
                users {                       # Depth Level = 4
                    groups {                  # Depth Level = 5
                        users {               # Depth Level = 6
                            id                # Depth Level = 7
                        }
                    }
                }
            }
        }
    }
}

Here, the depth level reaches 7, but it could go much higher if the cyclic nesting were continued.
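To make the notion of depth concrete, here is a rough sketch that measures it by counting nested selection sets in the query text, following the numbering used above (the operation is level 0). It is a simplification for illustration: `query_depth` and `MAX_DEPTH` are hypothetical names, fragments are not followed, and real depth-limiting libraries work on the parsed query and may count levels slightly differently.

```python
def query_depth(query: str) -> int:
    """Return the deepest selection-set nesting of a GraphQL query string.

    Braces found inside strings, comments, or argument lists
    (input objects) are ignored. Fragments are not resolved.
    """
    depth = max_depth = parens = 0
    in_string = in_comment = False
    prev = ""
    for ch in query:
        if in_comment:
            in_comment = ch != "\n"          # a comment ends at end of line
        elif in_string:
            in_string = not (ch == '"' and prev != "\\")
        elif ch == "#":
            in_comment = True
        elif ch == '"':
            in_string = True
        elif ch == "(":
            parens += 1                      # entering an argument list
        elif ch == ")":
            parens -= 1
        elif parens == 0 and ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif parens == 0 and ch == "}":
            depth -= 1
        prev = ch
    return max_depth

MALICIOUS = """
query DDoS {
  searchGroups(name: "", limit: 1000000) {
    users { groups { users { groups { users { id } } } } }
  }
}
"""

MAX_DEPTH = 5  # hypothetical threshold
print(query_depth(MALICIOUS))              # 7
print(query_depth(MALICIOUS) > MAX_DEPTH)  # True: this query would be rejected
```

Because the check only needs the query text, it can run before any resolver is called, which is what makes depth limiting cheap.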

Many GraphQL implementations provide a specific parameter for this. Once it is set to a given value, the GraphQL engine automatically rejects queries that exceed this depth level without even starting to evaluate them.

There are several ways to proceed, depending on the GraphQL engine you use:

  • Apollo, Express GraphQL, GraphQL Node:

Query depth can be limited with the graphql-depth-limit package (available through npm). Once the package is installed, all you have to do is supply a validation rule when the application is initialised:

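const depthLimit = require('graphql-depth-limit');
// `graphqlServer` stands for your server middleware, e.g. graphqlHTTP from express-graphql
app.use('/graphql', graphqlServer({
	schema, // your GraphQL schema
	validationRules: [depthLimit(7)] // reject queries nested deeper than the limit
}));

  • Hasura Cloud: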

From the admin panel in Hasura Cloud, you can set API limits (under API > Security > API Limits), including a depth limit, for existing user groups.

  • Graphene:

The depth limit is set through the “max_depth” parameter of depth_limit_validator, which is supplied as a rule to the query validator (graphql.validate) as follows:

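from graphql import parse, validate
from graphene.validation import depth_limit_validator

# `schema` is your graphene.Schema instance
validation_errors = validate(
    schema=schema.graphql_schema,
    document_ast=parse("THE QUERY"),
    # rules must be a sequence: the trailing comma makes this a tuple
    rules=(
        depth_limit_validator(max_depth=7),
    ),
)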

Query duration offers another angle: limiting execution time with a "security timeout". The idea is no longer to block overly demanding requests before they run, but to monitor their execution time and stop them if they take too long. Depending on the GraphQL implementation, this limit is enforced in the GraphQL engine (at the resolver level), at the server level (an HTTP timeout, for example), or both. It can be used in addition to the depth limit.
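As an illustration of the timeout idea, the sketch below wraps query execution in a time limit using Python's asyncio. It assumes an asynchronous execution path: `run_with_timeout`, `slow_query`, and `fast_query` are hypothetical stand-ins, not part of any GraphQL engine, and a blocking synchronous resolver would not be interrupted this way.

```python
import asyncio

async def run_with_timeout(execution, timeout_s):
    """Await the coroutine `execution`; cancel it and return a
    GraphQL-style error payload if it exceeds timeout_s seconds."""
    try:
        return await asyncio.wait_for(execution, timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"errors": [{"message": f"Query exceeded the {timeout_s}s time limit"}]}

async def slow_query():
    # Stand-in for an expensive cyclic traversal (hypothetical).
    await asyncio.sleep(1)
    return {"data": {"searchGroups": []}}

async def fast_query():
    # Stand-in for a normal, cheap query (hypothetical).
    return {"data": {"searchGroups": []}}

result_slow = asyncio.run(run_with_timeout(slow_query(), timeout_s=0.05))
result_fast = asyncio.run(run_with_timeout(fast_query(), timeout_s=0.05))
print(result_slow)  # error payload: the slow query was cancelled
print(result_fast)  # normal data payload
```

Unlike the depth limit, the work has already started when the timeout fires, so this protects response times but still spends some resources on each abusive query.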

Bear in mind that these limits apply to all queries. Setting the maximum depth level to 2, for instance, or the timeout to 1 second will block most queries. It is therefore up to you to assess the maximum reasonable complexity of a query, for instance by reviewing your users’ use cases. The tool must remain usable while being robust against attacks. These parameters can also be adjusted afterwards, if you realize that certain needs require a greater query depth or a longer timeout, or if you make new information or functions available to the public.

Conclusion

In conclusion, the sudden performance slowdown of our GraphQL-based social network was a result of malicious queries exploiting cyclic relationships within our data. These queries caused excessive load on our application, leading to a significant degradation in response times. While GraphQL's natural support for graph databases is not inherently flawed, we need to implement safeguards to prevent abuse.

We've discussed two key remediation strategies:

  1. Query Depth Limitation: By restricting the depth of queries, we can prevent overly complex and resource-intensive queries from being executed. Different GraphQL engines offer various methods to set depth limits, such as packages like graphql-depth-limit, API configuration in Hasura Cloud, or rules for query validation in Graphene.
  2. Query Timeout: Implementing query timeouts allows us to monitor query execution times and terminate those that take too long. This complements depth limitations and ensures that queries don't monopolize resources.

It's essential to strike a balance between security and usability, setting reasonable limits based on your users' needs while protecting against potential attacks. Regularly assess and adjust these parameters to accommodate changing requirements and maintain a robust and responsive GraphQL application.
