GraphQL as a Database Query Language
January 31, 2018
The GraphQL ecosystem has grown quickly, with development tools, client-side framework integrations, reflection-based generators, and backend frameworks. The least explored area so far is GraphQL as a native data query language interpreted on the database server itself. Taking cues from projects like Dgraph (a graph database queried with a GraphQL-derived language), PostGraphile (a GraphQL endpoint generated from a PostgreSQL schema), and tuql (a GraphQL endpoint generated from an SQLite database), I propose the development of a new datastore that uses GraphQL as its fully native query language. The key advantages of a database that understands GraphQL queries are expressive metadata queries, optimized storage, and reduced overhead.
GraphQL can enrich system metadata queries with relevant information. Imagine using GraphQL to query system catalogs, user info, or runtime statistics. For example, a query for the running databases could return, nested under each database, every user’s name, permission level, and the timestamp of their latest query, without table joins that multiply the number of result rows.
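As a rough illustration of the difference in result shape, the following Python sketch uses an in-memory SQLite database as a stand-in for system catalogs. The table names, the column names, and the GraphQL query held in `METADATA_QUERY` are all hypothetical; the sketch only contrasts the rows a SQL join produces with the nested result a GraphQL-style query would return.

```python
import sqlite3

# Hypothetical catalog tables; the names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE databases (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        database_id INTEGER REFERENCES databases(id),
        name TEXT,
        permission_level TEXT,
        last_query_at TEXT
    );
    INSERT INTO databases VALUES (1, 'orders'), (2, 'analytics');
    INSERT INTO users VALUES
        (1, 1, 'alice', 'admin', '2018-01-30T10:00:00'),
        (2, 1, 'bob',   'read',  '2018-01-30T11:30:00'),
        (3, 2, 'carol', 'write', '2018-01-31T09:15:00');
""")

# The GraphQL query this sketch imitates, against a hypothetical system schema.
METADATA_QUERY = """
{
  databases {
    name
    users { name permissionLevel lastQueryAt }
  }
}
"""

def joined_rows(conn):
    """SQL join: one row per (database, user) pair, repeating the database name."""
    return conn.execute("""
        SELECT d.name, u.name, u.permission_level, u.last_query_at
        FROM databases d JOIN users u ON u.database_id = d.id
        ORDER BY d.id, u.id
    """).fetchall()

def nested_result(conn):
    """GraphQL-shaped result: one object per database with its users nested inside."""
    databases = conn.execute("SELECT id, name FROM databases ORDER BY id").fetchall()
    result = []
    for db_id, db_name in databases:
        users = conn.execute(
            "SELECT name, permission_level, last_query_at FROM users "
            "WHERE database_id = ? ORDER BY id",
            (db_id,),
        ).fetchall()
        result.append({
            "name": db_name,
            "users": [
                {"name": n, "permissionLevel": p, "lastQueryAt": t}
                for n, p, t in users
            ],
        })
    return result
```

With this sample data the join yields one row per user, while the nested result yields one entry per database, so the number of top-level results stays equal to the number of databases no matter how many users each one has.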
At the same time, a GraphQL language layer that drives the execution planner has the opportunity to optimize how data is stored. Placing the GraphQL interpreter closer to the storage engine lets the database server, rather than the application client, decide how best to store the data. Just as column-oriented databases store the values of a column next to each other instead of placing full rows side by side as row-based databases do, a GraphQL-oriented database could group the values of a type together. Indices could even be inferred from the schema, based on the arguments specified in the schema’s mutations.
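To make the storage idea concrete, here is a minimal Python sketch, not a real storage engine. It assumes a hypothetical `MUTATIONS` table describing mutation signatures, a hypothetical `infer_indexed_fields` helper that treats every mutation argument as an index candidate, and a hypothetical `TypeStore` class that keeps the values of one type together, one list per field.

```python
from collections import defaultdict

# Hypothetical mutation signatures, as they might be read from a schema.
MUTATIONS = {
    "deleteUserByEmail": {"type": "User", "args": ["email"]},
    "deletePost": {"type": "Post", "args": ["id"]},
}

def infer_indexed_fields(mutations, type_name):
    """Collect the argument names of every mutation that targets type_name."""
    fields = set()
    for spec in mutations.values():
        if spec["type"] == type_name:
            fields.update(spec["args"])
    return fields

class TypeStore:
    """Keeps all values of one type together, one list per field."""

    def __init__(self, type_name, fields, indexed_fields=()):
        self.type_name = type_name
        self.size = 0
        self.columns = {field: [] for field in fields}
        # One hash index per indexed field: value -> row positions.
        self.indexes = {
            field: defaultdict(list)
            for field in indexed_fields
            if field in self.columns
        }

    def insert(self, **values):
        """Append one object; fields that are not supplied are stored as None."""
        position = self.size
        for field, column in self.columns.items():
            column.append(values.get(field))
        for field, index in self.indexes.items():
            index[values.get(field)].append(position)
        self.size += 1
        return position

    def find(self, field, value, select):
        """Return the selected fields of every row where field == value."""
        if field in self.indexes:
            positions = self.indexes[field].get(value, [])
        else:
            # No index for this field: fall back to scanning its column.
            positions = [i for i, v in enumerate(self.columns[field]) if v == value]
        return [{f: self.columns[f][i] for f in select} for i in positions]

users = TypeStore(
    "User",
    fields=["id", "email", "name"],
    indexed_fields=infer_indexed_fields(MUTATIONS, "User"),
)
users.insert(id=1, email="alice@example.com", name="Alice")
users.insert(id=2, email="bob@example.com", name="Bob")
```

Here a lookup by `email` goes through the inferred hash index, while a lookup by `name` scans the `name` column. A real engine would need a more careful heuristic than indexing every mutation argument.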
With the language layer running on the database server itself, there would be less tooling to manage at deployment time. Instead of a GraphQL app server fronting a database server, a GraphQL database replaces two processes with one and removes a point of failure. Furthermore, every GraphQL query would be executed directly on the database server, saving the network round-trips between app server and database that complex queries otherwise require.
A proof-of-concept implementation of these features could be a PostgreSQL extension that adds a GraphQL interpreter. This would save work by reusing an existing storage engine. Further optimizations to the storage layer could be developed later, once this strategy proves useful.
To reduce the friction of managing such a database, system operations should be exposed as GraphQL mutations. This could take the form of a default “system” GraphQL schema, analogous to the system catalogs and tables in other databases. Types could replace table or collection definitions, mutations could replace system commands, and subscriptions could replace triggers.
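One way such a “system” schema might look is sketched below in Python. The schema text in `SYSTEM_SCHEMA` and the `SystemCatalog` class are hypothetical and exist only for illustration: the class is an in-memory stand-in for the resolvers, with a mutation playing the role of a CREATE TABLE command and a subscription playing the role of a trigger.

```python
# Hypothetical "system" schema in GraphQL SDL; every name here is illustrative.
SYSTEM_SCHEMA = """
type TypeInfo {
  name: String!
  fields: [String!]!
}

type Query {
  types: [TypeInfo!]!
}

type Mutation {
  createType(name: String!, fields: [String!]!): TypeInfo!
  dropType(name: String!): Boolean!
}

type Subscription {
  typeCreated: TypeInfo!
}
"""

class SystemCatalog:
    """In-memory stand-in for the resolvers behind the schema above."""

    def __init__(self):
        self._types = {}
        self._type_created_listeners = []

    def types(self):
        """Query.types: list every defined type, like reading a catalog table."""
        return [{"name": n, "fields": list(f)} for n, f in self._types.items()]

    def create_type(self, name, fields):
        """Mutation.createType: plays the role of CREATE TABLE."""
        if name in self._types:
            raise ValueError(f"type {name!r} already exists")
        self._types[name] = list(fields)
        info = {"name": name, "fields": list(fields)}
        # Notify subscribers after the type has been stored.
        for listener in self._type_created_listeners:
            listener(info)
        return info

    def drop_type(self, name):
        """Mutation.dropType: plays the role of DROP TABLE."""
        return self._types.pop(name, None) is not None

    def on_type_created(self, listener):
        """Subscription.typeCreated: plays the role of a trigger."""
        self._type_created_listeners.append(listener)

catalog = SystemCatalog()
events = []
catalog.on_type_created(events.append)
catalog.create_type("User", ["id", "email", "name"])
```

After the `createType` call, the new type shows up in `catalog.types()` and the subscriber has received one event describing it.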
Ultimately, such a database would make data querying and manipulation more powerful by expressing nested relationships in natural GraphQL syntax. With GraphQL as the only language on the database side, developers would need to make fewer context switches.