Breaking down my recent Open Source Contribution
I recently landed a PR in RxDB, a fairly large open-source repo. I would like to discuss the issue that my PR addressed, how I came across it, and the entire contribution process.
RxDB, if you're unfamiliar, is a local-first, client-side database. It works by persisting data locally on the client side using a number of storage options and replicating the data to the server in the background. RxDB offers several options to facilitate this replication, one of which is GraphQL - the option I used in my app.
The Problem
To understand the issue I encountered while working with RxDB, I'll need to give a brief overview of GraphQL. GraphQL is a query language that allows us to specify in granular detail the exact data we need from our server. It all starts with a schema which describes the shape of the data and the operations we intend to perform on it:
type User {
  id: ID!
  isActive: Boolean!
  username: String
}
type Todo {
  id: ID
  title: String!
  description: String
  completed: Boolean
  user: User!
}
input CreateTodoInput {
  title: String!
  description: String
}
type Query {
  allTodos: [Todo]
}
type Mutation {
  createTodo(input: CreateTodoInput!): Todo
}
Based on this schema, a client can use a GraphQL query to request data or trigger a mutation. On the server, these queries are validated and executed by resolvers. For example, a query to retrieve allTodos would look like:
query AllTodos {
  allTodos {
    id
    title
    description
    completed
    user {
      id
      isActive
      username
    }
  }
}
And the resulting JSON data would look like:
{
  "data": {
    "allTodos": [
      {
        "id": "1",
        "title": "Buy groceries",
        "description": "Milk, eggs, bread",
        "completed": false,
        "user": {
          "id": "1",
          "isActive": true,
          "username": "johndoe"
        }
      }
    ]
  }
}
The key thing to note here is that GraphQL queries can support deeply nested structures, as seen in the above example where user is nested within each todo. The server's resolver is responsible for resolving a field and all its child properties based on the GraphQL query.
Importantly, when a field resolves to an object type, we need to specify the individual sub-fields in which we're interested. A query that selects the user field without listing any of its sub-fields is rejected with a GraphQLError:
query AllTodos {
  allTodos {
    id
    title
    description
    completed
    user
  }
}
We will be coming back to this later.
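The rule can be illustrated with a small checker. This is a minimal sketch, not how a GraphQL server actually validates queries: the FieldKind, TypeMap and Selection shapes and the findMissingSubfields function are hypothetical names made up for this illustration, and the error text only imitates the real message.

```typescript
// Hypothetical, simplified model of a GraphQL schema: each type maps field
// names to either a scalar marker or the name of another object type.
type FieldKind = { scalar: true } | { object: string };
type TypeMap = Record<string, Record<string, FieldKind>>;

// A selection is a field name plus optional nested selections.
type Selection = { name: string; children?: Selection[] };

const types: TypeMap = {
  Todo: {
    id: { scalar: true },
    title: { scalar: true },
    description: { scalar: true },
    completed: { scalar: true },
    user: { object: "User" },
  },
  User: {
    id: { scalar: true },
    isActive: { scalar: true },
    username: { scalar: true },
  },
};

// Collects an error for every object-typed field selected without subfields.
function findMissingSubfields(typeName: string, selections: Selection[]): string[] {
  const errors: string[] = [];
  for (const sel of selections) {
    const kind = types[typeName]?.[sel.name];
    if (!kind || "scalar" in kind) continue; // unknown fields and scalars are not checked here
    if (!sel.children || sel.children.length === 0) {
      errors.push(`Field "${sel.name}" of type "${kind.object}" must have a selection of subfields.`);
    } else {
      // Nested selections are checked against the nested type.
      errors.push(...findMissingSubfields(kind.object, sel.children));
    }
  }
  return errors;
}

// The failing query above: user is selected without subfields.
const bad = findMissingSubfields("Todo", [{ name: "id" }, { name: "user" }]);
// The earlier, valid query: user lists the subfields it wants.
const good = findMissingSubfields("Todo", [
  { name: "id" },
  { name: "user", children: [{ name: "id" }, { name: "username" }] },
]);

console.log(bad);
console.log(good);
```

Only the first selection produces an error, because user is an object type and no subfields were requested for it.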
Replication in RxDB operates through a push, pull, and streaming mechanism. You can read more about how that works here. To replicate our changes with GraphQL, we need to create push and pull queries. We then pass these queries to the replicateGraphQL function provided by RxDB.
RxDB provides a number of utility functions, such as pullQueryBuilderFromRxSchema, which automatically generate the necessary GraphQL queries from the RxDB database schema:
const RXSchema = {
  version: 0,
  primaryKey: "passportId",
  type: "object",
  properties: {
    passportId: {
      type: "string",
      maxLength: 100,
    },
    firstName: {
      type: "string",
    },
    lastName: {
      type: "string",
    },
    age: {
      type: "integer",
      minimum: 0,
      maximum: 150,
    },
    updatedAt: {
      type: "string",
    },
    address: {
      type: "object",
      properties: {
        street: {
          type: "string",
        },
        city: {
          type: "string",
        },
        zip: {
          type: "string",
        },
      },
    },
  },
};
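// The function takes the collection name plus a generation input and returns a query builder function
const pullQuery = pullQueryBuilderFromRxSchema("human", {
  schema: RXSchema,
  checkpointFields: ["passportId", "updatedAt"],
  deletedField: "_deleted",
});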
When I tried to use the pullQueryBuilderFromRxSchema function, I ran into an error:
GraphQLError: Field \"address\" of type \"HumanModel\" must have a selection of subfields.
After logging the return value of the pullQueryBuilderFromRxSchema function, this is what I got in the query field:
query PullHuman($checkpoint: HumanInputCheckpoint, $limit: Int!) {
  pullHuman(checkpoint: $checkpoint, limit: $limit) {
    documents {
      passportId
      firstName
      lastName
      age
      updatedAt
      address
      _deleted
    }
    checkpoint {
      passportId
      updatedAt
    }
  }
}
Notice that the generated query does not match our RXSchema: the pullHuman query does not include the subfields for address, hence the error. GraphQL requires every selection to end in concrete (scalar) values, so a field with an object type must list the subfields it wants.
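For comparison, here is a hand-written builder that emits the query the server would accept. It is only a sketch under a few assumptions: pullHumanQueryBuilder and the Checkpoint type are hypothetical names, and the returned object simply mirrors the query, operationName and variables keys that appear in the library source quoted in this post.

```typescript
// Hypothetical checkpoint shape, mirroring the checkpoint fields in the query above.
type Checkpoint = { passportId: string; updatedAt: string } | null;

// Hand-written pull query builder. Unlike the generated query, the address
// field lists its subfields explicitly.
const pullHumanQueryBuilder = (checkpoint: Checkpoint, limit: number) => {
  const query = `query PullHuman($checkpoint: HumanInputCheckpoint, $limit: Int!) {
  pullHuman(checkpoint: $checkpoint, limit: $limit) {
    documents {
      passportId
      firstName
      lastName
      age
      updatedAt
      address {
        street
        city
        zip
      }
      _deleted
    }
    checkpoint {
      passportId
      updatedAt
    }
  }
}`;
  return { query, operationName: "PullHuman", variables: { checkpoint, limit } };
};

// First pull: no checkpoint yet, fetch up to 10 documents.
const request = pullHumanQueryBuilder(null, 10);
console.log(request.query);
```

Writing the query by hand works, but it has to be kept in sync with the schema manually, which is the chore the generated builder is meant to remove.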
Searching the RxDB repo (at the time), we can identify the issue in the source of pullQueryBuilderFromRxSchema:
export function pullQueryBuilderFromRxSchema(
  collectionName: string,
  input: GraphQLSchemaFromRxSchemaInputSingleCollection,
): RxGraphQLReplicationPullQueryBuilder<any> {
  ...
  const outputFields = Object.keys(schema.properties).filter(k => !(input.ignoreOutputKeys as string[]).includes(k));
  // outputFields.push(input.deletedField);
  const checkpointInputName = ucCollectionName + 'Input' + prefixes.checkpoint;
  const builder: RxGraphQLReplicationPullQueryBuilder<any> = (checkpoint: any, limit: number) => {
    const query = 'query ' + operationName + '($checkpoint: ' + checkpointInputName + ', $limit: Int!) {\n' +
      SPACING + SPACING + queryName + '(checkpoint: $checkpoint, limit: $limit) {\n' +
      SPACING + SPACING + SPACING + 'documents {\n' +
      SPACING + SPACING + SPACING + SPACING + outputFields.join('\n' + SPACING + SPACING + SPACING + SPACING) + '\n' +
      SPACING + SPACING + SPACING + '}\n' +
      SPACING + SPACING + SPACING + 'checkpoint {\n' +
      SPACING + SPACING + SPACING + SPACING + input.checkpointFields.join('\n' + SPACING + SPACING + SPACING + SPACING) + '\n' +
      SPACING + SPACING + SPACING + '}\n' +
      SPACING + SPACING + '}\n' +
      '}';
    return {
      query,
      operationName,
      variables: {
        checkpoint,
        limit
      }
    };
  };
  return builder;
}
The issue lies with the outputFields variable, which only considers the top-level keys of schema.properties. A nested object such as address is therefore emitted as a bare field name with no subfields, which is exactly what the GraphQL server rejected.
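The behaviour is easy to reproduce outside RxDB. The snippet below is a minimal sketch: the Property type and the trimmed-down properties object are hypothetical stand-ins for the real schema types, kept just large enough to show the effect of Object.keys.

```typescript
// Trimmed-down, hypothetical schema shape used only for this illustration.
type Property = { type: string; properties?: Record<string, Property> };

const properties: Record<string, Property> = {
  passportId: { type: "string" },
  firstName: { type: "string" },
  address: {
    type: "object",
    properties: {
      street: { type: "string" },
      city: { type: "string" },
      zip: { type: "string" },
    },
  },
};

// Same approach as the builder above: only top-level keys are collected.
const shallowFields = Object.keys(properties);

// Object-typed fields that end up in the query without any subfield selection.
const bareObjectFields = shallowFields.filter((key) => properties[key]?.type === "object");

console.log(shallowFields.join("\n"));
console.log(bareObjectFields);
```

Object.keys never looks inside address, so street, city and zip are lost and address is the one field left without a selection.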
Making the contribution
The issue seemed relatively straightforward to fix. It didn't involve the core database code and was isolated enough for me to understand without knowing the larger codebase.
Following the instructions, I opened an issue and submitted a failing test PR to illustrate the problem here. After receiving feedback from the contributor, I opened a PR which was merged a few days later.
The solution was to process the output fields recursively based on the schema. Here's the core function I added to do that:
type GenerateGQLOutputFieldsOptions = {
  schema: RxJsonSchema<any> | TopLevelProperty;
  spaceCount?: number;
  depth?: number;
  ignoreOutputKeys?: string[];
};
function generateGQLOutputFields(options: GenerateGQLOutputFieldsOptions) {
  const { schema, spaceCount = 4, depth = 0, ignoreOutputKeys = [] } = options;
  const outputFields: string[] = [];
  const properties = schema.properties;
  const NESTED_SPACING = SPACING.repeat(depth);
  const LINE_SPACING = SPACING.repeat(spaceCount);
  for (const key in properties) {
    // only skipping top-level keys that are in the ignoreOutputKeys list
    if (ignoreOutputKeys.includes(key)) {
      continue;
    }
    const value = properties[key];
    if (value.type === "object") {
      outputFields.push(
        LINE_SPACING + NESTED_SPACING + key + " {",
        generateGQLOutputFields({
          schema: value,
          spaceCount,
          depth: depth + 1,
        }),
        LINE_SPACING + NESTED_SPACING + "}"
      );
    } else {
      outputFields.push(LINE_SPACING + NESTED_SPACING + key);
    }
  }
  return outputFields.join("\n");
}
I rarely use recursion in my day job, so it was nice to see a clear use case here.
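To see the recursion at work without the RxDB types, here is a compact standalone variant of the same idea. It is a sketch, not the merged code: SchemaNode and renderOutputFields are hypothetical names, and SPACING is assumed to be a single space.

```typescript
// Hypothetical, simplified stand-ins for the RxDB types and the SPACING constant.
type SchemaNode = { type?: string; properties?: Record<string, SchemaNode> };
const SPACING = " "; // assumption: one space per indentation unit

function renderOutputFields(schema: SchemaNode, spaceCount = 4, depth = 0): string {
  // Base indentation plus one extra unit per nesting level.
  const indent = SPACING.repeat(spaceCount + depth);
  const lines: string[] = [];
  for (const [key, value] of Object.entries(schema.properties ?? {})) {
    if (value.type === "object") {
      // Object fields get a nested selection built by the recursive call.
      lines.push(indent + key + " {", renderOutputFields(value, spaceCount, depth + 1), indent + "}");
    } else {
      lines.push(indent + key);
    }
  }
  return lines.join("\n");
}

const humanSchema: SchemaNode = {
  type: "object",
  properties: {
    passportId: { type: "string" },
    address: {
      type: "object",
      properties: { street: { type: "string" }, city: { type: "string" } },
    },
  },
};

const rendered = renderOutputFields(humanSchema, 2);
console.log(rendered);
```

With a base indentation of 2, address is printed with an opening brace, its two subfields are indented one unit further, and the closing brace lines up with the field name.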
Learnings
This PR is my biggest contribution to a major open-source project, and I'm quite pleased with the results. Mitchell's blog post on Contributing to Complex Projects served as a helpful guide in this entire process.
My key takeaway is that it's easier to contribute to open-source when you are solving an issue you encountered while using the library. Since I understood the bug I fixed, I was in the best position to implement a fix. However, this may not always be the case.
I hope you enjoyed this walkthrough of the contribution. I'm interested in hearing your thoughts and feedback.
Until next time.