Six steps to success when reporting open-source computational chemistry issues

9 minute read

Published:

I’ve been working with open-source computational chemistry tools for about a decade, half of which has been in a professional capacity. I’ve worn about every hat - developer, maintainer, confused user, poweruser, typo-producer, and combinations thereof - so I’ve become more familiar with how the sausage is made than I maybe originally hoped to.

A persistent problem in this space is a mismatch between the supply and demand of maintenance: users, especially of popular and valuable libraries, have feature requests and bug reports, but maintainers are almost universally stretched thin on both time and attention. I’ve been on every side of this - a maintainer who falls behind his responsibility, a user who reported bugs to un-maintained projects, a developer who contributed patches to projects that were on their path to abandonment, the list goes on.

Below are a few steps that at best can resolve issues without needing to get a maintainer’s attention and at worst make it as easy as possible for them to help. They are ordered intentionally and could serve as either a programmatic checklist to work through while figuring out a bug or as a general framework in which to think about communicating them.

Check to see if the project is active

It’s an awkward discovery to make, but it happens occasionally. The tool you’ve been using for a while, or maybe even built a business-critical product out of, hasn’t had a release in a while. You’re worried it’s completely abandoned. There’s unfortunately no clear boundary between a project being only a little dusty and simply too rusty to use. The three things I look for, and general timeframes, are

  • Any commits in the past year, even ones that look trivial or are authored by bots.
  • Any activity in the past two years from maintainers/authors on GitHub, even if it’s only an apology that they’re behind on things.
  • Any releases in the past three years, a rough time period after which a release is unlikely to work with new upstreams.

Sometimes a project can be sunset for good reasons, such as being replaced by something newer and better. More commonly, the maintainers have moved on and either signaled the end of the project or no intent to add or welcome new features. No matter the reason, the top of the README file is the place to check. Precisely what to do if a project is pushed into maintenance-only mode is trickier topic and a discussion for another day. But if there is a signal, explicitly or implicitly, that the project is not being maintained, there is less reason to continue through the next steps.

Search the documentation

Hopefully the tools you’re using have some documentation, even if it’s only automatically-generatic or hastily-written code snippets in a README file. No matter what shape it takes, it’s there to be used; the cost that developers (and organizations more broadly) sink into writing and providing documentation is done with the intention that it enables you to use the software with minimal friction. It’s usually built up over time as incremental responses to user feedback and support requests, so common use cases are more likely to be included in example code and prose. This is especially true for older and more mature libraries. For example, loading a SMILES string is basically at the top of RDKit’s documentation, so even a non-chemist can learn how to do that without digging into the guts of the toolkit.

The documentation should be easily accessible, i.e. one click from a GitHub landing page. (If it’s not, or if a link is broken/outdated, the maintainers would probably like to know!) Hopefully it has features such as a navigation pane to organize common topics and a search bar to find something by keyword. It’s natural to reach for Google to search across other websites, but be aware that it likes to index and provide old versions of documentation, even if it’s years out of date and no longer accurate.

Search for existing issues

Users (like you!) are often doing exotic and novel things, especially in open-source computational chemistry which is populated by people who want to push the edges of what’s possible. Sometimes, however, somebody else out there has tried what you’re trying to do and has left a paper trail on GitHub. Older and/or closed issues likely include discussion from maintainers and other users that is helpful. Frequently, older issues don’t precisely match what the user is after but provides some useful context or code snippets that can be adapted to a new use case.

GitHub’s default search tool works fine for this. I don’t recommend using LLMs for this, currently, as training data is sparse and hallucinations are common and harmful.

Even if you’re coming across a corner case that is completely new, signaling to maintainers that you made an effort to search for existing issues is an easy way to build goodwill. (Think “I searched for ‘foo’ and ‘bar’ but couldn’t find any prior discussion of combining it with ‘baz’, so … “.)

Attempt to isolate the behavior in a fresh environment

Even when working in virtual environments (synonymously conda environments), it’s easy for local versions of libraries to diverge from released versions. This is especially common when working with unreleased software, such as installing against a git repo or developing off of feature branch(es).

Virtual environments are designed to be created quickly and easily. This means the cost of deleting and re-creating an environment should be trivial. Try reproducing the error in a fresh environment; if the issue persists, it’s likely to be genuine. Sometimes, however, the proverbial ghost in the machine can be shooed away just by blowing away a virtual environment.

Produce a minimal reproducible example

If the issue appears to be new and can be reproduced in a fresh virtual environment, try to wrap it into a self-contained example, or minimal reproducing example (MRE). The goal is to isolate the problematic behavior (reproduce) with as little code as possible (minimal) and can usually be done in a single script (example). This is standard practice in the field, so plenty is already written about it.

This may seem like duplicated effort (I just showed the error, isn’t that enough?) but for a maintainer to dig into the root cause of an issue, they need to be able to get to your state on their machine. This is only possible by installing the versions of libraries you used and running a script(s) that reproduce the problematic behavior in a self-contained manner. Think about coming into the problem but without the time spend to get there; the maintainer is playing catch-up to everything you’ve doing before discovering the bug.

Remember the human

Most open-source tools are maintained by people who are not - and have not, and likely will not ever be - paid for doing so. As is stated in most licenses, these tools are distributed for free but without warranty or guarantees of fitness. The person dealing with your bug report or feature request is likely doing so in their free time with no expectation of compensation. Even the (relatively few) people who maintain open source tools as part of their day job tend to be overburdened with other responsibilities and aren’t able to budget large portions of their working hours to issue triage, user support, and bug fixes.

It may seem like each of the above steps are self-serving to open-source maintainers, simply diverting users to other solutions than directly getting their attention and support or making the users do some of the work. That’s precisely what’s going on here:

  1. Documentation is a cheap way of demonstrating common use cases, which are probably sufficient for a majority of the user base.
  2. Archiving disussion (and bug reports, user support, etc.) on GitHub provides a relatively straightforward way for issues to not be re-solved repeatedly and for users (including future users) to help each other.
  3. Re-creating creating a virtual environment is a sort of blunt-force object that can remove cruft which may be confounding the apparent issue.
  4. Wrapping the behavior into a minimal reproducing example makes it easier - commonly, just makes it possible - for maintainers to make an attempt at figuring out what’s going wrong.
  5. Keeping reasonable expectations for support from unpaid volunteers is essential for all parties involved.

These steps, considered after evaluating if the project is still active, should make it more likely that your issue is engaged with by maintainers. Working for the Open Force Field Initiative, a project in the Open Molecular Software Foundation, I frequently am on the receiving end of many reports from users across several project. Those which follow some of these steps are much easier to work with than those that don’t, which for practical reasons has bearing on how quickly I’m able to see them to completion. (Frequently the user actually does most of the work and I’m able to quickly turnaround a bugfix with little effort.) I’m guilty about those that I respond to later or slip through the cracks for longer - but you can use these tips to your advantage and, with some non-trivial investment in reporting the issue, keep a spot at the front of the line.