For a long time I wanted to contribute “at some point” to an open source project. That ideal moment was always postponed until later, when I would have more experience and more time. So when I read about the Data Umbrella Sprint on several Telegram channels related to Python and programming, I didn’t hesitate and signed up. It was the perfect opportunity to force myself to learn. There’s nothing like a little social pressure to get a procrastinator’s gears turning.
To be honest, I didn’t know Data Umbrella. It’s an organization dedicated to supporting underrepresented groups, whether by gender, race, age, sexual orientation or otherwise, in the fields of Machine Learning, Data Science and Artificial Intelligence. The sprint held on June 26th focused on Latin America, because of the region’s low participation in these fields. Data Umbrella’s work with underrepresented groups is very valuable in breaking down the myths and entry barriers that may be holding back new talent.
What I liked most about the Data Umbrella Sprint was the organization: they had a very precise checklist of the topics to review, with videos explaining each one step by step. It was easy to estimate how much time you needed to prepare or learn before the sprint. This really helped ensure you had all the required knowledge for the sprint. The use of Discord also helped a lot by adding a community feel and an informal touch, and it served to answer questions and get to know each other. Getting into a new group is always difficult, and for newbies the challenge may be even greater. Having a pre-sprint and a post-sprint session helps to consolidate the human and community aspect, solve the technical problems that always arise, and build confidence.
During the sprint, we worked in pairs, and pair programming proved to be of great help. Leonardo Rocco and I worked on 2 issues:
- DOC Ensures that ARDRegression passes numpydoc validation #20381
- DOC ensures FastICA estimator passes the numpydoc validation #20405
I can proudly say that both pull requests have been accepted!
Reflecting on the sprint experience, I realize that I had assumed there were too many things I needed to learn before I could do anything useful for open source.
In the sprint I learned that you don’t need to be a super-programmer to contribute to open source. The reality is that, to begin with, there is no single way to contribute. There is an endless range of possible tasks, from the simplest to the most advanced, and a long learning curve. Therefore, it is important to realize that it is not that “you don’t know” but that “you don’t know yet”, and that there is a community willing to support you in that learning process. We are all in a learning process, from newbies to seasoned coders. Getting involved in collaborative projects is precisely a way to accelerate learning and, in the process, contribute to the libraries you use the most.
There is a great report of the sprint on Reshama’s blog. The distribution of participants by country is quite surprising. I would have guessed a more uniform distribution, but it’s largely concentrated in Argentina and Brazil. We will have to do something about that!
Besides community and individual contributions, another big piece of open source is grants and funding from companies. Go ask your manager for a budget to finance the tools you use day-to-day! In particular, this sprint was funded in part by a grant from Code for Science & Society. This is community and transparency at its best: all the details of the grant are available online, under grant number GBMF8449 at the Gordon and Betty Moore Foundation.