The field of data science is accelerating at an unparalleled pace over a wide variety of problem domains. Whether you’re new to the field, transitioning to the field from another discipline, or a tried-and-proven data scientist with years of experience, one common professional strategy for moving forward is to take advantage of the many communities available to data scientists.
In this article, we’ll take a look at a number of valuable community resources that serve data scientists in three ways: continuing education, solving technical problems, and professional networking.
Kaggle (acquired by Google in 2017) is a prime example of a quality data science community. Originally started as a place for data scientists to compete head-to-head on machine learning projects for monetary prizes, the site has evolved into a place for data scientists (both newbies and experienced practitioners) to learn and grow.
Those new to the field can try out many proven machine learning techniques by studying the submissions of challenge project competitors. Take the Titanic “Getting Started” challenge for example. The public leaderboard shows nearly 15,000 submissions. The code section includes the source code for many solutions in a variety of programming languages, using many of the top algorithms. And the discussion section has a sense of community with participants discussing all sorts of topics.
Other community features of Kaggle that attract data scientists include a repository of data sets (now numbering over 115,000) and free courses.
One common need of all data scientists is technical problem resolution, e.g., what to do with an error message, how to tune an algorithm, and other issues. If you use Google to locate information about a specific error message when coding in R or Python, invariably the top results lead you to either Stack Overflow or Quora. These are thriving communities of knowledgeable data scientists, and you often get helpful and well-crafted explanations for how you can get around errors and technical problems.
You might not immediately think of Twitter as a community, but over the years data science aficionados have made it one. It can take a while to follow the right people, companies, and organizations, but once you do, you’ll have a constant flow of late-breaking news, event details, commentaries, and technical announcements. It’s an excellent way to keep current in this fast-paced field. And you can actually make good friends by replying to Tweets and by using direct messaging (DM).
LinkedIn was built to establish professional communities of all kinds, and data science folks have taken this quite seriously. There are many compelling LinkedIn Groups with a concentration on data science. Some are hosted by recognized leaders in the global data science community – companies, educational institutions, professional societies, and research labs. Each group includes articles, announcements, and message forums. Some groups like Data Science Central have over 400,000 members.
Meetup.com is a great way to join technical communities of all types. The field of data science is well-represented on Meetup with communities centered on coding (R and Python), data science, machine learning, deep learning, and big data – the list is quite long. The original intent of Meetup was to bring like-minded people together for physical get-togethers. After operating that way for many years, the pandemic brought the meetings online. The virtual meetings weren’t quite as useful for networking, including finding employment, but being virtual meant you could attend a lot more events, both local and far away.
Some successful local Meetups took the pandemic as an opportunity to spread their wings and widen their membership. For example, the group previously known as the “LA R User Group,” which was based in Los Angeles and had regular in-person meetings, became Real Data Science USA, a national community of data scientists with virtual meetings. Other data science Meetups are bringing the community together locally in person as well as virtually, such as Data Science Salon Miami, Data Science Salon Austin, and Data Science Salon San Francisco.
All data scientists should sign up for Meetup.com and search for groups with a focus on favored areas of technology. Once in-person Meetups resume, you’ll find these (usually weekday evening) meetings of great value. Each one has a featured speaker or panel, talking about a popular and timely technical topic. The meetings usually include a Q&A at the end, and a networking event prior to the presentation. This is a true community where you can meet colleagues, make new friends, make contacts with potential employers, and just learn a lot about new technology.
Most technology vendors and conference sponsors took the pandemic in stride, moving their events online. As with Meetup groups, virtual conferences mean you can attend many more events without scheduling days out of the office, not to mention worrying about travel and lodging costs.
Data Science Salon (DSS) offers excellent virtual and hybrid conferences with a focus on applying AI and machine learning techniques in different verticals. The DSS community is active beyond the event series, and you can join the network to access resources and meet other data professionals.
You can also find a number of more special-purpose data science communities. Here is a short list to consider:
R is one of the most popular data science programming languages, especially for practitioners coming out of academic statistics programs. R has a huge global following and the R community is well-represented with a plethora of excellent blogs from around the world. Rather than keeping your own favorites list and checking each blog on a regular basis, you can rely on the R blog aggregator site R-bloggers, which is a godsend. It saves a lot of time in keeping connected to the community. Subscribers (registration is free) receive a daily email digest of highly relevant blog articles. The articles are technical and cover a broad range of topics including machine learning algorithms, data munging, visualization, case studies, and so much more.
Reddit offers another R community that is worth evaluating, including as a learning resource, along with other key subreddits such as r/datascience and r/MachineLearning.
Python is another leading programming language for data science, especially for deep learning problem domains like computer vision, image classification/detection, video classification, speech recognition, chatbots, etc. In addition to data science, Python can also be used for data engineering, and as a result there are many Python communities to choose from.
Python-bloggers is a blog aggregation site that offers the same level of convenience and learning resources as R-bloggers. It’s a great way to keep current and connected to the global Python community.
With a field advancing as rapidly as data science, it’s important to establish a strong sense of community in order for practitioners to avoid falling prey to “data science imposter syndrome.” Feelings of inadequacy evaporate when you’re around your fellow community members. Technology communities are a great way not only to keep pace with all the new developments, but also to humanize practitioners. It’s great to hear other perspectives, especially when dealing with technology. You really can’t be a member of too many data science communities!