OSSPREY (Open Source PRoject sustainabilitY tracker) is built to help open source projects not just survive, but thrive. Instead of relying on surface-level metrics, OSSPREY looks deeper, mapping how people collaborate, how code evolves, and how projects grow or struggle over time. We combine network analysis, machine learning, and findings from empirical software engineering research to offer practical, evidence-based recommendations rather than vague advice. Our goal is simple: make it easier for maintainers, contributors, and foundations to spot risks early, understand what’s working, and take action to build more sustainable projects.
Open source is complex and always changing. OSSPREY gives you tools that grow with you, whether you're steering a small project or managing a major foundation’s portfolio. It’s not about static scores or dashboards; it’s about building a healthier future for your community.
To learn more about OSSPREY, visit our live link at ossprey.netlify.app.
OSSPREY's architecture comprises four integrated modules that enable real-time sustainability forecasting and actionable recommendation generation.
The Scraper module, implemented in Rust, fetches granular monthly activity data for a given GitHub repository using GitHub's GraphQL API (GraphQL: https://graphql.org/). It extracts commit-level and issue-level metadata, including commit SHAs, timestamps, lines added and deleted, modified files, issue URLs, and comments. The Scraper's output is passed to the Network Generator module for further processing.
The Network Generator module, written in Python, converts scraped data into structured socio-technical networks by constructing two distinct graphs: a social network, which captures communication between contributors, and a technical network, which maps developers to the file types they have contributed to. The module stores monthly network snapshots as structured JSON in an intermediate file system. These networks serve as input to the Forecaster module and also drive the tool's interactive visualizations, such as Sankey diagrams.
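As a rough illustration of the snapshot step described above, the sketch below builds one month's social and technical edge lists and serializes them to JSON. The input record layouts (author, files, sender, recipient), the function name build_monthly_networks, and the output schema are assumptions made for this example, not the module's actual format.

```python
import json
from collections import Counter
from pathlib import PurePosixPath


def build_monthly_networks(commits, comments):
    """Build one month's social and technical edge lists.

    commits:  iterable of {"author": str, "files": [str, ...]}   (hypothetical layout)
    comments: iterable of {"sender": str, "recipient": str}      (hypothetical layout)
    """
    # Social network: who communicated with whom, weighted by message count.
    social = Counter()
    for comment in comments:
        if comment["sender"] != comment["recipient"]:  # ignore self-replies
            social[(comment["sender"], comment["recipient"])] += 1

    # Technical network: developer -> file type, weighted by touched files.
    technical = Counter()
    for commit in commits:
        for path in commit["files"]:
            suffix = PurePosixPath(path).suffix
            # Fall back to the bare file name when there is no extension.
            file_type = suffix.lstrip(".") or PurePosixPath(path).name
            technical[(commit["author"], file_type)] += 1

    def to_edges(counter):
        return [{"source": s, "target": t, "weight": w}
                for (s, t), w in sorted(counter.items())]

    return {"social": to_edges(social), "technical": to_edges(technical)}


snapshot = build_monthly_networks(
    commits=[{"author": "alice", "files": ["src/main.rs", "src/lib.rs", "README.md"]}],
    comments=[{"sender": "bob", "recipient": "alice"}],
)
print(json.dumps(snapshot, indent=2))
```

Keeping each snapshot as a plain edge list makes it easy to store as JSON and to load again in either a forecasting or a visualization step.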
The forecaster module takes socio-technical network graphs as input and outputs the month-wise sustainability probability of a given project. It computes project metrics and features from the networks following the approach proposed by Yin et al., incorporating additional characteristics such as s_net_overlap, t_net_overlap, and st_num_dev. To make sustainability predictions, we leverage a transformer-based model that provides month-wise sustainability probabilities for any given GitHub project.
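The three additional characteristics named above are all overlap counts between sets of developers. A minimal sketch, assuming each monthly network can be reduced to the set of developer names active in it (the function name and argument layout are hypothetical, not the forecaster's actual code):

```python
def overlap_features(prev_social, curr_social, prev_tech, curr_tech):
    """Compute overlap features from sets of active developer names.

    Each argument is an iterable of developer identifiers for one network
    in one month (an assumed input shape for this sketch).
    """
    prev_social, curr_social = set(prev_social), set(curr_social)
    prev_tech, curr_tech = set(prev_tech), set(curr_tech)
    return {
        # Developers active in the social network in both months.
        "s_net_overlap": len(curr_social & prev_social),
        # Developers active in the technical network in both months.
        "t_net_overlap": len(curr_tech & prev_tech),
        # Developers present in both networks in the current month.
        "st_num_dev": len(curr_social & curr_tech),
    }


print(overlap_features(
    prev_social={"alice", "bob"},
    curr_social={"alice", "carol"},
    prev_tech={"alice"},
    curr_tech={"alice", "carol", "dave"},
))
```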
The ReACT Recommender is a Python-based recommendation engine that implements the ReACTs framework, which consists of 105 curated actionables derived from 186 empirical software engineering studies. The module identifies underperforming features and recommends interventions tagged as Critical, Medium, or Low based on their impact, linking each recommendation to the original research publication.
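The selection logic described above can be sketched roughly as follows. Everything here is illustrative: the baseline values, the catalog entries, and the rule that "underperforming" means "below a baseline" are assumptions for the example (for some features, such as s_num_component, a higher value may be the worse one), and the placeholder actions are not real ReACTs.

```python
PRIORITY_ORDER = {"Critical": 0, "Medium": 1, "Low": 2}


def recommend(features, baselines, catalog):
    """Return catalog entries for underperforming features, most urgent first.

    features:  {feature_name: value} for one project and month
    baselines: {feature_name: minimum healthy value}      (hypothetical)
    catalog:   list of {"feature", "priority", "action", "reference"} dicts
    """
    underperforming = {
        name for name, value in features.items()
        if name in baselines and value < baselines[name]
    }
    matches = [item for item in catalog if item["feature"] in underperforming]
    # sorted() is stable, so entries with equal priority keep catalog order.
    return sorted(matches, key=lambda item: PRIORITY_ORDER[item["priority"]])


catalog = [
    {"feature": "s_num_nodes", "priority": "Low",
     "action": "placeholder action A", "reference": "placeholder citation"},
    {"feature": "s_num_nodes", "priority": "Critical",
     "action": "placeholder action B", "reference": "placeholder citation"},
]
for item in recommend({"s_num_nodes": 3}, {"s_num_nodes": 10}, catalog):
    print(item["priority"], "-", item["action"])
```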
The system follows a modular pipeline architecture where data flows sequentially from GitHub scraping to sustainability forecasting and actionable recommendation delivery. Raw data collected by the Scraper module is written to an intermediate file system in structured format. This data is then passed to the Network Generator module, which constructs monthly socio-technical networks based on developer communication and collaboration patterns. These network snapshots are used to compute project features that serve as input to the Forecaster Module, which estimates month-wise sustainability probabilities, and to the ReACT Recommender, which generates tailored, evidence-based interventions.
All frontend–backend interactions are managed through a Flask-based API layer that coordinates execution and logic across modules. This API layer exposes endpoints for data ingestion, feature access, forecast generation, and recommendation delivery, supporting modularity, asynchronous execution, and real-time updates to the user dashboard. A complete list of endpoints and their functionalities can be found at https://oss-prey.github.io/OSSPREY-Website/#API.
The system avoids reliance on a centralized database. Instead, each module writes and reads intermediate outputs from the file system, allowing for loose coupling and greater transparency in the processing pipeline. The backend is containerized using Docker and served behind an NGINX reverse proxy, which ensures secure, scalable, and low-latency access.
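The file-system handoff between modules might look something like the sketch below, in which one stage writes a JSON file and the next stage reads it back. The directory layout (root/project/stage/month.json) and the function names are assumptions for illustration only; the actual paths used by OSSPREY are not specified here.

```python
import json
import tempfile
from pathlib import Path


def write_stage_output(root, project, stage, month, payload):
    """Write one stage's output for a project and month as JSON."""
    path = Path(root) / project / stage / f"{month}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload))
    return path


def read_stage_output(root, project, stage, month):
    """Read back what an earlier stage wrote for the same project and month."""
    path = Path(root) / project / stage / f"{month}.json"
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as root:
    # The "network generator" writes a snapshot...
    write_stage_output(root, "demo-project", "networks", "2024-01",
                       {"social": [], "technical": []})
    # ...and the "forecaster" later picks it up without any shared database.
    print(read_stage_output(root, "demo-project", "networks", "2024-01"))
```

Because each stage only depends on files, any intermediate output can be inspected or regenerated independently of the others.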
Feature Name | Description |
---|---|
s_num_nodes | Number of unique active developers in the social network during a given month. |
s_avg_clustering_coef | Average clustering coefficient in the social network, representing how interconnected a developer’s neighbors are. |
s_graph_density | Density of the social network graph, measuring overall connectivity between nodes. |
s_num_component | Number of disconnected components in the social network, indicating fragmentation. |
s_weighted_mean_degree | Weighted mean degree of the social network, capturing the average number of interactions per contributor. |
s_net_overlap | Number of developers who were active in both the current and previous month in the social network. |
t_graph_density | Density of the technical network, reflecting the level of collaboration among developers via shared file modifications. |
t_num_dev_per_file | Average number of developers modifying each file (developers per file node). |
t_num_dev_nodes | Number of unique developers active in the technical network during the given month. |
t_num_file_nodes | Number of unique files modified by developers in the technical network. |
t_num_file_per_dev | Average number of files modified per developer (files per developer node). |
t_net_overlap | Number of developers who were consistently active across consecutive months in the technical network. |
st_num_dev | Number of developers contributing to both social and technical networks, representing integrated participation. |
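To make a few of the social-network features in the table concrete, here is a minimal sketch that computes them from a weighted edge list. It assumes the graph is treated as undirected and simple, that the density is 2E / (N(N-1)), and that the weighted mean degree is the total edge weight counted at both endpoints divided by the number of nodes; OSSPREY's own definitions may differ. Developers with no edges are not visible in an edge list and are therefore not counted.

```python
def social_network_features(edges):
    """Compute a few graph features from (dev_a, dev_b, weight) tuples."""
    adjacency = {}
    pairs = set()
    total_weight = 0.0
    for a, b, weight in edges:
        if a == b:  # skip self-loops in this simplified sketch
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
        pairs.add(frozenset((a, b)))
        total_weight += weight

    # Count connected components with an iterative depth-first search.
    seen, components = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adjacency[node] - seen)

    n = len(adjacency)
    return {
        "s_num_nodes": n,
        "s_graph_density": 2 * len(pairs) / (n * (n - 1)) if n > 1 else 0.0,
        "s_num_component": components,
        "s_weighted_mean_degree": 2 * total_weight / n if n else 0.0,
    }


print(social_network_features([("a", "b", 2), ("b", "c", 1), ("d", "e", 3)]))
```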
Endpoint | Purpose |
---|---|
POST /api/upload_git_link | Uploads a GitHub repository link and returns forecast data, ReACTs, metadata, and raw commit/email data. |
GET /api/projects | Retrieves all Apache project names. |
GET /api/project_info | Returns metadata for Apache projects, including sponsor, mentors, and descriptions. |
GET /eclipse/project_info | Retrieves metadata for Eclipse projects (those with display: true). |
GET /api/monthly_ranges | Returns valid month ranges for Apache projects. |
GET /api/grad_forecast/{project_id} | Fetches sustainability forecast for Apache projects. |
GET /eclipse/grad_forecast/{project_id} | Fetches sustainability forecast for Eclipse projects. |
GET /api/email_measure/{project_id}/{month} | Retrieves email-based quantitative metrics. |
GET /api/commit_measure/{project_id}/{month} | Retrieves commit-based quantitative metrics. |
GET /api/commit_links/{project_id}/{month} | Fetches commit links for a given month, filtered by developer in frontend. |
GET /api/email_links/{project_id}/{month} | Fetches email links for a given month, filtered by developer in frontend. |
GET /api/social_net/{project_id}/{month} | Fetches email interaction network (Apache projects). |
GET /api/tech_net/{project_id}/{month} | Fetches technical collaboration network (Apache projects). |
GET /eclipse/social_net/{project_id}/{month} | Fetches Eclipse-specific email interaction network. |
GET /eclipse/tech_net/{project_id}/{month} | Fetches Eclipse-specific technical collaboration network. |
GET /react_set.json | Loads a static set of ReACT recommendations. |
GET /foundation.json | Loads time-series data used to compute ReACTs per project/month. |
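A small client for the GET endpoints in the table could be sketched as below, using only the Python standard library. The base URL assumes a server running locally on port 5000, the helper names are hypothetical, and the expected format of project_id and month values as well as the shape of the JSON responses are not specified in this document, so treat the example values as placeholders.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # assumption: backend running locally


def endpoint_url(template, **params):
    """Fill an endpoint template such as '/api/tech_net/{project_id}/{month}'."""
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    return BASE_URL + template.format(**encoded)


def fetch_json(template, **params):
    """Perform a GET request against the endpoint and decode the JSON body."""
    with urlopen(endpoint_url(template, **params), timeout=30) as response:
        return json.load(response)


# Building a URL does not require a running server:
print(endpoint_url("/api/grad_forecast/{project_id}", project_id="example-project"))
# With the backend running, a call would look like:
# forecast = fetch_json("/api/grad_forecast/{project_id}", project_id="example-project")
```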
This guide walks you through each element of the OSSPREY dashboard and explains how it contributes to understanding project sustainability.
Select a repository from the drop-down list and define the time window with the month slider. The forecast, networks, and actionables instantly refresh to show information for the chosen span of months. This enables fine-grained exploration of longitudinal trends.
The details card summarizes metadata about the selected project such as its incubation status, GitHub link, and popularity metrics like stars or forks. Reviewing this information helps set expectations about activity levels before interpreting the predictive plots.
A line chart visualizes the predicted health score from 0 (low) to 1 (high). Hover over the graph to inspect monthly probabilities. Downward slopes or sharp drops indicate periods of risk where intervention may be required.
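The same reading of the chart can be approximated programmatically. The sketch below flags months where the forecast falls sharply relative to the previous month; the 0.15 threshold and the function name are arbitrary choices for illustration, not values used by OSSPREY.

```python
def flag_risky_months(probabilities, drop_threshold=0.15):
    """Return (month_index, drop) pairs where the forecast fell sharply.

    probabilities: month-wise sustainability probabilities in [0, 1],
                   ordered oldest to newest (index 0 is the first month).
    """
    flagged = []
    for month in range(1, len(probabilities)):
        drop = probabilities[month - 1] - probabilities[month]
        if drop >= drop_threshold:
            flagged.append((month, round(drop, 3)))
    return flagged


forecast = [0.80, 0.82, 0.60, 0.58, 0.30]
for month, drop in flag_risky_months(forecast):
    print(f"month {month}: probability fell by {drop}")
```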
The recommendations table lists research-backed actionables selected from deviations in the forecast and socio-technical metrics. Each item links to the supporting literature and is annotated with a severity level so maintainers can prioritize critical actions.
The social network graph illustrates how contributors communicate across issues and mailing lists. Nodes correspond to developers and edges denote message exchanges. Dense clusters highlight active discussion, while isolated nodes reveal potential disconnects.
The technical network diagram maps collaboration on code by linking developers to the file types they modify. Inspecting who works on which parts of the codebase can identify key maintainers, shared ownership, or overly siloed components.
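One way to look for siloed components outside the dashboard is to group the developer-to-file-type links by file type and list the types touched by a single developer. This is a sketch under the assumption that the technical network is available as (developer, file_type) pairs; the function names are hypothetical.

```python
from collections import defaultdict


def ownership_by_file_type(edges):
    """Map each file type to the set of developers who modified it."""
    owners = defaultdict(set)
    for developer, file_type in edges:
        owners[file_type].add(developer)
    return dict(owners)


def siloed_file_types(edges):
    """File types modified by exactly one developer, in alphabetical order."""
    return sorted(
        file_type
        for file_type, developers in ownership_by_file_type(edges).items()
        if len(developers) == 1
    )


edges = [("alice", "rs"), ("bob", "rs"), ("alice", "toml"), ("carol", "md")]
print(siloed_file_types(edges))
```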
By reviewing these visualizations in concert, you can monitor a project's momentum and apply targeted interventions to support its long-term success.
OSSPREY consists of multiple modular components. Begin by cloning all required repositories:
git clone https://github.com/OSS-PREY/OSSPREY-FrontEnd-Server
git clone https://github.com/OSS-PREY/OSSPREY-BackEnd-Server
git clone https://github.com/OSS-PREY/OSSPREY-ReACT-API
git clone https://github.com/OSS-PREY/OSSPREY-Pex-Forecaster
git clone https://github.com/OSS-PREY/OSSPREY-OSS-Scraper-Tool
sudo apt install npm
npm install
npm run dev
npm run build
rm -rf node_modules package-lock.json
npm cache clean --force
npm install
python3 -m venv venv
source venv/bin/activate
Download requirements.txt from GitHub and run:
pip install -r requirements.txt
cd OSSPREY-OSS-Scraper-Tool
Create a .env file in the OSSPREY-OSS-Scraper-Tool directory with the following content:
GITHUB_TOKEN="PERSONAL-TOKEN"
Replace PERSONAL-TOKEN with your GitHub personal access token. The token should be a fine-grained access token with read access to public repositories on GitHub.
cargo update
cargo clean
cargo build
cargo fix --bin "miner"
The built binary will appear inside the target directory.
cd ../OSSPREY-BackEnd-Server
This directory contains the Flask server that serves API endpoints.
sh install_mongo.sh
sh install_mongosh.sh
mongosh
use decal-db
db.createUser({
user: "ossprey-backend",
pwd: "FL3YyVGCr79xlPT0",
roles: [{ role: "readWrite", db: "decal-db" }]
})
parent
directory.
sh insert_data_to_mongodb.sh
Create a .env file in the OSSPREY-BackEnd-Server directory with the following content:
GITHUB_TOKEN_1="PERSONAL-TOKEN-1"
GITHUB_TOKEN_2="PERSONAL-TOKEN-2"
GITHUB_TOKEN_3="PERSONAL-TOKEN-3"
GITHUB_TOKEN_4="PERSONAL-TOKEN-4"
PEX_GENERATOR_REPO_URL="https://github.com/arjashok/pex-forecaster.git"
OSS_SCRAPER_REPO_URL="https://github.com/priyalsoni15/OSS-scraper.git"
PEX_GENERATOR_DIR="/mnt/data1/OSPEX/root-linode/pex-forecaster"
OSS_SCRAPER_DIR="/OSSPREY-OSS-Scraper-Tool"
REACT_API_DIR="/OSSPREY-ReACT-API"
GITHUB_USERNAME="GITHUB_USERNAME"
MONGODB_URI="mongodb://ossprey-backend:FL3YyVGCr79xlPT0@localhost:27017/decal-db?retryWrites=true&w=majority"
Replace PERSONAL-TOKEN-1, PERSONAL-TOKEN-2, PERSONAL-TOKEN-3, and PERSONAL-TOKEN-4 with your GitHub personal access tokens. These should be fine-grained access tokens with read access to public repositories. Also, replace GITHUB_USERNAME with your GitHub username.
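Before starting the server, it can help to confirm that no placeholder was left in the .env file. The following is a minimal standard-library sketch, not part of OSSPREY: it parses simple KEY="VALUE" lines and reports keys that are missing or still hold a placeholder from this guide. The helper names are hypothetical, and the parser ignores quoting and escaping rules that a dedicated dotenv library would handle.

```python
REQUIRED_KEYS = [
    "GITHUB_TOKEN_1", "GITHUB_TOKEN_2", "GITHUB_TOKEN_3", "GITHUB_TOKEN_4",
    "GITHUB_USERNAME", "MONGODB_URI",
]


def parse_env(text):
    """Parse simple KEY="VALUE" lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since values such as URIs may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def is_placeholder(value):
    """True for the placeholder values shown in this guide."""
    return value.startswith("PERSONAL-TOKEN") or value == "GITHUB_USERNAME"


def unresolved_settings(text, required=REQUIRED_KEYS):
    """Return required keys that are missing, empty, or still placeholders."""
    values = parse_env(text)
    return [key for key in required
            if not values.get(key) or is_placeholder(values[key])]


example = 'GITHUB_TOKEN_1="PERSONAL-TOKEN-1"\nGITHUB_USERNAME="octocat"\n'
print(unresolved_settings(example))
```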
gunicorn -w 4 --max-requests 100 --max-requests-jitter 10 --timeout 120 -b 0.0.0.0:5000 run:app
ngrok config add-authtoken YOUR_AUTH_TOKEN
ngrok http 5000
Copy the forwarding URL that ngrok displays (for example, https://123456789.ngrok.io) and use it to access your local server from anywhere.
To stop ngrok, press Ctrl + C in the terminal where ngrok is running.
python -m flask run
Or, to allow external access:
python -m flask run --host=0.0.0.0 --port=5000
Download the Dockerfile from the given link.
Use the following commands to build and run the Docker container securely. Ensure that your environment variables and sensitive credentials are properly configured beforehand:
sudo docker build --build-arg GITHUB_USERNAME="priyalsoni15" \
--build-arg GITHUB_PAT="$(cat ~/.github_pat)" \
--build-arg GITHUB_TOKEN_1="$(cat ~/.github_token1)" \
--build-arg GITHUB_TOKEN_2="$(cat ~/.github_token2)" \
--build-arg GITHUB_TOKEN_3="$(cat ~/.github_token3)" \
--build-arg GITHUB_TOKEN_4="$(cat ~/.github_token4)" \
--build-arg MONGODB_USER="priyalsoniwritings" \
--build-arg MONGODB_PASSWORD="$(cat ~/.mongodb_password)" \
-t my-secure-image .
docker run -d --env-file .env -p 5001:5000 my-secure-image
The .env file must be created manually to securely store sensitive credentials; they are not embedded in the Dockerfile for security reasons.
This research was supported by the National Science Foundation under Grant No. 2020751, as well as by the Alfred P. Sloan Foundation through the OSPO for UC initiative (Award No. 2024-22424).
Moreover, we extend our heartfelt gratitude to Likang Yin, Anirudh Ramchandran, and Swati Singhvi for their contributions to the development of APEX and EPEX. Several core functionalities of OSSPREY build upon and extend these earlier tools.
The OSSPREY project is licensed under the Apache License 2.0. This permissive license allows you to use, modify, and distribute the software for both personal and commercial purposes, as long as you include proper attribution and comply with the terms outlined in the license.
Contributions are very welcome! Whether you're adding features, fixing bugs, or improving documentation, we appreciate your support in making this project better.
Before getting started, please read our Contributing Guidelines to ensure your contributions align with our project's standards and workflows.
Thank you for helping improve this project!
For high-level discussions, funding opportunities, or collaboration inquiries, please reach out to the project supervisor, Professor Vladimir Filkov (vfilkov@ucdavis.edu).
For technical questions, bug reports, or concerns regarding the codebase, please contact the current tech lead, Nafiz Imtiaz Khan (nikhan@ucdavis.edu).
For general discussions, contributions, and community updates, join our OSSPREY Slack workspace.
We're excited to hear from you!