OSSPREY (Open Source PRoject sustainabilitY tracker) is built to help open source projects not just survive, but thrive. Instead of relying on surface-level metrics, OSSPREY looks deeper, mapping how people collaborate, how code evolves, and how projects grow or struggle over time. We combine network analysis, machine learning, and findings from published research to offer practical, evidence-based recommendations rather than vague advice. Our goal is simple: make it easier for maintainers, contributors, and foundations to spot risks early, understand what’s working, and take action to build more sustainable projects. Open source is complex and always changing. OSSPREY gives you tools that grow with you, whether you're steering a small project or managing a major foundation’s portfolio. It’s not about static scores or dashboards; it’s about building a healthier future for your community.
To learn more about OSSPREY, visit our live link at ossprey.netlify.app.
OSSPREY leverages a robust data processing pipeline, socio-technical network modeling, machine learning predictions, and evidence-driven recommendations to support OSS maintainers and communities. Below is a detailed breakdown of its architecture.
The pipeline begins with a fast and reliable data scraping engine implemented in Rust. This component retrieves raw project data from open source hosting platforms and foundations such as GitHub, Apache, and Eclipse. It collects commit histories, issue threads, pull requests, contributor activity, and, when available, email communications. Rust was chosen for its performance and memory safety, which allow efficient scraping of large datasets across many projects.
The next stage processes the raw data into structured monthly socio-technical networks. Two types of networks are constructed:
The Health Predictor is a machine learning module trained on historical OSS project data. It takes the extracted network metrics and evaluates the sustainability of a project using features such as:
OSSPREY goes a step further by offering actionable, research-backed advice via the ReACT (Researched Actionables) Recommender. Drawing from peer-reviewed software engineering literature, this module suggests concrete actions tailored to the project’s network profile. Each recommendation is grounded in published evidence and can include strategies like:
The entire system is underpinned by a MongoDB database that stores raw inputs, network outputs, prediction results, and recommendations. This NoSQL setup is ideal for managing OSS metadata, which is often semi-structured and variable in format across different ecosystems.
The architecture follows a clean and modular data flow:
Raw OSS Data
↓
Data Scraper (Rust)
↓
Network Generator (Python)
↓
Health Predictor (Python)
↓
ReACT Recommender (Python)
↓
Frontend + MongoDB Storage
This system allows OSSPREY to move from passive monitoring to active sustainability guidance, tailored to each project’s unique socio-technical landscape.
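As a rough illustration of the Network Generator stage, the sketch below groups commit records by calendar month and counts developer-to-file edges for each month. The record fields (`author`, `date`, `files`), the function name `monthly_technical_networks`, and the edge weighting are assumptions made for this example; they are not taken from the OSSPREY codebase, which may model its networks differently.

```python
from collections import defaultdict
from datetime import datetime


def monthly_technical_networks(commits):
    """Group commits by month and count (developer, file) edges.

    Each commit is a dict with hypothetical keys: 'author', 'date'
    (ISO 8601 string), and 'files' (list of touched paths).
    """
    networks = defaultdict(lambda: defaultdict(int))
    for commit in commits:
        month = datetime.fromisoformat(commit["date"]).strftime("%Y-%m")
        for path in commit["files"]:
            # Edge weight = number of commits by this author touching this file.
            networks[month][(commit["author"], path)] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {month: dict(edges) for month, edges in networks.items()}


# Made-up sample data, for illustration only.
sample_commits = [
    {"author": "alice", "date": "2024-01-05T10:00:00", "files": ["src/a.rs", "README.md"]},
    {"author": "bob", "date": "2024-01-20T12:30:00", "files": ["src/a.rs"]},
    {"author": "alice", "date": "2024-02-02T09:15:00", "files": ["src/a.rs"]},
]

networks = monthly_technical_networks(sample_commits)
for month in sorted(networks):
    print(month, networks[month])
```

A social network could be built the same way by replacing the file paths with the participants of a shared issue or email thread, again as an assumption about how such a network might be represented.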
OSSPREY consists of multiple modular components. Begin by cloning all required repositories:
git clone https://github.com/OSS-PREY/OSSPREY-FrontEnd-Server
git clone https://github.com/OSS-PREY/OSSPREY-BackEnd-Server
git clone https://github.com/OSS-PREY/OSSPREY-ReACT-API
git clone https://github.com/OSS-PREY/OSSPREY-Pex-Forecaster
git clone https://github.com/OSS-PREY/OSSPREY-OSS-Scraper-Tool
sudo apt install npm
npm install
npm run dev
npm run build
rm -rf node_modules package-lock.json
npm cache clean --force
npm install
python3 -m venv venv
source venv/bin/activate
Download requirements.txt from GitHub and run:
pip install -r requirements.txt
cd OSSPREY-OSS-Scraper-Tool
Create a .env file in the OSSPREY-OSS-Scraper-Tool directory with the following content:
GITHUB_TOKEN="PERSONAL-TOKEN"
Replace PERSONAL-TOKEN with your GitHub personal access token. It should be a fine-grained access token with read access to public repositories on GitHub.
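Before building the scraper, it can help to confirm that the .env file no longer contains the placeholder. The following is a minimal sketch using only the Python standard library; the helper names `parse_env` and `token_is_configured` are hypothetical and not part of OSSPREY, and the parser handles only simple `KEY="value"` lines.

```python
def parse_env(text):
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding double or single quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def token_is_configured(env, key="GITHUB_TOKEN"):
    """Return True if the key is set and is not the README placeholder."""
    value = env.get(key, "")
    return bool(value) and not value.startswith("PERSONAL-TOKEN")


# The unedited template from this README still holds the placeholder.
example = 'GITHUB_TOKEN="PERSONAL-TOKEN"\n'
env = parse_env(example)
print(token_is_configured(env))  # prints False
```

To check a real file, read it with `open(".env").read()` from inside the OSSPREY-OSS-Scraper-Tool directory and pass the text to `parse_env`.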
cargo update
cargo clean
cargo build
cargo fix --bin "miner"
The built binary will appear inside the target directory.
cd ../OSSPREY-BackEnd-Server
This directory contains the Flask server that serves API endpoints.
sh install_mongo.sh
sh install_mongosh.sh
mongosh
use decal-db
db.createUser({
user: "ossprey-backend",
pwd: "FL3YyVGCr79xlPT0",
roles: [{ role: "readWrite", db: "decal-db" }]
})
parent
directory.
sh insert_data_to_mongodb.sh
Create a .env file in the OSSPREY-BackEnd-Server directory with the following content:
GITHUB_TOKEN_1="PERSONAL-TOKEN-1"
GITHUB_TOKEN_2="PERSONAL-TOKEN-2"
GITHUB_TOKEN_3="PERSONAL-TOKEN-3"
GITHUB_TOKEN_4="PERSONAL-TOKEN-4"
PEX_GENERATOR_REPO_URL="https://github.com/arjashok/pex-forecaster.git"
OSS_SCRAPER_REPO_URL="https://github.com/priyalsoni15/OSS-scraper.git"
PEX_GENERATOR_DIR="/mnt/data1/OSPEX/root-linode/pex-forecaster"
OSS_SCRAPER_DIR="/OSSPREY-OSS-Scraper-Tool"
REACT_API_DIR="/OSSPREY-ReACT-API"
GITHUB_USERNAME="GITHUB_USERNAME"
MONGODB_URI="mongodb://ossprey-backend:FL3YyVGCr79xlPT0@localhost:27017/decal-db?retryWrites=true&w=majority"
Replace PERSONAL-TOKEN-1, PERSONAL-TOKEN-2, PERSONAL-TOKEN-3, and PERSONAL-TOKEN-4 with your GitHub personal access tokens. These should be fine-grained access tokens with read access to public repositories. Also, replace GITHUB_USERNAME with your GitHub username.
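A quick sanity check of the backend configuration can catch a missing key or a malformed MongoDB URI before the server starts. The sketch below is an assumption-laden example, not part of the backend: `REQUIRED_KEYS`, `missing_keys`, and `describe_mongodb_uri` are hypothetical names, the key list simply mirrors the variables shown above, and the password in the sample URI is a stand-in.

```python
from urllib.parse import parse_qs, urlsplit

# Keys taken from the .env template above; adjust if the backend changes.
REQUIRED_KEYS = [
    "GITHUB_TOKEN_1", "GITHUB_TOKEN_2", "GITHUB_TOKEN_3", "GITHUB_TOKEN_4",
    "PEX_GENERATOR_DIR", "OSS_SCRAPER_DIR", "REACT_API_DIR",
    "GITHUB_USERNAME", "MONGODB_URI",
]


def missing_keys(env):
    """Return the required keys that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def describe_mongodb_uri(uri):
    """Split a mongodb:// URI into its parts, leaving out the password."""
    parts = urlsplit(uri)
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


# Illustrative values only; CHANGE-ME stands in for the real password.
sample_env = {
    "GITHUB_TOKEN_1": "token-1",
    "MONGODB_URI": "mongodb://ossprey-backend:CHANGE-ME@localhost:27017/"
                   "decal-db?retryWrites=true&w=majority",
}

print(missing_keys(sample_env))
print(describe_mongodb_uri(sample_env["MONGODB_URI"]))
```

The user, host, port, and database reported here should match the values used in the `db.createUser` call and the `use decal-db` command.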
gunicorn -w 4 --max-requests 100 --max-requests-jitter 10 --timeout 120 -b 0.0.0.0:5000 run:app
ngrok config add-authtoken YOUR_AUTH_TOKEN
ngrok http 5000
Copy the forwarding URL that ngrok prints (for example, https://123456789.ngrok.io) and use it to access your local server from anywhere.
To stop ngrok, press Ctrl + C in the terminal where ngrok is running.
python -m flask run
Or, to allow external access:
python -m flask run --host=0.0.0.0 --port=5000
Download the Dockerfile from the given link.
Use the following commands to build and run the Docker container securely. Ensure that your environment variables and sensitive credentials are properly configured beforehand:
sudo docker build --build-arg GITHUB_USERNAME="priyalsoni15" \
--build-arg GITHUB_PAT="$(cat ~/.github_pat)" \
--build-arg GITHUB_TOKEN_1="$(cat ~/.github_token1)" \
--build-arg GITHUB_TOKEN_2="$(cat ~/.github_token2)" \
--build-arg GITHUB_TOKEN_3="$(cat ~/.github_token3)" \
--build-arg GITHUB_TOKEN_4="$(cat ~/.github_token4)" \
--build-arg MONGODB_USER="priyalsoniwritings" \
--build-arg MONGODB_PASSWORD="$(cat ~/.mongodb_password)" \
-t my-secure-image .
docker run -d --env-file .env -p 5001:5000 my-secure-image
The .env file must be created manually to store sensitive credentials securely; they are not embedded in the Dockerfile for security reasons.
This research was supported by the National Science Foundation under Grant No. 2020751, as well as by the Alfred P. Sloan Foundation through the OSPO for UC initiative (Award No. 2024-22424).
Moreover, we extend our heartfelt gratitude to Likang Yin, Anirudh Ramchandran, and Swati Singhvi for their contributions to the development of APEX and EPEX. Several core functionalities of OSSPREY build upon and extend these earlier tools.
The OSSPREY project is licensed under the Apache License 2.0. This permissive license allows you to use, modify, and distribute the software for both personal and commercial purposes, as long as you include proper attribution and comply with the terms outlined in the license.
Contributions are very welcome! Whether you're adding features, fixing bugs, or improving documentation, we appreciate your support in making this project better.
Before getting started, please read our Contributing Guidelines to ensure your contributions align with our project's standards and workflows.
Thank you for helping improve this project!
For high-level discussions, funding opportunities, or collaboration inquiries, please reach out to the project supervisor, Professor Vladimir Filkov (vfilkov@ucdavis.edu).
For technical questions, bug reports, or concerns regarding the codebase, please contact the current tech lead, Nafiz Imtiaz Khan (nikhan@ucdavis.edu).
For general discussions, contributions, and community updates, join our OSSPREY Slack workspace.
We're excited to hear from you!