OSSPREY System Architecture

OSSPREY: Open Source Software PRojEct sustainabilitY tracker

Department of Computer Science, University of California, Davis

Abstract

OSSPREY (Open Source Software PRojEct sustainabilitY tracker) is built to help open source projects not just survive, but thrive. Instead of relying on surface-level metrics, OSSPREY looks deeper, mapping how people collaborate, how code evolves, and how projects grow or struggle over time. We combine network analysis, machine learning, and findings from published empirical research to offer practical, evidence-based recommendations rather than vague advice. Our goal is simple: make it easier for maintainers, contributors, and foundations to spot risks early, understand what’s working, and take action to build more sustainable projects. Open source is complex and always changing. OSSPREY gives you tools that grow with you, whether you're steering a small project or managing a major foundation’s portfolio. It’s not about static scores or dashboards; it’s about building a healthier future for your community.

To learn more about OSSPREY, visit our live link at ossprey.netlify.app.

System Modules

OSSPREY's architecture comprises four integrated modules that enable real-time sustainability forecasting and actionable recommendation generation.

OSS Scraper Module

The scraper module, implemented in Rust, fetches granular monthly activity data for a given GitHub repository using GitHub's GraphQL API (for GraphQL itself, see https://graphql.org/). The scraper extracts commit-level and issue-level metadata, including commit SHAs, timestamps, lines added and deleted, modified files, issue URLs, and comments. Its output is passed to the Network Generator module for further processing.
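As an illustration of the kind of request involved, the sketch below builds a GitHub GraphQL payload for one month of commits on a repository's default branch. It is written in Python for readability and is not the Rust scraper's code; the function names and the example repository are assumptions. The payload would be sent as a POST to https://api.github.com/graphql with a personal access token in the Authorization header, and the sketch only constructs it.

```python
import json
from datetime import datetime, timezone

# Query for commits on the default branch within a time window.
# Field names (oid, committedDate, additions, deletions) come from
# GitHub's GraphQL schema for the Commit type.
COMMIT_QUERY = """
query($owner: String!, $name: String!, $since: GitTimestamp!, $until: GitTimestamp!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 100, since: $since, until: $until) {
            nodes { oid committedDate additions deletions }
            pageInfo { hasNextPage endCursor }
          }
        }
      }
    }
  }
}
"""


def month_window(year, month):
    """Return ISO 8601 timestamps for the start of a month and of the next month."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return start.isoformat(), end.isoformat()


def build_commit_request(owner, name, year, month):
    """Build the JSON body for one monthly commit query (hypothetical helper)."""
    since, until = month_window(year, month)
    return {
        "query": COMMIT_QUERY,
        "variables": {"owner": owner, "name": name, "since": since, "until": until},
    }


# Example repository chosen only for illustration.
payload = build_commit_request("apache", "kafka", 2024, 12)
print(json.dumps(payload["variables"], indent=2))
```

Results beyond the first 100 commits would be fetched by passing the returned endCursor as an `after` argument, which this sketch leaves out.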

Network Generator Module

This Python module converts scraped data into structured socio-technical networks by constructing two distinct graphs: a social network, which captures communication between contributors, and a technical network, which maps developers to the file types they have contributed to. The module stores monthly network snapshots as structured JSON in an intermediate file system. These networks serve as input to the Forecaster module and also drive interactive visualizations, such as Sankey diagrams, within the tool.
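A minimal sketch of the two graphs, assuming simplified input records: the field names, the rule that a comment links the commenter to the issue author, and the JSON layout are all assumptions made for illustration, not the module's actual schema.

```python
import json
from collections import Counter
from pathlib import PurePosixPath


def build_social_network(comments):
    """Weighted commenter -> issue-author edges (assumed definition of 'communication')."""
    edges = Counter()
    for c in comments:
        if c["commenter"] != c["issue_author"]:  # ignore self-replies
            edges[(c["commenter"], c["issue_author"])] += 1
    return [{"source": s, "target": t, "weight": w} for (s, t), w in sorted(edges.items())]


def build_technical_network(commits):
    """Weighted developer -> file-type edges, using the file extension as the type."""
    edges = Counter()
    for commit in commits:
        for path in commit["files"]:
            file_type = PurePosixPath(path).suffix or "(none)"
            edges[(commit["author"], file_type)] += 1
    return [{"source": s, "target": t, "weight": w} for (s, t), w in sorted(edges.items())]


# Hypothetical records for one month.
comments = [
    {"commenter": "alice", "issue_author": "bob"},
    {"commenter": "alice", "issue_author": "bob"},
    {"commenter": "bob", "issue_author": "bob"},
    {"commenter": "carol", "issue_author": "alice"},
]
commits = [
    {"author": "alice", "files": ["src/main.rs", "src/lib.rs", "README"]},
    {"author": "bob", "files": ["docs/index.md"]},
]

snapshot = {
    "month": "2024-01",
    "social": build_social_network(comments),
    "technical": build_technical_network(commits),
}
print(json.dumps(snapshot, indent=2))
```

Keeping each monthly snapshot as one plain JSON document means the same file can feed both the forecaster and a front-end visualization.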

Forecaster Module

The forecaster module takes the socio-technical network graphs as input and outputs a month-wise sustainability probability for a given GitHub project. It computes project metrics and features from the networks following the approach proposed by Yin et al., adding features such as s_net_overlap, t_net_overlap, and st_num_dev. A transformer-based model then turns these monthly features into the sustainability probabilities.

ReACT-Recommender Module

This Python-based recommendation engine implements the ReACTs framework, which consists of 105 curated actionables derived from 186 empirical software engineering studies. The module identifies underperforming features and recommends interventions tagged as Critical, Medium, or Low based on their impact, linking each recommendation to the original research publication.
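The selection logic described above can be sketched roughly as follows. This is not the ReACT-API code: the baseline comparison, the catalog entries, and the reference strings are placeholders invented for the example, and "lower than baseline means underperforming" is an assumption that would not hold for every feature.

```python
from dataclasses import dataclass

PRIORITY_RANK = {"Critical": 0, "Medium": 1, "Low": 2}


@dataclass(frozen=True)
class Actionable:
    feature: str    # feature the actionable targets
    priority: str   # "Critical", "Medium", or "Low"
    advice: str     # placeholder text, not a real ReACT
    reference: str  # placeholder for the linked publication


def recommend(features, baselines, catalog):
    """Return actionables for features below their baseline, most urgent first."""
    low = {
        name
        for name, value in features.items()
        if name in baselines and value < baselines[name]
    }
    matches = [a for a in catalog if a.feature in low]
    return sorted(matches, key=lambda a: PRIORITY_RANK[a.priority])


# Hypothetical catalog, feature values, and baselines.
catalog = [
    Actionable("st_num_dev", "Low", "placeholder advice A", "placeholder reference A"),
    Actionable("s_num_nodes", "Critical", "placeholder advice B", "placeholder reference B"),
    Actionable("t_num_dev_nodes", "Medium", "placeholder advice C", "placeholder reference C"),
]
features = {"s_num_nodes": 3, "t_num_dev_nodes": 9, "st_num_dev": 1}
baselines = {"s_num_nodes": 5, "t_num_dev_nodes": 4, "st_num_dev": 2}

recommendations = recommend(features, baselines, catalog)
for item in recommendations:
    print(item.priority, item.feature, item.reference)
```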

System Architecture

The system follows a modular pipeline architecture where data flows sequentially from GitHub scraping to sustainability forecasting and actionable recommendation delivery. Raw data collected by the Scraper module is written to an intermediate file system in structured format. This data is then passed to the Network Generator module, which constructs monthly socio-technical networks based on developer communication and collaboration patterns. These network snapshots are used to compute project features that serve as input to the Forecaster Module, which estimates month-wise sustainability probabilities, and to the ReACT Recommender, which generates tailored, evidence-based interventions.

All frontend–backend interactions are managed through a Flask-based API layer that coordinates execution and logic across modules. This API layer exposes endpoints for data ingestion, feature access, forecast generation, and recommendation delivery, supporting modularity, asynchronous execution, and real-time updates to the user dashboard. A complete list of endpoints and their functionalities can be found at https://oss-prey.github.io/OSSPREY-Website/#API.

The processing pipeline avoids reliance on a centralized database for intermediate results. Instead, each module writes and reads its intermediate outputs from the file system, which keeps the modules loosely coupled and makes the pipeline easier to inspect. (MongoDB, set up in the Installation Manual, holds the downloaded foundational data.) The backend is containerized using Docker and served behind an NGINX reverse proxy, which is intended to provide secure, scalable, and low-latency access.
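The file-system handoff between modules can be pictured with a small sketch. The directory layout (root/project/month/stage.json) and the helper names are assumptions for illustration; the actual paths used by OSSPREY may differ.

```python
import json
import tempfile
from pathlib import Path


def stage_path(root, project, month, stage):
    """Location of one module's output for one project-month (hypothetical layout)."""
    return Path(root) / project / month / f"{stage}.json"


def write_stage(root, project, month, stage, data):
    path = stage_path(root, project, month, stage)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


def read_stage(root, project, month, stage):
    return json.loads(stage_path(root, project, month, stage).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as root:
    # The "scraper" stage writes raw records.
    write_stage(
        root, "demo-project", "2024-01", "scraper",
        {"commits": [{"author": "alice"}, {"author": "bob"}, {"author": "alice"}]},
    )
    # The "network" stage reads them back without importing any scraper code.
    raw = read_stage(root, "demo-project", "2024-01", "scraper")
    developers = sorted({c["author"] for c in raw["commits"]})
    write_stage(root, "demo-project", "2024-01", "network", {"developers": developers})
    network = read_stage(root, "demo-project", "2024-01", "network")

print(network)
```

Because each stage only depends on the files written by the previous one, a stage can be rerun or inspected on its own.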

Feature Descriptions of the Forecaster Module

Feature Name            Description
s_num_nodes             Number of unique active developers in the social network during a given month.
s_avg_clustering_coef   Average clustering coefficient in the social network, representing how interconnected a developer’s neighbors are.
s_graph_density         Density of the social network graph, measuring overall connectivity between nodes.
s_num_component         Number of disconnected components in the social network, indicating fragmentation.
s_weighted_mean_degree  Weighted mean degree of the social network, capturing the average number of interactions per contributor.
s_net_overlap           Number of developers who were active in both the current and previous month in the social network.
t_graph_density         Density of the technical network, reflecting the level of collaboration among developers via shared file modifications.
t_num_dev_per_file      Average number of developers modifying each file (developers per file node).
t_num_dev_nodes         Number of unique developers active in the technical network during the given month.
t_num_file_nodes        Number of unique files modified by developers in the technical network.
t_num_file_per_dev      Average number of files modified per developer (files per developer node).
t_net_overlap           Number of developers who were consistently active across consecutive months in the technical network.
st_num_dev              Number of developers contributing to both social and technical networks, representing integrated participation.
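To make a few of these definitions concrete, the sketch below computes a subset of the features from plain edge lists. The input shapes are assumptions, the social network is treated as an undirected simple graph for the density calculation, and the forecaster's own implementation may differ in these details.

```python
def monthly_features(social_edges, tech_edges, prev_social_devs=frozenset()):
    """Compute a few table features from (dev, dev) and (dev, file) edge lists."""
    social_devs = {dev for edge in social_edges for dev in edge}
    # Collapse direction and duplicates so each pair of developers counts once.
    pairs = {frozenset(edge) for edge in social_edges if edge[0] != edge[1]}
    n = len(social_devs)
    density = 2 * len(pairs) / (n * (n - 1)) if n > 1 else 0.0

    dev_file = set(tech_edges)  # unique (developer, file) links
    tech_devs = {dev for dev, _ in dev_file}
    files = {f for _, f in dev_file}

    return {
        "s_num_nodes": n,
        "s_graph_density": density,
        "s_net_overlap": len(social_devs & set(prev_social_devs)),
        "t_num_dev_nodes": len(tech_devs),
        "t_num_file_nodes": len(files),
        "t_num_file_per_dev": len(dev_file) / len(tech_devs) if tech_devs else 0.0,
        "t_num_dev_per_file": len(dev_file) / len(files) if files else 0.0,
        "st_num_dev": len(social_devs & tech_devs),
    }


# Hypothetical data for one month.
social_edges = [("alice", "bob"), ("bob", "carol"), ("bob", "alice")]
tech_edges = [("alice", "a.py"), ("alice", "b.py"), ("bob", "a.py"), ("erin", "c.py")]
features = monthly_features(social_edges, tech_edges, prev_social_devs={"alice", "dave"})

for name, value in features.items():
    print(f"{name}: {value}")
```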

API Endpoints

Endpoint                                      Purpose
POST /api/upload_git_link                     Uploads a GitHub repository link and returns forecast data, ReACTs, metadata, and raw commit/email data.
GET /api/projects                             Retrieves all Apache project names.
GET /api/project_info                         Returns metadata for Apache projects, including sponsor, mentors, and descriptions.
GET /eclipse/project_info                     Retrieves metadata for Eclipse projects (those with display: true).
GET /api/monthly_ranges                       Returns valid month ranges for Apache projects.
GET /api/grad_forecast/{project_id}           Fetches sustainability forecast for Apache projects.
GET /eclipse/grad_forecast/{project_id}       Fetches sustainability forecast for Eclipse projects.
GET /api/email_measure/{project_id}/{month}   Retrieves email-based quantitative metrics.
GET /api/commit_measure/{project_id}/{month}  Retrieves commit-based quantitative metrics.
GET /api/commit_links/{project_id}/{month}    Fetches commit links for a given month, filtered by developer in frontend.
GET /api/email_links/{project_id}/{month}     Fetches email links for a given month, filtered by developer in frontend.
GET /api/social_net/{project_id}/{month}      Fetches email interaction network (Apache projects).
GET /api/tech_net/{project_id}/{month}        Fetches technical collaboration network (Apache projects).
GET /eclipse/social_net/{project_id}/{month}  Fetches Eclipse-specific email interaction network.
GET /eclipse/tech_net/{project_id}/{month}    Fetches Eclipse-specific technical collaboration network.
GET /react_set.json                           Loads a static set of ReACT recommendations.
GET /foundation.json                          Loads time-series data used to compute ReACTs per project/month.
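A small client-side sketch for the per-month endpoints listed above. The base URL assumes a backend running locally on port 5000, the helper names are hypothetical, and the project identifier and month value are placeholders because the expected formats are not specified here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # assumption: backend running locally

# Per-month resources available under each path prefix, per the table above.
MONTHLY_RESOURCES = {
    "api": {"email_measure", "commit_measure", "commit_links",
            "email_links", "social_net", "tech_net"},
    "eclipse": {"social_net", "tech_net"},
}


def monthly_url(prefix, resource, project_id, month, base_url=BASE_URL):
    """Build the URL for a /{prefix}/{resource}/{project_id}/{month} endpoint."""
    if resource not in MONTHLY_RESOURCES.get(prefix, set()):
        raise ValueError(f"unknown endpoint: /{prefix}/{resource}")
    project = quote(str(project_id), safe="")
    return f"{base_url}/{prefix}/{resource}/{project}/{quote(str(month), safe='')}"


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body (requires a running backend)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Placeholder project id and month.
url = monthly_url("api", "social_net", "demo-project", 3)
print(url)
```

With the backend running, `fetch_json(url)` would return the decoded network for that project and month.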

User Guide

This guide walks you through each element of the OSSPREY dashboard and explains how it contributes to understanding project sustainability.

Project Selector

Select a repository from the drop-down list and define the time window with the month slider. The forecast, networks, and actionables instantly refresh to show information for the chosen span of months. This enables fine-grained exploration of longitudinal trends.

Project Details

The details card summarizes metadata about the selected project such as its incubation status, GitHub link, and popularity metrics like stars or forks. Reviewing this information helps set expectations about activity levels before interpreting the predictive plots.

Probability of Sustainability

A line chart visualizes the predicted sustainability probability for each month, from 0 (low) to 1 (high). Hover over the graph to inspect monthly probabilities. Downward slopes or sharp drops indicate periods of risk where intervention may be required.
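One way to read such a series programmatically is to flag months with a sharp drop or a low absolute value. The sketch below does this; the thresholds (a drop of 0.15 or a probability below 0.5) are arbitrary illustrative values, not ones used by OSSPREY, and the data is made up.

```python
def flag_risky_months(series, drop_threshold=0.15, floor=0.5):
    """Return month labels whose probability fell sharply or sits below a floor.

    series: list of (month_label, probability) pairs in chronological order.
    """
    flagged = []
    previous = None
    for month, probability in series:
        sharp_drop = previous is not None and (previous - probability) >= drop_threshold
        if sharp_drop or probability < floor:
            flagged.append(month)
        previous = probability
    return flagged


# Hypothetical forecast values.
forecast = [
    ("2024-01", 0.82),
    ("2024-02", 0.80),
    ("2024-03", 0.61),  # sharp drop from the previous month
    ("2024-04", 0.58),
    ("2024-05", 0.45),  # below the floor
]
risky = flag_risky_months(forecast)
print(risky)
```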

Researched Actionables

Based on deviations in the forecast and socio-technical metrics, this table lists research-backed recommendations. Each item links to the supporting literature and is annotated with a severity level so maintainers can prioritize critical actions.

Social Network

The social network graph illustrates how contributors communicate across issues and mailing lists. Nodes correspond to developers and edges denote message exchanges. Dense clusters highlight active discussion, while isolated nodes reveal potential disconnects.

Technical Network

This diagram maps collaboration on code by linking developers to file types they modify. Inspecting who works on which parts of the codebase can identify key maintainers, shared ownership, or overly siloed components.

By reviewing these visualizations in concert, you can monitor a project's momentum and apply targeted interventions to support its long-term success.

Installation Manual

Step 1: Clone All Repositories

OSSPREY consists of multiple modular components. Begin by cloning all required repositories:

git clone https://github.com/OSS-PREY/OSSPREY-FrontEnd-Server
git clone https://github.com/OSS-PREY/OSSPREY-BackEnd-Server
git clone https://github.com/OSS-PREY/OSSPREY-ReACT-API
git clone https://github.com/OSS-PREY/OSSPREY-Pex-Forecaster
git clone https://github.com/OSS-PREY/OSSPREY-OSS-Scraper-Tool

Step 2: Front-End Installation

  1. Ensure Node.js & npm are installed (version 14.x or above):
    sudo apt install npm
  2. Install project dependencies:
    npm install
  3. Start development server (hot reload):
    npm run dev
  4. Build for production:
    npm run build
  5. Clear cache and reinstall (if build issues occur):
    rm -rf node_modules package-lock.json
    npm cache clean --force
    npm install

Step 3: Back-End Installation

  1. Create and activate Python environment (Python 3.10 recommended):
    python3 -m venv venv
    source venv/bin/activate
  2. Install Python dependencies:
    Download requirements.txt from GitHub and run:
    pip install -r requirements.txt
  3. Navigate to OSSPREY-OSS-Scraper-Tool directory:
    cd OSSPREY-OSS-Scraper-Tool
  4. Install Rust and Cargo:
    Follow the official guide at rust-lang.org/tools/install
  5. Environment File Configuration
    Create a .env file in the OSSPREY-OSS-Scraper-Tool directory with the following content:
    GITHUB_TOKEN="PERSONAL-TOKEN"
    Replace PERSONAL-TOKEN with your GitHub personal access token. It should be a fine-grained access token with read access to public repositories on GitHub.
  6. Prepare and build the OSS Scraper tool:
    cargo update
    cargo clean
    cargo build
    cargo fix --bin "miner"
    The built binary will appear inside the target directory.
  7. Navigate to the core backend directory (OSSPREY-BackEnd-Server):
    cd ../OSSPREY-BackEnd-Server
    This directory contains the Flask server that serves API endpoints.
  8. Install MongoDB and Mongosh:
    Use the provided install scripts for MongoDB and Mongosh:
    sh install_mongo.sh
    sh install_mongosh.sh
  9. Set up the database user in mongosh:
    mongosh
    
    use decal-db
    db.createUser({
      user: "ossprey-backend",
      pwd: "FL3YyVGCr79xlPT0",
      roles: [{ role: "readWrite", db: "decal-db" }]
    })
    
  10. Download Foundational Data: Download the foundational data and place it in the parent directory.
  11. Insert data into MongoDB:
    Run the following script from the given Insert-Data file.
    sh insert_data_to_mongodb.sh
  12. Environment File Configuration
    Create a .env file in the OSSPREY-BackEnd-Server directory with the following content:
    GITHUB_TOKEN_1="PERSONAL-TOKEN-1" 
    GITHUB_TOKEN_2="PERSONAL-TOKEN-2"
    GITHUB_TOKEN_3="PERSONAL-TOKEN-3"
    GITHUB_TOKEN_4="PERSONAL-TOKEN-4"
                
    PEX_GENERATOR_REPO_URL="https://github.com/arjashok/pex-forecaster.git"
    OSS_SCRAPER_REPO_URL="https://github.com/priyalsoni15/OSS-scraper.git"
    PEX_GENERATOR_DIR="/mnt/data1/OSPEX/root-linode/pex-forecaster"
    OSS_SCRAPER_DIR="/OSSPREY-OSS-Scraper-Tool"
    REACT_API_DIR="/OSSPREY-ReACT-API"
    GITHUB_USERNAME="GITHUB_USERNAME"
    MONGODB_URI="mongodb://ossprey-backend:FL3YyVGCr79xlPT0@localhost:27017/decal-db?retryWrites=true&w=majority"
                  
    Replace PERSONAL-TOKEN-1, PERSONAL-TOKEN-2, PERSONAL-TOKEN-3, and PERSONAL-TOKEN-4 with your GitHub personal access tokens. These should be fine-grained access tokens with read access to public repositories. Also, replace GITHUB_USERNAME with your GitHub username.
  13. Start the Flask backend using Gunicorn (production):
    gunicorn -w 4 --max-requests 100 --max-requests-jitter 10 --timeout 120 -b 0.0.0.0:5000 run:app
  14. Expose the backend through ngrok at a fixed URL (configure ngrok first; see step 15):
    ngrok http --url=ossprey.ngrok.app 5000
  15. Configuring NGROK:
    • Download and install ngrok from ngrok.com/download
    • Sign up for a free account at ngrok.com/signup
    • Authenticate your ngrok account using the command below:
      ngrok config add-authtoken YOUR_AUTH_TOKEN
    • Start ngrok to expose your local server to the internet:
      ngrok http 5000
    • Copy the forwarding URL provided by ngrok (e.g., https://123456789.ngrok.io) and use it to access your local server from anywhere.
    • To stop ngrok, press Ctrl + C in the terminal where ngrok is running.
  16. Debug locally using Flask (development):
    python -m flask run
    Or, to allow external access:
    python -m flask run --host=0.0.0.0 --port=5000
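Before starting the server, the backend .env file from step 12 can be sanity-checked with a short script. This is a sketch under assumptions: it treats the keys below as required, uses a simplified KEY="value" parser rather than a full dotenv implementation, and only detects the placeholder values shown in this manual.

```python
REQUIRED_KEYS = [
    "GITHUB_TOKEN_1", "GITHUB_TOKEN_2", "GITHUB_TOKEN_3", "GITHUB_TOKEN_4",
    "GITHUB_USERNAME", "MONGODB_URI",
]


def parse_env(text):
    """Parse simple KEY="value" lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip('"')
    return values


def missing_or_placeholder(values, required=REQUIRED_KEYS):
    """List required keys that are absent, empty, or still set to a placeholder."""
    problems = []
    for key in required:
        value = values.get(key, "")
        if not value or value.startswith("PERSONAL-TOKEN") or value == "GITHUB_USERNAME":
            problems.append(key)
    return problems


# Example .env content with made-up values; in practice read the real file,
# e.g. open(".env", encoding="utf-8").read().
sample = '''
GITHUB_TOKEN_1="PERSONAL-TOKEN-1"
GITHUB_TOKEN_2="example-token-value"
GITHUB_USERNAME="GITHUB_USERNAME"
MONGODB_URI="mongodb://user:pass@localhost:27017/decal-db?retryWrites=true&w=majority"
'''
problems = missing_or_placeholder(parse_env(sample))
print(problems)
```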

Docker Installation Manual

Download the Dockerfile from the given link. Then use the following commands to build and run the Docker container. Ensure that your environment variables and sensitive credentials are properly configured beforehand:

sudo docker build --build-arg GITHUB_USERNAME="priyalsoni15" \
        --build-arg GITHUB_PAT="$(cat ~/.github_pat)" \
        --build-arg GITHUB_TOKEN_1="$(cat ~/.github_token1)" \
        --build-arg GITHUB_TOKEN_2="$(cat ~/.github_token2)" \
        --build-arg GITHUB_TOKEN_3="$(cat ~/.github_token3)" \
        --build-arg GITHUB_TOKEN_4="$(cat ~/.github_token4)" \
        --build-arg MONGODB_USER="priyalsoniwritings" \
        --build-arg MONGODB_PASSWORD="$(cat ~/.mongodb_password)" \
        -t my-secure-image .
docker run -d --env-file .env -p 5001:5000 my-secure-image

Note: This Dockerfile involves some manual steps and may require debugging:
  • The automatic creation of a MongoDB user is not supported and must be done manually.
  • Environment files such as .env must be created manually to securely store sensitive credentials; they are not embedded in the Dockerfile for security reasons.

Acknowledgements

This research was supported by the National Science Foundation under Grant No. 2020751, as well as by the Alfred P. Sloan Foundation through the OSPO for UC initiative (Award No. 2024-22424).

Moreover, we extend our heartfelt gratitude to Likang Yin, Anirudh Ramchandran, and Swati Singhvi for their contributions to the development of APEX and EPEX. Several core functionalities of OSSPREY build upon and extend these earlier tools.

License

The OSSPREY project is licensed under the Apache License 2.0. This permissive license allows you to use, modify, and distribute the software for both personal and commercial purposes, as long as you include proper attribution and comply with the terms outlined in the license.

Contributing

Contributions are very welcome! Whether you're adding features, fixing bugs, or improving documentation, we appreciate your support in making this project better.

Before getting started, please read our Contributing Guidelines to ensure your contributions align with our project's standards and workflows.

Thank you for helping improve this project!

Contact

For high-level discussions, funding opportunities, or collaboration inquiries, please reach out to the project supervisor, Professor Vladimir Filkov (vfilkov@ucdavis.edu).

For technical questions, bug reports, or concerns regarding the codebase, please contact the current tech lead, Nafiz Imtiaz Khan (nikhan@ucdavis.edu).

For general discussions, contributions, and community updates, join our OSSPREY Slack workspace.

We're excited to hear from you!