OSSPREY System Architecture

OSSPREY: Open Source Software PRojEct sustainabilitY tracker

Department of Computer Science, University of California, Davis

Abstract

OSSPREY (Open Source Software PRojEct sustainabilitY tracker) is built to help open source projects not just survive, but thrive. Instead of relying on surface-level metrics, OSSPREY looks deeper, mapping how people collaborate, how code evolves, and how projects grow or struggle over time. We combine network analysis, machine learning, and lessons drawn directly from published research to offer practical, evidence-based recommendations rather than vague advice. Our goal is simple: make it easier for maintainers, contributors, and foundations to spot risks early, understand what's working, and take action to build more sustainable projects. Open source is complex and always changing. OSSPREY gives you tools that grow with you, whether you're steering a small project or managing a major foundation's portfolio. It's not about static scores or dashboards; it's about building a healthier future for your community.

To learn more about OSSPREY, visit the live deployment at ossprey.netlify.app.

System Architecture

Figure: System architecture of OSSPREY

OSSPREY leverages a robust data processing pipeline, socio-technical network modeling, machine learning predictions, and evidence-driven recommendations to support OSS maintainers and communities. Below is a detailed breakdown of its architecture.

1. Data Scraper (Rust)

The pipeline begins with a fast and reliable data scraping engine implemented in Rust. This component retrieves raw project data from open source ecosystems such as GitHub, Apache, and Eclipse, collecting commit histories, issue threads, pull requests, contributor activity, and email communications when available. Rust was chosen for its performance and memory safety, allowing efficient scraping of large datasets across many projects.
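
As a rough illustration of the kind of records the scraper gathers, the sketch below pulls commits, issues, and pull requests for a single repository through the public GitHub REST API. It is only a Python stand-in under stated assumptions (the repository name is hypothetical, and the real tool is the Rust scraper described above), not OSSPREY's actual implementation.

    import os
    import requests

    # Illustrative only: the real OSSPREY scraper is written in Rust. This sketch
    # just shows the shape of the raw data (commits, issues, pull requests)
    # that the pipeline starts from. Requires a GITHUB_TOKEN environment variable.
    GITHUB_API = "https://api.github.com"
    HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

    def fetch(repo: str, resource: str) -> list:
        """Fetch one page of commits, issues, or pulls for a repository."""
        resp = requests.get(f"{GITHUB_API}/repos/{repo}/{resource}",
                            headers=HEADERS, params={"per_page": 100})
        resp.raise_for_status()
        return resp.json()

    repo = "apache/airflow"  # hypothetical example project
    commits, issues, pulls = (fetch(repo, r) for r in ("commits", "issues", "pulls"))
    print(len(commits), "commits |", len(issues), "issues |", len(pulls), "pull requests")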

2. Network Generator (Python)

The next stage processes the raw data into structured monthly socio-technical networks. Two types of networks are constructed:

  • Social Networks: Nodes are developers who communicate via emails, mailing lists, or issue threads; edges indicate their communication. E.g., if A sends an email to the mailing list and B replies, an edge A → B is added. Communication patterns correlate with productivity and organization.
  • Technical Networks: Nodes are developers (left) and files (right); edges indicate who committed code changes to which files (for tractability, files are aggregated by their extensions). Multi-tasking and collaboration behaviors emerge in the technical networks.
These dynamic networks offer a detailed view into how OSS communities organize and evolve over time.
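
A minimal sketch of how such monthly networks could be assembled is shown below, assuming toy in-memory lists of mailing-list replies and commits; the record layout and names are illustrative, not the module's real schema.

    import networkx as nx

    # Toy monthly records; the real Network Generator consumes scraped project data.
    # (sender, replier): sender A posted, B replied, so the edge is A -> B.
    replies = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]
    # (developer, file extension): files are aggregated by extension for tractability.
    commits = [("alice", "py"), ("carol", "rs"), ("alice", "rs")]

    social = nx.DiGraph()
    social.add_edges_from(replies)

    technical = nx.Graph()  # bipartite: developers on one side, file extensions on the other
    technical.add_nodes_from({dev for dev, _ in commits}, bipartite=0)
    technical.add_nodes_from({ext for _, ext in commits}, bipartite=1)
    technical.add_edges_from(commits)

    print("social:", social.number_of_nodes(), "nodes,", social.number_of_edges(), "edges")
    print("technical:", technical.number_of_nodes(), "nodes,", technical.number_of_edges(), "edges")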

3. Health Predictor (Python)

The Health Predictor is a machine learning module trained on historical OSS project data. It takes the extracted network metrics and evaluates the sustainability of a project using features such as:

  • Contributor centrality and diversity
  • Activity bursts and drop-offs
  • Network modularity and communication density
The output is a prediction about the project's health trajectory, such as whether it's on track to succeed, stagnate, or fail.
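
The sketch below shows the general idea with a toy feature table and an off-the-shelf classifier; the features, labels, and model here are placeholders for illustration, not the forecaster OSSPREY actually ships.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Toy data: one row per project-month with illustrative network features
    # (centrality, diversity, activity change, modularity, communication density).
    rng = np.random.default_rng(0)
    X = rng.random((200, 5))
    y = rng.integers(0, 2, size=200)  # 1 = sustainable trajectory, 0 = at risk

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    print("held-out accuracy:", round(model.score(X_test, y_test), 2))
    # predict_proba column 0 corresponds to class 0 ("at risk")
    print("probability of decline for one project-month:",
          round(model.predict_proba(X_test[:1])[0][0], 2))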

4. ReACT Recommender (Python)

OSSPREY goes a step further by offering actionable, research-backed advice via the ReACT (Researched Actionables) Recommender. Drawing from peer-reviewed software engineering literature, this module suggests concrete actions tailored to the project’s network profile. Each recommendation is grounded in published evidence and can include strategies like:

  • Implementing mentorship programs
  • Encouraging modularization of the codebase
  • Facilitating structured communication practices
This makes the platform not just diagnostic, but also prescriptive.
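
One way to picture the recommender's logic is a rule table that maps observed network signals to actionables, as in the hypothetical sketch below; the metric names, thresholds, and rule set are invented for illustration and do not reflect ReACT's actual rules or literature base.

    # Hypothetical signal-to-actionable rules; ReACT's real rules and their
    # supporting citations live in the OSSPREY-ReACT-API component.
    RULES = [
        (lambda m: m["new_contributor_retention"] < 0.2,
         "Implement a mentorship/onboarding program for newcomers."),
        (lambda m: m["file_coupling"] > 0.7,
         "Encourage modularization of tightly coupled parts of the codebase."),
        (lambda m: m["communication_density"] < 0.1,
         "Facilitate structured communication (regular syncs, triage threads)."),
    ]

    def recommend(metrics: dict) -> list:
        """Return the actionables whose trigger conditions hold for this project-month."""
        return [action for trigger, action in RULES if trigger(metrics)]

    example = {"new_contributor_retention": 0.1, "file_coupling": 0.8, "communication_density": 0.3}
    print(recommend(example))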

5. MongoDB (Data Store)

The entire system is underpinned by a MongoDB database that stores raw inputs, network outputs, prediction results, and recommendations. This NoSQL setup is ideal for managing OSS metadata, which is often semi-structured and variable in format across different ecosystems.
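
As an example of what that storage might look like from the Python side, the sketch below writes and reads one document per project-month with pymongo; the collection layout and field names are assumptions for illustration, not the backend's actual schema, and it assumes a local MongoDB reachable without authentication (with the user created during installation, use the MONGODB_URI from the backend's .env instead).

    from pymongo import MongoClient

    # Illustrative persistence of pipeline outputs (schema is assumed, not OSSPREY's).
    client = MongoClient("mongodb://localhost:27017/")
    db = client["decal-db"]

    db["forecasts"].insert_one({
        "project": "example-project",
        "month": "2024-06",
        "network_metrics": {"modularity": 0.42, "communication_density": 0.18},
        "health_prediction": "at_risk",
        "recommendations": ["Implement a mentorship/onboarding program for newcomers."],
    })

    latest = db["forecasts"].find_one({"project": "example-project"}, sort=[("month", -1)])
    print(latest["month"], latest["health_prediction"])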

Data Flow Summary

The architecture follows a clean and modular data flow:

Raw OSS Data 
    ↓
Data Scraper (Rust) 
    ↓
Network Generator (Python) 
    ↓
Health Predictor (Python) 
    ↓
ReACT Recommender (Python) 
    ↓
Frontend + MongoDB Storage
          

This system allows OSSPREY to move from passive monitoring to active sustainability guidance, tailored to each project’s unique socio-technical landscape.
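
To make the flow concrete, the stub below chains the five stages in plain Python; every function is a placeholder standing in for the real Rust scraper and Python modules, so treat it as a reading aid rather than OSSPREY's actual API.

    # Placeholder stubs for the five pipeline stages described above.
    def scrape(repo): return {"repo": repo, "commits": [], "emails": []}       # 1. Data Scraper (Rust binary in practice)
    def build_networks(raw): return {"social": {}, "technical": {}}            # 2. Network Generator
    def predict_health(nets): return "at_risk"                                 # 3. Health Predictor
    def recommend(nets, forecast): return ["Implement a mentorship program."]  # 4. ReACT Recommender
    def store(result): print("stored:", result)                                # 5. MongoDB + frontend

    def run_pipeline(repo: str) -> dict:
        raw = scrape(repo)
        nets = build_networks(raw)
        forecast = predict_health(nets)
        actions = recommend(nets, forecast)
        result = {"repo": repo, "forecast": forecast, "recommendations": actions}
        store(result)
        return result

    run_pipeline("example/project")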

Installation Manual

Step 1: Clone All Repositories

OSSPREY consists of multiple modular components. Begin by cloning all required repositories:

git clone https://github.com/OSS-PREY/OSSPREY-FrontEnd-Server
git clone https://github.com/OSS-PREY/OSSPREY-BackEnd-Server
git clone https://github.com/OSS-PREY/OSSPREY-ReACT-API
git clone https://github.com/OSS-PREY/OSSPREY-Pex-Forecaster
git clone https://github.com/OSS-PREY/OSSPREY-OSS-Scraper-Tool

Step 2: Front-End Installation

  1. Ensure Node.js & npm are installed (version 14.x or above):
    sudo apt install npm
  2. Install project dependencies:
    npm install
  3. Start development server (hot reload):
    npm run dev
  4. Build for production:
    npm run build
  5. Clear cache and reinstall (if build issues occur):
    rm -rf node_modules package-lock.json
    npm cache clean --force
    npm install

Step 3: Back-End Installation

  1. Create and activate Python environment (Python 3.10 recommended):
    python3 -m venv venv
    source venv/bin/activate
  2. Install Python dependencies:
    Download requirements.txt from GitHub and run:
    pip install -r requirements.txt
  3. Navigate to OSSPREY-OSS-Scraper-Tool directory:
    cd OSSPREY-OSS-Scraper-Tool
  4. Install Rust and Cargo:
    Follow the official guide at rust-lang.org/tools/install
  5. Environment File Configuration
    Create a .env file in the OSSPREY-OSS-Scraper-Tool directory with the following content:
    GITHUB_TOKEN="PERSONAL-TOKEN" 
    Replace PERSONAL-TOKEN with your GitHub personal access token. The token should be a fine-grained access token with read access to public repositories on GitHub.
  6. Prepare and build the OSS Scraper tool:
    cargo update
    cargo clean
    cargo build
    cargo fix --bin "miner"
    The built binary will appear inside the target directory.
  7. Navigate to the core backend directory (OSSPREY-BackEnd-Server):
    cd ../OSSPREY-BackEnd-Server
    This directory contains the Flask server that serves API endpoints.
  8. Install MongoDB and Mongosh:
    Use the provided MongoDB and Mongosh installation scripts:
    sh install_mongo.sh
    sh install_mongosh.sh
  9. Set up the database user in mongosh:
    mongosh
    
    use decal-db
    db.createUser({
      user: "ossprey-backend",
      pwd: "FL3YyVGCr79xlPT0",
      roles: [{ role: "readWrite", db: "decal-db" }]
    })
    
  10. Download Foundational Data: Download the foundational data and place it in the parent directory.
  11. Insert data into MongoDB:
    Run the following script, provided in the Insert-Data file:
    sh insert_data_to_mongodb.sh
  12. Environment File Configuration
    Create a .env file in the OSSPREY-BackEnd-Server directory with the following content:
    GITHUB_TOKEN_1="PERSONAL-TOKEN-1" 
    GITHUB_TOKEN_2="PERSONAL-TOKEN-2"
    GITHUB_TOKEN_3="PERSONAL-TOKEN-3"
    GITHUB_TOKEN_4="PERSONAL-TOKEN-4"
                
    PEX_GENERATOR_REPO_URL="https://github.com/arjashok/pex-forecaster.git"
    OSS_SCRAPER_REPO_URL="https://github.com/priyalsoni15/OSS-scraper.git"
    PEX_GENERATOR_DIR="/mnt/data1/OSPEX/root-linode/pex-forecaster"
    OSS_SCRAPER_DIR="/OSSPREY-OSS-Scraper-Tool"
    REACT_API_DIR="/OSSPREY-ReACT-API"
    GITHUB_USERNAME="GITHUB_USERNAME"
    MONGODB_URI="mongodb://ossprey-backend:FL3YyVGCr79xlPT0@localhost:27017/decal-db?retryWrites=true&w=majority"
                  
    Replace PERSONAL-TOKEN-1, PERSONAL-TOKEN-2, PERSONAL-TOKEN-3, and PERSONAL-TOKEN-4 with your GitHub personal access tokens. These should be fine-grained access tokens with read access to public repositories. Also, replace GITHUB_USERNAME with your GitHub username.
  13. Start the Flask backend using Gunicorn (production):
    gunicorn -w 4 --max-requests 100 --max-requests-jitter 10 --timeout 120 -b 0.0.0.0:5000 run:app
  14. Expose the backend to the internet using a reserved ngrok domain (ngrok setup is described in the next step):
    ngrok http --url=ossprey.ngrok.app 5000
  15. Configuring ngrok:
    • Download and install ngrok from ngrok.com/download
    • Sign up for a free account at ngrok.com/signup
    • Authenticate your ngrok account using the command below:
      ngrok config add-authtoken YOUR_AUTH_TOKEN
    • Start ngrok to expose your local server to the internet:
      ngrok http 5000
    • Copy the forwarding URL provided by ngrok (e.g., https://123456789.ngrok.io) and use it to access your local server from anywhere.
    • To stop ngrok, simply press Ctrl + C in the terminal where ngrok is running.
  16. Debug locally using Flask (development):
    python -m flask run
    Or, to allow external access:
    python -m flask run --host=0.0.0.0 --port=5000

Docker Installation Manual

Download the Dockerfile from the given link. Use the following commands to build and run the Docker container securely. Ensure that your environment variables and sensitive credentials are properly configured beforehand:

sudo docker build --build-arg GITHUB_USERNAME="priyalsoni15" \
        --build-arg GITHUB_PAT="$(cat ~/.github_pat)" \
        --build-arg GITHUB_TOKEN_1="$(cat ~/.github_token1)" \
        --build-arg GITHUB_TOKEN_2="$(cat ~/.github_token2)" \
        --build-arg GITHUB_TOKEN_3="$(cat ~/.github_token3)" \
        --build-arg GITHUB_TOKEN_4="$(cat ~/.github_token4)" \
        --build-arg MONGODB_USER="priyalsoniwritings" \
        --build-arg MONGODB_PASSWORD="$(cat ~/.mongodb_password)" \
        -t my-secure-image .

docker run -d --env-file .env -p 5001:5000 my-secure-image
          
Note: This Dockerfile involves some manual steps and may require debugging:
  • The automatic creation of a MongoDB user is not supported and must be done manually.
  • Environment files such as .env must be created manually to securely store sensitive credentials; they are not embedded in the Dockerfile for security reasons.

Acknowledgements

This research was supported by the National Science Foundation under Grant No. 2020751, as well as by the Alfred P. Sloan Foundation through the OSPO for UC initiative (Award No. 2024-22424).


Moreover, we extend our heartfelt gratitude to Likang Yin, Anirudh Ramchandran, and Swati Singhvi for their contributions to the development of APEX and EPEX. Several core functionalities of OSSPREY build upon and extend these earlier tools.

License

The OSSPREY project is licensed under the Apache License 2.0. This permissive license allows you to use, modify, and distribute the software for both personal and commercial purposes, as long as you include proper attribution and comply with the terms outlined in the license.

Contributing

Contributions are very welcome! Whether you're adding features, fixing bugs, or improving documentation, we appreciate your support in making this project better.

Before getting started, please read our Contributing Guidelines to ensure your contributions align with our project's standards and workflows.

Thank you for helping improve this project!

Contact

For high-level discussions, funding opportunities, or collaboration inquiries, please reach out to the project supervisor, Professor Vladimir Filkov (vfilkov@ucdavis.edu).

For technical questions, bug reports, or concerns regarding the codebase, please contact the current tech lead, Nafiz Imtiaz Khan (nikhan@ucdavis.edu).

For general discussions, contributions, and community updates, join our OSSPREY Slack workspace.

We're excited to hear from you!