GitHub Integration

CellMage provides integration with GitHub through the %github magic command, allowing you to fetch GitHub repositories and pull requests directly into your notebook and use them as context for LLM queries.

Installation

To use the GitHub integration, install CellMage with the GitHub extra dependency:

pip install "cellmage[github]"

This installs the required dependencies, including PyGithub and python-dotenv.

Configuration

The GitHub integration requires a GitHub personal access token. You can set it using environment variables:

# In your terminal
export GITHUB_TOKEN="your_personal_access_token"

# Or in a .env file
GITHUB_TOKEN=your_personal_access_token

To create a GitHub Personal Access Token:

  • Go to your GitHub account settings → Developer settings → Personal access tokens

  • Generate a new token with the repo scope (for private repositories) or just public_repo for public repositories

  • Copy the token and set it as your GITHUB_TOKEN environment variable

Basic Usage

To fetch a specific repository:

%github username/repo

This fetches the repository summary and adds it as a user message in the chat history.

Advanced Usage

Fetching Pull Requests

You can also fetch a specific pull request from a repository:

%github username/repo --pr 123

Command Options

  • --pr ID: Fetch a specific pull request by ID

  • --system: Add content as system message instead of user message

  • --show: Only display the content without adding it to the chat history

  • --clean: Clean the repository content to focus on code (removes non-essential files)

  • --full-code: Include all code content from the repository (may be very large)

  • --exclude-dir PATTERN: Exclude directories matching the pattern (can use multiple times)

  • --exclude-file PATTERN: Exclude files matching the pattern (can use multiple times)

  • --exclude-ext EXT: Exclude files with the specified extension (can use multiple times)

  • --exclude-regex PATTERN: Exclude files matching the regex pattern (can use multiple times)

  • --contributors-months N: Include contributors from the last N months (default: 6)
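To make the way these options combine more concrete, the sketch below models them with Python's standard argparse module. It is only an illustration of the documented flags (repeatable exclusions, a default of 6 for --contributors-months); the build_parser helper is hypothetical and is not CellMage's actual argument parser.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical parser mirroring the documented %github options."""
    parser = argparse.ArgumentParser(prog="%github", add_help=False)
    parser.add_argument("repo", help="repository as username/repo")
    parser.add_argument("--pr", type=int, metavar="ID")
    parser.add_argument("--system", action="store_true")
    parser.add_argument("--show", action="store_true")
    parser.add_argument("--clean", action="store_true")
    parser.add_argument("--full-code", action="store_true")
    # "can use multiple times" maps naturally onto action="append".
    for flag in ("--exclude-dir", "--exclude-file", "--exclude-ext", "--exclude-regex"):
        parser.add_argument(flag, action="append", default=[])
    parser.add_argument("--contributors-months", type=int, default=6)
    return parser


line = 'username/repo --pr 123 --show --exclude-ext ".json" --exclude-ext ".md"'
args = build_parser().parse_args(shlex.split(line))
print(args.repo, args.pr, args.show, args.exclude_ext, args.contributors_months)
```

Repeating an exclusion flag accumulates values into a list, while flags that are not given keep their defaults.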

Examples

Fetch a repository and add it to history:

%github username/repo

Fetch a repository and add it as system context:

%github username/repo --system

Just view a repository summary without adding to history:

%github username/repo --show

Fetch a pull request:

%github username/repo --pr 123

View a pull request without adding to history:

%github username/repo --pr 123 --show

Exclude certain directories and file types:

%github username/repo --exclude-dir "node_modules" --exclude-ext ".json" --exclude-ext ".md"
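As a rough mental model of what the exclusion options in the last example do, the following sketch filters a list of repository paths. The is_excluded helper and its matching rules (shell-style patterns for directories and file names, case-insensitive extensions, regex search over the full path) are assumptions made for illustration; CellMage's real matching semantics may differ.

```python
import fnmatch
import re
from pathlib import PurePosixPath


def is_excluded(path, exclude_dirs=(), exclude_files=(), exclude_exts=(), exclude_regex=()):
    """Return True if a repository-relative path would be filtered out (hypothetical rules)."""
    p = PurePosixPath(path)
    # Any parent directory component matching a directory pattern excludes the file.
    for part in p.parts[:-1]:
        if any(fnmatch.fnmatch(part, pat) for pat in exclude_dirs):
            return True
    # File-name patterns are matched against the final component only.
    if any(fnmatch.fnmatch(p.name, pat) for pat in exclude_files):
        return True
    # Extensions are compared case-insensitively, with or without a leading dot.
    ext = p.suffix.lower()
    if any(ext == "." + e.lower().lstrip(".") for e in exclude_exts):
        return True
    # Regex patterns are searched against the whole path.
    return any(re.search(rx, path) for rx in exclude_regex)


paths = ["src/app.py", "node_modules/lib/index.js", "package.json", "README.md"]
kept = [
    p for p in paths
    if not is_excluded(p, exclude_dirs=["node_modules"], exclude_exts=[".json", ".md"])
]
print(kept)
```

With the same patterns as the command above, only src/app.py survives the filter in this sketch.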

Using GitHub Content with LLM Queries

Once you’ve fetched GitHub content, you can reference it in your LLM queries:

# First, fetch the repository
%github username/repo

# Then, reference it in your prompt
%%llm
Based on the GitHub repository above, can you explain the project architecture and suggest improvements?

Or with pull requests:

# First, fetch the pull request
%github username/repo --pr 123

# Then, use it as context in your prompt
%%llm
Please review the pull request above and suggest any improvements or issues to address.

Troubleshooting

Authentication Issues

  1. Verify your token is set properly:

    import os
    print("GITHUB_TOKEN is set:", os.environ.get("GITHUB_TOKEN") is not None)
    
  2. Check token scope and permissions:

    • Ensure your token has the required scopes (repo for private repositories, public_repo for public ones)

    • Verify the token hasn’t expired

    • Regenerate the token if necessary
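If the environment variable is set but requests still fail, it can help to test the token against GitHub directly, outside CellMage. The sketch below uses only the standard library and GitHub's REST endpoint for the authenticated user (https://api.github.com/user); the helper names build_user_request and check_token are hypothetical, and network errors other than HTTP errors are not handled.

```python
import json
import urllib.error
import urllib.request


def build_user_request(token):
    """Build an authenticated request for GitHub's 'authenticated user' endpoint."""
    return urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def check_token(token, timeout=10):
    """Return (ok, detail): the account login on success, the HTTP error otherwise."""
    try:
        with urllib.request.urlopen(build_user_request(token), timeout=timeout) as resp:
            return True, json.load(resp)["login"]
    except urllib.error.HTTPError as err:
        # A 401 response usually points to an invalid or expired token.
        return False, f"HTTP {err.code}: {err.reason}"
```

Calling check_token(os.environ["GITHUB_TOKEN"]) after importing os should return your GitHub login if the token is valid; avoid printing the token itself in a shared notebook.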

Rate Limiting

GitHub has API rate limits that may affect your usage:

  1. Authenticated rate limits: With a token, you get 5,000 requests per hour

  2. Unauthenticated rate limits: Without a token, only 60 requests per hour

  3. Rate limit errors: If you see 403 Rate Limit Exceeded errors, wait for your rate limit to reset
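To estimate how long "wait for your rate limit to reset" actually is, you can read the x-ratelimit-remaining and x-ratelimit-reset headers that GitHub's REST API attaches to its responses (the reset value is a UTC epoch timestamp in seconds). The sketch below is a standalone helper, not part of CellMage; the seconds_until_reset name is an assumption, and how you obtain the headers depends on the HTTP client you use.

```python
import math
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before retrying, based on rate-limit headers."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    remaining = int(h.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0  # quota left, no need to wait
    reset_at = int(h["x-ratelimit-reset"])  # UTC epoch seconds
    now = time.time() if now is None else now
    return max(0, math.ceil(reset_at - now))


example = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000600"}
print(seconds_until_reset(example, now=1700000000))
```

Passing now explicitly keeps the helper easy to test; in real use you would omit it so the current time is used.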

Repository Access Issues

  1. Private repositories: Ensure your token has repo scope for accessing private repositories

  2. Organization repositories: You need appropriate organization permissions if accessing org repositories

  3. Repository not found: Check if the repository exists and you’ve spelled the name correctly

Large Repository Problems

  1. Timeout errors: For very large repositories, you might experience timeouts

  2. Memory issues: Large repositories may cause memory problems

  3. Solutions:

    • Use --clean to reduce the amount of data

    • Use --exclude-dir, --exclude-file, and --exclude-ext to filter content

    • Avoid --full-code for large repositories

For any persistent issues, you can enable debug logging:

import logging
from cellmage.utils.logging import setup_logging
setup_logging(level=logging.DEBUG)
# The logs will be written to cellmage.log in your working directory