Creating A CI Workflow To Check For Broken Links

by ADMIN

Introduction

Broken links on a website or in documentation frustrate readers and can hurt search engine rankings. A Continuous Integration (CI) workflow can detect and report them automatically, so problems surface soon after they appear instead of being discovered by users. This article shows how to build such a workflow with GitHub Actions.

Requirements

Before we dive into the workflow, let's outline the requirements:

  • The workflow must run on GitHub Actions.
  • It should report errors when broken links are detected.
  • It must generate a report identifying which specific resources have broken links.

Workflow Configuration

Below is the YAML code for the CI workflow:

name: Check for Broken Links

on:
  push:
    branches: [ main, master ]
  pull_request:
    branches: [ main, master ]
  schedule:
    - cron: '0 0 * * 0'  # Run weekly on Sundays at midnight (UTC)

jobs:
  check-links:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '20'
        
    - name: Install link checker
      run: npm install -g broken-link-checker
      
    - name: Check for broken links
      id: link-check
      run: |
        # Create output directory
        mkdir -p ./link-check-results
        
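        # Run link checker and output results to a file
        # Adjust the URL to your site or documentation
        # blc exits non-zero when it finds broken links; "|| true" keeps the
        # step running so the grep below can decide the outcome
        blc https://your-site-or-docs-url.com -ro --filter-level 3 > ./link-check-results/broken-links.txt || true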
        
        # Check if there are broken links and set output variable
        if grep -q "BROKEN" ./link-check-results/broken-links.txt; then
          echo "broken_links=true" >> "$GITHUB_OUTPUT"
          echo "::error::Broken links detected! See the report for details."
        else
          echo "broken_links=false" >> "$GITHUB_OUTPUT"
        fi
      
    - name: Generate broken links report
      if: steps.link-check.outputs.broken_links == 'true'
      run: |
        echo "## Broken Links Report" > ./link-check-results/report.md
        echo "" >> ./link-check-results/report.md
        echo "The following broken links were detected:" >> ./link-check-results/report.md
        echo "" >> ./link-check-results/report.md
        grep "BROKEN" ./link-check-results/broken-links.txt | awk '{print "- " $0}' >> ./link-check-results/report.md
      
    - name: Upload broken links report
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: broken-links-report
        path: ./link-check-results/
        
    - name: Fail if broken links detected
      if: steps.link-check.outputs.broken_links == 'true'
      run: exit 1
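
The detection and reporting steps can also be tried outside of CI. The following is a minimal shell sketch, not part of the workflow itself: build_report is a hypothetical helper name, and it assumes, as the workflow's grep does, that every broken link appears on a line containing the word BROKEN.

```shell
#!/bin/sh
# Sketch: turn saved link-checker output into a Markdown report.
# Assumption: broken links appear on lines containing "BROKEN",
# which is the same marker the workflow's grep relies on.

# build_report INPUT OUTPUT
# Writes a Markdown report to OUTPUT.
# Returns 1 if INPUT lists broken links, 0 otherwise.
build_report() {
  input=$1
  output=$2

  if ! grep -q 'BROKEN' "$input"; then
    printf '## Broken Links Report\n\nNo broken links were detected.\n' > "$output"
    return 0
  fi

  {
    printf '## Broken Links Report\n\n'
    printf 'The following broken links were detected:\n\n'
    # Prefix each broken-link line with a Markdown bullet
    grep 'BROKEN' "$input" | awk '{print "- " $0}'
  } > "$output"
  return 1
}
```

After a local blc run, calling build_report ./link-check-results/broken-links.txt report.md should produce the same bullet list as the workflow's report step, and the return value can drive the pass or fail decision.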

Usage Instructions

To set up this CI workflow in your repository, follow these steps:

  1. Create a .github/workflows directory in your repository if it doesn't exist.
  2. Add a new file (e.g., check-links.yml) in that directory with the contents above.
  3. Customize the URL in the blc https://your-site-or-docs-url.com line to point to your site.
  4. Adjust the branch names and cron schedule as needed.
  5. For repository-specific content (like Markdown files), you may need to modify the script to crawl local files instead.
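
For step 5, one possible approach is to run a Markdown-aware checker over the checked-out files instead of crawling a URL. The sketch below is an assumption-laden illustration rather than a drop-in replacement: list_markdown_files, check_markdown_links, and the LINK_CHECKER override are hypothetical names, and the default command assumes markdown-link-check is installed and accepts a file path as its argument.

```shell
#!/bin/sh
# Sketch: check links in Markdown files from the checked-out repository.
# LINK_CHECKER is a hypothetical override; it defaults to markdown-link-check.

# list_markdown_files DIR
# Prints Markdown files under DIR, skipping anything inside node_modules.
list_markdown_files() {
  find "$1" -type f -name '*.md' ! -path '*/node_modules/*' | sort
}

# check_markdown_links DIR
# Runs the checker once per file and returns 1 if any file fails.
check_markdown_links() {
  checker=${LINK_CHECKER:-markdown-link-check}
  list_markdown_files "$1" | (
    status=0
    while IFS= read -r file; do
      # Redirect stdin so the checker cannot consume the file list
      "$checker" "$file" < /dev/null || status=1
    done
    exit "$status"
  )
}

# Example (inside a workflow step, after installing the checker):
#   check_markdown_links .
```

Checking every file before reporting failure, rather than stopping at the first error, keeps the resulting log complete for a single run.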

Alternative Tools

If you're not satisfied with the broken-link-checker tool, you can consider these alternatives:

  • lychee: A fast, async link checker
  • markdown-link-check: Specifically for checking Markdown files
  • muffet: A fast website link checker

Remember to customize the workflow to your specific needs, especially regarding which directories or file types should be checked.

Frequently Asked Questions

Q: What is a CI workflow, and why do I need it to check for broken links?

A: A CI workflow is a set of automated jobs that run against your repository on events such as pushes, pull requests, or a schedule. Here, the workflow checks your website or documentation for broken links, so they are caught automatically instead of being found by readers. This helps maintain a good user experience and search engine rankings.

Q: What are the benefits of using a CI workflow to check for broken links?

A: The benefits of using a CI workflow to check for broken links include:

  • Improved user experience: By identifying and fixing broken links, you can ensure that your users have a seamless experience on your website or documentation.
  • Better search engine rankings: Broken links can make a site harder for search engines to crawl and may count against its perceived quality, so fixing them can help your rankings.
  • Reduced maintenance time: By automating the process of checking for broken links, you can save time and effort in maintaining your website or documentation.

Q: What are the requirements for setting up a CI workflow to check for broken links?

A: The requirements for setting up a CI workflow to check for broken links include:

  • GitHub account: You need a GitHub account to set up a CI workflow.
  • Repository: You need a repository on GitHub to store your code and configuration files.
  • CI workflow configuration file: You need to create a CI workflow configuration file (e.g., check-links.yml) in your repository.
  • Link checker tool: You need to install a link checker tool (e.g., broken-link-checker) in your CI workflow.

Q: How do I customize the CI workflow to check for broken links?

A: You can customize the CI workflow to check for broken links by:

  • Adjusting the URL: You can adjust the URL in the blc command to point to your website or documentation.
  • Changing the branch names: You can change the branch names in the on section to trigger the CI workflow on specific branches.
  • Modifying the script: You can modify the script to crawl local files instead of a specific URL.
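
One way to make the URL easier to adjust is to read it from a variable instead of hard-coding it in the blc command. This is only a sketch under stated assumptions: resolve_site_url and the SITE_URL environment variable are hypothetical names, not something the workflow above defines.

```shell
#!/bin/sh
# Sketch: pick the URL to check from an argument or an environment variable.

# resolve_site_url [URL]
# Prints the URL to check, falling back to the hypothetical SITE_URL variable.
# Returns 1 with a message on stderr if no usable http(s) URL is available.
resolve_site_url() {
  url=${1:-${SITE_URL:-}}
  case $url in
    http://*|https://*)
      printf '%s\n' "$url"
      ;;
    '')
      echo 'No URL given: pass one or set SITE_URL' >&2
      return 1
      ;;
    *)
      echo "Not an http(s) URL: $url" >&2
      return 1
      ;;
  esac
}

# Example (inside the link-check step):
#   url=$(resolve_site_url) || exit 1
#   blc "$url" -ro --filter-level 3 > ./link-check-results/broken-links.txt || true
```

Failing early on a missing or malformed URL avoids a run that passes only because nothing was actually checked.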

Q: What are some alternative tools for checking broken links?

A: Some alternative tools for checking broken links include:

  • lychee: A fast, async link checker
  • markdown-link-check: Specifically for checking Markdown files
  • muffet: A fast website link checker

Q: How do I troubleshoot issues with the CI workflow?

A: You can troubleshoot issues with the CI workflow by:

  • Checking the logs: You can check the logs in the GitHub Actions interface to see if there are any errors or warnings.
  • Inspecting the configuration file: You can inspect the configuration file (e.g., check-links.yml) to ensure that it is correct.
  • Contacting GitHub support: You can contact GitHub support for help with troubleshooting issues.
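
When reading the logs or the uploaded report, it can help to group failures by cause, since a timeout or DNS error may be a transient problem rather than a genuinely dead link. The sketch below assumes each broken-link line ends with a parenthesised reason code such as (HTTP_404); summarize_broken is a hypothetical helper, and the pattern may need adjusting to your checker's actual output.

```shell
#!/bin/sh
# Sketch: count broken links per reason code in saved checker output.
# Assumption: broken-link lines contain "BROKEN" and end with a reason
# in parentheses, for example "(HTTP_404)".

# summarize_broken FILE
# Prints "COUNT REASON" lines, most frequent reason first.
summarize_broken() {
  grep 'BROKEN' "$1" \
    | sed -n 's/.*(\([A-Z0-9_]*\))[[:space:]]*$/\1/p' \
    | sort | uniq -c | sort -rn
}

# Example:
#   summarize_broken ./link-check-results/broken-links.txt
```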

Q: Can I use this CI workflow to check for broken links on other platforms?

A: Yes, with some adjustments:

  • Sites hosted elsewhere: The workflow checks whatever URL you pass to blc, so the website or documentation does not need to be hosted on GitHub. Adjust the URL in the configuration file (e.g., check-links.yml) to point to it.
  • Modifying the script: You can modify the script to use a different link checker tool or to crawl local files instead of a specific URL.
  • Other CI systems: The YAML file is specific to GitHub Actions, but the shell commands inside the steps can be reused in another CI system's configuration.
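
The shell commands are the part of the workflow most likely to carry over to another environment. As a rough sketch of that idea, the hypothetical set_output helper below writes a step result to the file named by GITHUB_OUTPUT when that variable is set, as it is on GitHub Actions, and prints the result to standard output anywhere else. How another CI system consumes that output is left open here.

```shell
#!/bin/sh
# Sketch: record a step result in a way that also works outside GitHub Actions.

# set_output NAME VALUE
# On GitHub Actions, GITHUB_OUTPUT names a file that collects step outputs.
# Elsewhere the variable is normally unset, so the pair is printed instead.
set_output() {
  if [ -n "${GITHUB_OUTPUT:-}" ]; then
    printf '%s=%s\n' "$1" "$2" >> "$GITHUB_OUTPUT"
  else
    printf '%s=%s\n' "$1" "$2"
  fi
}

# Example (replacing the echo lines in the link-check step):
#   set_output broken_links true
```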