Anubis Ignoring Bot Policy YAML on Kubernetes

by ADMIN

Introduction

Anubis is a reverse proxy that sits in front of a website and blocks or challenges bots and crawlers. In the deployment examined here, which runs on a K3s Kubernetes cluster, Anubis appears to ignore the mounted bot policy YAML file. This article walks through the configuration and logs of that deployment to identify the cause.

Configuration

Anubis is deployed with a Kubernetes Deployment manifest that runs two containers in one pod, anubis and nginx. The botPolicy.yaml file is mounted from a ConfigMap at /data/cfg/botPolicy.yaml, and the POLICY_FNAME environment variable is meant to point Anubis at that path.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: anubis
  namespace: anubis
spec:
  selector:
    matchLabels:
      app: anubis
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: anubis
    spec:
      containers:
      - name: anubis
        image: ghcr.io/techarohq/anubis:v1.17.1@sha256:4836f8b86a185f7be92e8c76bf03de73ce9ec793b61a27473a72c41811d07cd5
        env:
          - name: "BIND"
            value: "0.0.0.0:8080"
          - name: "BIND_NETWORK"
            value: "tcp"
          - name: "COOKIE_DOMAIN"
            value: "hks-projekt.at"
          - name: "DIFFICULTY"
            value: "4"
          - name: "ED25519_PRIVATE_KEY_HEX"
            value: "xxx"
          - name: "POLICY_FNAME:"
            value: /data/cfg/botPolicy.yaml
          - name: "SERVE_ROBOTS_TXT"
            value: "1"
          - name: "TARGET"
            value: "http://127.0.0.1:80"
        volumeMounts:
          - name: config-anubis-volume
            mountPath: /data/cfg/botPolicy.yaml
            subPath: botPolicy.yaml
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "1"
        ports:
        - containerPort: 8080
      - name: nginx
        image: nginx:1.28.0-alpine@sha256:aed99734248e851764f1f2146835ecad42b5f994081fa6631cc5d79240891ec9
        volumeMounts:
          - name: config-nginx-volume
            mountPath: /etc/nginx/conf.d/default.conf
            subPath: default.conf
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      volumes:
        - name: config-anubis-volume
          configMap:
            name: anubis-config
        - name: config-nginx-volume
          configMap:
            name: nginx-config

Bot Policy YAML

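The bot policy is stored in a ConfigMap under the key botPolicy.yaml. It imports several shared rule sets for known bots and crawlers and adds three inline rules: a catch-all for user agents containing "bot" or "crawler", a generic-browser rule, and an allow rule named jellyfin.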

apiVersion: v1
kind: ConfigMap
metadata:
  name: anubis-config
  namespace: anubis
data:
  botPolicy.yaml: |
    ## Anubis has the ability to let you import snippets of configuration into the main
    ## configuration file. This allows you to break up your config into smaller parts
    ## that get logically assembled into one big file.
    ##
    ## Of note, a bot rule can either have inline bot configuration or import a
    ## bot config snippet. You cannot do both in a single bot rule.
    ##
    ## Import paths can either be prefixed with (data) to import from the common/shared
    ## rules in the data folder in the Anubis source tree or will point to absolute/relative
    ## paths in your filesystem. If you don't have access to the Anubis source tree, check
    ## /usr/share/docs/anubis/data or in the tarball you extracted Anubis from.

    bots:
    # Pathological bots to deny
    # This correlates to data/bots/ai-robots-txt.yaml in the source tree
    - import: (data)/bots/ai-robots-txt.yaml
    - import: (data)/bots/cloudflare-workers.yaml 
    - import: (data)/bots/headless-browsers.yaml
    - import: (data)/bots/us-ai-scraper.yaml

    # Allow common "keeping the internet working" routes (well-known, favicon, robots.txt)
    - import: (data)/common/keep-internet-working.yaml

    # # Punish any bot with "bot" in the user-agent string
    # # This is known to have a high false-positive rate, use at your own risk
    - name: generic-bot-catchall
      user_agent_regex: (?i:bot|crawler)
      action: CHALLENGE
      challenge:
        difficulty: 16  # impossible
        report_as: 4    # lie to the operator
        algorithm: slow # intentionally waste CPU cycles and time

    # Generic catchall rule
    - name: generic-browser
      user_agent_regex: >-
        Mozilla|Opera
      action: DENY

    - name: jellyfin
      user_agent_regex: 'Ktor client'
      action: ALLOW

    dnsbl: false

    # By default, send HTTP 200 back to clients that either get issued a challenge
    # or a denial. This seems weird, but this is load-bearing due to the fact that
    # the most aggressive scraper bots seem to really really want an HTTP 200 and
    # will stop sending requests once they get it.
    status_codes:
      CHALLENGE: 200
      DENY: 200 
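
To see what the policy above is expected to do for a given user agent, the inline rules can be replayed in a few lines of Python. This is a minimal sketch, not Anubis's matcher: it assumes rules are checked in file order and that the first match wins, it skips the imported (data) snippets, and the names RULES and first_match are made up for this illustration.

```python
import re

# The three inline rules from botPolicy.yaml above, in file order.
# Imported snippets such as (data)/bots/ai-robots-txt.yaml are left out.
RULES = [
    ("generic-bot-catchall", r"(?i:bot|crawler)", "CHALLENGE"),
    ("generic-browser", r"Mozilla|Opera", "DENY"),
    ("jellyfin", r"Ktor client", "ALLOW"),
]


def first_match(user_agent: str):
    """Return (name, action) of the first rule whose regex matches, else None.

    First-match-wins ordering is an assumption of this sketch.
    """
    for name, pattern, action in RULES:
        if re.search(pattern, user_agent):
            return name, action
    return None


FIREFOX = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) "
    "Gecko/20100101 Firefox/138.0"
)

print(first_match(FIREFOX))          # ('generic-browser', 'DENY')
print(first_match("Ktor client"))    # ('jellyfin', 'ALLOW')
print(first_match("Googlebot/2.1"))  # ('generic-bot-catchall', 'CHALLENGE')
```

Under these assumptions, a Firefox request should be denied by the custom policy, which is a useful baseline when reading the logs.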

Logs

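The logs contain a warning that REDIRECT_DOMAINS is not set, so Anubis will only redirect to the same domain a request is coming from. More telling is the last line: a Firefox request matched bot/generic-browser with the action CHALLENGE, while the mounted policy defines generic-browser with the action DENY. The running instance is therefore not applying the rules from botPolicy.yaml.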

{"time":"2025-05-09T06:34:39.265207479Z","level":"WARN","source":{"function":"main.main","file":"github.com/TecharoHQ/anubis/cmd/anubis/main.go","line":273},"msg":"REDIRECT_DOMAINS is not set, Anubis will only redirect to the same a request is coming from, see https://anubis.techaro.lol/docs/admin/configuration/redirect-domains"}
{"time":"2025-05-09T06:34:39.26586507Z","level":"INFO","source":{"function":"main.main","file":"github.com/TecharoHQ/anubis/cmd/anubis/main.go","line":315},"msg":"listening","url":"http://0.0.0.0:8080","difficulty":4,"serveRobotsTXT":true,"target":"http://127.0.0.1:80","version":"v1.17.1","use-remote-address":false,"debug-benchmark-js":false,"og-passthrough":true,"og-expiry-time":86400000000000,"base-prefix":"","cookie-expiration-time":604800000000000}
{"time":"2025-05-09T06:35:06.591692919Z","level":"INFO","source":{"function":"github.com/TecharoHQ/anubis/lib.(*Server).PassChallenge","file":"github.com/TecharoHQ/anubis/lib/anubis.go","line":295},"msg":"challenge took","user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) Gecko/20100101 Firefox/138.0","accept_language":"de,en-US;q=0.7,en;q=0.3","priority":"u=0, i","x-forwarded-for":"","x-real-ip":"172.16.1.1","check_result":{"name":"bot/generic-browser","rule":"CHALLENGE"},"elapsedTime":2914}
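
Because Anubis writes one JSON object per log line, the rule that handled a request can be pulled out programmatically. The following minimal Python sketch does this for the last log line above, trimmed to the relevant fields; the helper name matched_rule is hypothetical.

```python
import json

# The "challenge took" log line from above, trimmed to the fields used here.
LOG_LINE = (
    '{"time":"2025-05-09T06:35:06.591692919Z","level":"INFO",'
    '"msg":"challenge took",'
    '"user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) '
    'Gecko/20100101 Firefox/138.0",'
    '"check_result":{"name":"bot/generic-browser","rule":"CHALLENGE"},'
    '"elapsedTime":2914}'
)


def matched_rule(line: str):
    """Return (rule name, action) from a JSON log line, or None if absent."""
    record = json.loads(line)
    result = record.get("check_result")
    if not result:
        return None
    return result["name"], result["rule"]


print(matched_rule(LOG_LINE))  # ('bot/generic-browser', 'CHALLENGE')
```

Comparing this action with the one the mounted policy defines for the same rule shows whether the policy file is actually in effect.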

Conclusion

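The logs confirm that Anubis is not applying the mounted bot policy, but the REDIRECT_DOMAINS warning is not the reason. That variable only controls which domains Anubis may redirect to, and leaving it unset has no effect on which policy file is loaded. The most likely cause is a typo in the Deployment: the environment variable is named "POLICY_FNAME:" with a trailing colon inside the quotes. Anubis reads POLICY_FNAME, so the misnamed variable is never seen and the mounted file is never loaded.

Solution

To solve this issue, remove the trailing colon from the variable name in the env section of the anubis container:

env:
  - name: "POLICY_FNAME"
    value: "/data/cfg/botPolicy.yaml"

Apply the manifest and let the pod restart. A browser request should then be logged with the DENY action from the custom generic-browser rule instead of CHALLENGE. Setting REDIRECT_DOMAINS is optional here: it restricts redirect targets and silences the startup warning, but it does not change which policy is loaded.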

Additional Tips

  • Make sure that the botPolicy.yaml file is correctly mounted as a volume in the Anubis deployment and that the mountPath matches the value of POLICY_FNAME.
  • Check that every environment variable name is spelled exactly as Anubis expects, with no stray characters such as a trailing colon.
  • If you are using a Kubernetes cluster, make sure that the Anubis deployment and its ConfigMap are in the same namespace.
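
When checking the configuration, it can help to scan the environment variable names for unexpected characters. The following minimal Python sketch does that for the names taken from the anubis container in the Deployment above. The helper name suspicious_names and the name pattern are choices made for this illustration, not part of Anubis or Kubernetes.

```python
import re

# Variable names copied from the anubis container in the Deployment above.
ENV_NAMES = [
    "BIND",
    "BIND_NETWORK",
    "COOKIE_DOMAIN",
    "DIFFICULTY",
    "ED25519_PRIVATE_KEY_HEX",
    "POLICY_FNAME:",
    "SERVE_ROBOTS_TXT",
    "TARGET",
]

# Conventional shape of an environment variable name: letters, digits and
# underscores, not starting with a digit.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def suspicious_names(names):
    """Return the names that contain characters outside the conventional set."""
    return [name for name in names if not VALID_NAME.fullmatch(name)]


print(suspicious_names(ENV_NAMES))  # ['POLICY_FNAME:']
```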

By following these steps, you should be able to fix the issue with Anubis ignoring the bot policy YAML file.

Introduction

The REDIRECT_DOMAINS warning appears in the logs of the deployment above, but it is a separate matter from the bot policy file being ignored. This Q&A section explains what the variable does and how to set it.

Q: What is the REDIRECT_DOMAINS variable and why is it important?

A: REDIRECT_DOMAINS is an environment variable that restricts which domains Anubis may redirect to. According to the startup warning, when it is not set Anubis only redirects to the same domain a request is coming from. It is unrelated to the bot policy YAML file, which is selected with POLICY_FNAME.
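
To illustrate the general idea of a redirect allowlist, here is a minimal Python sketch. It is not Anubis's implementation: the function redirect_allowed and its parameters are hypothetical, and the fallback for an empty allowlist simply mirrors the wording of the startup warning.

```python
from urllib.parse import urlsplit


def redirect_allowed(target_url: str, request_host: str,
                     allowed_domains: set[str]) -> bool:
    """Decide whether a redirect target is acceptable (hypothetical helper).

    With an empty allowlist, only the host the request came from is accepted.
    """
    host = urlsplit(target_url).hostname
    if host is None:
        # Relative URLs such as "/page" stay on the current site.
        return True
    if not allowed_domains:
        return host == request_host
    return host in allowed_domains


# No allowlist configured: only the requesting host is accepted.
print(redirect_allowed("https://hks-projekt.at/x", "hks-projekt.at", set()))
print(redirect_allowed("https://evil.example/", "hks-projekt.at", set()))
```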

Q: Why is the REDIRECT_DOMAINS variable not being set in my Anubis deployment?

A: In the deployment above, the variable is simply not listed in the env section of the anubis container, so Anubis starts without it and logs the warning. If you have added it and the warning persists, the entry is probably misspelled or has not reached the running pod.

Q: How do I set the REDIRECT_DOMAINS variable in my Anubis deployment?

A: To set the REDIRECT_DOMAINS variable, add the following entry to the env section of the anubis container in the Deployment:

env:
  - name: "REDIRECT_DOMAINS"
    value: "your_domain.com"

Replace your_domain.com with the domain you want to redirect to.

Q: What are some common mistakes that can cause the REDIRECT_DOMAINS variable to not be set?

A: Some common mistakes include:

  • Misspelling the name or leaving a stray character in it, as with "POLICY_FNAME:" in the Deployment above.
  • Adding the variable to the wrong container, for example nginx instead of anubis.
  • Editing the manifest without applying it, so the running pod still uses the old environment.

Q: How do I troubleshoot issues with the REDIRECT_DOMAINS variable not being set?

A: To troubleshoot issues with the REDIRECT_DOMAINS variable not being set, you can try the following:

  • Check the env section of the anubis container in the Deployment to ensure that the variable is defined, spelled correctly, and set to the intended value.
  • Verify that the running pod was created from the updated manifest.
  • Check the startup logs: the "REDIRECT_DOMAINS is not set" warning should no longer appear once the variable reaches the container.

Q: What are some best practices for setting the REDIRECT_DOMAINS variable in my Anubis deployment?

A: Some best practices for setting the REDIRECT_DOMAINS variable in your Anubis deployment include:

  • Defining the variable in the env section of the anubis container, next to the other Anubis settings.
  • Listing only the domains that Anubis should be allowed to redirect to.
  • Restarting the pod after a change and confirming in the logs that the warning is gone.

Q: Can I use a different variable instead of REDIRECT_DOMAINS?

A: No. Anubis looks up its settings under fixed environment variable names, so a variable with any other name is ignored. This is the same failure mode as the "POLICY_FNAME:" entry in the Deployment above, where one extra character keeps Anubis from seeing the setting.

Q: How do I correct a variable that was defined under the wrong name?

A: Rename the entry in the env section of the anubis container so that it matches the name Anubis expects exactly, then apply the manifest. For example:

env:
  - name: "REDIRECT_DOMAINS"
    value: "your_domain.com"

Replace your_domain.com with the domain you want to redirect to.

Conclusion

In conclusion, REDIRECT_DOMAINS is an environment variable that restricts which domains Anubis may redirect to. Setting it removes the startup warning, but it does not affect which bot policy file is loaded; that is controlled by POLICY_FNAME. We hope that this Q&A article has been helpful in answering your questions about the REDIRECT_DOMAINS variable.