About Cotoyogi Crawler

June 28, 2024

Cotoyogi is a web crawler (aka robot) operated by the Center for Research and Development on Data Lake, ROIS-DS, to collect Japanese-language data resources. If its repeated accesses cause trouble for you, please use one of the robots exclusion methods or contact us as described below. Thank you very much for your cooperation.

Crawler basic information

User agent string: Mozilla/5.0 (compatible; Cotoyogi/4.0; +https://ds.rois.ac.jp/center8/crawler/)
Range of IP addresses: 157.1.136.4 - 157.1.136.11
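
As a rough illustration of how the published address range could be used on the server side, the following Python sketch checks whether a client address falls inside it. The function name is_cotoyogi_ip is a hypothetical helper, not something provided by the crawler operators, and the range endpoints are simply copied from the line above.

```python
import ipaddress

# Range endpoints copied from the crawler information above.
FIRST = ipaddress.ip_address("157.1.136.4")
LAST = ipaddress.ip_address("157.1.136.11")


def is_cotoyogi_ip(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside the published range."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a syntactically valid IP address.
        return False
    # Check the version first: IPv4 and IPv6 addresses cannot be compared.
    return ip.version == 4 and FIRST <= ip <= LAST
```

Combining this check with a test for "Cotoyogi" in the User-Agent header gives a simple way to tell genuine crawler requests from requests that merely borrow the name.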

Robots exclusion methods

Method 1. /robots.txt file

This method uses a robots.txt file located at the root of your site (e.g., http://www.your-site.com/robots.txt) to give directives to crawlers. It is suitable for web server administrators. For more details, please refer to RFC 9309.

For example, the following forbids Cotoyogi from retrieving any content from your site.
Note that the values of Disallow are treated as path prefixes.
```
User-agent: Cotoyogi
Disallow: /
```
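
One way to sanity-check such a rule before deploying it is Python's standard urllib.robotparser module. This is only a sketch of how a generic parser reads the rule, not a description of Cotoyogi's own parser. The product token "Cotoyogi" is passed instead of the full user agent string (an assumption about how this particular library matches agents), and "SomeOtherBot" is a made-up name for comparison.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the example above.
rules = """\
User-agent: Cotoyogi
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

url = "http://www.your-site.com/index.html"

# The Cotoyogi group disallows every path, since every path starts with "/".
cotoyogi_blocked = not parser.can_fetch("Cotoyogi", url)

# No group names this (hypothetical) crawler, so it remains allowed.
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(cotoyogi_blocked, other_allowed)
```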
Disallow also accepts the wildcard character "*" and the end-of-path designator "$".
For example, the following forbids access to everything under the /images/ directory as well as to files with the .gif suffix.
```
User-agent: Cotoyogi
Disallow: /images/
Disallow: *.gif$
```
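
To make the matching semantics concrete, here is a minimal Python sketch of prefix matching with "*" and "$". It is an illustration written for this page, not Cotoyogi's actual matcher; the function name rule_matches is hypothetical.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Match a Disallow value against a URL path.

    The pattern is a path prefix in which "*" matches any run of
    characters and a trailing "$" anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Anchored rules must cover the whole path; others only a prefix.
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None


print(rule_matches("/images/", "/images/logo.png"))
print(rule_matches("*.gif$", "/photos/cat.gif"))
print(rule_matches("*.gif$", "/photos/cat.gif.html"))
```

Without the "$", the second rule would also match /photos/cat.gif.html, because an unanchored value only has to match the beginning of the path.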
If access frequency is a concern, specify the Crawl-delay parameter.
For example, the following directs Cotoyogi to access the site at most once every 30 seconds.
```
User-agent: Cotoyogi
Crawl-delay: 30.0
```
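
The following Python sketch shows one way a robots.txt file could be read to find the delay that applies to a given crawler. It is a simplified, hand-rolled reader written for illustration (the function name crawl_delay is hypothetical), and it reads the value as a floating-point number of seconds because the example above uses 30.0.

```python
from typing import Optional


def crawl_delay(robots_txt: str, agent: str) -> Optional[float]:
    """Return the Crawl-delay in seconds from the first group naming agent."""
    in_group = False        # inside a group that names our agent
    reading_agents = False  # previous directive was a User-agent line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                in_group = False  # a new group starts here
            reading_agents = True
            if value.lower() == agent.lower():
                in_group = True
        else:
            reading_agents = False
            if in_group and field == "crawl-delay":
                try:
                    return float(value)
                except ValueError:
                    return None  # malformed value
    return None


example = "User-agent: Cotoyogi\nCrawl-delay: 30.0\n"
print(crawl_delay(example, "Cotoyogi"))
```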

Method 2. Robots meta tags

This method uses meta tags in HTML documents. In a nutshell, if you put

<META NAME="robots" CONTENT="nofollow">

in the head section of an HTML document, Cotoyogi will not follow the links found in that document.
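
To check that a page actually carries the tag, a small script along the following lines can help. It is a sketch using Python's standard html.parser module; the class RobotsMetaParser and the function has_nofollow are hypothetical names, and the way Cotoyogi itself parses pages may differ.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, not values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_nofollow(html: str) -> bool:
    """Return True if the document contains a robots meta tag with nofollow."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nofollow" in parser.directives


page = '<html><head><META NAME="robots" CONTENT="nofollow"></head><body></body></html>'
print(has_nofollow(page))
```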

How to contact us

If you have requests or questions, feel free to send an email to crawler (at) rois.ac.jp (replace "(at)" with @).
Please specify the host name (and any aliases) and the IP address(es) of your site in your message.