Contributing your Weblogs to the Harvester

This page describes the steps you need to take to contribute your site's weblogs to the harvester at JHU.

The harvester at JHU pulls the daily weblog file via FTP or HTTP (using wget) at some time after midnight.
It needs the following information to start harvesting your weblogs:
- the FTP address
- the account name and password for FTP, if necessary (anonymous preferred)
- the designated directory where the daily weblog file will be deposited
- the time at which it will be deposited
- the file name template that uniquely identifies your daily weblog file (e.g., mastvo<yyyy>-<mm>-<dd>.log -> mastvo2006-10-03.log)
Please email Ani Thakar with this information once your weblogs are ready to be harvested. The weblog file specifications are listed below.
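The file name template in the list above can be expanded mechanically for each day. The following is a minimal Python sketch, assuming the `mastvo` prefix from the example; the function name `daily_log_name` is hypothetical and not part of the harvester.

```python
from datetime import date


def daily_log_name(prefix: str, day: date) -> str:
    """Expand the template <prefix><yyyy>-<mm>-<dd>.log for the given day."""
    # date objects accept strftime codes in a format spec, zero-padded.
    return f"{prefix}{day:%Y-%m-%d}.log"


print(daily_log_name("mastvo", date(2006, 10, 3)))  # mastvo2006-10-03.log
```

A zero-padded, date-stamped name like this sorts chronologically and lets the harvester derive the file to fetch from the date alone.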
Specifications for daily log file format:

- Log file format: ASCII CSV (comma-separated values)
- String fields: unquoted, so they must not contain embedded commas (commas in URLs must be escaped as %2C)
- Date format: UTC, ISO 8601, i.e. yyyy-mm-ddThh:mm:ss
- Fields in the file, comma separated, in this order:
    date -- the date/time of the request
    logName -- the name of the site/log/service corresponding to this log
    clientIP -- the IP address of the client
    serverUrl -- the server URL, in case the server handles multiple sites
    method -- the operation (GET, POST, ...)
    request -- the command executed
    userAgent -- the user agent or browser type
    httpStatus -- the HTTP status code of the response (e.g. 200)
    bytesOut -- the number of bytes returned by the request
    timeElapsed -- the time it took to execute the request, in seconds

At minimum, the date, clientIP, method, request and httpStatus fields must be filled.
Fields that are not filled should be left empty, so an entry with no serverUrl or timeElapsed value would look like:

    2006-05-28T13:55:03,JHU,128.220.233.43,,GET,/en/tools/search/xsql.asp?select+top+10+objid+from+phototag,Mozilla,200,3412,
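A site can produce such entries with a few lines of code. The following Python sketch is one possible way to do it, not the harvester's own code: the function name `format_weblog_line` and the dictionary layout are hypothetical, the field order is taken from the example entry above (with the log name in second position), and escaping every embedded comma as %2C, rather than only those in URLs, is an assumption.

```python
from datetime import datetime, timezone

# Field order taken from the example entry above (log name is the second field).
WEBLOG_FIELDS = (
    "date", "logName", "clientIP", "serverUrl", "method",
    "request", "userAgent", "httpStatus", "bytesOut", "timeElapsed",
)


def format_weblog_line(record: dict) -> str:
    """Render one request as an unquoted CSV line; missing fields stay empty."""
    parts = []
    for name in WEBLOG_FIELDS:
        value = record.get(name)
        if value is None:
            parts.append("")
        elif isinstance(value, datetime):
            # Expects a timezone-aware datetime; convert to UTC, second precision.
            utc = value.astimezone(timezone.utc)
            parts.append(utc.strftime("%Y-%m-%dT%H:%M:%S"))
        else:
            # Assumption: all embedded commas are escaped as %2C, not only in URLs.
            parts.append(str(value).replace(",", "%2C"))
    return ",".join(parts)


entry = {
    "date": datetime(2006, 5, 28, 13, 55, 3, tzinfo=timezone.utc),
    "logName": "JHU",
    "clientIP": "128.220.233.43",
    "method": "GET",
    "request": "/en/tools/search/xsql.asp?select+top+10+objid+from+phototag",
    "userAgent": "Mozilla",
    "httpStatus": 200,
    "bytesOut": 3412,
}
print(format_weblog_line(entry))
```

With these values the sketch reproduces the example entry, including the empty serverUrl field and the trailing comma for the empty timeElapsed field.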
Column | Format | Description | Unit |
date | datetime | the date/time of the request | |
logName | varchar(64) | name of site/log/service corresponding to this log | |
clientIP | varchar(128) | the IP address of the client | |
serverUrl | varchar(1024) | in case server handles multiple sites | |
method | varchar(8) | the operation (GET,POST,...) | |
request | varchar(4096) | the command executed | |
userAgent | varchar(1024) | the user agent or browser type | |
httpStatus | int | the error code if any | |
bytesOut | bigint | bytes returned by request | bytes |
timeElapsed | int | the time it took to execute request | sec |
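Before depositing a file, it can help to check each line against the column table. The following Python sketch is a possible pre-flight check under stated assumptions: the function name `parse_weblog_line` is hypothetical, the column order follows the table above, and the set of required fields is taken from the weblog specification. Because string fields are unquoted, a plain split on commas is enough.

```python
from datetime import datetime

# (column, maximum length or None) in file order, per the table above.
COLUMNS = [
    ("date", None), ("logName", 64), ("clientIP", 128), ("serverUrl", 1024),
    ("method", 8), ("request", 4096), ("userAgent", 1024),
    ("httpStatus", None), ("bytesOut", None), ("timeElapsed", None),
]
REQUIRED = {"date", "clientIP", "method", "request", "httpStatus"}
INTEGER_COLUMNS = {"httpStatus", "bytesOut", "timeElapsed"}


def parse_weblog_line(line: str) -> dict:
    """Split one log line into typed values, raising ValueError on bad input."""
    values = line.rstrip("\r\n").split(",")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = {}
    for (name, max_len), raw in zip(COLUMNS, values):
        if raw == "":
            if name in REQUIRED:
                raise ValueError(f"required field {name} is empty")
            row[name] = None
        elif name == "date":
            row[name] = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S")
        elif name in INTEGER_COLUMNS:
            row[name] = int(raw)
        else:
            if max_len is not None and len(raw) > max_len:
                raise ValueError(f"{name} exceeds {max_len} characters")
            row[name] = raw
    return row


sample = ("2006-05-28T13:55:03,JHU,128.220.233.43,,GET,"
          "/en/tools/search/xsql.asp?select+top+10+objid+from+phototag,"
          "Mozilla,200,3412,")
row = parse_weblog_line(sample)
print(row["httpStatus"], row["serverUrl"], row["timeElapsed"])
```

A line with an unescaped comma in a string field fails the field-count check, which is the most likely formatting mistake given the unquoted format.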
Contributing your Service Logs to the Harvester

The steps you need to take to contribute your site's service logs to the harvester at JHU are listed below.

The harvester at JHU pulls the daily service log file via FTP or HTTP (using wget) at some time after midnight.
It needs the following information to start harvesting your logs:
- the FTP address
- the account name and password for FTP, if necessary (anonymous preferred)
- the designated directory where the daily service log file will be deposited
- the time at which it will be deposited
- the file name template that uniquely identifies your daily log file (e.g., mastvo<yyyy>-<mm>-<dd>.log -> mastvo2006-10-03.log)
Please email Ani Thakar with this information once your service logs are ready to be harvested. The service log file specifications are listed below.
Specifications for daily service log file format:

- Log file format: ASCII CSV (comma-separated values)
- String fields: unquoted, so they must not contain embedded commas (commas in URLs must be escaped as %2C)
- Date format: UTC, ISO 8601, i.e. yyyy-mm-ddThh:mm:ss
- Fields in the file, comma separated, in this order:
    date -- the date/time of the request
    clientIP -- the IP address of the client
    server -- the server name, in case the server handles multiple sites
    accessLevel -- an integer signifying private, public, internal etc. (0=public, 1=private, ...)
    method -- the operation (GET, POST, ...)
    userId -- the user ID of the submitter
    runId -- a unique request identifier that traces the request handling
    request -- the command executed (e.g. SQL query)
    event -- whether this was a query or some other type of event

At minimum, the date, clientIP, method, userId, runId and request fields must be filled.
Fields that are not filled should be left empty.
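A service log entry can be assembled in the same spirit, with a check that the required fields are present. The following Python sketch is illustrative only: the function name `format_service_log_line` and all sample values (user ID, run ID, query) are hypothetical, the field order follows the service log field list as given, and escaping commas in a SQL request as %2C is an assumption extended from the rule stated for URLs.

```python
SERVICE_LOG_FIELDS = (
    "date", "clientIP", "server", "accessLevel", "method",
    "userId", "runId", "request", "event",
)
SERVICE_LOG_REQUIRED = ("date", "clientIP", "method", "userId", "runId", "request")


def format_service_log_line(record: dict) -> str:
    """Render one service request as an unquoted CSV line."""
    missing = [n for n in SERVICE_LOG_REQUIRED if record.get(n) in (None, "")]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    parts = []
    for name in SERVICE_LOG_FIELDS:
        value = record.get(name)
        text = "" if value is None else str(value)
        # Assumption: embedded commas (e.g. in SQL) are escaped as %2C.
        parts.append(text.replace(",", "%2C"))
    return ",".join(parts)


# Hypothetical sample values; the date is already in UTC ISO 8601 form.
event = {
    "date": "2006-10-03T00:15:42",
    "clientIP": "128.220.233.43",
    "accessLevel": 0,
    "method": "POST",
    "userId": "jdoe",
    "runId": "r-0001",
    "request": "select top 10 ra,dec from photoobj",
    "event": "query",
}
print(format_service_log_line(event))
```

Rejecting incomplete records at write time keeps malformed lines out of the daily file, which is easier than repairing them after the harvest.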