VAO
 Virtual Astronomical Observatory

Contributing your Weblogs to the Harvester

This page describes the steps you need to take to contribute your site's Weblogs to the harvester at JHU.
 
The harvester at JHU pulls the daily weblog file via FTP or HTTP (wget) at some time after midnight. To start harvesting your weblogs, the harvester needs the following information:
  • the FTP address
  • the account name and password for FTP, if necessary (anonymous preferred)
  • the designated directory where the daily weblog file will be deposited
  • the time at which it will be deposited
  • the file name template that uniquely identifies your daily weblog file (e.g., mastvo<yyyy>-<mm>-<dd>.log -> mastvo2006-10-03.log)
Please email Ani Thakar with this information once your weblogs are ready to be harvested. The weblog file specifications are listed below.
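
As a rough illustration of how a file name template of this kind expands to a concrete daily file name, here is a minimal Python sketch. The function name `daily_log_name` is hypothetical and not part of the harvester; it only assumes that the template uses the `<yyyy>`, `<mm>` and `<dd>` placeholders shown in the example above.

```python
from datetime import date


def daily_log_name(template: str, day: date) -> str:
    """Expand <yyyy>, <mm> and <dd> placeholders in a file name template."""
    return (
        template
        .replace("<yyyy>", f"{day.year:04d}")
        .replace("<mm>", f"{day.month:02d}")   # zero-padded month
        .replace("<dd>", f"{day.day:02d}")     # zero-padded day
    )


print(daily_log_name("mastvo<yyyy>-<mm>-<dd>.log", date(2006, 10, 3)))
# prints: mastvo2006-10-03.log
```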
 

 

Specifications for daily log file format:

 Log File format: ASCII CSV (comma separated values)
 String fields: unquoted, so they must not contain embedded commas (commas in URLs must be escaped as %2C)
 Date format: UTC, ISO 8601, i.e. yyyy-mm-ddThh:mm:ss
 Fields in the file (comma separated, in this order):
  date -- the date/time of the request
  logName -- name of site/log/service corresponding to this log
  clientIP -- the IP address of the client
  serverUrl -- the server URL, in case the server handles multiple sites
  method -- the operation (GET,POST,...)
  request -- the command executed
  userAgent -- the user agent or browser type
  httpStatus -- the HTTP status code (the error code, if any)
  bytesOut -- bytes returned by the request
  timeElapsed -- the time it took to execute the request, in seconds
 
 At a minimum, the date, clientIP, method, request and httpStatus fields must be filled. Fields that are not filled should be left empty, with their separating commas kept, so an entry without the serverUrl and timeElapsed fields would look like:
  2006-05-28T13:55:03,JHU,128.220.233.43,,GET,/en/tools/search/xsql.asp?select+top+10+objid+from+phototag,Mozilla,200,3412,
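
The format rules above can be sketched in a few lines of Python. This is only an illustration, not the code used by any contributing site: the name `format_weblog_line` is hypothetical, the field order is taken from the example line and the schema, and escaping commas as %2C in every string field (not only in URLs) is an assumption.

```python
from datetime import datetime, timezone

# Field order as in the example line above (logName is the second column).
WEBLOG_FIELDS = ["date", "logName", "clientIP", "serverUrl", "method",
                 "request", "userAgent", "httpStatus", "bytesOut", "timeElapsed"]


def format_weblog_line(record: dict) -> str:
    """Render one request as an unquoted CSV line; missing fields stay empty."""
    values = []
    for name in WEBLOG_FIELDS:
        value = record.get(name)
        if value is None:
            text = ""  # unfilled field: keep the column, leave it empty
        elif isinstance(value, datetime):
            # Expects a timezone-aware datetime; convert to UTC and
            # write it as yyyy-mm-ddThh:mm:ss.
            text = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
        else:
            text = str(value)
        # Assumption: embedded commas are escaped as %2C in all fields.
        values.append(text.replace(",", "%2C"))
    return ",".join(values)


record = {
    "date": datetime(2006, 5, 28, 13, 55, 3, tzinfo=timezone.utc),
    "logName": "JHU",
    "clientIP": "128.220.233.43",
    "method": "GET",
    "request": "/en/tools/search/xsql.asp?select+top+10+objid+from+phototag",
    "userAgent": "Mozilla",
    "httpStatus": 200,
    "bytesOut": 3412,
}
print(format_weblog_line(record))
```

With the record shown, the function reproduces the example entry above, including the empty serverUrl column and the trailing comma for the empty timeElapsed column.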
 

NVO Weblog Schema:

Column       Format         Description                                          Unit
date         datetime       the date/time of the request
logName      varchar(64)    name of site/log/service corresponding to this log
clientIP     varchar(128)   the IP address of the client
serverUrl    varchar(1024)  in case server handles multiple sites
method       varchar(8)     the operation (GET,POST,...)
request      varchar(4096)  the command executed
userAgent    varchar(1024)  the user agent or browser type
httpStatus   int            the error code if any
bytesOut     bigint         bytes returned by request
timeElapsed  int            the time it took to execute request                  sec
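
Before depositing a daily file, a contributor might want to check each line against the schema above. The following Python sketch is one possible way to do that; `check_weblog_line` is a hypothetical helper, not part of the harvester, and treating the integer columns as non-negative is an assumption.

```python
from datetime import datetime
from typing import List

COLUMNS = ["date", "logName", "clientIP", "serverUrl", "method",
           "request", "userAgent", "httpStatus", "bytesOut", "timeElapsed"]
# Maximum lengths of the varchar columns, copied from the schema.
VARCHAR_LIMITS = {"logName": 64, "clientIP": 128, "serverUrl": 1024,
                  "method": 8, "request": 4096, "userAgent": 1024}
INT_COLUMNS = ("httpStatus", "bytesOut", "timeElapsed")


def check_weblog_line(line: str) -> List[str]:
    """Return a list of problems found in one log line (empty if none)."""
    parts = line.rstrip("\n").split(",")
    if len(parts) != len(COLUMNS):
        return [f"expected {len(COLUMNS)} fields, got {len(parts)}"]
    row = dict(zip(COLUMNS, parts))
    problems = []
    try:
        datetime.strptime(row["date"], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        problems.append("date is not yyyy-mm-ddThh:mm:ss")
    for name, limit in VARCHAR_LIMITS.items():
        if len(row[name]) > limit:
            problems.append(f"{name} exceeds {limit} characters")
    for name in INT_COLUMNS:
        value = row[name]
        # Empty is allowed; otherwise require plain ASCII digits.
        if value and not (value.isascii() and value.isdigit()):
            problems.append(f"{name} is not an integer")
    return problems


sample = ("2006-05-28T13:55:03,JHU,128.220.233.43,,GET,"
          "/en/tools/search/xsql.asp?select+top+10+objid+from+phototag,"
          "Mozilla,200,3412,")
print(check_weblog_line(sample))
```

Because string fields are unquoted, a plain split on commas is enough here; a line with an unescaped comma shows up as a wrong field count.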

 

Contributing your Service Logs to the Harvester

The steps you need to take to contribute your site's service logs to the harvester at JHU are listed below.
 
The harvester at JHU pulls the daily service log file via FTP or HTTP (wget) at some time after midnight. To start harvesting your logs, the harvester needs the following information:
  • the FTP address
  • the account name and password for FTP, if necessary (anonymous preferred)
  • the designated directory where the daily service log file will be deposited
  • the time at which it will be deposited
  • the file name template that uniquely identifies your daily log file (e.g., mastvo<yyyy>-<mm>-<dd>.log -> mastvo2006-10-03.log)
Please email Ani Thakar with this information once your service logs are ready to be harvested. The service log file specifications are listed below.
 

 

Specifications for daily service log file format:

 Log File format: ASCII CSV (comma separated values)
 String fields: unquoted, so they must not contain embedded commas (commas in URLs must be escaped as %2C)
 Date format: UTC, ISO 8601, i.e. yyyy-mm-ddThh:mm:ss
 Fields in the file (comma separated):
  date -- the date/time of the request
  clientIP -- the IP address of the client
  server -- in case server handles multiple sites
  accessLevel -- an integer signifying private, public, internal etc. (0=public, 1=private, ...)
  method -- the operation (GET,POST,...)
  userId -- the userid of the submitter
  runId -- a unique request identifier that traces the request handling
  request -- the command executed (e.g. SQL query)
  event -- whether this was a query or some other type of event
 
 At a minimum, the date, clientIP, method, userId, runId and request fields must be filled. Fields that are not filled should be left empty, with their separating commas kept.
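
The required-field rule for service logs can be checked mechanically. The Python sketch below is an illustration under stated assumptions: `missing_required` is a hypothetical helper, and the sample line is made up for this example (it uses a documentation-only IP address and placeholder userId, runId and request values), not taken from a real log.

```python
from typing import List

# Field order as listed in the service log specification.
SERVICE_FIELDS = ["date", "clientIP", "server", "accessLevel", "method",
                  "userId", "runId", "request", "event"]
# Fields that must be filled in every entry.
REQUIRED = {"date", "clientIP", "method", "userId", "runId", "request"}


def missing_required(line: str) -> List[str]:
    """Return the names of required fields that are empty in one log line."""
    parts = line.rstrip("\n").split(",")
    if len(parts) != len(SERVICE_FIELDS):
        raise ValueError(
            f"expected {len(SERVICE_FIELDS)} fields, got {len(parts)}")
    return [name for name, value in zip(SERVICE_FIELDS, parts)
            if name in REQUIRED and not value]


# Hypothetical entry: server is left empty (allowed), userId is missing.
line = "2006-10-03T00:15:42,192.0.2.1,,0,GET,,run-0001,select+1,query"
print(missing_required(line))
# prints: ['userId']
```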
 

NVO Service Log Schema:

Column       Format         Description                             Unit
date         datetime       the date/time of the request
clientIP     varchar(128)   IP address request came from
server       varchar(128)   server that this ran on
accessLevel  int            public, private, internal etc.
method       varchar(32)    GET,POST, etc.
userId       varchar(64)    can be null
runId        varchar(128)   either passed in, or generated
request      varchar(4096)  what was asked for
event        varchar(64)    what it means internally (query, etc.)

Issues currently being discussed regarding service logs: