Track Status & Examine Responses
Overview
By default, Parallec stores only the response status code, not the response content string, because (1) this saves memory when a response is huge, and (2) users can process the response in a customized way with the response handler. However, saving the response content can be turned on with a one-line change.
Key Classes
ParallelTaskBuilder.execute(new ParallecResponseHandler() {...}) returns a ParallelTask object, which can be used to track the status of the task.
Please review the Javadoc for the classes used in this section, such as ParallelTask and ParallelTaskBuilder.
By default the task executes in synchronous (blocking) mode, so once the execute call returns, the task is in either the COMPLETED_WITH_ERROR or the COMPLETED_WITHOUT_ERROR state.
Sample Code for Tracking Task Progress and Response Status Aggregation
In the example below, we run the task asynchronously by calling async(), then poll in a while loop to check its progress. This is useful when a frontend AJAX call needs to track the task progress.
ParallelTaskResult
This is an important member field of ParallelTask. It is a hash map that stores the request parameters, the host name, and the ResponseOnSingleTask. Note that by default the response content/payload is not saved into the ResponseOnSingleTask, to save space. Users may override this by calling ParallelTaskBuilder.setSaveResponseToTask(true).
// Create the client and start an asynchronous task against the target hosts.
ParallelClient pc = new ParallelClient();
ParallelTask task = pc.prepareHttpGet("").async()
        .setConcurrency(500)
        .setTargetHostsFromLineByLineText("userdata/sample_target_hosts_top100_old.txt",
                HostsSourceType.LOCAL_FILE)
        .execute(new ParallecResponseHandler() {
            @Override
            public void onCompleted(ResponseOnSingleTask res,
                    Map<String, Object> responseContext) {
                // Called once per host as each response arrives.
                System.out.println("Response Code:"
                        + res.getStatusCode() + " host: "
                        + res.getHost());
            }
        });
// Poll until the task finishes, printing progress every 100 ms.
while (!task.isCompleted()) {
    try {
        Thread.sleep(100L);
        System.out.println(String.format(
                "POLL_JOB_PROGRESS (%.5g%%) PT jobid: %s",
                task.getProgress(), task.getTaskId()));
        pc.logHealth();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
// Print the aggregated results as JSON.
System.out.println("Result Summary\n "
        + PcStringUtils.renderJson(task.getAggregateResultFullSummary()));
System.out.println("Result Brief Summary\n "
        + PcStringUtils.renderJson(task.getAggregateResultCountSummary()));
// Release external resources held by the client.
pc.releaseExternalResources();
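The start-then-poll pattern used above can also be sketched with only the Java standard library. This is a minimal illustration, not Parallec's implementation: PollingSketch, TrackedTask, startAsync and pollUntilDone are hypothetical names introduced here, and each "request" is simulated with a short sleep.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PollingSketch {

    /** Hypothetical stand-in for an asynchronous task that reports progress. */
    public static class TrackedTask {
        private final int total;
        private final AtomicInteger done = new AtomicInteger();

        TrackedTask(int total) { this.total = total; }

        void markOneDone() { done.incrementAndGet(); }

        public boolean isCompleted() { return done.get() >= total; }

        /** Progress as a percentage between 0 and 100. */
        public double getProgress() { return 100.0 * done.get() / total; }
    }

    /** Submits `total` simulated requests to the pool and returns immediately. */
    public static TrackedTask startAsync(int total, ExecutorService pool) {
        TrackedTask task = new TrackedTask(total);
        for (int i = 0; i < total; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(20L); // simulate one slow request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                task.markOneDone();
            });
        }
        return task;
    }

    /** Polls until the task completes; returns the number of polls made. */
    public static int pollUntilDone(TrackedTask task, long intervalMs) {
        int polls = 0;
        while (!task.isCompleted()) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop polling if the caller is interrupted
            }
            polls++;
            System.out.println(String.format(
                    "POLL_JOB_PROGRESS (%.5g%%)", task.getProgress()));
        }
        return polls;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        TrackedTask task = startAsync(20, pool);
        int polls = pollUntilDone(task, 10L);
        System.out.println("Completed after " + polls + " polls");
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

The caller never blocks on the work itself. It only reads a completion flag and a progress percentage, which is what a frontend polling endpoint would return on each request.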
Response Status Code Aggregation
As this example shows, you can call getAggregateResultCountSummary() or getAggregateResultFullSummary() to get the response status code aggregation.
Because this example was run from behind a firewall/proxy, requests to certain websites time out.
To get the aggregation as a JSON string:
PcStringUtils.renderJson(task.getAggregateResultCountSummary())
To get a human-readable aggregation string directly:
task.getAggregatedResultHumanStr()
getAggregateResultCountSummary()
To save space, some host names are omitted from the getAggregateResultFullSummary() output further down.
{
  "301 Moved Permanently": 17,
  "302 Found": 9,
  "200 OK": 58,
  "301 TLS Redirect": 1,
  "302 Moved Temporarily": 6,
  "404 Not Found": 1,
  "200 Ok": 1,
  "301 Redirect": 1,
  "301 https://www.pinterest.com/": 1,
  "302 FOUND": 1,
  "FAIL_GET_RESPONSE: HttpWorker Timedout after 15 SEC (no response but no exception catched). Check URL: may be very slow or stuck.": 1
}
getAggregateResultFullSummary()
{
  "301 Moved Permanently": {
    "count": 17,
    "set": [
      "www.twitter.com"
      ...
      "www.mail.ru"
    ]
  },
  "302 Found": {
    "count": 9,
    "set": [
      "www.facebook.com",
      ...
      "www.vk.com"
    ]
  },
  "200 OK": {
    "count": 58,
    "set": [
      "www.amazon.de",
      "www.odnoklassniki.ru",
      "www.baidu.com",
      "www.uol.com.br",
      "www.sohu.com",
      "www.ifeng.com"
    ]
  },
  "301 TLS Redirect": {
    "count": 1,
    "set": [
      "www.wikipedia.org"
    ]
  },
  "302 Moved Temporarily": {
    "count": 6,
    "set": [
      "www.imgur.com",
      "www.blogger.com",
      "www.microsoft.com",
      "www.soso.com",
      "www.tumblr.com",
      "www.weibo.com"
    ]
  },
  "404 Not Found": {
    "count": 1,
    "set": [
      "www.googleusercontent.com"
    ]
  },
  "200 Ok": {
    "count": 1,
    "set": [
      "www.yandex.ru"
    ]
  },
  "301 Redirect": {
    "count": 1,
    "set": [
      "www.yahoo.com"
    ]
  },
  "301 https://www.pinterest.com/": {
    "count": 1,
    "set": [
      "www.pinterest.com"
    ]
  },
  "302 FOUND": {
    "count": 1,
    "set": [
      "www.instagram.com"
    ]
  },
  "FAIL_GET_RESPONSE: HttpWorker Timedout after 15 SEC (no response but no exception catched). Check URL: may be very slow or stuck.": {
    "count": 1,
    "set": [
      "www.qq.com"
    ]
  }
}
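How the two summaries above relate can be sketched with plain Java collections. This is only an illustration under stated assumptions, not Parallec's internal code: StatusAggregationSketch, fullSummary and countSummary are hypothetical names, and the input is assumed to be a simple map from host name to status line. The host and status values are taken from the sample output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StatusAggregationSketch {

    /** Groups host names by status line: status -> set of hosts. */
    public static Map<String, Set<String>> fullSummary(Map<String, String> statusByHost) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : statusByHost.entrySet()) {
            // The whole status line is the key, so "200 OK" and "200 Ok" stay separate.
            grouped.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return grouped;
    }

    /** Reduces the full summary to status -> number of hosts. */
    public static Map<String, Integer> countSummary(Map<String, Set<String>> full) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : full.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> statusByHost = new LinkedHashMap<>();
        statusByHost.put("www.yandex.ru", "200 Ok");
        statusByHost.put("www.baidu.com", "200 OK");
        statusByHost.put("www.sohu.com", "200 OK");
        statusByHost.put("www.instagram.com", "302 FOUND");

        Map<String, Set<String>> full = fullSummary(statusByHost);
        System.out.println(full);
        System.out.println(countSummary(full));
    }
}
```

Because the grouping key is the full status line rather than the numeric code alone, responses such as "200 OK" and "200 Ok" land in separate buckets, which matches the sample output above.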
Sample ParallelTask Fields
{
  "submitTime": "2015.10.13.23.29.24.890-0700",
  "executeStartTime": "2015.10.13.23.29.24.964-0700",
  "executionEndTime": "2015.10.13.23.29.25.145-0700",
  "durationSec": 0.181,
  "requestNum": 3,
  "requestNumActual": 3,
  "responsedNum": 3,
  "taskErrorMetas": [],
  "responseContext": {},
  "state": "COMPLETED_WITHOUT_ERROR",
  "taskId": "PT_3_20151013232924890_9a72cf3c-ecf",
  ...
}
Save ParallelTask to Logs
Both options below write the log of the completed task to
userdata/tasklogs/**filename
- Before execution: enabled by ParallelTaskBuilder.setAutoSaveLogToLocal()
- After getting ParallelTask: call ParallelTask.saveLogToLocal()
Sample ParallelTask Results
http://www.parallec.io/userdata/sample_tasklogs/PT_3_20151013140312854_a8aa7404-515.jsonlog.txt
- 3-website task: COMPLETED_WITHOUT_ERROR, with response content saved back to the task.
- 97-website task: COMPLETED_WITH_ERROR, canceled by the user midway.
We plan to add more fields to the output.
Health Check
For convenience, JVM memory usage and thread information can be obtained from the following APIs.
ParallelClient.logHealth(); //jvm memory as string output
MonitorProvider.getInstance().getLiveThreadCount();
MonitorProvider.getInstance().getJVMMemoryUsage();
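For readers who want comparable numbers without Parallec, the same kind of information is available from the standard library. The sketch below is an assumption about what such a health check could look like, not how MonitorProvider is implemented. JvmHealthSketch and its methods are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class JvmHealthSketch {

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    public static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    /** Approximate memory currently in use by the JVM, in megabytes. */
    public static double usedMemoryMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024.0 * 1024.0);
    }

    /** One-line summary suitable for a log statement. */
    public static String healthSummary() {
        Runtime rt = Runtime.getRuntime();
        return String.format(
                "JVM memory: used %.1f MB, max %.1f MB; live threads: %d",
                usedMemoryMb(), rt.maxMemory() / (1024.0 * 1024.0), liveThreadCount());
    }

    public static void main(String[] args) {
        System.out.println(healthSummary());
    }
}
```

Logging such a summary inside a polling loop, as the sample code does with logHealth(), makes it easier to spot memory growth or thread buildup while a large task is running.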