Hadoop iterates so quickly that a lot of bugs slip in.
While running SYNTH mode I ran into two problems.
1. The official SYNTH JSON script:
{ "description" : "tiny jobs workload", //description of the meaning of this collection of workloads "num_nodes" : 10, //total nodes in the simulated cluster "nodes_per_rack" : 4, //number of nodes in each simulated rack "num_jobs" : 10, // total number of jobs being simulated "rand_seed" : 2, //the random seed used for deterministic randomized runs // a list of “workloads”, each of which has job classes, and temporal properties "workloads" : [ { "workload_name" : "tiny-test", // name of the workload "workload_weight": 0.5, // used for weighted random selection of which workload to sample from "queue_name" : "sls_queue_1", //queue the job will be submitted to //different classes of jobs for this workload "job_classes" : [ { "class_name" : "class_1", //name of the class "class_weight" : 1.0, //used for weighted random selection of class within workload //nextr group controls average and standard deviation of a LogNormal distribution that //determines the number of mappers and reducers for thejob. "mtasks_avg" : 5, "mtasks_stddev" : 1, "rtasks_avg" : 5, "rtasks_stddev" : 1, //averge and stdev input param of LogNormal distribution controlling job duration "dur_avg" : 60, "dur_stddev" : 5, //averge and stdev input param of LogNormal distribution controlling mappers and reducers durations "mtime_avg" : 10, "mtime_stddev" : 2, "rtime_avg" : 20, "rtime_stddev" : 4, //averge and stdev input param of LogNormal distribution controlling memory and cores for map and reduce "map_max_memory_avg" : 1024, "map_max_memory_stddev" : 0.001, "reduce_max_memory_avg" : 2048, "reduce_max_memory_stddev" : 0.001, "map_max_vcores_avg" : 1, "map_max_vcores_stddev" : 0.001, "reduce_max_vcores_avg" : 2, "reduce_max_vcores_stddev" : 0.001, //probability of running this job with a reservation "chance_of_reservation" : 0.5, //input parameters of LogNormal distribution that determines the deadline slack (as a multiplier of job duration) "deadline_factor_avg" : 10.0, "deadline_factor_stddev" : 0.001, } ], // for each workload determines with what probability each time bucket is picked to choose the job starttime. // In the example below the jobs have twice as much chance to start in the first minute than in the second minute // of simulation, and then zero chance thereafter. "time_distribution" : [ { "time" : 1, "weight" : 66 }, { "time" : 60, "weight" : 33 }, { "time" : 120, "jobs" : 0 } ] } ]}
First of all, a JSON file must not contain any comments, so all of these comments have to be removed before the trace will run. Secondly, on the line
"deadline_factor_stddev" : 0.001,
the trailing comma is not allowed; it makes the file invalid JSON, and the run fails with an error.
Here is an online tool for checking whether a JSON file is valid:
https://jsonlint.com/
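For reference, here is the same configuration with all the comments and the trailing comma stripped out (the field values are unchanged from the official example above); this cleaned-up version passes the jsonlint check:

{
  "description" : "tiny jobs workload",
  "num_nodes" : 10,
  "nodes_per_rack" : 4,
  "num_jobs" : 10,
  "rand_seed" : 2,
  "workloads" : [
    {
      "workload_name" : "tiny-test",
      "workload_weight": 0.5,
      "queue_name" : "sls_queue_1",
      "job_classes" : [
        {
          "class_name" : "class_1",
          "class_weight" : 1.0,
          "mtasks_avg" : 5,
          "mtasks_stddev" : 1,
          "rtasks_avg" : 5,
          "rtasks_stddev" : 1,
          "dur_avg" : 60,
          "dur_stddev" : 5,
          "mtime_avg" : 10,
          "mtime_stddev" : 2,
          "rtime_avg" : 20,
          "rtime_stddev" : 4,
          "map_max_memory_avg" : 1024,
          "map_max_memory_stddev" : 0.001,
          "reduce_max_memory_avg" : 2048,
          "reduce_max_memory_stddev" : 0.001,
          "map_max_vcores_avg" : 1,
          "map_max_vcores_stddev" : 0.001,
          "reduce_max_vcores_avg" : 2,
          "reduce_max_vcores_stddev" : 0.001,
          "chance_of_reservation" : 0.5,
          "deadline_factor_avg" : 10.0,
          "deadline_factor_stddev" : 0.001
        }
      ],
      "time_distribution" : [
        { "time" : 1, "weight" : 66 },
        { "time" : 60, "weight" : 33 },
        { "time" : 120, "jobs" : 0 }
      ]
    }
  ]
}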
2. While running the simulation with
$HADOOP_HOME/share/hadoop/tools/sls/bin/slsrun.sh --tracetype=SYNTH --tracelocation=/home/c/sls/output2/SYNTH.json --output-dir=/home/c/sls/output1 --print-simulation
the following error is reported:
java.lang.IllegalArgumentException: Null user
	at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1225)
	at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1212)
	at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.submitReservationWhenSpecified(AMSimulator.java:177)
	at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:154)
	at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:88)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
java.lang.IllegalArgumentException: Null user
The error message says that a null user was passed in.
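To see where the exception itself comes from: the top frame, UserGroupInformation.createRemoteUser, rejects a null (or empty) user name. The minimal snippet below, assuming hadoop-common is on the classpath, reproduces the same exception outside the simulator (NullUserDemo is just a throwaway class name for the demonstration):

import org.apache.hadoop.security.UserGroupInformation;

public class NullUserDemo {
  public static void main(String[] args) {
    // createRemoteUser() validates its argument and throws
    // java.lang.IllegalArgumentException: Null user when given null
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser(null);
    System.out.println(ugi);
  }
}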
Tracing back to where this parameter is first passed in:
private void startAMFromSynthGenerator() throws YarnException, IOException {
  Configuration localConf = new Configuration();
  localConf.set("fs.defaultFS", "file:///");
  long baselineTimeMS = 0;

  // if we use the nodeFile this could have been not initialized yet.
  if (stjp == null) {
    stjp = new SynthTraceJobProducer(getConf(), new Path(inputTraces[0]));
  }

  SynthJob job = null;
  // we use stjp, a reference to the job producer instantiated during node
  // creation
  while ((job = (SynthJob) stjp.getNextJob()) != null) {
    // only support MapReduce currently
    String user = job.getUser();
    // ...
The value returned by getUser() is never checked for null, which leads to the error.
By contrast, the code paths that read SLS and Rumen input do check the user when they obtain it:
private void createAMForJob(Map jsonJob) throws YarnException {
  long jobStartTime = Long.parseLong(
      jsonJob.get(SLSConfiguration.JOB_START_MS).toString());

  long jobFinishTime = 0;
  if (jsonJob.containsKey(SLSConfiguration.JOB_END_MS)) {
    jobFinishTime = Long.parseLong(
        jsonJob.get(SLSConfiguration.JOB_END_MS).toString());
  }

  String user = (String) jsonJob.get(SLSConfiguration.JOB_USER);
  if (user == null) {
    user = "default";
  }
  // ...
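An obvious workaround is therefore to apply the same fallback in startAMFromSynthGenerator before the user is handed on to the AM simulator. This is only a sketch of a local patch to SLSRunner, copying the "default" fallback from the SLS/Rumen path above, not an official fix:

    // inside startAMFromSynthGenerator(), right after the user is read
    String user = job.getUser();
    if (user == null) {
      // fall back to a default user, as the SLS/Rumen code path does
      user = "default";
    }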
So I suspect that Hadoop's official support for SYNTH is simply not very mature yet.