Source Code Structure Analysis of IOR, a Distributed Software Testing Tool


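Contents
1. Program Functionality
2. Main Program Flow
3. Main Data Structures
4. Program Modules
  4.1 Lexical Analysis Module
  4.2 Workload Module
    4.2.1 Parameters
    4.2.2 Important Variables
    4.2.3 Generating the Task Offset (rankOffset)
    4.2.5 Naming of Test Files
    4.2.6 Brief Annotations on the TestIoSys Function
    4.2.7 Brief Annotations on the ReduceIterResults Function
    4.2.8 Brief Annotations on the SummarizeResults Function
    4.2.9 Brief Annotations on the WriteOrRead Function
  4.3 Low-Level Function Module
    4.3.1 Brief Annotations on the IOR_Xfer_POSIX Function
5. Summary
  5.1 Advantages of IOR
  5.2 Disadvantages of IOR
  5.3 How to Modify IOR
6. Appendix
  6.1 Purpose and Information of Each File
  6.2 Important Data Structures
  6.3 Functions Contained in Each C File
    6.3.1 Functions in Parse_options.c
    6.3.2 Functions in utilities.c
    6.3.3 Functions in aiori-POSIX.c
    6.3.4 Functions in aiori-MPIIO.c
    6.3.5 Functions in aiori-HDF5.c
    6.3.6 Functions in aiori-NCMPI.c
    6.3.7 Functions in IOR.c

1. Program Functionality

IOR is a benchmark program. It accepts parameters, generates a specific workload on the clients, measures the performance of the system, and outputs the test results. Based on this functionality, the IOR program can be divided into three modules: the lexical analysis (parsing) module, the workload module, and the low-level function module. Parameters can be supplied in three forms:

- Option form, for example "-w -r -f -c -Z".
- Assignment form, given after the -O option, for example "-O api=POSIX".
- Configuration-script form: the script assigns values to members of the parameter structure IOR_param_t and is passed as "-f script.txt". A script can define several tests, each with different parameters.

Because IOR can simulate many different workloads, the workload module is complex; its most important function, TestIoSys(), is more than 400 lines long. TestIoSys() contains many conditional statements that perform different operations depending on the values of the members of IOR_param_t. Modifying IOR, for example to remove unnecessary parameters or to add a read/write ratio parameter, therefore requires changing both the IOR_param_t structure and the TestIoSys() function.

2. Main Program Flow

A brief analysis of the flow of main. The following is a partial listing of main:

```c
int main(int argc, char ** argv)
{
    IOR_queue_t * tests;    /* IOR_queue_t is a linked list of IOR_param_t parameter structures */

    /* accept the parameter settings, check them, and fill in the parameter structures */
    tests = SetupTests(argc, argv);

    while (tests != NULL) {
        /* run the test described by the members of this parameter structure */
        TestIoSys(&tests->testParameters);
        /* move on to the next test */
        tests = tests->nextTest;
    }
    /* …… */
    return 0;
}
```

The main functions involved in this flow are:

- ParseCommandLine: parses the command-line arguments and fills in the parameter structure
- DistributeHints: distributes the hint environment variables to all processes
- ValidTests: checks that the parameters in each parameter structure are valid
- TimeDeviation: checks the deviation in start time between tasks
- SeedRandGen: generates the random seed
- ShowTest: displays the parameter information
- AioriBind: binds the I/O interface (API) functions
- ShowSetup: displays the setup information
- SummarizeResults: displays the test results
- Time0: timer pseudocode
- ReduceIterResults: reduction operation that computes the results

3. Main Data Structures

The two important data structures are:

- The parameter structure IOR_param_t, defined in aiori.h, which holds the parameters of a test.
- The queue of parameter structures IOR_queue_t, defined in IOR.h, which lets one run of the program execute several tests with different parameters.

```c
typedef struct
{
    char debug[MAX_STR];             /* debug info string */
    unsigned int mode;               /* file permissions */
    unsigned int openFlags;          /* open flags */
    int TestNum;                     /* test reference number */
    char api[MAX_STR];               /* API for I/O */
    /* …… */
    int numTasks;                    /* number of tasks for test */
    int nodes;                       /* number of nodes for test */
    int tasksPerNode;                /* number of tasks per node */
    int repetitions;                 /* number of repetitions of test */
    /* …… */
    int readFile;                    /* read of existing file */
    int writeFile;                   /* write of file */
    /* …… */
    int keepFile;                    /* don't delete the testfile on exit */
    /* …… */

    /* POSIX variables */
    int singleXferAttempt;           /* do not retry transfer if incomplete */
    int fsyncPerWrite;               /* fsync() after each write */
    int fsync;                       /* fsync() after write */

    /* MPI variables */
    MPI_Datatype transferType;       /* datatype for transfer */
    MPI_Datatype fileType;           /* filetype for file view */

    /* HDF5 variables */
    int individualDataSets;          /* datasets not shared by all procs */
    int noFill;                      /* no fill in file creation */
    IOR_offset_t setAlignment;       /* alignment in bytes */

    /* NCMPI variables */
    int var_id;                      /* variable id handle for data set */

    /* Lustre variables */
    int lustre_stripe_count;
    int lustre_stripe_size;
    int lustre_start_ost;
    int lustre_ignore_locks;
} IOR_param_t;

typedef struct IOR_queue_t {
    IOR_param_t testParameters;
    struct IOR_queue_t * nextTest;
} IOR_queue_t;
```

4. Program Modules

4.1 Lexical Analysis Module

This module consists of the C file Parse_options.c. It includes the header defaults.h, which defines the parameter structure defaultParameters containing the default parameter values.

Parse_options.c contains five main functions, listed in the table below.

| Function | Purpose |
|---|---|
| ParseCommandLine | Parses the command-line arguments |
| ParseLine | Parses one line of text by calling DecodeDirective |
| ReadConfigScript | Reads the configuration script, allocating and filling in the parameter structures |
| DecodeDirective | Parses a pair such as "transferSize=64k" and assigns the value to the matching member (here transferSize) of the parameter structure |
| CheckRunSettings | Checks and corrects the parameter values of each test: (1) if writeFile, readFile, checkWrite and checkRead are all FALSE, sets readFile and writeFile to TRUE; (2) if numTasks is 0, sets numTasks to the number of MPI processes |

ParseCommandLine is the most important function of the lexical analysis module. Its relationship to the other functions is shown in the following figure.

[Figure: relationship between ParseCommandLine and the other functions of the lexical analysis module]

Summary: parameters can be supplied in three forms, but the aggregate-bandwidth program we are writing needs only one of them, the command-line form. We therefore use only ParseCommandLine; the other functions can be removed.

4.2 Workload Module

This module consists of the C files IOR.c and utilities.c. IOR.c contains the main function and the functions SetupTests, TestIoSys and WriteOrRead. The function relationships in the workload module are shown in the following figure.

[Figure: function relationships in the workload module]

4.2.1 Parameters

| No. | Option | Description | IOR_param_t member | Type | Default | Importance |
|---|---|---|---|---|---|---|
| 1 | -A # | Test reference number, printed in the output as a label for easier test identification in log files | TestNum | numeric | -1 | |
| 2 | -a S | I/O API: POSIX or MPIIO | api | string | POSIX | important |
| 3 | -b # | Block size | blockSize | numeric | 1048576 | important |
| 4 | -B | With the POSIX API, use direct I/O (O_DIRECT) and bypass the I/O cache | useO_DIRECT | boolean | 0 | normal |
| 5 | -c | Collective I/O: MPI file operations are collective calls | collective | boolean | 0 | normal |
| 6 | -C | On read, each task reads a file written by another task; the task offset is the same constant for all tasks | reorderTasks | boolean | 0 | normal |
| 7 | -Q # | Task offset, used with the -C and -Z options | taskPerNodeOffset | numeric | 1 | normal |
| 8 | -Z | The task offset is not constant but is generated randomly for each task | reorderTasksRandom | boolean | 0 | normal |
| 9 | -X # | Used with -Z. When > 0, the same random seed is used in every iteration; when < 0, a different seed is used in each iteration | reorderTasksRandomSeed | numeric | 0 | normal |
| 10 | -d # | Delay between iterations, in seconds | interTestDelay | numeric | 0 | |
| 11 | -D # | Time limit, in seconds, for each read or write phase (stonewalling); 0 disables this option | deadlineForStonewalling | numeric | 0 | |
| 12 | -Y | Call fsync() after every POSIX write | fsyncPerWrite | boolean | 0 | normal |
| 13 | -e | Call fsync() after writing, in every iteration | fsync | boolean | 0 | normal |
| 14 | -E | Do not delete an existing file before each write access (use the existing file) | useExistingTestFile | boolean | 0 | |
| 15 | -f S | Name of the configuration file | | | | |
| 16 | -F | File-per-process mode | filePerProc | boolean | 0 | important |
| 17 | -g | All tasks open, write/read and close in lockstep, with barriers between open, write/read and close | intraTestBarriers | boolean | 0 | |
| 18 | -G # | Set the timestamp signature | setTimeStampSignature | numeric | 0 | |
| 19 | -h | Show help | showHelp | boolean | 0 | needed |
| 20 | -H | Show hints (MPI may optimize operations based on hints to improve performance) | showHints | boolean | | |
| 21 | -i # | Number of iterations (repetitions) of each test | repetitions | numeric | 1 | important |
| 22 | -j # | Report which processes take more than # seconds longer than the average operation time; 0 disables this option | outlierThreshold | numeric | 0 | |
| 23 | -J # | Set the HDF5 data alignment, in bytes | setAlignment | numeric | 1 | |
| 24 | -k | Do not delete the test file when the program exits | keepFile | boolean | 0 | |
| 25 | -K | Keep a file that fails the data check (data checking after reads and writes is enabled by other options) | keepFileWithError | boolean | 0 | |
| 26 | -l | Store the file offset in the file as the data signature | storeFileOffset | boolean | 0 | |
| 27 | -m | Use the iteration number as part of the test file name, so every iteration creates a new file | multiFile | boolean | 0 | normal |
| 28 | -n | Do not fill when creating an HDF5 file | noFill | boolean | 0 | |
| 29 | -N # | Number of IOR tasks. If it is larger than the number of MPI tasks, it is set to the number of MPI tasks; 0 means it equals the number of MPI tasks | numTasks | numeric | 0 | important |
| 30 | -o S | Test file name. Note: when filePerProc is set, the tasks take individual names from a multi-name list "-o S1@S2@S3" in round-robin order: task 0 uses "S1", task 1 "S2", task 2 "S3", task 3 "S1" again, and so on. S1, S2 and S3 may be on different file systems | testFileName | string | testFile | important |
| 31 | -O S | IOR directive string | | | | |
| 32 | -p | Preallocate file space (MPI) | preallocate | boolean | 0 | important |
| 33 | -q | Quit when a file check error occurs | quitOnError | boolean | 0 | |
| 34 | -r | Read the file | readFile | boolean | 1 | important |
| 35 | -R | After reading the file, read it again and check for errors (data check) | checkRead | boolean | 0 | |
| 36 | -s # | Number of segments in the file | segmentCount | numeric | 1 | important |
| 37 | -t # | Transfer size | transferSize | numeric | 262144 | important |
| 38 | -T # | Maximum test duration; 0 disables this option | maxTimeDuration | numeric | 0 | |
| 39 | -u | In file-per-process mode, each task uses its own working directory | uniqueDir | boolean | 0 | normal |
| 40 | -U S | Full name of the hints file (it contains hints that are passed into the program and set; as arguments to many MPI operations, the hints affect MPI-IO performance) | hintsFileName | string | | |
| 41 | -v | Verbosity of the output | verbose | numeric | 0 | |
| 42 | -V | Use MPI_File_set_view (MPI) | useFileView | boolean | | normal |
| 43 | -w | Write the file | writeFile | boolean | 1 | important |
| 44 | -W | After writing the file, read it back and check for errors (data check) | checkWrite | boolean | 0 | |
| 45 | -x | Do not retry a transfer if it is incomplete | singleXferAttempt | boolean | | normal |
| 46 | -z | Random offsets, i.e. access to random rather than sequential offsets within a file. Not compatible with: checkRead; storeFileOffset; MPIIO collective or useFileView; HDF5 or NCMPI | randomOffset | boolean | 0 | normal |

4.2.2 Important Variables

```c
extern IOR_param_t defaultParameters, initialTestParams;
extern int errno;                    /* error number */
extern char ** environ;
int totalErrorCount = 0;
int numTasksWorld = 0;               /* total number of tasks */
int rank = 0;                        /* task rank */
int rankOffset = 0;                  /* task offset */
int tasksPerNode = 0;                /* tasks per node */
int verbose = VERBOSE_0;             /* verbose output */
double wall_clock_delta = 0;
double wall_clock_deviation;
MPI_Comm testComm;
```

4.2.3 Generating the Task Offset (rankOffset)

The task offset rankOffset determines which file a task reads: during a read, a task reads the file written by task (rank + rankOffset) % test->numTasks. The following excerpt from TestIoSys generates the task offset:

```c
/* …… */
/*
 * read the file(s), getting timing between I/O calls
 */
if (test->readFile &&
    (maxTimeDuration ? (GetTimeStamp() - startTime < maxTimeDuration) : 1)) {
    /* based on -C, -Z, -Q and -X, each task computes its task offset for reading */

    /* with -C (and -Q), the task offset is a constant */
    if (test->reorderTasks) {
        rankOffset = (test->taskPerNodeOffset * test->tasksPerNode) % test->numTasks;
    }
    /* with -Z, -Q (and -X), the task offset is random */
    if (test->reorderTasksRandom) {
        /* this does not conflict with random file offsets (-z),
           because those are produced by GetOffsetArrayRandom */
        int *rankoffs, *filecont, *filehits, ifile, jfile, nodeoffset;
        unsigned int iseed0;
        nodeoffset = test->taskPerNodeOffset;
        nodeoffset = (nodeoffset < test->nodes) ? nodeoffset : test->nodes - 1;
        iseed0 = (test->reorderTasksRandomSeed < 0)
                     ? (-1 * test->reorderTasksRandomSeed + rep)
                     : test->reorderTasksRandomSeed;
        srand(rank + iseed0);
        rankOffset = rand() % test->numTasks;
        while (rankOffset < (nodeoffset * test->tasksPerNode)) {
            rankOffset = rand() % test->numTasks;
        }
        /* …… */
    }
```

4.2.5 Naming of Test Files

Each task obtains the name of its test file by calling GetTestFileName. The name depends on the members filePerProc, uniqueDir, multiFile and repCounter of the parameter structure. The source code of GetTestFileName is:

```c
void GetTestFileName(char * testFileName, IOR_param_t * test)
{
    char ** fileNames,
            initialTestFileName[MAXPATHLEN],
            testFileNameRoot[MAX_STR],
            tmpString[MAX_STR];
    int count;

    strcpy(initialTestFileName, test->testFileName);
    fileNames = ParseFileName(initialTestFileName, &count);               /*1*/
    if (count > 1 && test->uniqueDir == TRUE)                             /*2*/
        ERR("cannot use multiple file names with unique directories");
    if (test->filePerProc) {                                              /*3*/
        strcpy(testFileNameRoot,
               fileNames[((rank + rankOffset) % test->numTasks) % count]); /*4*/
    } else {
        strcpy(testFileNameRoot, fileNames[0]);                           /*5*/
    }

    /* give unique name if using multiple files */
    if (test->filePerProc) {                                              /*6*/
        /*
         * prepend rank subdirectory before filename
         * e.g., /dir/file => /dir/<rank>/file
         */
        if (test->uniqueDir == TRUE) {                                    /*7*/
            strcpy(testFileNameRoot, PrependDir(test, testFileNameRoot)); /*8*/
        }
        sprintf(testFileName, "%s.%08d", testFileNameRoot,
                (rank + rankOffset) % test->numTasks);                    /*9*/
    } else {
        strcpy(testFileName, testFileNameRoot);                           /*10*/
    }

    /* add suffix for multiple files */
    if (test->repCounter > -1) {                                          /*11*/
        sprintf(tmpString, ".%d", test->repCounter);
        strcat(testFileName, tmpString);                                  /*12*/
    }
} /* GetTestFileName() */
```

For ease of analysis, let num = (rank + rankOffset) % test->numTasks. At /*1*/, ParseFileName parses the member testFileName, splitting it at each "@": for example, when testFileName = "/tmp/myfs@/mnt/testfs@/mnt/nfs", fileNames[0] = "/tmp/myfs", fileNames[1] = "/mnt/testfs" and fileNames[2] = "/mnt/nfs". At /*8*/, PrependDir turns "/mnt/testfs" into "/mnt/<num>/testfs". At /*11*/, when the -m option is set (multiFile = 1), the iteration number is appended to the test file name as a suffix. With count = 1 and testFileName = "/mnt/testfs", the test file names of the tasks are:

| filePerProc | multiFile | uniqueDir = 1 | uniqueDir = 0 |
|---|---|---|---|
| 1 | 1 | /mnt/<num>/testfs.<num>.<rep> | /mnt/testfs.<num>.<rep> |
| 1 | 0 | /mnt/<num>/testfs.<num> | /mnt/testfs.<num> |
| 0 (single shared file) | 1 | /mnt/testfs.<rep> | /mnt/testfs.<rep> |
| 0 (single shared file) | 0 | /mnt/testfs | /mnt/testfs |

4.2.6 Brief Annotations on the TestIoSys Function

```c
void TestIoSys(IOR_param_t *test)
{
    char testFileName[MAX_STR];
    double * timer[12];
    double startTime;
    int i, rep, maxTimeDuration;
    void * fd;
    MPI_Group orig_group, new_group;
    int range[3];
    IOR_offset_t dataMoved;              /* for data rate calculation */

    /* create the MPI communicator needed for the test */
    /* …… */

    /* determine the number of tasks per node */
    test->tasksPerNode = CountTasksPerNode(test->numTasks, testComm);

    /* initialize the timer arrays */
    /* …… */

    /* initialize the arrays that hold the results of each iteration */
    /* …… */

    /* bind I/O calls to specific API */
    AioriBind(test->api);

    /* compute the aggregate file size */
    for (rep = 0; rep < test->repetitions; rep++) {
        test->aggFileSizeFromCalc[rep] =
            test->blockSize * test->segmentCount * test->numTasks;
    }

    /* record the start time */
    startTime = GetTimeStamp();

    /* test loop; each iteration has four main operations:
       write, write check, read, read check */
    for (rep = 0; rep < test->repetitions; rep++) {
        /* at the start of each iteration, task 0 records the start time
           and broadcasts it to all tasks */
        /* …… */

        /* use the iteration number as part of the test file name;
           this affects the result of GetTestFileName */
        if (test->multiFile) test->repCounter = rep;

        /* write the file and time it */
        if (test->writeFile) {
            GetTestFileName(testFileName, test);
            DelaySecs(test->interTestDelay);
            if (test->useExistingTestFile == FALSE)
                RemoveFile(testFileName, test->filePerProc, test);
            MPI_CHECK(MPI_Barrier(testComm), "barrier error");
            test->open = WRITE;
            timer[0][rep] = GetTimeStamp();
            fd = IOR_Create(testFileName, test);
            timer[1][rep] = GetTimeStamp();
            if (test->intraTestBarriers)
                MPI_CHECK(MPI_Barrier(testComm), "barrier error");
            timer[2][rep] = GetTimeStamp();
            dataMoved = WriteOrRead(test, fd, WRITE);
            timer[3][rep] = GetTimeStamp();
            if (test->intraTestBarriers)
                MPI_CHECK(MPI_Barrier(testComm), "barrier error");
            timer[4][rep] = GetTimeStamp();
            IOR_Close(fd, test);
            timer[5][rep] = GetTimeStamp();
            MPI_CHECK(MPI_Barrier(testComm), "barrier error");

            /* get the size of the file just written */
            test->aggFileSizeFromStat[rep] =
                IOR_GetFileSize(test, testComm, testFileName);

            /* check if stat() of file doesn't equal expected file size,
               use actual amount of byte moved */
            CheckFileSize(test, dataMoved, rep);

            ReduceIterResults(test, timer, rep, WRITE);
            if (test->outlierThreshold)
                CheckForOutliers(test, timer, rep, WRITE);
        } /* if writeFile */

        /* check the written data */
        if (test->checkWrite) {
            MPI_CHECK(MPI_Barrier(testComm), "barrier error");
            if (test->corruptFile)
                CorruptFile(testFileName, test, rep, WRITECHECK);
            MPI_CHECK(MPI_Barrier(testComm), "barrier error");
            if (test->reorderTasks)
                rankOffset = (2 * test->tasksPerNode) % test->numTasks;
            GetTestFileName(testFileName, test);
            test->open = WRITECHECK;
            fd = IOR_Open(testFileName, test);
            dataMoved = WriteOrRead(test, fd, WRITECHECK);
            IOR_Close(fd, test);
            rankOffset = 0;
        }

        /* read the file and time it */
        if (test->readFile) {
            /* generate the task offset */
            /* …… */
            GetTestFileName(testFileName, test);
            DelaySecs(test->interTestDelay);
            MPI_CHECK(MPI_Barrier(testComm), "barrier error");
            test->open = READ;
            timer[6][rep] = GetTimeStamp();
            fd = IOR_Open……
```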
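The DecodeDirective entry in section 4.1 parses pairs such as "transferSize=64k". The following is a minimal C sketch of that idea, not IOR's own code: mini_params_t, keys_equal, parse_size and decode_directive are hypothetical names, and treating the k/m/g suffixes as powers of 1024 is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily trimmed parameter structure (not IOR's real IOR_param_t). */
typedef struct {
    long long transferSize;
    long long blockSize;
    int segmentCount;
    char api[16];
} mini_params_t;

/* Case-insensitive string equality, so "transfersize" and "transferSize" both match. */
static int keys_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Convert "64k", "1m", "2g" or "4096" to bytes; k/m/g are assumed to be powers of 1024. */
long long parse_size(const char *s)
{
    char *end;
    long long value = strtoll(s, &end, 10);

    switch (tolower((unsigned char)*end)) {
    case 'k': value *= 1024LL; break;
    case 'm': value *= 1024LL * 1024; break;
    case 'g': value *= 1024LL * 1024 * 1024; break;
    default:  break;                     /* no suffix: plain bytes */
    }
    return value;
}

/* Decode one "key=value" directive into p; returns 0 on success, -1 on error. */
int decode_directive(const char *line, mini_params_t *p)
{
    char key[64];
    const char *eq;
    const char *value;
    size_t len;

    assert(line != NULL && p != NULL);
    eq = strchr(line, '=');
    if (eq == NULL)
        return -1;                       /* not a key=value pair */
    len = (size_t)(eq - line);
    if (len == 0 || len >= sizeof key)
        return -1;                       /* empty or over-long key */
    memcpy(key, line, len);
    key[len] = '\0';
    value = eq + 1;

    if (keys_equal(key, "transferSize")) {
        p->transferSize = parse_size(value);
    } else if (keys_equal(key, "blockSize")) {
        p->blockSize = parse_size(value);
    } else if (keys_equal(key, "segmentCount")) {
        p->segmentCount = atoi(value);
    } else if (keys_equal(key, "api")) {
        strncpy(p->api, value, sizeof p->api - 1);
        p->api[sizeof p->api - 1] = '\0';
    } else {
        return -1;                       /* unknown key */
    }
    return 0;
}
```

For example, decode_directive("transferSize=64k", &p) leaves p.transferSize at 65536. Rejecting unknown keys instead of silently ignoring them makes typos in a configuration script visible.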
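Sections 2 and 3 describe main walking the IOR_queue_t list and calling TestIoSys once per IOR_param_t, and section 4.1 lists the two CheckRunSettings rules. The sketch below models this with a hypothetical test_params_t that keeps only the members those rules touch. In IOR the checks are applied while the parameters are set up, before the test loop; here they are folded into the walk for brevity, and applying the -N clamp from the parameter table in the same function is an assumption about where that rule lives.

```c
#include <assert.h>
#include <stdlib.h>

#define TRUE  1
#define FALSE 0

/* Hypothetical subset of IOR_param_t with only the members used below. */
typedef struct {
    int writeFile, readFile, checkWrite, checkRead;
    int numTasks;
} test_params_t;

/* Same shape as IOR_queue_t: one parameter set per node. */
typedef struct test_queue {
    test_params_t testParameters;
    struct test_queue *nextTest;
} test_queue_t;

/* The two CheckRunSettings rules from section 4.1, plus the -N clamp (assumed placement). */
void check_run_settings(test_params_t *t, int mpiTasks)
{
    assert(mpiTasks > 0);
    if (t->writeFile == FALSE && t->readFile == FALSE &&
        t->checkWrite == FALSE && t->checkRead == FALSE) {
        t->readFile = TRUE;              /* nothing requested: default to write + read */
        t->writeFile = TRUE;
    }
    if (t->numTasks == 0 || t->numTasks > mpiTasks)
        t->numTasks = mpiTasks;          /* 0 means "all MPI tasks"; larger values are clamped */
}

/* Append a copy of params to the tail of the queue and return the head. */
test_queue_t *append_test(test_queue_t *head, const test_params_t *params)
{
    test_queue_t *node = malloc(sizeof *node);
    test_queue_t *tail;

    if (node == NULL)
        abort();
    node->testParameters = *params;
    node->nextTest = NULL;
    if (head == NULL)
        return node;
    for (tail = head; tail->nextTest != NULL; tail = tail->nextTest)
        ;
    tail->nextTest = node;
    return head;
}

/* Walk the queue the way main() does; returns the number of tests visited. */
int run_all(test_queue_t *tests, int mpiTasks)
{
    int count = 0;

    while (tests != NULL) {
        check_run_settings(&tests->testParameters, mpiTasks);
        /* IOR would call TestIoSys(&tests->testParameters) at this point. */
        count++;
        tests = tests->nextTest;
    }
    return count;
}

/* Release every node in the queue. */
void free_tests(test_queue_t *tests)
{
    while (tests != NULL) {
        test_queue_t *next = tests->nextTest;
        free(tests);
        tests = next;
    }
}
```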
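Row 46 of the parameter table lists the settings that cannot be combined with -z. A ValidTests-style check of exactly those rules might look like the sketch below; z_params_t, check_random_offset and the error strings are hypothetical, and only the incompatibilities named in the table are checked.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of IOR_param_t relevant to the -z restrictions. */
typedef struct {
    char api[16];
    int randomOffset, checkRead, storeFileOffset, collective, useFileView;
} z_params_t;

/* Return NULL if the settings are compatible with -z, otherwise a message naming the conflict. */
const char *check_random_offset(const z_params_t *t)
{
    assert(t != NULL);
    if (!t->randomOffset)
        return NULL;                     /* -z not requested: nothing to check */
    if (t->checkRead)
        return "random offset not compatible with read check";
    if (t->storeFileOffset)
        return "random offset not compatible with stored file offset";
    if (strcmp(t->api, "MPIIO") == 0 && (t->collective || t->useFileView))
        return "random offset not compatible with MPIIO collective or file view";
    if (strcmp(t->api, "HDF5") == 0 || strcmp(t->api, "NCMPI") == 0)
        return "random offset not available with HDF5 or NCMPI";
    return NULL;
}
```

Returning a message rather than exiting keeps the check testable; a caller in the style of IOR would print it and stop.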
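The block size (-b), segment count (-s) and transfer size (-t) determine both the aggregate file size computed in TestIoSys (blockSize * segmentCount * numTasks) and where each transfer lands in the file. The sketch below assumes the usual sequential IOR layout, in which each segment of a shared file holds one block per task in task order and a file-per-process file holds only that task's blocks. The function names, the offset_t stand-in and the precondition that blockSize is a multiple of transferSize are assumptions, not code taken from IOR.

```c
#include <assert.h>

typedef long long offset_t;   /* stand-in for IOR_offset_t, assumed to be a 64-bit signed integer */

/* Aggregate file size, as computed in TestIoSys. */
offset_t agg_file_size(offset_t blockSize, int segmentCount, int numTasks)
{
    return blockSize * segmentCount * numTasks;
}

/* Number of transfers needed to move one block. */
offset_t transfers_per_block(offset_t blockSize, offset_t transferSize)
{
    assert(transferSize > 0 && blockSize % transferSize == 0);
    return blockSize / transferSize;
}

/* Offset of transfer j inside the given segment for the given task (assumed sequential layout). */
offset_t transfer_offset(int filePerProc, int segment, int task, offset_t j,
                         offset_t blockSize, offset_t transferSize, int numTasks)
{
    assert(j < transfers_per_block(blockSize, transferSize));
    if (filePerProc)                     /* the task's own file holds only its blocks */
        return (offset_t)segment * blockSize + j * transferSize;
    /* shared file: skip earlier segments, then earlier tasks' blocks, then earlier transfers */
    return (offset_t)segment * numTasks * blockSize
           + (offset_t)task * blockSize + j * transferSize;
}
```

With the defaults (-b 1048576, -t 262144, -s 1) and 4 tasks, agg_file_size returns 4194304 bytes, each block takes 4 transfers, and the second transfer of task 2 in a shared file starts at offset 2359296.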
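Section 4.2.3 derives the task offset from -C, -Z, -Q and -X. The formulas from that excerpt are lifted below into standalone functions so they can be tried without MPI; the function names are hypothetical, and rank, tasksPerNode and numTasks are passed as arguments instead of being read from IOR's globals.

```c
#include <assert.h>
#include <stdlib.h>

/* -C: constant offset shared by all tasks (formula from TestIoSys). */
int constant_rank_offset(int taskPerNodeOffset, int tasksPerNode, int numTasks)
{
    assert(numTasks > 0);
    return (taskPerNodeOffset * tasksPerNode) % numTasks;
}

/* Task whose file this task reads: (rank + rankOffset) % numTasks. */
int read_target(int rank, int rankOffset, int numTasks)
{
    assert(numTasks > 0);
    return (rank + rankOffset) % numTasks;
}

/* -X: seed for iteration rep; > 0 keeps the same seed, < 0 gives a per-iteration seed. */
unsigned int reorder_seed(int reorderTasksRandomSeed, int rep)
{
    return (reorderTasksRandomSeed < 0)
               ? (unsigned int)(-1 * reorderTasksRandomSeed + rep)
               : (unsigned int)reorderTasksRandomSeed;
}

/* -Z: random offset, redrawn until it skips at least nodeoffset nodes (mirrors the TestIoSys loop). */
int random_rank_offset(int rank, unsigned int iseed0, int taskPerNodeOffset,
                       int nodes, int tasksPerNode, int numTasks)
{
    int nodeoffset = (taskPerNodeOffset < nodes) ? taskPerNodeOffset : nodes - 1;
    int offset;

    /* the loop below can only exit if some value in [0, numTasks) is acceptable */
    assert(nodeoffset * tasksPerNode < numTasks);
    srand(rank + iseed0);
    offset = rand() % numTasks;
    while (offset < nodeoffset * tasksPerNode)
        offset = rand() % numTasks;
    return offset;
}
```

For example, with -C, -Q 1, 4 tasks per node and 16 tasks, constant_rank_offset gives 4, so task 14 reads the file written by task 2. The assertion in random_rank_offset suggests why TestIoSys clamps nodeoffset to nodes - 1: when numTasks equals nodes * tasksPerNode, the clamp keeps at least one acceptable value reachable, so the rejection loop terminates.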
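The naming rules of section 4.2.5 can be checked against the table there with a version of GetTestFileName that takes rank, rankOffset and the relevant flags as arguments. This is a sketch, not IOR's code: split_names stands in for ParseFileName, prepend_dir stands in for PrependDir (which in IOR may also create the directory), and formatting the directory number with plain %d is an assumption; the %08d file suffix comes from the GetTestFileName listing in section 4.2.5.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 16
#define NAME_LEN 256

/* Split "a@b@c" in place at each '@' (stand-in for ParseFileName). */
static int split_names(char *list, char *names[], int max)
{
    int count = 0;
    char *p = list;

    while (count < max) {
        char *at = strchr(p, '@');
        names[count++] = p;
        if (at == NULL)
            break;
        *at = '\0';                      /* terminate this name */
        p = at + 1;
    }
    return count;
}

/* Insert "<num>/" before the last path component: /mnt/testfs -> /mnt/3/testfs. */
static void prepend_dir(char *out, size_t outLen, const char *path, int num)
{
    const char *slash = strrchr(path, '/');

    if (slash == NULL)
        snprintf(out, outLen, "%d/%s", num, path);
    else
        snprintf(out, outLen, "%.*s/%d/%s", (int)(slash - path), path, num, slash + 1);
}

/* GetTestFileName logic with MPI globals turned into arguments; returns -1 where IOR calls ERR(). */
int build_test_file_name(char *out, size_t outLen, const char *testFileName,
                         int rank, int rankOffset, int numTasks,
                         int filePerProc, int uniqueDir, int repCounter)
{
    char list[NAME_LEN], root[NAME_LEN], tmp[NAME_LEN];
    char *names[MAX_NAMES];
    int count, num;

    assert(numTasks > 0);
    num = (rank + rankOffset) % numTasks;
    snprintf(list, sizeof list, "%s", testFileName);
    count = split_names(list, names, MAX_NAMES);
    if (count > 1 && uniqueDir)
        return -1;                       /* multiple names with unique directories */

    /* file-per-process picks names round-robin; a shared file always uses the first */
    snprintf(root, sizeof root, "%s", filePerProc ? names[num % count] : names[0]);
    if (filePerProc) {
        if (uniqueDir) {
            prepend_dir(tmp, sizeof tmp, root, num);
            snprintf(root, sizeof root, "%s", tmp);
        }
        snprintf(out, outLen, "%s.%08d", root, num);
    } else {
        snprintf(out, outLen, "%s", root);
    }

    /* -m: TestIoSys sets repCounter to the iteration number */
    if (repCounter > -1) {
        size_t used = strlen(out);
        snprintf(out + used, outLen - used, ".%d", repCounter);
    }
    return 0;
}
```

For task 3 of 8 with -F -u and repCounter 2, this yields "/mnt/3/testfs.00000003.2", which matches the first row of the table with <num> = 3 and <rep> = 2.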
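In the write phase of TestIoSys, each task records six timestamps, timer[0] through timer[5]: the start and end of IOR_Create, of WriteOrRead and of IOR_Close. The results are then passed to ReduceIterResults. The sketch below shows one plausible reduction, stated as an assumption rather than as IOR's exact computation: take the earliest start and the latest end across tasks, and divide the aggregate bytes by the interval from the first open to the last close. A plain 2-D array replaces the MPI reduction, and reduce_timers and bandwidth_mib are hypothetical names.

```c
#include <assert.h>

#define NTIMERS 6   /* write phase: timer[0]..timer[5] in TestIoSys */

/* Combine per-task timestamps: even indices are starts (take the earliest),
 * odd indices are ends (take the latest). Each row holds one task's timers. */
void reduce_timers(double t[][NTIMERS], int numTasks, double out[NTIMERS])
{
    int i, k;

    assert(numTasks > 0);
    for (i = 0; i < NTIMERS; i++) {
        out[i] = t[0][i];
        for (k = 1; k < numTasks; k++) {
            if (i % 2 == 0 && t[k][i] < out[i])
                out[i] = t[k][i];
            else if (i % 2 == 1 && t[k][i] > out[i])
                out[i] = t[k][i];
        }
    }
}

/* Aggregate bandwidth in MiB/s over [first open start, last close end] (assumed definition). */
double bandwidth_mib(double aggBytes, const double r[NTIMERS])
{
    double total = r[5] - r[0];

    assert(total > 0.0);
    return aggBytes / total / (1024.0 * 1024.0);
}
```

Using the slowest task's close time makes the figure reflect when the whole parallel write actually finished, not the average task.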
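The -j option (outlierThreshold) reports tasks whose operation time exceeds the average by a given number of seconds, and TestIoSys calls CheckForOutliers when it is set. A single-process sketch of that comparison follows; find_outliers is a hypothetical name, and it works on an array of per-task times instead of the MPI collectives IOR presumably uses.

```c
#include <assert.h>

/* Flag tasks whose time exceeds the mean by more than threshold seconds.
 * times[i] is task i's time; flagged receives the ranks of the outliers. */
int find_outliers(const double *times, int numTasks, double threshold, int *flagged)
{
    double mean = 0.0;
    int i, n = 0;

    assert(numTasks > 0 && threshold > 0.0);
    for (i = 0; i < numTasks; i++)
        mean += times[i];
    mean /= numTasks;
    for (i = 0; i < numTasks; i++)
        if (times[i] - mean > threshold)
            flagged[n++] = i;            /* record the rank of each outlier */
    return n;                            /* number of outliers found */
}
```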
