Mapping MPI processes to specific nodes

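I feel a bit awkward asking this question, but I couldn't help myself. Suppose I have a cluster of 100 nodes, each with 16 cores. I have an MPI application whose communication pattern is already known, and I also know the cluster topology (the hop distance between nodes). I have worked out a process-to-node mapping that reduces contention on the network; for example, the mapping is 10 -> 20, 30 -> 90. How do I map the process with rank 10 to node 20? Please help.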


2013-01-19 Srini

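If you are not constrained by any kind of queueing system, you can control the mapping of ranks to nodes by creating your own machinefile. For example, suppose the file my_machine_file has the following 1,600 lines (one line per core: 100 nodes × 16 cores)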

    node001 
    node002 
    node003 
    .... 
    node100 
    node001 
    node002 
    node003 
    .... 
    node100 
    ... 
    [repeat 13 more times] 
    ... 
    node001 
    node002 
    node003 
    .... 
    node100

and corresponds to the mapping

    0 -> node001, 1 -> node002, ..., 99 -> node100, 100 -> node001, ...

then you would run the application with

    mpirun -machinefile my_machine_file -n 1600 my_app

If your application needs more than 1600 processes, edit the machinefile accordingly.
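To apply an arbitrary rank-to-node map such as the question's 10 -> 20, 30 -> 90 with this approach, the machinefile can be generated instead of written by hand: line i names the node for rank i. The sketch below is only an illustration under stated assumptions: the `nodeNNN` host names from the example above, 1-based node numbers, and hypothetical helper names (`nodeName`, `buildMachineFile`, `writeMachineFile`). Ranks without an explicit assignment fall back to the round-robin order shown above.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Host name for a 1-based node index; the "nodeNNN" scheme is an assumption
// taken from the example machinefile above.
std::string nodeName(int node) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "node%03d", node);
    return std::string(buf);
}

// Hypothetical helper: one machinefile line per rank (line i = host for rank i).
// Ranks in 'overrides' (rank -> 1-based node) go to the given node; every other
// rank follows the round-robin order 0 -> node001, ..., 100 -> node001, ...
// Note: this does not check whether a node ends up with more ranks than cores.
std::vector<std::string> buildMachineFile(int numRanks, int numNodes,
                                          const std::map<int, int>& overrides) {
    std::vector<std::string> lines;
    lines.reserve(numRanks);
    for (int rank = 0; rank < numRanks; ++rank) {
        auto it = overrides.find(rank);
        int node = (it != overrides.end()) ? it->second : rank % numNodes + 1;
        lines.push_back(nodeName(node));
    }
    return lines;
}

// Write the lines to 'path', one host name per line; returns false on I/O error.
bool writeMachineFile(const std::string& path, const std::vector<std::string>& lines) {
    std::ofstream out(path);
    for (const auto& line : lines) out << line << '\n';
    return static_cast<bool>(out);
}
```

For example, `writeMachineFile("my_machine_file", buildMachineFile(1600, 100, {{10, 20}, {30, 90}}))` lists rank 10 on node020 and rank 30 on node090. With only these two overrides, node020 and node090 each receive 17 entries, so a real mapping would also move other ranks off those nodes. Whether a given mpirun places rank i exactly on line i depends on the implementation and its mapping options, so check its documentation.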

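Your cluster administrators have probably numbered the nodes with the interconnect topology in mind. Even so, carefully exploiting the cluster topology has been shown to improve performance noticeably, on the order of 10%-20% (reference).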
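One common way to compare candidate mappings before running anything is a "hop-bytes" style estimate: the sum over rank pairs of communication volume times the hop distance between the nodes those ranks sit on. This heuristic is an assumption added here, not something the answer prescribes; the sketch uses hypothetical names (`volume`, `hops`, `nodeOf`) and 0-based node indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hop-bytes style cost of a mapping (a heuristic, assumed here for illustration):
// volume[i][j] = bytes rank i sends to rank j,
// hops[a][b]   = hop distance between nodes a and b (0-based, hops[a][a] == 0),
// nodeOf[r]    = node that rank r is placed on.
// Lower totals mean less traffic crossing the network.
double hopBytes(const Matrix& volume, const Matrix& hops, const std::vector<int>& nodeOf) {
    double total = 0.0;
    for (std::size_t i = 0; i < volume.size(); ++i)
        for (std::size_t j = 0; j < volume[i].size(); ++j)
            total += volume[i][j] * hops[nodeOf[i]][nodeOf[j]];
    return total;
}
```

With three ranks on two nodes two hops apart, where ranks 0 and 1 exchange 100 bytes each way and rank 2 exchanges 1 byte with each of them, placing ranks 0 and 1 on different nodes ({0, 1, 0}) costs 404, while keeping them together ({0, 0, 1}) costs 8. The estimate ignores link sharing, so it is a first filter rather than a contention model.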

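Note: launching MPI programs with mpirun is neither standardized nor portable. However, the question here is clearly about a specific computing cluster and a specific implementation (OpenMPI), and does not ask for a portable solution.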


2013-01-19 07:51:55

Thanks for the quick reply. – Srini

@Srini Correct. All of the cores reside on the same node, so mpirun cannot tell them apart. The OS scheduler maps processes to cores. Process-to-core affinity is [a separate issue](http://blogs.cisco.com/performance/open-mpi-v1-5-processor-affinity-options/).

This may be slightly off-topic, but Open MPI does in fact let you specify the mapping of each individual rank to specific cores on a given node. You do this by passing a `rankfile` to `mpirun` with the `-rf` option.
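As a sketch of the rankfile route, the code below builds rankfile lines from a rank-to-node map, pinning each rank to its own core slot on its node. It assumes the `rank N=host slot=S` line form described in the Open MPI `mpirun` man page, the `nodeNNN` host names used above, 1-based node numbers, and hypothetical helper names; check the man page of your Open MPI version for the exact slot syntax before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Host name for a 1-based node index (assumed "nodeNNN" naming).
std::string nodeName(int node) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "node%03d", node);
    return std::string(buf);
}

// Hypothetical helper: one rankfile line per rank, "rank R=nodeNNN slot=S".
// nodeOf[r] is the 1-based node of rank r. Slots 0, 1, 2, ... are handed out
// in rank order on each node; throws if a node gets more ranks than cores.
std::vector<std::string> buildRankFile(const std::vector<int>& nodeOf,
                                       int numNodes, int coresPerNode) {
    std::vector<int> nextSlot(numNodes + 1, 0);  // index 0 unused (1-based nodes)
    std::vector<std::string> lines;
    lines.reserve(nodeOf.size());
    for (std::size_t rank = 0; rank < nodeOf.size(); ++rank) {
        const int node = nodeOf[rank];
        if (node < 1 || node > numNodes)
            throw std::out_of_range("node index out of range");
        const int slot = nextSlot[node]++;
        if (slot >= coresPerNode)
            throw std::runtime_error("more ranks than cores on " + nodeName(node));
        lines.push_back("rank " + std::to_string(rank) + "=" + nodeName(node) +
                        " slot=" + std::to_string(slot));
    }
    return lines;
}
```

For the question's example, setting `nodeOf[10] = 20` produces a line of the form `rank 10=node020 slot=S`. After writing the lines to a file (say `my_rankfile`, a hypothetical name), it would be passed as `mpirun -rf my_rankfile -n 1600 my_app`. Throwing on an overfull node keeps an inconsistent map from silently oversubscribing cores.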

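I'm a little late to this party, but here is a C++ subroutine that gives you a node communicator and a master communicator (containing only the master of each node), along with their sizes and ranks. It's clumsy, but unfortunately I couldn't find a better way. Luckily it adds only about 0.1 s of wall time. Maybe you or someone else will find it useful.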

#include <mpi.h> 
#include <iostream> 
#include <sstream> 
#include <string> 

#define MASTER 0 

using namespace std; 

/* 
* Make a communicator for each node and another for just 
* the masters of the nodes. Upon completion, everyone is 
* in a new node communicator, knows its size and their rank, 
* and the rank of their master in the master communicator, 
* which can be useful to use for indexing. 
*/ 
bool CommByNode(MPI::Intracomm &NodeComm, 
       MPI::Intracomm &MasterComm, 
       int &NodeRank, int &MasterRank, 
       int &NodeSize, int &MasterSize, 
       string &NodeNameStr) 
{ 
    bool IsOk = true; 

    int Rank = MPI::COMM_WORLD.Get_rank(); 
    int Size = MPI::COMM_WORLD.Get_size(); 

    /* 
    * ====================================================================== 
    * What follows is my best attempt at creating a communicator 
    * for each node in a job such that only the cores on that 
    * node are in the node's communicator, and each core groups 
    * itself and the node communicator is made using the Split() function. 
    * The end of this (lengthy) process is indicated by another comment. 
    * ====================================================================== 
    */ 
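    char *NodeName = nullptr, *NodeNameList = nullptr; 
    NodeName = new char [MPI_MAX_PROCESSOR_NAME]; // size required by Get_processor_name 
    int NodeNameLen, 
     *NodeNameCountVect = nullptr,  // allocated only on master; 
     *NodeNameOffsetVect = nullptr, // nullptr on the other ranks 
     NodeNameTotalLen = 0; 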
    // Get the name and name character count of each core's node 
    MPI::Get_processor_name(NodeName, NodeNameLen); 

    // Prepare a vector for character counts of node names 
    if (Rank == MASTER) 
     NodeNameCountVect = new int [Size]; 

    // Gather node name lengths to master to prepare c-array 
    MPI::COMM_WORLD.Gather(&NodeNameLen, 1, MPI::INT, NodeNameCountVect, 1, MPI::INT, MASTER); 

    if (Rank == MASTER){ 
     // Need character count information for navigating node name c-array 
     NodeNameOffsetVect = new int [Size]; 
     NodeNameOffsetVect[0] = 0; 
     NodeNameTotalLen = NodeNameCountVect[0]; 

     // build offset vector and total char count for all node names 
     for (int i = 1 ; i < Size ; ++i){ 
      NodeNameOffsetVect[i] = NodeNameCountVect[i-1] + NodeNameOffsetVect[i-1]; 
      NodeNameTotalLen += NodeNameCountVect[i]; 
     } 
     // char-array for all node names 
     NodeNameList = new char [NodeNameTotalLen]; 
    } 

    // Gatherv node names to char-array in master 
    MPI::COMM_WORLD.Gatherv(NodeName, NodeNameLen, MPI::CHAR, NodeNameList, NodeNameCountVect, NodeNameOffsetVect, MPI::CHAR, MASTER); 

    string *FullStrList, *NodeStrList; 
    // Each core keeps its node's name in a str for later comparison 
    stringstream ss; 
    ss << NodeName; 
    ss >> NodeNameStr; 

    delete[] NodeName; // node name in str, so delete c-array 

    int *NodeListLenVect = nullptr, NumUniqueNodes = 0, NodeListCharLen = 0; 
    string NodeListStr; 

    if (Rank == MASTER){ 
     /* 
     * Need to prepare a list of all unique node names, so first 
     * need all node names (incl duplicates) as strings, then 
     * can make a list of all unique node names. 
     */ 
     FullStrList = new string [Size]; // full list of node names, each will be checked 
     NodeStrList = new string [Size]; // list of unique node names, used for checking above list 
     // i loops over node names, j loops over characters for each node name. 
     for (int i = 0 ; i < Size ; ++i){ 
      stringstream ss; 
      for (int j = 0 ; j < NodeNameCountVect[i] ; ++j) 
       ss << NodeNameList[NodeNameOffsetVect[i] + j]; // each char into the stringstream 
      ss >> FullStrList[i]; // stringstream into string for each node name 
      ss.str(""); // This and below clear the contents of the stringstream, 
      ss.clear(); // since the >> operator doesn't clear as it extracts 
      //cout << FullStrList[i] << endl; // for testing 
     } 
     delete[] NodeNameList; // master is done with full c-array 
     bool IsUnique; // flag for breaking from for loop 
     stringstream ss; // used for a full c-array of unique node names 
     for (int i = 0 ; i < Size ; ++i){ // Loop over EVERY name 
      IsUnique = true; 
      for (int j = 0 ; j < NumUniqueNodes ; ++j) 
       if (FullStrList[i].compare(NodeStrList[j]) == 0){ // check against list of uniques 
        IsUnique = false; 
        break; 
       } 
      if (IsUnique){ 
       NodeStrList[NumUniqueNodes] = FullStrList[i]; // add unique names so others can be checked against them 
       ss << NodeStrList[NumUniqueNodes].c_str(); // build up a string of all unique names back-to-back 
       ++NumUniqueNodes; // keep a tally of number of unique nodes 
      } 
     } 
     ss >> NodeListStr; // make a string of all unique node names 
     NodeListCharLen = NodeListStr.size(); // char length of all unique node names 
     NodeListLenVect = new int [NumUniqueNodes]; // list of unique node name lengths 
     /* 
     * Because Bcast simply duplicates the buffer of the Bcaster to all cores, 
     * the buffer needs to be a char* so that the other cores can have a similar 
     * buffer prepared to receive. This wouldn't work if we passed string.c_str() 
     * as the buffer, because the receiving cores don't have string.c_str() to 
     * receive into, and even if they did, c_str() is a method and can't be used 
     * that way. 
     */ 
     // No separate allocation is needed: the root only reads its buffer in Bcast, 
     // and NodeListStr stays alive until this function returns. 
     NodeNameList = const_cast<char*>(NodeListStr.c_str()); // c_str() returns const char*, so need to recast 
     for (int i = 0 ; i < NumUniqueNodes ; ++i) // fill list of unique node name char lengths 
      NodeListLenVect[i] = NodeStrList[i].size(); 
     /*for (int i = 0 ; i < NumUnique ; ++i) 
      cout << UniqueNodeStrList[i] << endl; 
     MPI::COMM_WORLD.Abort(1);*/ 
     delete[] NodeStrList; // arrays allocated with new[] must be freed with delete[]; 
     delete[] FullStrList; // NodeStrList is allocated again below for every rank 
     delete[] NodeNameCountVect; 
     delete[] NodeNameOffsetVect; 
    } 
    /* 
    * Now we send the list of node names back to all cores 
    * so they can group themselves appropriately. 
    */ 

    // Bcast the number of nodes in use 
    MPI::COMM_WORLD.Bcast(&NumUniqueNodes, 1, MPI::INT, MASTER); 
    // Bcast the full length of all node names 
    MPI::COMM_WORLD.Bcast(&NodeListCharLen, 1, MPI::INT, MASTER); 

    // prepare buffers for node name Bcast's 
    if (Rank > MASTER){ 
     NodeListLenVect = new int [NumUniqueNodes]; 
     NodeNameList = new char [NodeListCharLen]; 
    } 

    // Lengths of node names for navigating c-string 
    MPI::COMM_WORLD.Bcast(NodeListLenVect, NumUniqueNodes, MPI::INT, MASTER); 
    // The actual full list of unique node names 
    MPI::COMM_WORLD.Bcast(NodeNameList, NodeListCharLen, MPI::CHAR, MASTER); 

    /* 
    * Similar to what master did before, each core (incl master) 
    * needs to build an actual list of node names as strings so they 
    * can compare the c++ way. 
    */ 
    int Offset = 0; 
    NodeStrList = new string[NumUniqueNodes]; 
    for (int i = 0 ; i < NumUniqueNodes ; ++i){ 
     stringstream ss; 
     for (int j = 0 ; j < NodeListLenVect[i] ; ++j) 
      ss << NodeNameList[Offset + j]; 
     ss >> NodeStrList[i]; 
     ss.str(""); 
     ss.clear(); 
     Offset += NodeListLenVect[i]; 
     //cout << FullStrList[i] << endl; 
    } 
    // Now since everyone has the same list, just check your node and find your group. 
    int CommGroup = -1; 
    for (int i = 0 ; i < NumUniqueNodes ; ++i) 
     if (NodeNameStr.compare(NodeStrList[i]) == 0){ 
      CommGroup = i; 
      break; 
     } 
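    delete[] NodeListLenVect; // allocated with new[] on every rank 
    delete[] NodeStrList; 
    if (Rank > MASTER) 
     delete[] NodeNameList; // on master this points into NodeListStr 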
    // In case process fails, error prints and job aborts. 
    if (CommGroup < 0){ 
     cout << "**ERROR** Rank " << Rank << " didn't identify comm group correctly." << endl; 
     IsOk = false; 
    } 

    /* 
    * ====================================================================== 
    * The above method uses c++ strings wherever possible so that things 
    * like node name comparisons can be done the c++ way. I'm sure there's 
    * a better way to do this because that was way too many lines of code... 
    * ====================================================================== 
    */ 

    // Create node communicators 
    NodeComm = MPI::COMM_WORLD.Split(CommGroup, 0); 
    NodeSize = NodeComm.Get_size(); 
    NodeRank = NodeComm.Get_rank(); 

    // Group for master communicator 
    int MasterGroup; 
    if (NodeRank == MASTER) 
     MasterGroup = 0; 
    else 
     MasterGroup = MPI_UNDEFINED; 

    // Create master communicator 
    MasterComm = MPI::COMM_WORLD.Split(MasterGroup, 0); 
    MasterRank = -1; 
    MasterSize = -1; 
    if (MasterComm != MPI::COMM_NULL){ 
     MasterRank = MasterComm.Get_rank(); 
     MasterSize = MasterComm.Get_size(); 
    } 

    MPI::COMM_WORLD.Bcast(&MasterSize, 1, MPI::INT, MASTER); 
    NodeComm.Bcast(&MasterRank, 1, MPI::INT, MASTER); 

    return IsOk; 
}
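Most of the string bookkeeping above reduces to giving every distinct processor name a small integer color and passing that color to Split. A `std::map` does this directly, as in the sketch below; `nodeColors` is a hypothetical helper, and it assumes the names have already been gathered to every rank (for example with an Allgather of fixed-size `MPI_MAX_PROCESSOR_NAME` buffers, not shown). Two further notes: with an MPI-3 library, `MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank, MPI_INFO_NULL, &nodeComm)` creates shared-memory communicators, typically one per node, without any name comparison; and the `MPI::` C++ bindings used above were deprecated in MPI-2.2 and removed in MPI-3.0, so new code should use the C API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: give each distinct node name a color 0..K-1, numbered in
// order of first appearance (the same numbering the loops above produce).
// Ranks with equal colors share a node and can be passed to Split(color, key).
std::vector<int> nodeColors(const std::vector<std::string>& names) {
    std::map<std::string, int> colorOf;
    std::vector<int> colors;
    colors.reserve(names.size());
    for (const auto& name : names) {
        // emplace inserts only if the name is new; the candidate color is the
        // number of names seen so far (evaluated before the insertion happens).
        auto result = colorOf.emplace(name, static_cast<int>(colorOf.size()));
        colors.push_back(result.first->second);
    }
    return colors;
}
```

Each rank would then call Split with `nodeColors(allNames)[rank]` as its color, which replaces the unique-name list, the two broadcasts, and the final search loop.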


2013-12-12 20:34:50 twilsonco
