MP/MPI IAG machines git HTML codes

general | OpenMP | OpenMPI |

general

loop unwinding, OpenMPI project
  • loop unrolling -> less loop overhead
  • Two rules for nested loops:
    • swap the loops so that the longest one is vectorized (larger vector length)
    • let the innermost loop run over the first index (in Fortran, the first index is the one that is contiguous in memory)
    The result: the longest dimension should be the first index, and in nested loops the innermost loop should run over the first index.

OpenMP

OpenMP is parallel processing with shared memory.

OpenMPI

uv100 | alphacrucis: connect; usage; queuing jobs; submission script; resources requested; |

uv100

64 cores and 512 GB RAM

alphacrucis

connect

As with any other Linux machine, you can connect to alphacrucis with either of the following two commands:
$ ssh username@alphacrucis.iag.usp.br
$ ssh username@10.180.0.63
For security reasons, the machine does not allow connections from outside IAG. Therefore, if you are e.g. at home, you have to connect to gina.iag.usp.br first and then execute one of the previous commands.

Before you start working on alphacrucis, check your ~/.bashrc file and set the path of the desired Fortran compiler. I suspect that, if you indicated when applying for an account that you would be using Fortran, your ~/.bashrc was filled with entries for all the available Fortran compilers; all you have to do is uncomment the lines corresponding to the desired compiler and comment out the lines corresponding to the rest.

usage

On big clusters like alphacrucis you should generally not run programs directly. All time-consuming commands should be "submitted" to the cluster system: you request some resources, and your job starts when those resources become free. This is called queuing.

You may perform some tests before continuing. For this purpose, create a file named machines that lists the nodes used by MPI. For example, the following machines file runs a program on 4 nodes, i.e. 96 cores, as each node contains 24 cores (processors).

r1i0n0
r1i0n1
r1i0n2
r1i0n3 
For a run on more nodes, simply add more lines (nodes) to the above file:
r1i0n0
r1i0n1
r1i0n2
 ...
r1i0n15
This uses an IRU (building block of nodes) of 16 nodes. We have 4 IRUs in tower 1 and 2 IRUs in tower 2. In the next IRU, the nodes start at
r1i1n0
 ...
r1i1n15
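Typing such a file by hand gets tedious for a whole IRU. A minimal sketch of generating it, assuming the node names follow the pattern r<tower>i<IRU>n<node> suggested by the examples above; the function name make_machines is hypothetical, and the actual node names on the cluster should be checked before use:

```shell
# Hypothetical helper: print the node names of one IRU, one per line.
# Assumption: names follow r<tower>i<IRU>n<node>, as in the examples above.
make_machines() {
    tower=$1
    iru=$2
    nnodes=${3:-16}        # an IRU holds 16 nodes
    n=0
    while [ "$n" -lt "$nnodes" ]; do
        echo "r${tower}i${iru}n${n}"
        n=$((n + 1))
    done
}

# The 4-node example above: r1i0n0 ... r1i0n3
make_machines 1 0 4 > machines
```

Calling make_machines 1 1 would list the full second IRU of tower 1 (r1i1n0 to r1i1n15).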

queuing jobs

You can run the parallel version simply as:
$ mpirun -np 8 ./hdustparv2.02.bc input = hdust_bestar2.02.inp
However, even for a test it is recommended that you queue the job, using the queuing system Torque/Maui. In that case you do not need the machines file mentioned above; instead, you define the number of nodes and processors you need directly in the submission script. A sample submission script is runs/hdust/sample.job (see below). You submit the job with the command:
$ qsub sample.job
In other systems (e.g. the UWO cluster) the submission occurs with the command:
$ sqsub -q mpi -n 64 -r 10h -o outfile
With qstat you can see the job ID in the first column, which will be something like [6-digit number].alphacrucis, unless defined otherwise in the #PBS-type commands at the beginning of the submission script.
$ qstat # see status of your jobs
$ showq # see all running jobs in the cluster
$ psall # see the CPUs running for you right now (IAG alias)
You can kill this job with the command:
$ qdel [job ID]
You can see the status of the cluster in ganglia. Help is available in the following wikis: Gina, emu2009 (invalid link), LAi (Laboratório de Astroinformática).

submission script

A request is submitted via a submission script (usually with an extension .job), as the following:

#PBS -S /bin/bash

#PBS -V
#PBS -N hdust_d                       # name of the program
#PBS -l nodes=128,walltime=36:00:00   # number of processors, max time
#PBS -o output_${PBS_JOBID}           # log file
#PBS -e error_${PBS_JOBID}            # error file
#PBS -m e
#PBS -M panoglou@on.br                # your email

  MASTERFILE=hdust_bestar2.02.inp     # the master file
  HDUST=./hdustparv2.02.bc            # the executable

#### NO NEED TO MODIFY BEYOND THIS POINT
NSLOTS=`cat $PBS_NODEFILE | wc -l` 

echo "---------------------------------" 
echo "Running MPI HDUST on" $NSLOTS "cores" 
echo "Executable: " $HDUST
echo "Master input file: " $MASTERFILE
echo "---------------------------------" 

cd ${PBS_O_WORKDIR}
START=$(date +%s.%N)
mpirun -n $NSLOTS --mca btl_tcp_if_exclude ib1 -machinefile $PBS_NODEFILE $HDUST file=$MASTERFILE
END=$(date +%s.%N)

DIFF=$(echo "$END - $START" | bc) 
MIN=`echo "$DIFF*0.0166667" | bc` 

echo "--------------------" 
echo "Finished MPI HDUST  " 
echo "Execution time: $MIN minutes ($DIFF seconds)"
echo "--------------------" 
The lines starting with #PBS are directives to the cluster (see relevant wiki).

You can define the working directory; if omitted, the default is where we are when running qsub. The working directory is where qsub searches for the executables and input files.

#PBS -d /sto/home/despo/distribution_current/runs/hdust/

resources requested

The most important is the directive for the cluster resources requested.
#PBS -l nodes=128,walltime=36:00:00
With the above line, we request the use of 128 nodes for 36 hours. The walltime matters, because your job will be killed after 36 hours: if your program has not completed by then, it will be interrupted. In general, the more nodes you use, the less time the run takes. It is better to request a longer time than you expect, since otherwise, if the execution takes longer than you thought, you will have to resubmit with a longer walltime. On the other hand, if you request many nodes for a long time, your job may wait longer in the queue before it starts, and once it runs it ties up resources that other people might need.

A rough estimate of the time to request can be obtained as follows. Say we request Nc processors on the cluster. Check how long the program runs on your PC using Np threads, say tp. Assuming that the processors of the cluster and of your PC have similar capabilities, and that the case you want to run on the cluster is similar to the case you ran on your PC, the time to request should be of the order tc = tp Np / Nc, since the same work is spread over Nc processors instead of Np threads. This holds provided that the main part of the code can be executed in parallel (e.g. when you calculate the trajectories of photons: each trajectory can be calculated in parallel with every other one, as long as the trajectories do not affect one another).

#PBS -l nodes=128:ppn=2     # number of nodes : number of CPUs per node (total CPUs=nodes*ppn)
#PBS -l walltime=36:00:00   # max time

To get an idea, a run of HDUST with the use of 48 processors takes ~1-2 hours.
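
The rough estimate of the time to request can be sketched in a few lines of shell. This assumes ideal scaling, i.e. the same work spread over Nc cluster processors instead of Np PC threads takes tp Np / Nc; the function name and the numbers are purely illustrative:

```shell
# Hypothetical helper: rough walltime estimate, assuming ideal parallel scaling.
#   $1 = tp, run time on the PC (hours)
#   $2 = Np, number of threads used on the PC
#   $3 = Nc, number of processors requested on the cluster
estimate_walltime() {
    awk -v tp="$1" -v np="$2" -v nc="$3" 'BEGIN { printf "%.2f\n", tp * np / nc }'
}

# Illustrative numbers: 24 hours on 4 threads, 48 processors requested
estimate_walltime 24 4 48    # prints 2.00
```

Since real codes rarely scale ideally, the walltime actually requested should be somewhat longer than this estimate.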

You might also run a serial program on the cluster. In this case you should set the number of nodes to 1, and preferably set the walltime to the maximum allowed.

preliminaries: preparation; terminology; general information; | creation of a repository: in bitbucket; local repository; multiple repositories; | commits: commit ID; selection of files; comments; | branches: branching; versions; | replication: evaluation of methods; add; clone; example; |

preliminaries

preparation

To install git on a Linux machine (the package was called git-core in older distributions):
$ sudo apt-get install git
With the following commands some info will be added to your git configuration file (~/.gitconfig):
$ git config --global user.name "Despo Panoglou"
$ git config --global user.email "panoglou@on.br"
You can override these settings for a specific project by running the same commands without the --global option. The changes will then be recorded in .git/config, where .git is a directory located in the project's folder.

With

$ man git config
you access a manual with all the options.

terminology

  • We modify our code, add/delete files, change the subdirectory structure etc. From time to time (usually after we are sure that the changes we made work), we want to register those changes; then we make a commit. Finally, we push those registered changes to the main repository. Every time we push, we send all the commits we made since we last pushed.
  • When we have two different versions or branches of a project, we often want to combine the two into one. Then we say we merge.

    A case in which we might need to merge is the following: Person A makes some changes in a code, person B makes some other changes. So now we want everything merged together in a single code.

    In some cases, there might exist some conflicts, e.g. if a specific piece of one file was modified by both people. Then git could not possibly know which change to select and put forward. In those cases there will be a warning message, and the person who attempts to merge the two versions has to select one of the two for this exact point of the code.

    In some other cases the merge succeeds but the project does not work well, and we cannot know this until we run the executable. Those cases are far trickier, but they are also the ones that programmers encounter very often, even when they work alone on a code: you make a change and then everything goes wrong! Git cannot possibly foresee this kind of problem, and therefore after every merge you have to attempt a run and check the result.

  • Branches are the different routes an initial version of a project might take. Whenever you create a branch, it is identical to the parent branch (meaning the branch that was being used at the time of creation of the new branch), i.e. whatever exists in the parent branch will be present in the new branch. This situation occurs when we want to start from a point where our code works in a stable way, and we want to test or experiment on something, without harming the parent initial version.

general information

To check the commit history of the project, including the commit ID, the name of the person who made each commit, the date and the commit message, type git log. This prints the commit history, newest first; by pressing "space" you page down towards the beginning of the history. Here is an example showing the last three commits:
$ git log
commit 1710b030a285861f4ec4a8b2f487fef5599c2f08
Author: Despo Panoglou <panoglou@on.br>
Date:   Mon Feb 10 18:05:15 2014 +0200

    updated the HDUST manual

     * added a section of the list of runs, describing the sample master file
       step by step
     * added a section on the SIMULATION section, with information of the
       different modes (STEP's)
     * some additional minor changes here and there

commit 633c60fd49c84c4e069b87eca3e925743e4356b7
Author: Despo Panoglou <panoglou@on.br>
Date:   Mon Feb 10 01:23:32 2014 +0200

    added a first manual for HDUST

commit cb9e1f9b2409e641d4a7c571230e0f33c5b8fa47
Author: Dan Moser <dmfaes@gmail.com>
Date:   Thu Feb 6 17:39:32 2014 -0200

    Primeiro commit de Moser
If you want to see the changes between my last commit and Daniel's commit (see notes on commit ID), type:
$ git diff 1710b030a2..cb9e1f9b

When the changes are made in text files, they are easy to compare, whereas for binary files (such as pdf) the diff is not informative (at least for me). That's why git is mainly used for source code.

Other commands which provide information:
$ git log --stat   # shows the commit history (lines changed in each file)
$ git log foo.txt  # shows the commit history of file "foo.txt"
$ git status       # shows whether there are changes to be committed
$ git diff         # shows the changes not yet staged (the differences from the last commit, if nothing has been staged)

creation of a repository

in bitbucket

To create a new repository in bitbucket:
  1. Log in to bitbucket.
  2. At the top of the page, click on Create (a new repository).
  3. See here for how to replicate and process a project at this new repository.
Please do NOT do anything in bitbucket other than viewing. Do NOT upload to or download from bitbucket through the web page, and in general do NOT use bitbucket's visual interface. Use only the command line. When our own repository is ready, we will have to do everything from the command line anyway, so you had better learn the real commands and NOT bitbucket's interface.

What you would do with the upload button of bitbucket, you should do with add+commit+push. What you would do with the download button of bitbucket, you should do with git pull. It will be quicker in the first place (ignoring the fact that you will have to enter your password). To do: add information on how to save the password, so as to avoid this delay.

local repository

To create a new repository in your own computer:
$ REPOS=/jungle/Backup/repos
$ SRC=/home/despo/Documents/programs/tests
$ SRCGIT=tests.git
$ cd $REPOS
$ git init --bare $SRCGIT
$ cd $SRC
$ git init
$ git add .
$ git commit -m "my first comment: commit" .
$ git remote add origin $REPOS/$SRCGIT
$ git push -u origin master

multiple repositories

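A working directory can push to more than one repository, e.g. to bitbucket (origin) and to a local bare repository:
$ cd $SRC
$ git init
$ git remote add origin https://desprh@bitbucket.org/git_learn/git_tut.git
$ git add .
$ git commit -m "creation of repo"
$ git push -u origin --all
$ mkdir -p $REPOS
$ cd $REPOS
$ git init --bare git_tut.git
$ cd $SRC   # remotes are added from inside the working repository
$ git remote add bitbucket /jungle/Backup/repos/git_tut.git   # second remote; despite its name, it points to the local bare repository
$ git push bitbucket master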
    

commits

commit ID

The full ID of a commit is a 40-character string. You can refer to a commit by the first few characters of its ID, as long as that prefix is unique within the repository (git needs at least 4 characters; 6 or 7 are usually enough).
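
A minimal sketch of full versus abbreviated IDs, run in a throwaway repository so that nothing real is touched. The file name, the identity and the commit message are placeholders:

```shell
# Throwaway repository, created only for this illustration.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repository
git config user.email "user@example.com"
echo "hello" > foo.txt
git add foo.txt
git commit -q -m "first commit"

full=$(git rev-parse HEAD)             # the full 40-character ID
short=$(git rev-parse --short HEAD)    # an abbreviated, still unique, ID
echo "$full"
echo "$short"

# The abbreviation resolves to the same commit as the full ID
git rev-parse "$short"
```

The last command prints the full ID again, which shows that git accepts the abbreviation wherever a commit ID is expected.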

selection of files

After you make some changes in any of the files within the project directory, you can give the following commands:
$ git add .
$ git commit .
The dot (.) at the end of both commands means add+commit all changes found in the current directory. If you want to add+commit only the changes in file foo.txt, rather than the changes in all files, give the commands as:
$ git add foo.txt
$ git commit foo.txt

comments

Let's explore the command:
$ git commit -m "first commit" .
If you omit the -m "message" option, the commit command opens a text editor in which you write the message. Instead of the short form used on the command line, you can then give a short summary line (up to about 50 characters), leave an empty line, and finally write a more detailed description of the changes incorporated in this commit (as in the first commit message of the sample shown in § general information).
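
A message consisting of a summary line, an empty line and a detailed description can also be produced without an editor: when -m is given more than once, git joins the messages as separate paragraphs. A minimal sketch in a throwaway repository, with placeholder file name, identity and messages:

```shell
# Throwaway repository, created only for this illustration.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repository
git config user.email "user@example.com"
echo "draft" > manual.txt
git add manual.txt

# First -m: summary line; second -m: detailed description (separate paragraph)
git commit -q -m "added a first manual" -m "* describes the sample master file step by step"

git log -1 --format=%s    # the summary line only
git log -1 --format=%b    # the detailed description only
```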

branches

branching

Imagine that you have a 1D code and are about to make changes in order to add a 2D option.
$ git branch          # list existing branches (* points to current/active branch)
  master
* onedim
$ git branch twodim   # create a new branch initially identical to onedim
$ git checkout twodim # switch to (enter) the branch "twodim"
$ git branch          # list existing branches (* points to current/active branch)
  master
  onedim
* twodim
$ vi Makefile         # modify some file(s)
If you now go back to the master branch or the onedim branch, with
$ git checkout {master|onedim}
you will see that none of the changes you made in Makefile can be found there. To add to the original repository the branch you created in your local directory:
$ git push --set-upstream origin twodim
Your current branch is twodim. The command
$ git merge onedim
merges onedim into the current branch (twodim). To merge in the opposite direction, i.e. twodim into onedim, first run git checkout onedim and then git merge twodim.
If there is any conflict in merging the two branches, git will print a relevant message. You can see the conflicts with:
$ git diff
Remember that the same command can be given before any commit, in order to see what there is to be committed, i.e. the differences from the last commit (in the current branch; see § general information).

If you want to delete a branch:

$ git branch -d onedim   # deletes the branch only if it has been fully merged
$ git branch -D onedim   # forces the deletion, even if the branch has unmerged changes
If you get stuck with the conflicts and want to start over:
$ git reset --hard HEAD      # in case you have not yet committed the merge
$ git reset --hard ORIG_HEAD # in case you have already committed the merge
You can check the differences between two branches with
$ git diff onedim..twodim
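
The branching steps above can be tried safely in a throwaway repository. The sketch below uses a placeholder file and identity; note that git merge brings the named branch into the branch that is currently checked out:

```shell
# Throwaway repository, created only for this illustration.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/onedim    # name the initial branch onedim, as in the text
git config user.name "Example User"        # placeholder identity, local to this repository
git config user.email "user@example.com"

echo "1D solver" > code.txt
git add code.txt
git commit -q -m "1D version"

git branch twodim                 # new branch, initially identical to onedim
git checkout -q twodim
echo "2D option" >> code.txt
git commit -q -m "added 2D option" code.txt

git checkout -q onedim            # back on onedim: the 2D line is not here
cat code.txt
git merge -q twodim               # merge twodim INTO the current branch (onedim)
cat code.txt                      # now the 2D line is present
git branch -d twodim              # allowed, because twodim is fully merged
```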

versions

Let's say we have a big project that has been under development for years, and there is more than one version of it, say version 1 and version 2, located in directories Dv1 and Dv2, respectively. The short shell script below creates a git repository that includes the two versions (if you copy it, make sure to correct the directory names; it is recommended, though, that you type the commands one by one). If you set Dv1 and Dv2 as shown in the first two lines, the rest of the lines may be copied as they are. Check the path names; you might also want to change the path/name of the repository directory (TARGET).

Each directory may be local or remote, since we use the command rsync, which can contact remote systems via a remote shell program (e.g. ssh) or by contacting an rsync daemon directly via TCP. We will suppose that Dv1 is on alphacrucis, and that Dv2 is in my home directory:

Dv1=despo@alphacrucis.iag.usp.br:/sto/home/despo/HDUSTv1
Dv2=$HOME/Projects/HDUSTv2
TARGET=$HOME/repos/HDUSTworking
rsync -avz $Dv1/ $TARGET/   # trailing slash on the source: copy the contents of Dv1, not the directory itself
cd $TARGET
git init
git add .
git commit -m "version 1" .
The execution of the last command will output a few lines that describe what was committed.
commit 1fd89e9c014ea69982ee2f941d55066c3d58b32f
Author: Despo Panoglou <panoglou@on.br>
Date:   Thu Feb 6 15:07:20 2014 +0200

    version 1
It's your first commit, so there are no changes to existing files to report, and no further information is printed. You might also see something as simple as this.
 [master 1fd89e9] version 1
   
Which form of message you see depends on the configuration of git. In any case, the first line of the long form contains a 40-character string, which is the full ID of the commit, whereas only its first part (1fd89e9) is reported in the short version. But we don't want to have to remember this strange number, even if it is only 7 characters long. We can tag this version of the code with the following command:
$ git tag -a "HDUSTv1" 1fd89e9c014ea69982ee2f941d55066c3d58b32f
so that now we can refer to it just by HDUSTv1. Although just by tagging each version we make it possible to turn back to it at any time, for various reasons it is good that we also make a branch for version 1.
$ git branch version1
As we are still in the beginning, we are in the master branch, and by the last command we created another branch called version1.

Now we have to enter version 2 in the repository.

$ rsync -avz --delete  --exclude '*.git*' $Dv2/ $TARGET/
Note the trailing slash after the source directory: with it, rsync copies the contents of $Dv2 into $TARGET, whereas without it rsync would create a subdirectory named after $Dv2 inside $TARGET and leave the files of version 1 in place. This command substitutes everything in the current directory with the contents of our local copy of version 2, deleting everything from the old version that does not exist in the new version (except for the *.git* files; otherwise it would delete all previous information about the repository, i.e. it would forget all about version 1, including the previous logs).
$ git add .
$ git commit -m "version 2" .

[master 1cf4231] version 2
 39 files changed, 39 insertions(+), 88 deletions(-)
 mode change 100644 => 100755 README
  [. . .]
We tag this commit and branch it as in the case of version 1:
$ git tag -a "HDUSTv2" 1cf4231
$ git branch version2
Say we don't yet have a version 3 of the code. In case we do have an intermediate version in some other local directory $Dv3, we may copy it as before.
$ rsync -avz --delete  --exclude '*.git*' $Dv3/ $TARGET/
Now we can add+commit, but there's no need to tag+branch, as we're still working on it. We make some changes, add+commit and so on, until we do have a stable version 3, and that's when we tag+branch.

At any time we can switch to previously created branches, e.g.

$ git checkout version1
and then we can return to our master branch:
$ git checkout master
At any time we can see the differences between two tags, branches or commits, e.g.
$ git diff 1cf4231..1fd89e9
$ git diff version1..master
$ git diff HDUSTv2..HDUSTv1

replication

evaluation of methods

There are two ways to copy/replicate a git repository. The easiest and simplest is "cloning"; "adding" is mentioned only for completeness. I am not aware of any real differences between the two, but I consider cloning simpler for the following (minor) reasons:
  • In "adding", you have to create and enter the new directory, while in "cloning" the directory is created at the time of "cloning".
  • In "cloning" defaults are automatically assumed, and you do not have to configure anything (except if you want to change the defaults), while in "adding" you have to spend some time configuring (although not so long), after you have just added the repository.

add

  1. Go to the directory where you wish to work on the project:
    $ mkdir test
    $ cd test
    $ git init
    $ git remote add origin https://desprh@bitbucket.org/desprh/test.git
    $ git pull https://desprh@bitbucket.org/desprh/test
  2. Every time you have made some changes (e.g. create/modify/delete a file), you'd better add those changes and commit them to the current branch (master). You can add+commit as often as you want.
  3. After one or more commits, you may push to the original repository. You can push as often as you want.
    $ git push --set-upstream origin master
    After the first time you give that command (which defines the upstream, i.e. where you are pushing to by default), you can simply push with
    $ git push
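    Note: what a plain git push does when no branch is specified is controlled by the push.default setting. With the value simple, git pushes the current branch to its upstream branch, provided that both have the same name: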
    $ git config --global push.default simple

clone

To clone an existing repository (that was created e.g. in bitbucket as explained in § in bitbucket):
$ git clone https://desprh@bitbucket.org/desprh/test.git
The directory that is created is named test:
$ cd test
$ git add .
$ git commit -m "first commit" .
$ git push
After cloning, git push by default pushes to the repository you cloned from (origin), to the branch that your current branch tracks (initially master).

You go home and want to bring your copy up to date. Provided that you have cloned the repository there in the past, you receive the changes made since your last pull with:

$ git pull
Remember that pull=fetch+merge, i.e. git goes to the repository, fetches the changes, and merges them with your current version. pull always merges with your current branch.

example

The complete configuration created during cloning is shown with
$ git config -l

general | HTML | CSS | XML | XSLT | XSD | XPath | XForms | XQuery |

general

  • HTML is NOT case-sensitive: One can write <P>[...]</p>
  • XML and XHTML are case-sensitive: One CANNOT write <P>[...]</p>

HTML

  • In order to have bookmarks at certain points of a web page, you have to use the
        <a name="[...]">[...]</a>
    construct. Then you can refer to them by a link to the web page such as
        [web page]#[bookmark]
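  • include the contents of another file in an HTML page via a server-side include (SSI); shell commands are executed with the #exec directive instead
        <!--#include virtual="[file.html]" -->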
  • If you want to pass data from a form to a script without showing it to the user, use hidden elements (they are not displayed, but the user can still alter them, so the script should not trust their values):
        <input type='hidden' [...]/>
  • A PHP code piece can be included in an HTML document with the following tag:
        <?php echo '<p>Hello World</p>'; ?>

CSS

CSS: Cascading Style Sheets. Can be used to describe a form's appearance.

If you want a new style, you can create it based on an existing one, defining it in the head part (inside a <style> element) as

[oldstyle].[newstyle] {[changes in comparison to [oldstyle]]} 
The new style can be called as
<[oldstyle] class="[newstyle]">

XML

Check out: XML::Synchrotron, XML developer toolbar
  • A root element is OBLIGATORY!
  • tag names:
    • They have to start with letters or "_".
    • After the first character, numbers, hyphens and periods are allowed.
    • They cannot start with "xml", where x,m,l are in lower- or upper-case.
  • Attributes:
    • They have to be followed by "=" plus a value enclosed in quotes.
    • Their names have to follow the same rules as tag names.
  • The first line of an XML document should be the XML declaration (optional in XML 1.0, but if present it must come first), e.g.
    <?xml version="1.0" encoding="utf-8" standalone='yes'?>
    If standalone='no' is given, then the XML document needs also a DOCTYPE declaration to be validated in W3C.
  • To convert a RELAX NG compact schema (rnc) to an XSD schema with trang:
    trang -I rnc -O xsd -i [input parameters] -o [output parameters] \
    	[input file] [output file]

XSLT

XSLT is a declarative programming language. Often the output is XML or HTML, but in general XSLT can be used to produce arbitrary output from any XML source. E.g. it can be used to restructure an XML document to conform to another schema.

To process an XML file with an XSLT file and output an HTML file, one has to install saxonb and type:

saxonb-xslt -s:myCloud.in.xml -xsl:nstan.xslt -o:myCloud.html
saxonb-xslt -s:styles/videos.xml -xsl:styles/videos.xslt -o:videos.html
PS: Omit the -o option if you want to print to STDOUT.

Data given in a form by a user can be validated against an XML schema. During this procedure, it can be made possible to disable some of the data contained in the XML file, to output values calculated from the form data etc.

XSD

  • To validate an XML document against an XSD schema:
    xmllint --schema styles/list.xsd --postvalid --noout videos.xml
  • In a sequence you can mix elements and groups.
  • To include an external schema with declarations, include the line
    <include schemaLocation='general.xsd'/>
  • According to the MSDN Data Development Center, "the max occurs of all the particles in the particles of an all group must be 0 or 1", so it cannot be given the "unbounded" value. Therefore the all group has to be turned into a sequence if you want to attach the maxOccurs='unbounded' attribute to an inner element.
  • In case you need an environment with the properties of a sequence, but you don't need the inner elements to be in a specific order, you have to do it indirectly (a crude workaround!), as explained in the ZVON tutorials.
parent element       allowable child elements
element              annotation, simpleType, complexType, unique, key, keyref
sequence             annotation, element, group, choice, sequence, any

element attribute    what it means
default              this value is implied when the element is not included in the XML file, so that the validation with the XSD does not fail

XPath

  • The XML Path language (XPath) allows the developer to select specific parts of an XML document. It gives the elements' path tree.
  • XPath was designed specifically for being used in combination with XSLT and XPointer (?). Lately also XForms makes use of XPath.

XForms

XForms (-> JS, AJAX) can be viewed via:
  • native browser implementation (in firefox)
  • browser plug-ins (in IE)
  • JS implementations
Pure server implementations have been developed in order to use XForms without the need of a browser:
  • They translate XForms into a plain HTML file, and keep all the logic in the server side.
  • They are relatively simple to implement.
  • Each user action requires full client/server interaction.

XQuery

  • XQuery can be used together with SQL (Structured Query Language).
  • XQuery (version 1.0) cannot modify or delete data in an XML file, nor can it add new data; updates require an extension such as the XQuery Update Facility.
  • In SQL Server 2005, the XML DML (Data Modification Language) can be used; it provides the following functions: delete, insert, replace value of.

mamoj: v0; |

mamoj

v0

species.in.save is the standard species.in file. We included SiO* so the elemental abundances of Si and O are slightly different from the standard ones. If the line of SiO* is removed the elemental abundances become the standard ones.

chemistry.in.save is the standard chemistry file, after CO photodissociation was crudely included, as well as three SiO* reactions (actually copied from the corresponding reactions for H2O*).

input_mhd.in.save is the standard file that is used for the steady state runs. After the steady state run has finished, you have to copy species.out to species.in. The first line of species.out contains the final temperature and the final drift speed of the steady state run. You have to set these values in input_mhd.in, and after changing the shock type from S to C, you can run the program.

wind.in.save includes the parameters for the standard run at 1e-6 solar masses per year and anchor point at 1 AU.

The executable is run with the command:
./mamoj
Changes that (I believe) have to be made in the code: you will indeed need to make some changes in the source, and one that definitely needs to be made is in subroutine GET_MTRCS (wind.f90). In particular, the parameters that are declared at the beginning and involve SQRT's have to be changed.