per-tool job resource defaults

per-tool job resource defaults

Stephan Oepen
dear colleagues,

at the university of oslo, we develop a galaxy-based portal for
natural language processing (LAP: Language Analysis Portal).  jobs are
submitted to a compute cluster via DRMAA and SLURM.  current
development is against the galaxy release of march 2015.

i am wondering about fine-grained control of job resources.  our goal
is that most users need not look past the ‘Use default job resource
parameters’ toggle in the job configuration dialogue.

as i understand it, we can populate the ‘nativeSpecification’
parameter in ‘job_conf.xml’ with SLURM-specific command-line options
to set defaults, for example the project, maximum run-time, number of
cores, memory per core, and so on.  i assume these defaults will be
combined with, and overridden by, ‘custom’ job resource parameters,
in case any are specified in the job configuration dialogue?
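
for concreteness, this is roughly the kind of default destination i
have in mind; the runner id and SLURM option values below are purely
illustrative:

  <destinations default="slurm_default">
    <destination id="slurm_default" runner="drmaa">
      <!-- illustrative defaults: one core, four gigabytes per core, four hours -->
      <param id="nativeSpecification">--ntasks=1 --mem-per-cpu=4G --time=04:00:00</param>
    </destination>
  </destinations>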

i tried to track the flow of information from
‘lib/galaxy/jobs/runners/drmaa.py’ via
‘scripts/drmaa_external_runner.py’ into the drmaa-python egg, but i
could not easily work out where the merging of ‘nativeSpecification’
and custom resource parameters happens; presumably at about the same
time as the actual job script is created for submission to SLURM?
could someone point me in the right direction here?

more importantly, maybe: we would like to establish per-tool resource
defaults.  for example, some of our tools require substantially more
memory per core than others.  i cannot easily find a way of
associating resource defaults with individual tools.  i looked at the
tool configuration syntax, ‘job_conf.xml.sample_advanced’, and
‘job_resource_params_conf.xml.sample’, as well as at the following
documentation pages:

  https://wiki.galaxyproject.org/Admin/Config/Jobs
  https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster

i am hoping i am overlooking something :-).  is there a way to define
job resource defaults on a per-tool basis?

with warmest thanks in advance, oe

Re: per-tool job resource defaults

Gildas Le Corguillé

Hi Stephan,


I will just quickly answer your last question, because I'm not sure I understand the first part of your message (or have the time to :P).

> i am hoping i am overlooking something :-).  is there a way to define
> job resource defaults on a per-tool basis?

Perhaps I misunderstood your message entirely, but:


In your tool wrapper, you can use "\${GALAXY_SLOTS:-8}" to set the resources dynamically according to the settings in job_conf.xml.
Here, by default, the job will use 8 CPUs (personally, I find that a trap when the administrator, i.e. me, misses this default value; I prefer to set the default to 1).

<tool id="my_amazing_wrapper" name="My Amazing" >

<command>

     my_amazing_tool -query "$query" […] -num_threads "\${GALAXY_SLOTS:-8}" […]

</command>



In your job_conf.xml, you can assign a destination per tool. That way, you can specify the number of CPUs/slots, the memory needed, the queue, the nodes, ...

<destinations default="sge_default">
    <destination id="thread4-mem_free10" runner="sge">
        <param id="nativeSpecification">-V -w n -q galaxy.q -R y -pe thread 4 -l mem_free=10G</param>
    </destination>
</destinations>

<tools>
    <tool id="my_amazing_wrapper" destination="thread4-mem_free10"/>
</tools>



I hope this helps.

Cheers



Gildas

-----------------------------------------------------------------
Gildas Le Corguillé - Bioinformatician/Bioanalyste

Platform ABiMS (Analyses and Bioinformatics for Marine Science)
http://abims.sb-roscoff.fr

Member of the Workflow4Metabolomics project
http://workflow4metabolomics.org

Station Biologique de Roscoff - UPMC/CNRS - FR2424
Place Georges Teissier 29680 Roscoff FRANCE
tel: +33 2 98 29 23 81
------------------------------------------------------------------

Re: per-tool job resource defaults

Stephan Oepen
many thanks for taking the time to answer my query, gildas!

> In your job_conf.xml, you can assign a destination per tool.

i had realized that much (sending some of our tools to SLURM, running
others on the local node), but i had failed to realize that one can of
course have /multiple/ SLURM destinations, which all send to the same
cluster but differ in their default resource parameters.
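
in other words, something along the following lines, where the
destination ids, SLURM option values, and tool ids are all made up
for illustration:

  <destinations default="slurm_small">
    <destination id="slurm_small" runner="drmaa">
      <param id="nativeSpecification">--ntasks=1 --mem-per-cpu=2G --time=01:00:00</param>
    </destination>
    <destination id="slurm_large" runner="drmaa">
      <param id="nativeSpecification">--ntasks=4 --mem-per-cpu=8G --time=12:00:00</param>
    </destination>
  </destinations>

  <tools>
    <tool id="our_tokenizer" destination="slurm_small"/>
    <tool id="our_parser" destination="slurm_large"/>
  </tools>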

thanks again, oe

Re: per-tool job resource defaults

Stephan Oepen
hallo again, fellow galaxy users and developers,

as an extension to my original query, i am now wondering how the
parameters in ‘job_resource_params_conf.xml’ map onto SLURM options?
for example, i assume <param ... name="processors" ...> maps onto
something like ‘--ntasks’ (or maybe ‘--ntasks-per-node’).  are the
‘name’ values in the definition of job resource parameters standard
keys defined for DRMAA, and does drmaa-python know how to map these
onto SLURM parameters?  or is there an explicit specification of that
mapping somewhere?

we have succeeded in establishing per-tool defaults by putting these
into the ‘nativeSpecification’ of multiple variants of the DRMAA
destination.  but now we would also like to customize the valid range
and initial value that are presented to users when they decide to use
the ‘custom’ job resource form in the tool configuration dialogue.  in
other words, we would like to do something like the following in
‘job_resource_params_conf.xml’:

  <param label="Memory" name="memory1" type="integer" size="2" min="1"
max="16" value="1" ... />
  <param label="Memory" name="memory4" type="integer" size="2" min="4"
max="24" value="4" ... />
  <param label="Memory" name="memory6" type="integer" size="2" min="6"
max="24" value="6" ... />

and then associate a specific memory parameter with individual tools
in ‘job_conf.xml’.  but for that to work, i would have to understand
the mapping to SLURM options and make it so that ‘memory1’ to
‘memory6’ all map to ‘--mem’ (or maybe ‘--mem-per-cpu’).
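
if i read ‘job_conf.xml.sample_advanced’ correctly, the association
itself would presumably look something like the following (untested,
and the group and tool ids are made up):

  <resources default="default">
    <group id="default"></group>
    <group id="memory_small">memory1</group>
    <group id="memory_large">memory6</group>
  </resources>

  <tools>
    <tool id="our_tokenizer" destination="slurm_small" resources="memory_small"/>
    <tool id="our_parser" destination="slurm_large" resources="memory_large"/>
  </tools>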

once i understand things better, i would of course be happy to
contribute a summary for the galaxy wiki.  as far as i can see, the
current documentation does not cover job configuration and job
resources in full detail.

with thanks in advance, oe


Re: per-tool job resource defaults

John Chilton
This mapping is not automatic - you need to write a small Python
method that takes the parameters specified by the user and maps them
to your cluster parameters. These methods are called dynamic job
destinations and are described on the wiki at:

https://wiki.galaxyproject.org/Admin/Config/Jobs#Dynamic_Destination_Mapping

If your method takes a keyword argument called "resource_params",
Galaxy will build a dictionary from the user-supplied parameters and
pass it to your function - in your case something like {"memory1":
300} - and the method should build a destination with a native
specification that uses this information.
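
A rough, untested sketch of such a method, assuming your DRMAA/SLURM
setup and the "memory1" parameter from your example (the function
name, default native specification, and destination parameters are
placeholders):

  # e.g. in a file under lib/galaxy/jobs/rules/, referenced from a
  # job_conf.xml destination with runner="dynamic" and
  # <param id="function">memory_rule</param>
  from galaxy.jobs import JobDestination

  def memory_rule(resource_params):
      # defaults, used when the user keeps the default job resource parameters
      native = "--ntasks=1 --mem-per-cpu=4G --time=04:00:00"
      # if the user filled in the custom job resource form, override the memory
      memory = resource_params.get("memory1")
      if memory:
          native = "--ntasks=1 --mem-per-cpu=%dG --time=04:00:00" % int(memory)
      return JobDestination(runner="drmaa",
                            params={"nativeSpecification": native})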

Hope this helps.

-John

