wrapper for tool with multiple outputs

Jochen
Hi,

I have a tool that produces multiple output files: a log file and two BAM
files (https://github.com/nugentechnologies/nudup).

The tool itself provides an option called --out to specify a directory path
plus a file-name prefix that is prepended to the output files:

--out /tmp/out

This will produce 3 files:

/tmp/out_dup_log.txt
/tmp/out.sorted.dedup.bam
/tmp/out.sorted.markdup.bam

So my question is: will this fixed out prefix cause problems, with later
runs of the tool overwriting the earlier output files?


Cheers Jochen



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: wrapper for tool with multiple outputs

Devon Ryan
Hi Jochen,

Don't have it use /tmp, but rather the current working directory and
then everything will work. Galaxy jobs are run in individual working
directories, so you can exploit that to ensure that files aren't
overwritten.
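
As a minimal sketch of this advice, the wrapper's command section could pass a bare prefix instead of an absolute path, so that the three files land in the job's working directory. The executable name nudup.py and the input parameter $input_bam are assumptions made for illustration; only the --out option comes from this thread, and any other options the tool may need are left out.

```xml
<!-- Sketch only: the executable name "nudup.py" and the "$input_bam"
     parameter are hypothetical and not taken from the thread. -->
<command><![CDATA[
    nudup.py --out out '$input_bam'
]]></command>
<!-- With the prefix "out", the naming scheme described in the question gives
     out_dup_log.txt, out.sorted.dedup.bam and out.sorted.markdup.bam,
     all relative to the per-job working directory. -->
```

Because the prefix is relative, two jobs running at the same time write to different directories and cannot overwrite each other's files.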

Devon
--
Devon Ryan, Ph.D.
Email: [hidden email]
Data Manager/Bioinformatician
Max Planck Institute of Immunobiology and Epigenetics
Stübeweg 51
79108 Freiburg
Germany


On Wed, Jan 4, 2017 at 12:32 PM, Jochen Bick <[hidden email]> wrote:

> so my question is if this out prefix will give me problems overwriting next
> output files coming from this tool?

Re: wrapper for tool with multiple outputs

Jochen

Thanks Devon,

I just changed it! I have another question. I used this to collect the output files:

  <outputs>
    <data format="bam" name="bam_file1" label="${tool.name} on ${on_string}: sorted.dedup.bam">
      <discover_datasets pattern="out.sorted.dedup.bam" format="bam" visible="true" /> <!-- can use ext or format attribute. -->
    </data>
    <data format="bam" name="bam_file2" label="${tool.name} on ${on_string}: sorted.markdup.bam">
      <discover_datasets pattern="out.sorted.markdup.bam" format="bam" visible="true" /> <!-- can use ext or format attribute. -->
    </data>
    <data format="txt" name="logfile" label="${tool.name} on ${on_string}: log">
      <discover_datasets pattern="out_dup_log.txt" format="txt" visible="true" />
    </data>
  </outputs>


My problem is that I first get an empty file in the history, and a few moments later I also get 3 more history items containing the actual data:

304 NuGen nudup on data 266 and data 278: sorted.markdup.bam
empty
308 NuGen nudup on data 266 and data 278: sorted.markdup.bam (None)
506.3 MB

The extra items have an additional "(None)" appended to the history name. Any ideas?

Cheers Jochen


On 04.01.2017 13:10, Devon Ryan wrote:
> Hi Jochen,
>
> Don't have it use /tmp, but rather the current working directory and
> then everything will work. Galaxy jobs are run in individual working
> directories, so you can exploit that to ensure that files aren't
> overwritten.
>
> Devon

Re: wrapper for tool with multiple outputs

Devon Ryan
Try the following:

<outputs>
  <data format="bam" name="bam_file1" label="${tool.name} on ${on_string}: sorted.dedup.bam" from_work_dir="out.sorted.dedup.bam" />
  <data format="bam" name="bam_file2" label="${tool.name} on ${on_string}: sorted.markdup.bam" from_work_dir="out.sorted.markdup.bam" />
  <data format="txt" name="logfile" label="${tool.name} on ${on_string}: log" from_work_dir="out_dup_log.txt" />
</outputs>

If you have it output to the working directory then that might solve
the problem.
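
For context, a hedged note on the extra "(None)" entries from the earlier attempt: discover_datasets is meant for collecting additional outputs whose number is not known in advance, and Galaxy names each discovered dataset using a designation captured by the pattern. The likely cause, stated here as an assumption rather than something confirmed in this thread, is that the literal file-name patterns contained no designation group. A sketch of how such a pattern is usually written, reusing the logfile output as the primary dataset:

```xml
<outputs>
  <!-- Primary output, picked up from the job working directory. -->
  <data format="txt" name="logfile" label="${tool.name} on ${on_string}: log" from_work_dir="out_dup_log.txt">
    <!-- Additional outputs: every file ending in .bam in the working directory.
         The named group "designation" supplies the text that Galaxy appends
         to the name of each discovered dataset. -->
    <discover_datasets pattern="(?P&lt;designation&gt;.+)\.bam" format="bam" visible="true" />
  </data>
</outputs>
```

For a fixed, known set of files such as the three produced here, from_work_dir remains the simpler choice.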

Devon


On Wed, Jan 4, 2017 at 1:18 PM, Jochen Bick <[hidden email]> wrote:

> my problem is that I first get an empty file in the history and after some
> moments I also get 3 more history icons containing the actual information:
>
> 304 NuGen nudup on data 266 and data 278: sorted.markdup.bam
> empty
> 308 NuGen nudup on data 266 and data 278: sorted.markdup.bam (None)
> 506.3 MB
>
> with an additional (None) in the history name? Any ideas?

Re: wrapper for tool with multiple outputs

Jochen
Perfect, this is working!

Thanks, Jochen


On 04.01.2017 13:27, Devon Ryan wrote:

> If you have it output to the working directory then that might solve
> the problem.