Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Jim Johnson-3

It is easiest to generate tools for galaxy when the applications or scripts can take arbitrarily named input files and write output to given path names.
Input and output directories are very convenient on the command line, but more of a challenge when crafting a galaxy tool.
That said, many applications require a wrapper script to work within galaxy.
Thank you for the consistent script_info[] help/usage syntax in the qiime scripts, which enabled me to generate a skeleton galaxy tool_config file for each qiime script.

I had some time last spring to work on integrating qiime into galaxy.
Unfortunately, I haven't had any time since to work on this.
I put those partial results on the Galaxy Tool Shed: http://toolshed.g2.bx.psu.edu/
There's a continuing effort at George Mason University to incorporate qiime into galaxy tools, so you may want to ask them what they need.


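I started by generating galaxy tool_config files, e.g. align_seqs.xml, by using python to get the script_info[] from each qiime script:

$ cat generate_tool_config.bash
#!/usr/bin/env bash
# Usage: generate_tool_config.bash align_seqs.py
# Capture what the script prints when run without arguments (its usage text) in align_seqs.help
python "$1" > "${1%.*}.help"
# Fill the script name into the template ('|' delimiter so paths with '/' work), then pipe it to python -i:
# the script runs with -h, and the interactive session reads the template with script_info still loaded.
sed "s|__TOOL_BINARY__|$1|" tool_template.txt | python -i "$1" -h > "${1%.*}.log"

(I'll attach tool_template.txt.)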

This generated skeleton tool_config .xml files that I could then edit as needed.
( http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax )
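tool_template.txt itself is not reproduced in this thread. The following minimal Python sketch shows the same idea of turning a script_info dictionary into a skeleton tool_config. It assumes script_info holds 'brief_description', 'script_description', 'required_options' and 'optional_options' lists of optparse options, as in the qiime scripts. The stdlib make_option stands in for qiime's own version, which adds custom types such as existing_filepath. The "*_fp becomes a data input" rule and the qiime_ id prefix are hypothetical conventions.

```python
import xml.etree.ElementTree as ET
from optparse import make_option


def param_for(opt, required):
    """Map one optparse option to a Galaxy <param> element and its command-line fragment."""
    name, flag = opt.dest, opt.get_opt_string()
    attrs = {"name": name, "label": opt.help or name}
    if opt.action in ("store_true", "store_false"):
        # Boolean flags: Galaxy substitutes truevalue/falsevalue into the command.
        attrs.update(type="boolean", truevalue=flag, falsevalue="", checked="false")
        fragment = "$" + name
    else:
        if name.endswith("_fp"):  # hypothetical rule: *_fp options are input files
            attrs["type"] = "data"
        elif opt.type in ("int", "float"):
            attrs["type"] = "integer" if opt.type == "int" else "float"
            if isinstance(opt.default, (int, float)):
                attrs["value"] = str(opt.default)
        else:
            attrs["type"] = "text"
        if not required:
            attrs["optional"] = "true"
        fragment = "%s=$%s" % (flag, name)
    return ET.Element("param", attrs), fragment


def skeleton_tool_xml(script_name, script_info):
    """Build a skeleton <tool> element from a QIIME-style script_info dict."""
    base = script_name.rsplit(".", 1)[0]
    tool = ET.Element("tool", {"id": "qiime_" + base, "name": base, "version": "0.1"})
    ET.SubElement(tool, "description").text = script_info.get("brief_description", "")
    command = ET.SubElement(tool, "command")
    inputs = ET.SubElement(tool, "inputs")
    fragments = [script_name]
    for key, required in (("required_options", True), ("optional_options", False)):
        for opt in script_info.get(key, []):
            param, fragment = param_for(opt, required)
            inputs.append(param)
            fragments.append(fragment)
    command.text = " ".join(fragments)
    ET.SubElement(tool, "outputs")  # outputs still have to be written by hand
    ET.SubElement(tool, "help").text = script_info.get("script_description", "")
    return ET.tostring(tool, encoding="unicode")
```

The skeleton is only a starting point. Optional parameters would still need Cheetah #if guards in the command text, and the outputs section must be filled in by hand. That is the tedious per-script editing Jim describes.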

I originally called all the qiime scripts through a single tool wrapper, qiime_wrapper.py.
However, if a script can be called with any input file paths, writes its results to any given file paths, and only writes to STDERR when it fails, then you can call that script directly.
  

When should you use a tool wrapper, and when should you call the qiime script directly?
  Many of the qiime scripts could probably be called directly, especially those that accept arbitrary input/output file pathnames.
  A tool wrapper is needed when inputs or outputs must be manipulated, moved, or renamed before the qiime script can use them.
  You'll also need a tool wrapper if the names or number of the output files cannot be determined from the parameter settings.
  ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files )
  If your tool relies on a file extension to determine a format, you'll have to rename the input, because Galaxy dataset pathnames all end in .dat.
  ( Galaxy dataset pathnames will look something like:  /<your_galaxy_file_path>/072/dataset_72931.dat )
  The format/type of a dataset is stored in its metadata, so the tool_config can use that information, especially if a script can take multiple alternative input formats.
  A tool wrapper can also be used to manage the stdout or stderr from a tool, since Galaxy currently interprets any output on stderr as a failure.
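Below is a minimal sketch of the stderr-handling part of such a wrapper. It assumes the only goal is to stop harmless stderr chatter (warnings, progress messages) from being read by Galaxy as a failure. The exit status decides success, and stderr is passed through as stderr only when the command fails. The function name run_tool is hypothetical, and this is not the code of qiime_wrapper.py, which is not shown in this thread.

```python
import subprocess
import sys


def run_tool(cmd):
    """Run cmd and return its exit status, hiding stderr from Galaxy unless cmd fails."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    sys.stdout.write(proc.stdout)
    if proc.returncode == 0:
        # Success: treat anything printed on stderr as ordinary output,
        # since Galaxy would otherwise flag the job as failed.
        sys.stdout.write(proc.stderr)
    else:
        # Failure: make sure Galaxy sees something on stderr.
        sys.stderr.write(proc.stderr or "command exited with status %d\n" % proc.returncode)
    return proc.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python run_tool.py <qiime_script.py> <its arguments...>
    sys.exit(run_tool(sys.argv[1:]))
```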



A couple of changes in galaxy should make some things easier than when I first attempted this:
  - galaxy now accepts dataset requests with sub directories. ( https://bitbucket.org/galaxy/galaxy-central/issue/494/support-sub-dirs-in-extra_files_path-patch )
    That means that output HTML files with links into sub directories can be left intact, with the html copied to the output dataset and the linked files to its "extra_files_path".
  - if you know the pathname of an output relative to the working directory, galaxy can copy it automatically to the output dataset using the from_work_dir attribute.
    ( see example in:  https://bitbucket.org/galaxy/galaxy-central/src/21b645303c02/tools/ngs_rna/tophat_wrapper.xml )
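The extra_files_path change can be illustrated with a small Python sketch that publishes an HTML report whose links point into sub directories. The index page becomes the output dataset, and every other file is copied into extra_files_path with its relative layout intact, so the links keep working. The function and argument names are hypothetical. In a real tool, Galaxy supplies both paths (for example $output and $output.extra_files_path in the command line).

```python
import os
import shutil


def publish_html_report(report_dir, index_name, output_dataset, extra_files_path):
    """Copy report_dir/index_name to output_dataset, and all other files,
    keeping their sub-directory layout, under extra_files_path."""
    shutil.copyfile(os.path.join(report_dir, index_name), output_dataset)
    for root, _dirs, files in os.walk(report_dir):
        rel = os.path.relpath(root, report_dir)
        dest_dir = os.path.normpath(os.path.join(extra_files_path, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            if rel == "." and name == index_name:
                continue  # the index page itself is the output dataset
            shutil.copy2(os.path.join(root, name), os.path.join(dest_dir, name))
```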

Datatypes
  You may want to create new datatypes to make it easier for the user to correctly select inputs to a tool from previous outputs. 
  For example, the qiime mapping file is a tabular file with specific requirements.  I put a 'qiimemapping' datatype in lib/galaxy/datatypes/metagenomics.py and datatypes_conf.xml
  so an input could generate a select list containing only qiimemapping datasets rather than all tabular ones. 
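The following standalone sketch shows the kind of check a 'qiimemapping' datatype's sniffer might perform. The rules encoded here are assumptions based on the QIIME 1 mapping-file format:

* the file is tab-separated;
* the header begins with #SampleID and includes BarcodeSequence and LinkerPrimerSequence columns;
* Description is the last column;
* every sample row has the same width as the header.

The actual datatype in metagenomics.py may check differently. The Galaxy Tabular base class it would subclass is left out so the code runs on its own.

```python
REQUIRED_COLUMNS = ("BarcodeSequence", "LinkerPrimerSequence")


def looks_like_qiime_mapping(lines):
    """Return True if the lines look like a QIIME mapping file (assumed rules)."""
    header = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if header is None:
            # The first non-blank line must be the #SampleID header.
            if fields[0] != "#SampleID" or fields[-1] != "Description":
                return False
            if not all(col in fields for col in REQUIRED_COLUMNS):
                return False
            header = fields
        elif line.startswith("#"):
            continue  # later '#' lines are comments
        elif len(fields) != len(header):
            return False  # every sample row must fill every column
    return header is not None
```

A sniffer like this only matters for auto-detecting uploads. The select-list filtering itself comes from declaring format="qiimemapping" on the tool's data param.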

Generating a configfile
  You can generate configfiles in the galaxy tool_config .xml file. The configfile is generated by the Cheetah interpreter, just as the command line is.
  See alpha_rarefaction.xml for an example.
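Galaxy renders configfiles with Cheetah. As a plain-Python stand-in (not Cheetah), this sketch shows the kind of text such a configfile would produce for a QIIME workflow script: one 'script:option value' line per setting, in the style of QIIME 1 parameter files. Three details are assumptions for illustration: the function name, the tab separator, and skipping empty values so that QIIME falls back to its defaults.

```python
def render_qiime_params(settings):
    """Render {(script, option): value} as 'script:option<TAB>value' lines."""
    lines = []
    for (script, option), value in sorted(settings.items()):
        if value is None or value == "":
            continue  # left blank in the tool form: let QIIME use its default
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)  # multi-select -> comma list
        lines.append("%s:%s\t%s" % (script, option, value))
    return "\n".join(lines) + "\n"
```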

The qiime_wrapper.py was patterned after mothur_wrapper.py, with some of the same wrapper params to handle output that is only determined at run time (perhaps no longer needed):
  --galaxy_datasets
         a comma-separated list of regex:output_dataset pairs; the wrapper searches the working_dir and copies the file that matches each regex to its output dataset.
         If the exact pathname is known, use the "from_work_dir" attribute instead.
  --galaxy_datasetid
         an output dataset id used to dynamically create additional new datasets at job termination
         ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files "Number of Output datasets cannot be determined until tool run")
  --galaxy_new_datasets
         a comma-separated list of regex:datatype pairs used to dynamically create additional new datasets at job termination
  --galaxy_new_files_path
         the galaxy directory for dynamically generated output datasets
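Here is a minimal sketch of how the --galaxy_datasets and --galaxy_new_datasets values could be handled. It makes three assumptions. First, each regex must match a whole filename in the working directory. Second, no regex contains a comma. Third, new datasets follow the primary_<id>_<designation>_visible_<ext> naming convention from the Multiple Output Files wiki page. Underscores are replaced in the designation because, as we understand it, Galaxy splits those filenames on '_'. Function names are hypothetical, and this is not the code of qiime_wrapper.py.

```python
import os
import re
import shutil


def parse_pairs(spec):
    """Split 'regex:target,regex:target' into [(compiled regex, target)]."""
    pairs = []
    for item in spec.split(","):
        pattern, _, target = item.rpartition(":")  # target is after the last ':'
        pairs.append((re.compile(pattern), target))
    return pairs


def copy_matches(work_dir, spec):
    """--galaxy_datasets: copy the first file whose name matches each regex to its output path."""
    names = sorted(os.listdir(work_dir))
    for regex, output_path in parse_pairs(spec):
        match = next((n for n in names if regex.fullmatch(n)), None)
        if match is not None:
            shutil.copyfile(os.path.join(work_dir, match), output_path)


def collect_new_datasets(work_dir, spec, dataset_id, new_files_path):
    """--galaxy_new_datasets: copy every match into new_files_path using the
    primary_<id>_<designation>_visible_<ext> naming convention; return the new paths."""
    created = []
    names = sorted(os.listdir(work_dir))
    for regex, ext in parse_pairs(spec):
        for name in names:
            if regex.fullmatch(name):
                designation = re.sub(r"[^A-Za-z0-9]", "-", name)  # keep '_' out of the designation
                dest = os.path.join(new_files_path,
                                    "primary_%s_%s_visible_%s" % (dataset_id, designation, ext))
                shutil.copyfile(os.path.join(work_dir, name), dest)
                created.append(dest)
    return created
```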




On 12/14/11 12:11 PM, Jose Carlos Clemente wrote:
Hi Jim,

we just had a meeting to discuss ideas for a QIIME GUI, and were wondering how far you managed to get with your plan to develop a QIIME wrapper for Galaxy. Do you have something working that we could take a look at? Were there any particular issues with the integration? We might have some bandwidth to work on this over the next year, but thought it would be good to check with you first.

Thanks,
Jose


On Mon, Feb 7, 2011 at 20:24, Rob Knight <[hidden email]> wrote:
Hi Jim,

Just to echo Greg's comments; it would be excellent to connect QIIME to Galaxy: let me know what is necessary from my end to ensure that this happens.

Thanks,
Rob

On Feb 7, 2011, at 12:30 PM, Greg Caporaso wrote:

Hi Jim,
This is great! We're very enthusiastic about hooking up QIIME with Galaxy, and definitely encourage you to work on it. It's something we've discussed doing, but no one in our lab is actively working on it. One particular use case that I've been thinking about would be to set up Galaxy/QIIME in our QIIME EC2 images so our EC2 users could access their images via the Galaxy web server. Are you planning to release all of the xml files under an open source/free distribution model?

I'm definitely interested in progress updates. Let me know if there are ways that we can help out, for example by giving you a list of what we believe are the most commonly used QIIME scripts so you can focus your efforts on those. I'm also cc'ing some of the folks in the lab here who are interested in Galaxy integration.

Greg

On Mon, Feb 7, 2011 at 10:36 AM, Jim Johnson <[hidden email]> wrote:
Greg,

Tim te Beek let me know that you have an interest in incorporating Qiime into Galaxy.   I'm currently working on that in support of a grant at the University of Minnesota.   I'll keep you apprised of my progress if you are interested.  

Thanks,

JJ

James Johnson
University of Minnesota Supercomputing Institute
117 Pleasant St. SE
Minneapolis, MN 55455
Email: [hidden email]



-------- Original Message --------
Subject: Re: [galaxy-dev] Existing efforts to convert the QIIME pipeline to Galaxy?
Date: Mon, 07 Feb 2011 11:15:38 -0600
From: Jim Johnson [hidden email]
Reply-To: [hidden email]
Organization: MSI University of Minnesota
To: Tim te Beek [hidden email]


Tim,

Sorry to hear we won't be doing this together.   I'm finding some sporadic time to work on this.  I also put my Mothur metagenomics wrapper code on  http://community.g2.bx.psu.edu/

I looked at the initial work you did, and combined that with the script design ideas I used with Mothur. I'm writing a single script to call all the Qiime scripts, and generating a tool config (.xml) for each qiime script. Since the qiime scripts all contain a rather consistent script_info dictionary in the code, I wrote a script to generate an initial tool config from that script_info. I'm now starting the tedious process of editing those for each script. I'll attach the few I've started on. If you have any users who would be able to beta test this when I have it more complete, please let me know.

Thanks,

JJ


On 2/7/11 10:15 AM, Tim te Beek wrote:
Hi Jim,

Sorry for the lack of feedback / updates on my part in converting QIIME to Galaxy. Following the initial difficulties with the conversion, the support project was put on hold pending further discussion. This morning the clients decided to forgo the Galaxy implementation, as their implementation would only ever be used by a single (technical) user, reducing the benefit of having a user friendly graphical interface.

So I'm sorry to say I won't be able to contribute to the conversion, although your wrapper script did look like a viable solution to work around the unknown output file problems. Should you be able to complete the conversion yourself and be willing to share your implementation, I would still very much like to know.

Perhaps the following could still be of some use to you: I contacted the QIIME development group about adding output file name arguments to the scripts, as can be seen here:
http://sourceforge.net/tracker/?func=detail&atid=1157167&aid=3164813&group_id=272178
Although it appears they won't be able to add output file name script parameters, they did say they're interested in Galaxy, and would be willing to help if there's a need for that.

Additionally, we at NBIC do have a team that's more experienced at using Galaxy and converting tools to work in Galaxy, and if you want I could put you in contact with them about supporting this effort. In informal discussions they did seem to look favorably upon your solution, so perhaps that could be extended into a collaboration.

Best regards,
Tim

On Fri, Jan 14, 2011 at 20:01, Tim te Beek <[hidden email]> wrote:
Ah ok, that's still fine. Thanks for the archive! I'll see what I can learn from how you've handled running commands and handling input & output for the non-trivial scripts, and apply that to the QIIME scripts.
If you wish, I could get back to you if there are any significant new developments on my part in moving the QIIME scripts to Galaxy, other than the Trac / SVN changes.

Have a good weekend!
Tim


On Fri, Jan 14, 2011 at 19:12, Jim Johnson <[hidden email]> wrote:
I should probably be clearer: mothur and qiime overlap in functionality and in the algorithms they use.
They do use different data formats, though.



On 1/14/11 12:02 PM, Tim te Beek wrote:
Hi Jim,

If you could send me the Mothur package, that would be great. If there's indeed that much overlap, it could be a great way to kick-start development of the QIIME pipeline conversion. I can look into it starting Monday, with hopefully nothing else to distract me the whole week.

Best regards,
Tim


On Fri, Jan 14, 2011 at 18:49, Jim Johnson <[hidden email]> wrote:
Tim,

I just started looking at qiime myself.   We are hoping to release a galaxy server to the University of Minnesota researchers in the next month,
so there has been much to do in preparation.   I'll take a look at what you've done.  Thanks.

I finished a galaxy interface to the Mothur metagenomics package http://www.mothur.org/wiki/Main_Page.   It appears to have a lot of overlap with Qiime.  
I generated a number of datatypes,  and made use of many of the  tool config concepts that are available in galaxy.   
If you are interested, I can send that to you: mothur.tar.gz (55108 bytes). I hope to submit it to http://community.g2.bx.psu.edu/ in the next month.

Regards,

JJ


On 1/14/11 9:52 AM, Tim te Beek wrote:
Hi Jim,

Sorry to only respond just now: I've been tied up in other projects, so have not been able to get any work done on this up to today.

What I currently have is not much, since my progress has been slowed by my unfamiliarity with both Galaxy and the QIIME tools (I'm working on this as an outsider for a small support project). But what I do have I've made available through the following Trac / SVN instance:

https://trac.nbic.nl/brs2010p26/browser/trunk/galaxy_dist/tools/qiime

These configuration files so far can run:
  • check_id_map
  • split_libraries
  • pick_otus_through_otu_table
For the first two the output is correctly directed to the Galaxy history; the last one generates quite a large set of output files, which might have to be made available some other way (the documentation on this gives a few options: https://bitbucket.org/galaxy/galaxy-central/wiki/ToolsMultipleOutput).

I had some trouble working around the fact that most QIIME scripts do not allow one to specify output filenames on the command line, which seems to be a requirement for Galaxy to import files into the history pane. What I've done is write a bash file to accompany each XML file that takes as its first command line argument the command to be evaluated, and as additional arguments the output filenames expected by Galaxy. After running a tool I then move whatever output files were created to the paths expected by Galaxy. I must say I'm not (at all) sure if this current approach is something to recommend, or if better alternatives are available. Any feedback on this would be highly appreciated.
An additional source of pain was that the split_libraries script requires specific file extensions for its fasta & quality files, which required passing them separately and symbolically linking them as .fna & .qual files.
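The bash approach described above can be sketched in Python: link the inputs under the names and extensions the script expects, run it, then move its outputs to the paths Galaxy expects. The function name and the dictionary-based interface are hypothetical. The command runs inside a scratch directory so that relative names such as reads.fna resolve to the symlinks.

```python
import os
import shutil
import subprocess
import tempfile


def run_with_renamed_io(cmd, inputs, outputs):
    """Run cmd in a scratch dir where inputs are symlinked under the names it expects.

    inputs:  {name_in_scratch_dir: galaxy_input_path}, e.g. {"reads.fna": ".../dataset_1.dat"}
    outputs: {path_written_by_cmd: galaxy_output_path}
    """
    scratch = tempfile.mkdtemp()
    try:
        for link_name, galaxy_path in inputs.items():
            # Give the .dat file the extension the script insists on (.fna, .qual, ...).
            os.symlink(os.path.abspath(galaxy_path), os.path.join(scratch, link_name))
        subprocess.run(cmd, cwd=scratch, check=True)
        for produced, galaxy_path in outputs.items():
            # Move each known output to the path Galaxy expects for that dataset.
            shutil.move(os.path.join(scratch, produced), galaxy_path)
    finally:
        shutil.rmtree(scratch)  # removes the symlinks, not the files they point to
```

This only works when the output names are known in advance. When they are not, a pattern-based approach like the --galaxy_datasets option is needed instead.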

So far I've not taken the time to introduce new QIIME specific data types, but this could be something to consider given the large amount of 'tabular' input and output files.

I hope this email and the scripts linked above can still be of some use to you (or vice-versa), but in either case any feedback or collaboration would be much appreciated.

Best regards,
Tim


On Wed, Nov 24, 2010 at 15:48, Jim Johnson <[hidden email]> wrote:
> Tim,
>
> I hope to also look at incorporating QIIME into our local Galaxy instance at
> the University of Minnesota, but probably won't be able to start for a
> couple weeks.  It would be good to develop that in coordination with others.
>
> I just finished incorporating the "Mothur" metagenomics suite
> http://www.mothur.org/ (Dr. Patrick Schloss, Department of Microbiology &
> Immunology at The University of Michigan) into our Galaxy server at the
> University of Minnesota. I hope to contribute that to
> http://community.g2.bx.psu.edu/ after some testing by our researchers. If
> the Galaxy wrappings for Mothur are of any interest to you, I can send you a
> copy any time.
>
> Thanks,
>
> JJ
>
> James E Johnson
> Minnesota Supercomputing Institute
> University of Minnesota
>
>
> On Nov 23, 2010 at 07:22 AM, Tim te Beek wrote:
>
>> Hello all,
>>
>> Is anyone aware of any existing efforts to port the QIIME sequencing
>> pipeline (http://qiime.sourceforge.net/) to Galaxy? I would like to run
>> QIIME analyses through Galaxy to get better control of
>> intermediate processing steps, but before I start to convert (a subset of)
>> some 90 scripts, I'd first like to make sure this has not been done before
>> by anyone willing to share their work.
>>
>> So: has anyone converted the QIIME pipeline to Galaxy before, and would
>> they be willing to share their scripts?
>>
>> Best regards,
>> Tim
>
> _______________________________________________
> galaxy-dev mailing list
> [hidden email]
> http://lists.bx.psu.edu/listinfo/galaxy-dev
>












___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Attachment: tool_template.txt (4K)

Mattias

Thanks, Jim, for this extensive update on how to improve the Qiime wrappers.

We had been in contact about the Qiime wrappers before, and I have been using them in Galaxy on a weekly basis, so I can say they have proven to be very useful. I mainly edited the parameter and output sections of some configuration files, but all of them still use the wrapper script. The most important scripts are split_libraries and the workflow scripts (pick_otus_through_otu_table, alpha/beta diversity workflows).

I use the Qiime scripts in a Galaxy CloudMan environment adjusted to a multicore server. Setup on EC2 shouldn't be a problem, although you need to place requirement tags for all the needed dependencies at the top of the xml file.

I will have a look at the suggestions and I am willing to help to improve Qiime support for galaxy either by working on the configuration files or testing them out. So please keep me in the loop...

Greetings,

Mattias


-----Original Message-----
From: [hidden email] on behalf of [hidden email]
Sent: Sat 12/31/2011 6:00 PM
To: [hidden email]
Subject: galaxy-dev Digest, Vol 66, Issue 26
 

Message: 1
Date: Thu, 29 Dec 2011 14:02:04 -0600
From: Jim Johnson <[hidden email]>
To: Jose Carlos Clemente <[hidden email]>,
        "[hidden email]" <[hidden email]>
Cc: Greg Caporaso <[hidden email]>, Rob Knight
        <[hidden email]>, Jesse Stombaugh <[hidden email]>,
        Zuzolo Amanda <[hidden email]>, Antonio González
        <[hidden email]>, Gillevet Patrick <[hidden email]>, [hidden email],
        Daniel McDonald <[hidden email]>
Subject: Re: [galaxy-dev] Existing efforts to convert the QIIME
        pipeline to Galaxy?
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"


It is easiest to generate tools for galaxy when the applications or scripts can take arbitrarily named input files and generate output to given path names.
Input directories, output directories are very convenient on the command line, but more of a challenge when crafting a galaxy tool.
That said, many applications require a wrapper script to work with in galaxy.
Thank you for the consistent script_info[] help/usage syntax in the qiime scripts,  which enabled me to generate a skeleton galaxy tool_config file for each qiime script.

I had some time last spring to work on integrating qiime into galaxy.
Unfortunately, I haven't had any time since to work on this.
I put those partial results  on the Galaxy Tool Shed:    http://toolshed.g2.bx.psu.edu/
There's a continuing effort at George Mason University to incorporate qiime into galaxy tools, so you may want to ask them what they need.


I started by generating galaxy tool_config files, e.g. align_seqs.xml,  by using python to get the script_info[] from the qiime script:

$ cat generate_tool_config.bash
#!/usr/bin/env bash
python $1 > ${1%.*}.help
cat tool_template.txt | sed "s/__TOOL_BINARY__/${1}/" | python -i $1 -h > ${1%.*}.log

(I'll attach tool_template.txt )

This generated skeleton tool_config .xml files that I could then edit as needed.
( http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax )

I originally was calling all qiime scripts from a tool wrapper:  qiime_wrapper.py
But, if a script can be called with any input filepaths and write its results to any filepaths, and only writes to STDERR when it fails, then you could call that script directly.


When should you use a tool_wrapper or call the qiime script directly?
   Many of the qiime scripts could probably be called directly, especially if it can be called with arbitary input/output file pathnames.
   The reasons for using a tool wrapper may be if input/output needs to be manipulated, moved, renamed in order to be used by the qiime script.
   You'll also need a tool wrapper if the names or number of the output files can not be determined from the parameter settings.
   ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files )
   If your tool relies on a file ext to determine a format, you'll have to rename the input.
   ( Galaxy dataset pathnames will look something like:  /<your_galaxy_file_path>/072/dataset_72931.dat )
   The format/type of a dataset is stored in its metadata, so the tool_config can use that information, especially if a script can take muliple alternative input formats.
   A tool_wrapper can also be used to manage the stdout or stderr from a tool.   Galaxy currently interprets any output on stderr as a failure.



A couple changes in galaxy should make somethings easier than when I first attempted this:
   - galaxy now accepts dataset requests with sub directories. ( https://bitbucket.org/galaxy/galaxy-central/issue/494/support-sub-dirs-in-extra_files_path-patch )
     That means that output HTML files with links into sub directories can be left intact, with the html copied to the output dataset and the linked files to its "extra_files_path".
   - if you know the pathname of an output relative to the working directory, galaxy can copy it automatically to the output dataset using the from_work_dir attribute.
     ( see example in:  https://bitbucket.org/galaxy/galaxy-central/src/21b645303c02/tools/ngs_rna/tophat_wrapper.xml )

Datatypes
   You may want to create new datatypes to make it easier for the user to correctly select inputs to a tool from previous outputs.
   For example, the qiime mapping file is a tabular file with specific requirements.  I put a 'qiimemapping' datatype in lib/galaxy/datatypes/metagenomics.py and datatypes_conf.xml
   so an input could generate a select list containing only qiimemapping datasets rather than all tabular ones.

Generating a configfile
   You can generate configfiles in the galaxy tool_config .xml file.   The configfile is generated by the Cheetah interpreter just as the commandline is.
   see:  alpha_rarefaction.xml

The qiime_wrapper.py was patterned after the mothur_wrapper.py   with some of the same wrapper params to handle run time determined output (perhaps not needed):
   --galaxy_datasets
          a comma separated list of regex:output_dataset the wrapper searches the working_dir and copies the file that matches the regex to the outout dataset
          if the exact pathname is known, use the "from_work_dir" attribute instead
   --galaxy_datasetid
          would be an output dataset id that would be used to dynamically create additional new datasets at job termination
          ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files "Number of Output datasets cannot be determined until tool run")
   --galaxy_new_datasets
          a comma separated list of regex:datatype used to dynamically create additional new datasets at job termination
   --galaxy_new_files_path
          the galaxy dir for dynamically generated output datasets




On 12/14/11 12:11 PM, Jose Carlos Clemente wrote:

> Hi Jim,
>
> we just had a meeting to discuss ideas for a QIIME GUI, and were wondering how far did you manage to get with your plan to develop a QIIME wrapper for Galaxy. Do you have something working that we could take a look at? Were there any particular issues with the integration? We might have some bandwidth to work on this over next year, but thought it would be good to check with you first.
>
> Thanks,
> Jose
>
>
> On Mon, Feb 7, 2011 at 20:24, Rob Knight <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hi Jim,
>
>     Just to echo Greg's comments; it would be excellent to connect QIIME to Galaxy: let me know what is necessary from my end to ensure that this happens.
>
>     Thanks,
>     Rob
>
>     On Feb 7, 2011, at 12:30 PM, Greg Caporaso wrote:
>
>>     Hi Jim,
>>     This is great! We're very enthusiastic about hooking up QIIME with Galaxy, and definitely encourage you to work on it. It's something we've discussed doing, but no one in our lab is actively working on it. One particular use case that I've been thinking about would be to set up Galaxy/QIIME in our QIIME EC2 images so our EC2 users could access their images via the Galaxy web server. Are you planning to release all of the xml files under an open source/free distribution model?
>>
>>     I'm definitely interested in progress updates. Let me know if there are ways that we can help out, for example by giving you a list of what we believe are the most commonly used QIIME scripts so you can focus your efforts on those. I'm also cc'ing some of the folks in the lab here who are interested in Galaxy integration.
>>
>>     Greg
>>
>>     On Mon, Feb 7, 2011 at 10:36 AM, Jim Johnson <[hidden email] <mailto:[hidden email]>> wrote:
>>
>>         Greg,
>>
>>         Tim te Beek let me know that you have an interest in incorporating Qiime into Galaxy.   I'm currently working on that in support of a grant at the University of Minnesota.   I'll keep you apprised of my progress if you are interested.
>>
>>         Thanks,
>>
>>         JJ
>>
>>         James Johnson
>>         University of Minnesota Supercomputing Institute
>>         117 Pleasant St. SE
>>         Minneapolis, MN 55455
>>         Email: [hidden email] <mailto:[hidden email]>
>>
>>
>>
>>         -------- Original Message --------
>>         Subject: Re: [galaxy-dev] Existing efforts to convert the QIIME pipeline to Galaxy?
>>         Date: Mon, 07 Feb 2011 11:15:38 -0600
>>         From: Jim Johnson <[hidden email]> <mailto:[hidden email]>
>>         Reply-To: [hidden email] <mailto:[hidden email]>
>>         Organization: MSI University of Minnesota
>>         To: Tim te Beek <[hidden email]> <mailto:[hidden email]>
>>
>>
>>
>>         Tim,
>>
>>         Sorry to hear we won't be doing this together.   I'm finding some sporadic time to work on this.  I also put my Mothur metagenomics wrapper code on http://community.g2.bx.psu.edu/
>>
>>         I looked at the initial work you did, and combined that with the script design ideas I used with Mothur.   I'm writing a single script to call all the Qiime scripts, and generating a tool config (.xml) for each qiime script.   Since the qiime scripts all contain a rather consistent script_info dictionary in the code, I wrote a script to generate an initial  tool config from that script_info.   I'm now starting the tedious process of editing those for each script.  I'll attach the few I've started on.  If you have any users that would able to beta test this when I have this more complete, please let me know.
>>
>>         Thanks,
>>
>>         JJ
>>
>>
>>         On 2/7/11 10:15 AM, Tim te Beek wrote:
>>>         Hi Jim,
>>>
>>>         Sorry for the lack of feedback / updates on my part in converting QIIME to Galaxy. Following the initial difficulties with the conversion, the support project was put on hold pending further discussion. This morning the clients decided to forgo the Galaxy implementation, as their implementation would only ever be used by a single (technical) user, reducing the benefit of having a user friendly graphical interface.
>>>
>>>         So I'm sorry to say I won't be able to contribute to the conversion, although your wrapper script did look like a viable solution to work around the unknown output file problems. Should you be able to complete the conversion yourself and be willing to share your implementation, I would still very much like to know.
>>>
>>>         Perhaps the following could still be of some use to you: I contacted the QIIME development group about adding output file name arguments to the scripts, as can be seen here:
>>>         http://sourceforge.net/tracker/?func=detail&atid=1157167&aid=3164813&group_id=272178 <http://sourceforge.net/tracker/?func=detail&atid=1157167&aid=3164813&group_id=272178>
>>>         Although it appears they wont be able to add output file name script parameters, they did say they're interested in Galaxy, and would be willing to help if there's a need for that.
>>>
>>>         Additionally, we at NBIC do have a team that's more experienced at using Galaxy and converting tools to work in Galaxy, and if you want I could put you in contact with them about supporting this effort. In informal discussions they did seem to look favorable upon your solution, so perhaps that could be extended into a collaboration.
>>>
>>>         Best regards,
>>>         Tim
>>>
>>>         On Fri, Jan 14, 2011 at 20:01, Tim te Beek <[hidden email] <mailto:[hidden email]>> wrote:
>>>
>>>             Ah ok, that's still fine. Thanks for the archive! I'll see what I can learn from how you've handled running command and handling in- & output for the non-trivial scripts, and apply that to the QIIME scripts.
>>>             Should you wish so I could get back to you if there's any significant new developments on my part in moving the QIIME scripts to Galaxy, other than the Trac / SVN changes.
>>>
>>>             Have a good weekend!
>>>             Tim
>>>
>>>
>>>             On Fri, Jan 14, 2011 at 19:12, Jim Johnson <[hidden email] <mailto:[hidden email]>> wrote:
>>>
>>>                 I should probably be more clear, in that mothur and qiime overlap in functionality and in the algorithms they use.
>>>                 They do use different data formats.
>>>
>>>
>>>
>>>                 On 1/14/11 12:02 PM, Tim te Beek wrote:
>>>>                 Hi Jim,
>>>>
>>>>                 If you could send me the Mothur package, that would be great. If there's indeed as much overlap then it could be a great way to kick-start the QIIME pipeline conversion development. I can look into it starting monday, with hopefully nothing else to distract me the whole week.
>>>>
>>>>                 Best regards,
>>>>                 Tim
>>>>
>>>>
>>>>                 On Fri, Jan 14, 2011 at 18:49, Jim Johnson <[hidden email] <mailto:[hidden email]>> wrote:
>>>>
>>>>                     Tim,
>>>>
>>>>                     I just started looking at qiime myself.   We are hoping to release a galaxy server to the University of Minnesota researchers in the next month,
>>>>                     so there has been much to do in preparation.   I'll take a look at what you've done.  Thanks.
>>>>
>>>>                     I finished a galaxy interface to the Mothur metagenomics package http://www.mothur.org/wiki/Main_Page.   It appears to have a lot of overlap with Qiime.
>>>>                     I generated a number of datatypes,  and made use of many of the  tool config concepts that are available in galaxy.
>>>>                     If you are interested,  I can send that to you:   mothur.tar.gz  (55108 bytes).   I hope to submit it tohttp://community.g2.bx.psu.edu/ in the next month.
>>>>
>>>>                     Regards,
>>>>
>>>>                     JJ
>>>>
>>>>
>>>>                     On 1/14/11 9:52 AM, Tim te Beek wrote:
>>>>>                     Hi Jim,
>>>>>
>>>>>                     Sorry to only respond just now: I've been tied up in other projects, so have not been able to get any work done on this up to today.
>>>>>
>>>>>                     What I currently have is not much, since my progress has been slowed by my unfamiliarity with both Galaxy and the QIIME tools (I'm working on this as an outsider for a small support project). But what I do have I've made available through the following Trac/SVN instance:
>>>>>
>>>>>                         https://trac.nbic.nl/brs2010p26/browser/trunk/galaxy_dist/tools/qiime
>>>>>
>>>>>
>>>>>                     These configuration files so far can run:
>>>>>
>>>>>                       * check_id_map
>>>>>                       * split_libraries
>>>>>                       * pick_otus_through_otu_table
>>>>>
>>>>>                     For the first two, the output is correctly directed to the Galaxy history; the last one generates quite a large set of output files, which might have to be made available some other way (the documentation on this gives a few options: https://bitbucket.org/galaxy/galaxy-central/wiki/ToolsMultipleOutput).
>>>>>
>>>>>                     I had some trouble working around the fact that most QIIME scripts do not allow one to specify output filenames on the command line, which seems to be a requirement for Galaxy to import files into the history pane. What I've done is written a bash file to accompany each XML file, that expects as first command line argument the command that should be evaluated, and as additional arguments the output filenames expected by Galaxy. After running a tool I then move whatever output files were created to the paths expected by Galaxy. I must say I'm not (at all) sure if this current approach is something to recommend, or if better alternatives are available. Any feedback on this would be highly appreciated.
>>>>>                     An additional source of pain was that the split_libraries script requires specific file extensions for its fasta and quality files, which required passing them separately and symbolically linking them as .fna and .qual files.
>>>>>
>>>>>                     So far I've not taken the time to introduce new QIIME specific data types, but this could be something to consider given the large amount of 'tabular' input and output files.
>>>>>
>>>>>                     I hope this email and the scripts linked above can still be of some use to you (or vice-versa), but in either case any feedback or collaboration would be much appreciated.
>>>>>
>>>>>                     Best regards,
>>>>>                     Tim
>>>>>
>>>>>
>>>>>                     On Wed, Nov 24, 2010 at 15:48, Jim Johnson <[hidden email] <mailto:[hidden email]>> wrote:
>>>>>                     > Tim,
>>>>>                     >
>>>>>                     > I hope to also look at incorporating QIIME into our local Galaxy instance at
>>>>>                     > the University of Minnesota, but probably won't be able to start for a
>>>>>                     > couple weeks.  It would be good to develop that in coordination with others.
>>>>>                     >
>>>>>                     > I just finished incorporating "Mothur" metagenomics suite
>>>>>                     > http://www.mothur.org/ (Dr. Patrick Schloss,  Department of Microbiology &
>>>>>                     > Immunology at The University of Michigan) into our Galaxy server at the
>>>>>                     > University of Minnesota.   I hope to contribute that to
>>>>>                     > http://community.g2.bx.psu.edu/ after some testing by our researchers.  If
>>>>>                     > the Galaxy wrappings for Mothur are of any interest to you, I can send you a
>>>>>                     > copy any time.
>>>>>                     >
>>>>>                     > Thanks,
>>>>>                     >
>>>>>                     > JJ
>>>>>                     >
>>>>>                     > James E Johnson
>>>>>                     > Minnesota Supercomputing Institute
>>>>>                     > University of Minnesota
>>>>>                     >
>>>>>                     >
>>>>>                     > On Nov 23, 2010 at 07:22 AM, Tim te Beek wrote:
>>>>>                     >
>>>>>                     >> Hello all,
>>>>>                     >>
>>>>>                     >> Is anyone aware of any existing efforts to port the QIIME sequencing
>>>>>                     >> pipeline (http://qiime.sourceforge.net/) to Galaxy? I would like to run
>>>>>                     >> QIIME analyses through Galaxy to get better control of
>>>>>                     >> intermediate processing steps, but before I start to convert (a subset of)
>>>>>                     >> some 90 scripts, I'd first like to make sure this has not been done before
>>>>>                     >> by anyone willing to share their work.
>>>>>                     >>
>>>>>                     >> So: has anyone converted the QIIME pipeline to Galaxy before, and would
>>>>>                     >> they be willing to share their scripts?
>>>>>                     >>
>>>>>                     >> Best regards,
>>>>>                     >> Tim
>>>>>                     >
>>>>>                     > _______________________________________________
>>>>>                     > galaxy-dev mailing list
>>>>>                     > [hidden email] <mailto:[hidden email]>
>>>>>                     > http://lists.bx.psu.edu/listinfo/galaxy-dev
>>>>>                     >
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: tool_template.txt
URL: <http://lists.bx.psu.edu/pipermail/galaxy-dev/attachments/20111229/286f3564/attachment-0001.txt>

------------------------------

_______________________________________________
galaxy-dev mailing list
[hidden email]
http://lists.bx.psu.edu/listinfo/galaxy-dev


End of galaxy-dev Digest, Vol 66, Issue 26
******************************************


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Gillevet Patrick
In reply to this post by Jim Johnson-3
Jim et al

Amanda has most of the scripts working now and will be putting them up on the toolshed.
She will be in touch as soon as the scripts are validated a couple of times with different datasets.

cheers...
Pat



On Dec 29, 2011, at 3:02 PM, Jim Johnson wrote:


It is easiest to generate tools for galaxy when the applications or scripts can take arbitrarily named input files and generate output to given path names.  
Input and output directories are very convenient on the command line, but they are more of a challenge when crafting a galaxy tool.
That said, many applications require a wrapper script to work within galaxy.
Thank you for the consistent script_info[] help/usage syntax in the qiime scripts, which enabled me to generate a skeleton galaxy tool_config file for each qiime script.

I had some time last spring to work on integrating qiime into galaxy.
Unfortunately, I haven't had any time since to work on this.
I put those partial results on the Galaxy Tool Shed: http://toolshed.g2.bx.psu.edu/
There's a continuing effort at George Mason University to incorporate qiime into galaxy tools, so you may want to ask them what they need.


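I started by generating galaxy tool_config files, e.g. align_seqs.xml, by using python to get the script_info[] from the qiime script:

$ cat generate_tool_config.bash
#!/usr/bin/env bash
# Usage: generate_tool_config.bash <qiime_script>.py
# Save the script's help/usage output to <qiime_script>.help
python "$1" > "${1%.*}.help"
# Run the script with -h under python -i: the -h exit does not end the session,
# so the template (with the script name filled in) is then read as Python input
# in a namespace where script_info[] is already defined.
sed "s|__TOOL_BINARY__|${1}|" tool_template.txt | python -i "$1" -h > "${1%.*}.log"

(I'll attach tool_template.txt.)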

This generated skeleton tool_config .xml files that I could then edit as needed.
( http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax )
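If you would rather not start from the attached template, the same idea can be sketched in Python: read a script's script_info[] and emit a skeleton tool_config. This sketch is not tool_template.txt. It assumes script_info holds 'brief_description', 'script_description', 'required_options' and 'optional_options' entries, with the last two being lists of optparse options. The function name skeleton_tool_config is made up for illustration. As with the skeletons described above, the output still needs hand editing for outputs, formats and optional-parameter guards.

```python
# Sketch only: build a skeleton Galaxy tool_config from a QIIME-style
# script_info dict. This is not the attached tool_template.txt.
import xml.etree.ElementTree as ET
from optparse import make_option


def skeleton_tool_config(script_name, script_info):
    """Return a skeleton <tool> element, serialized to a string.

    Assumes script_info has 'brief_description', 'script_description',
    'required_options' and 'optional_options' keys, the last two being
    lists of optparse Option objects.
    """
    tool_id = script_name.rsplit(".", 1)[0]
    tool = ET.Element("tool", id=tool_id, name=tool_id, version="0.1")
    ET.SubElement(tool, "description").text = script_info["brief_description"]
    command = ET.SubElement(tool, "command")
    inputs = ET.SubElement(tool, "inputs")

    parts = [script_name]
    options = script_info["required_options"] + script_info["optional_options"]
    for opt in options:
        name = opt.dest
        flag = opt.get_opt_string()  # long form if the option has one
        label = opt.help or name
        if opt.action == "store_true":
            # A flag: the boolean param expands to the flag or to nothing.
            ET.SubElement(inputs, "param", name=name, type="boolean",
                          truevalue=flag, falsevalue="", label=label)
            parts.append("$" + name)
        else:
            # Everything else starts as a text param; optional ones still
            # need #if guards in the command and better types, by hand.
            ET.SubElement(inputs, "param", name=name, type="text", label=label)
            parts.append("%s $%s" % (flag, name))
    command.text = " ".join(parts)

    ET.SubElement(tool, "outputs")  # outputs are left for hand editing
    ET.SubElement(tool, "help").text = script_info["script_description"]
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    # Made-up script_info, only to show the shape of the output.
    demo_info = {
        "brief_description": "Align sequences",
        "script_description": "Longer description of the script.",
        "required_options": [make_option("-i", "--input_fasta_fp",
                                         help="path to the input fasta")],
        "optional_options": [make_option("-v", "--verbose",
                                         action="store_true",
                                         help="print more output")],
    }
    print(skeleton_tool_config("align_seqs.py", demo_info))
```

Inside an interactive session where a QIIME script's script_info is already defined, calling skeleton_tool_config("align_seqs.py", script_info) would return the XML as a string, ready to be written to align_seqs.xml and edited.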

I originally was calling all qiime scripts from a tool wrapper: qiime_wrapper.py
But if a script can be called with any input filepaths, writes its results to any filepaths, and writes to STDERR only when it fails, then you could call that script directly.

When should you use a tool_wrapper or call the qiime script directly?
  Many of the qiime scripts could probably be called directly, especially those that can be called with arbitrary input/output file pathnames.
  You may need a tool wrapper if inputs/outputs have to be manipulated, moved, or renamed before the qiime script can use them.
  You'll also need a tool wrapper if the names or number of the output files cannot be determined from the parameter settings.
  ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files )
  If your tool relies on a file extension to determine a format, you'll have to rename the input.
  ( Galaxy dataset pathnames will look something like:  /<your_galaxy_file_path>/072/dataset_72931.dat )
  The format/type of a dataset is stored in its metadata, so the tool_config can use that information, especially if a script can take multiple alternative input formats.
  A tool_wrapper can also be used to manage the stdout or stderr from a tool. Galaxy currently interprets any output on stderr as a failure.

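Here is a minimal sketch of such a wrapper, assuming a POSIX system where symlinks are allowed in the job working directory. The helper names link_with_extension and run_quietly are hypothetical and not taken from qiime_wrapper.py. The first helper gives a Galaxy .dat dataset the extension a script insists on, which is the split_libraries .fna/.qual problem raised earlier in the thread. The second keeps harmless stderr chatter from being read as a failure: when the command exits with status 0, it moves that output to stdout.

```python
# Hypothetical helpers for a minimal tool wrapper (POSIX symlinks assumed).
import os
import subprocess
import sys


def link_with_extension(dataset_path, ext, work_dir):
    """Symlink e.g. .../dataset_72931.dat as work_dir/dataset_72931.fna."""
    base = os.path.splitext(os.path.basename(dataset_path))[0]
    link = os.path.join(work_dir, base + ext)
    if not os.path.lexists(link):  # reuse an existing link on reruns
        os.symlink(os.path.abspath(dataset_path), link)
    return link


def run_quietly(cmd, cwd=None):
    """Run cmd (a list of strings) and return its exit status.

    Galaxy treats any stderr output as a failure, so stderr is echoed to
    stdout when the command succeeds and to stderr only when it fails.
    """
    proc = subprocess.run(cmd, cwd=cwd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    sys.stdout.write(proc.stdout)
    target = sys.stdout if proc.returncode == 0 else sys.stderr
    target.write(proc.stderr)
    return proc.returncode
```

Inside a wrapper, these helpers might be combined as sys.exit(run_quietly(["split_libraries.py", "-m", mapping_fp, "-f", fna, "-q", qual, "-o", out_dir])), where fna and qual come from link_with_extension(). The flags follow QIIME 1.x's split_libraries.py usage, so treat that line as untested.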


A couple of changes in galaxy should make some things easier than when I first attempted this:
  - galaxy now accepts dataset requests with sub directories. ( https://bitbucket.org/galaxy/galaxy-central/issue/494/support-sub-dirs-in-extra_files_path-patch )
    That means that output HTML files with links into sub directories can be left intact, with the html copied to the output dataset and the linked files to its "extra_files_path".
  - if you know the pathname of an output relative to the working directory, galaxy can copy it automatically to the output dataset using the from_work_dir attribute.
    ( see example in:  https://bitbucket.org/galaxy/galaxy-central/src/21b645303c02/tools/ngs_rna/tophat_wrapper.xml )
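The extra_files_path case can be pictured with a small Python sketch. The function publish_html_report and its arguments are hypothetical. The sketch assumes the script wrote an HTML report plus its linked files, possibly in subdirectories, under one directory. It copies the HTML to the output dataset and mirrors everything else into the dataset's extra_files_path so that relative links stay valid.

```python
# Hypothetical helper: publish an HTML report and its linked files.
import os
import shutil


def publish_html_report(report_dir, html_name, output_dataset, extra_files_path):
    """Copy report_dir/html_name to output_dataset and mirror every other
    file under report_dir, keeping subdirectories, into extra_files_path."""
    shutil.copy(os.path.join(report_dir, html_name), output_dataset)
    for root, _dirs, files in os.walk(report_dir):
        rel = os.path.relpath(root, report_dir)
        dest_dir = os.path.normpath(os.path.join(extra_files_path, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            if rel == "." and name == html_name:
                continue  # the HTML itself is the dataset, not an extra file
            shutil.copy(os.path.join(root, name), os.path.join(dest_dir, name))
```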

Datatypes
  You may want to create new datatypes to make it easier for the user to correctly select inputs to a tool from previous outputs. 
  For example, the qiime mapping file is a tabular file with specific requirements.  I put a 'qiimemapping' datatype in lib/galaxy/datatypes/metagenomics.py and datatypes_conf.xml
  so an input could generate a select list containing only qiimemapping datasets rather than all tabular ones. 
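The following Python sketch shows the kind of check a sniffer for such a datatype could make. It assumes the QIIME 1.x mapping-file conventions: a header line that begins with #SampleID, BarcodeSequence and LinkerPrimerSequence and ends with Description, plus optional comment lines starting with #. This is a standalone function, not Galaxy's datatype API, and looks_like_qiime_mapping is a made-up name. A real datatype would wrap equivalent logic in its own sniff method.

```python
# Standalone heuristic, not Galaxy's datatype API.
LEADING_COLUMNS = ("#SampleID", "BarcodeSequence", "LinkerPrimerSequence")


def looks_like_qiime_mapping(path, max_rows=100):
    """Return True if path looks like a QIIME 1.x mapping file.

    Checks the header columns, then that up to max_rows data lines (comment
    lines starting with '#' are skipped) have as many tab-separated fields
    as the header.
    """
    with open(path) as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
        if tuple(header[:3]) != LEADING_COLUMNS or header[-1] != "Description":
            return False
        rows = 0
        for line in handle:
            line = line.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue
            if len(line.split("\t")) != len(header):
                return False
            rows += 1
            if rows >= max_rows:
                break
        return rows > 0
```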

Generating a configfile
  You can generate configfiles in the galaxy tool_config .xml file.   The configfile is generated by the Cheetah interpreter just as the commandline is.
  see:  alpha_rarefaction.xml
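To show what such a configfile would need to contain, here is the equivalent text generated in plain Python rather than Cheetah. The sketch assumes the QIIME parameter-file layout: one script_name:option_name entry per line followed by its value (tab-separated here), with list values joined by commas. The helper qiime_parameter_lines is hypothetical, and alpha_diversity:metrics is only an example entry.

```python
# Plain-Python stand-in for what a Cheetah <configfile> would write.


def qiime_parameter_lines(params):
    """Turn {script: {option: value}} into parameter-file text.

    Options whose value is None or "" are omitted so the script's own
    defaults apply; lists and tuples are joined with commas.
    """
    lines = []
    for script in sorted(params):
        for option in sorted(params[script]):
            value = params[script][option]
            if value is None or value == "":
                continue
            if isinstance(value, (list, tuple)):
                value = ",".join(str(v) for v in value)
            lines.append("%s:%s\t%s" % (script, option, value))
    return "\n".join(lines + [""])
```

For example, qiime_parameter_lines({"alpha_diversity": {"metrics": ["chao1", "PD_whole_tree"]}}) returns "alpha_diversity:metrics\tchao1,PD_whole_tree\n".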

The qiime_wrapper.py was patterned after the mothur_wrapper.py, with some of the same wrapper params to handle run-time-determined output (perhaps not needed):
  --galaxy_datasets
         a comma-separated list of regex:output_dataset pairs; the wrapper searches the working_dir and copies the file that matches each regex to the output dataset
         if the exact pathname is known, use the "from_work_dir" attribute instead
  --galaxy_datasetid
         an output dataset id that would be used to dynamically create additional new datasets at job termination
         ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files "Number of Output datasets cannot be determined until tool run")
  --galaxy_new_datasets
         a comma-separated list of regex:datatype pairs used to dynamically create additional new datasets at job termination
  --galaxy_new_files_path
         the galaxy dir for dynamically generated output datasets

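Here is a rough Python sketch of how the --galaxy_datasets and --galaxy_new_datasets values could be handled. It is not the actual qiime_wrapper.py or mothur_wrapper.py code, and the function names are hypothetical. Each pair is split on its last ':', and matching uses re.match against file names in the working directory. New datasets are named primary_<id>_<designation>_visible_<ext>, which is my reading of the Multiple Output Files wiki page linked above; check that page against your Galaxy version.

```python
# Hypothetical handling of the --galaxy_datasets / --galaxy_new_datasets values.
import os
import re
import shutil


def parse_pairs(spec):
    """Split 'regex:target,regex:target' into [(compiled regex, target)].

    Each pair is split on its last ':', so a regex may contain ':' but the
    target may not; as in the original format, a regex cannot contain ','.
    """
    pairs = []
    for item in spec.split(","):
        if item:
            pattern, target = item.rsplit(":", 1)
            pairs.append((re.compile(pattern), target))
    return pairs


def copy_known_outputs(work_dir, spec):
    """--galaxy_datasets: copy the first (sorted) file whose name matches
    each regex to the corresponding output dataset path."""
    for regex, dest in parse_pairs(spec):
        for name in sorted(os.listdir(work_dir)):
            path = os.path.join(work_dir, name)
            if regex.match(name) and os.path.isfile(path):
                shutil.copy(path, dest)
                break


def collect_new_datasets(work_dir, spec, dataset_id, new_files_path):
    """--galaxy_new_datasets: move every file matching a regex into
    new_files_path as primary_<id>_<designation>_visible_<ext>."""
    created = []
    for regex, ext in parse_pairs(spec):
        for name in sorted(os.listdir(work_dir)):
            path = os.path.join(work_dir, name)
            if not (regex.match(name) and os.path.isfile(path)):
                continue
            # Underscores separate the name fields, so keep them out of
            # the designation.
            designation = re.sub(r"[^A-Za-z0-9]", "-", name)
            target = os.path.join(new_files_path, "primary_%s_%s_visible_%s"
                                  % (dataset_id, designation, ext))
            shutil.move(path, target)
            created.append(target)
    return created
```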



*****************************************************************************************
                                Patrick M. Gillevet, Ph.D.
                       Director, Microbiome Analysis Center
    Professor, Department of Environmental Science and Policy
               Affiliate Professor, School of Systems Biology
             George Mason University, Prince William Campus
                    10900 University Boulevard, MSN 4D4
                             Manassas, Virginia  20110

Office 703-993-1057     Room Occoquan-426     FAX 703-993-8430
                                      http://mbac.gmu.edu
******************************************************************************************


Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Jim Johnson-3
Pat,

That sounds great. Does one of you want to take ownership of the toolshed repository?
At minimum, we should add developers to the list that can push changes. 

Thanks,

JJ 

On 1/28/12 9:37 AM, Gillevet Patrick wrote:
Jim et al

Amanda has most of the scripts working now and will be putting them up on the toolshed.
She will be in touch as soon as the scripts are validated a couple of times with different datasets.

cheers...
Pat

Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Rob Knight
This is great news -- thanks for letting us know, and for your hard work on this!

Rob

On Jan 29, 2012, at 9:46 AM, Jim Johnson wrote:

Pat,

That sounds great.   Do one of you want to take ownership of the toolshed repository? 
At minimum, we should add developers to the list that can push changes. 

Thanks,

JJ 

Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Greg Caporaso
Hi all,
Do you have a working tool definition file for QIIME's
beta_diversity_through_plots.py script? We're investigating whether
replacing the Fast UniFrac website with this would be feasible, and
I'd like to see if you have something together before I try to write
one myself.

Thanks!
Greg

On Sun, Jan 29, 2012 at 10:01 AM, Rob Knight <[hidden email]> wrote:

> This is great news -- thanks for letting us know, and for your hard work on
> this!
>
> Rob
>
> On Jan 29, 2012, at 9:46 AM, Jim Johnson wrote:
>
> Pat,
>
> That sounds great.   Do one of you want to take ownership of the toolshed
> repository?
> At minimum, we should add developers to the list that can push changes.
>
> Thanks,
>
> JJ
>
> On 1/28/12 9:37 AM, Gillevet Patrick wrote:
>
> Jim et al
>
> Amanda has most of the scripts working now and will be putting them up on
> the toolshed.
> She will be in touch as soon as the scripts are validated a couple of times
> with different datasets.
>
> cheers...
> Pat
>
>
>
> On Dec 29, 2011, at 3:02 PM, Jim Johnson wrote:
>
>
> It is easiest to generate tools for galaxy when the applications or scripts
> can take arbitrarily named input files and generate output to given path
> names.
> Input directories, output directories are very convenient on the command
> line, but more of a challenge when crafting a galaxy tool.
> That said, many applications require a wrapper script to work with in
> galaxy.
> Thank you for the consistent script_info[] help/usage syntax in the qiime
> scripts,  which enabled me to generate a skeleton galaxy tool_config file
> for each qiime script.
>
> I had some time last spring to work on integrating qiime into galaxy.
> Unfortunately, I haven't had any time since to work on this.
> I put those partial results  on the Galaxy Tool Shed:
> http://toolshed.g2.bx.psu.edu/
> There's a continuing effort at George Mason University to incorporate qiime
> into galaxy tools, so you may want to ask them what they need.
>
>
> I started by generating galaxy tool_config files, e.g. align_seqs.xml,  by
> using python to get the script_info[] from the qiime script:
>
> $ cat generate_tool_config.bash
> #!/usr/bin/env bash
> python $1 > ${1%.*}.help
> cat tool_template.txt | sed "s/__TOOL_BINARY__/${1}/" | python -i $1 -h >
> ${1%.*}.log
>
> (I'll attach tool_template.txt )
>
> This generated skeleton tool_config .xml files that I could then edit as
> needed.
> ( http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax )
>
> I originally was calling all qiime scripts from a tool wrapper:
> qiime_wrapper.py
> But, if a script can be called with any input filepaths and write its
> results to any filepaths, and only writes to STDERR when it fails, then you
> could call that script directly.
>
>
> When should you use a tool_wrapper or call the qiime script directly?
>   Many of the qiime scripts could probably be called directly, especially if
> it can be called with arbitary input/output file pathnames.
>   The reasons for using a tool wrapper may be if input/output needs to be
> manipulated, moved, renamed in order to be used by the qiime script.
>   You'll also need a tool wrapper if the names or number of the output files
> can not be determined from the parameter settings.
>   ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files )
>   If your tool relies on a file ext to determine a format, you'll have to
> rename the input.
>   ( Galaxy dataset pathnames will look something like:
> /<your_galaxy_file_path>/072/dataset_72931.dat )
>   The format/type of a dataset is stored in its metadata, so the tool_config
> can use that information, especially if a script can take muliple
> alternative input formats.
>   A tool_wrapper can also be used to manage the stdout or stderr from a
> tool.   Galaxy currently interprets any output on stderr as a failure.
>
>
>
> A couple changes in galaxy should make somethings easier than when I first
> attempted this:
>   - galaxy now accepts dataset requests with sub directories. (
> https://bitbucket.org/galaxy/galaxy-central/issue/494/support-sub-dirs-in-extra_files_path-patch
> )
>     That means that output HTML files with links into sub directories can be
> left intact, with the html copied to the output dataset and the linked files
> to its "extra_files_path".
>   - if you know the pathname of an output relative to the working directory,
> galaxy can copy it automatically to the output dataset using the
> from_work_dir attribute.
>     ( see example in:
> https://bitbucket.org/galaxy/galaxy-central/src/21b645303c02/tools/ngs_rna/tophat_wrapper.xml
> )
>
> Datatypes
>   You may want to create new datatypes to make it easier for the user to
> correctly select inputs to a tool from previous outputs.
>   For example, the qiime mapping file is a tabular file with specific
> requirements.  I put a 'qiimemapping' datatype in
> lib/galaxy/datatypes/metagenomics.py and datatypes_conf.xml
>   so an input could generate a select list containing only qiimemapping
> datasets rather than all tabular ones.
>
> Generating a configfile
>   You can generate configfiles in the galaxy tool_config .xml file.   The
> configfile is generated by the Cheetah interpreter just as the commandline
> is.
>   see:  alpha_rarefaction.xml
>
> The qiime_wrapper.py was patterned after the mothur_wrapper.py   with some
> of the same wrapper params to handle run time determined output (perhaps not
> needed):
>   --galaxy_datasets
>          a comma-separated list of regex:output_dataset pairs; the wrapper searches
> the working_dir and copies the file that matches each regex to the output
> dataset
>          if the exact pathname is known, use the "from_work_dir" attribute
> instead
>   --galaxy_datasetid
>          would be an output dataset id that would be used to dynamically
> create additional new datasets at job termination
>          ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files
> "Number of Output datasets cannot be determined until tool run")
>   --galaxy_new_datasets
>          a comma separated list of regex:datatype used to dynamically create
> additional new datasets at job termination
>   --galaxy_new_files_path
>          the galaxy dir for dynamically generated output datasets
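The --galaxy_datasets behaviour described above can be illustrated with a minimal bash sketch. It is not the code in qiime_wrapper.py (which is Python). The function name copy_matching_outputs is hypothetical. The sketch assumes the spec is split on commas and each pair on its last ':'. It also assumes patterns use bash's unanchored ERE matching, and only files directly inside the working directory are searched.

```shell
# Hypothetical sketch of --galaxy_datasets: $2 is a comma-separated list of
# "regex:output_path" pairs. For each pair, the first file directly inside
# $1 (in glob order) whose name matches regex is copied to output_path.
# Returns 1 if any pattern matched nothing.
copy_matching_outputs() {
  local work_dir=$1 spec=$2 pair pattern dest f found status=0
  local -a pairs
  IFS=',' read -r -a pairs <<< "$spec"
  for pair in "${pairs[@]}"; do
    pattern=${pair%:*}   # text before the last ':'
    dest=${pair##*:}     # text after the last ':'
    found=0
    for f in "$work_dir"/*; do
      [ -f "$f" ] || continue
      # bash ERE match against the file name (unanchored)
      if [[ $(basename "$f") =~ $pattern ]]; then
        cp "$f" "$dest" && found=1
        break
      fi
    done
    [ "$found" -eq 1 ] || status=1
  done
  return "$status"
}
```

For example, copy_matching_outputs "$PWD" '_otus\.txt$:/path/to/output.dat' would copy a file such as seqs_otus.txt to the output dataset. Because it returns non-zero when a pattern finds nothing, a wrapper can report the missing output instead of leaving an empty dataset.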
>
>
>
>
> *****************************************************************************************
>                                 Patrick M. Gillevet, Ph.D.
>                        Director, Microbiome Analysis Center
>     Professor, Department of Environmental Science and Policy
>                Affiliate Professor, School of Systems Biology
>              George Mason University, Prince William Campus
>                     10900 University Boulevard, MSN 4D4
>                              Manassas, Virginia  20110
>
> Office 703-993-1057     Room Occoquan-426     FAX 703-993-8430
>                                       http://mbac.gmu.edu
> ******************************************************************************************

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Amanda Zuzolo
In reply to this post by Jim Johnson-3
Hello, all.

I have been working on getting the Qiime scripts into Galaxy, as
mentioned before, and they are working with Qiime 1.3.0. I have edited
the wrapper file that Jim Johnson wrote to make it more flexible,
especially in cases where the tool looks for a specific file
extension (for example, a .fna file), or where the tool prints
something to the command line that Galaxy would not otherwise pick up.

So far, I have finished updating the XML files to match the latest
documentation for the entire Pick OTU process, Alpha Diversity, and
Beta Diversity, as well as some miscellaneous functions. Currently, I
am working on making the jackknifing scripts functional. I decided
that it would be easier to get the individual scripts working rather
than the workflow scripts, since that gives the end user more
control. Additionally, the workflow scripts can easily be recreated
with Galaxy's workflows.

As far as the toolshed goes, I don't believe I know the ins and outs
yet, but I would be more than willing to learn if people would benefit
from having these versions in that repository.

2012/1/29 Jim Johnson <[hidden email]>:

> Pat,
>
> That sounds great.   Does one of you want to take ownership of the toolshed
> repository?
> At minimum, we should add developers to the list that can push changes.
>
> Thanks,
>
> JJ
>
> On 1/28/12 9:37 AM, Gillevet Patrick wrote:
>
> Jim et al
>
> Amanda has most of the scripts working now and will be putting them up on
> the toolshed.
> She will be in touch as soon as the scripts are validated a couple of times
> with different datasets.
>
> cheers...
> Pat



--
Amanda Zuzolo
Bioengineering Major, George Mason University
Metabiome Informatics Group, Environmental Biocomplexity


Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Florent Angly
Hi Amanda,
I would certainly be interested in using your helpful QIIME wrappers if
you put them on the Toolshed.
Best,
Florent

On 06/02/12 06:22, Amanda Zuzolo wrote:

> As far as the toolshed goes, I don't believe I know the ins and outs
> yet, but I would be more than willing to learn if people would benefit
> from having these versions in that repository.


Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Jeffrey Long
Hello Amanda,
I was just about to embark on EXACTLY this process, so I would certainly be very interested in saving myself some work.  
Would there be any issue (that you're aware of, of course) with using QIIME 1.4.0 instead of 1.3?

-Jeff

On Tue, Feb 7, 2012 at 2:32 AM, Florent Angly <[hidden email]> wrote:
Hi Amanda,
I would certainly be interested in using your helpful QIIME wrappers if you put them on the Toolshed.
Best,
Florent

Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Rob Knight
I can't make the call at that time (am in Dhaka) but am very enthusiastic about that effort; please keep me in the loop. I am cc:ing a couple of the people in my lab who also indicated interest in the qiime/galaxy integration effort (though Antonio won't be able to make it either, for the same reason). Thanks!

Rob

On Feb 7, 2012, at 8:48 AM, Jeffrey Long wrote:

Hello Amanda,
I was just about to embark on EXACTLY this process, so I would certainly be very interested in saving myself some work.  
Would there be any issue (that you're aware of, of course) with using QIIME 1.4.0 instead of 1.3?

-Jeff

On Dec 29, 2011, at 3:02 PM, Jim Johnson wrote:


It is easiest to generate tools for galaxy when the applications or scripts
can take arbitrarily named input files and generate output to given path
names.
Input and output directories are very convenient on the command
line, but more of a challenge when crafting a galaxy tool.
That said, many applications require a wrapper script to work within
galaxy.
Thank you for the consistent script_info[] help/usage syntax in the qiime
scripts,  which enabled me to generate a skeleton galaxy tool_config file
for each qiime script.

I had some time last spring to work on integrating qiime into galaxy.
Unfortunately, I haven't had any time since to work on this.
I put those partial results  on the Galaxy Tool Shed:
http://toolshed.g2.bx.psu.edu/
There's a continuing effort at George Mason University to incorporate qiime
into galaxy tools, so you may want to ask them what they need.


I started by generating galaxy tool_config files, e.g. align_seqs.xml,  by
using python to get the script_info[] from the qiime script:

$ cat generate_tool_config.bash
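#!/usr/bin/env bash
# Usage: generate_tool_config.bash <qiime_script.py>
# Save what the script prints to stdout when run without arguments.
python "$1" > "${1%.*}.help"
# Fill the script name into the template and feed it to "python -i",
# which reads it as interactive input after running the script with -h.
sed "s|__TOOL_BINARY__|$1|" tool_template.txt | python -i "$1" -h > "${1%.*}.log"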

(I'll attach tool_template.txt )

This generated skeleton tool_config .xml files that I could then edit as
needed.
( http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax )

I originally was calling all qiime scripts from a tool wrapper:
qiime_wrapper.py
But, if a script can be called with any input filepaths and write its
results to any filepaths, and only writes to STDERR when it fails, then you
could call that script directly.


When should you use a tool_wrapper or call the qiime script directly?
  Many of the qiime scripts could probably be called directly, especially those
that can be called with arbitrary input/output file pathnames.
  You may need a tool wrapper if inputs/outputs need to be manipulated,
moved, or renamed before the qiime script can use them.
  You'll also need a tool wrapper if the names or number of the output files
cannot be determined from the parameter settings.
  ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files )
  If your tool relies on a file extension to determine a format, you'll have to
rename the input.
  ( Galaxy dataset pathnames will look something like:
/<your_galaxy_file_path>/072/dataset_72931.dat )
  The format/type of a dataset is stored in its metadata, so the tool_config
can use that information, especially if a script can take multiple
alternative input formats.
  A tool_wrapper can also be used to manage the stdout or stderr from a
tool.   Galaxy currently interprets any output on stderr as a failure.
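A minimal wrapper covering the two cases above (an input that needs a particular extension, and a tool that chats on stderr even when it succeeds) might look like the sketch below. It is not qiime_wrapper.py; the function names are hypothetical, and it assumes the wrapped script signals failure with a non-zero exit status.

```python
import os
import subprocess
import sys


def name_with_extension(dataset_path, ext):
    """dataset_72931.dat + ".fna" -> "dataset_72931.fna" (basename only)."""
    base = os.path.splitext(os.path.basename(dataset_path))[0]
    return base + ext


def link_with_extension(dataset_path, ext, work_dir):
    """Symlink a Galaxy dataset into work_dir under a name ending in ext."""
    link_path = os.path.join(work_dir, name_with_extension(dataset_path, ext))
    if not os.path.lexists(link_path):
        os.symlink(os.path.abspath(dataset_path), link_path)
    return link_path


def run_tool(cmd):
    """Run cmd and return its exit status.

    Galaxy treats any stderr output as failure, so stderr from a successful
    run is passed through on stdout; only a non-zero exit reaches stderr.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    sys.stdout.write(proc.stdout)
    if proc.returncode == 0:
        sys.stdout.write(proc.stderr)
    else:
        sys.stderr.write(proc.stderr)
    return proc.returncode
```

A wrapper's main would then be roughly `sys.exit(run_tool(["some_qiime_script.py", link_with_extension(sys.argv[1], ".fna", os.getcwd())]))`, where some_qiime_script.py and its argument layout are placeholders.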



A couple of changes in galaxy should make some things easier than when I
first attempted this:
  - galaxy now accepts dataset requests with sub directories.
    ( https://bitbucket.org/galaxy/galaxy-central/issue/494/support-sub-dirs-in-extra_files_path-patch )
    That means that output HTML files with links into sub directories can
    be left intact, with the html copied to the output dataset and the
    linked files to its "extra_files_path".
  - if you know the pathname of an output relative to the working
    directory, galaxy can copy it automatically to the output dataset using
    the from_work_dir attribute.
    ( see example in:
    https://bitbucket.org/galaxy/galaxy-central/src/21b645303c02/tools/ngs_rna/tophat_wrapper.xml )
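For the sub-directory case, a wrapper has to place the HTML in the output dataset and everything it links to under the dataset's extra_files_path. The sketch below assumes the tool config passes both paths in (for example $output and $output.extra_files_path, as I understand the Cheetah names) and that the qiime script wrote its report into report_dir; publish_html_report is a hypothetical name.

```python
import os
import shutil


def publish_html_report(report_dir, index_name, output_path, extra_files_path):
    """Copy report_dir/index_name to the Galaxy output dataset, and every
    other file under report_dir (sub directories included) into
    extra_files_path, so relative links such as "plots/a.png" keep working."""
    shutil.copyfile(os.path.join(report_dir, index_name), output_path)
    for root, _dirs, files in os.walk(report_dir):
        rel_root = os.path.relpath(root, report_dir)          # "." at the top
        dest_root = os.path.normpath(os.path.join(extra_files_path, rel_root))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            if rel_root == "." and name == index_name:
                continue  # the HTML itself already went to the output dataset
            shutil.copy2(os.path.join(root, name), os.path.join(dest_root, name))
```

When the output's location relative to the working directory is fixed, the from_work_dir attribute makes a wrapper step like this unnecessary for the main file.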

Datatypes
  You may want to create new datatypes to make it easier for the user to
  correctly select inputs to a tool from previous outputs.
  For example, the qiime mapping file is a tabular file with specific
  requirements. I put a 'qiimemapping' datatype in
  lib/galaxy/datatypes/metagenomics.py and datatypes_conf.xml,
  so an input could generate a select list containing only qiimemapping
  datasets rather than all tabular ones.
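The check a qiimemapping datatype needs could be sketched as a standalone function. This is not Galaxy's datatype API (in Galaxy the logic would live in the datatype class itself), and the rules are my reading of the QIIME 1.x mapping-file conventions: a tab-separated header starting with #SampleID, containing BarcodeSequence and LinkerPrimerSequence, and ending with Description. looks_like_qiime_mapping is a hypothetical name.

```python
def looks_like_qiime_mapping(lines):
    """Return True if lines look like a QIIME 1.x mapping file (assumed rules)."""
    header = None
    n_rows = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if header is None:
            # The first non-empty line must be the #SampleID header.
            if not line.startswith("#SampleID"):
                return False
            header = line.split("\t")
            if "BarcodeSequence" not in header or "LinkerPrimerSequence" not in header:
                return False
            if header[-1] != "Description":
                return False
            continue
        if line.startswith("#"):
            continue  # further comment lines are allowed
        if len(line.split("\t")) != len(header):
            return False  # every sample row needs one value per column
        n_rows += 1
    return header is not None and n_rows > 0
```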

Generating a configfile
  You can generate configfiles in the galaxy tool_config .xml file. The
  configfile is generated by the Cheetah interpreter, just as the command
  line is.
  See: alpha_rarefaction.xml
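To show what such a configfile ends up containing, here is plain Python standing in for the Cheetah template. It assumes the QIIME 1.x parameter-file syntax as I understand it (one script_name:option_name, a tab, then the value per line); the function name and the example options are illustrative only, not taken from alpha_rarefaction.xml.

```python
def qiime_params_file(params):
    """Render {(script, option): value} as QIIME-style parameter lines.

    Stand-in for the text a Cheetah <configfile> would emit. Empty values
    are skipped so QIIME uses its defaults; lists are joined with commas.
    """
    lines = []
    for (script, option), value in sorted(params.items()):
        if value is None or value == "":
            continue
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        lines.append("%s:%s\t%s" % (script, option, value))
    return "\n".join(lines) + "\n"
```

For example, {("alpha_diversity", "metrics"): ["chao1", "observed_species"]} becomes the single line alpha_diversity:metrics followed by a tab and chao1,observed_species.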

The qiime_wrapper.py was patterned after the mothur_wrapper.py, with some
of the same wrapper params to handle run-time-determined output (perhaps
not needed):
  --galaxy_datasets
         a comma-separated list of regex:output_dataset pairs; the wrapper
         searches the working_dir and copies the file that matches each
         regex to the output dataset. If the exact pathname is known, use
         the "from_work_dir" attribute instead.
  --galaxy_datasetid
         an output dataset id used to dynamically create additional new
         datasets at job termination
         ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files
         "Number of Output datasets cannot be determined until tool run" )
  --galaxy_new_datasets
         a comma-separated list of regex:datatype pairs used to dynamically
         create additional new datasets at job termination
  --galaxy_new_files_path
         the galaxy dir for dynamically generated output datasets
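A minimal sketch of how those regex:target options could be handled follows. It is not the actual qiime_wrapper.py or mothur_wrapper.py code: the function names are hypothetical, matching uses re.match (anchored at the start of the file name), and the primary_<id>_<designation>_visible_<ext> naming is my reading of the Multiple Output Files wiki page, so check it against your Galaxy version.

```python
import os
import re
import shutil


def parse_pairs(spec):
    """Split "regex:target,regex:target" into [(compiled_regex, target)].

    Splits each item on its last ':', so a regex may contain ':' but
    neither part may contain ','.
    """
    pairs = []
    for item in spec.split(","):
        if item:
            pattern, target = item.rsplit(":", 1)
            pairs.append((re.compile(pattern), target))
    return pairs


def copy_matching(work_dir, spec):
    """--galaxy_datasets: copy the first file in work_dir whose name matches
    each regex to that regex's output dataset path; return the names copied."""
    names = sorted(os.listdir(work_dir))
    copied = []
    for regex, dataset_path in parse_pairs(spec):
        for name in names:
            if regex.match(name):
                shutil.copyfile(os.path.join(work_dir, name), dataset_path)
                copied.append(name)
                break
    return copied


def collect_new_datasets(work_dir, spec, dataset_id, new_files_path):
    """--galaxy_new_datasets / --galaxy_datasetid / --galaxy_new_files_path:
    move every matching file into new_files_path as
    primary_<id>_<designation>_visible_<datatype>. The designation is the
    first regex group (else the file name) and must not contain '_'."""
    moved = []
    for regex, datatype in parse_pairs(spec):
        for name in sorted(os.listdir(work_dir)):
            m = regex.match(name)
            if m:
                designation = m.group(1) if m.groups() else name
                dest = "primary_%s_%s_visible_%s" % (dataset_id, designation, datatype)
                shutil.move(os.path.join(work_dir, name),
                            os.path.join(new_files_path, dest))
                moved.append(dest)
    return moved
```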




*****************************************************************************************
                                Patrick M. Gillevet, Ph.D.
                       Director, Microbiome Analysis Center
    Professor, Department of Environmental Science and Policy
               Affiliate Professor, School of Systems Biology
             George Mason University, Prince William Campus
                    10900 University Boulevard, MSN 4D4
                             Manassas, Virginia  20110

Office 703-993-1057     Room Occoquan-426     FAX 703-993-8430
                                      http://mbac.gmu.edu
******************************************************************************************

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/

Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Greg Caporaso
OK, let's plan on a Skype call at 1pm MT/3pm ET this Thursday (9 Feb
2012). I will initiate the call - my Skype ID is gregcaporaso. Please
let me know if you'd like to join the call, and send me your Skype ID.

Looking forward to talking about this!

Greg

2012/2/7 Rob Knight <[hidden email]>:

> I can't make the call at that time (am in Dhaka) but am very enthusiastic
> about that effort; please keep me in the loop. I am cc:ing a couple of the
> people in my lab who also indicated interest in the qiime/galaxy integration
> effort (though Antonio won't be able to make it either, for the same
> reason). Thanks!
>
> Rob
>
> On Feb 7, 2012, at 8:48 AM, Jeffrey Long wrote:
>
> Hello Amanda,
> I was just about to embark on EXACTLY this process, so I would certainly be
> very interested in saving myself some work.
> Would there be any issue (that you're aware of, of course) with using QIIME
> 1.4.0 instead of 1.3?
>
> -Jeff
>
> On Tue, Feb 7, 2012 at 2:32 AM, Florent Angly <[hidden email]>
> wrote:
>>
>> Hi Amanda,
>> I would certainly be interested in using your helpful QIIME wrappers if
>> you put them on the Toolshed.
>> Best,
>> Florent
>>
>> On 06/02/12 06:22, Amanda Zuzolo wrote:
>>>
>>> Hello, all.
>>>
>>> I have been working on getting the Qiime scripts into Galaxy as
>>> mentioned before, and they are working with Qiime 1.3.0. I have edited
>>> the wrapper file that Jim Johnson wrote to create more flexibility,
>>> especially in cases where the tool looks for a specific file type
>>> extension (for example, a .fna file), or where the tool normally
>>> outputs something to the command line that is not normally picked up
>>> in Galaxy.
>>>
>>> So far, I have completely finished fixing the XML files to the latest
>>> documentation for the entire Pick OTU process, Alpha Diversity, and
>>> Beta Diversity, as well as other miscellaneous functions. Currently, I
>>> am working on making scripts for jack-knifing functional. I determined
>>> that it would be easier to get individual scripts functional, rather
>>> than workflow scripts, since that allows the end-user to have more
>>> control. Additionally, the workflow scripts can easily be recreated by
>>> using Galaxy's workflows.
>>>
>>> As far as the toolshed goes, I don't believe I know the ins and outs
>>> yet, but I would be more than willing to learn if people would benefit
>>> from having these versions in that repository.
>>>
>>> 2012/1/29 Jim Johnson<[hidden email]>:
>>>>
>>>> Pat,
>>>>
>>>> That sounds great. Does one of you want to take ownership of the
>>>> toolshed repository?
>>>> At minimum, we should add developers to the list that can push changes.
>>>>
>>>> Thanks,
>>>>
>>>> JJ
>>>>
>>>> On 1/28/12 9:37 AM, Gillevet Patrick wrote:
>>>>
>>>> Jim et al
>>>>
>>>> Amanda has most of the scripts working now and will be putting them up
>>>> on the toolshed.
>>>> She will be in touch as soon as the scripts are validated a couple of
>>>> times with different datasets.
>>>>
>>>> cheers...
>>>> Pat
>>>>
>>>>
>>>>
>>>> On Dec 29, 2011, at 3:02 PM, Jim Johnson wrote:
>>>> [...]

Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Daniel McDonald
I'll be there
Daniel



On Feb 7, 2012, at 10:30, Greg Caporaso <[hidden email]> wrote:

> [...]

Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Amanda Zuzolo
In reply to this post by Jeffrey Long
Rob,

I am not aware of any issue, as I've verified all the options against the
Qiime documentation that is up now (and I've eliminated those being
deprecated).

Amanda Zuzolo

On 2/7/12, Jeffrey Long <[hidden email]> wrote:

> Hello Amanda,
> I was just about to embark on EXACTLY this process, so I would certainly be
> very interested in saving myself some work.
> Would there be any issue (that you're aware of, of course) with using QIIME
> 1.4.0 instead of 1.3?
>
> -Jeff
>
> On Tue, Feb 7, 2012 at 2:32 AM, Florent Angly
> <[hidden email]> wrote:
>
>> Hi Amanda,
>> I would certainly be interested in using your helpful QIIME wrappers if
>> you put them on the Toolshed.
>> Best,
>> Florent
>>
>> On 06/02/12 06:22, Amanda Zuzolo wrote:
>>
>>> Hello, all.
>>>
>>> I have been working on getting the Qiime scripts into Galaxy as
>>> mentioned before, and they are working with Qiime 1.3.0. I have edited
>>> the wrapper file that Jim Johnson wrote to create more flexibility,
>>> especially in cases where the tool looks for a specific file type
>>> extension (for example, a .fna file), or where the tool normally
>>> outputs something to the command line that is not normally picked up
>>> in Galaxy.
>>>
>>> So far, I have completely finished fixing the XML files to the latest
>>> documentation for the entire Pick OTU process, Alpha Diversity, and
>>> Beta Diversity, as well as other miscellaneous functions. Currently, I
>>> am working on making scripts for jack-knifing functional. I determined
>>> that it would be easier to get individual scripts functional, rather
>>> than workflow scripts, since that allows the end-user to have more
>>> control. Additionally, the workflow scripts can easily be recreated by
>>> using Galaxy's workflows.
>>>
>>> As far as the toolshed goes, I don't believe I know the ins and outs
>>> yet, but I would be more than willing to learn if people would benefit
>>> from having these versions in that repository.
>>>
>>> 2012/1/29 Jim Johnson<[hidden email]>:
>>>
>>>> Pat,
>>>>
>>>> That sounds great.   Do one of you want to take ownership of the
>>>> toolshed
>>>> repository?
>>>> At minimum, we should add developers to the list that can push changes.
>>>>
>>>> Thanks,
>>>>
>>>> JJ
>>>>
>>>> On 1/28/12 9:37 AM, Gillevet Patrick wrote:
>>>>
>>>> Jim et al
>>>>
>>>> Amanda has most of the scripts working now and will be putting them up
>>>> on
>>>> the toolshed.
>>>> She will be in touch as soon as the scripts are validated a couple of
>>>> times
>>>> with different datasets.
>>>>
>>>> cheers...
>>>> Pat
>>>>
>>>>
>>>>
>>>> On Dec 29, 2011, at 3:02 PM, Jim Johnson wrote:
>>>>
>>>>
>>>> It is easiest to generate tools for galaxy when the applications or
>>>> scripts
>>>> can take arbitrarily named input files and generate output to given
>>>> path
>>>> names.
>>>> Input directories, output directories are very convenient on the
>>>> command
>>>> line, but more of a challenge when crafting a galaxy tool.
>>>> That said, many applications require a wrapper script to work with in
>>>> galaxy.
>>>> Thank you for the consistent script_info[] help/usage syntax in the
>>>> qiime
>>>> scripts,  which enabled me to generate a skeleton galaxy tool_config
>>>> file
>>>> for each qiime script.
>>>>
>>>> I had some time last spring to work on integrating qiime into galaxy.
>>>> Unfortunately, I haven't had any time since to work on this.
>>>> I put those partial results  on the Galaxy Tool Shed:
>>>> http://toolshed.g2.bx.psu.edu/
>>>> There's a continuing effort at George Mason University to incorporate
>>>> qiime
>>>> into galaxy tools, so you may want to ask them what they need.
>>>>
>>>>
>>>> I started by generating galaxy tool_config files, e.g. align_seqs.xml,
>>>>  by
>>>> using python to get the script_info[] from the qiime script:
>>>>
>>>> $ cat generate_tool_config.bash
>>>> #!/usr/bin/env bash
>>>> python $1 > ${1%.*}.help
>>>> cat tool_template.txt | sed "s/__TOOL_BINARY__/${1}/" | python -i $1 -h > ${1%.*}.log
>>>>
>>>> (I'll attach tool_template.txt )
>>>>
>>>> This generated skeleton tool_config .xml files that I could then edit
>>>> as
>>>> needed.
>>>> ( http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax )
>>>>
>>>> I originally was calling all qiime scripts from a tool wrapper:
>>>> qiime_wrapper.py
>>>> But, if a script can be called with any input filepaths and write its
>>>> results to any filepaths, and only writes to STDERR when it fails, then
>>>> you
>>>> could call that script directly.
>>>>
>>>>
>>>> When should you use a tool_wrapper or call the qiime script directly?
>>>>   Many of the qiime scripts could probably be called directly,
>>>> especially if
>>>> it can be called with arbitary input/output file pathnames.
>>>>   The reasons for using a tool wrapper may be if input/output needs to
>>>> be
>>>> manipulated, moved, renamed in order to be used by the qiime script.
>>>>   You'll also need a tool wrapper if the names or number of the output
>>>> files
>>>> can not be determined from the parameter settings.
>>>>   ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files )
>>>>   If your tool relies on a file ext to determine a format, you'll have
>>>> to
>>>> rename the input.
>>>>   ( Galaxy dataset pathnames will look something like:
>>>> /<your_galaxy_file_path>/072/dataset_72931.dat )
>>>>   The format/type of a dataset is stored in its metadata, so the
>>>> tool_config
>>>> can use that information, especially if a script can take muliple
>>>> alternative input formats.
>>>>   A tool_wrapper can also be used to manage the stdout or stderr from a
>>>> tool.   Galaxy currently interprets any output on stderr as a failure.
>>>>
>>>>
>>>>
>>>> A couple changes in galaxy should make somethings easier than when I
>>>> first
>>>> attempted this:
>>>>   - galaxy now accepts dataset requests with sub directories. (
>>>> https://bitbucket.org/galaxy/galaxy-central/issue/494/support-sub-dirs-in-extra_files_path-patch
>>>> )
>>>>     That means that output HTML files with links into sub directories
>>>> can be
>>>> left intact, with the html copied to the output dataset and the linked
>>>> files
>>>> to its "extra_files_path".
>>>>   - if you know the pathname of an output relative to the working
>>>> directory,
>>>> galaxy can copy it automatically to the output dataset using the
>>>> from_work_dir attribute.
>>>>     ( see example in:
>>>> https://bitbucket.org/galaxy/**galaxy-central/src/**
>>>> 21b645303c02/tools/ngs_rna/**tophat_wrapper.xml<https://bitbucket.org/galaxy/galaxy-central/src/21b645303c02/tools/ngs_rna/tophat_wrapper.xml>
>>>> )
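With sub directory support in extra_files_path, a wrapper only has to copy an HTML report into place. The sketch below is hypothetical (the function name and argument order are my own) and assumes the tool writes its HTML page and all linked files into one report directory. It copies the page to the output dataset and everything else into extra_files_path, preserving relative paths so the page's links keep working.

```python
import os
import shutil


def publish_html_report(report_dir, html_name, output_dataset, extra_files_path):
    """Hypothetical helper: copy an HTML report into a Galaxy output dataset.

    The HTML page itself becomes the dataset; every other file under
    `report_dir`, including files in sub directories, is copied into
    `extra_files_path` with its relative path preserved.
    Returns the sorted relative paths of the copied extra files.
    """
    shutil.copyfile(os.path.join(report_dir, html_name), output_dataset)
    os.makedirs(extra_files_path, exist_ok=True)
    copied = []
    for root, _dirs, files in os.walk(report_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, report_dir)
            if rel == html_name:
                continue  # the page itself already went to the dataset
            dest = os.path.join(extra_files_path, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copyfile(src, dest)
            copied.append(rel)
    return sorted(copied)
```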
>>>>
>>>> Datatypes
>>>>   You may want to create new datatypes to make it easier for the user to
>>>> correctly select inputs to a tool from previous outputs.
>>>>   For example, the qiime mapping file is a tabular file with specific
>>>> requirements.  I put a 'qiimemapping' datatype in
>>>> lib/galaxy/datatypes/metagenomics.py and datatypes_conf.xml
>>>> so an input could generate a select list containing only qiimemapping
>>>> datasets rather than all tabular ones.
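Galaxy datatypes can carry a sniff check that recognizes their format. The standalone function below sketches the kind of test a `qiimemapping` datatype might apply; it is hypothetical, is not the code in metagenomics.py, and does not use Galaxy's datatype classes. The header rules are my assumption based on the QIIME 1.x mapping-file layout: `#SampleID` first, `BarcodeSequence` and `LinkerPrimerSequence` next, `Description` last.

```python
def looks_like_qiime_mapping(path):
    """Heuristic check for a QIIME 1.x mapping file (assumed header layout).

    Only the first non-empty line is examined: it must be tab-separated,
    start with '#SampleID', 'BarcodeSequence', 'LinkerPrimerSequence',
    and end with 'Description'.
    """
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue
            fields = line.split("\t")
            return (
                len(fields) >= 4
                and fields[0] == "#SampleID"
                and fields[1] == "BarcodeSequence"
                and fields[2] == "LinkerPrimerSequence"
                and fields[-1] == "Description"
            )
    return False  # empty file
```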
>>>>
>>>> Generating a configfile
>>>>   You can generate configfiles in the galaxy tool_config .xml file. The
>>>> configfile is generated by the Cheetah interpreter just as the commandline
>>>> is.
>>>>   see:  alpha_rarefaction.xml
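To illustrate what such a configfile can contain, the plain-Python function below builds a QIIME-style parameters file from nested option values. It stands in for a Cheetah template (Cheetah itself is not used here), and `render_qiime_params` is a hypothetical name. The `script:option<TAB>value` line format is an assumption about QIIME 1.x parameter files; skipping empty values mirrors what an `#if` block in a Cheetah configfile would do.

```python
def render_qiime_params(params):
    """Render {script: {option: value}} as 'script:option<TAB>value' lines.

    Assumed QIIME 1.x parameter-file format. Options whose value is None
    or an empty string are left out; list values are joined with commas.
    """
    lines = []
    for script in sorted(params):
        for option in sorted(params[script]):
            value = params[script][option]
            if value is None or value == "":
                continue  # like an unset optional parameter in the template
            if isinstance(value, (list, tuple)):
                value = ",".join(str(v) for v in value)
            lines.append("%s:%s\t%s" % (script, option, value))
    return "".join(line + "\n" for line in lines)
```

The tool_config would write the equivalent text into the configfile and pass its path to the script, e.g. via alpha_rarefaction.py's parameter-file option.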
>>>>
>>>> The qiime_wrapper.py was patterned after the mothur_wrapper.py, with some
>>>> of the same wrapper params to handle run-time-determined output (perhaps
>>>> not needed):
>>>>   --galaxy_datasets
>>>>          a comma-separated list of regex:output_dataset pairs; the wrapper
>>>>          searches the working_dir and copies the file that matches the
>>>>          regex to the output dataset.
>>>>          If the exact pathname is known, use the "from_work_dir"
>>>>          attribute instead.
>>>>   --galaxy_datasetid
>>>>          an output dataset id that would be used to dynamically create
>>>>          additional new datasets at job termination
>>>>          ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files
>>>>          "Number of Output datasets cannot be determined until tool run" )
>>>>   --galaxy_new_datasets
>>>>          a comma-separated list of regex:datatype pairs used to
>>>>          dynamically create additional new datasets at job termination
>>>>   --galaxy_new_files_path
>>>>          the galaxy dir for dynamically generated output datasets
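The run-time output handling can be sketched as follows. This is a hypothetical reimplementation, not the actual qiime_wrapper.py or mothur_wrapper.py code: `copy_matching_outputs` mimics --galaxy_datasets, and `stage_new_datasets` mimics --galaxy_datasetid, --galaxy_new_datasets and --galaxy_new_files_path together. The `primary_<id>_<designation>_visible_<ext>` file name is my assumption about the naming Galaxy expects for datasets discovered at job termination; check it against the Multiple Output Files documentation for your Galaxy version.

```python
import os
import re
import shutil


def parse_pairs(spec):
    """Split 'regex1:value1,regex2:value2' into (compiled regex, value) pairs.

    Each item is split at its last ':', so a regex may contain ':' but the
    value may not; a regex cannot contain ',' in this simple format.
    """
    pairs = []
    for item in spec.split(","):
        pattern, sep, value = item.rpartition(":")
        if not sep:
            raise ValueError("expected regex:value, got %r" % item)
        pairs.append((re.compile(pattern), value))
    return pairs


def list_files(working_dir):
    """Relative paths of all files under working_dir, in sorted order."""
    return sorted(
        os.path.relpath(os.path.join(root, name), working_dir)
        for root, _dirs, files in os.walk(working_dir)
        for name in files
    )


def copy_matching_outputs(working_dir, spec):
    """--galaxy_datasets style: for each regex:output_path pair, copy the
    first file whose relative path matches the regex to output_path."""
    copied = {}
    rel_paths = list_files(working_dir)
    for regex, output_path in parse_pairs(spec):
        for rel in rel_paths:
            if regex.search(rel):
                shutil.copyfile(os.path.join(working_dir, rel), output_path)
                copied[rel] = output_path
                break
    return copied


def stage_new_datasets(working_dir, spec, dataset_id, new_files_path):
    """--galaxy_new_datasets style: copy every file matching a regex into
    new_files_path under the assumed name
    primary_<dataset_id>_<designation>_visible_<datatype>."""
    os.makedirs(new_files_path, exist_ok=True)
    staged = []
    rel_paths = list_files(working_dir)
    for regex, datatype in parse_pairs(spec):
        for rel in rel_paths:
            if regex.search(rel):
                # Underscores separate the name fields, so replace them (and
                # other punctuation) in the designation with '-'.
                stem = os.path.splitext(os.path.basename(rel))[0]
                designation = re.sub(r"[^A-Za-z0-9]+", "-", stem)
                name = "primary_%s_%s_visible_%s" % (dataset_id, designation, datatype)
                shutil.copyfile(os.path.join(working_dir, rel),
                                os.path.join(new_files_path, name))
                staged.append(name)
    return staged
```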
>>>>
>>>>
>>>>
>>>>
>>>> *****************************************************************************************
>>>>                                 Patrick M. Gillevet, Ph.D.
>>>>                        Director, Microbiome Analysis Center
>>>>     Professor, Department of Environmental Science and Policy
>>>>                Affiliate Professor, School of Systems Biology
>>>>              George Mason University, Prince William Campus
>>>>                     10900 University Boulevard, MSN 4D4
>>>>                              Manassas, Virginia  20110
>>>>
>>>> Office 703-993-1057     Room Occoquan-426     FAX 703-993-8430
>>>>                                       http://mbac.gmu.edu
>>>> ******************************************************************************************
>>>
>>>
>> ___________________________________________________________
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>
>>  http://lists.bx.psu.edu/
>>
>>
>

Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Rob Knight
That's great news! Thanks for checking.

Rob

On Feb 7, 2012, at 10:59 AM, Amanda Zuzolo wrote:

> Rob,
>
> I am not aware that there would be any issue, as I've verified all the
> options with the Qiime documentation that is up now (and I've
> eliminated those being deprecated).
>
> Amanda Zuzolo
>
> On 2/7/12, Jeffrey Long <[hidden email]> wrote:
>> Hello Amanda,
>> I was just about to embark on EXACTLY this process, so I would certainly be
>> very interested in saving myself some work.
>> Would there be any issue (that you're aware of, of course) with using QIIME
>> 1.4.0 instead of 1.3?
>>
>> -Jeff
>>
>> On Tue, Feb 7, 2012 at 2:32 AM, Florent Angly
>> <[hidden email]> wrote:
>>
>>> Hi Amanda,
>>> I would certainly be interested in using your helpful QIIME wrappers if
>>> you put them on the Toolshed.
>>> Best,
>>> Florent
>>>
>>> On 06/02/12 06:22, Amanda Zuzolo wrote:
>>>
>>>> Hello, all.
>>>>
>>>> I have been working on getting the Qiime scripts into Galaxy as
>>>> mentioned before, and they are working with Qiime 1.3.0. I have edited
>>>> the wrapper file that Jim Johnson wrote to create more flexibility,
>>>> especially in cases where the tool looks for a specific file type
>>>> extension (for example, a .fna file), or where the tool normally
>>>> outputs something to the command line that is not normally picked up
>>>> in Galaxy.
>>>>
>>>> So far, I have completely finished fixing the XML files to the latest
>>>> documentation for the entire Pick OTU process, Alpha Diversity, and
>>>> Beta Diversity, as well as other miscellaneous functions. Currently, I
>>>> am working on making scripts for jack-knifing functional. I determined
>>>> that it would be easier to get individual scripts functional, rather
>>>> than workflow scripts, since that allows the end-user to have more
>>>> control. Additionally, the workflow scripts can easily be recreated by
>>>> using Galaxy's workflows.
>>>>
>>>> As far as the toolshed goes, I don't believe I know the ins and outs
>>>> yet, but I would be more than willing to learn if people would benefit
>>>> from having these versions in that repository.
>>>>
>>>> 2012/1/29 Jim Johnson<[hidden email]>:
>>>>
>>>>> Pat,
>>>>>
>>>>> That sounds great.   Does one of you want to take ownership of the
>>>>> toolshed repository?
>>>>> At minimum, we should add developers to the list that can push changes.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> JJ
>>>>>
>>>>> On 1/28/12 9:37 AM, Gillevet Patrick wrote:
>>>>>
>>>>> Jim et al
>>>>>
>>>>> Amanda has most of the scripts working now and will be putting them up
>>>>> on the toolshed.
>>>>> She will be in touch as soon as the scripts are validated a couple of
>>>>> times with different datasets.
>>>>>
>>>>> cheers...
>>>>> Pat



Re: Existing efforts to convert the QIIME pipeline to Galaxy?

Gillevet Patrick
In reply to this post by Daniel McDonald
Skype ID: patrickgillevet


On Feb 7, 2012, at 12:33 PM, Daniel McDonald wrote:

> I'll be there
> Daniel
>
>
>
> On Feb 7, 2012, at 10:30, Greg Caporaso <[hidden email]> wrote:
>
>> OK, let's plan on a Skype call at 1pm MT/3pm ET this Thursday (9 Feb
>> 2012). I will initiate the call; my Skype ID is gregcaporaso. Please
>> let me know if you'd like to join the call, and send me your Skype ID.
>>
>> Looking forward to talking about this!
>>
>> Greg
>>
>> 2012/2/7 Rob Knight <[hidden email]>:
>>> I can't make the call at that time (am in Dhaka) but am very enthusiastic
>>> about that effort; please keep me in the loop. I am cc:ing a couple of the
>>> people in my lab who also indicated interest in the qiime/galaxy integration
>>> effort (though Antonio won't be able to make it either, for the same
>>> reason). Thanks!
>>>
>>> Rob

*****************************************************************************************
                                Patrick M. Gillevet, Ph.D.
                       Director, Microbiome Analysis Center
    Professor, Department of Environmental Science and Policy
               Affiliate Professor, School of Systems Biology
             George Mason University, Prince William Campus
                    10900 University Boulevard, MSN 4D4
                             Manassas, Virginia  20110

Office 703-993-1057     Room Occoquan-426     FAX 703-993-8430
                                      http://mbac.gmu.edu
******************************************************************************************














