Re: Creating multiple datasets in a libset


Re: Creating multiple datasets in a libset

burdettn
Hi Guys,
         Did you manage to get multiple datasets working? I can't seem to upload multiple files; only the last file appears in the history. I changed my code in "example_watch_folder.py", as mentioned in the thread below, to add multiple files separated by a newline and increased the sleep time:

for fname in os.listdir(in_folder):
            fullpath = os.path.join(in_folder, fname)
            print ' fullpath is [%s] ' % fullpath
            if os.path.isfile(fullpath):
                data = {}
                data['folder_id'] = library_folder_id
                data['file_type'] = 'auto'
                data['dbkey'] = ''
                data['upload_option'] = 'upload_paths'
                data['filesystem_paths'] = "/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz"
                print ' data is [%s] ' % str(data['filesystem_paths'])
                data['create_type'] = 'file'
                libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
                #TODO Handle this better, but the datatype isn't always
                # set for the followup workflow execution without this
                # pause.
                time.sleep(65)

However, I get the following crash:

./example_watch_folder.py 64f3209856a3cf4f2d034a1ad5bf851c http://barium-rbh/csiro/api/ /home/galaxy/galaxy-drop/input /home/galaxy/galaxy-drop/output "This One" f2db41e1fa331b3e

 fullpath is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
 data is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz
 /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
url is : http://barium-rbh/csiro/api/libraries/33b43b4e7093c91f/contents?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'file_type': 'auto', 'dbkey': '', 'create_type': 'file', 'folder_id': 'F33b43b4e7093c91f', 'upload_option': 'upload_paths', 'filesystem_paths': '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': 'ff5476bcf6c921fa'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['daecbdd824e1c349', '358eb58cd5463e0d', 'c0279aab05812500'], 'history': '3cc0effd29705aa3'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': '79966582feb6c081'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['19c51286b777bc04', '0f71f1fc170d4ab9', '256444f6e7017e58'], 'history': 'b701da857886499b'}
Traceback (most recent call last):
  File "./example_watch_folder.py", line 89, in <module>
    main(api_key, api_url, in_folder, out_folder, data_library, workflow )
  File "./example_watch_folder.py", line 75, in main
    shutil.move(fullpath, os.path.join(out_folder, fname))
  File "/usr/lib/python2.7/shutil.py", line 299, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 128, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'

It says there is no such file, but this file has already been copied from the input to the output directory. Any help would be much appreciated.

Neil
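
For comparison, a minimal sketch of how the newline-delimited 'filesystem_paths' value could be built from the watch folder instead of being hardcoded inside the per-file loop, assuming the same submit() helper and the library_id / library_folder_id variables from example_watch_folder.py (build_filesystem_paths is a hypothetical helper, added only for illustration):

import os

def build_filesystem_paths(in_folder):
    # Collect every regular file in the watch folder and join the full paths
    # into the newline-delimited string the 'filesystem_paths' field expects.
    paths = [os.path.join(in_folder, fname)
             for fname in sorted(os.listdir(in_folder))
             if os.path.isfile(os.path.join(in_folder, fname))]
    return "\n".join(paths)

data = {}
data['folder_id'] = library_folder_id
data['file_type'] = 'auto'
data['dbkey'] = ''
data['upload_option'] = 'upload_paths'
data['filesystem_paths'] = build_filesystem_paths(in_folder)
data['create_type'] = 'file'
# One submit then covers all files currently in the folder.
libset = submit(api_key, api_url + "libraries/%s/contents" % library_id,
                data, return_formatted=False)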

------------------------------

Message: 2
Date: Mon, 29 Apr 2013 16:11:39 -0400
From: Rob Leclerc <[hidden email]>
To: Dannon Baker <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [galaxy-dev] Creating multiple datasets in a libset
Message-ID:
        <CAGkd85fHSgO2YC1T+Frctyso9G5rfQb=_mLyHGSdxPM+s3=[hidden email]>
Content-Type: text/plain; charset="iso-8859-1"

Hi Dannon,

I've written some code to (i) query a dataset to ensure that it has been
uploaded after a submit, and (ii) ensure a resulting dataset has been
written to the filesystem.

#Block until all datasets have been uploaded
libset = submit(api_key, api_url + "libraries/%s/contents" % library_id,
                data, return_formatted=False)
for ds in libset:
    while True:
        uploaded_file = display(api_key,
                                api_url + 'libraries/%s/contents/%s' % (library_id, ds['id']),
                                return_formatted=False)
        if uploaded_file['misc_info'] == None:
            time.sleep(1)
        else:
            break

#Block until all result datasets have been saved to the filesystem
result_ds_url = api_url + 'histories/' + history_id + '/contents/' + dsh['id']
while True:
    result_ds = display(api_key, result_ds_url, return_formatted=False)
    if result_ds["state"] == 'ok':
        break
    else:
        time.sleep(1)
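
(history_id and dsh are not defined in the snippet above; judging from the workflow submission output printed earlier in the thread -- {'outputs': [...], 'history': ...} -- they would presumably come from the response of the workflow submit call. A minimal sketch under that assumption, reusing the same submit() and display() helpers:)

res = submit(api_key, api_url + 'workflows', wf_data, return_formatted=False)
history_id = res['history']            # encoded id of the history the workflow ran in
for out_id in res['outputs']:          # encoded ids of the output datasets
    result_ds_url = api_url + 'histories/' + history_id + '/contents/' + out_id
    while True:
        result_ds = display(api_key, result_ds_url, return_formatted=False)
        if result_ds['state'] == 'ok':
            break
        time.sleep(1)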


Rob Leclerc, PhD
<http://www.linkedin.com/in/robleclerc> <https://twitter.com/#!/robleclerc>
P: (US) +1-(917)-873-3037
P: (Shanghai) +86-1-(861)-612-5469
Personal Email: [hidden email]


On Mon, Apr 29, 2013 at 11:18 AM, Dannon Baker <[hidden email]>wrote:

> Yep, that example filesystem_paths you suggest should work fine.  The
> sleep() bit was a complete hack from the start, for simplicity in
> demonstrating a very basic pipeline, but what you probably want to do for a
> real implementation is query the dataset in question via the API, verify
> that the datatype/etc have been set, and only after that execute the
> workflow; instead of relying on sleep.
>
>
> On Mon, Apr 29, 2013 at 9:24 AM, Rob Leclerc <[hidden email]>wrote:
>
>> Hi Dannon,
>>
>> Thanks for the response. Sorry to be pedantic, but just to make sure that
>> I understand the interpretation of this field on the other side of the API,
>> I would need to have something like the following:
>>
>> data['filesystem_paths'] = "/home/me/file1.vcf \n /home/me/file2.vcf \n
>> /home/me/file3.vcf"
>>
>> I assume I should also increase the time.sleep() to reflect the uploading
>> of extra files?
>>
>> Cheers,
>>
>> Rob
>>
>> Rob Leclerc, PhD
>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>> P: (US) +1-(917)-873-3037
>> P: (Shanghai) +86-1-(861)-612-5469
>> Personal Email: [hidden email]
>>
>>
>> On Mon, Apr 29, 2013 at 9:15 AM, Dannon Baker <[hidden email]>wrote:
>>
>>> Hey Rob,
>>>
>>> That example_watch_folder.py submits exactly one path at a time, executes
>>> the workflow, and then does the next, all in separate transactions.  If you
>>> wanted to upload multiple filepaths at once, you'd just append more to the
>>> 'filesystem_paths' field (newline-separated paths).
>>>
>>> -Dannon
>>>
>>>
>>> On Fri, Apr 26, 2013 at 11:54 PM, Rob Leclerc <[hidden email]>wrote:
>>>
>>>> I'm looking at example_watch_folder.py and it's not clear from the
>>>> example how you submit multiple datasets to a library. In the example, the
>>>> first submit returns a libset [] with only a single entry and then proceeds
>>>> to iterate through each dataset in the libset in the following section:
>>>>
>>>> data = {}
>>>>    data['folder_id'] = library_folder_id
>>>>    data['file_type'] = 'auto'
>>>>    data['dbkey'] = ''
>>>>    data['upload_option'] = 'upload_paths'
>>>>    *data['filesystem_paths'] = fullpath*
>>>>    data['create_type'] = 'file'
>>>>    libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
>>>>    time.sleep(5)
>>>>    for ds in libset:
>>>>        if 'id' in ds:
>>>>            wf_data = {}
>>>>            wf_data['workflow_id'] = workflow['id']
>>>>            wf_data['history'] = "%s - %s" % (fname, workflow['name'])
>>>>            wf_data['ds_map'] = {}
>>>>            for step_id, ds_in in workflow['inputs'].iteritems():
>>>>                wf_data['ds_map'][step_id] = {'src':'ld', 'id':ds['id']}
>>>>            res = submit( api_key, api_url + 'workflows', wf_data, return_formatted=False)
>>>>
>>>> Rob Leclerc, PhD
>>>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>>>> P: (US) +1-(917)-873-3037
>>>> P: (Shanghai) +86-1-(861)-612-5469
>>>> Personal Email: [hidden email]
>>>>

Re: Creating multiple datasets in a libset

burdettn
Further, when I look in ~galaxy-dist/database/files/000 I can see that both files have been uploaded, but only the second file has a history associated with it.

Thanks
Neil


Re: Creating multiple datasets in a libset

Dannon Baker-2
Has the script created multiple histories? That's the intent: each file+workflow execution gets a separate history.

And as for why that file vanished, resulting in the exception -- is it possible you ran a second instance of the script, or moved the file?

-Dannon
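
(To check how many histories the script has created, one option is to list them through the API with the same display() helper used elsewhere in this thread; the /api/histories endpoint returns one entry per history. Whether display() is in scope in your script is an assumption here:)

histories = display(api_key, api_url + 'histories', return_formatted=False)
for h in histories:
    print h['id'], h['name']   # one line per history the account can see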



Re: Creating multiple datasets in a libset

rdlecler
In reply to this post by burdettn
Hi Neil,

I've attached my class method for uploading multiple files.

 def upload_files(self, fullpaths):
        """
            Uploads files from a disk location to a Galaxy library
            Accepts an array of full path filenames
            Example: fullpaths = ['/home/username/file1.txt', '/home/username/files2.txt']
        """
        if self.jsonstring == None:
            self.get_library()
            
        library_id = self.library_id
        library_folder_id = self.library_folder_id
        api_key = self.api_key
        api_url = self.api_url
        
        #Galaxy needs to read the pathnames as a new line delimited string
        #so we do that transformation here
        fullpaths_string = ""
        for path in fullpaths:
            fullpaths_string = fullpaths_string + path + "\n"
            
        fullpaths_string = fullpaths_string[:-1]
        data = {}
        data['folder_id'] = library_folder_id
        data['file_type'] = 'auto'
        data['dbkey'] = ''
        data['upload_option'] = 'upload_paths'
        data['filesystem_paths'] = fullpaths_string
        data['create_type'] = 'file'
        #Start the upload. This will return right away, but it may take awhile
        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
        
        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database
        for ds in libset:
            last_filesize = 0
            while True:
                #If file_size != 0 and the file_size is unchanged after a second iteration, then we assume the disk write is finished
                ds_id = ds['id']
                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)
                print uploaded_file
                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:
                    break
                else:
                    last_filesize = uploaded_file['file_size']
                    time.sleep(2)
        self.libset = libset
        return libset
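
For reference, a minimal sketch of how this method might be called; the surrounding class isn't shown in the thread, so GalaxyLibraryClient and its constructor arguments are hypothetical names used only for illustration:

client = GalaxyLibraryClient(api_key, api_url, library_name="This One")   # hypothetical wrapper class
libset = client.upload_files(['/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz',
                              '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'])
for ds in libset:
    print ds['id'], ds['name']   # encoded library dataset ids to feed into ds_map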


Rob Leclerc, PhD
P: (US) +1-(917)-873-3037
P: (Shanghai) +86-1-(861)-612-5469
Personal Email: [hidden email]



Re: Creating multiple datasets in a libset

burdettn

Hi Rob,

        Thanks for the class. I assume you created it in "example_watch_folder.py", or whatever you may have renamed it to? Can you send me the full Python script if possible?

I modified example_watch_folder.py as follows (using your code):

if __name__ == '__main__':
    try:
        api_key = sys.argv[1]
        api_url = sys.argv[2]
        #in_folder = sys.argv[3]
        #out_folder = sys.argv[4]
        fullpaths = sys.argv[3]
        data_library = sys.argv[4]
        workflow = sys.argv[5]
    except IndexError:
        print 'usage: %s key url in_folder out_folder data_library workflow' % os.path.basename( sys.argv[0] )
        sys.exit( 1 )
    #main(api_key, api_url, in_folder, out_folder, data_library, workflow )
    main(api_key, api_url, fullpaths, data_library, workflow )

#def main(api_key, api_url, in_folder, out_folder, data_library, workflow):
def main(api_key, api_url, fullpaths, data_library, workflow):
...

while 1:
        #Galaxy needs to read the pathnames as a new line delimited string
        #so we do that transformation here
        print fullpaths
        fullpaths_string = ""
        for path in fullpaths:
            fullpaths_string = fullpaths_string + path + "\n"

        fullpaths_string = fullpaths_string[:-1]
        data = {}
        data['folder_id'] = library_folder_id
        data['file_type'] = 'auto'
        data['dbkey'] = ''
        data['upload_option'] = 'upload_paths'
        data['filesystem_paths'] = fullpaths_string
        data['create_type'] = 'file'
        print "before libset "
        #Start the upload. This will return right away, but it may take awhile
        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
        print "after libset "
        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database
        for ds in libset:
            last_filesize = 0
            while True:
                #If file_size != 0 and the file_size is unchanged after a second iteration, then we assume the disk write is finished
                ds_id = ds['id']
                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)
                print uploaded_file
                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:
                    break
                else:
                    last_filesize = uploaded_file['file_size']
                    time.sleep(2)

However, when I run this I get the following output, i.e. there is a newline after each character (should you not use os.path.dirname?):

./milxview_watch_folder.py de5f19fcf64a47ca9b61cfc3bf41490c http://barium-rbh/csiro/api/ "/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz,/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz" "This One" f2db41e1fa331b3e
/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz,/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz
/
h
o
m
e
/
g
a
l
a
x
y
[output continues with one character per line for the rest of the comma-separated string, printed twice]
.
g
z
before libset
after libset
Traceback (most recent call last):
  File "./milxview_watch_folder.py", line 127, in <module>
    main(api_key, api_url, fullpaths, data_library, workflow )
  File "./milxview_watch_folder.py", line 70, in main
    ds_id = ds['id']
TypeError: string indices must be integers, not str
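
The one-character-per-line output and the TypeError both point to fullpaths being a plain string (sys.argv[3]), so "for path in fullpaths" iterates over characters and the library submit receives nonsense paths; libset is then presumably not the expected list of dataset dictionaries. A minimal sketch of a fix, assuming the comma-separated command-line format used in the invocation above:

# sys.argv[3] arrives as one comma-separated string,
# e.g. "/path/a.nii.gz,/path/b.nii.gz" -- split it into a real list first
# (assumption: commas never appear inside the filenames).
fullpaths = sys.argv[3].split(',')

# The existing loop (or a join) then yields the newline-delimited string Galaxy expects.
fullpaths_string = "\n".join(p.strip() for p in fullpaths if p.strip())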

 

 

From: Rob Leclerc [mailto:[hidden email]]
Sent: Wednesday, 29 May 2013 11:38 PM
To: Burdett, Neil (ICT Centre, Herston - RBWH)
Cc: [hidden email]; Dannon Baker
Subject: Re: Creating multiple datasets in a libset

 

Hi Neil,

 

I've attached my class function for uploading multiple files. 

 

 def upload_files(self, fullpaths):

        """

            Uploads files from a disk location to a Galaxy library

            Accepts an array of full path filenames

            Example: fullpaths = ['/home/username/file1.txt', '/home/username/files2.txt']

        """

        if self.jsonstring == None:

            self.get_library()

            

        library_id = self.library_id

        library_folder_id = self.library_folder_id

        api_key = self.api_key

        api_url = self.api_url

        

        #Galaxy needs to read the pathnames as a new line delimited string

        #so we do that transformation here

        fullpaths_string = ""

        for path in fullpaths:

            fullpaths_string = fullpaths_string + path + "\n"

            

        fullpaths_string = fullpaths_string[:-1]

        data = {}

        data['folder_id'] = library_folder_id

        data['file_type'] = 'auto'

        data['dbkey'] = ''

        data['upload_option'] = 'upload_paths'

        data['filesystem_paths'] = fullpaths_string

        data['create_type'] = 'file'

        #Start the upload. This will return right away, but it may take awhile

        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)

        

        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database

        for ds in libset:

            last_filesize = 0

            while True:

                #If file_size != 0 and the file_size is different after a second iteration, then we assume the disk write is finished

                ds_id = ds['id']

                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)

                print uploaded_file

                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:

                    break

                else:

                    last_filesize = uploaded_file['file_size']

                    time.sleep(2)

        self.libset = libset

        return libset

 


Rob Leclerc, PhD

P: (US) +1-(917)-873-3037

P: (Shanghai) +86-1-(861)-612-5469

Personal Email: [hidden email]

 

On Wed, May 29, 2013 at 12:45 AM, <[hidden email]> wrote:

Hi Guys,
         Did you manage to get multiple datasets working? I can't seem to upload multiple files. Only the last file appears in the history. I changed my code as mentioned in the thread below in "example_watch_folder.py" to add multiple files separated by a new line and increased the sleep time:

for fname in os.listdir(in_folder):
            fullpath = os.path.join(in_folder, fname)
            print ' fullpath is [%s] ' % fullpath
            if os.path.isfile(fullpath):
                data = {}
                data['folder_id'] = library_folder_id
                data['file_type'] = 'auto'
                data['dbkey'] = ''
                data['upload_option'] = 'upload_paths'
                data['filesystem_paths'] = "/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz"
                print ' data is [%s] ' % str(data['filesystem_paths'])
                data['create_type'] = 'file'
                libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
                #TODO Handle this better, but the datatype isn't always
                # set for the followup workflow execution without this
                # pause.
                time.sleep(65)

However, I get the following crash:

./example_watch_folder.py 64f3209856a3cf4f2d034a1ad5bf851c http://barium-rbh/csiro/api/ /home/galaxy/galaxy-drop/input /home/galaxy/galaxy-drop/output "This One" f2db41e1fa331b3e

 fullpath is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
 data is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz
 /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
url is : http://barium-rbh/csiro/api/libraries/33b43b4e7093c91f/contents?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'file_type': 'auto', 'dbkey': '', 'create_type': 'file', 'folder_id': 'F33b43b4e7093c91f', 'upload_option': 'upload_paths', 'filesystem_paths': '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': 'ff5476bcf6c921fa'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['daecbdd824e1c349', '358eb58cd5463e0d', 'c0279aab05812500'], 'history': '3cc0effd29705aa3'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': '79966582feb6c081'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['19c51286b777bc04', '0f71f1fc170d4ab9', '256444f6e7017e58'], 'history': 'b701da857886499b'}
Traceback (most recent call last):
  File "./example_watch_folder.py", line 89, in <module>
    main(api_key, api_url, in_folder, out_folder, data_library, workflow )
  File "./example_watch_folder.py", line 75, in main
    shutil.move(fullpath, os.path.join(out_folder, fname))
  File "/usr/lib/python2.7/shutil.py", line 299, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 128, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'

It says there is no such file, but this file has already been copied from the input to the output directory. Any help much appreciated

Neil

------------------------------

Message: 2
Date: Mon, 29 Apr 2013 16:11:39 -0400
From: Rob Leclerc <[hidden email]>
To: Dannon Baker <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [galaxy-dev] Creating multiple datasets in a libset
Message-ID:
        <CAGkd85fHSgO2YC1T+Frctyso9G5rfQb=_mLyHGSdxPM+s3=[hidden email]>
Content-Type: text/plain; charset="iso-8859-1"

Hi Dannon,

I've written some code to (i) query a dataset to ensure that it's been
uploaded after a submit and (ii) to ensure a resulting dataset has been
written to the file.

*#Block until all datasets have been uploaded*
libset = submit(api_key, api_url + "libraries/%s/contents" % library_id,
data, return_formatted = False)
for ds in libset:
    while True:
        uploaded_file = display(api_key, api_url +
'libraries/%s/contents/%s' %(library_id, ds['id']), return_formatted=False)
        if uploaded_file['misc_info'] == None:
            time.sleep(1)
        else:
            break

*#Block until all result datasets have been saved to the filesystem*
result_ds_url = api_url + 'histories/' + history_id + '/contents/' +
dsh['id'];
while True:
    result_ds = display(api_key, result_ds_url, return_formatted=False)
        if result_ds["state"] == 'ok':
            break
        else:
            time.sleep(1)


Rob Leclerc, PhD
<http://www.linkedin.com/in/robleclerc> <https://twitter.com/#!/robleclerc>
P: (US) <a href="tel:%2B1-%28917%29-873-3037">+1-(917)-873-3037
P: (Shanghai) <a href="tel:%2B86-1-%28861%29-612-5469">+86-1-(861)-612-5469
Personal Email: [hidden email]


On Mon, Apr 29, 2013 at 11:18 AM, Dannon Baker <[hidden email]>wrote:

> Yep, that example filesystem_paths you suggest should work fine.  The
> sleep() bit was a complete hack from the start, for simplicity in
> demonstrating a very basic pipeline, but what you probably want to do for a
> real implementation is query the dataset in question via the API, verify
> that the datatype/etc have been set, and only after that execute the
> workflow; instead of relying on sleep.
>
>
> On Mon, Apr 29, 2013 at 9:24 AM, Rob Leclerc <[hidden email]>wrote:
>
>> Hi Dannon,
>>
>> Thanks for the response. Sorry to be pedantic, but just to make sure that
>> I understand the interpretation of this field on the other side of the API,
>> I would need to have something like the following:
>>
>> data['filesystem_paths'] = "/home/me/file1.vcf \n /home/me/file2.vcf /n
>> /home/me/file3.vcf"
>>
>> I assume I should also increase the time.sleep() to reflect the uploading
>> of extra files?
>>
>> Cheers,
>>
>> Rob
>>
>> Rob Leclerc, PhD
>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>> P: (US) <a href="tel:%2B1-%28917%29-873-3037">+1-(917)-873-3037
>> P: (Shanghai) <a href="tel:%2B86-1-%28861%29-612-5469">+86-1-(861)-612-5469
>> Personal Email: [hidden email]
>>
>>
>> On Mon, Apr 29, 2013 at 9:15 AM, Dannon Baker <[hidden email]>wrote:
>>
>>> Hey Rob,
>>>
>>> That example_watch_folder.py does just submit exactly one at a time,
>>> executes the workflow, and then does the next all in separate transactions.
>>>  If you wanted to upload multiple filepaths at once, you'd just append more
>>> to the ''filesystem_paths' field (newline separated paths).
>>>
>>> -Dannon
>>>
>>>
>>> On Fri, Apr 26, 2013 at 11:54 PM, Rob Leclerc <[hidden email]>wrote:
>>>
>>>> I'm looking at example_watch_folder.py and it's not clear from the
>>>> example how you submit multiple datasets to a library. In the example, the
>>>> first submit returns a libset [] with only a single entry and then proceeds
>>>> to iterate through each dataset in the libset in the following section:
>>>>
>>>> data = {}
>>>>
>>>>    data['folder_id'] = library_folder_id
>>>>
>>>>    data['file_type'] = 'auto'
>>>>
>>>>    data['dbkey'] = ''
>>>>
>>>>    data['upload_option'] = 'upload_paths'
>>>>
>>>>
>>>>
>>>> *data['filesystem_paths'] = fullpath*
>>>>
>>>>    data['create_type'] = 'file'
>>>>
>>>>    libset = submit(api_key, api_url + "libraries/%s/contents" %
>>>> library_id, data, return_formatted = False)
>>>>
>>>>    time.sleep(5)
>>>>
>>>>    for ds in libset:
>>>>
>>>>        if 'id' in ds:
>>>>
>>>>                         wf_data = {}
>>>>
>>>>                         wf_data['workflow_id'] = workflow['id']
>>>>
>>>>                         wf_data['history'] = "%s - %s" % (fname,
>>>> workflow['name'])
>>>>
>>>>                         wf_data['ds_map'] = {}
>>>>
>>>>                         for step_id, ds_in in workflow['inputs'
>>>> ].iteritems():
>>>>
>>>>                             wf_data['ds_map'][step_id] = {'src':'ld',
>>>> 'id':ds['id']}
>>>>
>>>>                         res = submit( api_key, api_url + 'workflows',
>>>> wf_data, return_formatted=False)
>>>>
>>>>
>>>>
>>>> Rob Leclerc, PhD
>>>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>>>> P: (US) +1-(917)-873-3037
>>>> P: (Shanghai) +86-1-(861)-612-5469
>>>> Personal Email: [hidden email]
>>>>
>>>>
>>>
>>>
>>
>

 



Re: Creating multiple datasets in a libset

rdlecler
Hi Neil,

Is fullpaths a *new line* delimited string?

Rob Leclerc, PhD
P: (US) +1-(917)-873-3037
P: (Shanghai) +86-1-(861)-612-5469
Personal Email: [hidden email]


On May 30, 2013, at 1:09 AM, <[hidden email]> wrote:

Hi Rob,

        Thanks for the class. I assume you created it in “example_watch_folder.py” or whatever you may have renamed it to? Can you send me the full Python script if possible?

 

I modified example_watch_folder.py as follows (using your code):

 

if __name__ == '__main__':
    try:
        api_key = sys.argv[1]
        api_url = sys.argv[2]
        #in_folder = sys.argv[3]
        #out_folder = sys.argv[4]
        fullpaths = sys.argv[3]
        data_library = sys.argv[4]
        workflow = sys.argv[5]
    except IndexError:
        print 'usage: %s key url in_folder out_folder data_library workflow' % os.path.basename( sys.argv[0] )
        sys.exit( 1 )
    #main(api_key, api_url, in_folder, out_folder, data_library, workflow )
    main(api_key, api_url, fullpaths, data_library, workflow )

#def main(api_key, api_url, in_folder, out_folder, data_library, workflow):
def main(api_key, api_url, fullpaths, data_library, workflow):
...

while 1:
        #Galaxy needs to read the pathnames as a new line delimited string
        #so we do that transformation here
        print fullpaths
        fullpaths_string = ""
        for path in fullpaths:
            fullpaths_string = fullpaths_string + path + "\n"

        fullpaths_string = fullpaths_string[:-1]
        data = {}
        data['folder_id'] = library_folder_id
        data['file_type'] = 'auto'
        data['dbkey'] = ''
        data['upload_option'] = 'upload_paths'
        data['filesystem_paths'] = fullpaths_string
        data['create_type'] = 'file'
        print "before libset "
        #Start the upload. This will return right away, but it may take awhile
        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
        print "after libset "
        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database
        for ds in libset:
            last_filesize = 0
            while True:
                #If file_size != 0 and the file_size is different after a second iteration, then we assume the disk write is finished
                ds_id = ds['id']
                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)
                print uploaded_file
                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:
                    break
                else:
                    last_filesize = uploaded_file['file_size']
                    time.sleep(2)

 

However, when I run this I get the following output (note there is a new line after each character; should you not use os.path.dirname?):

 

./milxview_watch_folder.py de5f19fcf64a47ca9b61cfc3bf41490c http://barium-rbh/csiro/api/ "/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz,/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz" "This One" f2db41e1fa331b3e
/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz,/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz
/
h
o
m
e
[... the output continues with one character of the comma-separated path string per line; the full string is echoed this way twice ...]
g
z
before libset
after libset
Traceback (most recent call last):
  File "./milxview_watch_folder.py", line 127, in <module>
    main(api_key, api_url, fullpaths, data_library, workflow )
  File "./milxview_watch_folder.py", line 70, in main
    ds_id = ds['id']
TypeError: string indices must be integers, not str
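
The output above points at the cause: fullpaths comes straight from sys.argv[3] as one comma-separated string, so "for path in fullpaths" walks over individual characters. The resulting one-character paths do not exist on disk, so the create call presumably does not return the expected list of dataset dictionaries, which is why ds['id'] raises the TypeError. A minimal sketch of the fix, assuming the same variable names as the script above:

#Split the comma-separated command-line argument into a list of paths first,
#then join the list with newlines for the 'filesystem_paths' field.
fullpaths = sys.argv[3].split(',')
fullpaths = [p.strip() for p in fullpaths]   #drop stray spaces around each path
fullpaths_string = "\n".join(fullpaths)
data['filesystem_paths'] = fullpaths_string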

 

 

From: Rob Leclerc [[hidden email]]
Sent: Wednesday, 29 May 2013 11:38 PM
To: Burdett, Neil (ICT Centre, Herston - RBWH)
Cc: [hidden email]; Dannon Baker
Subject: Re: Creating multiple datasets in a libset

 

Hi Neil,

 

I've attached my class function for uploading multiple files. 

 

 def upload_files(self, fullpaths):
        """
            Uploads files from a disk location to a Galaxy library
            Accepts an array of full path filenames
            Example: fullpaths = ['/home/username/file1.txt', '/home/username/files2.txt']
        """
        if self.jsonstring == None:
            self.get_library()

        library_id = self.library_id
        library_folder_id = self.library_folder_id
        api_key = self.api_key
        api_url = self.api_url

        #Galaxy needs to read the pathnames as a new line delimited string
        #so we do that transformation here
        fullpaths_string = ""
        for path in fullpaths:
            fullpaths_string = fullpaths_string + path + "\n"

        fullpaths_string = fullpaths_string[:-1]
        data = {}
        data['folder_id'] = library_folder_id
        data['file_type'] = 'auto'
        data['dbkey'] = ''
        data['upload_option'] = 'upload_paths'
        data['filesystem_paths'] = fullpaths_string
        data['create_type'] = 'file'
        #Start the upload. This will return right away, but it may take awhile
        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)

        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database
        for ds in libset:
            last_filesize = 0
            while True:
                #If file_size != 0 and the file_size is different after a second iteration, then we assume the disk write is finished
                ds_id = ds['id']
                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)
                print uploaded_file
                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:
                    break
                else:
                    last_filesize = uploaded_file['file_size']
                    time.sleep(2)
        self.libset = libset
        return libset
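
A small optional variation on the polling loop above: the file_size check can spin forever if an upload fails outright, so a wall-clock timeout is a cheap guard. This is only a sketch; the 600-second limit is an arbitrary assumption.

        TIMEOUT = 600  #seconds; arbitrary guard so a failed upload cannot block forever
        for ds in libset:
            last_filesize = 0
            started = time.time()
            while True:
                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' % (library_id, ds['id']), return_formatted=False)
                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:
                    break
                if time.time() - started > TIMEOUT:
                    raise Exception("Timed out waiting for dataset %s to be written" % ds['id'])
                last_filesize = uploaded_file['file_size']
                time.sleep(2)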

 


Rob Leclerc, PhD

P: (US) +1-(917)-873-3037

P: (Shanghai) +86-1-(861)-612-5469

Personal Email: [hidden email]

 

On Wed, May 29, 2013 at 12:45 AM, <[hidden email]> wrote:

Hi Guys,
         Did you manage to get multiple datasets working? I can't seem to upload multiple files. Only the last file appears in the history. I changed my code as mentioned in the thread below in "example_watch_folder.py" to add multiple files separated by a new line and increased the sleep time:

for fname in os.listdir(in_folder):
            fullpath = os.path.join(in_folder, fname)
            print ' fullpath is [%s] ' % fullpath
            if os.path.isfile(fullpath):
                data = {}
                data['folder_id'] = library_folder_id
                data['file_type'] = 'auto'
                data['dbkey'] = ''
                data['upload_option'] = 'upload_paths'
                data['filesystem_paths'] = "/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz"
                print ' data is [%s] ' % str(data['filesystem_paths'])
                data['create_type'] = 'file'
                libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
                #TODO Handle this better, but the datatype isn't always
                # set for the followup workflow execution without this
                # pause.
                time.sleep(65)

However, I get the following crash:

./example_watch_folder.py 64f3209856a3cf4f2d034a1ad5bf851c http://barium-rbh/csiro/api/ /home/galaxy/galaxy-drop/input /home/galaxy/galaxy-drop/output "This One" f2db41e1fa331b3e

 fullpath is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
 data is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz
 /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
url is : http://barium-rbh/csiro/api/libraries/33b43b4e7093c91f/contents?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'file_type': 'auto', 'dbkey': '', 'create_type': 'file', 'folder_id': 'F33b43b4e7093c91f', 'upload_option': 'upload_paths', 'filesystem_paths': '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': 'ff5476bcf6c921fa'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['daecbdd824e1c349', '358eb58cd5463e0d', 'c0279aab05812500'], 'history': '3cc0effd29705aa3'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': '79966582feb6c081'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['19c51286b777bc04', '0f71f1fc170d4ab9', '256444f6e7017e58'], 'history': 'b701da857886499b'}
Traceback (most recent call last):
  File "./example_watch_folder.py", line 89, in <module>
    main(api_key, api_url, in_folder, out_folder, data_library, workflow )
  File "./example_watch_folder.py", line 75, in main
    shutil.move(fullpath, os.path.join(out_folder, fname))
  File "/usr/lib/python2.7/shutil.py", line 299, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 128, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'

It says there is no such file, but this file has already been copied from the input to the output directory. Any help much appreciated

Neil

------------------------------

Message: 2
Date: Mon, 29 Apr 2013 16:11:39 -0400
From: Rob Leclerc <[hidden email]>
To: Dannon Baker <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [galaxy-dev] Creating multiple datasets in a libset
Message-ID:
        <CAGkd85fHSgO2YC1T+Frctyso9G5rfQb=_mLyHGSdxPM+s3=[hidden email]>
Content-Type: text/plain; charset="iso-8859-1"

Hi Dannon,

I've written some code to (i) query a dataset to ensure that it's been
uploaded after a submit and (ii) to ensure a resulting dataset has been
written to the file.

#Block until all datasets have been uploaded
libset = submit(api_key, api_url + "libraries/%s/contents" % library_id,
                data, return_formatted = False)
for ds in libset:
    while True:
        uploaded_file = display(api_key,
            api_url + 'libraries/%s/contents/%s' % (library_id, ds['id']),
            return_formatted=False)
        if uploaded_file['misc_info'] == None:
            time.sleep(1)
        else:
            break

#Block until all result datasets have been saved to the filesystem
result_ds_url = api_url + 'histories/' + history_id + '/contents/' + dsh['id']
while True:
    result_ds = display(api_key, result_ds_url, return_formatted=False)
    if result_ds["state"] == 'ok':
        break
    else:
        time.sleep(1)


Rob Leclerc, PhD
<http://www.linkedin.com/in/robleclerc> <https://twitter.com/#!/robleclerc>
P: (US) +1-(917)-873-3037
P: (Shanghai) +86-1-(861)-612-5469
Personal Email: [hidden email]


On Mon, Apr 29, 2013 at 11:18 AM, Dannon Baker <[hidden email]>wrote:

> Yep, that example filesystem_paths you suggest should work fine.  The
> sleep() bit was a complete hack from the start, for simplicity in
> demonstrating a very basic pipeline, but what you probably want to do for a
> real implementation is query the dataset in question via the API, verify
> that the datatype/etc have been set, and only after that execute the
> workflow; instead of relying on sleep.
>
>
> On Mon, Apr 29, 2013 at 9:24 AM, Rob Leclerc <[hidden email]>wrote:
>
>> Hi Dannon,
>>
>> Thanks for the response. Sorry to be pedantic, but just to make sure that
>> I understand the interpretation of this field on the other side of the API,
>> I would need to have something like the following:
>>
>> data['filesystem_paths'] = "/home/me/file1.vcf \n /home/me/file2.vcf \n
>> /home/me/file3.vcf"
>>
>> I assume I should also increase the time.sleep() to reflect the uploading
>> of extra files?
>>
>> Cheers,
>>
>> Rob
>>
>> Rob Leclerc, PhD
>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>> P: (US) +1-(917)-873-3037
>> P: (Shanghai) +86-1-(861)-612-5469
>> Personal Email: [hidden email]
>>
>>
>> On Mon, Apr 29, 2013 at 9:15 AM, Dannon Baker <[hidden email]>wrote:
>>
>>> Hey Rob,
>>>
>>> That example_watch_folder.py does just submit exactly one at a time,
>>> executes the workflow, and then does the next all in separate transactions.
>>>  If you wanted to upload multiple filepaths at once, you'd just append more
>>> to the ''filesystem_paths' field (newline separated paths).
>>>
>>> -Dannon
>>>
>>>
>>> On Fri, Apr 26, 2013 at 11:54 PM, Rob Leclerc <[hidden email]>wrote:
>>>
>>>> I'm looking at example_watch_folder.py and it's not clear from the
>>>> example how you submit multiple datasets to a library. In the example, the
>>>> first submit returns a libset [] with only a single entry and then proceeds
>>>> to iterate through each dataset in the libset in the following section:
>>>>
>>>> data = {}
>>>>
>>>>    data['folder_id'] = library_folder_id
>>>>
>>>>    data['file_type'] = 'auto'
>>>>
>>>>    data['dbkey'] = ''
>>>>
>>>>    data['upload_option'] = 'upload_paths'
>>>>
>>>>
>>>>
>>>> *data['filesystem_paths'] = fullpath*
>>>>
>>>>    data['create_type'] = 'file'
>>>>
>>>>    libset = submit(api_key, api_url + "libraries/%s/contents" %
>>>> library_id, data, return_formatted = False)
>>>>
>>>>    time.sleep(5)
>>>>
>>>>    for ds in libset:
>>>>
>>>>        if 'id' in ds:
>>>>
>>>>                         wf_data = {}
>>>>
>>>>                         wf_data['workflow_id'] = workflow['id']
>>>>
>>>>                         wf_data['history'] = "%s - %s" % (fname,
>>>> workflow['name'])
>>>>
>>>>                         wf_data['ds_map'] = {}
>>>>
>>>>                         for step_id, ds_in in workflow['inputs'
>>>> ].iteritems():
>>>>
>>>>                             wf_data['ds_map'][step_id] = {'src':'ld',
>>>> 'id':ds['id']}
>>>>
>>>>                         res = submit( api_key, api_url + 'workflows',
>>>> wf_data, return_formatted=False)
>>>>
>>>>
>>>>
>>>> Rob Leclerc, PhD
>>>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>>>> P: (US) +1-(917)-873-3037
>>>> P: (Shanghai) +86-1-(861)-612-5469
>>>> Personal Email: [hidden email]
>>>>
>>>>
>>>
>>>
>>
>

 



Re: Creating multiple datasets in a libset

rdlecler
Hi Neil,

I've attached the wrapper classes I'm using for the Galaxy API, which allow me to upload multiple files. This was based on the watch_folder example. There is also a mapping mechanism to map input and output files to their workflow labels. See the commented section below for some hints on how to use it.


#################START##################

import config
import os
import shutil
import sys
import time
import logging

sys.path.insert( 0, os.path.dirname( __file__ ) )
from common import submit, display

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('galaxy_api_utils')

class GalaxyLibrary:
    """
        Encapsulates basic functionality for creating a library and uploading files to a library
    """
    def __init__(self, api_key, api_url, libraryname):
        self.api_key = api_key
        self.api_url = api_url
        self.libraryname = libraryname
        self.jsonstring = None
        self.libset = {}
    

    def get_library(self):
        """
            Gets the library created/accessed when this class was initialized
            If the library does not exist then build it
            Returns library_id and the library_folder_id
        """
        if self.jsonstring != None:
            return self.jsonstring
        api_key = self.api_key
        api_url = self.api_url
        libraryname = self.libraryname
        
        libs = display(api_key, api_url + 'libraries', return_formatted=False)
        library_id = None
        for library in libs:
            if library['name'] == libraryname:
                library_id = library['id']
        if not library_id:
            lib_create_data = {'name':libraryname}
            library = submit(api_key, api_url + 'libraries', lib_create_data, return_formatted=False)
            library_id = library['id']
        folders = display(api_key, api_url + "libraries/%s/contents" % library_id, return_formatted = False)
        for f in folders:
            if f['name'] == "/":
                library_folder_id = f['id']
        if not library_id or not library_folder_id:
            log.error("GalaxyLibrary:get_library    Failure to configure library destination.")
            raise Exception('Failed to configure library destination')
        self.jsonstring = { 'library_id' : library_id, 'library_folder_id' : library_folder_id }
        return self.jsonstring
    

    def upload_files(self, fullpaths):
        """
            Uploads files from a disk location to a Galaxy library
            Accepts an array of full path filenames
            Example: fullpaths = ['/home/username/file1.txt', '/home/username/files2.txt']
        """
        if self.jsonstring == None:
            self.get_library()
            
        library_id = self.jsonstring['library_id']
        library_folder_id = self.jsonstring['library_folder_id']
        api_key = self.api_key
        api_url = self.api_url
        
        #Galaxy needs to read the pathnames as a new line delimited string
        #so we do that transformation here
        fullpaths_string = ""
        for path in fullpaths:
            fullpaths_string = fullpaths_string + path + "\n"
            
        fullpaths_string = fullpaths_string[:-1]
        data = {}
        data['folder_id'] = library_folder_id
        data['file_type'] = 'auto'
        data['dbkey'] = ''
        data['upload_option'] = 'upload_paths'
        data['filesystem_paths'] = fullpaths_string
        data['create_type'] = 'file'
        #Start the upload. This will return right away, but it may take awhile
        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
        
        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database
        for ds in libset:
            last_filesize = 0
            while True:
                #If file_size != 0 and the file_size is different after a second iteration, then we assume the disk write is finished
                ds_id = ds['id']
                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)
                print uploaded_file
                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:
                    break
                else:
                    last_filesize = uploaded_file['file_size']
                    time.sleep(2)
        self.libset = libset
        return libset
    
    
    def get_libset(self): 
        return self.libset
    
    
    def get_file_id(self, filename):
        """
            Gets the Galaxy file id for a file that we've uploaded with this object
        """
        libset = self.get_libset()
        for myfile in libset:
            if myfile['name'] == filename:
                return myfile['id']
        return None


class GalaxyWorkflow:
    """
        Encapsulates basic functionality to execute a workflow
    """
    def __init__(self, api_key, api_url, workflow_id):
        self.api_key = api_key
        self.api_url = api_url
        self.workflow_id = workflow_id
        self.history_name = None
        self.history_id = None

    def execute_workflow(self, workflow_id, step_id_mapping, history_name):
        """
            Variable: step_id_mapping
            Example mapping:
            step_id_mapping = { 'input_label_1' : { 'src' : 'ld', 'id' : '<file_id_from_uploaded_lib>' },
                                'input_label_2' : { 'src' : 'ldda', 'id' : '<file_id_from_dataset>' } }
            where input_label_1 is the name at inputs.<step_id>.label which you want to associate with a particular file id
            Files uploaded should use 'src':'ld', while library datasets should use 'src':'ldda'
            history_name: should be a unique name for this run.
        """
        api_key = self.api_key
        api_url = self.api_url
        
        workflow = display(api_key, api_url + 'workflows/%s' % self.workflow_id, return_formatted = False)
        wf_data = {}
        wf_data['workflow_id'] = workflow['id']
        self.history_name = "%s - %s" % (history_name, workflow['name'])
        wf_data['history'] = self.history_name
        ds_map = {}
        
        #We need to map the workflow's input label names to the step ids, since changes to the galaxy database
        #will change the workflow step_id numbers
        for step_id, val in workflow['inputs'].iteritems():
            label = val['label']
            ds_map[step_id] =  step_id_mapping[label]     
        wf_data['ds_map'] = ds_map
        
        #Run the workflow. This will return immediately, but it will take awhile to run
        res = submit( api_key, api_url + 'workflows', wf_data, return_formatted=False)
        if res:
            log.info('GalaxyWorkflow:execute_workflow    workflow executed. Result:' + str(res))
            self.history_id = res['history']
            return GalaxyHistory(api_key, api_url, self.history_id)
        return None


class GalaxyHistory:
    """
        encapsulates basic functionality for exporting results from a history.
    """
    def __init__(self, api_key, api_url, history_id):
        self.api_key = api_key
        self.api_url = api_url
        self.history_id = history_id
    
    # Export all designated 'output' files in the workflow to an export location outside of galaxy;
    # see the docstring below for details.
    def export_results(self, dest_folder, copyFileMap):
        """
            Export all designated 'output' files in the workflow to an export location outside of galaxy
            This just copies datasets located in $GALAXY_HOME/database/files/<a sub dir> to an output directory
            dest_folder is in a format such as /Users/Robs/
            copyFileMap maps the output names specified in the workflow to export filenames
            Example format:
            copyFileMap = { 'galaxy_output_filename1' : 'myarchivename1.ext', 'galaxy_output_filename2' : 'myarchivename2.ext' }
        """
        api_key = self.api_key
        api_url = self.api_url
        history_id = self.history_id
        if dest_folder.endswith('/') == False:
            dest_folder = dest_folder + '/'

        history_contents = display(api_key, api_url + 'histories/' + history_id + '/contents', return_formatted=False)
        
        #Use the copyFileMap to copy the files designated with a specific output label to a filename for exporting out of galaxy
        for internalfn, newfn in copyFileMap.iteritems():
            for dsh in history_contents:
                if 'name' in dsh:
                    if dsh['name'] == internalfn:
                        result_ds_url = api_url + 'histories/' + history_id + '/contents/' + dsh['id'];
                        
                        #Block until all results have been written to disk
                        ds_file_size = 0
                        while True:
                            result_ds = display(api_key, result_ds_url, return_formatted=False)
                            print result_ds
                            if result_ds["file_size"] != 0 and result_ds["file_size"] == ds_file_size:
                                break
                            else:
                                ds_file_size = result_ds["file_size"]
                                time.sleep(2)
                        
                        result_file_name = result_ds['file_name'];
                        fname = dest_folder + newfn
                        shutil.copy(result_file_name, fname)

# def main():
#     
#     EXAMPLE:
#     workflow_id = "<your_workflow_id_here>"
#     api_key = "<your_api_key_here>"
#     api_url = "http://localhost:8080/api/"
#     library_name =  "<your_libraryname_here>"
#     history_name = library_name + '-hist'
#     
#     gl = GalaxyLibrary(api_key, api_url, library_name )
#     fileset = ['/Users/Rob/input1.vcf', '/Users/Rob/input1.bed']
#     gl.upload_files(fileset);
#     libset =  gl.get_libset()
#     #Map workflow input labels with input files
#     step_id_mapping =  get_map(libset)
#     #Add Archived Datasets
#     
#     gwf = GalaxyWorkflow(api_key, api_url, workflow_id)
#     galaxyHistory = gwf.execute_workflow(workflow_id, step_id_mapping, history_name)
#     
#     #Map the output label with a filename to export the results once the run is completed
#     copyFileMap = {"output1":"output_file.txt"}
#     #execute_workflow already returned a GalaxyHistory object above, so use it directly
#     galaxyHistory.export_results("/Users/Rob", copyFileMap)
#     
#Map a workflows input labels to the file ids of input files. Assumes you've labeled your workflow inputs accordingly.
#def get_map(libset):
#    ds_map = {}
#    #Add dynamic data sets
#    for ds in libset:
#        if ds['name'].endswith("vcf"):
#            ds_map['INPUT_LABEL_1'] = { 'src' : 'ld', 'id' : ds['id'] }    #The first file you uploaded
#        if ds['name'].endswith("bed"):
#            ds_map['INPUT_LABEL_2'] = { 'src' : 'ld', 'id' : ds['id'] }    #The second file you uploaded
#    #Add Archived Datasets
#    ds_map['INPUT_LABEL_3'] = { 'src' : 'ldda', 'id' : '<id_to_an_input_file_already_stored_in_the_data_library>'}
#    return ds_map
#
# if __name__ == '__main__':
#     main()
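
For reference, a minimal sketch of the same mapping step built on get_file_id from the class above; the function name get_map_by_name, the file names and the INPUT_LABEL_* labels are illustrative assumptions and must match your own uploads and your workflow's input step labels.

def get_map_by_name(gl):
    #gl is a GalaxyLibrary on which upload_files() has already been called
    ds_map = {}
    vcf_id = gl.get_file_id('input1.vcf')   #assumed file name
    bed_id = gl.get_file_id('input1.bed')   #assumed file name
    if vcf_id:
        ds_map['INPUT_LABEL_1'] = { 'src' : 'ld', 'id' : vcf_id }
    if bed_id:
        ds_map['INPUT_LABEL_2'] = { 'src' : 'ld', 'id' : bed_id }
    return ds_map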


Rob Leclerc, PhD
P: (US) +1-(917)-873-3037
P: (Shanghai) +86-1-(861)-612-5469
Personal Email: [hidden email]


On Thu, May 30, 2013 at 8:39 AM, Rob Leclerc <[hidden email]> wrote:
Hi Neil,

Is fullpaths a *new line* delimited string?


Rob Leclerc, PhD
P: (US) +1-(917)-873-3037
P: (Shanghai) +86-1-(861)-612-5469
Personal Email: [hidden email]


On May 30, 2013, at 1:09 AM, <[hidden email]> wrote:

Hi Rob,

        Thanks for the class. I assume you created it in “example_watch_folder.py” or whatever you may have renamed it to? Can you send me the full Python script if possible?

 

I modified example_watch_folder.py as follows (using your code):

 

if __name__ == '__main__':

    try:

        api_key = sys.argv[1]

        api_url = sys.argv[2]

        #in_folder = sys.argv[3]

        #out_folder = sys.argv[4]

        fullpaths = sys.argv[3]

        data_library = sys.argv[4]

        workflow = sys.argv[5]

    except IndexError:

        print 'usage: %s key url in_folder out_folder data_library workflow' % os.path.basename( sys.argv[0] )

        sys.exit( 1 )

    #main(api_key, api_url, in_folder, out_folder, data_library, workflow )

    main(api_key, api_url, fullpaths, data_library, workflow )

 

#def main(api_key, api_url, in_folder, out_folder, data_library, workflow):

def main(api_key, api_url, fullpaths, data_library, workflow):

...

 

while 1:

        #Galaxy needs to read the pathnames as a new line delimited string

        #so we do that transformation here

        print fullpaths

        fullpaths_string = ""

        for path in fullpaths:

            fullpaths_string = fullpaths_string + path + "\n"

           

        fullpaths_string = fullpaths_string[:-1]

        data = {}

        data['folder_id'] = library_folder_id

        data['file_type'] = 'auto'

        data['dbkey'] = ''

        data['upload_option'] = 'upload_paths'

        data['filesystem_paths'] = fullpaths_string

        data['create_type'] = 'file'

        print "before libset "

        #Start the upload. This will return right away, but it may take awhile

        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)

        print "after libset "

        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database

        for ds in libset:

            last_filesize = 0

            while True:

                #If file_size != 0 and the file_size is different after a second iteration, then we assume the disk write is finished

                ds_id = ds['id']

                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)

                print uploaded_file

                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:

                    break

                else:

                    last_filesize = uploaded_file['file_size']

                    time.sleep(2)

 

However, when I run this I get the following output (note there is a new line after each character; should you not use os.path.dirname?):

 

./milxview_watch_folder.py de5f19fcf64a47ca9b61cfc3bf41490c http://barium-rbh/csiro/api/ "/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz,/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz" "This One" f2db41e1fa331b3e
/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz,/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz
/
h
o
m
e
[... the output continues with one character of the comma-separated path string per line; the full string is echoed this way twice ...]
g
z
before libset
after libset
Traceback (most recent call last):
  File "./milxview_watch_folder.py", line 127, in <module>
    main(api_key, api_url, fullpaths, data_library, workflow )
  File "./milxview_watch_folder.py", line 70, in main
    ds_id = ds['id']
TypeError: string indices must be integers, not str

 

 

From: Rob Leclerc [[hidden email]]
Sent: Wednesday, 29 May 2013 11:38 PM
To: Burdett, Neil (ICT Centre, Herston - RBWH)
Cc: [hidden email]; Dannon Baker
Subject: Re: Creating multiple datasets in a libset

 

Hi Neil,

 

I've attached my class function for uploading multiple files. 

 

 def upload_files(self, fullpaths):

        """

            Uploads files from a disk location to a Galaxy library

            Accepts an array of full path filenames

            Example: fullpaths = ['/home/username/file1.txt', '/home/username/files2.txt']

        """

        if self.jsonstring == None:

            self.get_library()

            

        library_id = self.library_id

        library_folder_id = self.library_folder_id

        api_key = self.api_key

        api_url = self.api_url

        

        #Galaxy needs to read the pathnames as a new line delimited string

        #so we do that transformation here

        fullpaths_string = ""

        for path in fullpaths:

            fullpaths_string = fullpaths_string + path + "\n"

            

        fullpaths_string = fullpaths_string[:-1]

        data = {}

        data['folder_id'] = library_folder_id

        data['file_type'] = 'auto'

        data['dbkey'] = ''

        data['upload_option'] = 'upload_paths'

        data['filesystem_paths'] = fullpaths_string

        data['create_type'] = 'file'

        #Start the upload. This will return right away, but it may take awhile

        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)

        

        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database

        for ds in libset:

            last_filesize = 0

            while True:

                #If file_size != 0 and the file_size is different after a second iteration, then we assume the disk write is finished

                ds_id = ds['id']

                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)

                print uploaded_file

                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:

                    break

                else:

                    last_filesize = uploaded_file['file_size']

                    time.sleep(2)

        self.libset = libset

        return libset

 


Rob Leclerc, PhD

P: (US) +1-(917)-873-3037

P: (Shanghai) +86-1-(861)-612-5469

Personal Email: [hidden email]

 

On Wed, May 29, 2013 at 12:45 AM, <[hidden email]> wrote:

Hi Guys,
         Did you manage to get multiple datasets working? I can't seem to upload multiple files. Only the last file appears in the history. I changed my code as mentioned in the thread below in "example_watch_folder.py" to add multiple files separated by a new line and increased the sleep time:

for fname in os.listdir(in_folder):
            fullpath = os.path.join(in_folder, fname)
            print ' fullpath is [%s] ' % fullpath
            if os.path.isfile(fullpath):
                data = {}
                data['folder_id'] = library_folder_id
                data['file_type'] = 'auto'
                data['dbkey'] = ''
                data['upload_option'] = 'upload_paths'
                data['filesystem_paths'] = "/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz"
                print ' data is [%s] ' % str(data['filesystem_paths'])
                data['create_type'] = 'file'
                libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
                #TODO Handle this better, but the datatype isn't always
                # set for the followup workflow execution without this
                # pause.
                time.sleep(65)

However, I get the following crash:

./example_watch_folder.py 64f3209856a3cf4f2d034a1ad5bf851c http://barium-rbh/csiro/api/ /home/galaxy/galaxy-drop/input /home/galaxy/galaxy-drop/output "This One" f2db41e1fa331b3e

 fullpath is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
 data is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz
 /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
url is : http://barium-rbh/csiro/api/libraries/33b43b4e7093c91f/contents?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'file_type': 'auto', 'dbkey': '', 'create_type': 'file', 'folder_id': 'F33b43b4e7093c91f', 'upload_option': 'upload_paths', 'filesystem_paths': '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': 'ff5476bcf6c921fa'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['daecbdd824e1c349', '358eb58cd5463e0d', 'c0279aab05812500'], 'history': '3cc0effd29705aa3'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': '79966582feb6c081'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['19c51286b777bc04', '0f71f1fc170d4ab9', '256444f6e7017e58'], 'history': 'b701da857886499b'}
Traceback (most recent call last):
  File "./example_watch_folder.py", line 89, in <module>
    main(api_key, api_url, in_folder, out_folder, data_library, workflow )
  File "./example_watch_folder.py", line 75, in main
    shutil.move(fullpath, os.path.join(out_folder, fname))
  File "/usr/lib/python2.7/shutil.py", line 299, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 128, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'

It says there is no such file, but this file has already been copied from the input to the output directory. Any help much appreciated

Neil

------------------------------

Message: 2
Date: Mon, 29 Apr 2013 16:11:39 -0400
From: Rob Leclerc <[hidden email]>
To: Dannon Baker <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [galaxy-dev] Creating multiple datasets in a libset
Message-ID:
        <CAGkd85fHSgO2YC1T+Frctyso9G5rfQb=_mLyHGSdxPM+s3=[hidden email]>
Content-Type: text/plain; charset="iso-8859-1"

Hi Dannon,

I've written some code to (i) query a dataset to ensure that it's been
uploaded after a submit and (ii) to ensure a resulting dataset has been
written to the file.

#Block until all datasets have been uploaded
libset = submit(api_key, api_url + "libraries/%s/contents" % library_id,
                data, return_formatted = False)
for ds in libset:
    while True:
        uploaded_file = display(api_key,
            api_url + 'libraries/%s/contents/%s' % (library_id, ds['id']),
            return_formatted=False)
        if uploaded_file['misc_info'] == None:
            time.sleep(1)
        else:
            break

#Block until all result datasets have been saved to the filesystem
result_ds_url = api_url + 'histories/' + history_id + '/contents/' + dsh['id']
while True:
    result_ds = display(api_key, result_ds_url, return_formatted=False)
    if result_ds["state"] == 'ok':
        break
    else:
        time.sleep(1)


Rob Leclerc, PhD
<http://www.linkedin.com/in/robleclerc> <https://twitter.com/#!/robleclerc>
P: (US) +1-(917)-873-3037
P: (Shanghai) +86-1-(861)-612-5469
Personal Email: [hidden email]


On Mon, Apr 29, 2013 at 11:18 AM, Dannon Baker <[hidden email]>wrote:

> Yep, that example filesystem_paths you suggest should work fine.  The
> sleep() bit was a complete hack from the start, for simplicity in
> demonstrating a very basic pipeline, but what you probably want to do for a
> real implementation is query the dataset in question via the API, verify
> that the datatype/etc have been set, and only after that execute the
> workflow; instead of relying on sleep.
>
>
> On Mon, Apr 29, 2013 at 9:24 AM, Rob Leclerc <[hidden email]>wrote:
>
>> Hi Dannon,
>>
>> Thanks for the response. Sorry to be pedantic, but just to make sure that
>> I understand the interpretation of this field on the other side of the API,
>> I would need to have something like the following:
>>
>> data['filesystem_paths'] = "/home/me/file1.vcf \n /home/me/file2.vcf \n
>> /home/me/file3.vcf"
>>
>> I assume I should also increase the time.sleep() to reflect the uploading
>> of extra files?
>>
>> Cheers,
>>
>> Rob
>>
>> Rob Leclerc, PhD
>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>> P: (US) +1-(917)-873-3037
>> P: (Shanghai) +86-1-(861)-612-5469
>> Personal Email: [hidden email]
>>
>>
>> On Mon, Apr 29, 2013 at 9:15 AM, Dannon Baker <[hidden email]>wrote:
>>
>>> Hey Rob,
>>>
>>> That example_watch_folder.py does just submit exactly one at a time,
>>> executes the workflow, and then does the next all in separate transactions.
>>>  If you wanted to upload multiple filepaths at once, you'd just append more
>>> to the ''filesystem_paths' field (newline separated paths).
>>>
>>> -Dannon
>>>
>>>
>>> On Fri, Apr 26, 2013 at 11:54 PM, Rob Leclerc <[hidden email]>wrote:
>>>
>>>> I'm looking at example_watch_folder.py and it's not clear from the
>>>> example how you submit multiple datasets to a library. In the example, the
>>>> first submit returns a libset [] with only a single entry and then proceeds
>>>> to iterate through each dataset in the libset in the following section:
>>>>
>>>> data = {}
>>>>
>>>>    data['folder_id'] = library_folder_id
>>>>
>>>>    data['file_type'] = 'auto'
>>>>
>>>>    data['dbkey'] = ''
>>>>
>>>>    data['upload_option'] = 'upload_paths'
>>>>
>>>>
>>>>
>>>> *data['filesystem_paths'] = fullpath*
>>>>
>>>>    data['create_type'] = 'file'
>>>>
>>>>    libset = submit(api_key, api_url + "libraries/%s/contents" %
>>>> library_id, data, return_formatted = False)
>>>>
>>>>    time.sleep(5)
>>>>
>>>>    for ds in libset:
>>>>
>>>>        if 'id' in ds:
>>>>
>>>>                         wf_data = {}
>>>>
>>>>                         wf_data['workflow_id'] = workflow['id']
>>>>
>>>>                         wf_data['history'] = "%s - %s" % (fname,
>>>> workflow['name'])
>>>>
>>>>                         wf_data['ds_map'] = {}
>>>>
>>>>                         for step_id, ds_in in workflow['inputs'
>>>> ].iteritems():
>>>>
>>>>                             wf_data['ds_map'][step_id] = {'src':'ld',
>>>> 'id':ds['id']}
>>>>
>>>>                         res = submit( api_key, api_url + 'workflows',
>>>> wf_data, return_formatted=False)
>>>>
>>>>
>>>>
>>>> Rob Leclerc, PhD
>>>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>>>> P: (US) +1-(917)-873-3037
>>>> P: (Shanghai) +86-1-(861)-612-5469
>>>> Personal Email: [hidden email]
>>>>
>>>>
>>>
>>>
>>
>

 




Re: Creating multiple datasets in a libset

Hakeem Almabrazi

Hi Rob,

 

Thank you for providing these classes.  I have downloaded them and tried to get them to work for what I need.  I basically would like to load multiple files (FASTQ and BED files) into a new history and run a certain workflow (many steps).  I could not work out how to associate the loaded files with the workflow steps, given that I will have multiple BED files and they will have different names each time.  So this lack of understanding, in addition to unfamiliarity with Python, could be the issue here.

 

Here is the error I keep getting running your classes:

 

Traceback (most recent call last):
  File "watch_multifiles.py", line 285, in <module>
    main(api_key, api_url, in_folder, out_folder, data_library, workflow )
  File "watch_multifiles.py", line 248, in main
    galaxyHistory = gwf.execute_workflow(workflow_id, step_id_mapping, history_name)
  File "watch_multifiles.py", line 158, in execute_workflow
    ds_map[step_id] =  step_id_mapping[label]
KeyError: 'Input Dataset'

 

And here is what I did:

 

1.        I uncommented the main section as follows:

 

def main(api_key, api_url, in_folder, out_folder, library_name, workflow_id):
#
#     EXAMPLE: api_key  http://local:8080/api/ /tmp/API/ /tmp/API/done "API imports" wf_id
     history_name = library_name + '-hist'
#
     gl = GalaxyLibrary(api_key, api_url, library_name )
     fileset = ['/tmp/API/temp.bed', '/tmp/API/seq.fasta']
     gl.upload_files(fileset);
     libset =  gl.get_libset()
#
#     #Map workflow input labels with input files
     step_id_mapping =  get_map(libset)
     #Add Archived Datasets
#
     gwf = GalaxyWorkflow(api_key, api_url, workflow_id)
     print("Hi there ***************************",history_name,   workflow_id)
     galaxyHistory = gwf.execute_workflow(workflow_id, step_id_mapping, history_name)
#
     #Map the output label with a filename to export the results once the run is completed
     copyFileMap = {"output1":"output_file.txt"}
     gh = GalaxyHistory(api_key, api_url, history_id)
     gh.export_results("/tmp/API/", copyFileMap)
#
#
#
#Map a workflows input labels to the file ids of input files. Assumes you've labeled your workflow inputs accordingly.
def get_map(libset):
    ds_map = {}
    #Add dynamic data sets
    for ds in libset:
        if ds['name'].endswith("bed"):  ## changed from "vcf"
            print("Got bed file:",ds)
            ds_map['INPUT_LABEL_1'] = { 'src' : 'ld', 'id' : ds['id'] }    #The first file you uploaded
        if ds['name'].endswith("fasta"):  ##changed from "bed"
            print("Got fasta file:",ds)
            ds_map['INPUT_LABEL_2'] = { 'src' : 'ld', 'id' : ds['id'] }    #The second file you uploaded
    #Add Archived Datasets
    ds_map['INPUT_LABEL_3'] = { 'src' : 'ldda', 'id' : '<id_to_an_input_file_already_stored_in_the_data_library>'}
    return ds_map
#
#if __name__ == '__main__':
#     main()
if __name__ == '__main__':
    try:
        api_key = sys.argv[1]
        api_url = sys.argv[2]
        in_folder = sys.argv[3]
        out_folder = sys.argv[4]
        data_library = sys.argv[5]
        workflow = sys.argv[6]
    except IndexError:
        print 'usage: %s key url in_folder out_folder data_library workflow' % os.path.basename( sys.argv[0] )
        sys.exit( 1 )
    main(api_key, api_url, in_folder, out_folder, data_library, workflow )

 

2.       I created a simple workflow that takes two input files (bed and fasta).  I changed the labels to what you suggested in your comment “INPUT_LABEL_1” and “INPUT_LABEL_2”

 

3.       Then I ran it (similar way to the example_watch_folder.py)

 

The two files get loaded to the data library but then right away I get the above error.
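
The KeyError above shows that label is 'Input Dataset', which is Galaxy's default name for an input step, so at least one input in the workflow was apparently not renamed to INPUT_LABEL_1 or INPUT_LABEL_2 and step_id_mapping has no entry for it. Either rename every input step in the workflow editor, or guard the lookup. A minimal sketch of such a guard inside execute_workflow (a suggestion, not part of the original class):

        for step_id, val in workflow['inputs'].iteritems():
            label = val['label']
            if label not in step_id_mapping:
                raise Exception("Workflow input step %s is labeled '%s', but step_id_mapping only has entries for: %s"
                                % (step_id, label, step_id_mapping.keys()))
            ds_map[step_id] = step_id_mapping[label]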

 

 

I appreciate your feedback in regards to this.

 

Regards,

Hakeem

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Rob Leclerc
Sent: Thursday, May 30, 2013 9:06 AM
To: <[hidden email]>
Cc: <[hidden email]>
Subject: Re: [galaxy-dev] Creating multiple datasets in a libset

 

Hi Neil,

 

I've attached the wrapper classes I'm using for the Galaxy API, which allow me to upload multiple files. This was based on the watch_folder example. There is also a mapping mechanism to map input and output files to their workflow labels. See the commented section below for some hints on how to use it.

 

 

#################START##################

 

import config

import os

import shutil

import sys

import time

import logging

 

sys.path.insert( 0, os.path.dirname( __file__ ) )

from common import submit, display

 

logging.basicConfig(level=logging.INFO)

log = logging.getLogger('galaxy_api_utils')

 

class GalaxyLibrary:

    """

        Encapsulates basic functionality for creating a library and uploading files to a library

    """

    def __init__(self, api_key, api_url, libraryname):

        self.api_key = api_key

        self.api_url = api_url

        self.libraryname = libraryname

        self.jsonstring = None

        self.libset = {}

    

 

    def get_library(self):

        """

            Get's the library created/accessed when this class was initialized

            If the library does not exist then build it

            Returns library_id and the library_folder_id

        """

        if self.jsonstring != None:

            return self.jsonstring

        api_key = self.api_key

        api_url = self.api_url

        libraryname = self.libraryname

        

        libs = display(api_key, api_url + 'libraries', return_formatted=False)

        library_id = None

        for library in libs:

            if library['name'] == libraryname:

                library_id = library['id']

        if not library_id:

            lib_create_data = {'name':libraryname}

            library = submit(api_key, api_url + 'libraries', lib_create_data, return_formatted=False)

            library_id = library['id']

        folders = display(api_key, api_url + "libraries/%s/contents" % library_id, return_formatted = False)

        for f in folders:

            if f['name'] == "/":

                library_folder_id = f['id']

        if not library_id or not library_folder_id:

            log.error("GalaxyLibrary:get_library    Failure to configure library destination.")

            raise Exception('Failed to configure library destination')

        self.jsonstring = { 'library_id' : library_id, 'library_folder_id' : library_folder_id }

        return self.jsonstring

    

 

    def upload_files(self, fullpaths):

        """

            Uploads files from a disk location to a Galaxy library

            Accepts an array of full path filenames

            Example: fullpaths = ['/home/username/file1.txt', '/home/username/files2.txt']

        """

        if self.jsonstring == None:

            self.get_library()

            

        library_id = self.jsonstring['library_id']

        library_folder_id = self.jsonstring['library_folder_id']

        api_key = self.api_key

        api_url = self.api_url

        

        #Galaxy needs to read the pathnames as a new line delimited string

        #so we do that transformation here

        fullpaths_string = ""

        for path in fullpaths:

            fullpaths_string = fullpaths_string + path + "\n"

            

        fullpaths_string = fullpaths_string[:-1]

        data = {}

        data['folder_id'] = library_folder_id

        data['file_type'] = 'auto'

        data['dbkey'] = ''

        data['upload_option'] = 'upload_paths'

        data['filesystem_paths'] = fullpaths_string

        data['create_type'] = 'file'

        #Start the upload. This will return right away, but it may take awhile

        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)

        

        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database

        for ds in libset:

            last_filesize = 0

            while True:

                #If file_size != 0 and the file_size is different after a second iteration, then we assume the disk write is finished

                ds_id = ds['id']

                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)

                print uploaded_file

                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:

                    break

                else:

                    last_filesize = uploaded_file['file_size']

                    time.sleep(2)

        self.libset = libset

        return libset

    

    

    def get_libset(self): 

        return self.libset

    

    

    def get_file_id(self, filename):

        """

            Gets the Galaxy file id for a file that we've uploaded with this object

        """

        libset = self.get_libset()

        for myfile in libset:

            if myfile['name'] == filename:

                return myfile['id']

        return None

 

 

class GalaxyWorkflow:
    """
        Encapsulates basic functionality to execute a workflow
    """
    def __init__(self, api_key, api_url, workflow_id):
        self.api_key = api_key
        self.api_url = api_url
        self.workflow_id = workflow_id
        self.history_name = None
        self.history_id = None

    def execute_workflow(self, workflow_id, step_id_mapping, history_name):
        """
            Variable: step_id_mapping
            Example mapping:
            step_id_mapping = { 'input_label_1' : { 'src' : 'ld', 'id' : '<file_id_from_uploaded_lib>' },
                                'input_label_2' : { 'src' : 'ldda', 'id' : '<file_id_from_dataset>' } }
            where input_label_1 is the name at inputs.<step_id>.label that you want to associate with a particular file id.
            Files uploaded through this object should use 'src' : 'ld', while existing (archived) library datasets should use 'src' : 'ldda'.
            history_name: should be a unique name for this run.
        """
        api_key = self.api_key
        api_url = self.api_url

        workflow = display(api_key, api_url + 'workflows/%s' % self.workflow_id, return_formatted = False)
        wf_data = {}
        wf_data['workflow_id'] = workflow['id']
        self.history_name = "%s - %s" % (history_name, workflow['name'])
        wf_data['history'] = self.history_name
        ds_map = {}

        #We need to map the workflow's input label names to the step ids, since changes to the Galaxy database
        #will change the workflow step_id numbers
        for step_id, val in workflow['inputs'].iteritems():
            label = val['label']
            ds_map[step_id] = step_id_mapping[label]
        wf_data['ds_map'] = ds_map

        #Run the workflow. This will return immediately, but it will take awhile to run
        res = submit( api_key, api_url + 'workflows', wf_data, return_formatted=False)
        if res:
            log.info('GalaxyWorkflow:execute_workflow    workflow executed. Result:' + str(res))
            self.history_id = res['history']
            return GalaxyHistory(api_key, api_url, self.history_id)
        return None
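
A minimal calling sketch follows. The workflow id, the input labels and the archived dataset id are placeholders, gl is the GalaxyLibrary instance from the sketch above, and the keys of step_id_mapping must cover every input label defined in the workflow:

gwf = GalaxyWorkflow('<your_api_key>', 'http://localhost:8080/api/', '<workflow_id>')
step_id_mapping = {
    'INPUT_LABEL_1' : { 'src' : 'ld',   'id' : gl.get_file_id('sample1.vcf') },
    'INPUT_LABEL_2' : { 'src' : 'ldda', 'id' : '<existing_library_dataset_id>' },
}
history = gwf.execute_workflow('<workflow_id>', step_id_mapping, 'my-unique-run-name')   # returns a GalaxyHistory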

 

 

class GalaxyHistory:
    """
        Encapsulates basic functionality for exporting results from a history.
    """
    def __init__(self, api_key, api_url, history_id):
        self.api_key = api_key
        self.api_url = api_url
        self.history_id = history_id

    def export_results(self, dest_folder, copyFileMap):
        """
            Export all designated 'output' files in the workflow to an export location outside of Galaxy.
            This just copies datasets located under $GALAXY_HOME/database/files/<subdir> to an output directory.
            dest_folder is a path such as /Users/Robs/
            copyFileMap maps the output dataset names specified in the workflow to export filenames.
            Example format:
            copyFileMap = { 'galaxy_output_filename1' : 'myarchivename1.ext', 'galaxy_output_filename2' : 'myarchivename2.ext' }
        """
        api_key = self.api_key
        api_url = self.api_url
        history_id = self.history_id
        if dest_folder.endswith('/') == False:
            dest_folder = dest_folder + '/'

        history_contents = display(api_key, api_url + 'histories/' + history_id + '/contents', return_formatted=False)

        #Use the copyFileMap to copy the files designated with a specific output label to a filename for exporting out of Galaxy
        for internalfn, newfn in copyFileMap.iteritems():
            for dsh in history_contents:
                if 'name' in dsh:
                    if dsh['name'] == internalfn:
                        result_ds_url = api_url + 'histories/' + history_id + '/contents/' + dsh['id']

                        #Block until all results have been written to disk
                        ds_file_size = 0
                        while True:
                            result_ds = display(api_key, result_ds_url, return_formatted=False)
                            print result_ds
                            if result_ds["file_size"] != 0 and result_ds["file_size"] == ds_file_size:
                                break
                            else:
                                ds_file_size = result_ds["file_size"]
                                time.sleep(2)

                        result_file_name = result_ds['file_name']
                        fname = dest_folder + newfn
                        shutil.copy(result_file_name, fname)
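
And a minimal export sketch, with placeholder names; the keys of copyFileMap must match the dataset names the workflow writes into the history:

copyFileMap = { 'output1' : 'run42_output1.txt' }
history.export_results('/data/exports/', copyFileMap)   # history is the GalaxyHistory returned by execute_workflow above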

 

# def main():
#
#     EXAMPLE:
#     workflow_id = "<your_workflow_id_here>"
#     api_key = "<your_api_key_here>"
#     api_url = "http://localhost:8080/api/"
#     library_name = "<your_libraryname_here>"
#     history_name = library_name + '-hist'
#
#     gl = GalaxyLibrary(api_key, api_url, library_name)
#     fileset = ['/Users/Rob/input1.vcf', '/Users/Rob/input1.bed']
#     gl.upload_files(fileset)
#     libset = gl.get_libset()
#     #Map workflow input labels with input files
#     step_id_mapping = get_map(libset)
#
#     gwf = GalaxyWorkflow(api_key, api_url, workflow_id)
#     galaxyHistory = gwf.execute_workflow(workflow_id, step_id_mapping, history_name)
#
#     #Map the output label with a filename to export the results once the run is completed
#     copyFileMap = {"output1":"output_file.txt"}
#     galaxyHistory.export_results("/Users/Rob", copyFileMap)
#
#Map a workflow's input labels to the file ids of input files. Assumes you've labeled your workflow inputs accordingly.
#def get_map(libset):
#    ds_map = {}
#    #Add dynamic data sets
#    for ds in libset:
#        if ds['name'].endswith("vcf"):
#            ds_map['INPUT_LABEL_1'] = { 'src' : 'ld', 'id' : ds['id'] }    #The first file you uploaded
#        if ds['name'].endswith("bed"):
#            ds_map['INPUT_LABEL_2'] = { 'src' : 'ld', 'id' : ds['id'] }    #The second file you uploaded
#    #Add Archived Datasets
#    ds_map['INPUT_LABEL_3'] = { 'src' : 'ldda', 'id' : '<id_to_an_input_file_already_stored_in_the_data_library>'}
#    return ds_map
#
# if __name__ == '__main__':
#     main()

 


Rob Leclerc, PhD


P: (US) +1-(917)-873-3037

P: (Shanghai) +86-1-(861)-612-5469

Personal Email: [hidden email]

 

On Thu, May 30, 2013 at 8:39 AM, Rob Leclerc <[hidden email]> wrote:

Hi Neil,

 

Is fullpaths a *new line* delimited string?



Rob Leclerc, PhD

P: (US) +1-(917)-873-3037

P: (Shanghai) +86-1-(861)-612-5469

Personal Email: [hidden email]

 


On May 30, 2013, at 1:09 AM, <[hidden email]> wrote:

Hi Rob,

        Thanks for the class. I assume you created it in "example_watch_folder.py" or whatever you may have renamed it to? Can you send me the full Python script if possible?

 

I modified example_watch_folder.py as follows (using your code):

 

if __name__ == '__main__':
    try:
        api_key = sys.argv[1]
        api_url = sys.argv[2]
        #in_folder = sys.argv[3]
        #out_folder = sys.argv[4]
        fullpaths = sys.argv[3]
        data_library = sys.argv[4]
        workflow = sys.argv[5]
    except IndexError:
        print 'usage: %s key url in_folder out_folder data_library workflow' % os.path.basename( sys.argv[0] )
        sys.exit( 1 )
    #main(api_key, api_url, in_folder, out_folder, data_library, workflow )
    main(api_key, api_url, fullpaths, data_library, workflow )

#def main(api_key, api_url, in_folder, out_folder, data_library, workflow):
def main(api_key, api_url, fullpaths, data_library, workflow):
...

while 1:
        #Galaxy needs to read the pathnames as a new line delimited string
        #so we do that transformation here
        print fullpaths
        fullpaths_string = ""
        for path in fullpaths:
            fullpaths_string = fullpaths_string + path + "\n"

        fullpaths_string = fullpaths_string[:-1]
        data = {}
        data['folder_id'] = library_folder_id
        data['file_type'] = 'auto'
        data['dbkey'] = ''
        data['upload_option'] = 'upload_paths'
        data['filesystem_paths'] = fullpaths_string
        data['create_type'] = 'file'
        print "before libset "
        #Start the upload. This will return right away, but it may take awhile
        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
        print "after libset "
        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database
        for ds in libset:
            last_filesize = 0
            while True:
                #If file_size != 0 and the file_size is different after a second iteration, then we assume the disk write is finished
                ds_id = ds['id']
                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)
                print uploaded_file
                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:
                    break
                else:
                    last_filesize = uploaded_file['file_size']
                    time.sleep(2)

 

However, when I run this I get the following output, i.e. there is a new line after each character (should you not use os.path.dirname?):

 

./milxview_watch_folder.py de5f19fcf64a47ca9b61cfc3bf41490c http://barium-rbh/csiro/api/ "/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz,/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz" "This One" f2db41e1fa331b3e
/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz,/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz
/
h
o
m
e
[... output continues with one character per line, spelling out the comma-separated path string twice ...]
before libset
after libset
Traceback (most recent call last):
  File "./milxview_watch_folder.py", line 127, in <module>
    main(api_key, api_url, fullpaths, data_library, workflow )
  File "./milxview_watch_folder.py", line 70, in main
    ds_id = ds['id']
TypeError: string indices must be integers, not str
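
The per-character output and the TypeError above are both consistent with fullpaths being a single string: sys.argv[3] is one long string, so "for path in fullpaths" iterates over its characters, and the libset returned by submit() is evidently also a string rather than a list of dataset dicts (indexing a character with ['id'] raises exactly this error). A minimal sketch of one possible fix, assuming the paths really are passed as a single comma-separated command line argument as in the invocation above:

fullpaths = sys.argv[3].split(',')        # a list of paths, not one long string
fullpaths_string = "\n".join(fullpaths)   # newline-delimited value for data['filesystem_paths']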

 

 

From: Rob Leclerc [[hidden email]]
Sent: Wednesday, 29 May 2013 11:38 PM
To: Burdett, Neil (ICT Centre, Herston - RBWH)
Cc: [hidden email]; Dannon Baker
Subject: Re: Creating multiple datasets in a libset

 

Hi Neil,

 

I've attached my class function for uploading multiple files. 

 

 def upload_files(self, fullpaths):
        """
            Uploads files from a disk location to a Galaxy library
            Accepts an array of full path filenames
            Example: fullpaths = ['/home/username/file1.txt', '/home/username/files2.txt']
        """
        if self.jsonstring == None:
            self.get_library()

        library_id = self.library_id
        library_folder_id = self.library_folder_id
        api_key = self.api_key
        api_url = self.api_url

        #Galaxy needs to read the pathnames as a new line delimited string
        #so we do that transformation here
        fullpaths_string = ""
        for path in fullpaths:
            fullpaths_string = fullpaths_string + path + "\n"

        fullpaths_string = fullpaths_string[:-1]
        data = {}
        data['folder_id'] = library_folder_id
        data['file_type'] = 'auto'
        data['dbkey'] = ''
        data['upload_option'] = 'upload_paths'
        data['filesystem_paths'] = fullpaths_string
        data['create_type'] = 'file'
        #Start the upload. This will return right away, but it may take awhile
        libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)

        #Iterate through each dataset we just uploaded and block until all files have been written to the Galaxy database
        for ds in libset:
            last_filesize = 0
            while True:
                #If file_size != 0 and the file_size is different after a second iteration, then we assume the disk write is finished
                ds_id = ds['id']
                uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' %(library_id, ds_id), return_formatted=False)
                print uploaded_file
                if uploaded_file['file_size'] != 0 and uploaded_file['file_size'] == last_filesize:
                    break
                else:
                    last_filesize = uploaded_file['file_size']
                    time.sleep(2)
        self.libset = libset
        return libset

 


Rob Leclerc, PhD

P: (US) +1-(917)-873-3037

P: (Shanghai) +86-1-(861)-612-5469

Personal Email: [hidden email]

 

On Wed, May 29, 2013 at 12:45 AM, <[hidden email]> wrote:

Hi Guys,
         Did you manage to get multiple datasets working? I can't seem to upload multiple files. Only the last file appears in the history. I changed my code as mentioned in the thread below in "example_watch_folder.py" to add multiple files separated by a new line and increased the sleep time:

for fname in os.listdir(in_folder):
            fullpath = os.path.join(in_folder, fname)
            print ' fullpath is [%s] ' % fullpath
            if os.path.isfile(fullpath):
                data = {}
                data['folder_id'] = library_folder_id
                data['file_type'] = 'auto'
                data['dbkey'] = ''
                data['upload_option'] = 'upload_paths'
                data['filesystem_paths'] = "/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz"
                print ' data is [%s] ' % str(data['filesystem_paths'])
                data['create_type'] = 'file'
                libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
                #TODO Handle this better, but the datatype isn't always
                # set for the followup workflow execution without this
                # pause.
                time.sleep(65)

However, I get the following crash:

./example_watch_folder.py 64f3209856a3cf4f2d034a1ad5bf851c http://barium-rbh/csiro/api/ /home/galaxy/galaxy-drop/input /home/galaxy/galaxy-drop/output "This One" f2db41e1fa331b3e

 fullpath is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
 data is [/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz
 /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz]
url is : http://barium-rbh/csiro/api/libraries/33b43b4e7093c91f/contents?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'file_type': 'auto', 'dbkey': '', 'create_type': 'file', 'folder_id': 'F33b43b4e7093c91f', 'upload_option': 'upload_paths', 'filesystem_paths': '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T1_Screening.nii.gz\n /home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': 'ff5476bcf6c921fa'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['daecbdd824e1c349', '358eb58cd5463e0d', 'c0279aab05812500'], 'history': '3cc0effd29705aa3'}
url is : http://barium-rbh/csiro/api/workflows?key=64f3209856a3cf4f2d034a1ad5bf851c
data is : {'workflow_id': 'f2db41e1fa331b3e', 'ds_map': {'14': {'src': 'ld', 'id': '79966582feb6c081'}}, 'history': '141_S_0851_MRI_T2_Screening.nii.gz - apiFullCTE'}
{'outputs': ['19c51286b777bc04', '0f71f1fc170d4ab9', '256444f6e7017e58'], 'history': 'b701da857886499b'}
Traceback (most recent call last):
  File "./example_watch_folder.py", line 89, in <module>
    main(api_key, api_url, in_folder, out_folder, data_library, workflow )
  File "./example_watch_folder.py", line 75, in main
    shutil.move(fullpath, os.path.join(out_folder, fname))
  File "/usr/lib/python2.7/shutil.py", line 299, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 128, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/home/galaxy/galaxy-drop/input/141_S_0851_MRI_T2_Screening.nii.gz'

It says there is no such file, but this file has already been copied from the input to the output directory. Any help much appreciated

Neil

------------------------------

Message: 2
Date: Mon, 29 Apr 2013 16:11:39 -0400
From: Rob Leclerc <[hidden email]>
To: Dannon Baker <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [galaxy-dev] Creating multiple datasets in a libset
Message-ID:
        <CAGkd85fHSgO2YC1T+Frctyso9G5rfQb=_mLyHGSdxPM+s3=[hidden email]>
Content-Type: text/plain; charset="iso-8859-1"

Hi Dannon,

I've written some code to (i) query a dataset to ensure that it's been
uploaded after a submit and (ii) to ensure a resulting dataset has been
written to the filesystem.

#Block until all datasets have been uploaded
libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
for ds in libset:
    while True:
        uploaded_file = display(api_key, api_url + 'libraries/%s/contents/%s' % (library_id, ds['id']), return_formatted=False)
        if uploaded_file['misc_info'] == None:
            time.sleep(1)
        else:
            break

#Block until all result datasets have been saved to the filesystem
result_ds_url = api_url + 'histories/' + history_id + '/contents/' + dsh['id']
while True:
    result_ds = display(api_key, result_ds_url, return_formatted=False)
    if result_ds["state"] == 'ok':
        break
    else:
        time.sleep(1)


Rob Leclerc, PhD
<http://www.linkedin.com/in/robleclerc> <https://twitter.com/#!/robleclerc>
P: (US) +1-(917)-873-3037
P: (Shanghai) +86-1-(861)-612-5469
Personal Email: [hidden email]


On Mon, Apr 29, 2013 at 11:18 AM, Dannon Baker <[hidden email]>wrote:

> Yep, that example filesystem_paths you suggest should work fine.  The
> sleep() bit was a complete hack from the start, for simplicity in
> demonstrating a very basic pipeline, but what you probably want to do for a
> real implementation is query the dataset in question via the API, verify
> that the datatype/etc have been set, and only after that execute the
> workflow; instead of relying on sleep.
>
>
> On Mon, Apr 29, 2013 at 9:24 AM, Rob Leclerc <[hidden email]>wrote:
>
>> Hi Dannon,
>>
>> Thanks for the response. Sorry to be pedantic, but just to make sure that
>> I understand the interpretation of this field on the other side of the API,
>> I would need to have something like the following:
>>
>> data['filesystem_paths'] = "/home/me/file1.vcf \n /home/me/file2.vcf \n /home/me/file3.vcf"
>>
>> I assume I should also increase the time.sleep() to reflect the uploading
>> of extra files?
>>
>> Cheers,
>>
>> Rob
>>
>> Rob Leclerc, PhD
>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>> P: (US) +1-(917)-873-3037
>> P: (Shanghai) +86-1-(861)-612-5469
>> Personal Email: [hidden email]
>>
>>
>> On Mon, Apr 29, 2013 at 9:15 AM, Dannon Baker <[hidden email]>wrote:
>>
>>> Hey Rob,
>>>
>>> That example_watch_folder.py does just submit exactly one at a time,
>>> executes the workflow, and then does the next all in separate transactions.
>>>  If you wanted to upload multiple filepaths at once, you'd just append more
>>> to the 'filesystem_paths' field (newline separated paths).
>>>
>>> -Dannon
>>>
>>>
>>> On Fri, Apr 26, 2013 at 11:54 PM, Rob Leclerc <[hidden email]>wrote:
>>>
>>>> I'm looking at example_watch_folder.py and it's not clear from the
>>>> example how you submit multiple datasets to a library. In the example, the
>>>> first submit returns a libset [] with only a single entry and then proceeds
>>>> to iterate through each dataset in the libset in the following section:
>>>>
>>>>    data = {}
>>>>    data['folder_id'] = library_folder_id
>>>>    data['file_type'] = 'auto'
>>>>    data['dbkey'] = ''
>>>>    data['upload_option'] = 'upload_paths'
>>>>    data['filesystem_paths'] = fullpath
>>>>    data['create_type'] = 'file'
>>>>    libset = submit(api_key, api_url + "libraries/%s/contents" % library_id, data, return_formatted = False)
>>>>    time.sleep(5)
>>>>    for ds in libset:
>>>>        if 'id' in ds:
>>>>            wf_data = {}
>>>>            wf_data['workflow_id'] = workflow['id']
>>>>            wf_data['history'] = "%s - %s" % (fname, workflow['name'])
>>>>            wf_data['ds_map'] = {}
>>>>            for step_id, ds_in in workflow['inputs'].iteritems():
>>>>                wf_data['ds_map'][step_id] = {'src':'ld', 'id':ds['id']}
>>>>            res = submit( api_key, api_url + 'workflows', wf_data, return_formatted=False)
>>>>
>>>>
>>>>
>>>> Rob Leclerc, PhD
>>>> <http://www.linkedin.com/in/robleclerc><https://twitter.com/#!/robleclerc>
>>>> P: (US) +1-(917)-873-3037
>>>> P: (Shanghai) +86-1-(861)-612-5469
>>>> Personal Email: [hidden email]
>>>>
>>>
>>>
>>
>

 

 


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/