Which ID ('id', 'workflow_id', and 'dataset_id') should be used?


Aarthi Mohan
Hi all,

I would appreciate your help in understanding the 'id' key returned from the API. I am using Galaxy version 15.03 and bioblend version 0.8.0.

Example:

I have highlighted the 'id' and related fields in bold and red.

>>> workflowClient.get_invocations('f7bb1edd6b95db62') 
[{u'inputs': {u'1': {u'src': u'hda', u'id': u'06d9fe130fbe098e'}}, u'update_time': u'2017-05-17T03:09:10', u'uuid': u'fd066a98-3aad-11e7-90e9-1cc1de6d5ef4', u'history_id': u'b8a0d6158b9961df', u'state': u'scheduled', u'workflow_id': u'915ae9a80309f157', u'steps':
...
 u'model_class': u'WorkflowInvocation', u'id': u'8c49be448cfe29bc'}]

Why is the 'workflow_id' different from the one I passed to the function? And why is the ID I passed not found anywhere in the return value?

>>> historyClient.show_dataset(hid,'468b2dfe96a5a9a1')
{u'accessible': True, u'resubmitted': False, u'create_time': u'2017-05-17T03:04:02', u'download_url': u'/api/histories/b8a0d6158b9961df/contents/468b2dfe96a5a9a1/display', u'file_size': 545, u'dataset_id': u'56c890cbef28295c', u'id': u'468b2dfe96a5a9a1', u'misc_info': u'uploaded fastqsanger file', u'hda_ldda': u'hda', u'metadata_sequences': 5, u'state': u'ok', u'display_types': [], u'display_apps': [], u'type': u'file', u'file_path': None, u'misc_blurb': u'5 sequences', u'peek': u'<table cellspacing="0" cellpadding="3"><tr><td>@1</td></tr><tr><td>tccacaagccattgtgtgtaattaaccactaattgtgtataagtttaaact</td></tr><tr><td>+</td></tr><tr><td>IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII</td></tr><tr><td>@2</td></tr><tr><td>tccacaagccattgtgtgtaattaaccactaattgtgtataagtttaaact</td></tr></table>', u'update_time': u'2017-05-17T03:04:06', u'data_type': u'galaxy.datatypes.sequence.FastqSanger', u'tags': [], u'deleted': False, u'history_id': u'b8a0d6158b9961df', u'meta_files': [], u'genome_build': u'?', u'hid': 1, u'model_class': u'HistoryDatasetAssociation', u'metadata_data_lines': 20, u'file_ext': u'fastqsanger', u'annotation': None, u'metadata_dbkey': u'?', u'history_content_type': u'dataset', u'name': u'a_1.fastq', u'extension': u'fastqsanger', u'visible': True, u'url': u'/api/histories/b8a0d6158b9961df/contents/468b2dfe96a5a9a1', u'uuid': u'aa6dcf49-6fe9-49e0-8064-c8bc275a37d5', u'visualizations': [], u'purged': False, u'api_type': u'file'}
 
>>> historyClient.show_dataset(hid,'56c890cbef28295c')
{u'accessible': True, u'resubmitted': False, u'create_time': u'2017-05-17T02:59:27', u'file_size': 64, u'dataset_id': u'9ccf9e6f1cf4d1fa', u'id': u'56c890cbef28295c', u'misc_info': u'##fileformat=VCFv4.1\n##FILTER=<ID=PASS,Description="All filters passed">\n##fileDate=20170517\n##source=freeBayes v0.9.20\n##reference=localref.fa\n##phasing=none\n##commandline="freebayes --bam localbam_0.bam --fasta-reference localref.fa --vcf /home/sphadmi', u'hda_ldda': u'hda', u'download_url': u'/api/histories/06ec17aefa2d49dd/contents/56c890cbef28295c/display', u'state': u'ok', u'display_types': [], u'display_apps': [], u'type': u'file', u'file_path': None, u'misc_blurb': u'0 lines', u'peek': u'<table cellspacing="0" cellpadding="3"><tr><td>#Calculation and writing of high density regions has completed.</td></tr></table>', u'update_time': u'2017-05-17T02:59:36', u'data_type': u'galaxy.datatypes.data.Text', u'tags': [], u'deleted': False, u'history_id': u'06ec17aefa2d49dd', u'meta_files': [], u'genome_build': u'?', u'hid': 44, u'model_class': u'HistoryDatasetAssociation', u'metadata_data_lines': None, u'file_ext': u'txt', u'annotation': None, u'metadata_dbkey': u'?', u'history_content_type': u'dataset', u'name': u'High density regions', u'extension': u'txt', u'visible': False, u'url': u'/api/histories/06ec17aefa2d49dd/contents/56c890cbef28295c', u'uuid': u'8b8c70a4-cd2e-43d3-bc77-b06511557c96', u'visualizations': [], u'purged': False, u'api_type': u'file'}

Similarly, here the 'dataset_id' is different from the ID I passed to the show_dataset method. If I look up the 'dataset_id' from the first call, it points to a completely different file!

Please let me know which of these IDs should be used and what the purpose of the other one is.

Thanks for your help and time!

Best,
Aarthi




___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/

Re: Which ID ('id', 'workflow_id', and 'dataset_id') should be used?

Nicola Soranzo-2
Hi Aarthi,
thanks for your email, see my replies inline.

On 17/05/17 08:21, Aarthi Mohan wrote:
> Why is the 'workflow_id' different from the one I passed to the function? And why is the ID I passed not found anywhere in the return value?

The confusion here is caused by the API mixing two concepts used by Galaxy to manage workflows: "stored workflows" and "workflows". A stored workflow represents a workflow throughout its life (storing its name, description, owner, whether it's deleted or published, etc.), while a workflow is a particular version of a stored workflow, with the description of its inputs, steps and subworkflows. Every time you modify and save a stored workflow in the UI, a new workflow is generated and associated with the stored workflow. The stored workflow is always linked to the latest workflow version.
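The relationship described above can be pictured with a small conceptual sketch. The class and attribute names below are made up for illustration and are not Galaxy's actual model classes; the ids are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workflow:
    """One saved version of a workflow (only the steps are modelled here)."""
    id: str
    steps: List[str]


@dataclass
class StoredWorkflow:
    """Long-lived record: its id stays the same while versions accumulate."""
    id: str
    name: str
    versions: List[Workflow] = field(default_factory=list)

    def save(self, workflow: Workflow) -> None:
        # Each save adds a new version; older versions are kept.
        self.versions.append(workflow)

    @property
    def latest(self) -> Optional[Workflow]:
        # The stored workflow always points at the most recent version.
        return self.versions[-1] if self.versions else None


stored = StoredWorkflow(id="stored-1", name="demo")
stored.save(Workflow(id="wf-1", steps=["input"]))
stored.save(Workflow(id="wf-2", steps=["input", "map"]))
print(stored.id, stored.latest.id)
```

After two saves the stored workflow id is unchanged, while `latest` refers to the second version.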

The IDs used to interact with the API are the stored workflow IDs ('f7bb1edd6b95db62' in your example above), while get_invocations() returns the workflow ID in its 'workflow_id' field ('915ae9a80309f157' in your case). That's because an invocation derives from a particular version of the workflow. It may be good to extend the API to also return the stored workflow ID.
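Until the API returns both, one possible workaround is to tag each invocation on the client side with the stored workflow ID that was used for the query. This is only a sketch: the helper name `tag_with_stored_workflow` and the 'stored_workflow_id' key are hypothetical and not part of Galaxy or bioblend, and the sample dict is trimmed from the output quoted in the question.

```python
def tag_with_stored_workflow(stored_workflow_id, invocations):
    """Return copies of the invocation dicts with the queried ID added.

    'stored_workflow_id' is a hypothetical key, not something the API returns.
    """
    return [dict(inv, stored_workflow_id=stored_workflow_id) for inv in invocations]


# Trimmed copy of the get_invocations() output from the question.
invocations = [{
    "id": "8c49be448cfe29bc",           # invocation ID
    "workflow_id": "915ae9a80309f157",  # ID of the workflow version that was run
    "history_id": "b8a0d6158b9961df",
    "state": "scheduled",
}]

tagged = tag_with_stored_workflow("f7bb1edd6b95db62", invocations)
for inv in tagged:
    print(inv["id"], inv["workflow_id"], inv["stored_workflow_id"])
```

The original dicts are left untouched, so the tagged copies can be kept alongside the raw API output.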


> Similarly, here the 'dataset_id' is different from the ID I passed to the show_dataset method. If I look up the 'dataset_id' from the first call, it points to a completely different file!

There's nothing wrong here: the API returns the ID of the history dataset you requested in the 'id' field. The 'dataset_id' does not refer to a "history dataset", but to the more general "dataset". A history dataset is a particular instance of a dataset in one history, but the same dataset can be used in other histories or libraries and can be shared with other users. So you may have multiple history datasets and library datasets all pointing to the same file on disk.
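To see which history datasets share an underlying dataset, the dicts returned by show_dataset() can be grouped by their 'dataset_id'. The sketch below assumes you have already collected such dicts; the helper name `group_by_dataset` is made up, the first entry is taken from the question, and the second entry is a hypothetical copy in another history with invented IDs.

```python
from collections import defaultdict


def group_by_dataset(history_datasets):
    """Map each 'dataset_id' to the IDs of the history datasets that point to it."""
    groups = defaultdict(list)
    for hda in history_datasets:
        groups[hda["dataset_id"]].append(hda["id"])
    return dict(groups)


history_datasets = [
    # From the first show_dataset() call in the question.
    {"id": "468b2dfe96a5a9a1", "dataset_id": "56c890cbef28295c",
     "history_id": "b8a0d6158b9961df"},
    # Hypothetical copy of the same dataset in another history (made-up IDs).
    {"id": "hypothetical-hda", "dataset_id": "56c890cbef28295c",
     "history_id": "hypothetical-history"},
]

shared = group_by_dataset(history_datasets)
print(shared)
```

For calls such as show_dataset(), the value to pass is the 'id' of the history dataset; the 'dataset_id' only tells you which underlying dataset it points to. Passing a 'dataset_id' where a history dataset ID is expected presumably looks up whichever history dataset happens to have that ID, which would explain why the second call in the question returned an unrelated file.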

Cheers,
Nicola


Re: Which ID ('id', 'workflow_id', and 'dataset_id') should be used?

Aarthi Mohan
Thanks for the detailed explanation Nicola! 

Best,
Aarthi

On Wed, May 17, 2017 at 6:29 PM, Nicola Soranzo <[hidden email]> wrote:
> [...]