# Biobtree
<!--[](https://dev.azure.com/biobtree/biobtree/_build/latest?definitionId=1&branchName=master) -->
Biobtree is a bioinformatics tool for mapping bioinformatics datasets
via identifiers and special keywords, with simple or advanced chained query capability.
<!---
## Demo
Demo of the web interface covering all the datasets with examples. The demo uses relatively old data from 2020; for the latest data, use the tool directly.
https://www.ebi.ac.uk/~tgur/biobtree/
-->
## Features
* **Datasets** - supports a wide range of datasets such as `Ensembl` `Uniprot` `ChEMBL` `HMDB` `Taxonomy` `GO` `EFO` `HGNC` `ECO` `Uniparc` `Uniref`, plus tens more via cross references,
retrieving the latest data from providers
* **MapReduce** - processes small or large datasets based on the user's selection and builds a uniform, B+ tree based local database via a specialized MapReduce based technique with efficient storage usage
* **Query** - allows simple or advanced chained queries between datasets with an intuitive syntax for writing RDF or graph like queries
* **Genome** - supports querying full Ensembl genome coordinates with `transcript`, `CDS`, `exon`, `utr`, together with several attributes, mapped datasets and identifiers such as `ortholog`, `paralog`, or probe identifiers belonging to `Affymetrix` or `Illumina`
* **Protein** - Uniprot proteins including protein features with variations and mapped datasets
* **Chemistry** - `ChEMBL` and `HMDB` datasets supported for chemistry, disease and drug related analysis
* **Taxonomy & Ontologies** - `Taxonomy` `GO` `EFO` `ECO` data with mappings to other datasets and child and parent query capability
* **Your data** - your custom data can be integrated with or without relations to other datasets
* **Web UI** - web interface for easy exploration, with examples
* **Web Services** - REST or gRPC services
* **R & Python** - [Bioconductor R](https://github.com/tamerh/biobtreeR) and [Python](https://github.com/tamerh/biobtreePy) wrapper packages, with built-in databases, for easier use from existing pipelines
### Usage
First install the [latest](https://github.com/tamerh/biobtree/releases/latest) biobtree executable, available for Windows, Mac or Linux. Then extract the downloaded file to a new folder, open a terminal in that folder and start biobtree. Alternatively, the R and Python based [biobtreeR](https://github.com/tamerh/biobtreeR) and [biobtreePy](https://github.com/tamerh/biobtreePy) wrapper packages can be used instead of the executable directly for easier integration.
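For example, on Linux the setup could look like the following sketch; the archive name is hypothetical, so pick the file matching your OS and version from the releases page:
```sh
# hypothetical archive name; check the releases page for the exact file for your OS
wget https://github.com/tamerh/biobtree/releases/latest/download/biobtree_Linux_64bit.tar.gz
mkdir biobtree_dir && tar -xzf biobtree_Linux_64bit.tar.gz -C biobtree_dir
cd biobtree_dir
./biobtree help
```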
#### Starting biobtree with target datasets or genomes
```sh
# build Ensembl genomes by tax id together with the uniprot & taxonomy datasets
biobtree --tax 595,984254 -d "uniprot,taxonomy" build
# build datasets only
biobtree -d "uniprot,taxonomy,hgnc" build
biobtree -d "hgnc,chembl,hmdb" build
# once the data is built, start the web server to use the web services and UI
biobtree web
# to see all options and datasets, use help
biobtree help
```
#### Starting biobtree with built-in databases
```sh
# 4 built-in databases are provided with commonly studied datasets and organism genomes to speed up the database build process
# check the following function documentation for the content of each database
# https://github.com/tamerh/biobtreeR/blob/master/R/buildData.R
biobtree --pre-built 1 install
biobtree web
```
Built-in databases are updated regularly, at least for each Ensembl release, and all built-in database files along with configuration files are hosted in a separate GitHub [repository](https://github.com/tamerh/biobtree-conf)
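Whichever way the database is built, `biobtree web` serves the web UI and services locally. A quick sanity check is to query the meta endpoint from the list below (the port 8888 shown there is assumed):
```sh
# assumes biobtree web is running on the port used in the endpoint list below
curl "http://localhost:8888/ws/meta"
```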
### Web service endpoints
```ruby
# Meta
# dataset meta information
localhost:8888/ws/meta
# Search
# i is the only mandatory parameter
localhost:8888/ws/?i={terms}&s={dataset}&p={page}&f={filter}
# Mapping
# i and m are mandatory parameters
localhost:8888/ws/map/?i={terms}&m={mapfilter_query}&s={dataset}&p={page}
# Retrieve dataset entry. Both parameters are mandatory
localhost:8888/ws/entry/?i={identifier}&s={dataset}
# Retrieve entry with filtered mapping entries. Only page parameter is optional
localhost:8888/ws/filter/?i={identifier}&s={dataset}&f={filter_datasets}&p={page}
# Retrieve entry results with page index. All the parameters are mandatory
localhost:8888/ws/page/?i={identifier}&s={dataset}&p={page}&t={total}
```
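As a rough sketch of calling these endpoints with `curl` (the gene name, dataset and mapping query below are only illustrative; any identifier indexed in your build works, and the exact mapping query syntax is documented in the wrapper packages):
```sh
# search a term across the indexed datasets (i is the only mandatory parameter)
curl "http://localhost:8888/ws/?i=tpi1"
# retrieve a single entry from a specific dataset
curl "http://localhost:8888/ws/entry/?i=tpi1&s=hgnc"
# mapping query: map the search results to the ensembl dataset
curl "http://localhost:8888/ws/map/?i=tpi1&m=map(ensembl)"
```
Note that mapping queries containing filters may need URL encoding when sent with `curl`.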
<!-- ### Integrating your dataset
User data can be integrated into biobtree. Since biobtree can process large datasets, this feature offers an alternative for indexing mapping related data with biobtree. Data should be gzipped and in an XML format compliant with the UniProt XML schema [definition](ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot.xsd). Once the data has been prepared, its file location needs to be configured in the biobtree configuration file located at `conf/source.dataset.json`. After this configuration the dataset is used similarly to the other datasets, e.g.
```sh
biobtree -d "+my_data" start
``` -->
### Publication
https://f1000research.com/articles/8-145
### Building source
biobtree is written in Go for the data processing part and Vue.js for the web application. To build and create the biobtree executable, install Go >= 1.13 and run
```sh
go build
```
To build the web application for development, run the following in the web directory
```sh
npm install
npm run serve
```
To build the web package run
```sh
npm run build
```