Search Documents in Textpresso
Search documents indexed by Textpresso through queries on fulltext or sentences.
The following endpoints are available for document searches:
POST /v1/textpresso/api/search_documents
Search for documents indexed by Textpresso. Requires authentication.
Request JSON Object:
- token (string) – a valid access token. See How to obtain an access token for further information on how to get one.
- query (object) – a query object (see Query Object for more details).
- include_fulltext (boolean) – whether to return the fulltext and abstract of the documents. Default value is false. Restricted to specific tokens due to copyright.
- include_all_sentences (boolean) – whether to return the text of all the sentences in each document. Default value is false. Restricted to specific tokens due to copyright.
- include_match_sentences (boolean) – whether to return the text of each matched sentence. Valid only for sentence searches. Default value is false.
- since_num (int) – used for pagination. Skip the results before this position and return entries starting from it. The counter starts at 0, i.e., the first document is number 0.
- count (int) – used for pagination. Return up to the specified number of results. Maximum value is 200.
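The parameters above can be assembled into a request body as in the following minimal sketch. The helper name `build_search_payload` and the `MAX_COUNT` constant are hypothetical and not part of the Textpresso API; the rejection of negative values and the client-side checks are assumptions layered on top of the documented rules (maximum `count` of 200, matched sentences only for sentence searches).

```python
import json

MAX_COUNT = 200  # documented maximum for "count"


def build_search_payload(token, query, since_num=0, count=MAX_COUNT,
                         include_fulltext=False,
                         include_all_sentences=False,
                         include_match_sentences=False):
    """Assemble the JSON body for search_documents (hypothetical helper)."""
    # Assumption: negative offsets and counts are never meaningful.
    if since_num < 0:
        raise ValueError("since_num starts at 0")
    if not 0 <= count <= MAX_COUNT:
        raise ValueError(f"count must be between 0 and {MAX_COUNT}")
    # Matched sentences are documented as valid only for sentence searches.
    if include_match_sentences and query.get("type") != "sentence":
        raise ValueError("include_match_sentences needs a sentence query")
    return {
        "token": token,
        "query": query,
        "include_fulltext": include_fulltext,
        "include_all_sentences": include_all_sentences,
        "include_match_sentences": include_match_sentences,
        "since_num": since_num,
        "count": count,
    }


payload = build_search_payload(
    "XXXXX",  # placeholder token
    {"keywords": "DYN-1", "type": "document", "corpora": ["C. elegans"]},
    count=2,
)
print(json.dumps(payload, indent=2))
```

Validating on the client side is a design choice made here to surface mistakes before a request is sent; the server remains the authority on what it accepts.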
Response Datatype Format
The returned data is a JSON array of objects, each of which contains the following fields:
Response JSON Object:
- identifier (string) – the document identifier
- score (string) – the score of the document: an absolute number that indicates how well the document matches the provided query
- title (string) – the title of the document
- author (string) – the author(s) of the document
- accession (string) – the accession of the document
- journal (string) – the journal of the document
- year (string) – publication year
- doc_type (string) – the type of document (e.g., research article, review)
- fulltext (string) – the fulltext of the document. Only if include_fulltext is set to true in the request.
- abstract (string) – the abstract of the document. Only if include_fulltext is set to true in the request.
Response JSON Array of Objects:
- all_sentences (string) – the text of each sentence. Only if include_all_sentences is set to true in the request.
- matched_sentences (string) – the text of each matched sentence. Only if include_match_sentences is set to true in the request and the query type is set to sentence.
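As an illustration of consuming the fields listed above, the sketch below parses a response array and orders the documents by score. The function name `rank_documents` is hypothetical, and the sample holds only a subset of the fields from the documented example response. Because `score` is documented as a string, the code converts it with `float()`, which also works if the value arrives as a JSON number.

```python
import json

# Subset of the fields from the documented example response.
sample_response = """[
  {"doc_type": "Journal_article", "score": 0.032331, "identifier": "B4r",
   "journal": "Proc Natl Acad Sci U S A"},
  {"doc_type": "Journal_article", "score": 0.0418161, "identifier": "I5m",
   "journal": "Proc Natl Acad Sci U S A"}
]"""


def rank_documents(response_text):
    """Return (identifier, score, has_fulltext) tuples, best score first."""
    documents = json.loads(response_text)
    ranked = sorted(documents, key=lambda d: float(d["score"]), reverse=True)
    return [
        # "fulltext" is present only when include_fulltext was requested.
        (d["identifier"], float(d["score"]), "fulltext" in d)
        for d in ranked
    ]


for identifier, score, has_fulltext in rank_documents(sample_response):
    print(identifier, score, has_fulltext)
```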
Example request:
    POST /v1/textpresso/api/search_documents HTTP/1.1
    Host: textpressocentral.org:18080
    Accept: application/json

    {
        "token": "123456789",
        "query": {
            "keywords": "DYN-1",
            "type": "document",
            "case_sensitive": false,
            "sort_by_year": false,
            "count": 2,
            "corpora": [
                "C. elegans",
                "C. elegans Supplementals"
            ]
        }
    }
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    [
        {
            "doc_type": "Journal_article",
            "score": 0.0418161,
            "identifier": "I5m",
            "title": "Factors regulating the abundance and localization of synaptobrevin in the plasma membrane.",
            "author": "Dittman JS ; Kaplan JM",
            "accession": " Other:doi:10.1073\\/pnas.0600784103 PMID:16844789 WBPaper00027755",
            "journal": "Proc Natl Acad Sci U S A"
        },
        {
            "doc_type": "Journal_article",
            "score": 0.032331,
            "identifier": "B4r",
            "title": "A dynamin GTPase mutation causes a rapid and reversible temperature-inducible locomotion defect in C. elegans.",
            "author": "Clark SG ; Shurland D-L ; Meyerowitz EM ; Bargmann CI ; Van der Bliek AM",
            "accession": " Other:cgc2892 doi:10.1073\\/pnas.94.19.10438 PMID:9294229 WBPaper00002892",
            "journal": "Proc Natl Acad Sci U S A"
        }
    ]
Example request using curl from the shell:
curl -k -d "{\"token\":\"XXXXX\", \"query\": {\"keywords\": \"yeast AND two AND hybrid\", \"year\": \"2017\", \"type\": \"sentence\", \"corpora\": [\"C. elegans\"]}, \"include_match_sentences\": true}" https://textpressocentral.org:18080/v1/textpresso/api/search_documents
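For callers who prefer Python over the shell, a roughly equivalent request can be sketched with the standard library alone. The helper names `build_request` and `send` are hypothetical, the JSON `Content-Type` header is an assumption (curl's `-d` sends a form content type by default), and the body uses the documented `include_match_sentences` flag. The `verify_tls=False` path mirrors curl's `-k` option and should be used with care.

```python
import json
import ssl
import urllib.request

BASE_URL = "https://textpressocentral.org:18080/v1/textpresso/api"


def build_request(endpoint, payload):
    """Create a POST request carrying the payload as JSON (not yet sent)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        # Assumption: the server accepts a JSON content type.
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def send(request, verify_tls=True):
    """Send the request and decode the JSON response."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Equivalent of curl -k: skip certificate verification.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("search_documents", {
    "token": "XXXXX",  # placeholder token
    "query": {"keywords": "yeast AND two AND hybrid", "year": "2017",
              "type": "sentence", "corpora": ["C. elegans"]},
    "include_match_sentences": True,
})
# results = send(request, verify_tls=False)  # requires a valid token
```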
POST /v1/textpresso/api/get_documents_count
Get the number of documents that match a search query. Requires authentication.
Request JSON Object:
- token (string) – a valid access token. See How to obtain an access token for further information on how to get one.
- query (object) – a query object (see Query Object for more details)
Response Datatype Format
Response JSON Object:
- counter (int) – the number of documents matching the query
Example request:
    POST /v1/textpresso/api/get_documents_count HTTP/1.1
    Host: textpressocentral.org:18080
    Accept: application/json

    {
        "token": "123456789",
        "query": {
            "keywords": "DYN-1",
            "type": "document",
            "case_sensitive": false,
            "sort_by_year": false,
            "count": 2,
            "corpora": [
                "C. elegans",
                "C. elegans Supplementals"
            ]
        }
    }
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    {
        "counter": 229
    }
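One possible use of the counter is to plan paginated calls to search_documents, which returns at most 200 results per request. The sketch below is an assumption about how a client might combine the two endpoints; `plan_pages` is a hypothetical helper, and each returned pair is meant to be used as the `since_num` and `count` values of one request.

```python
MAX_COUNT = 200  # documented maximum page size for search_documents


def plan_pages(counter, page_size=MAX_COUNT):
    """Return (since_num, count) pairs that cover `counter` documents."""
    if not 1 <= page_size <= MAX_COUNT:
        raise ValueError(f"page_size must be between 1 and {MAX_COUNT}")
    return [
        # The last page may be shorter than page_size.
        (start, min(page_size, counter - start))
        for start in range(0, counter, page_size)
    ]


# With the counter from the example response above:
print(plan_pages(229))  # [(0, 200), (200, 29)]
```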
GET /v1/textpresso/api/available_corpora
Get the list of corpora available on the server.
Response Datatype Format
A JSON array of strings.
Example request:
    GET /v1/textpresso/api/available_corpora HTTP/1.1
    Host: textpressocentral.org:18080
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    ["C. elegans","C. elegans Supplementals","PMCOA C. elegans","PMCOA Animal"]
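A client could use this list to check the `corpora` field of a query before sending it. The following sketch assumes that corpus names must match exactly, including case; `unknown_corpora` is a hypothetical helper and "Made-up corpus" is an invented name used only to show a mismatch.

```python
import json

# Body of the example response above.
available_text = (
    '["C. elegans","C. elegans Supplementals",'
    '"PMCOA C. elegans","PMCOA Animal"]'
)


def unknown_corpora(requested, available_response_text):
    """Return the requested corpus names that the server did not list."""
    available = set(json.loads(available_response_text))
    # Assumption: names are compared exactly, including case.
    return [name for name in requested if name not in available]


print(unknown_corpora(["C. elegans", "Made-up corpus"], available_text))
```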
POST /v1/textpresso/api/get_category_matches_document_fulltext
Get the list of words in the fulltext of one or more documents that match a specified category. Requires authentication.
Request JSON Object:
- token (string) – a valid access token. See How to obtain an access token for further information on how to get one.
- query (object) – a query object used to search for the documents (see Query Object for more details)
- category (string) – a valid category in Textpresso format (e.g., “Gene (C. elegans) (tpgce:0000001)”) - see Textpresso central category browser for the complete list of supported categories.
Response Datatype Format
The returned data is a JSON array of objects, each of which represents a document matched by the provided query, and contains the following fields:
Response JSON Object:
- identifier (string) – the document identifier
Response JSON Array of Objects:
- matches (string) – the list of words in the fulltext of the document that matched the specified category
Example request:
    POST /v1/textpresso/api/get_category_matches_document_fulltext HTTP/1.1
    Host: textpressocentral.org:18080
    Accept: application/json

    {
        "token": "123456789",
        "query": {
            "accession": "WBPaper00050052",
            "corpora": [
                "C. elegans",
                "C. elegans Supplementals"
            ]
        },
        "category": "Gene (C. elegans) (tpgce:0000001)"
    }
Example response:
    HTTP/1.1 200 OK
    Vary: Accept
    Content-Type: text/javascript

    [
        {
            "identifier": "C. elegans/WBPaper00050052/WBPaper00050052.tpcas",
            "matches": ["apl-1","cdc-42","ceh-36","daf-16","glp-1","hsf-1","ins-33","lin-14","lin-4","mec-4","pmp-3","rab-3","snb-1"]
        }
    ]
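When a query matches several documents, the per-document match lists can be indexed or aggregated on the client. The sketch below shows one way to do that with the example response above; the function names `matches_by_document` and `term_document_frequency` are hypothetical.

```python
import json
from collections import Counter

# Body of the example response above.
sample_response = """[
  {"identifier": "C. elegans/WBPaper00050052/WBPaper00050052.tpcas",
   "matches": ["apl-1", "cdc-42", "ceh-36", "daf-16", "glp-1", "hsf-1",
               "ins-33", "lin-14", "lin-4", "mec-4", "pmp-3", "rab-3",
               "snb-1"]}
]"""


def matches_by_document(response_text):
    """Map each document identifier to its list of matched words."""
    return {doc["identifier"]: doc["matches"]
            for doc in json.loads(response_text)}


def term_document_frequency(response_text):
    """Count in how many documents each matched word appears."""
    counts = Counter()
    for doc in json.loads(response_text):
        # A set ensures a word counts once per document.
        counts.update(set(doc["matches"]))
    return counts


for identifier, words in matches_by_document(sample_response).items():
    print(identifier, len(words))
```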