Information professionals, knowledge workers and librarians have a long familiarity with managing and searching within large sets of information. They are often responsible for evaluating and managing subscriptions to value-added online services such as Springer Nature; they identify and acquire specialized datasets for researchers; and they manage and make discoverable internal resources and collections. What has revolutionized how info pros look at the research landscape is the development of sophisticated tools for text and data mining (TDM) of large data sets. Info pros bring a unique perspective to TDM projects: they understand how information is used within their organizations, and they know how to make that information more discoverable and hence more valuable. The goal of text and data mining is to filter through information, identify pieces of data, and find the relationships and patterns among them.

A TDM project usually starts with a large corpus of data, such as a bibliographic database of citations and abstracts of research articles, or an authoritative reference source. What is revolutionary is the ability of researchers to explore a dataset without knowing what specific questions to ask.

A TDM tool analyzes each record and extracts individual pieces of information in a structured format. These individual information units, called “semantic triples”, consist of three elements that reflect a piece of knowledge, in the format subject – predicate – object. A fact such as “The sky is blue” could be represented by the triple the_sky – has_the_color – blue. Similarly, semantic triples generated from a bibliographic record might include this_article – has_the_author – John_Doe and John_Doe – is_affiliated_with – Drexel_University. As simple as this looks, it can be transformative when applied to all the types of information in a dataset: a book chapter, an organization profile, an author, and so on.

The Google Books Ngram Viewer (/ngrams) demonstrates the power of TDM when applied to the full text of books. This Google project analyzed the digitized content of millions of books, parsing each word and sentence. For example, each word in the sentence “The school nurse treated the boy” is analyzed for meaning and relationship: the word “school” is an adjective modifying the noun “nurse”, and the subject “nurse” performs the action “treated” on the object “boy”.

We can query the Ngram Viewer for instances in which the word “nurse” is used as a noun (rather than a verb, as in nursing someone back to health) and is modified by an adjective. Note that it is not necessary to specify which adjectives modify the word “nurse”: the query retrieves and sorts all the adjectives to identify the ten most frequent phrases and their relative frequency over time.

Figure 1: Google Books Ngram Viewer search result

Figure 1 shows that use of the phrase “head nurse” peaked during the 1940s and 1950s.

The Google Books Ngram Viewer provides insight into books, but TDM projects can also involve multiple databases or data collections. For example, an info pro may want to improve their researchers’ discovery process with an API that takes a query, identifies any search terms that reference medical concepts, and then looks up each concept in the US National Library of Medicine’s Medical Subject Headings (MeSH) hierarchical thesaurus. The original query could be expanded to include not only the MeSH descriptor for that medical concept but also all the descriptors that fall within that concept. A search for opioid dependence, for example, would be expanded to include the MeSH descriptors Opioid-Related Disorders, Heroin Dependence, Morphine Dependence, and Opium Dependence.

Today, info pros face the problem of having more digitized information available than can be contained in any database, and their concern is how to make sense of the knowledge buried within all that information. As it turns out, access to the full text of an article or report isn’t necessarily the richest form of content anymore. An information scientist at a large pharmaceutical company described how TDM enabled her to find relationships based not only on abstracts but also on the full text, and noted that the value of TDM grows with full-text availability. Now, what’s important is not just the information within individual records but the relationships among those pieces of information. “TDM enables us to do more complex searches using a large number of synonyms through ontologies, where regular search systems reach their limitations. Moreover, we can extract the relevant passages and information for a certain question from huge amounts of textual literature or patent information instead of delivering only a list of hit documents.”
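The subject – predicate – object structure of semantic triples described above can be sketched in a few lines of Python. This is a minimal illustration and not the data model of any particular TDM tool. The `Triple` type and the `objects_of` and `affiliations_of_authors` helpers are hypothetical names. The triples themselves are the examples from the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One unit of knowledge in subject - predicate - object form."""
    subject: str
    predicate: str
    obj: str


# Example triples taken from the text above.
TRIPLES = [
    Triple("the_sky", "has_the_color", "blue"),
    Triple("this_article", "has_the_author", "John_Doe"),
    Triple("John_Doe", "is_affiliated_with", "Drexel_University"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


def affiliations_of_authors(triples, work):
    """Follow two hops: work -> author -> affiliation."""
    return [
        org
        for author in objects_of(triples, work, "has_the_author")
        for org in objects_of(triples, author, "is_affiliated_with")
    ]


if __name__ == "__main__":
    print(affiliations_of_authors(TRIPLES, "this_article"))  # ['Drexel_University']
```

No single triple links the article to Drexel University. Chaining two triples answers that question anyway, which is the kind of relationship-finding the post attributes to TDM.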
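The Ngram Viewer example above separates noun and verb uses of “nurse” and ranks the adjectives that modify it. A toy count in Python can illustrate that idea. This is not how Google implements the Ngram Viewer, and it does not use the Ngram Viewer's query syntax. The tagged sentences are invented, and the tags were assigned by hand following the text's analysis, which treats “school” as an adjective. `top_modifier_phrases` is a hypothetical helper.

```python
from collections import Counter

# Invented toy corpus, already tagged as (word, part-of-speech) pairs.
# A real pipeline would obtain these tags from a parser; these are hand-written.
TAGGED_SENTENCES = [
    [("The", "DET"), ("school", "ADJ"), ("nurse", "NOUN"), ("treated", "VERB"),
     ("the", "DET"), ("boy", "NOUN")],
    [("The", "DET"), ("head", "ADJ"), ("nurse", "NOUN"), ("checked", "VERB"),
     ("the", "DET"), ("ward", "NOUN")],
    [("A", "DET"), ("head", "ADJ"), ("nurse", "NOUN"), ("arrived", "VERB")],
    # Here "nurse" is a verb, so it must not be counted.
    [("She", "PRON"), ("will", "AUX"), ("nurse", "VERB"), ("him", "PRON"),
     ("back", "ADV"), ("to", "ADP"), ("health", "NOUN")],
]


def top_modifier_phrases(sentences, noun, k=10):
    """Count ADJ + NOUN bigrams whose noun is `noun`; return the k most common."""
    counts = Counter()
    for sent in sentences:
        # Pair each token with the token that follows it.
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1 == "ADJ" and t2 == "NOUN" and w2.lower() == noun:
                counts[f"{w1.lower()} {w2.lower()}"] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    print(top_modifier_phrases(TAGGED_SENTENCES, "nurse"))
```

The last sentence tags “nurse” as a verb, so the count excludes it. The Ngram query draws the same distinction. The real viewer also bins such counts by publication year to plot frequency over time, as in Figure 1. This sketch leaves out the time dimension.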
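The MeSH-based query expansion described above can be sketched as a walk down a descriptor hierarchy. This sketch rests on several assumptions:

- The two dictionaries below are a hand-coded stand-in for the MeSH thesaurus. They contain only the descriptors named in the opioid example.
- The entry-term mapping from “opioid dependence” to Opioid-Related Disorders is hypothetical.
- `expand_query` and `to_boolean` are illustrative names, not part of any NLM API.

```python
# Hypothetical, hand-coded stand-in for the MeSH thesaurus: only the descriptors
# named in the opioid example are included. A real system would look concepts up
# in MeSH itself instead of hard-coding them.
ENTRY_TERMS = {"opioid dependence": "Opioid-Related Disorders"}  # assumed mapping
NARROWER = {
    "Opioid-Related Disorders": [
        "Heroin Dependence",
        "Morphine Dependence",
        "Opium Dependence",
    ],
}


def descendants(descriptor, narrower):
    """Depth-first walk that collects every descriptor below `descriptor`."""
    found = []
    for child in narrower.get(descriptor, []):
        found.append(child)
        found.extend(descendants(child, narrower))
    return found


def expand_query(term, entry_terms=ENTRY_TERMS, narrower=NARROWER):
    """Map a search term to its descriptor plus all narrower descriptors."""
    descriptor = entry_terms.get(term.strip().lower())
    if descriptor is None:
        return []  # the term was not recognized as a medical concept
    return [descriptor] + descendants(descriptor, narrower)


def to_boolean(descriptors):
    """Join descriptors into a generic quoted OR query string."""
    return " OR ".join(f'"{d}"' for d in descriptors)


if __name__ == "__main__":
    print(to_boolean(expand_query("Opioid dependence")))
```

The sketch expands a single term. The API the post describes would first identify which words in a longer query refer to medical concepts, then expand each one in this way.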