• Toxicity Case Study • Continuing Education for your Practice Manager • Employee Highlight SUMMER 2004 VOLUME 1, ISSUE 1 VetCare is Committed to Staff Continued Educationand practical. We feel that standardized classes will help lead to greater unifor-mity in staff nursing technique and help attendance to our employee evalu-ation process to encourage staff to: at- V
- A |
J |K |
U |V |
Microsoft word - herbalmethodspaper_jbi-resubmit_2-28Identifying potential adverse effects using the web: a new approach to medical Adrian Benton, BAa,*, Lyle Ungar, PhDc, Shawndra Hill, PhDb, Sean Hennessy, PharmD, PhDa, Jun Mao, MD, MSCEa, Annie Chung, BAa, Charles E. Leonard, PharmDa, John H. Holmes, a University of Pennsylvania School of Medicine, Philadelphia, PA b University of Pennsylvania, The Wharton School, Philadelphia, PA c University of Pennsylvania School of Engineering and Applied Science, Philadelphia, PA * Corresponding author, [email protected] (Adrian Benton) University of Pennsylvania School of Medicine Center for Clinical Epidemiology and Biostatistics 729 Blockley Hall 423 Guardian Drive Philadelphia, PA 19104‐6021 Medical message boards are online resources where users with a particular condition exchange information, some of which they might not otherwise share with medical providers. Many of these boards contain many posts and contain patient opinions and experiences that would be potentially useful to clinicians and researchers. We present an approach that is able to collect a corpus of medical message board posts, de-identify the corpus, and extract information on potential adverse drug effects discussed by users. Using a corpus of posts to breast cancer message boards, we identified drug event pairs using co-occurrence statistics. We then compared the identified drug event pairs with adverse effects listed on the package labels of tamoxifen, anastrazole, exemestane, and letrozole. Of the pairs identified by our system, 75-80% were documented on the drug labels. Some pairs may represent previously unidentified Keywords: data mining; information extraction; medical message board; drug adverse 1. Introduction
It is well known that the lay public frequently uses online resources such as message boards to seek and exchange medical information [1-5]. Users of these resources ask questions and seek advice about topics that they may be hesitant to discuss with their health care providers. One such topic is concern about adverse effects experienced with some medications, especially those prescribed for serious conditions such as cancer, where patient anxiety may be heightened by the characteristics of the disease and the long-term exposure to potentially toxic drugs. Although adverse events should be reported through available channels, such as the Adverse Event Reporting System (AERS) of the U.S. Food and Drug Administration, many patients do not do so, perhaps because of ignorance of these channels, embarrassment, perceptions of poor provider availability, etc. Instead, they often use informal networks such as internet message boards to report and discuss adverse events. The burgeoning of medical message boards provides evidence of the frequency of discussions about health-related matters in society at large, and the boards in turn provide a fertile source of data for identifying and evaluating the advice, concerns, and opinions that are sought and provided by message board users. However, these data remain largely untapped by researchers in While there are several studies evaluating discussions in message boards and chat rooms, these have used qualitative research methods.[6-10] To date, there are no reports in the medical or social sciences literature that apply text mining methods to these discussions to identify and analyze reports of adverse drug events. Reasons for this include the unstructured nature of message board text, the use of nonstandard abbreviations, a wide variety in the use of syntax and spelling, the temporal relationship between posts in a single thread, and references within messages to other threads. Despite the difficulties associated with using message board text, researchers in many domains, particularly marketing, have successfully applied methods to extract information from this irregular medium. Glance, et al.  have developed a comprehensive system that crawls message boards and derives several different metrics about products including how much people discuss them and how much users seem to like those products. Feldman, et al.  have also published a system that extracts comparisons between different products and the attributes that they are compared on. Other systems have focused on different forms of web content such as product reviews in order to extract people’s sentiment about certain products [13;14] or have relied on the pages produced by Google searches to extract the reputation of a Medical message boards have been examined to a much lesser degree. Malouf, Davidson, and Sherman  applied sentiment analysis to epilepsy blogs in order to extract patient preferences for different seizure disorder drugs. Although not focused on online user-generated content, pharmacovigilance researchers have also applied data mining methods in order to extract signals of possible adverse drug events from databases such as the AERS [17-19]. Electronic health records have also been examined to identify similar signals  as well as to generate more general biomedical hypotheses . Though there is a history of data mining in the medical domain, it has yet to exploit the knowledge that can be derived from online user-generated content. We present here methods that we used to identify such knowledge, specifically self- reported adverse events that may be associated with four hormonal medications that are commonly used in the treatment of breast cancer. 2. Methods
In order to validate our system, we selected four of the most commonly used drugs to treat breast cancer: tamoxifen (Nolvadex), anastrazole (Arimidex), letrozole (Femara), and exemestane (Aromasin). Tamoxifen has long been the standard treatment for hormonally-responsive breast cancer. Anastrazole, letrozole, and exemestane are aromatase inhibitors, more recently developed, that are also used to treat hormonally- responsive breast cancer. We chose these four drugs because they have similar indications and were frequently mentioned in the breast cancer corpus. Table 1 shows the top 10 association rules by count identified by our system for tamoxifen. Table 1. Top ten rules returned for tamoxifen ranked by count. Drug+/Event+ is the number of messages that contain the drug and event co-occurring within a 20 token window, Drug+/Event- is the number of messages containing only the drug and not the event, Drug-/Event+ is the number of messages containing only the event and not the drug, and Drug-/Event- is the number of messages mentioning neither the drug nor the event. P-values in this list are very close to 0 or 1. For each of these four drugs, we compiled a documented list of all adverse events (AEs) believed to occur from the drug. This list was compiled from all of the AEs mentioned in tables and notes contained in the drug label. For each of the four drugs, we compiled a list of significantly associated events returned by our system. We then evaluated our system’s performance by comparing our system list against the documented list for that particular drug using the precision and recall metrics. In this context, precision was defined as the proportion of the events we found that occur in the documented list as AEs: Recall was defined as the proportion of documented AEs occurring in the system list: This method of evaluation is similar to that used to evaluate a pharmacovigilance system over electronic health records by Wang, et al. . Table 2. Recall refers to the proportion of terms in the documented list that matched terms in the system list, over the total number of terms in the documented list. Precision refers to the proportion of terms in the system list that matched terms in the documented list over the total number of terms in the system list. ‘N’ refers to the size of the denominator for each value calculated. 3.2. Identifying rare and novel events In addition to evaluating our system for its ability to identify many of the AEs listed on the drug label, we also investigated the events returned by our system that were either documented as rare AEs or were not listed on the drug label at all. For example, our system identified many AEs that were documented as rare events for tamoxifen; these included “fatty liver”, “uterine cancer”, “stroke”, and “pulmonary embolism”. We referred back to the messages where the terms occurred in order to determine the context in which authors mentioned these events. A few anecdotes from these messages are listed below. Note that the majority of message board users referred to the aromatase inhibitors by their brand names and these anecdotes reflect that. I saw the liver specialist ‐‐ was able to get in earlier and he said my fatty liver is probably from taking Tamoxifen.
I am still very scared, because Tamoxifen can cause uterine cancer.
had a stroke recently, after finishing 5 yrs of Tamoxifen. This sealed my decision.
I was one of the few that developed a pulmonary embolism while on Tamoxifen ‐ lucky me.
Block 2. Anecdotes of rare (as defined by label) AEs occurring with Tamoxifen. Very few authors mentioned that they developed uterine cancer, and that they were simply scared of developing it. For the other rare events, authors had mentioned that they Although it is mentioned as a very rare AE on the label, “uterine cancer” co-occurred 374 times with tamoxifen in our breast cancer corpus. This does not necessarily suggest that it is a more common AE than the label states, but simply that people frequently talk about it. Most of these messages demonstrated anxiety about taking tamoxifen because of this side effect, rather than having actually developed uterine cancer. However, for other AEs, such as “pulmonary embolism” and “fatty liver”, authors mentioned that they had actually developed the conditions. There were a few AEs that were reported with one of these four drugs that were not listed on the drug label. Many of these were false positives (i.e., AE and drug mentions were unrelated in the posts, determined by human). However, there were a few significantly associated events (with p-value less than Simes-corrected .05) that message board authors claimed to have actually occurred. The following undocumented AEs were found to be based on instances where message board authors claimed to have actually experienced the AE. anastrazole: chapped lips, dry eye, lupus, conjunctivitis, fibromyalgia exemestane: dry eye, high cholesterol, vaginal dryness letrozole: mood swings, vaginal discharge Not all of these AEs may be truly caused by the drug, but, at the very least, they may give practitioners a better idea of common perceptions that patients hold about these drugs. A few examples of posts where authors mentioned experiencing these AEs are The only SEs i had on Tamoxifen were weight gain and hot flashes/night sweats.
Has anybody been suffering from chapped, cracked lips, especially at the corners of your mouth, since being on
Arimidex? I had this problem many years ago but always could fix it by taking vitamin B complex. The vitamin B
complex isn’t helping any more and I am wondering if Arimidex is causing this.
My eye doctor says my dry eyes are probably from age ‐‐‐ but he thinks Arimidex made them worse and sent a letter
to my DR. saying so.
2/07 Moved to Aromasin & Zometa because of Arimidex triggering RA & Lupus
I've also had conjunctivitis 4 times. My onc said this wasn't a SE of arimidex but I know it is!
I’m now on arimidex. I am doing fine with BC but have developed fibromyalgia.
After about 1.5 years on Aromasin, my cholesterol which is normally 186 was at 254. Something I have not noticed
anyone talking about is high cholesterol.
oh, should add.rather significant impact.of aromasin ‐ was vaginal dryness.
I am on aromasin and have developed dry eyes. does anyone else have this problem?
slammed back with flushings, terrible night sweats, bad mood swings, now have bad joint pain(mostly in my ankle
and right hand/thumb). oh ugh with the femara!
Is there a side effect, that is B9, from Femara that causes vaginal discharge? I just got home from a cruise to AK and
upon wake up this a,.m. [sic] I had a discharge. Not real bloody but kind of like the very last day of a light period.
Block 3. Sample anecdotes of undocumented AEs that message board authors claimed to have experienced from tamoxifen, anastrazole, exemestane, and letrozole. 4. Discussion
Our system results in relatively low recall of drug AEs documented on the label, which seems particularly striking for more common side effects, given that these AEs are very common and one would expect authors to mention these symptoms frequently in their posts. However, many of these common AEs are mentioned very frequently throughout the entire corpus. One example is fatigue, a known AE of anastrazole, but also a potential effect of cancer itself. The terms “fatigue” and “arimidex” co-occurred 210 times within the breast cancer corpus. However, each of these terms occurred in 15,099 and 17,124 messages, respectively, yielding a non-significant p-value. In order for the association rule “arimidex-fatigue” to be significant, these two terms would have had to co-occur at a much higher frequency than what was observed. Our system seems better suited to finding rare AEs that are related to a specific drug, such as “uterine cancer” for “tamoxifen”. However, identifying new adverse events is probably a more appropriate goal than measuring the frequency of known side effects. It is also important to note that multiple event terms returned by our system may match just a single AE in the documented AE list. For example, if the documented AE list contained “pain in extremity”, then the terms “finger pain” and “toe pain” in the system’s returned list would be considered members of the documented AE list. This also contributes to our system’s low recall. The low recall against the documented list is due mostly to the fact that the documented list is long. Some of the terms in this list are pervasive within posts (e.g., fatigue, nausea, pain), and do not co-occur frequently enough with the specific drug to suggest a significant association rule. However, other AEs in the documented list are never mentioned by authors at all. Some examples of terms that are absent from our breast cancer corpus but are documented AEs for one or more of the four drugs we investigated are “increased bilirubin”, “increased creatinine”, “thrombocytopenia”, “increased alkaline phosphate”, and “stevens-johnson syndrome”. Many of these are laboratory abnormalities that might not be noticed or known by patients. 4.2 Undocumented events returned by system Our system exhibited relatively high precision in the AEs that it returned. However, on average, 23% of the drug-event rules identified by our system were undocumented on the label. Some of these undocumented AEs appeared to be unrelated to the drug. These events tended to co-occur with the drug very infrequently (generally less than 10 times), and occur very infrequently throughout the corpus. Because the number of co- occurrences between the drug and event was so small, it was simple for us to refer directly to the messages where these terms co-occurred and determine whether the author reported the event as being an AE of the drug. Some pairs that our system identified were considered to be false positives since they were not specifically documented as, but were similar to, AEs on the drug label. “arimidex-gout” was one of these pairs. Although gout is not a documented AE of anastrazole, it is a case of acute arthritis. The messages that mentioned this event noted remedies for gout that may be effective in treating aromatase inhibitor-induced arthritis. “Menopause” is an event that was returned by all four drugs. Authors used the term “menopause” to describe the AEs that they were experiencing as menopausal, particularly hot flashes; they did not specifically claim that a particular drug had induced Our system does not determine the kind of relationship (indication or side effect) between the drug and event for each association rule. Rather, it suggests only that the drug and event terms are correlated. For example, the events “bone density decreased”, “bone loss”, and “osteoporosis” were all significantly associated with tamoxifen. However, tamoxifen is known to increase bone density . Referring back to the original messages where these event terms and “tamoxifen” co-occurred revealed that post authors were glad that tamoxifen prevented or remedied these events. Some terms were actually contraindications for the drug, such as “Factor V Leiden mutation” (associated with an increased risk of blood clots) for tamoxifen. Even though these terms were related to the drug, are not caused by the drugs and were thus treated as The remainder of the undocumented events consisted of messages where the author claimed to have experienced the AE in response to the drug. Block 3 contains several examples of these undocumented AEs. Some of these AEs may be common knowledge in the medical community even though they are not listed on the label. For example, it is known that tamoxifen tends to make it more difficult for patients to lose weight  even though neither “weight gain” nor “difficulty to lose weight” is mentioned as an AE on the tamoxifen label. Other AEs mentioned in the corpus may not be commonly accepted as an AE of that particular drug. Controlled studies would be needed to verify that these undocumented AEs actually occur from a particular drug. However, these anecdotes may serve as signals that could be rigorously verified in We understand that medical message board corpora are very different from clinical trial data, health care data, and databases of reported adverse events. Medical message boards provide a community and support for their members. Post authors not only communicate how they are feeling and any AEs they have experienced, but also AEs that they are worried about or that a friend may have claimed to experience. However, given that we have shown that it is able to reliably extract known AEs for several drugs, it may have potential utility for identifying undocumented AEs for dietary supplements that breast cancer patients use as well. Extending the system to determine the relation between the drug and event, for example, indication or AE, would be useful for other drugs as well as nutritional supplements and herbal preparations. We will also improve our system to extract other types of information from medical message boards. Currently our system is able to identify documented AEs of drugs with high precision by the frequency that they co- occur. However, the current system cannot determine what type of speech act the author used for each specific case where they co-occurred. For example, the author could be claiming that they actually experienced the AE, claiming that their friend experienced the AE, wondering if a drug causes that particular AE, or are just worried about a specific AE. Refining this system to identify these speech acts would enable our system to be used as an alternative to focus groups. Another possible route of exploration would be to extend our system to identify instances of drug non-adherence, and identify the AEs that led to this decision. It would be interesting to know how people generally feel about a particular drug, based on their posts on message boards. Do they like a drug? Do they hate it? Why? Extending our system to perform a sentiment analysis on the breast cancer corpus would allow researchers and medical practitioners to get a sense of what people think We also intend to apply our system to medical message board corpora for conditions other than breast cancer, in order to evaluate its ability to generalize to other conditions. Currently we have compiled corpora of obesity, diabetes, and arthritis message board posts. These are ideal candidates for our method since they contain a large number of messages because they are all chronic conditions with a high prevalence. We expect similar results to those generated from the breast cancer corpus. In addition, we will continue to update the system’s controlled vocabulary based on how well it identifies relevant instances of drug and event mentions. 6. Conclusion
We have designed a system to identify signals of potential adverse events that overcomes many of the difficulties associated with medical message boards and is able to extract useful information from them. Our system is able to generate a corpus of medical message board text by downloading the message pages, extracting the relevant fields from the pages, and programmatically removing identifying information from the documents. These de-identified documents are then searched for terms occurring in the controlled vocabulary in order to extract association rules that may signify meaningful relationships between these terms. We demonstrated the efficacy of this system by extracting the significant drug-event association rules for four hormonal breast cancer treatment drugs from a corpus of breast cancer message board posts and compared this list to a list of all AEs that were documented on each drug label. Although our system’s recall over these documented AE lists was relatively low (average of 35.1%), the precision was relatively high (average of 77.0%), suggesting that the terms in the significant association rules returned are not only correlated, but represent a real semantic relationship as well. In addition, we identified that some of the undocumented AEs considered significant by our system signaled an AE that message board authors claimed to have experienced from the drug. Whether or not the AE was actually caused by the drug is unknown. However, it may signal a true AE from that drug, and may be worth further investigation. Even if the AE is not caused by the drug, it is useful to know what symptoms patients believe (correctly or not) are being caused by the drug. In future work, we plan to extend the system to identify the speech act that was used for each drug-side effect mention and to use this system to identify undocumented adverse Authors’ contributions
JHH, JM, LU, SHennessy, and SHill conceptualized the research. JHH, LU, and SHill designed the system methodology. JM reviewed the drug and supplement vocabularies. SHennessy, JM, and CL interpreted the drug-symptom findings. AB constructed the message board corpus and implemented the system. AC contributed to the validation of the system. AB and JHH wrote the manuscript. All authors contributed to revision and approved this manuscript. Acknowledgment
This project is supported by the National Library of Medicine (RC1LM010342). Access to the Multum database was supported by the National Center for Research Resources (5Kl2RR024132). We thank Cristin Freeman for providing access to the Multum database and for sharing her expertise throughout the project. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health. This study was approved by the Institutional Review Board at the University of Pennsylvania.  Cousineau TM, Rancourt D, Green TC. Web chatter before and after the Women's Health Initiative results: a content analysis of on‐line menopause message boards. Journal of Health Communication 11(2):133‐47, 2006.  Donelle L, Hoffman‐Goetz L. Health literacy and online health discussions of North American Black women. Women & Health 47(4):71‐90, 2008.  Gooden RJ, Winefield HR. Breast and prostate cancer online discussion boards: a thematic analysis of gender differences and similarities. Journal of Health Psychology 12(1):103‐14, 2007.  Meric F, Bernstam EV, Mirza NQ, Hunt KK, Ames FC, Ross MI, et al. Breast cancer on the world wide web: cross sectional survey of quality of information and popularity of websites. BMJ 324(7337):577‐81, 2002.  Schultz PN, Stava C, Beck ML, Vassilopoulou‐Sellin R. Internet message board use by patients with cancer and their families. Clinical Journal of Oncology Nursing 7(6):663‐7, 2003;‐Dec.  Armstrong N, Powell J. Patient perspectives on health advice posted on Internet discussion boards: a qualitative study. Health Expectations 12(3):313‐20, 2009.  Nahm ES, Resnick B, DeGrezia M, Brotemarkle R. Use of discussion boards in a theory‐based health web site for older adults. Nursing Research 58(6):419‐26, 2009;‐Dec.  Copelton DA, Valle G. "You don't need a prescription to go gluten‐free": the scientific self‐ diagnosis of celiac disease. Social Science & Medicine 69(4):623‐31, 2009.  Habermeyer E, Habermeyer V, Jahn K, Domes G, Nagel E, Herpertz SC. [An internet based discussion board for persons with borderline personality disorders moderated health care professionals]. [German]. Psychiatrische Praxis 36(1):23‐9, 2009.  Shigaki CL, Smarr KL, Yang G, Donovan‐Hanson K, Siva C, Johnson RA, et al. Social interactions in an online self‐management program for rheumatoid arthritis. Chronic Illness 4(4):239‐46, 2008.  Glance N, Hurst M, Nigam K, Siegler M, Stickton R, Tomokiyo T. Deriving marketing intelligence from online discussion. Proceedings of the eleventh ACM SIGKDD international Conference on Knowledge Discovery in Data Mining 2005;419‐28.  Feldman R, Fresco M, Goldenberg J, Netzer O, Ungar L. Extracting Product Comparisons from Discussion Boards. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining 2007;469‐74.  Chklovski T. Deriving quantitative overviews of free text assessments on the web. IUI '06: Proceedings ofthe 11th international conference on Intelligent user interfaces 2006;155‐62.  Dave K, Lawrence S, Pennock DM. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. International World Wide Web Conference. Proceedings of the 12th international conference on World Wide Web , 519‐528. 2003.  Morinaga S, Yamanishi K, Fukushima T. Mining product reputations on the web. KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge 2002;341‐9.  Malouf R, Davidson B, Sherman A. Mining web texts for brand associations. In Proceedings of the AAAI‐ 2006 spring symposium on computational approaches to analyzing weblogs 2006;125‐6.  Almenoff JS, LaCroix KK, Yuen NA, Fram D, DuMouchel W. Comparative Performance of Two Quantitative Safety Signalling Methods: Implications for Use in a Pharmacovigilance Department. Drug Safety 2006;29(10):875‐87.  DuMouchel W, Pregibon D. Empirical bayes screening for multi‐item associations. Proceedings of the Seventh ACM SIGKDD international Conference on Knowledge Discovery and Data Mining , 67‐76. 2001.  Wilson A, Thabane L, Holbrook A. Application of data mining techniques in pharamcovigilance. British Journal of Clinical Pharmacology 2004;57(2):127‐34.  Wang X, Hripcsak G, Markatou M, Friedman C. Active Computerized Pharmacovigilance Using Natural Language Processing, Statistics, and Electronic Health Records: A Feasibility Study. Journal of the American Medical Informatics Association 2005;16(3):328‐37.  Goldman JA, Chu WW, Parker DS, Goldman RM. Term Domain Distribution Analysis: a Data Mining Tool for Text Databases. Methods of Information in Medicine 2010;2010(2):96‐101.  Finkel JR, Grenager T, Manning C. Incorporating non‐local information into information extraction systems by Gibbs sampling. ACL 2005. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics , 363‐370. 2005.  Adrian Benton, Shawndra Hill, Lyle Ungar, Annie Chung, Charles Leonard, Cristin Freeman, et al. A system for de‐identifying medical message board text. International Conference on Machine Learning and Applications 2010. 2010.  Bird S. NLTK: the natural language toolkit. Annual meeting of the ACL. Proceedings of the COLING/ACL on Interactive presentation sessions , 69‐72. 2004.  Friedman C. Discovering Novel Adverse Drug Events Using Natural Language Processing and Mining of the Electronic Health Record. 2009. p. 1‐5.  R.J.Simes. An improved Bonferroni procedure for multiple tests of significance. Biometrika  Resch A, Biber E, Seifert M, Resch H. Evidence that tamoxifen preserves bone density in late postmenopausal women with breast cancer. Acta Oncologica 1998;37(7‐8):661‐4.  Peter J.Hoskin, Sue Ashley, John R.Yarnold. Weight gain after primary surgery for breast cancer ‐ effect of tamoxifen. Breast Cancer Research and Treatment 992;22(2):129‐32.
Mood and anxiety disorders during pregnancy and the postpartum period Miriam B. Rosenthal, MD, Associate Professor of Psychiatry and Reproductive Biology, Case Western Reserve University Chief of Behavior Medicine, University MacDonald Women's Hospital symptons and syndromes of mood and anxiety disorders that may occur during effects of these disorders on mothers, infants and family