class: center, middle

# Curing Cancer with HTML5

Rich Trott

HTML5DevConf

October 2014

San Francisco

https://trott.github.io/curing-cancer-with-html5

???

Hi, everyone! Thank you for coming to "Curing Cancer with HTML5"!

---
class: center, middle

# Spoiler alert!

*We're not actually going to cure cancer with HTML5...*

???

Hopefully this is already clear, but HTML5 will not _actually_ cure cancer. Neither will I. And this talk isn't _actually_ about oncology. It _is_ about search.

So why does it have the title "Curing Cancer with HTML5"? Because when I wrote up the talk proposal, I was in a rush. Where I work, some of our researchers are in fact trying to cure cancer. Boom, there's my title. Sorry, everyone. I'll try to do better with the title next time.

---
class: center, middle

# We have lots of search interfaces…

???

I work for the University of California, San Francisco, a health sciences institution. That means we have a hospital and lots of researchers, but we have no undergraduate students, no sports teams, no liberal arts programs, no computer science programs.

We also have dozens of online resources for researchers, clinicians, and educators. Each of these online resources has its own search tool. But users shouldn't be asked to search in dozens of different interfaces to find what they're looking for. A typical solution for this issue is what's called federated search.

---
class: center, middle

# …and our federated search kinda sucks…

???

One of the resources, one that we actually curate and publish ourselves, is called the Legacy Tobacco Documents Library. It's an awesome collection of, literally, millions of internal tobacco industry documents.

In fact, this talk was originally going to be about that site, because last year, we started a user-centered redesign for the site. We did tons of user research and found out what was working and what wasn't.
But as it turns out, our federated search suffers from many of the same problems identified for the tobacco industry documents site.

---
class: center, middle

# …so we made it suck less.

???

So it was time to do something about it.

Again, this talk was originally going to be about the tobacco industry documents site, not our federated search. I had included an amazing old Camel cigarettes commercial called "What Cigarette Do You Smoke Doctor?" I had included Sylvester Stallone's letter agreeing to be paid half a million dollars to smoke Brown & Williamson tobacco products in feature films. I was going to talk about the overwhelming amount of metadata we showed users by default, including something called (and I swear I'm not just making this up) a Bates Number and a Master Bates Number.

But sometime in the last 48 hours, I realized I ought to talk about the basic building blocks, especially if I only have 20 minutes to talk. So I junked my hilarious, awesome, entertaining talk, and now you get this more pragmatic, narrowly focused talk. But if you want to see that Camel ad and talk about poor choices made in naming metadata, come talk to me.

---
class: center, middle

# So, about federated search...

???

So, back to federated search. There are (at least) three ways to pull data out of other resources in real time. In descending order of desirability, here they are.

---
class: center, middle

# Cool, they have an API for that!
This almost never happens.
???

Now, I've been using Amalgamatic for search, but I want to encourage people to think creatively about this. It can really be used to scrape anything. So you can basically take something that doesn't have an API, and create your own API.

Like...maybe...this thing. The much-maligned HTML5DevConf online schedule. I don't blame the conference for this. This is a super-affordable conference powered by volunteers and a shoestring budget. Yesterday, I thought, "If all the people complaining about the schedule would channel that effort towards fixing it, we'd be all set."

---
class: center, middle

### https://github.com/ucsf-ckm/amalgamatic-h5dcsched

???

So I wrote an Amalgamatic plugin for the HTML5DevConf calendar. And gave it a horrible name. But the code is straightforward HTML scraping.

---
class: center, middle

### https://github.com/Trott/devconf-calendar

???

Then I wrote a quick browserified page that uses the plugin to dump JSON calendar data onto a page.

---
class: center, middle

### http://trott.github.io/devconf-calendar

???

And we can see it here, and hope it works, WiFi-willing.

---
class: center, middle

# Step 3: Profit!
???

Then I tweeted about it and waited for someone to use the data to make a killer calendar interface. It didn't happen. To be fair, I tweeted about it just before midnight yesterday, so that left not a lot of time, and diminishing benefits, for anyone to do the work. But you know, maybe next year I'll think of doing this a day ahead of the conference rather than halfway into it.

---
class: center, middle

# Thanks!!

## https://trott.github.io/curing-cancer-with-html5

Rich Trott

@trott

UC San Francisco