Wednesday, January 21, 2009

ScienceOnline'09: Semantic Web

The last session before lunch happened to be the one where I felt the most overwhelmed by the breadth of information that I simply cannot seem to grasp in any cogent fashion: John Wilbanks's session on the semantic web in science. I've heard and read about the semantic web, but have yet to be able to fully understand what it might look like. Although I still have lots of questions, this session thankfully illuminated some of the goals/aims of the semantic web. You can view the slides here. As with all my posts, but especially this one, any incoherency is my fault alone...
  • open innovation (as understood under traditional collaboration model) aimed to expand the capacity of the external market, and inflows and outflows of knowledge, to aid internal knowledge/advantage
  • Joy's law: the smartest people work elsewhere
  • user innovation: only people who have problems can solve problems
  • new innovation/collaboration enables people to design their own shoes, t-shirts, etc., but doesn't exist for science
  • why not?: intellectual property rights - scientists don't share well; funding models; inertia; incentive structures; no web for data
  • Google search won't give you genes but papers about genes
  • the "semantic web" isn't a great name, but it's all we can come up with
  • computers need to understand the relationships between websites
  • coffee ontology explains relationship between aspects of coffee needs/uses/properties
  • semantic web is lots of specifications: RDF at heart, GRDDL, RDFa, OWL, SPARQL
  • need domain name system for concepts; lack has been reason for failure
  • use web to integrate
  • RDF: Resource Description Framework
  • every arc has direction
  • "literals": facts, instances about things
  • "reification": statements about statements (e.g., recording who asserted a fact)
  • RDF is simple and ugly; meant for machines, not humans
  • GRDDL (Gleaning Resource Descriptions from Dialects of Languages): gleans RDF out of existing documents
  • RDFa: RDF in HTML
  • OWL: Web Ontology Language; structured relationships
  • essentially wants to query 1,000 web pages as one, the same way 1,000 papers are queried as one
  • SPARQL is SQL for semantic web
  • RDF allows data to be remixed while staying contextually accurate
  • is it legal? have to reconstruct public domain for licensing angle
  • CC Zero (CC0) allows contractual reconstruction of the public domain in database licenses
  • does conflict with the protection instinct; if you don't want your data remixed, don't put it in RDF
  • just because you put genome data online and claim copyright doesn't mean you have it, because facts cannot be copyrighted (at least in the US)
  • database protection law has been killed in the US several times
  • doesn't scale across science: some fields (e.g., earth science) are cool with sharing, but others (e.g., biology) would rather share a toothbrush than data
  • web isn't going to do this for us
  • get practical answers out of existing databases and resources
  • queries are interface to this [semantic web] world
  • lot of this isn't baked yet
  • got to have problem worth solving to use this; wouldn't use this for your calendar
  • trademark is the only way to protect; if you don't like it, fork, but don't infringe the trademark by using the name
  • Swoogle is a semantic web search tool
  • Open Biomedical Ontologies is a compilation of ontologies used for semantic web
  • has always been about machine interoperability on data
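
To make a few of the notes above more concrete for myself (directed arcs, literals, and querying many sources as one), here is a minimal sketch in plain Python. It is not RDF or SPARQL proper, just a toy triple list with pattern matching in the same spirit. Every identifier in it (ex:geneA, ex:paper1, ex:mentions, the titles) is invented for illustration, and the unify and query helpers are my own hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A plain value (e.g. a title string) rather than a named resource."""
    value: str


Term = Union[str, Literal]
Triple = Tuple[Term, Term, Term]  # directed arc: subject --predicate--> object
Binding = Dict[str, Term]

# Two hypothetical sources; all names below are made up for this sketch.
GENE_DATA: List[Triple] = [
    ("ex:geneA", "rdf:type", "ex:Gene"),
    ("ex:geneA", "rdfs:label", Literal("gene A")),
]
PAPER_DATA: List[Triple] = [
    ("ex:paper1", "ex:mentions", "ex:geneA"),
    ("ex:paper1", "dc:title", Literal("A hypothetical paper")),
    ("ex:paper2", "ex:mentions", "ex:coffee"),
    ("ex:paper2", "dc:title", Literal("Another hypothetical paper")),
]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Match one pattern against one triple; terms starting with '?' are variables."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):
            # Bind the variable, or check it agrees with an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(graph: List[Triple], patterns: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding that satisfies all patterns at once."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = unify(first, triple, binding)
        if new is not None:
            yield from query(graph, rest, new)


# Merging sources is just concatenation: the triples share one vocabulary.
graph = GENE_DATA + PAPER_DATA

# "Titles of papers that mention something typed as a gene."
results = list(query(graph, [
    ("?paper", "ex:mentions", "?gene"),
    ("?gene", "rdf:type", "ex:Gene"),
    ("?paper", "dc:title", "?title"),
]))
titles = [r["?title"].value for r in results]

# Arcs have direction: the gene does not "mention" the paper.
reverse = list(query(graph, [("ex:geneA", "ex:mentions", "?x")]))

print(titles)
print(reverse)
```

The point of the sketch is only that once two data sets are expressed as triples over shared names, one query can span both, which is how I understood the "query 1,000 web pages as one" goal. A real setup would use actual URIs and a real SPARQL engine instead of this toy matcher.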
