Archive for August 29th, 2008

HTML Tables and the Data Web

Some time ago now, I wrote a post about progressive enhancement of HTML web pages, including some examples of how HTML data tables could be enhanced to provide graphical views of the data contained in them.

I haven’t checked whether anyone is actively maintaining progressive enhancement browser extensions, but here are a couple more possible enhancements released as part of the Google visualisation API, as described in Table Formatters make Visualization tables even nicer:

A couple of other options are a formatter that colours a table cell according to its value (an implementation of the ‘format cell on value’ function you find in many spreadsheets), and a formatter that will “format numeric columns by defining the number of decimal digits, how negative values are displayed and more”, such as adding a prefix or suffix to each number.
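To make the idea of these two formatters concrete, here is a minimal Python sketch of what they do. It is not the Google visualisation API itself: the function names `format_number` and `colour_for`, their parameters, and the colour ranges are all my own assumptions, chosen only to illustrate the behaviour described above.

```python
def format_number(value, decimals=2, prefix="", suffix="", negative_parens=False):
    """Format a number with a fixed number of decimal digits.

    Hypothetical helper: negative values are shown either with a leading
    minus sign or wrapped in parentheses, and an optional prefix/suffix
    is added around the digits.
    """
    body = f"{prefix}{abs(value):,.{decimals}f}{suffix}"
    if value < 0:
        return f"({body})" if negative_parens else f"-{body}"
    return body


def colour_for(value, ranges, default=None):
    """Return the colour of the first (low, high, colour) range containing value.

    A range matches when low <= value < high; if none match, return default.
    """
    for low, high, colour in ranges:
        if low <= value < high:
            return colour
    return default


# Assumed ranges for illustration only.
RANGES = [(0, 50, "red"), (50, 100, "green")]

print(format_number(1234.5, prefix="$"))
print(format_number(-1234.5, decimals=1, negative_parens=True))
print(colour_for(75, RANGES))
```

A table renderer could call `colour_for` on each cell value to pick a background colour, and `format_number` to produce the text shown in the cell.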

I’m not sure whether these features are included yet in QGoogleVisualizationAPI, the PHP wrapper for the Google visualisation API.

Also in my feed reader recently was this post on Search Engines Extracting Table Data on the Web, which asks:

what if Google focused upon taking information from tables that contain meaningful data (as opposed to tables that might be used on a web page to control the formatting of part or all of a page)?

What if it took all those data filled tables, and created a separate database just for them, and tried to understand which of those tables might be related to each other? What if it then allowed for people to search through that data, or combine the data in those tables with other data that those people own, or that they found elsewhere on the Web?

and then links to a couple of recent papers on the topic.

It strikes me that Javascript/CSS libraries could really help out here: for example, structures like Google’s Visualisation API Table component and Yahoo’s UI Library DataTable (which makes it trivial to create sortable tables in your web page, as the YUI Sortable Data Tables example demonstrates).

Both of these provide a programmatic way (that is, a Javascript way) of representing tabular data and then displaying it in a table in a well defined way.

So I wonder, will the regular, formalised display of tabular data make it easier to scrape the data back out of the table? That is, could we define GRDDL-like transformations that ‘undo’ the datatable-to-HTML-table conversions, and map back from HTML tables to, say, a JSON, XML or Javascript datatable representation of the data?
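As a rough sketch of what such an ‘undo’ step might look like, the following Python uses only the standard library to read a regular HTML table back into a list of records and print it as JSON. It assumes the simplest possible case: the first row holds the column names, and there are no nested tables, `rowspan` or `colspan`. The class and function names, and the sample table values, are invented for illustration.

```python
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Treat the first row as column names; return one dict per data row."""
    parser = TableExtractor()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample table, for illustration only.
SAMPLE = """
<table>
  <tr><th>course</th><th>visits</th></tr>
  <tr><td>E891</td><td>120</td></tr>
  <tr><td>T320</td><td>45</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
print(json.dumps(records, indent=2))
```

All cell values come back as strings; deciding which columns are really numbers or dates is a separate step that a proper transformation would need to handle.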

Once we get the data out of the HTML table and into a more abstract datatable representation, might we then be able to use the Javascript data representation as a ‘database table’ and run queries on it? That is, if we have data described using one of these datatable representations, could we run SQL-like queries on it in the page, for example by using TrimQuery, which provides a SQL-like query language that can be run against Javascript objects?
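To illustrate the general idea of treating tabular records as a ‘database table’, here is a small Python sketch. It does not use TrimQuery (which is a Javascript library); as a stand-in it loads the records into an in-memory SQLite database using Python's standard `sqlite3` module. The function name `query_records`, the table name `data` and the sample rows are assumptions made for the example.

```python
import sqlite3


def query_records(records, sql):
    """Load a list of dicts into an in-memory table called 'data' and run sql.

    Assumes every record has the same keys as the first one, and that
    column names contain no double quotes.
    """
    columns = list(records[0].keys())
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE data ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO data VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in records],
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Made-up rows, standing in for data recovered from a page's datatable.
rows = [
    {"course": "E891", "visits": 120},
    {"course": "T320", "visits": 45},
    {"course": "B201", "visits": 80},
]

busy = query_records(
    rows,
    "SELECT course, visits FROM data WHERE visits > 50 ORDER BY visits DESC",
)
print(busy)
```

An in-page version would do the equivalent in Javascript, but the shape is the same: a regular table representation goes in, and a query language runs over it.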

Alternatively, could we map the data contained in a “regular” Google or Yahoo UI table to a Google Spreadsheets-like format, in which case we might be able to use the Google Visualisation API Query Language? (I’m not sure whether the Query Language can be applied directly to Google datatable objects.)

It’s not too hard, then, to imagine a browser extension that overlays a SQL-like query engine on top of pages containing Yahoo or Google datatables, essentially turning the page into a queryable database. Maybe Ubiquity could even be used to support this?

Library Analytics (Part 7)

In the previous post in this series, I showed how it’s possible to identify traffic referred from particular course pages in the OU VLE, by creating a user defined variable that captured the complete (nasty) VLE referrer URL.

Now I’m not completely sure about this, but I think that the Library provides URLs to the VLE via an RSS feed. That is, the Library controls the content that appears on the Library Resources page when a course makes such a page available.

In the Google Analytics FAQ answer How do I tag my links?, a method is described for adding additional tags to a referrer URL that Google Analytics can use to segment traffic referred from that URL. Five tags are available (as described in Understanding campaign variables: The five dimensions of campaign tracking):

Source: Every referral to a web site has an origin, or source. Examples of sources are the Google search engine, the AOL search engine, the name of a newsletter, or the name of a referring web site.
Medium: The medium helps to qualify the source; together, the source and medium provide specific information about the origin of a referral. For example, in the case of a Google search engine source, the medium might be “cost-per-click”, indicating a sponsored link for which the advertiser paid, or “organic”, indicating a link in the unpaid search engine results. In the case of a newsletter source, examples of medium include “email” and “print”.
Term: The term or keyword is the word or phrase that a user types into a search engine.
Content: The content dimension describes the version of an advertisement on which a visitor clicked. It is used in content-targeted advertising and Content (A/B) Testing to determine which version of an advertisement is most effective at attracting profitable leads.
Campaign: The campaign dimension differentiates product promotions such as “Spring Ski Sale” or slogan campaigns such as “Get Fit For Summer”.

(For an alternative description, see Google Analytics Campaign Tracking Pt. 1: Link Tagging.)

The recommendation is that campaign source, campaign medium, and campaign name should always be used (though I’m not sure whether Google Analytics actually requires this).

So here’s what I’m proposing: how about we treat a “course as campaign”? What are sensible mappings/interpretations for the campaign variables?

  • source: the course?
  • medium: the sort of link that has generated the traffic, such as a link on the Library resources page?
  • campaign: the mechanism by which the link got into the VLE, such as a particular class of Library RSS feed or the addition of the link by a course team member?

By creating URLs for display in the VLE that point back to the Library website and are tagged with “course campaign” variables, we can more easily track (i.e. segment) user activity on the Library website that results from students following those links into the Library site.

Where course teams upload Library URLs themselves, we could maybe provide a “URL Generator Tool” (like the “official” Tool: URL Builder) that accepts a Library URL and automatically adds the course code (source), a campaign flag saying that the link was uploaded by the course team, and a medium flag saying, for example, that the link is provided as part of an assessment. The “content” variable might capture a section number in the course, or information about which activity in particular the resource relates to.

For example, the tool would be able to create something like:
http://learn.open.ac.uk/mod/resourcepage/view.php?id=36196&utm_source=E891-07J&utm_medium=Library%2Bresource&utm_campaign=Library%2BRSS%2Bfeed
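A minimal sketch of such a generator tool, in Python, might look like the following. The function name `tag_url` and its parameters are my own invention; the only things taken from Google Analytics are the `utm_source`, `utm_medium`, `utm_campaign` and `utm_content` parameter names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

UTM_KEYS = {"utm_source", "utm_medium", "utm_campaign", "utm_content"}


def tag_url(url, source, medium, campaign, content=None):
    """Return url with 'course as campaign' tags added to its query string.

    Existing query parameters are kept; any utm tags already present
    (from the set handled here) are replaced rather than duplicated.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in UTM_KEYS
    ]
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    if content:
        query.append(("utm_content", content))
    return urlunsplit(parts._replace(query=urlencode(query)))


tagged = tag_url(
    "http://learn.open.ac.uk/mod/resourcepage/view.php?id=36196",
    source="E891-07J",
    medium="Library resource",
    campaign="Library RSS feed",
)
print(tagged)
```

Note that `urlencode` writes a space as `+`, whereas the example URL above contains `%2B`, which is the encoding of a literal plus sign; which of the two Google Analytics reports more readably would need checking.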

Annotating links in this way would allow Library teams to see what sorts of link (in terms of how they get into the VLE) are effective at generating traffic back to the Library, and could also enable the provision of reports back to course teams showing how effectively students on a particular course are engaging with Library resources from links on the VLE course pages.
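Google Analytics would do this segmentation for us, but as a rough illustration of the kind of report that tagging makes possible, here is a Python sketch that counts a list of landing URLs by one of the campaign tags. The function name `count_by_tag`, the `library.example.org` host and the sample URLs are all made up for the example.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def count_by_tag(urls, tag):
    """Count URLs by the value of one query-string tag, e.g. 'utm_campaign'.

    URLs that do not carry the tag are counted under '(untagged)'.
    """
    counts = Counter()
    for url in urls:
        params = parse_qs(urlsplit(url).query)
        counts[params.get(tag, ["(untagged)"])[0]] += 1
    return counts


# Invented landing URLs, for illustration only.
visits = [
    "http://library.example.org/?utm_source=E891-07J&utm_campaign=Library+RSS+feed",
    "http://library.example.org/?utm_source=E891-07J&utm_campaign=course+team+upload",
    "http://library.example.org/db?utm_source=T320-08B&utm_campaign=Library+RSS+feed",
    "http://library.example.org/",
]

by_campaign = count_by_tag(visits, "utm_campaign")
by_course = count_by_tag(visits, "utm_source")
print(by_campaign.most_common())
print(by_course.most_common())
```

Counting by `utm_campaign` answers the Library's question (which routes into the VLE generate traffic), while counting by `utm_source` gives the per-course view that could be reported back to course teams.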

