Tuesday 14 June 2016

Technologies 001

Discussion of particular technologies, and the general considerations that should guide us in choosing between them.

Chris MacMackin:
My default inclination for something like this is to try to use multiple very small and simple tools which are extensible or can interact.
Tim Wilkinson:
We want to use the most generic, uncomplicated, interoperable and cross-platform systems consistent with our requirements, and decisions taken at the early stages must reflect this.
I'd strongly recommend that we try to
  1. Use open-source systems wherever possible 
  2. Choose systems that use standard and widely-used languages, technologies, protocols and interfaces in the most 'vanilla' flavour possible 
  3. Keep in mind integrability of the systems - we don't want a proliferation of small systems with no way of getting them to work together 
  4. Pick systems that are as modular as possible - in particular separation of presentation layer from data - we want to create a recognisable identity across many of our systems 
  5. Avoid systems which can only be run as services from a remote site 
  6. Where possible use the same technology for similar tasks, even if this means slightly re-thinking our requirements or being creative about implementation. Having more uniform systems will not only make it easier for users and developers to move from one to another and improve integrability; it will also massively reduce duplication of effort in development, maintenance, training etc. 
  7. Make sure we can maintain full control and (where appropriate) sole visibility of all data and code 
  8. Make use of some kind of workflow/project management tools for systems development - these, like this blog, are 'meta' systems and don't need to be integrable with our operational, production systems, though if they can be they may as well be, especially since workflow will be needed as part of our operations in any case


Unknown said...

In terms of workflow/project management for systems development, we can use GitHub. It is ideal for this sort of thing. It can also integrate with some other websites to provide extra features.

Unknown said...

Also, great job in developing these lists of issues/things to consider. They're impressively comprehensive!

Unknown said...

I should say that I think the principles you outline here are very good ones. However, there is a tension between #3 and #4--we want systems that integrate together but are at the same time modular. This is possible, but requires discipline to do properly.

The other thing to bear in mind is that some established/standard technologies are more integrate-able than others. For example, phpBB is very well established and standard but is, at the end of the day, forum software. You can write extensions for it, but you will always be building around a forum. Furthermore, it does not (easily, at least) hook into other systems. By contrast, the Django framework (my preferred route) has many pre-built bits of software for it, which can fairly easily be extended to work together.

Incidentally, sorry if I'm focusing too much on platforms and technologies at this early stage. It's just that this is where my primary expertise lies.

Tim Wilkinson said...

re: Wikis, document DBs, triple stores, semantic web

I'm looking into these kinds of technologies as strong possibilities. If anyone (including Steve - sorry again if my previous was a bit harsh) wants to do any research or testing in this area, please do, obviously.

Chris - don't know if you have any specialist knowledge in these areas. Bit of a learning curve for me but reasonably familiar concepts and the discipline of an RDBMS background along with knowledge & practical experience of getting that data out via analytical & reporting systems, inc. abstraction layers of things like Business Objects does mean I think I have some chance of getting to grips with it reasonably quickly, esp the back end. For the front and middle tier-ish stuff (if that kind of terminology isn't defunct yet) I hope you may be able to take a lead on. I know nothing of Django and am not really a programmer (lots of VB(!), javascript, a tiny bit of C# is about it - stuff like T-SQL and PL/SQL but not really that relevant). I do have friends who can advise and possibly - once this thing gets off the ground and starts to look like a going concern - even get involved directly.

I certainly think that once we have a better idea of where we'd like to get to eventually (and of contingencies along the way) we need to be thinking about a development and roll-out path that is going to start getting results, proof of concept etc early on while not wasting effort on half-arsed quick fixes. So that should be simple enough. /irony

Your idea of starting (in operational, not necessarily architectural and development terms) with some very structured policy proposals and getting people to work on those sounds sensible and a similar approach might be applied in areas other than policy development. (E.g. a history project etc).

Point taken about the tension and possible trade-off between 3 and 4 - I suppose I was thinking about horizontal integrability/compatibility between systems (e.g. same DB system) and vertical modularity (e.g. good separation of presentation layer) within them. Your other remarks here are useful and will be integrated into my attempts at synthesis and reorganisation.

Unknown said...

I'm afraid I don't have any particularly specialist knowledge on those topics you posted links to. Semantic wikis are interesting, but I'm not entirely clear on how we'd use them. They could be useful for the "knowledge base" you describe, I suppose. Would they be useful for policy development and discussion of theory though?

Django is a Python framework for dynamic web applications. The main reason I've been pushing it is that it is the most widely used such framework in Python and Python is the language which I'm most confident in working for these sorts of things. VB would not be appropriate, as it's proprietary. I think C# is partially proprietary too? Although Microsoft may have opened it up some in recent years--I'm not entirely sure. In any case, the principle behind Django is that it maintains a database (various types can be used: SQLite, PostgreSQL, MySQL, and Oracle) which holds all of the application(s)'s data. Django then provides an interface through which to access it in order to generate webpages and by which webpages can be used to enter new data. Unless I am very much mistaken, all web frameworks (and any website with any degree of sophistication) work this way. I think this means that you could fairly easily apply your more database oriented expertise to the data generated by Labour Roots.

I should emphasise again that I have no experience in web development besides making a couple of personal websites. However, in those cases, I was just designing a theme, which is different from designing an application. I am fairly knowledgeable with Python and Django doesn't look too hard to learn, though. There are also plenty of existing apps written for it which we could build off of. A single Django website (project) can consist of multiple "apps". These apps would all store their data in the same database, so it is very easy to extend them to integrate together. For example, I could take an existing forum app and an existing wiki app and then couple them in such a way that each wiki page would be associated with a particular forum thread, with easy links back and forth between them.
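
As a rough illustration of the shared-database idea, here is a minimal sketch using Python's built-in sqlite3 module rather than Django itself. The table names, column names and the two helper functions are hypothetical and are not taken from any existing Django app; the point is only that two "apps" keeping their tables in one database can be linked with an ordinary foreign key.

```python
import sqlite3

# Hypothetical schema: a wiki "app" and a forum "app" keep their
# tables in one shared database, which is what makes linking easy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forum_thread (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE wiki_page (
        id        INTEGER PRIMARY KEY,
        slug      TEXT NOT NULL UNIQUE,
        body      TEXT NOT NULL,
        -- UNIQUE makes the page-to-thread link one-to-one
        thread_id INTEGER UNIQUE REFERENCES forum_thread(id)
    );
""")


def create_page_with_thread(slug, body):
    """Create a wiki page together with its own discussion thread."""
    cur = conn.execute(
        "INSERT INTO forum_thread (title) VALUES (?)",
        (f"Discussion: {slug}",),
    )
    conn.execute(
        "INSERT INTO wiki_page (slug, body, thread_id) VALUES (?, ?, ?)",
        (slug, body, cur.lastrowid),
    )
    conn.commit()


def thread_title_for(slug):
    """Follow the link from a wiki page to its forum thread."""
    row = conn.execute(
        """SELECT t.title FROM wiki_page p
           JOIN forum_thread t ON t.id = p.thread_id
           WHERE p.slug = ?""",
        (slug,),
    ).fetchone()
    return row[0] if row else None


create_page_with_thread("housing-policy", "Draft text of the proposal.")
print(thread_title_for("housing-policy"))
```

In a real Django project the same link would presumably be declared on the models rather than in hand-written SQL, but the underlying arrangement (one database, one key joining the two apps) would be much the same.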

Tim Wilkinson said...

OK thanks Chris that sounds promising. I'll try and draw all this together in the next few days (BTW I didn't mean to suggest anyone would ever want to use VB except for legacy apps! Just giving an idea of my IT experience.)

You're right that a lot of the stuff I'm talking about enters into the knowledge/argument base idea and perhaps less directly into the policy debate side of things; though to make use of policy debates it would be useful to store the evidence and arguments they use in a searchable, re-usable form.

Re: semantic wiki/normal wiki/document-centric db/triple store; I think we definitely (eventually) would benefit from being able to decompose the content of, say a long document into smaller semantic units, and to be able to do the converse: quickly build a document or case or article by assembling stored data.

I wonder if we might therefore need to try and find a system which can work at both levels: people should be able to store something approaching 'atomic' facts (along with source documentation) but also whole documents (and some documents will be referenced as the source of various facts). The key to this would be to make sure that whenever a resource is used, referenced etc it is more or less transparently stored in the knowledge base. For example, say we set up a research portal which brings together a wide range of sources (say Hansard, the OBR, online newspapers, TheyWorkForYou, etc etc) in a managed way. As people work with that data, cite it and draw inferences from it, the system might, with a minimum of intervention from the user, be able to import the dataset being used, and log whatever summary, inference, citation, etc the user might generate from it, so that others can build on that same piece of work later.

Similarly, as someone conducts research for a paper, say, they are likely to gather internet links and associate them with notes saying what point the link is the source for. If this process is formalised, the URL, an archived copy of the web page, machine-generated (provisional) semantic data and the user's own semantic gloss can all be stored into the knowledge base for future use by others.
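
To make the formalised citation process a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `Source` and `KnowledgeBase` names, the `fake_fetch` and `naive_tagger` stand-ins (a real system would download and archive the page and use a proper semantic analyser), the example URL and the placeholder page text.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """One cited web page as held in the knowledge base."""
    url: str
    archived_copy: str                               # snapshot of the page text
    machine_tags: set = field(default_factory=set)   # provisional semantic data
    glosses: list = field(default_factory=list)      # (user, note) pairs


class KnowledgeBase:
    def __init__(self):
        self._sources = {}

    def cite(self, url, user, note, fetch, tagger):
        """Record a citation; archive and analyse the page on first use only."""
        source = self._sources.get(url)
        if source is None:
            text = fetch(url)
            source = Source(url, text, tagger(text))
            self._sources[url] = source
        source.glosses.append((user, note))
        return source


def fake_fetch(url):
    """Stand-in for downloading a page; returns placeholder text."""
    fake_fetch.calls += 1
    return "Unemployment fell while underemployment rose."


fake_fetch.calls = 0


def naive_tagger(text):
    """Stand-in for semantic analysis: simple keyword matching."""
    keywords = {"unemployment", "underemployment", "inflation"}
    words = {w.strip(".,").lower() for w in text.split()}
    return words & keywords


kb = KnowledgeBase()
kb.cite("https://example.org/report", "alice", "source for the jobs claim",
        fake_fetch, naive_tagger)
again = kb.cite("https://example.org/report", "bob", "see the final paragraph",
                fake_fetch, naive_tagger)
print(len(again.glosses), fake_fetch.calls)
```

The second citation reuses the stored copy and simply adds another gloss, which is the "future use by others" part of the idea.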

There are machine methods, which will continue to develop, for analysing semantic content of documents automatically, to an extent - there are also knowledge bases that can be imported en masse.

This is all pretty ambitious, but while we may never get to the ideal fully-integrated and semantically complete 'ultimate system', having a reasonable idea of what one would be like should mean we can devise an implementation path that will get fast results yet allow for incremental changes which move us closer to the ideal system while being useful improvements in themselves.

Sorry this may not be all that clear and is a bit sketchy at present, but that's the nature of the beast. If we can get closer to some kind of workable spec and development path and show that we have a viable project under way and sufficient competence and drive to push it forward, we should be able to recruit more technical and other expertise, at which point the kind of thing I'm talking about here becomes a real possibility.

Unknown said...

Very interesting thoughts. I don't know if I have a huge amount to contribute here--it isn't an area of expertise for me. Some sort of citation system wouldn't be too hard to integrate into the Wiki, I don't think, though. It should just be a matter of writing an extension for the wiki syntax and I have some experience doing such things already.

My immediate goal is to come up with a proof of concept for the discussion system which I outlined. I think this might be useful as a central part of our system. Policy proposals would become items for discussion; responses could be used by the policy commissions to modify their proposals. The mini think tanks could publish to the wiki as well. Everything on the wiki would draw references from (and contribute references to) the knowledge-base. Minutes of meetings, reports on activism, etc. could go onto the wiki to solicit responses.

I've found an existing piece of Django wiki software which looks promising. It's not perfect, but it should be a good start. There is a very polished looking piece of forum software for Django as well. What I'm thinking I'll do is create a class of wiki articles which are non-editable. These would have a one-to-one link with a forum thread (or possibly a more standard comment system) and an editable wiki page for the response. It shouldn't be too hard to modify the existing apps to have links between these pages. Initially I'll just be wiring together what's already available using their default themes. As such, they won't necessarily all look like they go together and they certainly won't have a distinctive "Labour Roots" theme. That can come later, hopefully from someone with a better eye for design than I have.
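
The "position, discussion, response" arrangement described above could be sketched roughly as follows. This is plain Python rather than the Django wiki or forum apps mentioned, and the class names (`Position`, `Thread`, `Response`, `Item`) and sample text are hypothetical; a frozen dataclass stands in for the non-editable class of wiki article.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class Position:
    """A published article; frozen so it cannot be edited afterwards."""
    title: str
    text: str


@dataclass
class Thread:
    """Discussion attached to one position."""
    posts: list = field(default_factory=list)


@dataclass
class Response:
    """Editable wiki page holding the collective response."""
    text: str = ""


@dataclass
class Item:
    """Ties exactly one thread and one response to each position."""
    position: Position
    thread: Thread = field(default_factory=Thread)
    response: Response = field(default_factory=Response)


item = Item(Position("Rent controls", "Full text of the proposal."))
item.thread.posts.append("What evidence is there from other cities?")
item.response.text = "Members ask for comparative evidence."

try:
    item.position.text = "Changed"   # editing the position is refused
except FrozenInstanceError:
    print("position is read-only")
```

The thread and the response stay freely editable while the position itself is fixed, so discussion always refers to a stable text.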

John Walsh said...

Sorry for absence, been busy for a bit – can I butt in here …

Can I add in a concern that there's a lot of technology being discussed here while the purpose seems to an extent in abeyance? Is it possible to argue that the purpose should be leading development?

Going back to the discussion on Left Futures and in particular Tim's 'call to arms' ...


... there was a 4 point list:
1. we should get together in a more organised way - yes, we're doing that here.
2. get more people on board - perhaps we'd need to be able to communicate the embryonic purpose a little better before that can happen.
3. 'thrash out some kind of big picture to provide a direction of travel' - is this being done?
4. get going on small initiatives - Loomio was one, but didn't work for various reasons. Other similar things could be simply signed up to (are there any Django wiki software examples we could use?), then we could do a bit more 3 then 2.

Hope it's ok to say that.

Unknown said...

If I have a place to host it, I could get a basic wiki running very quickly. There are a few improvements I'd like to make to it, but I don't need to make them right away. It would take a bit longer if I want to try to create the "position, discussion, response" model which I've mentioned, as I'd have to write a few extensions to existing software. You can see an example of this software in action (albeit in Spanish) here: http://www.python.org.ar/wiki/

John Walsh said...

I can send you access details for hosting a wiki. I'd rather we did this privately - is it ok to email them to your Oxford email?

Unknown said...

Actually, could you use the email on my blog? You can find it at http://politicalphysicist.github.io/ I try to keep my professional and political lives as separate as possible.

Tim Wilkinson said...

John - this is the Technologies thread after all, but point taken. It would be really useful to develop a fuller idea of how we want policy discussions, possibly voting, greater communications within Momentum, in particular with the grassroots, etc. Please do post any ideas, under whatever category seems to fit. (I now see you have done so in fact!)

I do think it's useful to be trying to move ahead on all fronts together, at this initial stage anyway, rather than going only for top-down 'waterfall model' planning or bottom-up shoot-first-ask-questions-later approaches. For example, the available technologies are actually a key constraint on the scope of the project over the medium term at least. Similarly, plans at all levels of abstraction/generality need to be mutually adjusted to one another and this is probably best done in parallel rather than sequentially (at any rate they can be done that way).

Tim Wilkinson said...

Chris - my research into wiki stuff suggests to me that we may be best off going with MediaWiki (which Wikipedia uses). Specifically, the 'semantic' extension of it.


I'm still in the middle of all this but thought I'd ask you (a) what you think (b) if there's any prospect of setting up a demo/testing installation of it.

I'll probably put up a 'Wiki' post under Technologies fairly shortly.

I've been putting quite a bit of thought into this and my skill set is pretty well-suited to the topic (even philosophical ontology! - and I forgot to mention data-warehouse design experience). Not much point in pre-announcements but I think I may have a fairly well-developed proposal - which incorporates the comments made here - to post, possibly this weekend or anyway in the next few days...

Tim Wilkinson said...

Sorry, addressed to Chris but anyone is obviously welcome to reply. Not sure if this kind of thing is something John, for example, is interested in/knowledgeable about?

Unknown said...

MediaWiki is an extremely well-developed platform. There are three concerns I have though.

1) It has a very Wikipedia-ish look. We want to make sure that we have our own unique branding and I don't think that looking like a Wikipedia clone would be a good thing. There may well be ways to customise its appearance, though, so I'm not sure how big an issue this would be.

2) If we want to integrate it with something which isn't strictly wiki-ish then this might be difficult to do.

3) It's written in PHP. As such, it would be difficult for me to help write new extensions. I could learn PHP, but given I have only so much time to contribute this would definitely be a barrier.

While a great system, MediaWiki is definitely not lightweight. I was under the impression that we were aiming to develop more of an ecosystem of small, integrated tools. Are we sure that we want everything to be built around the wiki-structure? I'll wait until you've published your full proposal, though.

Tim Wilkinson said...

I think CSS/javascript 'skins' mean the appearance is pretty easily customised: http://www.mediawiki.org/wiki/Manual:Skins

Proposal is still taking shape so I'm happy to talk aloud here & get any input.

On the integration side of things, my idea is that most of our early stuff will be largely document-centric and so wikiable. In fact I'm thinking we make the wiki format (with WYSIWYG, Halo extension) the standard interface. One issue is talk pages, which are ****ing atrocious. WF did start developing a better workflow-based system - 'Flow' https://www.mediawiki.org/wiki/Flow_Portal - but shelved development. I'm wondering if we can do without talk pages at all, and rely on comments used in making revisions. This may sound stupid but given that in contested topics we're looking at restricted access by identifiable editors maybe this can work - anyway I'll try and discuss this in whatever proposal I manage to put together.

The idea of a single system is partly that it is more manageable in terms of maintenance, development, consistency etc, but primarily that we basically want to be building a knowledge base from everything we do. An applicable principle is that no-one should ever have to duplicate any work that's already been done, whether it be importing a reference to a web page, finding a pattern in employment statistics, drawing a particular connection between two speeches, whatever. It should be easy for someone researching a topic to build on all previous research done on that topic - and all the info imported or made available (and cached on access) via a portal.

I think we can set up multiple small initiatives, but make use of a common infrastructure that links them all right from the start. It should (TBC) be trivial to set up separate sub-wikis for example, so that a policy dev wiki can draw on, and where appropriate, update and supplement, a base-level 'knowledge base' research wiki, while its content remains distinct. Having a single framework is also going to make things like security, reputation/competence scores etc far easier.

I'd also suggest we (at some point in an incremental adaptative process) make use of derived wiki pages and sections of pages, based on data from the underlying semantic db, which semantic mediawiki mirrors tagged wiki data to. This also means that it should be feasible to add rdf data directly to the semantic db, which can then be 'cascaded' up to create simple wiki pages in the knowledge base. Users can make further connections of arbitrary complexity between these 'wikinodes', which in turn will be mirrored back to the rdf database.
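
A toy version of this two-way flow, in plain Python, might look like the sketch below. It does not use Semantic MediaWiki or any RDF library; the triples, the `render_wiki_page` and `mirror_link` functions and the sample subject names are all invented for illustration, and the generated markup is only loosely modelled on the `[[property::value]]` annotation style.

```python
# Each fact is a (subject, predicate, object) triple, as in RDF.
triples = {
    ("MinimumWage", "type", "Policy"),
    ("MinimumWage", "discussedIn", "EmploymentPaper"),
}


def render_wiki_page(subject, store):
    """'Cascade up': derive a simple wiki page from the stored triples."""
    lines = [f"= {subject} ="]
    for s, p, o in sorted(store):
        if s == subject:
            lines.append(f"* [[{p}::{o}]]")
    return "\n".join(lines)


def mirror_link(store, subject, predicate, obj):
    """'Mirror back': a connection made on the wiki becomes a triple."""
    store.add((subject, predicate, obj))


# A user links the page to some statistics; the store picks it up,
# and the next rendering of the derived page includes the new link.
mirror_link(triples, "MinimumWage", "supportedBy", "WageStatistics")
print(render_wiki_page("MinimumWage", triples))
```

Because the page is derived from the store rather than written by hand, data added directly to the store and links added through the wiki end up in the same place.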

In essence, we will ultimately then be maintaining a document centric and a semantic data store in parallel, or in constant close convergence. At some future point, and maybe only in some areas, the wiki layer might be discarded or replaced, and the rdf data become the definitive resource. This may all sound complicated but it's actually reasonably elegant as a way to enable a steep development/results curve and also build a robust and usable dataset which is nonetheless easily integrated with document-centric data. How simple something sounds is entirely dependent on how it's described of course, and things that sound simpler may not be once you enumerate all the mappings required and the potential for mismatches, broken connections, etc.

Tim Wilkinson said...

The 'semantic web' is where the future is, and has potential to make research and related activities much more effective and efficient. A huge amount of data is available for import or dynamic access via rdf, and much more is on the way.

Planning for this kind of capability means the project has the potential, while starting from a few simple wiki collaborations, to become a seriously useful and even cutting edge tool. That also means programmers are more likely to be interested, too.

Re: learning PHP - fair point but how many options does that leave us? FWIW I think it might be a reasonably easy pathway anyway as there are so many extensions already written, we're likely to be mostly modifying existing code in fairly small ways to start with - a good way to pick up a new language fairly painlessly, in my experience. Not really for me to tell you picking up a new language is no problem, though! But I'd also hope we will get some more programmers on board once we have a tangible project under way anyway.

Unknown said...

Yes, I happened across the page on "Skins" shortly after writing that comment. The CSS, at least, is something I can help with (although I don't know that I have a sufficiently good eye for design).

Granted, we can't base all of our decisions just around what programming language I like. Fair point that initial development should make it easy to pick up. Just to justify my concern somewhat, I'm coming at this from the perspective that I know Python really well, know the useful libraries it has, know how to find more libraries, know how to manage projects in it, etc. However, given that using MediaWiki would have much smaller development requirements, that sort of knowledge won't be as important.

We seem to have been approaching this from somewhat different perspectives, presumably due to our different backgrounds. I was thinking of it as a sort of policy version of open-source software development, with some novel discussion capabilities. You are coming at it from a knowledge-base perspective. This is a very interesting idea, but not one I know much about in practice. It seems like a new approach to this sort of work and as such strikes me as very much a high-risk, high-reward thing to do.

John Walsh said...

Good point about approaches - part of finding a way forward is discovering each of our predilections (I will get round to writing an intro tomorrow). I'm beginning to wonder if it would be good to have 'technologies' and 'member involvement design' strands to the project.

Re languages, I know php fairly well (via lamp stack projects) and have recently needed to learn a little Python (raspberry pi and home automation) - Chris, I'd have thought that as you know Python well, php would be easy to learn?

Tim Wilkinson said...

Yes I think that's right - I definitely see a knowledge base as an important aspect of the thing - to support generation of the various kinds of factual and discursive materials discussed. Wikipedia is after all a knowledge base, just a very rough and ready one.

The way I see it is that for policy discussion to really have impact it needs to be backed with policy papers (such as a policy unit or think tank might produce). And that in turn needs (a) research and (b) a finished document. This is just what a wiki is good at. Arranging contributions around research and conclusions around document production has some great advantages:

1. Rather than endless back-and-forth, arguments will tend to be centred on providing evidence and sources. This focus means a much greater likelihood of converging - ideally, in Peircean fashion, on the right answer (I simplify)!

2. (Side benefit/'synergy'): In the process of marshalling evidence to support a position, people will be contributing to a knowledge base that can be used both for further policy development in similar (or indeed different) areas, and for the whole range of working areas that have been discussed here.

3. By arranging things around producing a document, we help to move things forward. Decisions have to be made - one 'current version' is going to prevail and everyone concerned must make their case or lose the argument by default. More generally having a document to work on focuses the attention and provides a catalyst for discussion.

4. This fits well with the model suggested by I think at least two people - and with which I agree - of starting with a complete (or relatively complete) policy document and letting discussion form around it. Not that discussants would necessarily proceed by amending (a copy of) the content of the policy document itself, but that is possible - and indeed outline or partial documents could be supplied - to be worked on by the policy development team. This could even be a useful and productive model for more directed policy development tasks.

It's probably clear enough that my approach here is to regard policy discussions as at least in part factual - or more generally perhaps, capable of being supported by better or worse sources (which might be Gramsci or Eurostat), and as such not being entirely distinct from research.

Regarding the riskiness, I think that is largely mitigated by an approach that allows us to start small and in one area, then to roll out the rest of the 'master plan' incrementally. Say we start with a wiki-based discussion of a policy document. We can then attach a semantic engine and graph database, load up data and try getting people to start using semantic tags etc - via a suitable gui. Perhaps we load data that we know is relevant to a discussion into the database and encourage people to cite it via the application. When people cite a web page, it is archived, analysed, decomposed to some extent and stored as a document which will start to build associations (it has one already - the project in which it was cited). Next time someone wants to cite that web page, it will already be in the data store and linked to other content that may also be relevant.

None of this requires a huge upfront effort with uncertain results. It can all happen down an incremental path, and if some extension doesn't work, the rest of the system can continue and another path be taken.

This is all still under-specified of course at this stage, but I hope it gives a flavour. Basically we should think big in terms of the future potential we build in, while starting small and proceeding in manageable chunks (and even dribs and drabs) each of which is worthwhile in itself, is an improvement, and moves us in the direction of more efficient, machine-assisted research - which is undoubtedly where the future lies.

I think this is a pretty compelling vision myself and eminently doable. But I don't want to get too carried away if this is not what anyone else wants!

Tim Wilkinson said...

I'd say actually the analogy with software development helps to illustrate some of the ideas I'd like to see incorporated into our system (with the obvious proviso that policy development is a lot more complicated and 'messy' than the relatively tractable world of programming).

Think of the back-end I'm proposing as the software repository - when code is written, it is version controlled and put into testing queues, etc. The code is organised into meaningful and independent chunks which can be re-used - whole libraries may be developed which can then be imported into other projects, etc. When requirements change, you may want to go back to look at previous versions of the code, etc etc.

Not sure if this idea is of any use in any way, but just going back to the issues I have with Wikipedia's unstructured talk pages - maybe these are analogous (not in every way obviously) to comments in code. One thing programmers like is when code is so clear and well-crafted it's almost 'self-documenting'. We want to equip people with the tools to generate policy proposals that are so well constructed and researched that they are unarguable. (Obviously this is very much one of those 'direction of travel' things!)

The time isn't far off when we'll be able to drag and drop bits of evidence into a paper and have the logical and even evidentiary relations between them calculated on the page. It's not here yet but anything we can do to facilitate people's ability to generate powerful content has got to be a good thing.

Here's another example. Say we set up Vulture Watch, a site which lays out all MPs' interests in private health care and the private drug and medical supplies industries. Rather than some guy painstakingly compiling an HTML table with patchily sourced and rapidly outdated data, this information is stored in our knowledge base and the web site published based on queries from that data. We set up feeds or web scrapers to update the data, people who know about the project are on the look out for new info and can submit it when they see it in news stories or whatever. The data itself will be linked in the db to those MPs - so the rapid rebuttal team can cite it instantly, with impeccable sources, as part of a reply to some remark by a Conservative MP. Later someone thinks wouldn't it be a nice feature to include NHS-related speeches made by these MPs from Hansard, and NHS-related policy initiatives they have been involved in. Someone runs a search for that data - some will already be held, the rest is (transparently) collected from the wider web and stored. It can now be added to the next published version of Vulture Watch (we don't automatically update the site in situ because we prefer to run manual checks and to have a manageably small number of identifiable page versions for the site (each one is maybe published to archive.org?) - and anyway the Vulture Watch webmaster needs something to do!)
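
A minimal sketch of "publishing from queries" for the Vulture Watch case, using Python's built-in sqlite3 and html modules. The `interest` table, its columns, the `publish_snapshot` function and every row of data ("MP A", "Example Health Ltd", the example.org URLs) are placeholders invented for illustration, not real information. The `checked` flag is an assumed way of representing the manual check before each published version.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE interest (
        mp         TEXT NOT NULL,
        company    TEXT NOT NULL,
        source_url TEXT NOT NULL,
        checked    INTEGER NOT NULL DEFAULT 0  -- 1 once manually verified
    )
""")
# Placeholder rows only; a real system would be fed by scrapers,
# feeds and user submissions, each carrying its source.
conn.executemany(
    "INSERT INTO interest VALUES (?, ?, ?, ?)",
    [
        ("MP A", "Example Health Ltd", "https://example.org/register/1", 1),
        ("MP B", "Sample Pharma plc", "https://example.org/news/2", 0),
    ],
)


def publish_snapshot(db):
    """Build one page version from rows that passed the manual check."""
    rows = db.execute(
        "SELECT mp, company, source_url FROM interest "
        "WHERE checked = 1 ORDER BY mp"
    ).fetchall()
    cells = "\n".join(
        f"<tr><td>{escape(mp)}</td><td>{escape(company)}</td>"
        f'<td><a href="{escape(url)}">source</a></td></tr>'
        for mp, company, url in rows
    )
    return f"<table>\n{cells}\n</table>"


print(publish_snapshot(conn))
```

New submissions sit in the table unpublished until someone marks them as checked, so each published page is a deliberate, identifiable version rather than a live view of the data.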

Perhaps I should do more of this kind of use case - I have a habit of being too compressed and abstract in exposition.

Tim Wilkinson said...

Forgot to add (re: 'Arranging contributions around research and conclusions around document production has some great advantages') that a common and very wasteful problem in (among others) policy discussions is people talking past each other, or disagreeing on the basis of some vague 'stance' that isn't necessarily much to do with the real issues under consideration. Having a concrete document to work on means the issue under dispute should be in quite sharp focus, and doing so in terms that are expected to be substantiated in some way or other should mean that people will need to be clearer about what point they are making.

Tim Wilkinson said...

Maybe we switch tack, start with what we definitely agree on and see how far that takes us.

So - we agree that we want a uniform look and feel?
- presumably a uniform interface would be good too then
- especially if it's actually fairly familiar - those tending to join us would I should have thought have a higher statistical probability of being acquainted with Wikipedia editing, versions, reverts etc. We can advertise for & recruit them.

And that it should be web-based?
- presumably any rich user interface would probably be implemented in javascript and probably use JSON (and/or JSON-LD, XML)?

We should try out a wiki
- We should try out some use cases on it. Formulating a policy document, publishing a wiki page to a separate website, seeing if it will work as a Momentum intranet / flat communications network

Unknown said...

@John Walsh
I'm sure it wouldn't be too much trouble for me to learn PHP (although I haven't heard good things about it). It's just that I'm really busy for the next few weeks, then I'll be travelling, then I'll be moving into new accommodations... Needing to learn a new language to contribute just makes it that much harder to find the time. Furthermore, it would take quite a while before I have the same level of comfort and familiarity with PHP as I do with Python. But as I say, we can't base our design decisions around what programming language I like.

John Walsh said...

Sorry Chris, slight misunderstanding here probably because of my use of the term 'learn'. I didn't mean 'learn' in the sense of become fully competent with, rather, I meant that as you know Python it should be trivial for you to make sense of bits of php you might come across when, for example, installing and configuring a wiki. This probably isn't the case the other way round (going from php to Python) - whereas Python is a 'proper' programming language, php could be thought of as a lightweight layer between a web server and web pages. Hope that helps.

John Walsh said...

Tim, good to see your Vulture Watch example (alongside more abstract descriptions of the 'semantic web'). I tend to think of such things in the context of cultural-neoliberalism and what the Americans call post-truth politics. It also brings to mind the idea of one bloke in JC's office (Neale Coleman?) acting as Head of Policy and Rebuttal - what a waste of member skills and resources to have one bloke doing this when there are thousands of members willing to provide input.

Having said that, I'm yet to be convinced that the 'semantic web' can produce, except in certain discipline specific areas and, of course, in the world of advertising (parodied so well in Private Eye's 'Malgorithms' section). I guess the nub of the problem comes to light in the Vulture Watch example with the idea that feeds and scrapers will gather data (along with people on the look-out) and then the data becomes authoritative because it's part of the system. Or that health related speeches will be garnered from simply searching the web. In essence, though, isn't this the work of an investigative journalist and the credibility of the data is to a large extent dependent on the journo's standing?

For me, the power of the system would be rooted in participation by members, lots of members beavering away finding out previously hidden data. As such, the problem to solve isn't technical it's a people thing - about attracting members to be involved.

Tim Wilkinson said...

I see the automated side of things as strictly an enabling tool for human activities. The problem with malgorithms is that some machine has been put in charge under facile instructions, or left to free-associate. The kind of examples I'm talking about are perfectly consistent with (and intended to assist) human research - but rather than supporting one Twitter meme or obscure blog post, the research goes into a common pool and can be picked up or built on by any other member of the collective. So far as automatic semantic 'tagging' is concerned, I'd say we'd want it to be human-checked, certainly at first; though things like meta tags etc mean that we won't be relying on fuzzy algorithms and the non-sanity of robots to do our thinking for us. In other words, nothing forces us to subordinate human decision to robotic - but robots can do a lot of the tedious work of sorting and manipulating data in straightforwardly 'deterministic' ways. At the same time, it may be that for exploratory analysis, blue-skies thinking etc, (and cherry-picking, perish the thought) we might one day in the far off future incorporate modules that can perform probabilistic-statistical, associative & Bayesian types of processing - again, so far as it is useful to human beings who do the work. But while I'd like to allow a pathway open to this kind of stuff, it's going to be a long time before we even start to consider implementing anything like that - I'm happy to discount the possibility and let this proposal stand on the merits of more immediate benefits.

On the usefulness of automation: I see the role of automatic data dumps, feeds, scrapes, etc as happening in the background and (from the perspective of the user) simply functioning to make data (Hansard, the public whip, ONS, OBR, etc etc) more available and more easily searchable. The ONS website is a nightmare to find anything in. Try it - you search for keywords and get a bewildering array of results, then drill into 'data sets', select the series you want, then download a spreadsheet with slabs of text, merged cells, dates as text etc etc.

Given the initial system is set up to allow it, adding Atom feeds etc to pull in and tag the data would be a trivial task; some kind of mappings (xslt or whatever it may be) can present the data in a way that is useful for our purposes, make it much more interrogable and available and allow it to be combined more easily with other data sets. As a further - optional - pathway for possible further development, this starts gently to introduce the idea of making it easier for people to combine data sets (or rather to avoid doing it wrong) by defining/analysing the measures and dimensions used in datasets and preventing (or less intrusively for 'power users', warning about) invalid comparisons being made.
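
The "prevent or warn about invalid comparisons" idea could look something like the following Python sketch. It assumes each data set carries a small metadata record; the DatasetMeta fields and the power_user switch are hypothetical names for illustration, not part of any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical description of what a data set measures and how."""
    name: str
    measure: str           # what is being measured, e.g. "health spending"
    unit: str              # e.g. "GBP millions"
    dimensions: frozenset  # e.g. frozenset({"year", "region"})


def comparison_problems(a, b):
    """Return a list of reasons why comparing a with b would be invalid."""
    problems = []
    if a.measure != b.measure:
        problems.append(f"different measures: {a.measure!r} vs {b.measure!r}")
    if a.unit != b.unit:
        problems.append(f"different units: {a.unit!r} vs {b.unit!r}")
    if a.dimensions != b.dimensions:
        unshared = sorted(a.dimensions ^ b.dimensions)
        problems.append(f"dimensions not shared by both: {unshared}")
    return problems


def check_comparison(a, b, power_user=False):
    """Block an invalid comparison, or only warn if the caller is a power user.

    Returns the list of warnings (empty when the comparison looks valid).
    """
    problems = comparison_problems(a, b)
    if problems and not power_user:
        raise ValueError("; ".join(problems))
    return problems
```

An ordinary user comparing a series in GBP millions with one in GBP per head would be stopped with an explanation; a power user would get the same explanation back as a warning and could carry on.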

Tim Wilkinson said...

The above addresses the issue of credibility of data too. Journos need credibility only when they are in effect acting as a witness - we are asked to trust what they tell us about their anonymous source, what they saw in Aleppo, etc. - or an expert(!) - we're asked to trust their assessment of [whatever it is they've just run off 300 words on].

In the general case, however, journalists provide sources (to their editor at least, even if they're lax about supplying them in the text; a cynic might suggest they don't really think it's the public's business to demand such things). And tracking and documenting sources is ideal work for a computer program/database. Instead of copying and pasting URLs, copying the author's name, cobbling it all together in some kind of reference, all the citations - of the most credible source yet identified by the collective - are instantly available in a variety of formats, as inline link or footnote as we choose. We can drill down a chain of sources, quickly see which sentence in each web page is held to provide the evidence in question, etc etc.
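
As a rough illustration of citation tracking, here is a small Python sketch. The Source record, its fields and the two output formats are assumptions made up for the example; the URLs and names are placeholders.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class Source:
    """Hypothetical source record; 'cites' points one step down the chain."""
    author: str
    title: str
    url: str
    evidence: str                    # the sentence held to provide the evidence
    cites: Optional["Source"] = None


def as_inline_link(source):
    """Render the citation as an inline HTML link."""
    return f'<a href="{escape(source.url)}">{escape(source.title)}</a>'


def as_footnote(source, number):
    """Render the same citation as a numbered plain-text footnote."""
    return f'[{number}] {source.author}, "{source.title}", {source.url}'


def source_chain(source):
    """Drill down from a source to whatever it ultimately rests on."""
    chain = []
    while source is not None:
        chain.append(source)
        source = source.cites
    return chain


# Invented example: a news story resting on a register entry.
REGISTER = Source("Example Register", "Entry 1",
                  "https://example.org/register/1", "Shareholding declared.")
STORY = Source("A. Reporter", "News story",
               "https://example.org/news/1", "The MP holds shares.",
               cites=REGISTER)

if __name__ == "__main__":
    print(as_inline_link(STORY))
    for n, s in enumerate(source_chain(STORY), start=1):
        print(as_footnote(s, n), "-", s.evidence)
```

The reference is stored once and formatted on demand, so nobody has to copy and paste URLs or retype author names.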

Again this is not something that has to be worked out in full before we can get started - but incremental, non-disruptive efforts over time can certainly get us to a point like this - all the technology is there and there is no leap of programming inspiration required to join up the dots. I say we keep this pathway open.

Re: people v technical. No tension or tradeoff there - in fact a positive relationship. If we have a good technical project (and IT/internet is basically the medium we will be operating in) we'll attract more participants, both techies and 'users' and those straddling those categories to varying degrees. Certainly we need to take initiatives to recruit people - there are lots of avenues to pursue there - I'll set up a thread - and we can do that in parallel with developing key IT infrastructure.

One thing that occurs to me is that the idea of setting up a flat communications network for Momentum members would not only be an important end in itself, but if we succeed in becoming the default notice board/message exchange centre for Momentum members, getting them to dip a toe into research/policy discussion/signing up to media monitoring team/whatever is going to be pretty easy, as is putting out calls for particular skill sets, types of contribution, etc. If we are right about the need for this kind of thing and the untapped energy residing in the membership (and we are!) then we will be pushing at an open door in trying to recruit people.

I'll leave further comment (memo: e.g. on the importance of participant status, encouraging enabling & supporting people to take on responsibility, etc) for the recruitment thread.

Unknown said...

One of the big points you make in favour of the semantic web is that it allows people to reuse existing datasets. This is a great goal, but what I'm not clear on is how you propose to make this data browseable. This may simply be due to being unfamiliar with the Semantic Web, but I'd think that unless people already know what resources are available, they'll have quite a hard time finding the data that they are looking for on our system.

Tim Wilkinson said...

I guess much new info would be shared simply in virtue of being added into wiki pages (more or less transparently to the user) - but some kind of 'related topics' template would find more stuff, and could traverse the network to arbitrary depth (and - further refinements - use various criteria & weightings to prioritise). I guess also queries could be generated from the user interface fairly straightforwardly (though it would take some development - I'll be learning SPARQL among other things if we go down anything like this route, so can do/check/coordinate much of the work on that kind of stuff.) But that kind of thing is easier to scope, design and implement as incremental enhancements once we're further down the road (if we go down this kind of road).

Background - the 'graph' data is basically an enhanced kind of triple (thing1..relation..thing2) store - anything can be linked to anything else (the 'things' can themselves be triples - facts). The query language (and engines) for this kind of data are geared to searching through this kind of network structure and finding connections. Not saying we want to get into all that to start with, but being able to once the resources are available is useful, and adds an aspect of future-proofing, because this kind of thing is going to become much more prevalent even in the next few years.
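
For anyone unfamiliar with triples, here is a toy version in plain Python. It is not a real triple store and uses no SPARQL; the triples, relation names and the related() helper are invented for illustration. It shows the (thing1, relation, thing2) shape and a 'related topics' lookup that follows links to a chosen depth.

```python
from collections import deque

# Each fact is (thing1, relation, thing2); all of these are made up.
TRIPLES = [
    ("MP_A", "has_interest_in", "Acme Health"),
    ("Acme Health", "operates_in", "private healthcare"),
    ("MP_A", "spoke_on", "NHS funding debate"),
    ("MP_B", "spoke_on", "NHS funding debate"),
]


def neighbours(node, triples):
    """Things directly linked to node, ignoring the direction of the link."""
    linked = set()
    for subject, _relation, obj in triples:
        if subject == node:
            linked.add(obj)
        elif obj == node:
            linked.add(subject)
    return linked


def related(start, triples, max_depth):
    """Breadth-first search: map each reachable thing to its link distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_depth:
            continue  # do not look beyond the requested depth
        for nxt in sorted(neighbours(node, triples)):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]
    return distance


if __name__ == "__main__":
    print(related("MP_A", TRIPLES, max_depth=2))
```

At depth 1 only the things directly linked to MP_A come back; at depth 2 the search also reaches MP_B through the shared debate. The distance could later feed the kind of weighting mentioned above. Since tuples are hashable, a triple could itself be used as a 'thing' in another triple.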

This stuff all looks very daunting looked at as an all-encompassing blueprint, but I'd reiterate that the ambitious-sounding stuff isn't necessary for the system to get off the ground and start functioning effectively. I just want to have a broad vision of how the system might be developed, and make sure our short- and medium- term plans don't preclude the possibility of making continuing progress along that general 'direction of travel'.

Tim Wilkinson said...

I should probably add that in the database field, YouArentGonnaNeedIt is a high stakes bet, because you don't get a second chance to store data once the chance to do so has been passed up, i.e. the data has been discarded (like a used kleenex).

That's why I want to make sure the ability to gather data from the start is built in, even if getting that data back out may not happen immediately as a matter of standard practice (it can always be interrogated on an ad-hoc basis by us). Data, and especially data sets, are almost always found to be useful in one way or another, and often in unexpected ways. Here we have a very clear idea of how they can be useful; the only issue is how quickly we get round to making the fullest use of them.

Here's a thing about the 'semantic web', FWIW. Oceanography gets a name-check:


Tim Wilkinson said...

Chris, what was the wiki system you were thinking of using for a demo setup?

John Walsh said...

Re The Semantic Web Revisited, interesting selection (a 2006 positioning paper) and thus very useful for helping to now position the niche within the field. I see the WSRI has now become the Web Science Trust and has recently held its annual Web Science Conference. All very interesting and no doubt worthwhile as a research niche. The problem is, how have we gotten to a point where Labour Roots is being dominated by comments about a tiny information research niche? Exploring the expertise we all bring is of course very worthwhile but can we leave niches until a bit later and instead focus on the job in hand (which seems to have drifted into the background).

Those Labour First guys would love to see what we're getting up to here - no need for them to intervene here, we'll do their work for them ...

Unknown said...

Tim: It was a fairly small and simple wiki system for the Django framework, called Waliki. You can see an example of it in action (complete with reskinning) in an Argentinian group of Python users: http://www.python.org.ar/wiki/

It's not as powerful or well-developed as MediaWiki, but it is easily extensible and, I think, would serve our purposes if not for the whole semantic web. It would also be considerably easier to integrate with forum software and any other sorts of apps we'd like to develop in future. Something I'd like to do, if I can find the time, is write an extension which would allow some existing forum software for Django to act as a discussion page for each wiki article.

Tim Wilkinson said...

OK thanks, I'll take a look.

I have been looking at https://tiki.org/HomePage which seems quite mature and fairly well-supported, but *possibly* - this is only a vague impression at present and I may be quite wrong - also a bit too pre-packaged and inflexible?

Tim Wilkinson said...

Also going to look at http://www.xwiki.org

I should probably set up a Wiki thread - do we all agree that we will want wiki software for at least some things? How central should wiki technology be?