The Open Mirror: a feasibility study

Jisc is conducting a feasibility study into the “Open Mirror”, which would provide access for the world to the open access research outputs from UK researchers. It would be an aggregation of all UK Open Access content, based upon the network of institutional repositories in the UK. It might support better discovery and access, text-mining, business continuity, management information and preservation services.

The current work is a feasibility study, implying no commitment by Jisc to proceed; it is to assess how practical and valuable an Open Mirror might be. The study runs to December 2013. It is a Jisc co-design project, run with our co-design partners, principally RLUK and SCONUL, but also UCISA and RUGIT. Other than Jisc, the project team comprises Clax Ltd, EDINA, Petr Knoth and Zdenek Zdrahal, Rosemary Russell, Clare Ravenwood, Jeremy Atkinson and Naomi Korn.

The Open Mirror would be achieved by a combination of:

– harvesting the relevant contents from UK institutional repositories, using standard protocols;
– harvesting the relevant contents from international subject repositories, using either standard protocols or bespoke arrangements;
– collecting the relevant articles from publishers, initially via a series of bespoke arrangements, but working toward a common accepted protocol;
– making the resulting content available and visible for the long term, in useful ways;
– partnering with related services;
– building consensus, acceptance and support among all stakeholders.
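
As an illustration only, the first harvesting step might look something like the sketch below. The post does not name the "standard protocols"; OAI-PMH is assumed here because it is the protocol most repositories expose. The function names, the sample response and the repo.example.ac.uk address are all hypothetical, and nothing here describes how an Open Mirror would actually be built. The sketch builds a ListRecords request and reads titles and links out of a response, keeping the links so that users can be pointed back to the source repository.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses carrying Dublin Core metadata.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # When resuming, the token is sent without the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier",
                               default="", namespaces=OAI_NS),
            "title": rec.findtext(".//dc:title",
                                  default="", namespaces=OAI_NS),
            # Keep only web links, which lead back to the source repository.
            "links": [e.text for e in rec.iterfind(".//dc:identifier", OAI_NS)
                      if e.text and e.text.startswith("http")],
        })
    token = root.findtext(".//oai:resumptionToken",
                          default="", namespaces=OAI_NS)
    return records, (token.strip() or None)


# Made-up response from an imaginary repository, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.ac.uk:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:identifier>http://repo.example.ac.uk/123/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, next_token = parse_list_records(SAMPLE)
next_url = list_records_url("http://repo.example.ac.uk/oai", next_token)
```

A real harvester would fetch each URL over the network and repeat until no resumption token comes back; that loop, error handling and any publisher-specific arrangements are left out of this sketch.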

We would like to ask for feedback on the idea of an Open Mirror. What do you think would be the benefits, the costs, the risks, the opportunities? Overall, would you support it?

Please use the comments facility below to comment on these, or email directly to Neil Jacobs, n.jacobs@jisc.ac.uk.

Future posts here will ask for feedback on the Open Mirror’s feasibility, implementation options, and the kinds of services that it might offer.

8 thoughts on “The Open Mirror: a feasibility study”

  1. Ian Cooper

    provide access for the world to the open access research outputs from UK researchers

    That’s OK, but it carries the same risk that Copac (and to a lesser extent Archives Hub) has carried: you bear the infrastructure costs for a service that might end up being used more by non-UK users (simply playing a numbers game). As a man in the middle of discovery, but not resource provision, you also lack useful feedback on the value of your service, algorithms and data quality.

    It might support better discovery and access, text-mining, business continuity, management information and preservation services.

    I’m not sure I see how such a service would offer business continuity, unless your technical implementation (to address some of the points I just raised) is to operate as a caching proxy. In which case I think you have some scary resourcing issues to consider.

    Flippantly I want to ask “why not just ensure the metadata is good so that it can be surfaced by a user’s favourite search engine?”. We already know that the big search engines are used as one of the main starting points (if not the main one). Is there really a desire to fight against that with yet another aggregator (“but this is a trusted resource”) or is the effort not better spent working with the search companies in order to ensure that academic material is better surfaced?

    1. Neil Jacobs Post author

      Good points, thanks Ian. I guess the argument is that, by making UK OA material more discoverable in a range of ways, we increase the impact of UK research, and that’s a good thing, wherever the users come from. In that way, it’s different to the COPAC / Archives Hub example. And yes, absolutely this has to be partly about search engine optimisation and working with the big search engines. The argument there being that an aggregation might be better able to (i) do SEO – eg by exposing a big corpus of metadata that Google Scholar likes, and (ii) gain the interest of the big search engines for dialogue.
      The business continuity point is also well-made. There could be potential in working with a LOCKSS model to enable this?

  2. John Norman

    My initial comment would be about the economics – what benefit would the mirror be providing to whom at what cost? I must declare an interest since we are working on the business/technology approach I am about to describe, but it seems to me that a cloud-hosted, multi-tenant repository software that incorporates the best of e-Prints, DSpace, Fedora (maybe) and Pure would be preferable because it would provide UK infrastructure at much lower cost, freeing institutions to put their limited resources towards maximising the quantity and quality of information in the repository. APIs would allow the information to be presented in custom ways for local use and the multi-tenant feature would allow a consistent interface with local branding. The service could interface with preservation services such as those emerging from DuraCloud and discovery options such as Google Scholar.

    So the benefits would be more content of higher quality at lower cost, consistent interface without loss of flexibility, and retention of local brand identity. Services like picking up content from publishers are then direct benefit to the institutional repository and new services can be developed on a shared-cost basis across the UK sector.

    Of course you may be able to articulate a different set of benefits for a mirror service (which looks like extra cost and extra demand on resources at first blush), but I’m not readily seeing those from the description I have seen.

  3. Hugh Glaser

    Interesting times.
    Depending on what would be provided, this could very well be an expensive activity, with significant ongoing costs, so it is good to ask whether it is wanted, useful etc.
    It could also be deeply disruptive to the OA process, while it is still at a very delicate stage.

    My view is that something like aggregators are useful, even essential, to get value out of all this work that contributors and repository providers are doing.
    But these are not “mirrors”.

    For want of a better description I will use Virtual Open Repository (VOR), rather than mirror or aggregator.
    Skip to *** if you get bored.

    Mirroring is nearly always wrong on the Web, other than for performance &c.
    A big point about the Web is that you don’t go around copying data and republishing it; it is already available somewhere, and you point at it. If you copy it, you incur problems and costs associated with synchronisation and other things.
    You may need to go and get the data, so that you can add value and then publish metadata, but, like Google etc you then point at the original, which is what people want (although because you have the pages you can provide a cache for when things go wrong (preservation service)).

    So what I would like to see is something that provides the facilities that add value to the UK research output, while definitely not being a mirror.
    Technically it might do all the things a mirror needs to do (for example because it would need to harvest the texts to do text mining), but it would lead users to the articles in the repository archives, not its own copies.

    This is in contrast, I think, to systems such as http://www.researchgate.net and http://trove.nla.gov.au/ (amazing resource!) and indeed http://www.mendeley.com/, which can actually make it quite hard, or almost impossible to get back to the original source.
    An interesting thing about these systems is that they are very user/searcher oriented (which is great).
    However, this means that they take less cognisance of the interests of the repository provider.
    What is the end game of where content should be offered?
    Will repository providers be able to justify their costs if there is no visibility?
    We are still at a very delicate stage of this process, and something like this can deeply upset the socio-technical landscape – what is the point of the expense of a repository if institution managers can simply tell people to deposit directly into the mirror?
    And if asked, why would a mirror funded by JISC refuse to offer such a service?

    I suspect that Arthur is right, and Trove is quite close to the concept in your question, although over a much wider range of material.
    Studying what is going right and wrong with Trove would be good, although things like government mandates for publication vary between countries.

    Of course, another question is what should be publicly-funded and what should be left to the private sector for added value?
    And how should public funding support the system?
    I favour something that provides a view over UK repositories which provides basic and some more sophisticated facilities (such as text mining), but does not seek to go to the level of things like Trove.

    ***
    An initiative of this sort could provide a polished VOR, that provides a view over a range of repositories chosen by the VOR publisher.
    (There may actually be such things – I stopped looking a while ago – sorry if that is the case.)
    It could be deployed by anyone, to provide a VOR over whatever content they wanted, especially if it was possible to choose individual repository records or searches as inputs.
    But why restrict to geographical VORs?
    Thus the same software could be used to provide a VOR over Physics or North Sea fish or whatever else a special interest group might want.
    And why only one level?
    Having a VOR for Cornwall, the South-East, England, UK, EU, Europe might well be useful.
    And any VOR should itself publish as if it was a repository (sorry, Arthur, it seems Trove doesn’t publish OAI-PMH :-(), so that it can be consumed.
    I now see Elly’s message about http://www.narcis.nl/ – it seems to me that this is much more what I would like to see.
    A lightweight way of getting to things.

    This would achieve all the objectives you outline, without disrupting things, and the costs might be kept down.
    It would also be less likely to become a single point of failure in hard times.
    I would also say that it should be made Open Source, and JISC could use its position to initiate an exciting community for what is a much-needed facility.

    What about JISC getting together with NARCIS and any others to kick start a community project?
    Or just starting itself.
    JISC would need to put resources into writing the core code, but there are many around the world that would join once the credibility has risen.
    And companies would get involved and contribute, so that they could then sell added-value services.

    This could be a real opportunity to move the OA world on towards the vision that many of us have!

    Sorry to go on so long – I do get excited by this stuff, and couldn’t stop typing.
    Hugh

  4. Hugh Glaser

    Where is the British Library in all this?
    They, like other libraries, have a legal obligation to preserve publications in the UK.
    And it turns out they are now doing this for digital preservation.
    It may be that we should not rely on them for everything, but we should at least look at the limits of what the BL does, and relate it to any of their activity.
    And I think that the digital preservation of repository content comes well inside what we all expect the BL to do.

    I (through my government) am paying the BL a lot of money to do this.
    Why would I also (through JISC) pay more lots (sic) of money to do it?
    I never expected JISC to fund the physical collection of papers from UK academics when they were only on paper – that would have been bizarre.
    If the BL isn’t doing what we want, in terms of gathering publications and making them available for scholarly access, then it is the BL that should be influenced to do it properly, not pay more money to mirror(!) the activity – that is bizarre.
    And if we really want to provide a mirror, then getting the data from the BL, rather than harvesting ourselves, should be the cheapest, most obvious way to do it.

    JISC’s limited funds can be used to add other value that is outside the remit or budget of the BL, such as “smarts”; or, perhaps better still, by facilitating a culture and infrastructure so that value-added services by commercial organisations can be made economically attractive.

    Sorry, I don’t know if the BL are up for this level of interest yet, but they will be in the time it would take for anyone else to be able to do this.
    Best
    Hugh

  5. Chris Banks

    If the primary objective is to increase the visibility of UK OA publications then a model along the lines of what the BL already offers through EThOS, or registration with discovery systems (e.g. Primo Central), might be the most effective way forward. EThOS already harvests many repositories for theses and the central service has certainly increased the visibility and use of UK theses. It may be but a small step to set up a parallel service to harvest metadata for published OA content.

    Whilst there is already much duplication of repository activity across institutions, we are nonetheless at yet another delicate junction: if, as expected, HEFCE mandate that outputs for the next REF should be made available “through” the institutional repository then the complex set of interrelations of systems within institutions might slow what otherwise seems to be a natural next stage development – the VOR solution that Hugh Glaser outlines.

    1. Neil Jacobs

      Hugh, yes, we’re writing up the work now and will post links to the reports here as soon as they’re public. Sorry for the delay. Neil
