Wanted: science tools for the digital age

The internet may be less than 10,000 days old, yet it still fails to deliver for scientists.

By making it easy for institutions to count publications, it has pushed even further the drive to publish half-baked ideas and to follow the hype rather than long-term research. It is true that getting hold of a paper has never been simpler, and that the internet makes life easier in many respects: collaboration often requires nothing more than sending an email. But new hurdles have appeared, and they should be removed. Here is a collection of ideas on how the digital tools now at hand could make research easier, and therefore more efficient, together with a limited overview of what already exists.

Dissemination of knowledge

One of the great claims of the internet is that all information is available, instantly and supposedly for free. The trouble, however, is that knowledge still comes with a price tag. So one gathers only the free, often second-hand information or a scattering of articles, while so many books exist that would give a better picture of the field… if only they were bought. This feeds the phenomenon Daniel Kahneman calls “what you see is all there is”: the impression that some subjects are worth investigating when in reality they have been studied over and over and the solutions to those problems are well known.

  • There should be free access to all scientific articles, since their authors are funded by public entities
    Some journals are now open access, such as PLoS One or Optics Express, but the most prestigious (and thus most important) publications are still behind a paywall. Authors have copyright, and they can easily make their findings freely available on their websites, but sadly this is not yet common practice.
  • There should be a database for everything, with open access
    Every article should have its data published online, especially the supporting data that is not deemed good enough to appear in the article itself but still carries a lot of value (videos, etc.).
  • A working, live document for science, like Wikipedia but without the restrictions
    Reinventing publishing would be fantastic. It would bring immense flexibility, by allowing half-baked stories to emerge (Matters does this) and past articles to be updated. The peer-review requirement would be challenging (quality control), and proper credit would be hard to attribute (and with it, grants and funding).
  • A proper social network
    There are ResearchGate and MyScienceWork, but they are not backed by any major entity and do not have enough penetration to be truly effective; for now they mostly serve for self-promotion. A real network could also bring much more feedback and new ideas to an article, which today is limited to revisions by co-authors and reviewers, or to the people citing it.
  • Convenient digital object identifiers (for data, affixed to any figure)
    Every piece of data mentioned anywhere should be referenceable. DOIs are nice, but they are not flexible enough; a “bit.ly” equivalent would be great (for sharing on slides, etc.). A short sketch of what DOIs can already do programmatically follows this list.
  • There should be a proper editing language that abstracts away Word or LaTeX and turns a document into a nice format for reading (PDF), editing (HTML) or proofing
    Word is hopeless for layout, while LaTeX makes proofing very hard and requires packages and extra effort for anything beyond simple graphics. And LaTeX2HTML is deprecated. A sketch of one existing workaround appears after this list.
  • Writing math should be standard on the web
    There are MathML and MathJax, but they are not available on every platform. Even Wikipedia has that ugly math look because of the PNG conversion, which moreover makes copying difficult.
  • Plotting should not be that hard, and expressing an idea visually should not be so difficult
    d3.js and tangle.js do a good job at this; they could and should be better integrated with Python or Matlab (see the plotting sketch after this list).
  • An equivalent of Evernote for scientists, where you can pull data out of your simulation software and edit it after the fact
    It is very difficult to get data out of software (after simulation, acquisition or processing) unless you use proprietary tools such as Origin. Postscript and similar vector formats allow it, but they are very clunky. Even more ridiculous, there is no proper container for data: people use TIFF or HDF5, but I am always amazed at how hard it is to share complex data (i.e. data with a real and an imaginary part); a sketch follows this list.
  • A working, efficient bibliography manager, with highlights etc.
    There are ReadCube, Mendeley and Zotero, but they have a hard time dealing with documents other than articles and are not very helpful for discovering new ones. Google Scholar is a good tool, but it still does not let you sift through all the relevant new articles.
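
On the DOI point above: identifiers are already somewhat programmable, since doi.org supports content negotiation and can return citation metadata for any registered DOI over plain HTTP. A minimal Python sketch, assuming the requests library is installed; the DOI below is only a well-known example and can be swapped for any registered one. A “bit.ly for data” could be a thin layer on top of this kind of service.

    import requests

    # Ask doi.org for machine-readable (CSL JSON) metadata instead of a redirect
    # to the publisher's landing page.
    doi = "10.1038/171737a0"   # example: Watson & Crick, Nature 171, 737 (1953)
    resp = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    meta = resp.json()
    print(meta.get("title"), "|", meta.get("publisher"))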
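
On the editing-language point: a large part of that wish is already covered by Pandoc, which takes a single plain-text source and produces PDF, HTML and more. A minimal sketch, assuming pandoc (plus a LaTeX engine for the PDF step) is installed; the file names are arbitrary.

    import subprocess

    source = "notes.md"   # plain text with Markdown structure and $...$ math

    # PDF for reading (pandoc drives LaTeX behind the scenes)
    subprocess.run(["pandoc", source, "-o", "notes.pdf"], check=True)

    # standalone HTML for on-screen editing/proofing, with MathJax for the equations
    subprocess.run(["pandoc", source, "-s", "--mathjax", "-o", "notes.html"],
                   check=True)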
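
On the plotting point: one existing bridge between Python and d3.js is mpld3, which converts a Matplotlib figure into an interactive HTML page. A minimal sketch, assuming numpy, matplotlib and mpld3 are installed; the data is a throwaway sine curve.

    import numpy as np
    import matplotlib.pyplot as plt
    import mpld3

    x = np.linspace(0, 10, 200)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.set_xlabel("time (s)")
    ax.set_ylabel("amplitude")
    ax.legend()

    # fig_to_html returns a self-contained HTML/d3.js snippet that can be shared
    # or embedded in a web page as-is.
    with open("interactive_plot.html", "w") as f:
        f.write(mpld3.fig_to_html(fig))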
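
On the data-container point: HDF5 can in fact hold complex-valued arrays together with their metadata, which covers at least the real-plus-imaginary case. A minimal sketch using h5py; the file name, dataset name and wavelength attribute are made up for illustration.

    import numpy as np
    import h5py

    field = np.random.rand(256, 256) + 1j * np.random.rand(256, 256)

    with h5py.File("measurement.h5", "w") as f:
        dset = f.create_dataset("complex_field", data=field, compression="gzip")
        dset.attrs["wavelength_nm"] = 633.0   # metadata travels with the data

    with h5py.File("measurement.h5", "r") as f:
        restored = f["complex_field"][...]
        assert np.allclose(restored, field)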

Theoretical aspects of science

  • Pull data out of graphs (and create a giant database)
    Most data in the literature is only available in the form of plots, from which it is very difficult to extract numbers. There are some attempts to digitize plots (1, 2), but they are sub-optimal. Ideally, any plot submitted with a paper should have its equivalent table available somewhere; a minimal sketch of the coordinate mapping involved follows this list.
  • Have a unified database for all things science
    There are databases for many things (optical refractive indices, plasmids, nucleotides, supernovae), but they should all be gathered and consolidated, and the data in the CRC Handbook should be freely available. There are other efforts here at Berkeley Lab: the Materials Project, or computations such as a multilayer simulator.
  • Redesign BLAS, LAPACK etc. so that you do not have to use compilers that were designed at the peak of the Cold War
    Millions of hours have been wasted trying, in vain, to install g77, while virtually every other program is built on bloated frameworks. Give us something that works, even if we lose 20% in efficiency (one workaround that already exists is sketched after this list).
  • Software for building simulations and sharing them easily, without requiring a license
    IPython and Julia (with its REPL) do the job and are cross-platform, but they are still clunky.
  • Easy computational optimization
    Computing power is horribly wasted on parallel processors. Why can't there be an easy, transparent way to use parallel computing properly, without having to be a grad student with time to spare on the CUDA manual (whose result will eventually not run on a different platform and will be unusable to anyone not acquainted with pragmas and other ugly syntactic hacks)? A minimal sketch of what “transparent” could look like follows this list.
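
On extracting data from graphs: the core of any plot digitizer is a mapping from pixel coordinates to data coordinates. A minimal sketch for linear axes; the calibration pixels and the picked curve points are made-up numbers standing in for positions clicked on a plot image.

    import numpy as np

    def calibrate(px0, px1, val0, val1):
        """Return a function mapping a pixel coordinate to a data value (linear axis)."""
        scale = (val1 - val0) / (px1 - px0)
        return lambda px: val0 + (np.asarray(px) - px0) * scale

    x_of = calibrate(px0=80, px1=620, val0=0.0, val1=10.0)   # two known x-axis ticks
    y_of = calibrate(px0=400, px1=60, val0=0.0, val1=1.0)    # y axis (pixel axis points down)

    # pixel positions picked off the plotted curve
    curve_px = np.array([[120, 310], [250, 180], [430, 95]])
    table = np.column_stack([x_of(curve_px[:, 0]), y_of(curve_px[:, 1])])
    print(table)   # the table that, ideally, the authors would have published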
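
On BLAS/LAPACK: one pragmatic workaround that exists today is SciPy, which ships prebuilt BLAS/LAPACK binaries, so a linear solve needs no Fortran compiler at all. A minimal sketch with random data.

    import numpy as np
    from scipy import linalg

    A = np.random.rand(500, 500)
    b = np.random.rand(500)

    x = linalg.solve(A, b)            # calls LAPACK's dgesv under the hood
    print(np.allclose(A @ x, b))      # True: the Fortran is there, just pre-compiled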
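
On easy parallelism: for embarrassingly parallel parameter sweeps, Python's standard-library Pool already comes close to “transparent”, with no CUDA, pragmas or compiler flags. A minimal sketch; simulate() is a hypothetical stand-in for the real computation.

    import math
    from multiprocessing import Pool

    def simulate(param):
        # placeholder workload standing in for an actual simulation
        return sum(math.sin(param * k) for k in range(100_000))

    if __name__ == "__main__":
        params = [0.1 * i for i in range(64)]
        with Pool() as pool:                      # one worker per CPU core by default
            results = pool.map(simulate, params)
        print(len(results), "results computed in parallel")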

Experimental aspects of science

  • A software suite where you can easily get data out of an instrument without too much hassle
    LabVIEW, and to some extent Simulink, do the job, but they are proprietary, very difficult to maintain and work only with a limited number of devices. There are systems such as LabRAD or EPICS, but they are in need of an overhaul and of cross-platform capability. With colleagues, I wrote a full Matlab suite (instrument control and UIs) for our own use, but I am not (yet) allowed to share it. A sketch of a lightweight alternative follows this list.
  • Using (and reusing) tablets for display and control
    Nowadays screens are everywhere; it would be nice to repurpose old phones and tablets to serve science, say as a monitor or a user interface. If only they could be officially “jailbroken” past a certain date…
  • An augmented-reality tool where you can use your phone to get a reading
    Lots of old instruments only offer an analog or digital readout and no outputs (or obsolete interfaces like parallel ports or GPIB). Using a phone's camera, it should be possible to log these readings; oftentimes the precision would be good enough, and it would greatly help tuning procedures. A rough sketch of the idea follows this list.
  • A database or off-the-shelf incarnation of Arduino projects for science
    For example: the Arduino's internal analog-to-digital converters are limited to 10 bits. Using cheap, readily available components, it is possible to build a 24-bit device for less than $100 (about sixteen thousand times finer resolution); the host-side readout is sketched after this list.
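
On instrument readout: a lightweight, cross-platform starting point is PyVISA, an open Python wrapper around VISA back-ends. A minimal sketch; the GPIB address and the SCPI measurement command are placeholders that depend on the actual instrument, and a VISA back-end (e.g. pyvisa-py) must be installed.

    import pyvisa

    rm = pyvisa.ResourceManager()
    print(rm.list_resources())                    # discover connected devices

    inst = rm.open_resource("GPIB0::12::INSTR")   # hypothetical address
    print(inst.query("*IDN?"))                    # standard identification query
    voltage = float(inst.query("MEAS:VOLT:DC?"))  # SCPI command; instrument-specific
    print(voltage)
    inst.close()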
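
On reading old instruments with a phone camera: as a proof of concept, OpenCV can find the needle of an analog gauge as the longest straight edge in a photo and map its angle to a value. A rough sketch only; the file name and the angle-to-reading calibration are invented, and a real tool would need per-instrument calibration and far more robustness.

    import cv2
    import numpy as np

    img = cv2.imread("gauge_photo.jpg", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=img.shape[0] // 4, maxLineGap=10)

    # take the longest detected segment as the needle
    lengths = np.hypot(lines[:, 0, 2] - lines[:, 0, 0], lines[:, 0, 3] - lines[:, 0, 1])
    x1, y1, x2, y2 = lines[lengths.argmax(), 0]
    angle = np.degrees(np.arctan2(y1 - y2, x2 - x1))   # image y axis points down

    # hypothetical calibration: -45 deg = scale minimum, 225 deg = scale maximum
    value = np.interp(angle, [-45, 225], [0.0, 100.0])
    print(f"needle at {angle:.1f} deg -> reading {value:.1f}")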
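
On the 24-bit ADC project: the host side can be just a few lines of Python with pyserial, assuming the Arduino sketch streams one signed raw count per line over USB serial. The port name, baud rate and reference voltage below are placeholders for whatever the actual board uses.

    import serial

    VREF = 2.5               # ADC reference voltage, board-dependent
    FULL_SCALE = 2 ** 23     # signed 24-bit range

    with serial.Serial("/dev/ttyACM0", 115200, timeout=1) as port:
        for _ in range(10):
            line = port.readline().strip()
            if not line:
                continue
            raw = int(line)                      # signed 24-bit count from the Arduino
            volts = raw * VREF / FULL_SCALE
            print(f"{raw:>9d} counts -> {volts:+.7f} V")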

I’ve worked on some of these things out of necessity, but, oh boy, there are so many things left to do!

Matlab reusable instrument control and user interface

24-bit ADC with Arduino