Category Archives: resources

Rise of the Machines

Recently, there’s been a lot of interesting activity in the field of generative AI for science from large companies such as Google, Meta, and Microsoft.
Creating new materials from scratch is difficult, since it requires either simulating complex interactions, which is hard, or a fair amount of luck in experiments (serendipity is a scientist’s most terrifying friend).
Thus most of these efforts aim to discover new materials by accelerating simulations using machine learning. But recent advances (LLMs such as ChatGPT) have shown that you can use AI to make coherent sentences instead of word soup. And just as cooking is not only about putting ingredients together all at once but about carefully preparing them, making a new material involves important intermediate steps. New approaches can be used to create new materials at each of these steps.

The various steps of making a new material (from Szymanski et al.)

Last month, Google, in collaboration with Berkeley Lab, announced that DeepMind’s GNoME project had discovered a lot of new structures: Google DeepMind Adds Nearly 400,000 New Compounds to Berkeley Lab’s Materials Project. They managed to actually make and analyze some of those new materials; that is quite a tour de force, and while there’s some interesting pushback on the claims, it’s still pretty cool!
In September, I invited Meta’s Open Catalyst team to Berkeley Lab (here’s the event description and the recording – accessible to lab employees only).

Zachary Ulissi (Meta/OpenCatalyst) and Jin Qian (Berkeley Lab) at Lawrence Berkeley National Laboratory (September 2023)

Meanwhile, Microsoft is collaborating with Pacific Northwest National Laboratory on similar topics.
The research infrastructure also has its gears moving; it seems that DeepMind’s AlphaFold is already routinely used at the lab to dream up new protein structures. I wonder where this will go!
Prediction is very difficult, especially if it’s about the future
– Niels Bohr
Thinkpieces blending chips and AI are in full bloom:
We need a moonshot for computing – Brady Helwig and PJ Maykish, Technology Review

The Shadow of Bell Labs

I want to resurface an interesting thread by my former colleague Ilan Gur:

Continue reading

APS DPB newsletter

My piece for the American Physical Society Division of Physics of Beams Annual Newsletter about the Advanced Light Source upgrade has been published!

Here it is for your own delight:

Antoine’s guide to Marseille

Because of the Paris Olympics, many friends ask me for advice about Paris, and I refer them to my Insider’s guide to Paris. But there’s another French city I recommend visiting: Marseille. It is a city on the Mediterranean with a very rich culture – it was founded by settlers from Phocaea 26 centuries ago – and it has lots of great food, sights, and people.
Actibus immensis urbs fulget massiliensis
“The city of Marseille shines through its great achievements”
So here’s a bunch of things not to miss in Marseille:
– Notre Dame de la Garde (“La Bonne Mère”, or the Good Mother), the basilica that sits on top of the city. Unique architecture and history, and you can see it from pretty much everywhere. Walking up there is doable, or you can take a bus. On the way down, there is a path that brings you to Roucas Blanc (the fancy, low-key neighbourhood of Marseille), if you feel like wandering (ask around.)

Continue reading

The pi rule

These days things are getting pretty busy on my end – so many cool projects to engage with and only 24 hours a day.

And you end up committing to more things than you can accomplish. The reason often lies in an unrealistic assessment of the time it takes to complete a task. This is where the “pi” rule comes in – initially posited by my mentor Ken, with a pretty neat explanation from my colleague Val:

If you estimate it will take one unit of time to complete a task, the task will effectively take 3.14 (≈π) times longer than you initially anticipated.

The reason for the difference between dream and reality is that we generally do not factor in:

  • (1) the time it takes to ease into the task (e.g. collecting documentation, emails) and
  • (2) the time it takes to document the work done (reports, emails)

Taken together with the time it takes to accomplish the task itself, you end up with roughly a factor of three – and you end up feeling terrible on weekends, trying to catch up on what you had set out to do during the week but couldn’t, because you were busy doing (1) or (2).

A corollary of the pi rule is the “next up” rule: if you work on a project with a relatively large team, it generally takes the next unit of time to complete it (e.g., one hour becomes one day; one day becomes a week; a week becomes a month), mostly because of friction at the interfaces. Reducing these frictions should therefore be a priority.
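
To make the arithmetic concrete, here is a toy sketch of both rules in Python (my own illustration – the names are made up, not from Ken or Val):

    import math

    def pi_rule(estimated_hours: float) -> float:
        # Ease-in (gathering documentation, emails) and write-up (reports, emails)
        # roughly triple (~pi) the naive estimate.
        return math.pi * estimated_hours

    # The "next up" rule: on a large team, a task is promoted to the next time unit.
    NEXT_UNIT = {"hour": "day", "day": "week", "week": "month"}

    print(f"A task estimated at 2 hours takes about {pi_rule(2):.1f} hours.")
    print(f"On a big team, an 'hour'-sized task becomes a {NEXT_UNIT['hour']}.")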

Engineering interfaces in big science collaborations

I recently learned that my colleague Bertrand Nicquevert has worked extensively on a model to describe interactions between various counterparts:

Modelling engineering interfaces in big science collaborations at CERN: an interaction-based model
https://cds.cern.ch/record/2808723?ln=fr

Continue reading

Ladder of causation

I read an interesting piece on Twitter from the always excellent Kareem Carr on the ladder of causation. I found it very interesting because it allows you to go beyond the mantra “correlation is not causation“, and it links statistics to the concept of falsifiability that Karl Popper puts as central to science.

The Ladder of Causation

The Ladder of Causation has three levels:

1. Association. This involves the prediction of outcomes as a passive observer of a system.

2. Intervention. This involves the prediction of the consequences of taking actions to alter the behavior of a system.

3. Counterfactuals. This involves the prediction of the consequences of taking actions to alter the behavior of a system, had circumstances been different.
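
As a toy illustration of the gap between the first two rungs (my own example, not from Carr’s thread or Pearl’s book): in a system where a common cause Z drives both X and Y, while X has no effect on Y, passive observation shows a strong association between X and Y, but intervening on X reveals none.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Rung 1 (association): Z causes both X and Y, so X and Y correlate.
    z = rng.normal(size=n)
    x = z + rng.normal(size=n)
    y = z + rng.normal(size=n)
    print(f"Observed corr(X, Y): {np.corrcoef(x, y)[0, 1]:.2f}")  # about 0.5

    # Rung 2 (intervention): do(X) severs the Z -> X arrow.
    x_do = rng.normal(size=n)      # X is set externally, independently of Z
    y_do = z + rng.normal(size=n)  # Y is unchanged, since X never caused Y
    print(f"corr(X, Y) under do(X): {np.corrcoef(x_do, y_do)[0, 1]:.2f}")  # about 0

Regressing Y on X in the observational data would mislead you; only the intervention (or the right causal adjustment) recovers the truth.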

I even read the book it comes from – “The Book of Why” [full book on the Internet Archive] by Judea Pearl, a Turing Award recipient who worked on Bayesian networks. The book is quite illuminating, though it mentions a bit too often dark figures such as Galton, Pearson, and Fisher (it seems statisticians get really high on their own supply.)

This certainly begs the question – “Why not?”

Continue reading

On Mentorship

This last month, I received two awards related to mentorship from Berkeley Lab. They both came as a surprise, since I consider myself more a student of mentorship than someone who has something to show for it.

Berkeley Lab Outstanding Mentorship Award

Director’s Award for “building the critical foundations of a complex mentoring ecosystem”

I began to be interested in mentorship after I realized that it plays a large role in the success of young scientists: (1) having experienced myself the difference between no mentorship and appropriate mentorship (I’ll be forever grateful to my mentor/colleague/supervisor Ken Goldberg), (2) having had a tepid internship supervision experience due to a lack of guidance, and (3) realizing that academia is ill-equipped to provide the resources necessary for success.

While I was running Berkeley Lab Series X, I always asked the speakers (typically Nobel laureates, stellar scientists, and directors of prominent research institutions) how they learned to manage a group, and their answer was generally “on the spot, via trial and error” – which struck me as awfully wrong. If people don’t get proper resources and training, many are likely to fail and drag their own group down into the abyss. In this post, I will share what I learned about mentorship and the resources I gathered over the years. This is more descriptive of my experience than prescriptive, but I hope you find it useful.

Continue reading

Lamaseries

It’s been a few months since the ChatGPT craze started, and we’re finally seeing some interesting courses and guidelines, particularly for coding, where I’ve found the whole thing quite impressive.

https://static.tvtropes.org/pmwiki/pub/images/llama_loogie_tintin.jpg

Ad hoc use of LLaMa

Here are a few that may be of interest, potentially growing over time (this is mostly a note to self.)

Plus – things are getting really crazy: Large language models encode clinical knowledge (Nature, Google Research.)


Updates on AI for big science

There are a lot of things happening on the front of AI for Big Science (AI for large-scale facilities, such as synchrotrons.)

The recently published DOE AI for Science, Energy, and Security report provides interesting insights and a much-needed update to the AI for Science report of 2020.

Computing facilities are upgrading to provide scientists the tools to engage with the latest advances in machine learning. I recently visited NERSC’s Perlmutter supercomputer, and it is LOADED with GPUs for AI training.

A rack of Tesla A100 GPUs from the Perlmutter supercomputer at NERSC/Berkeley Lab

Meanwhile, companies with large computing capabilities are making interesting forays into using AI for science. For instance, Meta is developing OpenCatalyst in collaboration with Carnegie Mellon University, with the goal of creating AI models to speed up the study of catalysts, which is generally very compute-intensive (see the Berkeley Lab Materials Project.) Now the cool part is verifying these results using x-ray diffraction at a synchrotron facility. Something a little similar happened with AlphaFold, where newly derived structures may need to be tested with x-rays at the Advanced Light Source: Deep-Learning AI Program Accurately Predicts Key Rotavirus Protein Fold (ALS News)

Continue reading

Institutional Open Data

Things are moving in terms of Open Data! The Department of Energy has just released an update to its Public Access Plan (initially published in 2014), embracing the use of persistent identifiers for papers and data to promote the FAIR principles (Findability, Accessibility, Interoperability, and Reusability of data and metadata.)

Mariposa Lilies, from Alexis Madrigal of the Oakland Garden Club

And let me emphasize that last bit:

Data without metadata is mostly useless

Back when Twitter was a nice place to share thoughts and disseminate bite-sized knowledge, I thought Twitter posts/URLs were something akin to Digital Object Identifiers – you could post an image with a caption and share the link on your blog or with anyone (now Twitter doesn’t allow sharing those so easily.) Zenodo lets you create actual DOIs for your data (the record includes your ORCID and metadata), albeit in a less user-friendly way – and to some extent, GitHub works the same way (though the visualization and graphical content are not the best).
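
As a sketch of what this looks like in practice, here is a minimal example of minting a metadata-rich record through Zenodo’s REST API (my own illustration; the token and metadata values are placeholders):

    import requests

    ZENODO = "https://zenodo.org/api/deposit/depositions"
    TOKEN = "YOUR_ZENODO_TOKEN"  # placeholder: generate one in your Zenodo settings

    # Create an empty deposition, then attach the metadata that makes the
    # record findable: title, type, description, creators with ORCID.
    r = requests.post(ZENODO, params={"access_token": TOKEN}, json={})
    r.raise_for_status()
    deposition_id = r.json()["id"]

    metadata = {
        "metadata": {
            "title": "Example dataset",
            "upload_type": "dataset",
            "description": "Data with metadata, so it is actually reusable.",
            "creators": [{"name": "Doe, Jane", "orcid": "0000-0000-0000-0000"}],
        }
    }
    r = requests.put(f"{ZENODO}/{deposition_id}",
                     params={"access_token": TOKEN}, json=metadata)
    r.raise_for_status()
    print("Deposition created; publishing it mints the DOI.")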

At Berkeley Lab, the Office of Research Compliance has updated its guidelines, providing excellent resources to build a Data Management Plan.

Out Of Many

Last week I was lucky to meet with Vanessa Chan, the Chief Commercialization Officer for the Department of Energy and Director of the Office of Technology Transitions. She wanted to hear what kinds of hurdles come up when starting a company (hint: a lot.) I told her that a major, overlooked issue is that you generally need to be a permanent resident to start a company in the US, whereas two-thirds of postdocs are foreign nationals on visas. There are ways to get around the requirement (such as Unshackled), but it’s a little sad that more is not done to support those who are willing and able. Plus, it is a well-known trope that many US companies are founded by foreign nationals, which I tend to believe is part of what sets California apart from other states and countries, where entrepreneurship doesn’t flourish as much as expected despite many efforts.

Conversation with Vanessa Chan