Tips, stories and advice about phenotype-based small molecule screens
A few questions are frequently asked by new small molecule screeners. Below are some bits of advice that I wish someone had given me before I got started. I have learned a lot of things the hard way; in fact that is how I learn most things, but then again, I enjoy the steep parts on the learning curve.
What concentration should I screen at?
There is obviously no correct answer to this question, but here are a few things to consider. Most chemical libraries are sold with compounds dissolved as 10 mM stocks in DMSO. If you are performing phenotype-based screens in living organisms, the effects of DMSO can put a ceiling on the highest concentration of compound you can screen at. With Arabidopsis, we have found that DMSO needs to be kept at or below 1% to prevent growth inhibitory effects; that is, there are minimal effects of DMSO in a quantitative cell expansion assay in etiolated hypocotyls at 1%, but above this, you can start to see reductions in hypocotyl length. Given that, the highest concentration we'd consider screening at is 100 uM (a 1:100 dilution of the usual 10 mM stock). We perform our screens at 25 uM and do not get too excited about hits if they require higher concentrations to induce robust phenotypes. This will obviously vary from case to case, and I can imagine scenarios where our 25 uM rule would miss interesting compounds, but that's our general baseline.
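The arithmetic behind that ceiling can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes the compound is diluted by exactly the same factor as the DMSO it is dissolved in.

```python
def max_screening_conc_um(stock_mm: float, max_dmso_percent: float) -> float:
    """Highest final compound concentration (uM) that keeps DMSO at or below the ceiling."""
    # Diluting a DMSO stock to x% (v/v) DMSO dilutes the compound by the same factor.
    return stock_mm * 1000.0 * (max_dmso_percent / 100.0)


def dmso_percent_at(final_um: float, stock_mm: float) -> float:
    """Final DMSO percentage (v/v) when a stock is diluted to the given concentration."""
    return final_um / (stock_mm * 1000.0) * 100.0


if __name__ == "__main__":
    # 10 mM stock with a 1% DMSO ceiling: the numbers used in the text above.
    print(round(max_screening_conc_um(10.0, 1.0), 3))  # ceiling in uM
    # DMSO percentage carried into a 25 uM screen from the same stock.
    print(round(dmso_percent_at(25.0, 10.0), 3))
```

With a 10 mM stock and a 1% DMSO limit this gives the 100 uM ceiling quoted above, and a 25 uM screen carries 0.25% DMSO.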
A comment on compound solubility.
On the topic of concentration, an additional consideration is compound solubility. For the LATCA project, my lab has done 1000s of dose curves (usually at concentrations between 1 and 100 uM). In doing these, we saw numerous cases where compounds crash out, usually at concentrations above 50 - 75 uM. The evidence of this can be seen indirectly from oddly shaped dose curves (potency decreases at high concentrations), and more directly, and obviously, from the formation of crystals or precipitates in screening wells (this is very obvious with colored compounds, but can also be seen with most compounds as cloudiness or speckled plates, if you look closely). Keep in mind that the compounds present in screening libraries are usually "Rule of 5" compliant, which means that, in short, they are designed or selected to have properties that help promote passive diffusion across membranes. The average CLogP for most library members is around 3, which means they are predicted to partition about 1000 times more into the organic phase of an octanol-water mixture. Many compounds in a library will have a CLogP near 5. As a result, aqueous solubility is always a factor to consider when designing screens and characterizing compounds. For most libraries out there being sold, solubility will not be a major concern at concentrations at or below 25 uM (this will depend on the compounds present, though).
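To make the CLogP numbers concrete: LogP is the base-10 logarithm of the octanol/water partition ratio, so the ratio itself is 10 raised to the LogP. The short sketch below simply does that conversion; the function name is made up for this example, and CLogP is a calculated estimate, so the output is a prediction and not a measurement.

```python
def octanol_water_ratio(clogp: float) -> float:
    """Predicted octanol:water concentration ratio for a given (C)LogP."""
    # LogP = log10([compound in octanol] / [compound in water])
    return 10.0 ** clogp


if __name__ == "__main__":
    for clogp in (1, 3, 5):
        print(f"CLogP {clogp}: about {octanol_water_ratio(clogp):,.0f}-fold into octanol")
```

A CLogP of 3 corresponds to the roughly 1000-fold preference mentioned above, and a CLogP of 5 to a 100,000-fold preference.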
What’s this Lipinski “Rule of 5” all about?
Lipinski extracted and analyzed a set of ~2000 small molecule drugs or late-stage trial compounds from the World Drug Index (where all compounds and formulations of compounds are registered). This dataset of validated bioactive small molecule drugs was examined for 4 characters: LogP, MW, number of H-Bond donors (HDon) and number of H-Bond acceptors (HAcc). The major factor that motivated Lipinski’s famous study was the desire to eliminate compounds with poor solubility / permeability from entering and / or moving down the drug-discovery pipelines at Pfizer. The paper and his famous Rule of 5 are often credited with the goal of identifying “drug-like” compounds, but it really came from the other side of things: eliminating bad compounds from further consideration. The reason it worked so beautifully is that the vast majority of drug uptake is via passive diffusion, which means that most small molecule drugs / perturbagens need to share certain properties to make uptake as good as possible. Too big (MW>500), you get stuck in the membrane and cross more slowly. Too greasy (LogP>5), you have a harder time going into solution in the first place. Lipinski showed that a combined Mass > 500 and CLogP > 5 was rarely observed in small molecule drugs, and these are the two properties that Lipinski identified as contributing most to the goal of spotting “bad” compounds. Additionally, more than 5 hydrogen-bond donors or 10 H-bond acceptors are also viewed as problematic and were not commonly observed in small molecule drugs. Since my lab is a plant lab, you may ask if the Rule of 5 applies to agrichemicals? The answer is yes, pretty much, with a few differences probably not worth making too much of a big deal about when it comes to the academic lab contemplating which libraries to purchase. Again, the name of the game is getting across cellular membranes: physical properties that interfere with that process are likely to cause problems in almost any organism examined.
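If you already have MW, LogP, HDon and HAcc values for a compound (for example from a vendor's library spreadsheet), the four cutoffs above are easy to check yourself. The following is a minimal sketch, not anyone's official implementation: the `Compound` class, the function name and the two example entries with their property values are all hypothetical, and how many violations you tolerate is left to you.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Compound:
    name: str
    mw: float          # molecular weight
    logp: float        # LogP or calculated CLogP
    h_donors: int      # number of H-bond donors
    h_acceptors: int   # number of H-bond acceptors


def rule_of_5_violations(c: Compound) -> List[str]:
    """Return the list of Rule of 5 cutoffs that a compound exceeds."""
    violations = []
    if c.mw > 500:
        violations.append("MW > 500")
    if c.logp > 5:
        violations.append("LogP > 5")
    if c.h_donors > 5:
        violations.append("H-bond donors > 5")
    if c.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


if __name__ == "__main__":
    # Hypothetical library entries; the values are invented for illustration.
    library = [
        Compound("example_a", mw=312.4, logp=3.1, h_donors=1, h_acceptors=4),
        Compound("example_b", mw=642.8, logp=6.2, h_donors=2, h_acceptors=9),
    ]
    for c in library:
        print(c.name, rule_of_5_violations(c) or "no violations")
```

Returning the list of exceeded cutoffs, and not just a pass/fail flag, makes it easy to see whether a compound is flagged for size, greasiness or hydrogen bonding.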
Lipinski’s paper (which is highly readable and recommended) spawned an explosion in chemical informatics, and a great number of other metrics have been enlisted for identifying compounds unlikely to make good drug candidates, including: number of rotatable bonds, predictions of aqueous solubility, total polar surface area and others. Often, the newer metrics examined are highly correlated with MW and / or LogP, so it can be hard to tease the importance of these various factors apart. An excellent review appeared in Nature Reviews Drug Discovery recently for those with more of an interest in this topic. If you are a bioinformaticist reading this and thinking – hey I’d love to look at X’s dataset and computational tools – be forewarned. The same open source models that dominate biology were not adopted by chemical informaticists in the early days and the tools and datasets are not easy to obtain and / or are costly. One recently released software package that I highly recommend is Thomas Girke's ChemMineR, an R package that contains many of the most commonly used routines for analysing and clustering compounds.
Do stock solution concentrations matter?
Short answer: Yes, at least, sometimes.
Long answer: I am now going to start sounding like the anal-retentive chef, but this one is not so obvious to most new screeners. The concentration of a stock solution matters (not just the final assay concentration) and changing the stock concentration can affect assay results. This is particularly true for compounds with aqueous solubility problems. For many compounds it may not matter at all- but it can occasionally make a huge difference in results and it is wise to keep track of stock solution concentrations used to make assay plates.
I will relate the importance of this with the experience Yang Zhao and I had with one of our compounds of interest, hypostatin. Hypostatin is quite a greasy compound and shows solubility problems above 50 uM (this is evidenced by the formation of small crystals / aggregates in plates after they have solidified). When we first isolated hypostatin, the screening stock used was 2.5 mM (in DMSO). Hypostatin was the first really nice hit in our natural variation project- and I was quite excited when we saw it, asking the students involved for details almost hourly (I have calmed down a bit since then). Hypostatin's effects retested several times in independent replicates in our 96-well assay system, so I was satisfied that it was a "real" hit (or more appropriately a "well-behaved" hit). When it came time to do more characterization, we purchased mg quantities of the compound and observed that the original phenotype was not reproducing.
HPLC-DAD analysis of the new and old stocks showed that they looked the same, so everyone in the lab was baffled. Yang, the main student on the project, mentioned that he noticed "milkiness" when adding the compound to agar that he had not seen before. The new and original assay plates both contained 25 uM hypostatin, but it turned out Yang was using a new, more concentrated stock (I think it was 20 mM, instead of 2.5 mM). Changing the stock solution back to 2.5 mM fixed everything, and the project was resuscitated from a potentially very frustrating end point. Admittedly, this particular problem is not going to be an issue for most compounds. Nonetheless, it is important to keep track of details so that the source(s) of problems can be located when bizarre things pop up. Stock solution concentrations can matter, so keep track. Pay attention. I credit Yang with that particular save because he is very attentive to details.
Should I bother with replicates?
Yes, definitely! But there is one thing I suggest you consider. Just because a hit does not appear in both replicates does not mean it is not real. A number of factors can contribute to poor behavior of an individual compound in the primary screen, so if a hit looks golden, but you only see it in one rep, you might consider following it up to see if you can get it to behave better. This is true even if you have a very reproducible and consistent assay. In some cases the issue is the chemical and not the assay.
Do I need robotics / liquid handling equipment to start screening?
Short answer: No.
Long answer: It can be very helpful, but it is not necessary and it can even slow you down a bit in the beginning because of the learning curve for operating and programming liquid handling. My advice is this: run a pilot screen using a small library like LOPAC, Spectrum or LATCA. For this pilot screen, you can use multi-channel pipetters, or more preferably, pin tools. Pin tools are cheaper on a per-screen basis because there are no tip costs, but they have an initial cost that is high. Pin tools will increase your throughput and reduce the possibility for error, but if you are trying to be as cheap as possible, a multichannel pipetter is the way to go. After your pilot screen, you will have developed a sense for ways to optimize the screening process and learned about molecules that are active in your assay. Once you have this information, you can decide if you want to scale things up. For larger screens, investing in automation (through a screening center or collaborator or whatever) is probably worthwhile. My lab does a lot of screening and at this point I cannot imagine working without our liquid handling facilities. Having said that, all the screens that went into developing LATCA and our natural variation project (a screen of ~14K compounds against 8 strains) were all done without liquid handling (not by choice, though)- so don't let the absence of a liquid handling station nearby deter you from initiating a project. I should point out though, that there is a belief by some that robotics are necessary for screening. I had an early grant that tried (and failed) to get the LATCA project funded. One major criticism of the proposal was that we had a screening section that included the use of pin tools and this indicated that we were not equipped properly for the research.
How do you make your screening plates?
Using a pin tool or pipette tip, we spot 1 ul of a DMSO stock solution into the well of a 96-well polypropylene plate. DMSO is greasy and likes to spread out over the surface of polystyrene. We then add 100 ul of molten agar that is kept at 55 °C with a stirring heat block with an attached temperature probe. The molten agar is applied with rapid expulsion force at the side of the well to promote compound mixing. Plates are allowed to solidify at 4 °C for several hours to overnight.
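For anyone adapting this recipe, here is a rough sketch of what ends up in each well. It assumes the two volumes simply add and that the compound mixes completely; the function name is made up for this example.

```python
from typing import Tuple


def well_contents(stock_mm: float, stock_ul: float = 1.0,
                  agar_ul: float = 100.0) -> Tuple[float, float]:
    """Return (final compound concentration in uM, final DMSO % v/v) for one well."""
    total_ul = stock_ul + agar_ul  # assumes volumes are additive
    final_um = stock_mm * 1000.0 * stock_ul / total_ul
    dmso_percent = 100.0 * stock_ul / total_ul
    return final_um, dmso_percent


if __name__ == "__main__":
    # 1 ul of a 2.5 mM stock spotted under 100 ul of agar.
    conc_um, dmso = well_contents(2.5)
    print(f"{conc_um:.2f} uM compound, {dmso:.2f}% DMSO")
```

A 2.5 mM stock therefore gives a nominal 25 uM well (24.75 uM if you count the extra microliter) at just under 1% DMSO, which sits right at the DMSO limit discussed earlier.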
Who should I buy a screening library from?
There are dozens of companies that sell individual compounds or "diversity" sets that are subsets from a larger compound collection in plated formats. If you want a list of all of the vendors out there, a great collection of commercially available compounds can be found on the Shoichet lab's ZINC website. The number of commercially available compounds is in the millions (>8 million purchasable according to ZINC, but I don't know if that is non-redundant or not). A not-too-old publication by Baurin et al. that I found useful found 1.6 million unique compounds from 2.7 million compounds listed for sale from 23 vendors (this paper has a table with all of the vendors, which is quite handy). So there is no shortage of compounds available, and a great number of them are Rule of 5 compliant (again see the Baurin et al. publication). My lab has used pre-plated diversity libraries from Chembridge, Tripos and Sigma/TimTec. We have additionally made our own cherry-picked libraries using compounds from Life Chemicals, Asinex, Chembridge and Vitas M (more on “cherry picking” later).
OK, so now here is what I wish I had known and / or thought of before I got into this:
Before you buy a library, get a clear statement from the vendor about restocking policies, resupply pricing and the prices charged for resynthesis of compounds. These issues are critical, because you don't want to be stuck with a great hit you cannot follow up. In general, for any given diversity library you are considering, the vendor will know how many of the compounds are currently still in stock. You and others may end up screening the library purchased for several years, so if 10% are out of stock now, what will it be in 5 years? Some vendors will guarantee resupply even if a compound goes out of stock, which is a great policy (i.e. they will cover the resynthesis costs in the event the original stock dries up). When pressed, the vendor may try to side track the issue or say that it varies from case to case. Have none of that. It is in your best interest to push them for a firm statement and policy. You may find it acceptable if 5% of their compounds are out of stock if resynthesis costs are low. But if their resynthesis costs are ridiculously high (as I experienced in one case), then you may be in for unexpected headaches. One other thing to keep in mind is that a great number of compounds are available from multiple vendors, so if you are caught in a bind, search before you contract the synthesis of a hit compound. Having said all of that, it has generally been my experience that I can usually get enough of a hit compound from the original vendor or another source for follow up studies. So you shouldn’t waste months agonizing over the "perfect" decision, but try to get all the relevant information up front. Unfortunately, the one time I did have trouble with resupply was also with a vendor that had high resynthesis costs. This is why I wish I had had this particular advice earlier!
I got a hit, now what?
The very first thing is to make sure it is real. If you can dip back into the original screening plates, do another set of tests to confirm activity (for good reasons, many centers and / or drug czars will not let you cherry pick from their wells). Whatever the case, you should next restock your hits to confirm their activity again. This is because what is in the well of the screening plate may no longer be what the vendor originally sold you (more about this later, but the vendor keeps solids, which are much, much more stable than compounds in DMSO solutions). As a parenthetical comment, you should be wary of publications or colleagues that make big claims based only on assays from library screening plates. The data is frequently reliable, but there are enough examples of problems with the stock solutions that compounds must be restocked and QC’d before any sound (i.e. publishable) claims can be made (more on the QC later). Before you do any restocking though, consider the following. Most vendors have a graded pricing scheme where the cost per mg of compound goes down substantially as the number of compounds purchased increases. It is therefore in your best interest to be organized about what compounds you want and to order as many as you need at the same time.
If your screen yielded many hits, you should inspect them to identify trends, or use similarity-based clustering tools (ChemMine, JKlustor) to identify clusters of similar compounds. From this preliminary analysis, you may identify potential structure-activity relationships that could help guide you in the selection of analogs, which is my next point. Depending on how many hit compounds you have, and your budget, you may want to purchase analogs as well. Analogs may, in some cases, be more potent / active than the parent hit, but their main value is usually that they can provide closely related, but inactive, negative control compounds.
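To give a feel for what similarity-based clustering does, here is a toy sketch in plain Python. It is not the ChemMine or JKlustor method or API. It assumes each hit has already been reduced to a set of structural feature IDs (a fingerprint), scores pairs with the Tanimoto coefficient, and groups hits by single linkage at a chosen cutoff. The hit names, feature IDs and the 0.5 cutoff are all invented for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_hits(fingerprints: Dict[str, Set[int]],
                 threshold: float = 0.5) -> List[List[str]]:
    """Single-linkage clustering: two hits share a cluster if a chain of
    pairwise similarities at or above the threshold connects them."""
    parent = {name: name for name in fingerprints}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(fingerprints, 2):
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in fingerprints:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical fingerprints: each integer stands for one structural feature.
    hits = {
        "hit_1": {1, 2, 3, 4},
        "hit_2": {1, 2, 3, 5},
        "hit_3": {10, 11, 12},
    }
    print(cluster_hits(hits, threshold=0.5))
```

In this toy set, hit_1 and hit_2 share 3 of 5 distinct features (Tanimoto 0.6) and land in one cluster, while hit_3 stays on its own. Real tools derive the fingerprints from the chemical structures for you.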
OK, I've still got a hit, and I still need to know: now what?
My next bit of advice is what I consider generating a null hypothesis about mechanism. It is easy to think you have discovered a molecule with amazing biological properties that targets a protein no one else has ever "drugged" before, but the reality is that medicinal chemists and agricultural chemists have been doing this kind of stuff for a very long time, and you cannot ignore their impressive and voluminous "prior art". Additionally, the scaffolds used in many library syntheses are often inspired (stolen?) from well characterized and easily synthesized classes with validated bioactivity (as one example the dihydropyridines, which are best known as anti-hypertensives / calcium channel antagonists). So there is a good chance that a little, or maybe even a lot, may be known about a compound that is similar to your hit.
In my opinion, if there is a compound reasonably similar to yours and its mechanism is known, the null (or skeptic's?) hypothesis should be that your compound has a mechanism of action similar to the known compound. I do not mean to imply that this is generally likely to be true (it obviously varies from case to case), but before a “new mechanism” hypothesis is chased down, you might consider some simple experiments to rule out the null hypothesis, if you can. The flipside to my advice here is that there are many examples of chemically related compounds that work by different mechanisms (the sulfonamides are a classic example, but there are others too). So the observation of strong similarity between a hit and a published molecule is not going to prove anything. Nonetheless, you should examine existing similarities and consider if the mechanisms of related molecules might explain the phenotypes you are observing in your assay. Given this long-winded preamble, the real question to be answered is:
How do I identify what is known about my hit and closely related compounds?
The answer is Scifinder Scholar. In an ideal world, chemists would have built their databases in the open source model that drives biology (but they are working on this, for example the PubChem project). But at the moment, the definitive repository of all knowledge about all small molecules resides in the hands of the American Chemical Society and their cash cow: Scifinder Scholar. Scifinder is really an amazing thing, because it keeps track of the compounds listed or used in all published literature (including patents). So, once you have a molecule of interest, you HAVE to search Scifinder to learn if anything is known about it or related compounds. To not search Scifinder after discovering a new small molecule of interest would be like cloning a gene and not performing a BLAST search. Most campuses have a site license for Scifinder, so for academics, getting access to this tool is usually not a problem. In spite of that, it is still very frustrating that you can't just download all the data (or even small chunks of it) and mine it like you might mine whole genome data with perl scripts and the like, but I'm off topic.
More to follow...