where to start
This page is a work in progress.
If you’re at the beginning of your Python experience, and you’re not totally sure where to begin, I’m writing this for you.
And I have to say, if this is your first foray into Python, I’m very excited for you! It’s easier now than ever before to manage Python on an operating system. For you, that means less time banging your head against a keyboard trying to get your libraries to cooperate with one another, and more time learning the language.
Are you excited yet?!
Note: Most of the tips here are based on experience with macOS. If you’re a Windows or Linux user, use this resource alongside other documentation specific to your OS.
1. To get Python working, install Anaconda or Miniconda
I recommend using either Anaconda or Miniconda to install and maintain Python on your machine(s).
- Anaconda and Miniconda are software packages that install the Python language, some other useful packages, and most importantly, conda.
- conda itself is an open source package manager that was built in Python and helps keep all of its libraries compatible. (Well, it was originally built for Python, but it’s technically language-agnostic. So if you find yourself using other open-source languages like R or Julia often, conda is a great way to maintain them.)
Should I download Anaconda or Miniconda?
If you’re brand new to Python, Anaconda will probably be the safer bet. It’s a little bulky and will take a little longer to install, but it will also give you the most options while learning the language. Miniconda is a stripped-down version of Anaconda, so if you don’t have much disk space, go with that. Personally, I like Miniconda, since it’s more lightweight for laptops, shared computers, or login nodes where disk space is limited.
Why conda?
I’m getting a little ahead of myself here, but with conda, you can install multiple parallel “environments” of Python (or whatever other languages you prefer). That means you can have a Python installation you use most of the time and an older one that works with some random chunk of code you inherited from someone else who worked on an older version of Python.
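Once you have more than one environment, it helps to be able to ask Python which one it is actually running from. Here is a minimal sketch using only the standard library. The function name `describe_python` is made up for this example, and it assumes you activated the environment with `conda activate`, which is what sets the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys


def describe_python():
    """Return basic facts about the interpreter running this code."""
    return {
        # Full path to the python executable currently in use
        "executable": sys.executable,
        # Root folder of the installation (for conda, the environment folder)
        "prefix": sys.prefix,
        # Version formatted as "major.minor.micro"
        "version": ".".join(str(n) for n in sys.version_info[:3]),
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # None here means no conda environment is active in this shell
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

If you run this in two different environments, the `executable` and `prefix` entries should point to two different folders.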
Another powerful aspect of conda is that you can use it to install non-Python-related software. For example, you can set up a single environment or separate environments for NCO (NetCDF Operators), CDO (Climate Data Operators), and NCL (the NCAR Command Language). This, in my opinion, is what makes it so valuable; see the post on setting up these environments for more information.
Anaconda/Miniconda are free (their developer, Anaconda, Inc., formerly Continuum Analytics, offers proprietary add-ons, but there’s no reason you’ll ever need those).
Alternatives to Anaconda/Miniconda
“So what about pip for installing Python packages? I know someone who seems to prefer that.” Sure, pip is great! But for what it’s worth, pip comes with Anaconda and Miniconda, so you may as well go with one of those instead. They work well together.
“Hmm… ok, and what about Canopy? I think I met a ghost once who uses it!” Canopy seems like it could be great, but it’s not free to use the full distribution, and the Python community really seems to be gathering around conda these days.
2. Mess around with conda and get the hang of it
Alright, now that you’ve decided which one you want, install it and start learning about conda. I can’t do this as much justice as the half-hour “Getting started with conda” documentation. That will get you where you need to be.
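If you’d like to peek at your environments from inside Python while you’re getting the hang of things, here is a rough sketch. It assumes the `conda` command is on your PATH and relies on `conda env list --json`, which prints the environment folders as JSON. The function name `list_conda_envs` is just something I made up for this example.

```python
import json
import shutil
import subprocess


def list_conda_envs():
    """Return the folders of all conda environments, or [] if conda is unavailable."""
    conda = shutil.which("conda")
    if conda is None:
        # conda is not on PATH in this shell
        return []
    try:
        result = subprocess.run(
            [conda, "env", "list", "--json"],
            capture_output=True,
            text=True,
            check=True,
            timeout=60,
        )
        data = json.loads(result.stdout)
    except (subprocess.SubprocessError, OSError, ValueError):
        # The command failed, timed out, or printed something that is not JSON
        return []
    if not isinstance(data, dict):
        return []
    return list(data.get("envs", []))


if __name__ == "__main__":
    for env_path in list_conda_envs():
        print(env_path)
```

An empty list does not necessarily mean something is broken; it may just mean conda is not visible from the shell that launched Python.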
3. Install the libraries you’ll need most
If you have Anaconda, some of these will already be on your system. If instead you went with Miniconda, you’ll likely need to grab a few extra things. The most useful libraries for any Python installation are below:
library | main use |
---|---|
numpy, scipy | core numerical and scientific computing tools |
matplotlib | plotting |
jupyter | Jupyter Notebook and related tools |
My favorites more specific to Earth science data analysis include:
library | main use |
---|---|
cartopy | plotting maps |
pandas | loading/saving .csv, .txt, and other spreadsheet files; general panel data statistics package |
xarray | pandas for 3+ dimensions; NetCDF and HDF input/output; quick plotting |
netcdf4 | NetCDF data analysis; great as secondary option to xarray |
gdal | bindings for the Geospatial Data Abstraction Library; useful for reading in HDF and GeoTIFF files (remote sensing data sets) |
wrf-python | wrapper for Fortran functions that analyze WRF output |
seaborn | more plotting options; has a nice color palette builder and interfaces with ColorBrewer |
cmocean | really great colorblind-friendly colormaps |
You can install these one at a time:
conda install matplotlib
Or you can install a bunch at once, e.g.:
conda install numpy scipy matplotlib pandas xarray jupyter
If you install cartopy, I’d recommend going for the conda-forge channel option (per the recommendation of the folks over at SciTools, who develop it):
conda install -c conda-forge cartopy
Warning: I think cartopy is the best option for geospatial plotting, since it’s the replacement for the soon-to-be-retired basemap and will give you less grief down the line. But if you’re inheriting code from others, or if you simply prefer basemap for one reason or another, they don’t work well together (the short reason: cartopy uses a package called shapely, and basemap doesn’t work with it installed). So you’ll have to pick just one for your default environment.
4. Go forth and code
Make sure you install the four packages in the first table above (numpy, scipy, matplotlib, and jupyter). This will get you the main ingredients you need to get familiar with Python.
Probably the easiest way to learn the language is using Jupyter Notebook, which takes Python and ports it through a browser window, providing a great interface where you can add notes, images, and even LaTeX to document your workflow.
To start up a notebook, you want to navigate to a directory where you’d like to save it on your computer, then type:
jupyter notebook
If all works smoothly, your default web browser will pop up with a window, and you’re good to go.
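If things don’t work smoothly, one possible culprit is that the `jupyter` command your shell finds lives in a different environment than the Python you think you’re using. Here is a small sketch for checking that, using only the standard library. The function name `jupyter_in_current_env` is hypothetical, and comparing folders this way is a rough heuristic of mine rather than an official diagnostic.

```python
import shutil
import sys
from pathlib import Path


def jupyter_in_current_env():
    """Return None if no `jupyter` command is on PATH, otherwise True/False
    for whether that command sits inside this interpreter's installation."""
    found = shutil.which("jupyter")
    if found is None:
        return None
    jupyter_path = Path(found).resolve()
    env_root = Path(sys.prefix).resolve()
    # True when the environment folder is one of the command's parent folders
    return env_root in jupyter_path.parents


if __name__ == "__main__":
    result = jupyter_in_current_env()
    if result is None:
        print("No jupyter command found; try `conda install jupyter`.")
    elif result:
        print("jupyter belongs to the Python environment running this script.")
    else:
        print("jupyter was found, but in a different installation than this Python.")
```

Run it with the same `python` you plan to use for your notebooks, from the same terminal where you type `jupyter notebook`.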
I recommend skimming the Jupyter Notebook documentation and going deeper on some tutorials from here. YouTube has plenty of good videos, and here are some website options:
- Dataquest’s Jupyter Notebook for Beginners
- An unofficial (but seemingly legitimate) Quick Start Guide
- This Getting Started tutorial on Medium
5. Optional: Read about how to set up some useful conda environments
Check out my approach to setting up NCO, CDO, and NCL using conda alone.