Author: Mia Hatton

Mia Hatton

Budding data scientist with an entrepreneurial and science communication background.

What is Python?

Python is a popular programming language with a wide variety of applications including data science, web development, scientific computing, and software development.

From python.org:

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python’s simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.

The distinguishing features of Python include:

It is a high-level language

Python is relatively easy to learn compared to assembly languages and machine languages, because low-level details such as memory addresses and call stacks are handled under the hood.

It is an interpreted language

Python does not require a separate compilation step. Instead, an interpreter executes the code statement by statement. This means that errors such as a misspelled variable name can go undetected in code that never runs, but it does have the advantages of platform independence and greater flexibility compared to compiled languages.
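
As a minimal sketch of this trade-off, the function below contains a misspelled name in a branch that is only reached for negative input. The function `describe` and the misspelled name `lable` are hypothetical, made up for illustration. The script loads and runs normally until that branch is actually executed.

```python
def describe(number):
    """Return a short label for a number (hypothetical helper for illustration)."""
    if number < 0:
        # Bug: `lable` is never defined. Python only reports this
        # when the line actually executes.
        return lable
    return "non-negative"


# The buggy branch is not reached, so no error is reported.
print(describe(5))

# The error only surfaces once the branch runs.
try:
    describe(-1)
except NameError as error:
    print(f"Caught at run time: {error}")
```

Tests that exercise every branch are one common way to find this kind of error before users do.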

It is dynamically typed

In statically typed languages such as Java and C#, variable types need to be declared, e.g. int x = 1. In Python, the type does not need to be declared (x = 1). If you attempt an operation that the types involved do not support (e.g. adding an integer to a string, 2 + "3"), Python raises a TypeError at run-time rather than before the program starts. Dynamic typing also means that a variable is not tied to one type: the same name can be reassigned to a value of a different type, for example:

x = 1
# x is an integer

x = "Hello World!"
# x is now a string
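
The run-time type check described above can be sketched as follows. The helper `add_values` is a hypothetical name used only for illustration; it attempts the addition and reports a failure instead of crashing.

```python
def add_values(left, right):
    """Add two values, returning None if their types cannot be added."""
    try:
        return left + right
    except TypeError:
        # Raised at run time, for example for int + str
        return None


print(add_values(2, 3))         # 5
print(add_values(2, "3"))       # None, because int + str raises TypeError
print(add_values(2, int("3")))  # 5, after an explicit conversion

# The built-in type() shows the type of the value a name currently refers to
x = 1
print(type(x).__name__)  # int
x = "Hello World!"
print(type(x).__name__)  # str
```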

It is an object-oriented language that also supports functional programming

Object-oriented languages allow users to define, create and modify their own types, which allows for efficient code reuse and flexibility. Being able to create custom classes allows for a modular structure that improves readability and makes troubleshooting easier.
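
As a rough illustration, here is a small custom class. The class name `Dataset` and its methods are hypothetical and chosen only to show how data and behaviour can be bundled into a reusable type.

```python
class Dataset:
    """A hypothetical container for a named list of numeric values."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def mean(self):
        """Return the arithmetic mean of the stored values."""
        return sum(self.values) / len(self.values)

    def __repr__(self):
        return f"Dataset(name={self.name!r}, n={len(self.values)})"


heights = Dataset("heights_cm", [150, 160, 170])
print(heights)         # Dataset(name='heights_cm', n=3)
print(heights.mean())  # 160.0
```

Any number of Dataset objects can now be created, and each carries its own data along with the same behaviour.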

Functional programming is a paradigm in which programs are built by combining functions that, like mathematical functions, return a result based only on their inputs. Python supports this style alongside object-oriented code.
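
A small sketch of the functional style, using the built-in map and filter together with functools.reduce from the standard library. The helper names `square` and `is_even` are made up for this example.

```python
from functools import reduce


def square(n):
    """Return n squared; the result depends only on the input."""
    return n * n


def is_even(n):
    """Return True if n is divisible by 2."""
    return n % 2 == 0


numbers = [1, 2, 3, 4, 5, 6]

# Keep the even numbers, then square each of them
even_squares = list(map(square, filter(is_even, numbers)))

# Fold the list into a single value by repeated addition
total = reduce(lambda running_sum, n: running_sum + n, even_squares, 0)

print(even_squares)  # [4, 16, 36]
print(total)         # 56
```

In everyday Python the same result is often written with a list comprehension and sum(), but the version above shows functions being passed around as values.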

Other advantages of the Python language

Readability

Python is designed with readability in mind. To illustrate this concept, have a look at these two scripts, each of which prints the sum of 1 and 2 to the screen:

// Java code

public class SumOneAndTwo {

    public static void main(String[] args) {
        int x = 1;
        int y = 2;

        System.out.println(x+y);
        // Prints 3
    }
}

# Python code

x = 1
y = 2

print(x+y)
# Prints 3

It is clear from this example that Python is less verbose than Java, although its dynamic typing makes it less clear what type each variable is.

The guiding philosophy of Python is summarised in the Zen of Python, a set of aphorisms which, among other principles, holds that code should be beautiful, explicit, and simple.
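
The Zen of Python ships with the interpreter: running import this prints it. The sketch below goes one step further and collects the aphorisms into a list. It assumes the CPython behaviour of storing the text ROT13-encoded in the module attribute `this.s`, which is an implementation detail rather than a documented API.

```python
import codecs
import this  # prints the Zen of Python the first time it is imported

# Assumption: CPython keeps the ROT13-encoded text in `this.s`
zen = codecs.decode(this.s, "rot13")

# Drop blank lines and the title line, keeping only the aphorisms
aphorisms = [
    line
    for line in zen.splitlines()
    if line and not line.startswith("The Zen of Python")
]

for aphorism in aphorisms[:3]:
    print(aphorism)
```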

Packages

The base Python language is supplemented by a huge number of packages that offer additional functionality. PyPI, the Python Package Index, listed 221,311 projects at the time of writing, so it is highly likely that someone has already done the work you need to get your own project started.

Popular Python packages for data science include:

  • Pandas is a data analysis and manipulation tool that makes data wrangling easier
  • NumPy is a scientific computing package that provides multi-dimensional array objects and tools to work with them
  • Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python
  • Scikit-learn provides a variety of tools for predictive data analysis
  • TensorFlow is an AI library designed for development and training of machine learning models

Popularity of Python

Python is frequently cited as the language of choice by data scientists, and its popularity continues to rise. According to The PYPL PopularitY of Programming Language Index, which judges language popularity by the frequency of Google searches for tutorials, Python is the most popular language and its popularity has grown by 19% in the last five years (as of March 2020).

IEEE Spectrum, which combines metrics from a number of sources to rank languages by popularity, listed Python as the number one programming language across all language types in its Top Programming Languages 2019 report.

Getting started with Python

To start using Python, you need to install it. For data science, installing the Anaconda distribution is recommended. The Anaconda distribution includes Python, R, and a number of common data science packages such as Matplotlib, as well as Jupyter Notebook (see below).

You can install Anaconda from this page.

Development tools

Python is an interpreted language, so it does not require a separate compiler. You can write Python code in any text editor, but a number of IDEs (Integrated Development Environments) are available that make coding easier with features such as autocomplete.

Popular IDEs and code editors for Python include:

  • Visual Studio Code
  • PyCharm
  • Spyder
  • Sublime Text
  • Atom

Jupyter Notebook is built around Python and is a popular solution for collaborative data science development in the cloud. A notebook can contain a mixture of code in Python (or one of a number of other languages), text in Markdown format, and code outputs such as plots.

Learning Python

If you want to upskill your team or learn Python yourself, the following resources will help you.

Books

Websites

  • Learn Python offers a free, step-by-step course to learn Python from scratch
  • DataCamp features a large number of Python-based, interactive data science courses
  • Codecademy has a number of interactive Python courses available, including some specific to data science skills
  • W3Schools offers free tutorials in a large number of programming languages