
Planet Python

Last update: October 21, 2020 04:47 PM UTC

October 21, 2020


Real Python

Level Up Your Skills With the Real Python Slack Community

The Real Python Community Slack is an English-speaking Python community with members located all over the world. It’s a welcoming group in which you’re free to discuss your coding and career questions, celebrate your progress, vote on upcoming tutorial topics, or just hang out with us at the virtual water cooler.

As a community member, you also get access to our weekly Office Hours, a live online Q&A session with the Real Python team where you’ll meet fellow Pythonistas to chat about your learning progress, ask questions, and discuss Python tips and tricks via screen sharing.

The aim of this guide is to help you:

  • Navigate some of Slack’s most useful features
  • Get the most out of the Real Python Slack community
  • Get your questions answered by other Real Python members
  • Learn how to communicate technical problems to your peers
  • Get comfortable with the tools you’ll use when you get your first (or next) developer job

We’ll update this guide periodically and welcome any recommendations or questions that you may have. You can share them with me (@Ricky White) in Slack or in the comments below. We’ll make all update announcements in the #hangouts channel of Slack.

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.

Successfully Posting to Slack#

Slack lets you post messages and ask questions that everyone within the channel can see. You can also send direct messages to individual users. However, there are a few things to consider before you hit that send button. Let’s discuss those considerations in the context of the three most common ways you’ll post messages in Slack: replying, posting, and cross-posting.

Answering a Member’s Question#

The best way to respond to another member’s post is to use the Reply in thread button. Using threads has the advantage of keeping the entire conversation in one place, which won’t happen if you reply with a new post.

Here’s an example of how to use Reply in thread:

The thread feature is an excellent way to ensure that questions from other members don’t get buried in a flurry of answers to a previous question, which can result in questions being left unanswered.

Another benefit of threads is that they make it very clear to community helpers which questions have already been answered and which remain unanswered. This helps members determine where to focus their time and energy.

Posting Your Question#

When you run into a problem with your code, you may be tempted to jump on Slack, write out your problem, hit Send message, then copy in your code and hit Send again. You may even want to write a more detailed question or explain the solutions you’ve already tried and then—yep, you guessed it—hit Send again.

That’s three posts for the same question.

This approach seems harmless and is technically possible in Slack. But which post are people supposed to respond to? The question, the code, or maybe the initial post where you stated the problem? It’s unclear.

Instead, you should make sure your problem, question, and code are all contained in just one post. This allows people to follow the guidelines for replying in one succinct thread instead of across multiple threads, which can lead to repetition in responses.

You’ll learn more about the best ways to structure your questions and format your code in a bit. For now, all you need to know is that limiting your question to a single post will benefit you and the rest of the Real Python Slack community.

Cross-Posting#

As a general rule, you should try to avoid cross-posting your question to several channels. Cross-posting might seem like an efficient way to get more people to see your question so that you’re more likely to get an answer, but it often has the opposite effect.

Read the full article at https://realpython.com/community-slack-guide/ »


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

October 21, 2020 02:00 PM UTC


Stack Abuse

Matplotlib Scatter Plot - Tutorial and Examples

Introduction

Matplotlib is one of the most widely used data visualization libraries in Python. From simple to complex visualizations, it's the go-to library for most.

In this tutorial, we'll take a look at how to plot a scatter plot in Matplotlib.

Import Data

We'll be using the Ames Housing dataset and visualizing correlations between features from it.

Let's import Pandas and load in the dataset:

import pandas as pd

df = pd.read_csv('AmesHousing.csv')

Plot a Scatter Plot in Matplotlib

Now, with the dataset loaded, let's import Matplotlib, decide on the features we want to visualize, and construct a scatter plot:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('AmesHousing.csv')

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x = df['Gr Liv Area'], y = df['SalePrice'])
plt.xlabel("Living Area Above Ground")
plt.ylabel("House Price")

plt.show()

Here, we've created a plot using the PyPlot instance and set the figure size. Then, we've called the scatter() function on the Axes object returned by the subplots() function.

We need to supply the x and y arguments as the features we'd like to use to populate the plot. Running this code results in:

matplotlib simple scatter plot tutorial

We've also set the x and y labels to indicate what the variables represent. There's a clear positive correlation between these two variables: the more area there is above ground level, the higher the price of the house.

There are a few outliers, but the vast majority of points follow this trend.

Plotting Multiple Scatter Plots in Matplotlib

If you'd like to compare more than one variable against another - say, the correlation of both the overall quality of the house and the area above ground level with the sale price - there's no need to make a 3D plot.

While 2D plots that visualize correlations between more than two variables exist, some of them aren't fully beginner-friendly.

An easy way to do this is to plot two plots - in one, we'll plot the area above ground level against the sale price, in the other, we'll plot the overall quality against the sale price.

Let's take a look at how to do that:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('AmesHousing.csv')

fig, ax = plt.subplots(2, figsize=(10, 6))
ax[0].scatter(x = df['Gr Liv Area'], y = df['SalePrice'])
ax[0].set_xlabel("Living Area Above Ground")
ax[0].set_ylabel("House Price")

ax[1].scatter(x = df['Overall Qual'], y = df['SalePrice'])
ax[1].set_xlabel("Overall Quality")
ax[1].set_ylabel("House Price")

plt.show()

Here, we've called plt.subplots(), passing 2 to indicate that we'd like to instantiate two subplots in the figure.

We can access these via the Axes instance - ax. ax[0] refers to the first subplot's axes, while ax[1] refers to the second subplot's axes.

Here, we've called the scatter() function on each of them, providing them with labels. Running this code results in:

matplotlib multiple scatter plots in subplots
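If you'd rather keep everything in a single 2D plot, a third variable can also be encoded as marker color via scatter()'s c argument. A minimal sketch, using synthetic stand-ins for the Ames columns so it runs without the CSV:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for 'Gr Liv Area', 'Overall Qual', and 'SalePrice'
rng = np.random.default_rng(0)
area = rng.uniform(500, 4000, 100)
quality = rng.integers(1, 11, 100)
price = area * 100 + quality * 20000

fig, ax = plt.subplots(figsize=(10, 6))
points = ax.scatter(area, price, c=quality, cmap="viridis")
ax.set_xlabel("Living Area Above Ground")
ax.set_ylabel("House Price")
fig.colorbar(points, ax=ax, label="Overall Quality")
plt.show()
```

With the real dataset, you'd pass c=df['Overall Qual'] instead of the synthetic array.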

Plotting a 3D Scatter Plot in Matplotlib

If you don't want to visualize this in two separate subplots, you can plot the correlation between these variables in 3D. Matplotlib has built-in 3D plotting functionality, so doing this is a breeze.

First, we'll need to import the Axes3D class from mpl_toolkits.mplot3d. This special type of Axes is needed for 3D visualizations. With it, we can pass in another argument - z, which is the third feature we'd like to visualize.

Let's go ahead and import the Axes3D object and plot a scatter plot against the previous three features:

import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D

df = pd.read_csv('AmesHousing.csv')

fig = plt.figure()
ax = fig.add_subplot(111, projection = '3d')

x = df['SalePrice']
y = df['Gr Liv Area']
z = df['Overall Qual']

ax.scatter(x, y, z)
ax.set_xlabel("Sale price")
ax.set_ylabel("Living area above ground level")
ax.set_zlabel("Overall quality")

plt.show()

Running this code results in an interactive 3D visualization that we can pan and inspect in three-dimensional space:

matplotlib 3d scatter plot

Customizing Scatter Plot in Matplotlib

You can change how the plot looks by supplying the scatter() function with additional arguments, such as color, alpha, etc.:

ax.scatter(x = df['Gr Liv Area'], y = df['SalePrice'], color = "blue",
           edgecolors = "white", linewidths = 0.1, alpha = 0.7)

Running this code would result in:

matplotlib customize scatter plot
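The s argument can likewise map a variable to marker area. A small sketch with synthetic data (the values here are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = 2 * x + rng.normal(0, 1, 50)

fig, ax = plt.subplots()
# s sets the marker area; scaling it by y makes larger values stand out
ax.scatter(x, y, s=(y - y.min() + 1) * 10, color="blue",
           edgecolors="white", linewidths=0.1, alpha=0.7)
plt.show()
```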

Conclusion

In this tutorial, we've gone over several ways to plot a scatter plot in Matplotlib.

If you're interested in Data Visualization and don't know where to start, make sure to check out our book on Data Visualization in Python.

Data Visualization in Python, a book for beginner to intermediate Python developers, will guide you through simple data manipulation with Pandas, cover core plotting libraries like Matplotlib and Seaborn, and show you how to take advantage of declarative and experimental libraries like Altair.

Data Visualization in Python

Understand your data better with visualizations! With over 275 pages, you'll learn the ins and outs of visualizing data in Python with popular libraries like Matplotlib, Seaborn, Bokeh, and more.

October 21, 2020 01:30 PM UTC

How to Iterate over Rows in a Pandas DataFrame

Introduction

Pandas is an immensely popular data manipulation framework for Python. In a lot of cases, you might want to iterate over data - either to print it out, or perform some operations on it.

In this tutorial, we'll take a look at how to iterate over rows in a Pandas DataFrame.

If you're new to Pandas, you can read our beginner's tutorial. Once you're familiar, let's look at the three main ways to iterate over a DataFrame:

Iterating DataFrames with items()

Let's set up a DataFrame with some data of fictional people:

import pandas as pd

df = pd.DataFrame({
    'first_name': ['John', 'Jane', 'Marry', 'Victoria', 'Gabriel', 'Layla'],
    'last_name': ['Smith', 'Doe', 'Jackson', 'Smith', 'Brown', 'Martinez'],
    'age': [34, 29, 37, 52, 26, 32]},
    index=['id001', 'id002', 'id003', 'id004', 'id005', 'id006'])

Note that we are using IDs as our DataFrame's index. Let's take a look at what the DataFrame looks like:

print(df.to_string())
      first_name last_name  age
id001       John     Smith   34
id002       Jane       Doe   29
id003      Marry   Jackson   37
id004   Victoria     Smith   52
id005    Gabriel     Brown   26
id006      Layla  Martinez   32

Now, to iterate over this DataFrame, we'll use the items() function:

df.items()

This returns a generator:

<generator object DataFrame.items at 0x7f3c064c1900>

We can use this to generate pairs of col_name and data. These pairs will contain a column name and every row of data for that column. Let's loop through column names and their data:

for col_name, data in df.items():
	print("col_name:",col_name, "\ndata:",data)

This results in:

col_name: first_name
data: 
id001        John
id002        Jane
id003       Marry
id004    Victoria
id005     Gabriel
id006       Layla
Name: first_name, dtype: object
col_name: last_name
data: 
id001       Smith
id002         Doe
id003     Jackson
id004       Smith
id005       Brown
id006    Martinez
Name: last_name, dtype: object
col_name: age
data: 
id001    34
id002    29
id003    37
id004    52
id005    26
id006    32
Name: age, dtype: int64

We've successfully iterated over all rows in each column. Notice that the index stays the same across columns, as it is the index shared by all of the values. If you don't define an index, then Pandas will enumerate the rows for you, starting from 0.
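A quick sketch of that default behavior:

```python
import pandas as pd

# With no index argument, Pandas assigns a RangeIndex of 0, 1, 2, ...
df = pd.DataFrame({"first_name": ["John", "Jane"], "age": [34, 29]})
print(df.index.tolist())  # [0, 1]
```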

We can also print a particular row by passing an index number to data, as we do with Python lists:

for col_name, data in df.items():
	print("col_name:",col_name, "\ndata:",data[1])

Note that indices are zero-based, so data[1] refers to the second row. You will see this output:

col_name: first_name 
data: Jane
col_name: last_name 
data: Doe
col_name: age 
data: 29

We can also pass the index label to data:

for col_name, data in df.items():
	print("col_name:",col_name, "\ndata:",data['id002'])

The output would be the same as before:

col_name: first_name
data: Jane
col_name: last_name
data: Doe
col_name: age
data: 29

Iterating DataFrames with iterrows()

While df.items() iterates column-wise, yielding each column in turn, we can use iterrows() to get the entire row of data for each index.

Let's try iterating over the rows with iterrows():

for i, row in df.iterrows():
	print(f"Index: {i}")
	print(f"{row}\n")

In the for loop, i represents the index label (in our case, id001 for the first row) and row contains that row's data across all columns. Our output would look like this:

Index: id001
first_name     John
last_name     Smith
age              34
Name: id001, dtype: object

Index: id002
first_name    Jane
last_name      Doe
age             29
Name: id002, dtype: object

Index: id003
first_name      Marry
last_name     Jackson
age                37
Name: id003, dtype: object

...

Likewise, we can access a single column of each row by passing the column's position or its name to row. For example, we can print the first column of each row by its position:

for i, row in df.iterrows():
	print(f"Index: {i}")
	print(f"{row[0]}")

Or:

for i, row in df.iterrows():
	print(f"Index: {i}")
	print(f"{row['first_name']}")

They both produce this output:

Index: id001
John
Index: id002
Jane
Index: id003
Marry
Index: id004
Victoria
Index: id005
Gabriel
Index: id006
Layla

Iterating DataFrames with itertuples()

The itertuples() function will also return a generator, which generates row values in tuples. Let's try this out:

for row in df.itertuples():
    print(row)

You'll see this in your Python shell:

Pandas(Index='id001', first_name='John', last_name='Smith', age=34)
Pandas(Index='id002', first_name='Jane', last_name='Doe', age=29)
Pandas(Index='id003', first_name='Marry', last_name='Jackson', age=37)
Pandas(Index='id004', first_name='Victoria', last_name='Smith', age=52)
Pandas(Index='id005', first_name='Gabriel', last_name='Brown', age=26)
Pandas(Index='id006', first_name='Layla', last_name='Martinez', age=32)

The itertuples() method has two arguments: index and name.

We can choose not to display index column by setting the index parameter to False:

for row in df.itertuples(index=False):
    print(row)

Our tuples will no longer have the index displayed:

Pandas(first_name='John', last_name='Smith', age=34)
Pandas(first_name='Jane', last_name='Doe', age=29)
Pandas(first_name='Marry', last_name='Jackson', age=37)
Pandas(first_name='Victoria', last_name='Smith', age=52)
Pandas(first_name='Gabriel', last_name='Brown', age=26)
Pandas(first_name='Layla', last_name='Martinez', age=32)

As you've already noticed, this generator yields namedtuples with the default name Pandas. We can change this by passing 'People' to the name parameter. You can choose any name you like, but it's always best to pick a name relevant to your data:

for row in df.itertuples(index=False, name='People'):
    print(row)

Now our output would be:

People(first_name='John', last_name='Smith', age=34)
People(first_name='Jane', last_name='Doe', age=29)
People(first_name='Marry', last_name='Jackson', age=37)
People(first_name='Victoria', last_name='Smith', age=52)
People(first_name='Gabriel', last_name='Brown', age=26)
People(first_name='Layla', last_name='Martinez', age=32)

Iteration Performance with Pandas

The official Pandas documentation warns that iteration is a slow process. If you're iterating over a DataFrame to modify the data, vectorization would be a quicker alternative. Also, you should avoid modifying data while iterating over rows, as Pandas sometimes returns a copy of the row rather than a reference, which means that some of your changes may not actually be written back to the DataFrame.
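As a sketch of what vectorization buys you, here's the same column update written as a row loop and as a single vectorized operation (the toy column values are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"age": [34, 29, 37]})  # toy data for illustration

# Row-by-row: build the new values one at a time
next_ages = []
for _, row in df.iterrows():
    next_ages.append(row["age"] + 1)

# Vectorized: one operation over the whole column at once
df["age_next"] = df["age"] + 1

print(df["age_next"].tolist())  # [35, 30, 38]
```

Both produce the same values, but the vectorized version avoids the per-row overhead entirely.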

For small datasets, you can use the to_string() method to display all the data. For larger datasets that have many columns and rows, you can use the head(n) or tail(n) methods to print out the first or last n rows of your DataFrame (the default value for n is 5).

Speed Comparison

To measure the speed of each method, we wrapped each of them in a function that executes it 1,000 times and returns the average execution time.

To test these methods, we used both the print() and list.append() functions to cover common use cases and provide better comparison data. To keep the comparison fair, each loop prints or appends only one value per iteration.

Here's what the return values look like for each method.

For example, items() cycles column by column:

('first_name', 
id001        John
id002        Jane
id003       Marry
id004    Victoria
id005     Gabriel
id006       Layla
Name: first_name, dtype: object)

iterrows() would provide all column data for a particular row:

('id001', 
first_name     John
last_name     Smith
age              34
Name: id001, dtype: object)

And finally, a single row for the itertuples() would look like this:

Pandas(Index='id001', first_name='John', last_name='Smith', age=34)
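The timing wrapper described above can be sketched roughly like this. The exact harness isn't shown in the article, so this is an assumed reconstruction using timeit, with a smaller DataFrame and fewer repetitions to keep it quick:

```python
import timeit
import pandas as pd

df = pd.DataFrame({"a": range(100), "b": range(100)})

def run_itertuples():
    out = []
    for row in df.itertuples():
        out.append(row.a)  # attribute access on the namedtuple
    return out

def run_iterrows():
    out = []
    for _, row in df.iterrows():
        out.append(row["a"])  # label access on the Series
    return out

# Average over repeated runs (the article used 1,000 repetitions)
n = 100
avg_itertuples = timeit.timeit(run_itertuples, number=n) / n
avg_iterrows = timeit.timeit(run_iterrows, number=n) / n
```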

Here are the average results in seconds:

Method        Speed (s)             Test Function
items()       1.349279541666571     print()
iterrows()    3.4104003086661883    print()
itertuples()  0.41232967500279      print()

Method        Speed (s)             Test Function
items()       0.006637570998767235  append()
iterrows()    0.5749766406661365    append()
itertuples()  0.3058610513350383    append()

Printing values takes more time and resources than appending in general, and our examples are no exception. While itertuples() performs best when combined with print(), the items() method dramatically outperforms the others when used with append(), and iterrows() comes last in each comparison.

Please note that these test results highly depend on other factors like OS, environment, computational resources, etc. The size of your data will also have an impact on your results.

Conclusion

We've learned how to iterate over the DataFrame with three different Pandas methods - items(), iterrows(), itertuples(). Depending on your data and preferences you can use one of them in your projects.

October 21, 2020 12:30 PM UTC


Codementor

How To Take A Screenshot Using Python & Selenium?

This tutorial will guide you how to use Selenium and Python to capture Python Selenium screenshots and check how your website is rendered over different browsers.

October 21, 2020 09:41 AM UTC

The More, the Better — Why Become a Multi-Language Programmer

Are you just taking your first step into web development, and you want to learn programming? Discover the benefits of learning more than one programming language.

October 21, 2020 08:41 AM UTC


Kushal Das

Fixing errors on my blog's feed

For the last few weeks, my blog feed was not showing up in the Fedora Planet. While trying to figure out what was wrong, Nirik pointed me to the 4 errors in the feed according to the W3C validator. If you don't know, I use a self-developed Rust application called khata for my static blog. This means I had to fix these errors.

The changes are in the git. I am using a build from there. I will make a release after the final remaining issue is fixed.

Oh, I also noticed how bad the code looks now as I can understand Rust better :)

Also, the other Planets, like Python and Tor, are still working for my feed.

October 21, 2020 06:34 AM UTC

October 20, 2020


Python for Beginners

Datacamp Review 2020

One of the fastest-growing careers out there is that of a data scientist. The reason may be that people are drawn to the hefty six-figure salaries, or that they have a knack for data analysis and love working with big data.

DataCamp is the best source of reference material for data science. It is the first online learning platform dedicated to providing data science training to professionals seeking the knowledge and understanding of the topic. Established in 2014, DataCamp is a MOOC-providing platform. MOOC stands for Massive Open Online Courses meaning that the company specializes in providing online courses to students all over the world.

In this Datacamp review, I am going to tell you how easy it is to use DataCamp, then touch on the quality of the courses offered. I'll follow with some of the features you will find in DataCamp and how you can start exploring DataCamp for free, before finishing up the review with the pricing and whether or not DataCamp is worth paying for.

Ease of Use

When you first arrive at the DataCamp main page, you will notice that DataCamp is organized for you to learn, practice, and participate in projects. Once you have selected a course, DataCamp will set up a dashboard for you where half of the screen contains the instruction box and the other half is the coding box. The instructions teach you about the syntax of a language, then present you with some sample code and an exercise. Here is where you get some coding practice: you enter code in the coding box, run it to test the results, and when you are satisfied, submit it. You earn XP points if you get it right; otherwise, you are told that you did not code it correctly.

The nice thing about DataCamp is that the instruction is both text-based and video-driven. This is quite refreshing, in my opinion, because it is like attending school: there is reading material as well as instructor-led instruction in the form of videos.

Another great thing about DataCamp is that it will quiz you along the way. Instead of a coding exercise, it will give you a multiple choice question to which you must select the right answer and submit. 

Course Quality

There are 314 courses in DataCamp, and the number is growing steadily. As for the quality of each course, I would give it a rating of 9.0 out of 10.0. I hardly give out 10.0 ratings because I always leave room for improvement, and with DataCamp, I think the quality is good to excellent but not perfect. I think DataCamp's instructors could add a bit of real-world experience to each course, giving us an idea of the problems that naturally occur in today's environments and how to apply our new skills toward solving them.

DataCamp Features

As a MOOC-based platform, DataCamp tries to be unique and provide a competitive edge when it comes to online training. Here are some features that DataCamp provides:

DataCamp Pricing

DataCamp is available for both the individual and for businesses. As an individual, you have three plans to choose from. Starting from the lowest level, an individual can sign up for free. This is an excellent way to start off and to give the platform a test drive. It will give you access to all courses but only one chapter per course. This may not sound like much but it would give you a chance to see the quality of training available and whether DataCamp is a good fit for you or not. 

Start Your Datacamp Free Trial

The next level, the basic plan, costs $25 per month and gives you access to forty-four courses. The premium level costs $29 per month and gives you access to all three hundred fourteen courses. Instead of seven projects, you get all eighty-one projects to work on. This is considerably more than the basic level, and it is worth the money.

As a business, there are two plans: the Professional and the Enterprise. The Professional costs $300 per user per year and the Enterprise costs $499 per user per year. Both plans get access to all core courses and early access to new courses. 

Are DataCamp Courses Good?

DataCamp is widely regarded as very good at teaching a particular skill, but less good at imparting the critical thinking needed to solve complex problems, or the confidence to work as a data scientist when facing unforeseen challenges. Still, DataCamp's approach of prioritizing concreteness and interactivity is innovative.

Is DataCamp Worth it? 

With DataCamp training, you will learn the skills needed to become a successful data scientist, but will DataCamp help you land a job? Once you have completed your training, the real question is whether the Statement of Accomplishment, the certificate DataCamp issues to signify that you have completed data science training, is worth anything in the real world.

Unfortunately, most employers don't think this certificate tells them anything about your skill or ability. It simply signifies that you took the initiative to improve yourself academically, but it doesn't certify you as an expert data scientist.

If it’s certification that you are looking for then no, DataCamp is not worth it. But if it’s data science training or preparation for working in the data science field then DataCamp is your best bet. 

The post Datacamp Review 2020 appeared first on PythonForBeginners.com.

October 20, 2020 08:49 PM UTC


PyCoder’s Weekly

Issue #443 (Oct. 20, 2020)

#443 – OCTOBER 20, 2020
View in Browser »

The PyCoder’s Weekly Logo


Python For Feature Film

A look into how Python is used to bring your favorite movies to the big screen.
DHRUV GOVIL

Data Management With Python, SQLite, and SQLAlchemy

In this tutorial, you’ll learn how to store and retrieve data using Python, SQLite, and SQLAlchemy as well as with flat files. Using SQLite with Python brings with it the additional benefit of accessing data with SQL. By adding SQLAlchemy, you can work with data in terms of objects and methods.
REAL PYTHON

Profile, Understand, and Optimize Code Performance


You can’t improve what you can’t measure. Profile and understand code behavior and performance (Wall-time, I/O, CPU, HTTP requests, SQL queries). Install in minutes. Browse through appealing graphs. Supports all Python versions. Works in dev, test/staging & production →
BLACKFIRE sponsor

Reading Poorly Structured Excel Files with Pandas - Practical Business Python

Raise your hand if you’ve ever had to deal with a poorly formatted Excel spreadsheet. Wow, that’s a lot of you! Did you know you can use pandas and openpyxl to read even the craziest Excel sheets?
CHRIS MOFFITT

Python Booleans: Optimize Your Code With Truth Values

In this tutorial, you’ll learn about the built-in Python Boolean data type, which is used to represent the truth value of an expression. You’ll see how to use Booleans to compare values, check for identity and membership, and control the flow of your programs with conditionals.
REAL PYTHON

Exploring Fractals on a Cloud Computer

Fractals might be some of the most interesting mathematical structures to study and to visualize. Learn what fractals are and how to create beautiful fractal animations with Python.
ERIC MATTHES • Shared by Eric Matthes

New BBC micro:bit Released

Now with a built-in speaker and microphone!
MICROBIT.ORG

Introducing spaCy v3.0 nightly

EXPLOSION.AI

New Sound Pack for PyGame Zero

SEAN.CO.UK • Shared by Sean McManus

Discussions

How Can I Generate Three Random Integers That Satisfy Some Condition?

A little bit of algebra goes a long way. But also, when was the last time you got to use SciPy’s Diophantine equation solver?
STACK OVERFLOW

I’ve Accidentally Made a Weird Art Generator. Help Me… I Cant Stop Running It!

REDDIT

Python Jobs

Senior Full Stack Developer (Chicago, IL, USA)

Panopta

Senior Software Engineer (Remote)

Silicon Therapeutics

Senior Research Programmer (Remote)

Silicon Therapeutics

Java Lead With Python Experience (Sacramento, CA, USA)

Benvia LLC

More Python Jobs >>>

Articles & Tutorials

The Real Python Podcast – Episode #31: Python Return Statement Best Practices and Working With the map() Function

The Python return statement is such a fundamental part of writing functions. Is it possible you missed some best practices when writing your own return statements? This week on the show, David Amos returns with another batch of PyCoder’s Weekly articles and projects. We also talk functional programming again with an article on the Python map function and processing iterables without a loop.
REAL PYTHON podcast

Getting Started With MicroPython

Are you interested in the Internet of Things, home automation, and connected devices? If so, then you’re in luck! In this course, you’ll learn about MicroPython and the world of electronics hardware. You’ll set up your board, write your code, and deploy a MicroPython project to your own device.
REAL PYTHON course

Identify Issues in Your Python Applications Before It Affects Customers


Quickly locate latency, bottlenecks, and other potential issues with detailed flame graphs and end-to-end distributed tracing using Datadog’s application performance management. Get started with a free Datadog trial today →
DATADOG sponsor

The Surprising Impact of Medium-Size Texts on PostgreSQL Performance

Learn how medium-size text fields impact query performance in PostgreSQL and how to gain performance benefits using the TOAST method. While the article isn’t strictly about Python, you’ll likely find it useful if you often store medium-to-large size text in PostgreSQL.
HAKI BENITA

Build Plugins With Pluggy

Plugin architecture can be a nice way to make it easy to add functionality to your project in the future, or allow third-party developers to extend your applications. Learn how plugin architecture works on how to use Pluggy to manage plugins.
KRACEKUMAR RAMARAJU

Getting Started With OpenTelemetry and Distributed Tracing in Python

Learn why distributed tracing is the foundation for observability, and how to instrument your Python applications with OpenTelemetry in under 10 minutes.
LIGHTSTEP sponsor

Play the Long Game When Learning to Code

When it comes to coding, taking the time to internalize and truly understand the concepts you’re learning pays dividends over memorizing syntax and solutions to interview problems.
DANIEL CHAE opinion

Monitor Your GitHub Build With a Raspberry Pi Pumpkin

Looking for a spooky way to track your build status? GitHub’s Martin Woodward has put together a 3D printed pumpkin that lights up to show his build status.
ASHLEY WHITTAKER

type() vs. isinstance()

What’s the difference between type() and isinstance() methods, and which one is better for checking the type of an object?
SEBASTIAN WITOWSKI

Projects & Code

evennia: Online Multiplayer Text-Based Game Framework

GITHUB.COM/EVENNIA

pandasgui: A GUI for Pandas DataFrames

GITHUB.COM/ADAMEROSE

lightly: Computer Vision Framework for Self-Supervised Learning

GITHUB.COM/LIGHTLY-AI

pyinstrument: Call Stack Profiler for Python

GITHUB.COM/JOERICK

nebulo: Instant GraphQL API for PostgreSQL & SQLAlchemy

GITHUB.COM/OLIRICE • Shared by Oliver Rice

PumpkinPi: Spooky Build Status Indicator

GITHUB.COM/MARTINWOODWARD

Events

PyTexas 2020 (Virtual)

October 24 to October 25, 2020
PYTEXAS.ORG

SciPy Japan 2020

October 30 to November 3, 2020
SCIPY.ORG


Happy Pythoning!
This was PyCoder’s Weekly Issue #443.
View in Browser »


[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

October 20, 2020 07:30 PM UTC


Test and Code

135: Speeding up Django Test Suites

All test suites start fast. But as you grow your set of tests, each test adds a little bit of time to the suite.
What can you do about it to keep test suites fast?
Some things, like parallelization, are applicable to many domains.
What about, for instance, Django applications?
Well, Adam Johnson has thought about it a lot, and is here to tell us how we can speed up our Django test suites.

Topics include:

  • Parallelizing tests
  • Moving from disk to memory
  • Using fake data and factory functions
  • Targeted mocking

Special Guest: Adam Johnson.

Sponsored By:

Support Test & Code : Python Testing for Software Engineering

Links:


October 20, 2020 03:15 PM UTC


Python Morsels

Variables are pointers


Transcript

Variables in Python are not buckets that contain things, but pointers: variables point to objects.

Let's say we have a variable x which points to a list of 3 numbers:

>>> x = [1, 2, 3]

If we assign y to x, this does something kind of interesting:

>>> y = x
>>> x == y
True

Not only is the variable x equal to the variable y at this point, but x and y also have the same id, meaning they both point to the same memory location:

>>> id(x)
140043174674888
>>> id(y)
140043174674888

This means they both point to the same object. So if we mutate the object x points to (by appending to that list) x will now have 4 in it but so will y!

>>> x.append(4)
>>> x
[1, 2, 3, 4]
>>> y
[1, 2, 3, 4]

The reason this happens is all about the line that we wrote above:

>>> y = x

Assignment statements never copy anything in Python. An assignment takes a variable name and points it to an object.

When I say variables are pointers, I mean they're not buckets that contain things.

When you do an assignment, you're pointing the variable name on the left-hand side of the equals sign (y in this case) to whatever object is referenced on the right-hand side of the equals sign (the list that x already happens to point to in this case).

So variables in Python are pointers, not buckets that contain things.
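If you want y to be an independent list that doesn't share mutations with x, you have to copy explicitly. A quick sketch:

```python
x = [1, 2, 3]
y = x.copy()  # or list(x), or x[:] -- each builds a new list object

y.append(4)

print(x)  # x is untouched: [1, 2, 3]
print(y)  # only the copy changed: [1, 2, 3, 4]
```

Note that copy() is shallow: both lists still point to the same element objects, which matters when the elements are themselves mutable.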

October 20, 2020 03:00 PM UTC


Real Python

Getting Started With MicroPython

Are you interested in the Internet of Things, home automation, and connected devices? Have you ever wondered what it would be like to build a blaster, a laser sword, or even your own robot? If so, then you’re in luck! MicroPython can help you do all of those things and more.

In this course, you’ll learn about:


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

October 20, 2020 02:00 PM UTC


Stack Abuse

Python: Check Index of an Item in a List

Introduction

Lists are useful in different ways compared to other data types because of how versatile they are. In this article, we'll take a look at one of the most common operations with lists - finding the index of an element.

We will take a look at different scenarios of finding an element, i.e. finding the first, last, and all occurrences of an element, as well as what happens when the element we're looking for doesn't exist.

Using the index() Function

All of the operations we mentioned in the last paragraph can be done with the built-in index() function. The syntax for this function is index(element[, start[, end]]).

The element parameter naturally represents the element we're looking for. The start and end parameters are optional and represent the range of indices in which we look for the element.

The default value for start is 0 (searching from the beginning), and the default value for end is the number of elements in the list (searching to the end of the list).

The function returns the first position of the element in the list that it could find, regardless of how many equal elements there are after the first occurrence.
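For instance, passing a start value makes index() skip any matches before that position; a quick sketch:

```python
my_list = ['a', 'b', 'c', 'b']

print(my_list.index('b'))     # first match anywhere: 1
print(my_list.index('b', 2))  # first match at or after index 2: 3
```

This behavior is what makes it possible to collect every occurrence, as shown later in the article.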

Finding the First Occurrence of an Element

Using the index() function without setting any values for start and end will give us the first occurrence of the element we're looking for:

my_list = ['a', 'b', 'c', 'd', 'e', '1', '2', '3', 'b']

first_occurrence = my_list.index('b')
print("First occurrence of 'b' in the list: ", first_occurrence)

Which would give us the expected output:

First occurrence of 'b' in the list: 1

Finding All Occurrences of an Element

To find all occurrences of an element, we can use the optional parameter start so that we search in only certain segments of the list.

For example, let's say that we found the first occurrence of an element at index 3. In order to find the next one, we would need to continue our search for the first appearance of that element after index 3. We would repeat this process, changing where our search begins, as long as we kept finding new occurrences of the element:

my_list = ['b', 'a', 2, 'n', False, 'a', 'n', 'a']

all_occurrences = []
last_found_index = -1
element_found = True

while element_found:
    try:
        last_found_index = my_list.index('a', last_found_index + 1)
        all_occurrences.append(last_found_index)
    except ValueError:
        element_found = False
    
if len(all_occurrences) == 0:
    print("The element wasn't found in the list")
else:
    print("The element was found at: " + str(all_occurrences))

Running this code would give us:

The element was found at: [1, 5, 7]

We had to use a try block here, since the index() function throws an error when it can't find the specified element in the given range. This might be unusual to developers who are more used to other languages since functions like these usually return -1/null when the element can't be found.

However, in Python we have to be careful and use a try block when we use this function.
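If you prefer the -1 convention from other languages, you can wrap the call once and reuse it. A minimal sketch; the helper name find_index is our own, not part of Python:

```python
def find_index(items, element, start=0):
    """Return the index of element in items, or -1 if it isn't present."""
    try:
        return items.index(element, start)
    except ValueError:
        return -1

print(find_index(['a', 'b', 'c'], 'b'))  # 1
print(find_index(['a', 'b', 'c'], 'z'))  # -1
```

This keeps the try block in one place instead of scattering it through your code.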

Another, neater way of doing the same thing is to use a list comprehension with enumerate() and ignore the index() function altogether:

my_list = ['b', 'a', 2, 'n', False, 'a', 'n', 'a']

all_occurrences = [index for index, element in enumerate(my_list) if element == 'a']

print("The element was found at: " + str(all_occurrences))

Which would give us the same output as before. This approach has the added benefit of not using the try block.

Finding the Last Occurrence of an Element

If you need to find the last occurrence of an element in the list, there are two approaches you can use with the index() function: searching through a reversed copy of the list, or repeatedly searching with a moving start index and keeping only the last match.

Regarding the first approach, if we know the position of the first occurrence of the element in a reversed list, we can find the position of the last occurrence in the original one. More specifically, we can do this by subtracting reversed_list_index + 1 from the length of the original list:

my_list = ['b', 'a', 2, 'n', False, 'a', 'n', 'a']

reversed_list_index = my_list[::-1].index('n')
# or alternatively:
# reversed_list_index2 = list(reversed(my_list)).index('n')

original_list_index = len(my_list) - 1 - reversed_list_index

print(original_list_index)

Which would give us the desired output:

6

As for the second approach, we could tweak the code we used to find all occurrences and only keep track of the last occurrence we found:

my_list = ['b', 'a', 2, 'n', False, 'a', 'n', 'a']

last_occurrence = -1
element_found = True

while element_found:
    try:
        last_occurrence = my_list.index('n', last_occurrence + 1)
    except ValueError:
        element_found = False
    
if last_occurrence == -1:
    print("The element wasn't found in the list")
else:
    print("The last occurrence of the element is at: ", last_occurrence)

Which would give us the same output:

6

Conclusion

We have taken a look at some of the most common uses for the index() function, and how to avoid it in some cases.

Keep the potentially unusual behavior of the index() function in mind, where it throws an error instead of returning -1/None when an element isn't found in the list.

October 20, 2020 12:25 PM UTC


Evennia

On using Markdown with Sphinx - onward to Evennia 0.9.5

Last post I wrote about the upcoming v1.0 of Evennia, the Python MU* creation engine. We are not getting to that 1.0 version quite yet though: The next release will be 0.9.5, hopefully out relatively soon (TM).

Evennia 0.9.5 is, as you may guess, an intermediary release. Apart from the 1.0 roadmap just not being done yet, there is one other big reason for this - we are introducing documentation versioning and for that a proper release is needed as a base to start from. Version 0.9.5 contains everything already in master branch, so if you have kept up-to-date you won't notice too much difference. Here are some highlights compared to version 0.9:

Many contributors helped out along the way. See the changelog where contributors of the bigger new features are listed.

The path to a new documentation

For many years we've used the Github wiki as our documentation hub. It has served us well. But as mentioned in my previous post, it has its drawbacks, in particular when it comes to handling documentation for multiple Evennia versions in parallel.

After considering a bunch of options, I eventually went with sphinx, because it has such good autodoc functionality (parsing of the source-code docstrings). This is despite the fact that our wiki docs are all in Markdown and I dislike reStructuredText quite a bit. Our code also uses friendly, in-code-readable Google-style docstrings instead of Sphinx's hideous and unreadable format.

Luckily there are extensions for Sphinx to handle this: 

What could go wrong? Well, it's been quite a ride.

Getting Markdown into reST

Linking to things in recommonmark turned out to be very flaky. I ended up forking and merging a bunch of PRs from the project but that was not enough: Clearly this thing was not built to convert 200 pages of technical markdown from a github wiki.

My custom fork of recommonmark had to be tweaked a bit for my needs, such as not having to specify the .md file ending in every link and make sure the url-resolver worked as I expected. There were a bunch of other things but I will probably not merge this back, the changes are pretty Evennia-specific.

Even so, many of my wiki links just wouldn't work. This is not necessarily recommonmark's fault, but a consequence of how Sphinx groups things into toctrees, something that the Evennia wiki doesn't have.

Also, the recommonmark way to make a toctree in Markdown is to make a list of links - you can't have any descriptive text, making the listing quite useless (apparently people only want bland lists of link-names?). After trying to figure out a way to make this work I eventually capitulated - I make pretty lists in Markdown while using a "hidden" toctree to inform sphinx how the pages are related.

Getting the wiki into the new doc site

This required more custom code. I wrote a custom importer that reads the wiki and cleans/reformats it in places where recommonmark just dies on them. I also made a preprocessor that not only finds orphan pages but also builds a toctree and remaps all links in all documents to their actual location on compilation. The remapper makes it a lot easier to move things around. The drawback is that every page needs to be uniquely named. Since this was already the case in the wiki, this was a good tradeoff. So, with a lot of custom code, the wiki could eventually be ported automatically.

The thing is, even with all this processing, recommonmark doesn't support stuff like Markdown tables, so you still have to fall back to reST notation for those. And Napoleon, while doing a good job of parsing Google docstrings, does not expect Markdown. So the end result is mostly Markdown, but we still have to fall back to reST for some things. That's probably as far as we'll get.

Deploying the docs 

Figuring out how to build and deploy these components together was the next challenge. Sphinx' default Makefile was quite anemic and I also wanted something that regular contributors could use to test their documentation contributions easily. I ended up having to expand the Makefile quite a lot while also adding separate deploy scripts and interfaces to github actions (which we recently started using too).

Finally, the versioning. The sphinx-multiversion plugin works by extracting the branches you choose from git and running the sphinx compiler in each branch.  The plugin initially had a bug with how our docs are located (not at the root of the package) but after I reported it, it was quickly fixed. The result is a static document site where you can select between the available versions in the sidebar.

I've not gotten down to trying to make LaTeX/PDF generation work yet. I'm dreading it quite a bit... 

Where we are

The github wiki is now closed for external contributions. The v0.9.5 of the new documentation will pretty much be an import of the last state of the wiki with some minor cleanup (such as tables). While we'll fix outright errors in it, I don't plan to do many fixes of purely visual glitches from the conversion - the old wiki is still there should that be a problem.

The main refactoring and cleanup of the documentation to fit its new home will instead happen in v1.0. While the rough structure of this is already in place, it's very much a work in progress at this point.

Conclusions

Evennia 0.9.5 has a lot of features, but the biggest things are 'meta' changes in the project itself. After it is out, it's onward towards 1.0 again!


October 20, 2020 12:21 AM UTC

October 19, 2020


Podcast.__init__

The Journey To Replace Python's Parser And What It Means For The Future - Episode 285

Summary

The release of Python 3.9 introduced a new parser that paves the way for brand new features. Every programming language has its own specific syntax for representing the logic that you are trying to express. The way that the rules of the language are defined and validated is with a grammar definition, which in turn is processed by a parser. The parser that the Python language has relied on for the past 25 years has begun to show its age through mounting technical debt and a lack of flexibility in defining new syntax. In this episode Pablo Galindo and Lysandros Nikolaou explain how, together with Python’s creator Guido van Rossum, they replaced the original parser implementation with one that is more flexible and maintainable, why now was the time to make the change, and how it will influence the future evolution of the language.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
  • Your host as usual is Tobias Macey and today I’m interviewing Pablo Galindo and Lysandros Nikolaou about their work on replacing the parser in CPython and what that means for the language

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by discussing the role of the parser in the lifecycle of a Python program?
  • What were the limitations of the previous parser, and how did that contribute to complexity and technical debt in the CPython runtime?
  • What are the options for styles of parsers, and what are the benefits of using a PEG style grammar?
  • How does the new parser impact the approachability of the CPython code for new contributors?
  • What was the process for reimplementing the parser and guarding against regressions in the syntax?
  • As developers switch to the 3.9 release, what potential edge cases/bugs might they see from introducing the new parser?
  • What new syntax options does this parser provide for the Python language?
    • Are there any specific features that are planned for implementation in the 3.10 release that are enabled by the new parser grammar?
  • As the language evolves due to new capabilities offered by the updated parser, how will that impact other implementations such as PyPy?
  • What were the most interesting, unexpected, or challenging aspects of this project?
  • What other aspects of the CPython code do you think should be reconsidered or reimplemented in light of the changes in computing and the usage of the language?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

October 19, 2020 11:20 PM UTC


RoseHosting Blog

How to Install pip on Ubuntu 20.04

In this article, we will talk about pip, how to install it as well as how to use it on ...


The post How to Install pip on Ubuntu 20.04 appeared first on RoseHosting.

October 19, 2020 05:45 PM UTC


Codementor

13 Reasons Why It’s High Time to Start Learning to Program

Software development is something that is gaining popularity at lightning speed with the development of technology. The demand for regular developers is high compared to most other mainstream professions. But, what are the other reasons for learning to code?

October 19, 2020 04:50 PM UTC


Anarcat

SSH 2FA with Google Authenticator and Yubikey

About a lifetime ago (5 years), I wrote a tutorial on how to configure my Yubikey for OpenPGP signing, SSH authentication and SSH 2FA. In there, I used the libpam-oath PAM plugin for authentication, but it turns out that had too many problems: users couldn't edit their own 2FA tokens and I had to patch it to avoid forcing 2FA on all users. The latter was merged in the Debian package, but never upstream, and the former was never fixed at all. So I started looking at alternatives and found the Google Authenticator libpam plugin. A priori, it's designed to work with phones and the Google Authenticator app, but there's no reason why it shouldn't work with hardware tokens like the Yubikey. Both use the standard HOTP protocol so it should "just work".

After some fiddling, it turns out I was right and you can authenticate with a Yubikey over SSH. Here's that procedure so you don't have to second-guess it yourself.

Installation

On Debian, the PAM module is shipped in the google-authenticator source package:

apt install libpam-google-authenticator

Then you need to add the module in your PAM stack somewhere. Since I only use it for SSH, I added this line on top of /etc/pam.d/sshd:

auth required pam_google_authenticator.so nullok

I also used the no_increment_hotp and debug options while debugging, to avoid having to renew the token all the time and to get more information about failures in the logs.

Then reload ssh (not sure that's actually necessary):

service ssh reload

Creating or replacing tokens

To create a new key, run this command on the server:

google-authenticator -c

This will prompt you for a bunch of questions. To get them all right, I prefer to just call the right ones on the commandline directly:

google-authenticator --counter-based --qr-mode=NONE --rate-limit=1 --rate-time=30 --emergency-codes=1 --window-size=3

Those are actually the defaults, if my memory serves me right, except for the --qr-mode and --emergency-codes (which can't be disabled so I only print one). I disable the QR code display because I won't be using the codes on my phone, but you would obviously keep it if you want to use the app.

Converting to a Yubikey-compatible secret

Unfortunately, the encoding (base32) produced by the google-authenticator command is not compatible with the token expected by the ykpersonalize command used to configure the Yubikey (base16 AKA "hexadecimal", with a fixed 20 bytes length). So you need a way to convert between the two. I wrote a program called oath-convert which basically does this:

read base32
add padding
convert to hex
print

Or, in Python:

import base64
import binascii


def convert_b32_b16(data_b32):
    remainder = len(data_b32) % 8
    if remainder > 0:
        # XXX: assume 6 chars are missing, the actual padding may vary:
        # https://tools.ietf.org/html/rfc3548#section-5
        data_b32 += "======"
    data_b16 = base64.b32decode(data_b32)
    if len(data_b16) < 20:
        # pad to 20 bytes
        data_b16 += b"\x00" * (20 - len(data_b16))
    return binascii.hexlify(data_b16).decode("ascii")

Note that the code assumes a certain token length and will not work correctly for other sizes. To use the program, simply call it with:

head -1 .google_authenticator | oath-convert

Then you paste the output in the prompt:

$ ykpersonalize -1 -o oath-hotp -o append-cr -a
Firmware version 3.4.3 Touch level 1541 Program sequence 2
 HMAC key, 20 bytes (40 characters hex) : [SECRET GOES HERE]

Configuration data to be written to key configuration 1:

fixed: m:
uid: n/a
key: h:[SECRET REDACTED]
acc_code: h:000000000000
OATH IMF: h:0
ticket_flags: APPEND_CR|OATH_HOTP
config_flags: 
extended_flags: 

Commit? (y/n) [n]: y

Note that you must NOT pass the -o oath-hotp8 parameter to the ykpersonalize commandline, which we used to do in the Yubikey howto. That is because Google Authenticator tokens are shorter: it's less secure, but it's an acceptable tradeoff considering the plugin is actually maintained. There's actually a feature request to support 8-digit codes so that limitation might eventually be fixed as well.

Thanks to the Google Authenticator people and Yubikey people for their support in establishing this procedure.

October 19, 2020 03:08 PM UTC


Real Python

Python Booleans: Optimize Your Code With Truth Values

The Python Boolean type is one of Python’s built-in data types. It’s used to represent the truth value of an expression. For example, the expression 1 <= 2 is True, while the expression 0 == 1 is False. Understanding how Python Boolean values behave is important to programming well in Python.

In this tutorial, you’ll learn how to:

  • Manipulate Boolean values with Boolean operators
  • Convert Booleans to other types
  • Convert other types to Python Booleans
  • Use Python Booleans to write efficient and readable Python code

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.

The Python Boolean Type#

The Python Boolean type has only two possible values:

  1. True
  2. False

No other value will have bool as its type. You can check the type of True and False with the built-in type():

>>> type(False)
<class 'bool'>
>>> type(True)
<class 'bool'>

The type() of both False and True is bool.

The type bool is built in, meaning it’s always available in Python and doesn’t need to be imported. However, the name itself isn’t a keyword in the language. While the following is considered bad style, it’s possible to assign to the name bool:

>>> bool
<class 'bool'>
>>> bool = "this is not a type"
>>> bool
'this is not a type'

Although technically possible, to avoid confusion it’s highly recommended that you don’t assign a different value to bool.
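If you do shadow the name by accident, deleting your binding brings the built-in back, since name lookup falls through to the builtins namespace:

```python
bool = "this is not a type"  # shadows the built-in name

del bool                     # removes only our binding

print(bool)     # the built-in is visible again: <class 'bool'>
print(bool(1))  # and works as usual: True
```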

Python Booleans as Keywords#

Built-in names aren’t keywords. As far as the Python language is concerned, they’re regular variables. If you assign to them, then you’ll override the built-in value.

In contrast, the names True and False are not built-ins. They’re keywords. Unlike many other Python keywords, True and False are Python expressions. Since they’re expressions, they can be used wherever other expressions, like 1 + 1, can be used.

It’s possible to assign a Boolean value to variables, but it’s not possible to assign a value to True:

>>> a_true_alias = True
>>> a_true_alias
True
>>> True = 5
  File "<stdin>", line 1
SyntaxError: cannot assign to True

Because True is a keyword, you can’t assign a value to it. The same rule applies to False:

>>> False = 5
  File "<stdin>", line 1
SyntaxError: cannot assign to False

You can’t assign to False because it’s a keyword in Python. In this way, True and False behave like other numeric constants. For example, you can pass 1.5 to functions or assign it to variables. However, it’s impossible to assign a value to 1.5. The statement 1.5 = 5 is not valid Python. Both 1.5 = 5 and False = 5 are invalid Python code and will raise a SyntaxError when parsed.

Python Booleans as Numbers#

Booleans are considered a numeric type in Python. This means they’re numbers for all intents and purposes. In other words, you can apply arithmetic operations to Booleans, and you can also compare them to numbers:

>>>
>>> True == 1
True
>>> False == 0
True
>>> True + (False / True)
1.0

There aren’t many uses for the numerical nature of Boolean values, but there’s one technique you may find helpful. Because True is equal to 1 and False is equal to 0, adding Booleans together is a quick way to count the number of True values. This can come in handy when you need to count the number of items that satisfy a condition.
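For example, since each comparison yields a Boolean that counts as 1 or 0, sum() can count the items that satisfy a condition in one line:

```python
grades = [70, 85, 92, 55, 88]

# Each comparison is True or False; True adds 1 to the sum, False adds 0
passing = sum(grade >= 60 for grade in grades)

print(passing)  # 4
```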

Read the full article at https://realpython.com/python-boolean/ »


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

October 19, 2020 02:00 PM UTC


Stack Abuse

Change Font Size in Matplotlib

Introduction

Matplotlib is one of the most widely used data visualization libraries in Python. Much of Matplotlib's popularity comes from its customization options - you can tweak just about any element from its hierarchy of objects.

In this tutorial, we'll take a look at how to change the font size in Matplotlib.

Change Font Size in Matplotlib

There are a few ways you can go about changing the size of fonts in Matplotlib. You can set the fontsize argument, change how Matplotlib treats fonts in general, or even change the figure size.

Let's first create a simple plot that we'll want to change the size of fonts on:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(0, 10, 0.1)
y = np.sin(x)
z = np.cos(x)

ax.plot(y, color='blue', label='Sine wave')
ax.plot(z, color='black', label='Cosine wave')
fig.suptitle('Sine and cosine waves')
plt.xlabel('Time')
plt.ylabel('Intensity')
leg = ax.legend()

plt.show()

matplotlib plot

Change Font Size using fontsize

Let's try out the simplest option. Every function that deals with text, such as suptitle(), xlabel(), and the other textual functions, accepts a fontsize argument.

Let's revisit the code from before and specify a fontsize for these elements:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(0, 10, 0.1)
y = np.sin(x)
z = np.cos(x)

ax.plot(y, color='blue', label='Sine wave')
ax.plot(z, color='black', label='Cosine wave')
fig.suptitle('Sine and cosine waves', fontsize=20)
plt.xlabel('Time', fontsize=16)
plt.ylabel('Intensity', fontsize=16)
leg = ax.legend()

plt.show()

Here, we've set the fontsize for the suptitle as well as the labels for time and intensity. Running this code yields:

matplotlib fontsize argument

We can also change the size of the font in the legend by adding the prop argument and setting the font size there:

leg = ax.legend(prop={"size":16})

This will change the font size, which in this case also moves the legend to the bottom left so it doesn't overlap with the elements on the top right:

matplotlib change legend font size

However, while we can set each font size like this, if we have many textual elements, and just want a uniform, general size - this approach is repetitive.

In such cases, we can turn to setting the font size globally.

Change Font Size Globally

There are two ways we can set the font size globally. In both cases, we'll set the font.size rc parameter to a new size, which we can access via rcParams['font.size'].

One way is to modify them directly:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(0, 10, 0.1)
y = np.sin(x)
z = np.cos(x)

plt.rcParams['font.size'] = '16'

ax.plot(y, color='blue', label='Sine wave')
ax.plot(z, color='black', label='Cosine wave')
plt.xlabel('Time')
plt.ylabel('Intensity')
fig.suptitle('Sine and cosine waves')
leg = ax.legend()

plt.show()

You have to set these before the plot() call, since no change will be made if you try to apply them afterwards. This approach will change everything that's specified as a font by the font kwargs object.

However, when we run this code, it's obvious that neither the x and y ticks nor the x and y labels changed in size:

matplotlib change font size rc params

Depending on the Matplotlib version you're running, you may not be able to change these with the general rc parameter. Instead, you'd use axes.labelsize for the labels and xtick.labelsize/ytick.labelsize for the ticks, respectively.
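Assuming a Matplotlib version that honors these keys, they can be set in a matplotlibrc file (or assigned through plt.rcParams) along with the general size; the values here are illustrative:

```
# Illustrative matplotlibrc settings for label and tick sizes
font.size       : 16
axes.labelsize  : 16
xtick.labelsize : 14
ytick.labelsize : 14
```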

If setting these doesn't change the size of labels, you can use the set() function passing in a fontsize or use the set_fontsize() function:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(0, 10, 0.1)
y = np.sin(x)
z = np.cos(x)

# Set general font size
plt.rcParams['font.size'] = '16'

# Set tick font size
for label in (ax.get_xticklabels() + ax.get_yticklabels()):
	label.set_fontsize(16)
	
ax.plot(y, color='blue', label='Sine wave')
ax.plot(z, color='black', label='Cosine wave')
plt.xlabel('Time', fontsize=16)
plt.ylabel('Intensity', fontsize=16)

fig.suptitle('Sine and cosine waves')
leg = ax.legend()

plt.show()

This results in:

matplotlib change font size xtick and label

Conclusion

In this tutorial, we've gone over several ways to change the size of fonts in Matplotlib.

If you're interested in Data Visualization and don't know where to start, make sure to check out our book on Data Visualization in Python.

Data Visualization in Python, a book for beginner to intermediate Python developers, will guide you through simple data manipulation with Pandas, cover core plotting libraries like Matplotlib and Seaborn, and show you how to take advantage of declarative and experimental libraries like Altair.

Data Visualization in Python

Understand your data better with visualizations! With over 275+ pages, you'll learn the ins and outs of visualizing data in Python with popular libraries like Matplotlib, Seaborn, Bokeh, and more.

October 19, 2020 01:19 PM UTC


Doug Hellmann

sphinxcontrib-spelling 7.0.0

sphinxcontrib-spelling is a spelling checker for Sphinx-based documentation. It uses PyEnchant to produce a report showing misspelled words. What’s new in 7.0.0?

- Handle ValueError raised by importlib.util.find_spec (contributions by Rust Saiargaliev)
- Remove obsolete comment and guard in setup() (contributions by Jon Dufresne)
- Remove unnecessary UnicodeEncodeError handling (due to Python 3) (contributions by Jon Dufresne)
- Use Python …

October 19, 2020 01:00 PM UTC


Stack Abuse

Python: Slice Notation on List

Introduction

The term slicing in programming usually refers to obtaining a substring, sub-tuple, or sublist from a string, tuple, or list respectively.

Python offers an array of straightforward ways to slice not only these three but any iterable. An iterable is, as the name suggests, any object that can be iterated over.

In this article, we'll go over everything you need to know about Slicing Lists in Python.

Slicing a List in Python

There are a couple of ways to slice a list, most common of which is by using the : operator with the following syntax:

a_list[start:end]
a_list[start:end:step]

The start parameter represents the starting index, end is the ending index, and step is the number of items that are "stepped" over.

If step isn't explicitly given, the default value is 1. Note that the item with the index start will be included in the resulting sublist, but the item with the index end won't be. The first element of a list has the index of 0.

Example without the step parameter:

# A list of strings:
a_list = ['May', 'the', 'Force', 'be', 'with', 'you.']
sublist = a_list[1:3]
print(sublist)

This should print:

['the', 'Force']
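Unlike plain indexing, slicing never raises an IndexError; out-of-range bounds are simply clamped to the ends of the list:

```python
a_list = ['May', 'the', 'Force', 'be', 'with', 'you.']

# An end past the last element is clamped to the list's length
print(a_list[2:100])  # ['Force', 'be', 'with', 'you.']

# A slice entirely out of range is just empty
print(a_list[10:20])  # []
```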

To skip every other word, set step to 2:

a_list = ['The', 'Force', 'will', 'be', 'with', 'you.', 'Always.']
sublist = a_list[1:8:2]
print(sublist)

Output:

['Force', 'be', 'you.']

If start isn't listed, the sublist will start from the beginning of the list. Likewise, if end isn't listed, the sublist will extend to the end of the original list:

a_list = ['Do.', 'Or', 'do', 'not.', 'There', 'is', 'no', 'try.']
sublist = a_list[:4]
print(sublist)
sublist = a_list[4:]
print(sublist)

That snippet of code prints out:

['Do.', 'Or', 'do', 'not.']
['There', 'is', 'no', 'try.']

Finding the Head and Tail of List with Slice Notation

The slice notation can be used with negative indexing as well. Negative indexing works the same way as regular indexing, except for the fact that it starts indexing from the last element which has the index -1.

This can be used to obtain the head and tail of a list of a given length. The head of a list is a sublist that contains the first n elements of a list, and the tail is a sublist that contains the last n elements.

Let's go ahead and separate a tail and head of a list:

# The length of the tail
n = 2
a_list = ['Never', 'tell', 'me', 'the', 'odds!']

# Head of the list:
sublist = a_list[:n]
print(sublist)

# Tail of the list:
sublist = a_list[-n:]
print(sublist)

This outputs:

['Never', 'tell']
['the', 'odds!']
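One caveat worth knowing: since -0 is just 0, the a_list[-n:] tail expression misbehaves when n is 0:

```python
n = 0
a_list = ['Never', 'tell', 'me', 'the', 'odds!']

print(a_list[:n])   # [] -- the empty head, as expected
print(a_list[-n:])  # the WHOLE list, because -0 == 0
```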

Using Slice Notation to Reverse a List

Even the step parameter can be negative. Setting it to a negative value reverses the direction of traversal: instead of stepping forward, we step backwards from the end of the list towards the start, so the resulting sublist comes out reversed:

a_list = ['Power!', 'Unlimited', 'power!']
sublist = a_list[::-1]
print(sublist)

This results in:

['power!', 'Unlimited', 'Power!']
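A negative step can also be combined with explicit bounds. Note that start then has to come after end in the list, and the element at end is still excluded:

```python
a_list = ['Power!', 'Unlimited', 'power!']

# Walk back from index 2, stopping before index 0
print(a_list[2:0:-1])  # ['power!', 'Unlimited']
```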

Replacing Elements of a Sublist with Slice Notation

Slice notation can also be used to assign new values to the elements of a sublist. For example, let's try to replace the head and the tail of a list:

a_list = ['I', 'am', 'no', 'Jedi.']
print(a_list)
# Replacing the head of a list
a_list[:1] = ['You', 'are']
print(a_list)
# Replacing the tail of a list
a_list[-1:] = ['Sith']
print(a_list)

The expected output is:

['I', 'am', 'no', 'Jedi.']
['You', 'are', 'no', 'Jedi.']
['You', 'are', 'no', 'Sith']

Replacing Every n-th Element of a List with Slice Notation

An easy way to replace every n-th element of a list is to set the step parameter to n in the slicing notation:

a_list = ['I’m', 'just', 'a', 'simple', 'man', 'trying', 'to', 'make', 'my', 'way', 'in', 'the', 'universe.']

print(a_list)

# Replace every other word, starting with the word at index 1
a_list[1::2] = ['only', 'common', 'attempting','do', 'best','the']
print(a_list)

This results in:

['I’m', 'just', 'a', 'simple', 'man', 'trying', 'to', 'make', 'my', 'way', 'in', 'the', 'universe.']
['I’m', 'only', 'a', 'common', 'man', 'attempting', 'to', 'do', 'my', 'best', 'in', 'the', 'universe.']

Conclusion

Slicing any sequence in Python is easy, simple, and intuitive. Negative indexing offers an easy way to acquire the first or last few elements of a sequence, or reverse its order.

In this article, we've covered how to apply the Slice Notation on Lists in Python.

October 19, 2020 12:30 PM UTC


Chris Moffitt

Reading Poorly Structured Excel Files with Pandas

Introduction

With pandas it is easy to read Excel files and convert the data into a DataFrame. Unfortunately Excel files in the real world are often poorly constructed. In those cases where the data is scattered across the worksheet, you may need to customize the way you read the data. This article will discuss how to use pandas and openpyxl to read these types of Excel files and cleanly convert the data to a DataFrame suitable for further analysis.

The Problem

The pandas read_excel function does an excellent job of reading Excel worksheets. However, in cases where the data is not a continuous table starting at cell A1, the results may not be what you expect.

If you try to read in this sample spreadsheet using read_excel(src_file):

Excel

You will get something that looks like this:

Excel

These results include a lot of Unnamed columns, header labels within a row as well as several extra columns we don’t need.

Pandas Solutions

The simplest solution for this data set is to use the header and usecols arguments to read_excel(). The usecols parameter, in particular, can be very useful for controlling the columns you would like to include.

If you would like to follow along with these examples, the file is on GitHub.

Here is one alternative approach to read only the data we need.

import pandas as pd
from pathlib import Path
src_file = Path.cwd() /  'shipping_tables.xlsx'

df = pd.read_excel(src_file, header=1, usecols='B:F')

The resulting DataFrame only contains the data we need:

Clean DataFrame

The logic is relatively straightforward. usecols can accept Excel ranges such as B:F and read in only those columns. The header parameter expects a single integer that defines the header row. This value is 0-indexed, so we pass in 1 even though this is row 2 in Excel.

In some instances, we may want to define the columns as a list of numbers. In this example, we could define the list of integers:

df = pd.read_excel(src_file, header=1, usecols=[1,2,3,4,5])

This approach might be useful if you have some sort of numerical pattern you want to follow for a large data set (i.e. every 3rd column or only even numbered columns).
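For instance, a usecols list for every third column could be built with range(). The column count here is made up for illustration, and the read_excel call assumes the same src_file as above:

```python
# 0-based column indices: every 3rd column among the first 15
every_third = list(range(1, 15, 3))
print(every_third)  # [1, 4, 7, 10, 13]

# Hypothetical usage with the file from the earlier examples:
# df = pd.read_excel(src_file, header=1, usecols=every_third)
```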

The pandas usecols can also take a list of column names. This code will create an equivalent DataFrame:

df = pd.read_excel(
    src_file,
    header=1,
    usecols=['item_type', 'order id', 'order date', 'state', 'priority'])

Using a list of named columns is going to be helpful if the column order changes but you know the names will not change.

Finally, usecols can take a callable function. Here’s a simple long-form example that excludes unnamed columns as well as the priority column.

# Define a more complex function:
def column_check(x):
    if 'unnamed' in x.lower():
        return False
    if 'priority' in x.lower():
        return False
    if 'order' in x.lower():
        return True
    return True

df = pd.read_excel(src_file, header=1, usecols=column_check)

The key concept to keep in mind is that the function will parse each column by name and must return a True or False for each column. Those columns that get evaluated to True will be included.

Another approach to using a callable is to include a lambda expression. Here is an example where we want to include only a defined list of columns. We normalize the names by converting them to lower case for comparison purposes.

cols_to_use = ['item_type', 'order id', 'order date', 'state', 'priority']
df = pd.read_excel(src_file,
                   header=1,
                   usecols=lambda x: x.lower() in cols_to_use)

Callable functions give us a lot of flexibility for dealing with the real world messiness of Excel files.

Ranges and Tables

In some cases, the data could be even more obfuscated in Excel. In this example, we have a table called ship_cost that we want to read. If you must work with a file like this, it might be challenging to read in with the pandas options we have discussed so far.

Excel table

In this case, we can use openpyxl directly to parse the file and convert the data into a pandas DataFrame. The fact that the data is in an Excel table can make this process a little easier.

Here’s how to use openpyxl (once it is installed) to read the Excel file:

from openpyxl import load_workbook
import pandas as pd
from pathlib import Path
src_file = Path.cwd() / 'shipping_tables.xlsx'

wb = load_workbook(filename = src_file)

This loads the whole workbook. If we want to see all the sheets:

wb.sheetnames
['sales', 'shipping_rates']

To access the specific sheet:

sheet = wb['shipping_rates']

To see a list of all the named tables:

sheet.tables.keys()
dict_keys(['ship_cost'])

This key corresponds to the name we assigned in Excel to the table. Now we access the table to get the equivalent Excel range:

lookup_table = sheet.tables['ship_cost']
lookup_table.ref
'C8:E16'

This worked. We now know the range of data we want to load. The final step is to convert that range to a pandas DataFrame. Here is a short code snippet to loop through each row and convert to a DataFrame:

# Access the data in the table range
data = sheet[lookup_table.ref]
rows_list = []

# Loop through each row and get the values in the cells
for row in data:
    # Get a list of all columns in each row
    cols = []
    for col in row:
        cols.append(col.value)
    rows_list.append(cols)

# Create a pandas dataframe from the rows_list.
# The first row is the column names
df = pd.DataFrame(data=rows_list[1:], index=None, columns=rows_list[0])

Here is the resulting DataFrame:

Excel shipping table

Now we have the clean table and can use it for further calculations.
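As a side note, the nested loop can be condensed into a comprehension. The values below are simulated stand-ins for the openpyxl cells so the snippet is self-contained; with real cells you would still extract col.value as shown above:

```python
import pandas as pd

# Simulated rows standing in for sheet[lookup_table.ref]
# (made-up values, not from the real file)
data = [
    ("carrier", "zone", "cost"),
    ("UPS", 1, 5.0),
    ("FedEx", 2, 6.5),
]

# One list per row; first row becomes the column names
rows_list = [list(row) for row in data]
df = pd.DataFrame(data=rows_list[1:], columns=rows_list[0])
print(df.shape)  # (2, 3)
```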

Summary

In an ideal world, the data we use would be in a simple consistent format. See this paper for a nice discussion of what good spreadsheet practices look like.

In the examples in this article, you could easily delete rows and columns to make this more well-formatted. However, there are times where this is not feasible or advisable. The good news is that pandas and openpyxl give us all the tools we need to read Excel data - no matter how crazy the spreadsheet gets.

October 19, 2020 12:25 PM UTC


Kushal Das

Update hell due to not updating for a long time

SecureDrop right now runs on Ubuntu Xenial. We are working on moving to Ubuntu Focal. Here is the EPIC on the issue tracker.

While I was creating the Docker development environment on Focal, I noticed our tests were failing with the following message:

Traceback (most recent call last):                                                                                            
  File "/opt/venvs/securedrop-app-code/bin/pytest", line 5, in <module>              
    from pytest import console_main
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/pytest/__init__.py", line 5, in <module>
    from _pytest.assertion import register_assert_rewrite
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/_pytest/assertion/__init__.py", line 8, in <module>
    from _pytest.assertion import rewrite
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/_pytest/assertion/rewrite.py", line 31, in <module>
    from _pytest.assertion import util
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/_pytest/assertion/util.py", line 14, in <module>
    import _pytest._code
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/_pytest/_code/__init__.py", line 2, in <module>
    from .code import Code
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/_pytest/_code/code.py", line 29, in <module>
    import pluggy
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/pluggy/__init__.py", line 16, in <module>
    from .manager import PluginManager, PluginValidationError
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/pluggy/manager.py", line 6, in <module>
    import importlib_metadata
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/importlib_metadata/__init__.py", line 471, in <module>
    __version__ = version(__name__)
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/importlib_metadata/__init__.py", line 438, in version
    return distribution(package).version
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/importlib_metadata/__init__.py", line 411, in distribution
    return Distribution.from_name(package)
  File "/opt/venvs/securedrop-app-code/lib/python3.8/site-packages/importlib_metadata/__init__.py", line 179, in from_name
    dists = resolver(name)
  File "<frozen importlib._bootstrap_external>", line 1382, in find_distributions
  File "/usr/lib/python3.8/importlib/metadata.py", line 466, in find_distributions
    found = cls._search_paths(context.name, context.path)
AttributeError: 'str' object has no attribute 'name'
make: *** [Makefile:238: test-focal] Error 1


It turned out that our pinned pluggy dependency was too old. We update all application dependencies whenever there is a security update, but that is not the case with the development or testing requirements, which only get installed on developers' systems or in CI. Then I figured out that we were using a version of pytest that was three years old. That is why the code refused to run on Python 3.8 on Focal.

The update hell

Now, to update pluggy, I also had to update pytest and pytest-xdist, and that solved the initial issue. But this broke testinfra, which we use in various molecule scenarios, say to test staging or production server configurations, or to test the Debian package builds. As I updated testinfra, molecule also required an update, which broke due to the old version of molecule in our pinned dependencies. To update, I had to modify the molecule.yml and create.yml files for the different scenarios and get molecule-vagrant 0.3. Once I could run the molecule scenarios again, I noticed that our old way of injecting variables into the pytest namespace via the pytest_namespace function no longer works; that hook was removed in newer pytest releases, so I had to fix that as the next step. This whole work is going on in a draft PR, and meanwhile, some new changes were merged along with a new scenario. This means I will be spending more time rebasing properly without breaking these scenarios. Testing each one of them takes time, which frustrates me while fixing them one by one.
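For reference, a sketch of the kind of migration involved. The fixture name and value here are illustrative, not SecureDrop's actual code: the removed pytest_namespace hook returned a dict that pytest exposed as attributes, and a conftest.py fixture is one common replacement.

```python
import pytest

# Old style, removed in newer pytest releases -- tests read the value
# as pytest.securedrop_root (illustrative name, not the real code):
#
# def pytest_namespace():
#     return {"securedrop_root": "/var/www/securedrop"}

# Replacement: expose the same value as a session-scoped fixture in
# conftest.py; tests then request `securedrop_root` as an argument.
SECUREDROP_ROOT = "/var/www/securedrop"

@pytest.fixture(scope="session")
def securedrop_root():
    return SECUREDROP_ROOT
```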

Lesson learned for me

We should look into all of our dependencies regularly and keep them updated. Otherwise, if we get into a similar situation again, someone else will have to cry in a similar fashion :) That said, this is difficult to keep up with in a small team.

October 19, 2020 06:29 AM UTC


Mike Driscoll

PyDev of the Week: Sunita Dwivedi

This week we welcome Sunita Dwivedi as our PyDev of the Week! Sunita works for the DISH Network. She is active with PyDEN, the Denver, CO Python users group as well as PyColorado.

Let’s take some time to learn more about Sunita!

Can you tell us a little about yourself (hobbies, education, etc):

I live by the phrase “A life not tried enough is not lived enough”. I don’t know who said it; maybe I dreamt it. Just kidding.

I love working in IT, rock climbing is my favorite hobby, and before COVID-19 I would host regular dinner parties and cook Indian food. I am an active member of the tech community and a Dev manager at Dish Network.

Why did you start using Python?

My interest in data analytics and data science led me to Python. Being a high-level language, Python was easy to learn. Python requires proper indentation as part of the syntax: if you don’t use indentation correctly, your program won’t work. This makes it readable from the get-go.

Also, Python has a large standard library plus thousands of open-source third-party libraries, which meant that I could develop more with less effort, since many of the tools I needed were ready to be plugged in and used.

What other programming languages do you know and which is your favorite?

I know C, C++, Java, and Scala.

It is such a hard choice to pick a language. I feel each of the languages has its special powers.

Python is for ease of use.

Scala is my favorite for compute heavy problems.

C is the first language I learned; it’s like first love, so I am biased towards it.

Java is just an all-rounder. The level of control one gets when coding is phenomenal.

What projects are you working on now?

The current project I am working on is consolidating customer data for my organization in one place: creating a customer master hub that provides a holistic view of the customer, from expenses to orders to customer experience. I am using Spring Boot, AWS, and some Python for this project.

Which Python libraries are your favorite (core or 3rd party)?

The Dask and PySpark libraries were fun for speeding up data analytics processes.

Pandas is the most basic but such a great foundational library. It forced a lot of other languages to have a similar library.

How did you get into giving Python talks?

I went to a lot of tech meetups when I was getting into Python. Seeing so many people sharing knowledge and taking time from their schedules to help others out was inspirational. Everything I got to learn made me realize the value of meetups. I also wanted to give back to this community that is so welcoming and helpful, so I decided to give Python and data science talks.

Do you have any tips for people who would like to give technical talks?

You don’t have to be an expert in the language or be able to answer every question. As long as you believe what you will share will benefit others, it is worth talking about. Also, when prepping a talk, you learn more yourself.

Is there anything else you’d like to say?

Whenever in doubt, reach out to someone who will support you as well as be a mirror of truth. We all have our moments when we need that little nudge, so make sure you surround yourself with people who will give it to you.

Thanks for doing the interview, Sunita!

The post PyDev of the Week: Sunita Dwivedi appeared first on The Mouse Vs. The Python.

October 19, 2020 05:05 AM UTC

October 18, 2020


Anarcat

CDPATH replacements

After reading this post I figured I might as well bite the bullet and improve on my CDPATH-related setup, especially because it does not work with Emacs. So I looked around for autojump-like alternatives that do.

What I use now

I currently have this in my .shenv (sourced by .bashrc):

export CDPATH=".:~:~/src:~/dist:~/wikis:~/go/src:~/src/tor"

This allows me to quickly jump into projects from my home dir, or the "source code" (~/src), "work" (src/tor), or wiki checkouts (~/wikis) directories. It works well from the shell, but unfortunately it's very static: if I want a new directory, I need to edit my config file, restart shells, etc. It also doesn't work from my text editor.

Shell jumpers

Those are commandline tools that can be used from a shell, generally with built-in shell integration so that a shell alias will find the right directory magically, usually by keeping track of the directories visited with cd.
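the basic trick behind those tools can be sketched in a few lines of shell (an illustration only, not how any of them is actually implemented; real tools hook cd itself and rank entries by "frecency"):

```shell
# log every directory visited, then jump back by substring match
JUMPFILE="${TMPDIR:-/tmp}/dirlog.$$"
: > "$JUMPFILE"

record_cd() {
    cd "$1" || return 1
    pwd >> "$JUMPFILE"
}

j() {
    # most recently visited directory matching the pattern
    match=$(grep -- "$1" "$JUMPFILE" | tail -n 1)
    [ -n "$match" ] && cd "$match"
}
```

so after `record_cd ~/src/tor`, a later `j tor` lands back in that directory without any CDPATH entry.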

Some of those may or may not have integration in Emacs.

autojump

fasd

z

fzf

Emacs plugins not integrated with the shell

Those projects can be used to track files inside a project or find files around directories, but do not offer the equivalent functionality in the shell.

projectile

elpy

bookmarks.el

recentf

references

https://www.emacswiki.org/emacs/LocateFilesAnywhere

October 18, 2020 09:30 PM UTC