Planet Python
Last update: March 18, 2015 04:47 AM
March 17, 2015
Wow, it’s already March 2015 and I haven’t written anything here since last year.
March 9 was a public holiday in Victoria, so it was a long weekend. I received an invitation to join the melb-django hack weekend. Since I didn’t have anything scheduled yet, I decided to join and get ready for my first hack-weekend experience.
The hack weekend lasted two days: Sunday March 8 and Monday March 9. Unfortunately I could only join on Sunday. It would have been great if I could have joined the second day too, because on the first day I spent most of my time setting up the environment.
Since no one had an idea for a Django project/app to hack on, Curtis decided to continue the previous hack project, django-dequorum. Curtis explained what django-dequorum is and which features were unfinished. django-dequorum is a simple forum application for Django: users can post threads and other users can comment on them. To make it easier to see what needed to be done, Curtis put the tasks on the GitHub issue tracker. We agreed to use a fork-based git workflow to hack on django-dequorum: everyone would fork the repository and contribute via pull requests.
Suddenly it was 12:30, so we decided to have lunch before hacking on django-dequorum. We walked to Lentil, a unique canteen in Abbotsford that serves a vegetarian menu. They didn’t charge based on how much we ate; they put up a poster explaining how they run the canteen and what it costs, and we put our money in a box.

After lunch, Nicole took the initiative on the look and feel and drew the layout.

I didn’t join the layout discussion because I was still struggling with the environment setup. At first I set up my environment with Python 2.7.x, but later I found out that the project uses Python 3, so I had to re-create my virtualenv.
Then I was confused: I didn’t want git to track changes in the Django project directory, but I did want it to track changes in the django-dequorum app. My first attempt at a symbolic link failed; I finally got it working with a symbolic link to the app’s folder.
Another problem arose after we forked the GitHub repo: how do we keep our forks up to date with funkybob’s repository? I had run into this before, so I already knew the answer and helped my friends add the upstream repository as a second git remote.
I still haven’t made any commits to this open source project, but hopefully I can contribute in the future. It was a nice hack weekend with friendly new friends.
March 17, 2015 09:47 PM
We’re looking for proposals on every aspect of Python: programming from novice to advanced levels, applications and frameworks, or how you have been involved in introducing Python into your organization.
EuroPython is a community conference and we are eager to hear about your experience.
Please also forward this Call for Proposals to anyone that you feel may be interested.
Submissions will be open from Monday, March 16, until Tuesday, April 14.
Presenting at EuroPython
We will accept a broad range of presentations, from reports on academic and commercial projects to tutorials and case studies. As long as the presentation is interesting and potentially useful to the Python community, it will be considered for inclusion in the program.
Can you show something new and useful? Can you show the attendees how to use a module, explore a Python language feature, or package an application? If so, please consider submitting a talk.
First time speakers are especially welcome.
There are four different kinds of contributions that you can present at EuroPython:
- Regular talks / 170 slots. These are standard “talks with slides”, allocated in slots of 30 minutes (80 slots), 45 minutes (85 slots), or 60 minutes (5 slots), depending on your preference and scheduling constraints. A Q&A session is held at the end of the talk and included in the time slot.
- Hands-on trainings / 20 slots. These are advanced training sessions that dive into the subject in full detail. Sessions are 2.5 to 3 hours long. Attendees will be encouraged to bring a laptop, so come prepared with fewer slides and more source code. The two training rooms hold 70 and 180 seats.
- Posters / 25 slots. Posters are a graphical way to describe a project or a technology, printed in large format. They are exhibited at the conference, can be read at any time by participants, and can be discussed face to face with their authors during the poster session.
- Helpdesks / 5 slots. Helpdesks are a great way to share your experience with a technology by helping people answer their questions and solve their practical problems. You can run a helpdesk by yourself or with colleagues and friends. People looking for help sign up for a 30-minute slot and come talk to you. No specific preparation is needed; you just need to be proficient in the technology your helpdesk covers.
Discounts for speakers and trainers
Since EuroPython is a not-for-profit community conference, it is not possible to pay out rewards for talks or trainings. Speakers of regular talks will instead receive a special 25% discount on the conference ticket; trainers get a 100% discount to compensate for the longer preparation time. Please note that we cannot give discounts to submitters of posters or helpdesk proposals.
Topics and Goals
Suggested topics for EuroPython presentations include, but are not limited to:
- Core Python
- Alternative Python implementations: e.g. Jython, IronPython, PyPy, and Stackless
- Python libraries and extensions
- Python 2 to 3 migration
- Databases
- Documentation
- GUI Programming
- Game Programming
- Network Programming
- Open Source Python projects
- Packaging Issues
- Programming Tools
- Project Best Practices
- Embedding and Extending
- Education, Science and Math
- Web-based Systems
Presentation goals are usually some of the following:
- Introduce the audience to a new topic
- Introduce the audience to new developments on a well-known topic
- Show the audience real-world usage scenarios for a specific topic (case study)
- Dig into advanced and relatively unknown details on a topic
- Compare different solutions available on the market for a topic
Language for Talks & Trainings
Talks and training should, in general, be held in English.
However, since EuroPython is hosted in Bilbao and EuroPython has traditionally always been very open to the local Python communities, we are also accepting a number of talks and trainings in Spanish and Basque.
The talk submission form lets you choose the language you want to give the talk in.
If you speak Basque/Spanish and don’t feel comfortable speaking English, please submit the talk title and abstract directly in Spanish/Basque. If you are able to give the talk in multiple languages, please submit one proposal for the talk in each language, with title and description adjusted accordingly.
Inappropriate Language and Imagery
Please consider that EuroPython is a conference with an audience from a broad geographical area which spans countries and regions with vastly different cultures. What might be considered a “funny, inoffensive joke” in a region might be really offensive (if not even unlawful) in another. If you want to add humor, references and images to your talk, avoid any choice that might be offensive to a group which is different from yours, and pay attention to our EuroPython Code of Conduct.
Community Based Talk Voting
Attendees who have bought a ticket in time for the Talk Voting period gain the right to vote for talks submitted during the Call For Proposals.
The Program WG will also set aside a number of slots which they will then select based on other criteria to e.g. increase diversity or give a chance to less mainstream topics.
Release agreement for submissions
All submissions will be made public during the community talk voting, to allow all registrants to discuss the proposals. After finalizing the schedule, talks that are not accepted will be removed from the public website. Accepted submissions will stay online for the foreseeable future.
We also ask all speakers to:
- accept the video recording of their presentation
- upload their talk materials to the EuroPython website
- accept the EuroPython Speaker Release Agreement, which allows the EPS to make the talk recordings and uploaded materials available under a CC BY-NC-SA license
Talk slides will be made available on the EuroPython web site. Talk video recordings will be uploaded to the EuroPython YouTube channel and archived on archive.org.
For more privacy related information, please consult our privacy policy.
Contact
For further questions, feel free to contact our helpdesk at helpdesk@europython.eu
March 17, 2015 03:18 PM
How Things Will Proceed
Hi there! I've been busy since the last post. I've been thinking mainly about the following areas:
(Image source: Wikipedia article on placebo drugs; license: public domain)
- What differentiates this blog for my readers?
- What is the best way, for me, of developing my knowledge and mastery of these techniques?
- What pathway is also going to work for people who are either reading casually, or interested in working through problems at a similar pace?
- Preparing examples and potential future blog posts...
I think I have zeroed in on something that is workable. I believe in an integrative approach to learning -- namely, that incorporating information from multiple disparate areas yields insights that aren't possible when considering only a niche viewpoint. At the same time, I also believe it's essentially impossible to learn effectively from a ground-up, broad-base theoretical presentation of concepts. The path to broad knowledge is to start somewhere accessible, and then fold in additional elements from other areas.
I will, therefore, start where I already am: applying machine learning for categorisation of images. At some point, other areas will be examined, such as language processing, game playing, search and prediction. However, for now, I'm going to "focus". That's in inverted commas (quotes) because it's still an incredibly broad area for study.
The starting point for most machine learning exercises is the data, so I'm going to describe the data sets you'll need to follow along. All of these should be readily downloadable, although some are very large. I would consider purchasing a dedicated external drive for this: disk space requirements may reach several hundred gigabytes, particularly if you want to store your intermediate results.
The data sets you will want are:
- The MNIST database. It's included in this code repository, which we will also refer to later when looking at deep learning / neural networks: https://github.com/mnielsen/neural-networks-and-deep-learning
- The Kaggle "National Data Science Bowl" dataset: http://www.kaggle.com/c/datasciencebowl
- The Kaggle "Diabetic Retinopathy" dataset: http://www.kaggle.com/c/diabetic-retinopathy-detection
- Maybe also try a custom image-based data set of your own choosing. It's important to pick something which isn't already covered by existing tutorials, so that you are effectively forced into experimenting with alternative techniques, but which can still be considered a categorisation problem, so that similar approaches should be effective. You don't need to do this, but it's a fun idea. You could use an export of your photo album, the results of Google image searches, or another dataset you create yourself. Put each class of images into its own subdirectory on disk.
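For that last option, here is a minimal sketch of how you might index a data set laid out one-class-per-subdirectory (the directory layout and file extensions are just illustrative assumptions, not a required convention):

```python
import os

def index_image_dataset(root):
    """Walk a tree laid out as root/<class_name>/<image files> and
    return a list of (filepath, class_name) pairs, where the label
    for each image is simply the name of its parent directory."""
    samples = []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(('.jpg', '.jpeg', '.png')):
                samples.append((os.path.join(class_dir, fname), class_name))
    return samples
```

Because labels come straight from directory names, adding a new class is just a matter of adding a folder of images.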
For downloading data, I recommend Firefox over Chrome, since it is much more capable at resuming interrupted downloads. Many of these files are large, and you may genuinely have trouble. Pay attention to your internet plan's download limits if you have only a basic plan.
The next post will cover the technology setup I am using, including my choice of programming language, libraries and hardware. Experienced Python developers will be able to go through this very fast, but modern hardware does have limitations when applying machine learning algorithms, and it is useful to understand what those are at the outset.
Following that will be the first in a series of practical exercises aimed at building a basic ability to deploy common algorithms on image-based problems. We will start by applying multiple approaches to the MNIST dataset, which is the easiest starting point: the processing requirements are relatively low, as are the data volumes, and tutorials for solving this problem already exist online. That makes it a particularly useful place to start, since it gives you ready-made benchmarks for comparison, and also allows easy cross-comparison of techniques.
I'd really like it if readers could reply with their own experiences along the way. Try downloading the data sets -- let me know how you go! I'll help if I can. I expect that things will get more interesting when we come to sharing the experimental results.
Happy coding,
-Tennessee
March 17, 2015 01:38 PM
Motivation
A few months ago we posted a similar article that presented a way to implement real-time notifications on Django using Node.js, socket.io and Redis. It got quite a few comments asking us why we used Node.js instead of a gevent-based solution. Our response was that we had hands-on experience with the Node.js solution, and that we would try to write another article about a gevent-based solution in the future. This is that article.
This time we’ll be replacing Node.js with a 100% Python implementation using gevent-socketio, and Redis with RabbitMQ. We also didn’t want to bore you with the same vanilla notifications site, so we’re going to build something different. Something useful.
This time we’re going to build a complete GeoDjango-based site to report geo-located incidents in real-time using Google Maps.
The Application
The application is a Django 1.7 site that uses GeoDjango (backed by PostGIS) to track and report in real-time geo-located incidents that occur in certain areas of interest around the world. It provides views to manage incidents and areas of interest, a view to monitor the occurrence of incidents in real-time and a view to report incidents that uses geolocator to detect the user’s location.
Whenever an incident is saved (or updated), a message is sent to a RabbitMQ broadcast queue. At this time, the system checks whether the incident occurred in an area of interest, and a special alert message is sent if necessary. Any subscriber to the queue (subscribers are created when a client connects to the notifications socket.io namespace) will receive the message and send a packet down the socket’s channel. It is up to the client’s JavaScript code to update the maps and generate notifications and alerts if necessary.
Although simple, the site has all the basic functionality and can be used as a basis for similar projects. The source is available on GitHub.
The model
To represent the incidents and the areas of interest, we’re going to use the following model:
from django.contrib.gis.db import models


class Incident(models.Model):
    objects = models.GeoManager()

    URGENT = 'UR'
    HIGH = 'HI'
    MEDIUM = 'ME'
    LOW = 'LO'
    INFO = 'IN'
    SEVERITY_CHOICES = (
        (URGENT, 'Urgent'),
        (HIGH, 'High'),
        (MEDIUM, 'Medium'),
        (LOW, 'Low'),
        (INFO, 'Info'),
    )

    name = models.CharField(max_length=150)
    description = models.TextField(max_length=1000)
    severity = models.CharField(max_length=2, choices=SEVERITY_CHOICES, default=MEDIUM)
    closed = models.BooleanField(default=False)
    location = models.PointField()
    created = models.DateTimeField(editable=False, auto_now_add=True)


class AreaOfInterest(models.Model):
    objects = models.GeoManager()

    name = models.CharField(max_length=150)
    severity = models.CharField(max_length=2, choices=Incident.SEVERITY_CHOICES, default=Incident.MEDIUM)
    polygon = models.PolygonField()
The Incident class represents the occurrence of an event around a specific geographic point, specified by the location field. The AreaOfInterest is used to define a region for which the user is going to be alerted if an incident is reported within it. The polygon field specifies the geographic area.
The alerts are going to be sent only when the incident’s severity is above the area’s target severity, and location is within the area’s polygon. This is done using spatial QuerySets within the Incident post_save signal handler:
areas_of_interest = [
    area_of_interest.geojson_feature
    for area_of_interest in AreaOfInterest.objects.filter(
        polygon__contains=kwargs['instance'].location,
        severity__in=kwargs['instance'].alert_severities,
    )
]
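The `alert_severities` property itself is not shown in the post. Assuming it returns every severity code ranked at or below the incident's own (so that an area's threshold severity matches any incident at least that severe), its logic could be sketched in plain Python like this; the ordering and the behaviour are inferred from the filter above, not taken from the project's source:

```python
# Severity codes ordered from most to least severe, matching the
# SEVERITY_CHOICES on the Incident model.
SEVERITY_ORDER = ['UR', 'HI', 'ME', 'LO', 'IN']

def alert_severities(incident_severity):
    """Return the severity codes an incident should trigger alerts for:
    its own level plus every less severe one. An 'UR' (urgent) incident
    alerts areas with any threshold, while an 'IN' (info) incident only
    alerts areas whose threshold is 'IN'."""
    rank = SEVERITY_ORDER.index(incident_severity)
    return SEVERITY_ORDER[rank:]
```

With this, `severity__in=instance.alert_severities` selects exactly the areas whose target severity the incident meets or exceeds.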
Sending notifications
Once a notification has been constructed, we connect to a RabbitMQ broadcast queue (using Kombu) and publish the notification:
def send_notification(notification):
    with BrokerConnection(settings.AMPQ_URL) as connection:
        with producers[connection].acquire(block=True) as producer:
            maybe_declare(notifications_exchange, producer.channel)
            producer.publish(
                notification,
                exchange='notifications',
                routing_key='notifications'
            )
When a user accesses the site’s home view, it connects to a socket.io namespace:
from django.conf import settings
from kombu import BrokerConnection
from kombu.mixins import ConsumerMixin
from socketio.namespace import BaseNamespace
from socketio.sdjango import namespace

from .queues import notifications_queue


@namespace('/notifications')
class NotificationsNamespace(BaseNamespace):
    def __init__(self, *args, **kwargs):
        super(NotificationsNamespace, self).__init__(*args, **kwargs)

    def get_initial_acl(self):
        return ['recv_connect']

    def recv_connect(self):
        if self.request.user.is_authenticated():
            self.lift_acl_restrictions()
            self.spawn(self._dispatch)
        else:
            self.disconnect(silent=True)

    def _dispatch(self):
        with BrokerConnection(settings.AMPQ_URL) as connection:
            NotificationsConsumer(connection, self.socket, self.ns_name).run()
When a connection is established (and authentication is verified), a new greenlet is spawned, passing the control to a NotificationConsumer instance:
class NotificationsConsumer(ConsumerMixin):
    def __init__(self, connection, socket, ns_name):
        self.connection = connection
        self.socket = socket
        self.ns_name = ns_name

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=[notifications_queue], callbacks=[self.process_notification])]

    def process_notification(self, body, message):
        self.socket.send_packet(dict(
            type='event',
            name='notification',
            args=(body,),
            endpoint=self.ns_name
        ))
        message.ack()
Each message sent to the broadcast queue is handled by the process_notification callback, which sends a new packet down the socket’s channel with the body of the notification object.
The image above is a screenshot of the site’s home page while an alert is being shown to the user. The client side of the communication is quite simple:
var socket = io.connect(
    "/notifications",
    {
        "reconnectionDelay": 5000,
        "timeout": 10000,
        "resource": "socket.io"
    }
);

socket.on('connect', function(){
    console.log('connect', socket);
});
The client connects to the socket, and hooks the appropriate callbacks. Socket.io hides away the complexities of choosing a transport layer and handling retries and reconnects. The notification callback handles most of the client’s logic.
socket.on('notification', function(notification){
    console.log('notification', notification);
    if (notification.type === "post_save") {
        if (notification.created) {
            map.data.addGeoJson(notification.feature);
        } else {
            var feature = map.data.getFeatureById(notification.feature.id);
            map.data.remove(feature);
            if (! notification.feature.properties.closed) {
                map.data.addGeoJson(notification.feature);
            }
        }
    } else if (notification.type === "post_delete") {
        var feature = map.data.getFeatureById(notification.feature.id);
        map.data.remove(feature);
    } else if (notification.type === "alert") {
        showAlert(buildAlertModalBodyHtml(notification));
    } else {
        console.log(notification);
    }
});

socket.on('disconnect', function(){
    console.log('disconnect', socket);
});
Upon receiving a notification, we use the Google Maps Data Layer API to draw onto the map. Notice that all we had to do was hand the GeoJSON representation of the objects (which is generated by GeoDjango) to the map, and the rest is taken care of for us.
Managing events and areas of interest
The site provides views to manage incidents and areas of interest that use the Google Maps JavaScript API to manipulate the objects graphically within maps. The GeoJSON format is supported by both GeoDjango and Google Maps, so we use it as the exchange format in the forms.
We also provide a simple incident report view (depicted above) that uses the geolocator JavaScript library to detect the user’s location.
Conclusions
Although the exercise proved to be really interesting (especially the spatial features), we really didn’t find any significant advantages over our previous solution based on Node.js. In fact, we had to tackle several complications related to the restrictions that gevent places on which packages can be used. First of all, we had to make sure that the libraries running inside greenlets were either gevent-specific or monkey-patching compatible (kombu is). We also had problems running the site under Gunicorn, so we had to switch to Chaussette. There was also the matter of gevent-socketio only supporting version 0.9 of the socket.io protocol (hence the bower dependency that points to the 0.9 branch of the client repo).
We hope that you find the information presented in this post useful. As usual, feel free to leave comments or suggestions on how to improve the solution.
Acknowledgements
The bulk of the notifications architecture is based on the solution presented by Jeremy West in his blog post Django, Gevent, and Socket.io. His tutorial is a great way to understand how gevent-socketio works and how to integrate it into Django.
March 17, 2015 01:33 PM
As we previously wrote, signup for our free Sponsor Workshops is open and the schedule has now been completed! While registration isn't required, it helps us plan for room sizes and for drinks and snacks, so head to Eventbrite and choose as many as you want!
Wednesday morning gets under way at 9 AM with a team from Elastic taking attendees through the popular Elasticsearch distributed search engine. Honza Král will introduce the various Python clients for working with Elasticsearch, and will be joined by Logstash developer Pier-Hughes and Peter from their solutions engineering team. The full description is available at https://us.pycon.org/2015/schedule/presentation/475/.
The 3:30 PM Wednesday slot features Mark Lavin, Caleb Smith, and David Ray of Caktus Group taking the stage to share their knowledge of RapidSMS and Django. We previously wrote about how they've used SMS while building a voter registration system in Libya, so come see first hand how they do it. The talk is beginner friendly so bring a laptop to check out the code and follow along.
The last slot on Thursday, running from 3:30 to 5:00, will be a trio of talks from Google. Brian Dorsey will be on hand to show how Kubernetes can scale up your usage of Docker, complete with a live demo (he gives great demos btw). The second talk will be on CoLaboratory by Jeff Snyder, covering the project, its integration with Google Drive, and further integrations with IPython and now the Jupyter project. Finally, Alex Perry will cover the use of Python decorators within monitoring pipelines to deliver positive value with minimal impact.
Be sure to sign up today!
March 17, 2015 12:48 PM
Another project night where we will focus on our HAB project: sending a technical payload into space, and back, as part of the 2015 Global Space Balloon Challenge (http://balloonchallenge.org/). The payload will pay homage to the first NASA balloon flights in 1969, designed to take large-area photographs of the earth from a very high altitude. It will include a computer with an operating system, many Python scripts, and various hardware including sensors, transmitters and other tech gear.
The monthly project nights until April will focus on building a high altitude balloon to send into near space. There is something for everyone to do, from art, to programming, to mechanical and electrical engineering, to finding stuff, reading regulations, making recovery plans, buying stuff, coming up with a team name, deciding what experiments should be included in the payload, etc. Don't wait for a direct invitation, sign up on our meetup group:
http://www.meetup.com/PYthon-Piedmont-Triad-User-Group-PYPTUG/events/220377127/
This meeting will be on Wednesday, Mar. 18 at 6pm in the Dash room at Inmar:
635 Vine St, Room 1130H "Dash", Winston-Salem, NC. This will be at the Inmar building in downtown Winston-Salem.
Some preliminary work has already started and discussion is ongoing on the PYPTUG mailing list:
https://groups.google.com/forum/#!forum/pyptug
Look for the Near Space Technical Payload Official Thread (should be at the top).
Keep an eye on this site for progress reports. At launch, you will be able to track the actual balloon through a web page.
Note: this is tomorrow, Wednesday the 18th. Come by and learn how to network multiple Raspberry Pi Model A+ boards without Ethernet...
March 17, 2015 12:20 PM
Yesterday Cerberus 0.8.1 was released with a few little fixes, one of them being more of a new feature than a fix: sub-document fields can now be set as field dependencies by using a ‘dotted’ notation. So, suppose we set the following validation schema: schema = { 'test_field': { 'dependencies': [ 'a_dict.foo', 'a_dict.bar' ] }, […]
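To illustrate the idea behind the dotted notation, here is a plain-Python sketch of the semantics (this mimics what a dotted dependency means, it is not Cerberus's actual implementation): a dependency like 'a_dict.foo' is satisfied only if each key on the path exists in the nested document being validated.

```python
def dotted_lookup(document, path):
    """Resolve a dotted dependency path like 'a_dict.foo' against a
    nested dict. Return (True, value) if every key on the path exists,
    and (False, None) otherwise."""
    current = document
    for key in path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return (False, None)
        current = current[key]
    return (True, current)
```

With the schema above, a document like {'test_field': 1, 'a_dict': {'foo': 'x', 'bar': 'y'}} satisfies both dependencies, while one missing 'a_dict.bar' would fail validation.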
March 17, 2015 09:16 AM
Morning all!
XFS -> ext4
So the reason for our extra-long maintenance window this morning was primarily a migration from XFS to ext4 as the filesystem for user storage. We'll write more about the whys and wherefores of this later, but the short version is that the main reason for using XFS, project quotas, was no longer needed, and a bug in the version of XFS supported by Ubuntu LTS left us vulnerable to long periods of downtime after unplanned reboots, while XFS ran some unnecessary quotachecks. The switch to ext4 removes that risk, and has simplified some of our code too. Bonus!
In other news, we've managed to squeeze in a few more user-visible improvements :)
Features bump for paid plans
We've decided to tweak the pricing and accounts pages so that all plans are customisable. As a bonus side-effect, we've slightly improved all the existing paid plans, so our beloved customers are going to get some free stuff:
- All Hacker plans now allow you to replace your .pythonanywhere.com domain with a custom one
- We've bumped the disk space for Hacker plans from 512MB to 2GB
- And we've bumped the Web Developer CPU quota from 3000 to 4000 seconds
Package installs
bottlenose, python-amazon-simple-product-api, py-bcrypt, Flask-Bcrypt, flask-restful, markdown (for Python 3), wheezy.template, pydub, and simpy (for Python 3) are now part of our standard "batteries included".
Pip wheels available
We've rewritten our server build scripts to use wheels, and to build a wheel for each package we install. We've made them available (at /usr/share/pip-wheels) and added that location to the PythonAnywhere default pip config. So, if you're installing things into a virtualenv and we already happen to have a wheel for the package you want, pip will find it and the install will complete much faster.
Python 3 is now the default for save + run
The "Save and Run" button at the top of the editor, much beloved of teachers and beginners (and highly relevant for our education beta) now defaults to Python 3. It's 2015, this is the future after all. We didn't want to break things for existing users, so they will still have 2 as the default, but we can change that for you if you want. Just drop us a line to support@pythonanywhere.com
Security and performance improvements
Other than that, we've added a few minor security and performance tweaks.
Onwards and upwards!
March 17, 2015 07:42 AM
Several recent blog posts have focused on Python-related and PSF-funded activities in Africa and the Middle East. But the Python community is truly global, and it has been exciting to witness its continued growth. New groups of people are being introduced to Python and to programming so frequently that it’s difficult to keep up with the news. Not only that, but the scope and lasting impact of work being accomplished by Pythonistas with very modest financial assistance from the PSF is astonishing.
One example is the recent work in South America by Manuel Kaufmann. Manuel’s project is to promote the use of Python “to solve daily issues for common users.” His choice of Python as the best language to achieve this end is due to his commitment to “the Software Libre philosophy,” in particular, collaboration rather than competition, as well as Python's ability “to develop powerful and complex software in an easy way.”
Toward this end, one year ago, Manuel began his own project, spending his own money and giving his own time, traveling to various South American cities by car (again, his own), organizing meet-ups, tutorials, sprints, and other events to spread the word about Python and its potential to solve everyday problems (see Argentina en Python).
This definitely got the PSF's attention, so in January 2015, the PSF awarded him a $3,000 (USD) grant. With this award, Manuel has been able to continue his work, conducting events that have established new groups that are currently expanding further. This ripple effect of a small investment is something that the PSF has seen over and over again.
On January 17, Resistencia, Argentina was the setting for its first-ever Python Sprint. It was a fairly low-key affair, held at a pub/restaurant “with good internet access.” There were approximately 20 attendees (including 4 young women), who were for the most part beginners. After a general introduction, they broke into 2 work groups, with Manuel leading the beginners' group (see Resistencia, Chaco Sprint), guiding them through some introductory materials and tutorials (e.g., Learning Python from PyAr's wiki).

Foto grupal con todos los asistentes (group photo of all attendees).
Photo credit: Manuel Kaufmann
As can happen, momentum built, and the Sprint was followed by a Meet-up on January 30 to consolidate gains and to begin to build a local community. The Meet-up's group of 15 spent the time exploring the capabilities of Python, Brython, Javascript, Django, PHP, OpenStreetMap, and more, in relation to needed projects, and a new Python community was born (see Meetup at Resistencia, Chaco).
The next event in Argentina, the province of Formosa's first official Python gathering, was held on February 14. According to Manuel, it was a great success, attended by around 50 people. The day was structured to have more time for free discussion, which allowed for more interaction and exchange of ideas. In Manuel’s opinion, this structure really helped to forge and strengthen the community. The explicit focus on real world applications, with discussion of a Python/Django software application developed for and currently in use at Formosa’s Tourist Information Office, was especially compelling and of great interest to the attendees. See PyDay Formosa, and for pictures, see PyDay Pics.
It looks as though these successes are just the beginning: Manuel has many more events scheduled. You can learn more and follow Manuel’s project at the links provided and on Twitter. And stay tuned to this blog, because I plan to cover more of his exciting journey to bring Python, open source, and coding empowerment to many more South Americans.
I would love to hear from readers. Please send feedback, comments, or blog ideas to me at msushi@gnosis.cx.
March 17, 2015 12:51 AM
March 16, 2015
The Unofficial Python Job Board is a 100% free, community-run job board.
To add a job vancancy/posting simply send a pull request ( yes you read that correctly ) to Python Job Repository
Submitting a Job Vacancy/Posting/Ad
The jobs board is generated automatically from the git repository, and hosted using github pages.
All adverts are held as Markdown files, with an added header, under the jobs/ directory. Job files should look something like this:
---
title: <Job Advert Title (required)>
company: <Your Company (required)>
url: <Link to your site/a job spec (optional)>
location: <Where is the job based?>
contract: permanent (or contract/temporary/part-time ...)
contact:
    name: <Your name (required)>
    email: <Email address applicants should submit to (required)>
    phone: <Phone number (optional)>
    ...: ...
created: !!timestamp '2015-02-20' # The date the job was submitted
tags:
    - london
    - python
    - sql
---
Full job description here, in Markdown format
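For instance, a filled-in advert might look like the following (every value below is invented purely for illustration):

```markdown
---
title: Senior Python Developer
company: Example Corp
url: http://example.com/jobs/senior-python
location: London, UK
contract: permanent
contact:
    name: Jane Smith
    email: jobs@example.com
created: !!timestamp '2015-03-01'
tags:
    - london
    - python
    - django
---
We are looking for a senior Python developer to join our web team.
You will work on Django applications backed by PostgreSQL...
```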
To add your job, submit a pull request to this repo that adds a single file to the jobs/ directory. This file should match the example above.
Each pull request is validated automatically and then reviewed manually before being added to the site. If the pull request fails the validation testing (run on Travis), you must fix it before the pull request can proceed.
Previewing your submission
To preview your submission before creating a pull request, there are a number of steps to follow:
- Install hyde (hyde.github.io): <code>pip install hyde</code>
- Install fin: <code>pip install fin</code>
- Clone/check out the https://github.com/pythonjobs/template repository
- Within this clone, put your new file in <code>hyde/content/jobs/[job_filename].html</code>
- Delete the contents of the <code>deploy</code> directory
- From within <code>hyde/</code>, run <code>hyde serve</code>
- Open a web browser and navigate to http://localhost:8080/
March 16, 2015 01:39 PM
This week, we welcome Eli Bendersky (@elibendersky) as our PyDev of the Week. I have enjoyed reading his blog over the years as he writes some pretty interesting articles on Python. You can see some of the projects he works on over at GitHub. Let’s spend a few minutes getting to know our fellow Pythoneer!
Can you tell us a little about yourself (hobbies, education, etc):
I hold a B.Sc in Electrical Engineering, and have been employed in both hardware and software engineering positions over the years. In the past few years I mostly gravitated towards system programming, infrastructure and tooling – working on things like compilers, debuggers and other low-level stuff.
As for hobbies, I guess kids count? That’s definitely what takes most of my off-work time nowadays.
Other than family, I occasionally manage to carve out some free time for reading, exercising and self-education on topics ranging from programming and math to biology. I use my blog (http://eli.thegreenplace.net) as an outlet to document things I learned that I found most interesting.
Why did you start using Python?
I’m fairly late to the game, actually. In addition to C and C++, my main work hammer was Perl for the first part of my career. Eventually I became disillusioned with it, and after a brief fling with Ruby, ended up in Python-land in 2008. I haven’t looked back since. I’ll be forever thankful to Python for igniting my love for programming and open-source on a whole new level. Switching to Python turned out to be a great decision, given how much momentum the language has gained since 2008.
What other programming languages do you know and which is your favorite?
So I mentioned C and C++. C++ has been mostly paying the bills for me in the last few years, but Python is always in the picture. Other than that, I used to know Perl pretty well, dabbled with Ruby, Common Lisp and Scheme. I can find my way around Javascript most days. A bunch of assembly languages (from the more standard x86 to esoteric things like various DSPs and microcontrollers). Over the years I’ve written bits and pieces of code in Ada, Java and Matlab. There’s also the list of “languages to look at in the future”, right now it includes things like Go and Erlang. My favorite is Python, though. It’s always the first tool I reach for.
What projects are you working on now?
At work I’m hacking on all kinds of internal stuff I can’t talk much about, but some of it percolates upstream into the LLVM (compiler infrastructure) and Clang (C++ front-end) open-source projects. I have a bunch of small open-source projects I have authored and now maintain (mostly Python packages) – my Github account (https://github.com/eliben) has all the details. As a core Python developer my activity comes in short and rare bursts, unfortunately. And there’s my blog, which is a kind of an ongoing project, I guess.
Which Python libraries are your favorite (core or 3rd party)?
I’m a big believer in keeping dependencies small, and developing the core parts of a project yourself. Therefore, my favorite Python libraries are the included batteries – the stdlib. Often folks look for 3rd party libraries for things that are sufficiently served by stdlib modules. I think there’s great value in sticking to the core as much as possible, because it’s part of a common language all Python programmers speak.
The Python ecosystem is powerful, though, and some 3rd party “libraries” are complete frameworks with a bunch of subsystems of their own. I really like using Django for web apps, for example – I’d see no reason to develop a web framework of my own, or to use any of the infinitude of “micro-frameworks” popping up. Django is so well entrenched, it’s a common idiom and language many programmers understand – it has an ecosystem of its own. Another example is the excellent scientific stack Python has – Numpy, Scipy, matplotlib, etc. I’m really excited about the central place Python is taking in the world of “big data” thanks to these technologies.
Is there anything else you’d like to say?
Python presents an interesting challenge to programmers. Used correctly, it’s easy to write extremely readable and maintainable Python code. This is an almost unique quality of Python among the modern programming languages.
But the language is powerful, and with some creativity you can create unintelligible monstrosities that are definitely clever, but not very collaboration-friendly. Stick to the simple things as much as possible; if you find you really need to use some metaclass magic or something similarly advanced, encapsulate it well and hide it from most of the code. And don’t forget to document it very well. So stick to the KISS principle, basically.
Thanks so much!
The Last 10 PyDevs of the Week
March 16, 2015 12:30 PM
Caktus has been involved in quite a few projects (Libyan voter registration, UNICEF Project Mwana, and several others) that include text messaging (a.k.a. Short Message Service, or SMS), and we always use RapidSMS as one of our tools. We've also invested our own resources in supporting and extending RapidSMS.
There are other options; why do we consistently choose RapidSMS?
What is RapidSMS
First, what is RapidSMS? It's an open source package of useful tools that extend the Django web development framework to support processing text messages. It includes:
- A framework for writing code that is invoked when a text message is received, and that responds to it
- A set of backends - pluggable code modules that can interface to various ways of connecting your Django program to the phone network to pass text messages back and forth
- Sample applications
- Documentation
The backends are required because unlike email, there's no universal standard for sending and receiving text messages over the Internet. Often we get access to the messages via a third party vendor, like Twilio or Tropo, that provides a proprietary interface. RapidSMS isolates us from the differences among vendors.
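The idea can be sketched in a few lines. Note these class names are hypothetical illustrations of the pluggable-backend pattern, not RapidSMS's actual API:

```python
# Sketch of the pluggable-backend idea: application code is written
# against one interface, and each vendor gets its own implementation.
# (Hypothetical names for illustration; not RapidSMS's real classes.)

class BackendBase:
    """Common interface: application code only ever calls send()."""
    def send(self, identity, text):
        raise NotImplementedError


class ConsoleBackend(BackendBase):
    """A testing backend that just records outgoing messages."""
    def __init__(self):
        self.outbox = []

    def send(self, identity, text):
        self.outbox.append((identity, text))


def reply(backend, sender, text):
    # Application logic depends only on BackendBase, so swapping
    # Twilio for Tropo (or a GSM modem) means swapping one object.
    backend.send(sender, "You said: " + text)


backend = ConsoleBackend()
reply(backend, "+15551234567", "hello")
```

A real backend would make HTTP calls to a vendor's API in `send`, but the application code above would not change.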
RapidSMS is open source, under the BSD license, with UNICEF acting as holder of the contributors' agreements (granting a license for RapidSMS to use and distribute their contributions). See the RapidSMS license for more about this.
Alternatives
Here are some of the alternatives we might have chosen:
- Writing from scratch: starting each project new and building the infrastructure to handle text messages again
- Writing to a particular vendor's API: writing code that sends and receives text messages using the programming interface provided by one of the online vendors that provide that service, then building applications around that
- Other frameworks
Why RapidSMS
Why did we choose RapidSMS?
- RapidSMS builds on Django, our favorite web development framework.
- RapidSMS is at the right level for us. It provides components that we can use to build our own applications the way we need to, and the flexibility to customize its behavior.
- RapidSMS is open source, under the BSD license. There are no issues with our use of it, and we are free to extend it when we need to for a particular project. We then have the opportunity to contribute our changes back to the RapidSMS community.
- RapidSMS is vendor-neutral. We can build our applications without being tied to any particular vendor of text messaging services. That's good for multiple reasons:
- We don't have to pick a vendor before we can start.
- We could change vendors in the future without having to rewrite the applications.
- We can deploy applications to different countries that might not have any common vendor for messaging services.
It's worth noting that using RapidSMS doesn't even require using an Internet text messaging vendor. We can use other open source applications like Vumi or Kannel as a gateway to provide us with even more options:
- use hardware called a "cellular/GSM modem" (basically a cell phone with a connection to a computer instead of a screen)
- interface directly to a phone company's own servers over the Internet, using several widely used protocols
Summary
RapidSMS is a good fit for us at Caktus; it adds a lot to our projects, and we've been pleased to be able to contribute back to it.
Caktus will be leading a workshop on building RapidSMS applications during PyCon 2015 on Tuesday, April 7th 3:00-5:30.
March 16, 2015 12:00 PM
I've written in the past about my dislike for Django's Class Based Views. Django's
CBVs add a lot of complexity and verbosity, and simply get in the way of some
moderately common patterns (e.g. when you have two forms in a single view). It
seems I'm not alone
as a Django core dev who thinks that way.
In this post, however, I'll write about a different approach that I took in one
project, which can be summed up like this:
Write your own base class.
For really simple model views, Django's own CBVs can be a
time saver. For anything more complex, you will run into difficulties, and will
need some heavy documentation at the very least.
One solution is to use a simplified re-implementation of Class Based Views. My own approach is to go even further and
start from nothing, writing your own base class, while borrowing the best ideas
and incorporating only what you need.
Steal the good ideas
The as_view
method provided by the Django's View class is a great idea — while it may
not be obvious, it was hammered out after a lot of discussion as a way to help
promote request isolation by creating a new instance of the class to handle
every new request. So I'll happily steal that!
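A minimal sketch of that idea (not Django's actual implementation, which also handles HTTP method dispatch and more) looks like this:

```python
# Sketch of the as_view() pattern: a class method returns a plain view
# function, and a fresh instance is created per request so no state
# leaks between requests via instance attributes.

class View:
    def __init__(self, **kwargs):
        # Store any configuration passed at as_view() time.
        for key, value in kwargs.items():
            setattr(self, key, value)

    @classmethod
    def as_view(cls, **initkwargs):
        def view(request, *args, **kwargs):
            self = cls(**initkwargs)  # new instance for every request
            self.request = request
            return self.handle(request, *args, **kwargs)
        return view

    def handle(self, request, *args, **kwargs):
        raise NotImplementedError


class Greeting(View):
    greeting = "Hello"

    def handle(self, request):
        return "{} {}".format(self.greeting, request)


view_func = Greeting.as_view(greeting="Hi")
```

Here `view_func` is an ordinary function suitable for a URLconf, and each call builds a new `Greeting` instance.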
Reject the bad
Personally I dislike the dispatch method with its assumption that handling
of GET and POST is going to be completely different, when often they can
overlap a lot (especially for typical form handling). It has even introduced
bugs for me where a view rejected POST requests, when what it needed to do was
just ignore the POST data, which required extra code!
So I replaced that with a simple handle function that you have to implement
to do any logic.
I also don't like the way that template names are automatically built from model
names etc. — this is convention over configuration, and it makes life
unnecessarily hard for a maintenance programmer who greps to find out where a
template is used. If that kind of logic is used, you just Have To Know where to
look to see if a template is used at all and how it is used. So that is going.
Flatten the stack
A relatively flat set of base classes is going to be far easier to manage than a
large set of mixins and base classes. By using a flat stack, I can avoid writing
crazy hacks to subvert what I have inherited.
Write the API you want
For instance, one of the things I really dislike about Django's CBVs is the
extremely verbose way of adding new data to the context, which is something that
ought to be really easy, but instead requires 4 lines:
class MyView(ParentView):
    def get_context_data(self, **kwargs):
        context = super(MyView, self).get_context_data(**kwargs)
        context['title'] = "My title"  # This is the only line I want to write!
        return context
In fact, it is often worse, because the data to add to the context may actually
have been calculated in a different method, and stuck on self so that
get_context_data could find it. And you also have the problem that it is
easy to do it wrong, e.g. if you forget the call to super, things start breaking in
non-obvious ways.
(In searching GitHub for examples, I actually found hundreds and hundreds of
examples that look like this:
class HomeView(TemplateView):
    # ...
    def get_context_data(self):
        context = super(HomeView, self).get_context_data()
        return context
This doesn't make much sense, until I realised that people are using boilerplate
generators/snippets to create new CBVs — such as this for emacs
and this for vim,
and this for Sublime Text.
You know you have created an unwieldy API when people need these kinds of
shortcuts.)
So, the answer is:
Imagine the API you want, then implement it.
This is what I would like to write for static additions to the context:
class MyView(ParentView):
    context = {'title': "My title"}
and for dynamic:
class MyView(ParentView):
    def context(self):
        return {'things': Thing.objects.all()
                if self.request.user.is_authenticated()
                else Thing.objects.public()}

    # Or perhaps using a lambda:
    context = lambda self: ...
And I would like any context defined by ParentView to be automatically
accumulated, even though I didn't explicitly call super. (After all, you
almost always want to add to context data, and if necessary a subclass could
remove specific inherited data by setting a key to None).
I'd also like for any method in my CBV to simply be able to add data to the
context directly, perhaps by setting/updating an instance variable:
class MyView(ParentView):
    def do_the_thing(self):
        if some_condition():
            self.context['foo'] = 'bar'
Of course, it goes without saying that this shouldn't clobber anything at the
class level and violate request isolation, and all of these methods should work
together nicely in the way you would expect. And it should be impossible to
accidentally update any class-defined context dictionary from within a method.
Now, sometimes after you've finished dreaming, you find your imagined API is too
tricky to implement due to a language issue, and has to be modified. In this
case, the behaviour is easily achievable, although it is a little bit magic,
because normally defining a method in a subclass without using super means
that the super class definition would be ignored, and for class attributes you
can't use super at all.
So, my own preference is to make this more obvious by using the name
magic_context for the first two (the class attribute and the method). That
way I get the benefits of the magic, while not tripping up any maintainer — if
something is called magic_foo, most people are going to want to know why it
is magic and how it works.
The implementation
uses a few tricks, the heart of which is using
reversed(self.__class__.mro()) to get all the super-classes and their magic_context attributes,
iteratively updating a dictionary with them.
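A sketch of how that accumulation can work (my reconstruction of the idea, not the exact implementation):

```python
# Walk the MRO from base class to subclass, merging each class's
# magic_context into one dict. Subclasses override base classes
# without any explicit super() calls, and both dict attributes and
# methods are supported.

class View:
    magic_context = {}

    def get_magic_context(self):
        context = {}
        for cls in reversed(self.__class__.mro()):
            mc = cls.__dict__.get('magic_context')
            if mc is None:
                continue
            # A method defined in the class shows up here as a plain
            # function, so call it with self; otherwise treat it as a dict.
            context.update(mc(self) if callable(mc) else mc)
        return context


class ParentView(View):
    magic_context = {'site_name': "My site"}


class MyView(ParentView):
    def magic_context(self):
        return {'title': "My title"}
```

With this, `MyView().get_magic_context()` contains both the parent's static data and the subclass's dynamic data.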
Notice too how the TemplateView.handle method is extremely simple, and just
calls out to another method to do all the work:
class TemplateView(View):
    # ...
    def handle(self, request):
        return self.render({})
This means that a subclass that defines handle to do the actual logic
doesn't need to call super, but just calls the same method directly:
class MyView(TemplateView):
    template_name = "mytemplate.html"

    def handle(self, request):
        # logic here...
        return self.render({'some_more': 'context_data'})
In addition to these things, I have various hooks that I use to handle things
like AJAX validation for form views, and RSS/Atom feeds for list views etc.
Because I'm in control of the base classes, these things are simple to do.
Conclusion
I guess the core idea here is that you shouldn't be constrained by what Django
has supplied. There is actually nothing about CBVs that is deeply integrated
into Django, so your own implementation is just as valid as Django's, but you can
make it work for you. I would encourage you to write the actual code you
want to write, then make the base class that enables it to work.
The disadvantage, of course, is that maintenance programmers who have memorised
the API of Django's CBVs won't benefit from that in the context of a project
which uses another set of base classes. However, I think the advantages more
than compensate for this.
Feel free to borrow any of the code or ideas if they are useful!
March 16, 2015 10:35 AM

In December, I wrote that we are removing the idiosyncratic use_greenlets option from PyMongo when we release PyMongo 3.
In PyMongo 2 you have two options for using Gevent. First, you can do:
from gevent import monkey; monkey.patch_all()
from pymongo import MongoClient
client = MongoClient()
Or:
from gevent import monkey; monkey.patch_socket()
from pymongo import MongoClient
client = MongoClient(use_greenlets=True)
In the latter case, I wrote, "you could use PyMongo after calling Gevent's patch_socket without having to call patch_thread. But who would do that? What conceivable use case had I enabled?" So I removed use_greenlets in PyMongo 3; the first example code continues to work but the second will not.
In the comments, PyMongo user Peter Hansen replied,
I hope you're not saying that the only way this will work is if one uses monkey.patch_all, because, although this is a very common way to use Gevent, it's absolutely not the only way. (If it were, it would just be done automatically!) We have a large Gevent application here which cannot do that, because threads must be allowed to continue working as regular threads, but we monkey patch only what we need which happens to be everything else (with monkey.patch_all(thread=False)).
So Peter, Bernie, and I met online and he told us about his very interesting application. It needs to interface with some C code that talks an obscure network protocol; to get the best of both worlds his Python code uses asynchronous Gevent in the main thread, and it avoids blocking the event loop by launching Python threads to talk with the C extension. Peter had, in fact, perfectly understood PyMongo 2's design and was using it as intended. It was I who hadn't understood the feature's use case before I diked it out.
So what now? I would be sad to lose the great simplifications I achieved in PyMongo by removing its Gevent-specific code. Besides, occasional complaints from Eventlet and other communities motivated us to support all frameworks equally.
Luckily, Gevent 1.0 provides a workaround for the loss of use_greenlets in PyMongo. Beginning the same as the first example above:
from gevent import monkey; monkey.patch_all()
from pymongo import MongoClient

client = MongoClient()

def my_function():
    # Call some C code that drops the GIL and does
    # blocking I/O from C directly.
    pass

start_new_thread = monkey.saved['thread']['start_new_thread']
real_thread = start_new_thread(my_function, ())
I checked with Gevent's author Denis Bilenko whether monkey.saved was a stable API and he confirmed it is. If you use Gevent and PyMongo as Peter does, port your code to this technique when you upgrade to PyMongo 3.
Image: Wingchi Poon, CC BY-SA 3.0
March 16, 2015 02:29 AM
This work is supported by Continuum Analytics
and the XDATA Program
as part of the Blaze Project
tl;dr We benchmark several options to store Pandas DataFrames to disk.
Good options exist for numeric data but text is a pain. Categorical dtypes
are a good option.
Introduction
For
dask.frame
I need to read and write Pandas DataFrames to disk. Both disk bandwidth and
serialization speed limit storage performance.
- Disk bandwidth, between 100MB/s and 800MB/s for a notebook hard drive, is
limited purely by hardware. Not much we can do here except buy better
drives.
- Serialization cost though varies widely by library and context. We can be
smart here. Serialization is the conversion of a Python variable (e.g.
DataFrame) to a stream of bytes that can be written raw to disk.
Typically we use libraries like pickle to serialize Python objects. For
dask.frame we really care about doing this quickly so we’re going to also
look at a few alternatives.
Contenders
- pickle - The standard library pure Python solution
- cPickle - The standard library C solution
- pickle.dumps(data, protocol=2) - pickle and cPickle support multiple protocols. Protocol 2 is good for numeric data.
- json - Using the standard library json module, we encode the values and index as lists of ints/strings.
- json-no-index - Same as above, except that we don’t encode the index of the DataFrame (e.g. 0, 1, ...). We’ll find that JSON does surprisingly well on pure text data.
- msgpack - A binary JSON alternative
- CSV - The venerable pandas.read_csv and DataFrame.to_csv
- hdfstore - Pandas’ custom HDF5 storage format
Additionally we mention but don’t include the following:
- dill and cloudpickle - Formats commonly used for function serialization. These perform about the same as cPickle.
- hickle - A pickle interface over HDF5. This does well on NumPy data but doesn’t support Pandas DataFrames well.
Experiment
Disclaimer: We’re about to issue performance numbers on a toy dataset. You
should not trust that what follows generalizes to your data. You should
look at your own data and run benchmarks yourself. My benchmarks lie.
We create a DataFrame with two columns, one with numeric data, and one with
text. The text column has repeated values (1000 unique values, each repeated
1000 times) while the numeric column is all unique. This is fairly typical of
data that I see in the wild.
df = pd.DataFrame({'text': [str(i % 1000) for i in range(1000000)],
                   'numbers': range(1000000)})
Now we time the various dumps and loads methods of the different
serialization libraries and plot the results below.

As a point of reference writing the serialized result to disk and reading it
back again should take somewhere between 0.05s and 0.5s on standard hard
drives. We want to keep serialization costs below this threshold.
Thank you to Michael Waskom for making those
charts
(see twitter conversation
and his alternative
charts)
Gist to recreate plots here:
https://gist.github.com/mrocklin/4f6d06a2ccc03731dd5f
Further Disclaimer: These numbers average from multiple repeated calls to
loads/dumps. Actual performance in the wild is likely worse.
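For reference, a stripped-down sketch of such a timing harness (the bench helper is my illustration, shown with plain Python containers instead of a DataFrame so it needs no dependencies):

```python
# Time dumps/loads for one serialization option, taking the best of
# several runs to smooth out noise. (Illustrative sketch, not the
# exact benchmark script.)
import pickle
import time

def bench(dumps, loads, data, repeat=3):
    """Return (dump_seconds, load_seconds), best of `repeat` runs."""
    dump_times, load_times = [], []
    for _ in range(repeat):
        t0 = time.time()
        payload = dumps(data)
        dump_times.append(time.time() - t0)
        t0 = time.time()
        loads(payload)
        load_times.append(time.time() - t0)
    return min(dump_times), min(load_times)

# Same shape of data as the DataFrame above, at a smaller scale.
data = {'text': [str(i % 1000) for i in range(100000)],
        'numbers': list(range(100000))}

dump_t, load_t = bench(
    lambda d: pickle.dumps(d, protocol=2), pickle.loads, data)
```

Swapping the two lambdas lets you compare json, msgpack, and friends on the same data.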
Observations
We have good options for numeric data but not for text. This is unfortunate;
serializing ASCII text should be cheap. We lose here because we store text in
a Series with the NumPy dtype ‘O’ for generic Python objects. We don’t have a
dedicated variable length string dtype. This is tragic.
For numeric data the successful systems record a small amount of
metadata and then dump the raw bytes. The main takeaway from this is that you
should use the protocol=2 keyword argument to pickle. This option isn’t
well known but strongly impacts performance.
Note: Aaron Meurer notes in the comments that for Python 3 users protocol=3
is already default. Python 3 users can trust the default protocol= setting
to be efficient and should not specify protocol=2.
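The size difference is easy to demonstrate with the standard library alone; protocol 0 writes an ASCII-based format while protocol 2 writes compact binary opcodes:

```python
# Compare pickle payload sizes for numeric data across protocols.
import pickle

nums = list(range(100000))
ascii_payload = pickle.dumps(nums, protocol=0)   # text-based format
binary_payload = pickle.dumps(nums, protocol=2)  # compact binary format

# Protocol 2 is typically several times smaller on numeric data,
# and both round-trip losslessly.
print(len(ascii_payload), len(binary_payload))
assert len(binary_payload) < len(ascii_payload)
assert pickle.loads(binary_payload) == nums
```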

Some thoughts on text
- Text should be easy to serialize. It’s already text!
- JSON-no-index serializes the text values of the dataframe (not the integer index) as a list of strings. This assumes that the data are strings, which is why it’s able to outperform the others, even though it’s not an optimized format. This is what we would gain if we had a string dtype rather than relying on the NumPy Object dtype, 'O'.
- MsgPack is surprisingly fast compared to cPickle.
- MsgPack is oddly unbalanced; it can dump text data very quickly but takes a while to load it back in. Can we improve msgpack load speeds?
- CSV text loads are fast. Hooray for pandas.read_csv.
Some thoughts on numeric data
- Both pickle(..., protocol=2) and msgpack dump raw bytes. These are well below disk I/O speeds. Hooray!
- There isn’t much reason to compare performance below this level.
Categoricals to the Rescue
Pandas recently added support for categorical
data. We
use categorical data when our values take on a fixed number of possible options
with potentially many repeats (like stock ticker symbols.) We enumerate these
possible options (AAPL: 1, GOOG: 2, MSFT: 3, ...) and use those numbers
in place of the text. This works well when there are many more
observations/rows than there are unique values. Recall that in our case we
have one million rows but only one thousand unique values. This is typical for
many kinds of data.
This is great! We’ve shrunk the amount of text data by a factor of a thousand,
replacing it with cheap-to-serialize numeric data.
>>> df['text'] = df['text'].astype('category')
>>> df.text
0 0
1 1
2 2
3 3
...
999997 997
999998 998
999999 999
Name: text, Length: 1000000, dtype: category
Categories (1000, object): [0 < 1 < 10 < 100 ... 996 < 997 < 998 < 999]
Let’s consider the costs of doing this conversion, and of serializing it
afterwards, relative to the cost of just serializing the raw text:

    Operation                      Seconds
    Serialize Original Text       1.042523
    Convert to Categories         0.072093
    Serialize Categorical Data    0.028223
When our data is amenable to categories then it’s cheaper to
convert-then-serialize than it is to serialize the raw text. Repeated
serializations are just pure-win. Categorical data is good for other reasons
too; computations on object dtype in Pandas generally happen at Python speeds.
If you care about performance then categoricals are definitely something to
roll in to your workflow.
Final Thoughts
- Several excellent serialization options exist, each with different
strengths.
- A combination of good serialization support for numeric data and
Pandas categorical dtypes enable efficient serialization and storage of
DataFrames.
- Object dtype is bad for PyData. String dtypes would be nice. I’d like to give a shout-out to DyND, a possible NumPy replacement that would resolve this.
- MsgPack provides surprisingly good performance over custom Python solutions; why is that?
- I suspect that we could improve performance by special casing Object dtypes
and assuming that they contain only text.
March 16, 2015 12:00 AM
March 15, 2015
I am reaching the final stages of my new book. Here are a few ways to stay updated about the book:

Blog posts: http://echorand.me/category/doingmathwithpython/
Facebook page: https://www.facebook.com/doingmathwithpython
G+ Community: https://plus.google.com/u/0/communities/113121562865298236232
Twitter: https://twitter.com/mathwithpython
If you are an educator/teacher, I can also try to get you a sample of the current pre-release version of the book.
March 15, 2015 09:43 AM
March 14, 2015
The Python 3 OOP series of posts that you can find here is now available as a series of IPython Notebooks.
From the official site:
The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.
As a matter of fact, IPython Notebook is the perfect environment to teach Python itself. If you want to know more about this wonderful piece of software, check the official site.
You can find the notebook of each post here
or download the whole series as a zip file
As everything on this blog, notebooks are released under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). Feel free to submit corrections to the GitHub issues page.
This is a preview of the notebooks in action

March 14, 2015 09:00 PM
One of the things that is very useful about Python is its extreme
introspectability and malleability. Taken too far, it can make your code
an unmaintainable mess, but it can be very handy when trying to debug
large and complex projects.
Open edX is one such project. Its main
repository has about 200,000 lines of Python spread across 1500 files.
The test suite has 8000 tests.
I noticed that running the test suite left a number of temporary directories
behind in /tmp. They all had names like tmp_dwqP1Y, made by the tempfile
module in the standard library. Our tests have many calls to mkdtemp,
which requires the caller to delete the directory when done. Clearly, some
of these cleanups were not happening.
To find the misbehaved code, I could grep through the code for calls to
mkdtemp, and then reason through which of those calls eventually deleted
the file, and which did not. That sounded tedious, so instead I took the
fun route: an aggressive monkeypatch to find the litterbugs for me.
My first thought was to monkeypatch mkdtemp itself. But most uses of the
function in our code look like this:
from tempfile import mkdtemp
...
d = mkdtemp()
Because the function was imported directly, if my monkeypatching code ran
after this import, the call wouldn't be patched. (BTW, this is one more
small reason to prefer importing modules, and using module.function in the
code.)
Looking at the implementation of mkdtemp, it makes use of a helper function
in the tempfile module, _get_candidate_names. This helper is a generator
that produces those typical random tempfile names. If I monkeypatched that
internal function, then all callers would use my code regardless of how
they had imported the public function. Monkeypatching the internal helper
had the extra advantage that using any of the public functions in tempfile
would call that helper, and get my changes.
To find the problem code, I would put information about the caller into the
name of the temporary file. Then each temp file left behind would be a
pointer of sorts to the code that created it. So I wrote my own
_get_candidate_names like this:
import inspect
import os.path
import tempfile

real_get_candidate_names = tempfile._get_candidate_names

def get_candidate_names_hacked():
    stack = "-".join(
        "{}{}".format(
            os.path.basename(t[1]).replace(".py", ""),
            t[2],
        )
        for t in inspect.stack()[4:1:-1]
    )
    for name in real_get_candidate_names():
        yield "_" + stack + "_" + name

tempfile._get_candidate_names = get_candidate_names_hacked
This code uses inspect.stack to get the call stack. We slice it oddly, to
get the closest three calling frames in the right order. Then we extract
the filenames from the frames, strip off the ".py", and concatenate them
together along with the line number. This gives us a string that indicates
the caller.
The real _get_candidate_names function is used to get a generator of good
random names, and we add our stack inspection onto the name, and yield
it.
Then we can monkeypatch our function into tempfile. Now as long as this
module gets imported before any temporary files are created, the files
will have names like this:
tmp_case53-case78-test_import_export289_DVPmzy/
tmp_test_video36-test_video143-tempfile455_2upTdS.srt
The first shows that the file was created in test_import_export.py at line 289, called
from case.py line 78, from case.py line 53. The second shows that
test_video.py has a few functions calling eventually into tempfile.py.
I would be very reluctant to monkeypatch private functions inside other
modules for production code. But as a quick debugging trick, it works
great.
March 14, 2015 06:38 PM
I have a webserver with 3 WSGI applications running on different domains (1, 2, 3), all deployed with a combination of Gunicorn and NGINX. It's a combination that works really well, but there are two annoyances that are only going to get worse the more sites I deploy:
A) The configuration for each server resides in a different location on the filesystem, so I have to recall & type a long path to edit settings.
B) More significantly, each server adds extra resource requirements. I follow the advice of running each WSGI application with (2 * number_of_cores + 1) processes, each with 8 threads. The threads may be overkill, but that ensures that the server can use all available capacity to handle dynamic requests. On my 4 core server, that's 9 processes, 72 threads per site. Or 27 processes, and 216 threads for the 3 sites. Clearly that's not scalable if I want to host more web applications on one server.
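The worker arithmetic above is conventionally captured in a Gunicorn config file, which is an ordinary Python module whose top-level names Gunicorn reads; this is a sketch of that convention, not the author's actual configuration:

```python
# gunicorn.conf.py (sketch): Gunicorn reads settings from module-level
# names when started with `gunicorn -c gunicorn.conf.py app:application`.
import multiprocessing

cores = multiprocessing.cpu_count()
workers = cores * 2 + 1   # the (2 * number_of_cores + 1) rule
threads = 8               # per-worker threads, as described above
# On a 4-core machine this yields 9 workers and 72 threads in total.
```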
A new feature recently added to Moya fixes both those problems. Rather than deploy a WSGI application for each site, Moya can now optionally create a single WSGI application that serves many sites. With this new system, configuration is read from /etc/moya/, which contains a directory structure like this:
|-- logging.ini
|-- moya.conf
|-- sites-available
| |-- moyapi.ini
| |-- moyaproject.ini
| `-- notes.ini
`-- sites-enabled
|-- moyapi.ini
|-- moyaproject.ini
`-- notes.ini
At the top level is “moya.conf” which contains a few server-wide settings, and “logging.ini” which contains logging settings. The directories “sites-available” and “sites-enabled” work like Apache and NGINX servers; settings for each site are read from “sites-enabled”, which contains symlinks to files in “sites-available”.
Gunicorn (or other wsgi server) can run these sites with a single instance by specifying the WSGI module as “moya.service:application”. This application object dispatches the request to the appropriate server (based on a domain defined in the INI).
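Moya's actual dispatcher isn't shown here, but the idea behind a single WSGI entry point serving many sites can be sketched in plain WSGI; the site names and domains below are made up for illustration:

```python
# A toy WSGI dispatcher: one entry point, many sites, selected by the
# request's Host header.
def make_site(name):
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [name.encode("utf-8")]
    return app

# In Moya, mappings like these would come from the INI files in
# sites-enabled/, each of which defines the domain it serves.
SITES = {
    "moyaproject.com": make_site("moyaproject"),
    "notes.moyaproject.com": make_site("notes"),
}

def application(environ, start_response):
    # Strip any port suffix before looking up the site.
    host = environ.get("HTTP_HOST", "").split(":")[0]
    site = SITES.get(host)
    if site is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown site"]
    return site(environ, start_response)
```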
Because all three sites go through a single Gunicorn instance, only one set of processes and threads is ever needed, and the settings files are much easier to locate. Another advantage is that adding another site requires very little extra configuration.
This new multi-server system is somewhat experimental and hasn't been documented. But since I believe in eating my own dog food, it has been live now for a whole hour, with no problems.
March 14, 2015 05:08 PM
This morning, PSF Director David Mertz announced on the PSF Members' mailing list the opening of a vote. For those of you who have already self-certified as voting members, or if you are already a Fellow of the Foundation, you should have received the announcement in a private email.
This is our first stab at using the voting mechanism to get a sense of the larger membership's views on an issue currently under discussion (the non-binding poll), so we urge you to take a moment and make your voice heard.
To review your eligibility to vote and to see the certification form, please see my previous blog post
Enroll as Voting Member or go to the
PSF Website.
Here is the announcement:
Membership Vote for Pending Sponsors and Non-Binding Poll
The candidate Sponsor Members listed below were recommended for approval by the Python Software Foundation Board of Directors. Following the ballot choices are detailed descriptions of the organizations (the submit button is after the descriptions, so scroll down for it).
This election will close on 2015-03-26.
Sponsor Member Candidates
Non-Binding Poll on PyCon Video Sublicensing
Purpose: The PSF Board of Directors is seeking the collective perspective of PSF Voting Members on the appropriate handling of video recording sublicensing for presentations at PyCon US. These videos are currently made freely available on Google's YouTube, and may be incorporated into other sites through YouTube's embedding features. There are no plans to change that arrangement, but a separate question has arisen that requires determining whether it would be appropriate to exercise the sublicensing rights granted to the PSF under the PyCon US speaker agreement. This part of the poll serves as a non-binding survey of PSF Voting Members, intended to help the Directors formulate a suitable policy in this area based on the way the PyCon US speaker agreement is generally perceived, rather than based solely on what it permits as a matter of law.
Background: A request has been made to the PSF to sublicense video recordings made at PyCon of speaker presentations. The license agreement signed by speakers gives the PSF the right to grant such sublicenses; however, the Board of Directors is of mixed opinion about whether we should do so. The release form (i.e. license) agreed to by speakers is at https://us.pycon.org/2015/speaking/recording/ for reference. Note that YouTube is explicitly mentioned in the release as an example of such a sublicensee, and pyvideo.org has always been given this right (although they have only exercised it thus far by embedding YouTube hosted videos, not by mirroring content, and hence are not technically a sublicensee at this point). Embedding a video does not require a sublicense; only mirroring it does.
There are two axes along which the Board is divided. On the one hand, we are not unanimous about whether we should grant a sublicense to commercial entities which may benefit financially by providing local copies of these video recordings, and may even potentially grant such local access only to subscribers in some manner. In favor of granting such access, some Directors feel that the more widespread the mirroring, the better, regardless of the commercial or non-commercial nature of the hosting (i.e. as long as the gratis access is never removed, which is not being contemplated). In opposition to granting such access, some Directors feel that for-profit sublicensees will gain unfair commercial advantage by bundling PyCon videos with other content sold for profit. Potentially the PSF may require payment, and gain revenue, for granting these sublicense rights.
On the other hand, we are also not unanimous about whether—if we do grant sublicenses—we should do so only prospectively, once we can inform speakers of our intent prior to their talks, or whether we should exercise the rights given in speaker releases even retroactively for previous PyCons. While speakers have given such rights already in a legal sense, some Directors feel they may not have fully contemplated that grant at the time, and only going forward, with more explicit information about sublicensing intents of the PSF, should sublicensing be allowed to other entities.
Bloomberg LP
As the market data and analysis industry leader, Bloomberg LP provides a broad portfolio of innovations to our clients. Bloomberg's Open Market Data Initiative is part of our ongoing efforts to foster open solutions for the financial services industry. This includes a set of published Python modules that are freely available to our clients at http://www.bloomberglabs.com/api/libraries/. In support of promoting further Python usage within the financial services industry, we have hosted a number of free public developer-focused events to support the Python ecosystem, including the Scientific Python community. Please refer to http://go.bloomberg.com/promo/invite/bloomberg-open-source-day-scientific-python/ and https://twitter.com/Mbussonn/status/533566917727223808. By becoming a member, we wish to further increase our support of the PSF in its mission to promote, protect, and advance the Python programming language.
Fastly
Fastly provides the PSF with unlimited free CDN services, a dedicated IP block, and hosted certificates. We also provide the PSF with free Premium Support. Over the last few months, Fastly’s comped services to the PSF totalled up to ~$20,000/month. In January 2015 alone, the PSF sent 1.7 billion requests and 132 TB through Fastly.
Python is the go-to language at Fastly for building developer tools. Python allows Fastly to rapidly prototype and deploy novel protocols and services over multiple platforms, including devices like network switches, which are traditionally not programmable. Fastly relies on Python for data analysis and to dynamically reconfigure network switching and routing to steer every request to the closest available server. These tools are instrumental in helping Fastly reliably deliver more traffic in less time.
Infinite Code
Infinite Code is a software development firm with offices in Beijing, China and Kuala Lumpur, Malaysia. We are strong believers in Free/Open Source Software and the people-centric principles of Agile Development. Python is our language of choice for software development where possible. Our recent Python developments range from high-volume, real-money gaming platforms to massively parallel gathering and transformation of large quantities of data. Our developers have been using Python since 2001.
I would love to hear from readers. Please send feedback, comments, or blog ideas to me at msushi@gnosis.cx.
March 14, 2015 05:04 PM
The second PSF sponsored African conference I want to tell you about is Python Namibia (a mere 3,500 kilometers, or 2,175 miles, south of Cameroon). The conference, the first ever held in Namibia, took place February 2 – 5, 2015 at the University of Namibia in the city of Windhoek. The PSF provided funds at the level of "Gold Sponsorship" that were used to subsidize travel for international attendees and to purchase a banner.
Photo credit to python-namibia.org
According to an email to the PSF from organizer
Daniele Procida, “. . . the event was a success, with 65 attendees for the four days, and was met with huge enthusiasm by our Namibian hosts. I hope to be back in Namibia next year for an even bigger event, organised by the newly-established Python community there.”
The official website
Python Namibia provides additional information and thanks to the conference's additional sponsors: Cardiff University in Wales (through its Phoenix Project), The University of Namibia, and the Django/Python web agency, Divio AG in Zürich.
One of the attendees was the PSF's good friend, the geologist
Carl Trachte, who sums up his reasons for attending PyCons all around the world as:
The neat thing about country/regional conferences is that you more frequently get to talk to developers or tech professionals from that place who don’t always frequent conferences outside their area. Seeing how Python (and digital technology in general) is being used in Sub-Saharan Africa (for the establishment of a wireless network, for example), learning what the average work day is like for a Pythonista in these parts of the world - those are things you really can’t get without being there.
The four days of talks, workshops, coding, collaboration and interaction engendered such enthusiasm and interest that on the last day a group of the participants self-organized to form
“PyNam, the Python Namibia Association”.
Photo Credit to python-namibia.org
We certainly look forward to more exciting projects and events coming out of this group.
March 14, 2015 05:03 PM
One of the nice things added in Zato 2.0 is
the improved ability to store the code of one's API
services directly in a server's hot-deploy directory:
each time a file is saved, it is uploaded to the server and automatically propagated to
all the other nodes in the cluster the given server belongs to.
Now, this in itself has been doable since version 1.0, but the newest
release added a means to configure servers not to clean up the hot-deploy directory after the code
is picked up, meaning anything saved there stays until it is deleted manually.
Two cool things can be achieved thanks to it:
- Working in deploy-on-save mode
- Deploying code from a repository checkout
Initial steps
To make it all possible, navigate to each server's
server.conf
file, find hot_deploy.delete_after_pick_up, change it from True to False,
and restart all servers. This is the only time they will be restarted, promise.
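For reference, the relevant server.conf fragment looks like this once changed (the section layout is inferred from the dotted key name above):

```ini
[hot_deploy]
delete_after_pick_up=False
```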
Working in deploy-on-save mode
- Let's say your server is in /home/user/zato/server1
- Save your files in /home/user/zato/server1/pickup-dir now
- Each time it's saved, note in server.log how it's picked up and deployed
- This lets you make use of the service in the actual environment a moment after it's saved
Deploying code from a repository checkout
- Essentially, this is the deploy-on-save mode described above, working on a grander scale
- Instead of saving individual files, everything needed for a given solution is placed
in the hot-deploy pickup directory in one go
- Can be easily plugged into Jenkins or other automation tools
- You can try it right now using this sample repository prepared for the article
- Go to a server's pickup dir
- Delete anything it already contains
- Issue the command below:
$ git clone https://github.com/zatosource/hot-deploy-sample.git .
- Witness that the two services just checked out are being nicely picked up by all servers in a cluster
- This concludes the deployment: the environment has just been updated with the newest versions
of the services, and they are already operational, as can be confirmed in
web-admin
March 14, 2015 04:04 PM
March 13, 2015
My colleague Steve Stagg has created a wonderful new thing: an entirely free Python jobs board. Anybody can add a new job simply by making a GitHub pull request. It’s a work of genius because it’s the absolute simplest possible solution to a problem that’s been bothering me for weeks, and the beauty of […]
March 13, 2015 10:16 PM
Happy Friday everyone,
Today we’ll take a look at some of the basic VCS features in PyCharm that can help manage different version control systems.
You may already know that PyCharm has seamless integration with major version control systems like Git, GitHub, Subversion, Mercurial, Perforce (available only in PyCharm Professional Edition), and CVS. Even though these systems have different models and command sets, PyCharm makes life a lot easier by taking a VCS-agnostic approach to managing them wherever possible.
So here we go:
Checking out a project from a VCS
To import a project from a version control system, click the Check out from Version Control button on the Welcome screen, or use the same VCS command from the main menu:

Version Control settings
A project’s version control settings are accessed via Settings → Version Control. You can associate any of the project folders with a repository root. These associations can be removed at any time, or you can even opt to disable the version control integration entirely:

PyCharm can handle multiple VCS repositories assigned to different folders of the project hierarchy, and perform all VCS operations on them in a uniform manner.
Changes tool window and changelists
After version control is enabled for a project, you can see and manage your local changes via the Changes tool window. To quickly access the tool window, press Alt + 9 (Cmd-9 on a Mac):

All changes are organized into changelists that can be created, removed, and made active.
Quick list of VCS operations
When you need to perform a VCS operation on a currently selected file, directory, or even on the entire project, bring up the VCS operations quick-list via Alt+Back Quote (Ctrl-V on a Mac):

Show History
The history of changes is available for a set of files or directories via the VCS operations quick-list, or in the main menu VCS → <version control name> → Show History, or in the context menu → Show History:

To see all changes for a specific code snippet, use the Show History for Selection action.
Annotations
Annotations are available from the quick-list, the main menu or the context menu. They allow you to see who changed a certain line of code and when:

When you click the annotation, you will see the detailed information about the corresponding commit.
Useful shortcuts
- Commit current changelist Ctrl+K (Cmd-K on a Mac)
- Update the project Ctrl+T (Cmd-T on a Mac)
- Mark selected files and folders as added Ctrl+Alt+A (Alt-Cmd-A on a Mac)
- Mark selected files and folders as changed (checked out) via Ctrl+Alt+E (Alt-Cmd-E on a Mac)
- Show diff (available in the Changes tool window) via Ctrl+D (Cmd-D on a Mac)
- Move changes to another change list (available in the Changes tool window) via F6
- Push commits to remote repositories via Ctrl+Shift+K (Cmd-Shift-K on a Mac)
Commit options
When committing changes, PyCharm lets you perform a variety of operations:
- change the set of files to commit,
- join the changes with the previous commit by using the Amend commit option,
- reformat the changed code,
- optimize imports,
- ensure that there are no inspection warnings,
- update the copyright information,
- or even upload the changes to a remote FTP server.

Ignored files
To configure the ignored files, go to Settings → Version Control, or use the corresponding button in the Changes tool window:

The actual list of ignored files can be displayed in the Changes tool window next to the changelists by clicking the corresponding button.
Branches
With PyCharm you can easily create, switch, merge, compare and delete branches (available for Git and Mercurial only). To see a list of existing branches or create a new one, use the Branches item in the main or context menu, the VCS operations quick-list, or the widget on the right-hand side of the status bar:

For multiple repositories, PyCharm performs all VCS operations on all branches simultaneously, so you don’t need to switch between them manually.
Shelves, stashes, and patches
Shelves and Stashes help you when you need to put away some local changes without committing them to the repository, switch to the repository version of the files, and then come back to your changes later. The difference between them is that Shelves are handled by PyCharm itself and are stored in the local file system, while Stashes are kept in a VCS repository.
Patches allow you to save a set of changes to a file that can be transferred via email or file sharing and then applied to the code. They are helpful when you’re working remotely without having a constant connection to your VCS repository and still need to contribute:

Log
To see the entire list of commits in a repository, sorted and filtered by branch, user, date, folder, or even a phrase in the description, use the Log tab in the Changes tool window. This is the easiest way to find a particular commit, or to just browse through the history:

In this blog post we touched just the tip of the VCS integration iceberg. Go ahead and try this functionality in action! Here’s a tutorial that can walk you through the VCS integration features and provide additional information. And if after that you’re still craving more details, please see our online help.
That’s it for today. See you next week!
-Dmitry
March 13, 2015 08:37 PM
I previously posted about a wonderful education program utilizing Raspberry Pis (AstroPi). Here’s another one:
Since last May, Unicef has been using Raspberry Pis to educate Syrian children who have been displaced into Lebanon due to their country’s civil war. The program, called
Pi4Learning, was developed by
James Cranwell-Ward, UNICEF Lebanon Innovation Lead, and
Eliane Metni of the International Education Association.
With approximately 300,000 Syrian school children living as refugees in Lebanon with no educational resources, Unicef’s Cranwell-Ward sought an inexpensive, ready-to-go solution that could be implemented in refugee camp environments. Already a Raspberry Pi enthusiast, he paired the device with Alex Eames' Kickstarter-funded
HDMIPi screens. Working with Eliane Metni, who had been piloting Raspberry Pis at
Dhour El Shweur Public Secondary School in Lebanon, they obtained free Arabic language curriculum from Khan Academy and began providing free classes to the Syrian children.
The Pi4L program is divided into learning tracks: Core Skills Modules for ages 6 – 12 (literacy, numeracy, and science, using Khan Academy content); Technology Applications for ages 5 – 18 (Learning to Code and Coding to Learn); and Continuing Education and Certification for Teachers.
Each complete computer system costs around $100 and the Khan Academy content is stored and can be delivered offline. Currently approximately 30,000 refugees are using the program, and the goal is to continue to expand.
Both Cranwell-Ward and Metni are especially excited that the program teaches kids to code and to become creative participants in an increasingly technological world community. According to Cranwell-Ward,
“The rate at which tech is being rolled out into our lives is phenomenal and coding - or the understanding of technology and how to manipulate it - is going to be a core component of our lives and our children’s lives moving forward… There needs to be some basic understanding of what technology is, how it can be manipulated, how we can use it to help ourselves, and not just be a consumer or slave,” as quoted in
The Guardian.
One of the students is 11-year-old Zeinab Al Jusuf. There is a video about her experiences and the Unicef project at
Unicef stories.
There is also a wealth of information online about this project, so if you’re at all interested I urge you to read more. For an excellent overview by Unicef’s Luciano Calestini, see
Innovation.
I would love to hear from readers. Please send feedback, comments, or blog ideas to me at msushi@gnosis.cx.
March 13, 2015 04:02 PM