IP Street’s application runs on Python 2.7. Earlier this week, I evaluated all our Python packages for Python 3 support, as the first step in deciding when to migrate our codebase.
Although this was the first time I’ve checked our packages for Python 3 support, I expected Django to be the only one that didn’t officially support it. (Production support is slated for version 1.6, which is now a release candidate.) But Django is the only project whose development roadmap I closely follow! D’oh! Talk about a blind spot!!
This is why it’s good to sit down and formally check each package. Make a list of every package and check each one…
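As a first pass at that list, here’s a rough sketch (not the process described in this post) that reads the Trove classifiers each installed package declares and flags the ones with no Python 3 classifier. It assumes Python 3.8+ for `importlib.metadata`, so it would run in a Python 3 environment with the same requirements installed. The function names are illustrative. Classifiers are self-reported and often stale, so treat a missing one as “check by hand,” not as proof that the package lacks support.

```python
from importlib import metadata  # standard library in Python 3.8+

# Trove classifier prefix that package authors use to declare Python 3 support.
PY3 = "Programming Language :: Python :: 3"


def declares_python3(classifiers):
    """True if any classifier is PY3 itself, a 3.x version, or a PY3 sub-classifier."""
    return any(
        c == PY3 or c.startswith(PY3 + ".") or c.startswith(PY3 + " ::")
        for c in classifiers
    )


def installed_classifiers():
    """Map each installed distribution's name to its list of classifiers."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            result[name] = dist.metadata.get_all("Classifier") or []
    return result


def python3_report(packages):
    """Given {name: [classifiers]}, return {name: declares Python 3?}."""
    return {name: declares_python3(cls) for name, cls in packages.items()}


if __name__ == "__main__":
    report = python3_report(installed_classifiers())
    for name in sorted(report, key=str.lower):
        print("{:40} {}".format(name, "py3" if report[name] else "CHECK BY HAND"))
```

Anything marked “CHECK BY HAND” still needs a look at the project’s docs, changelog, or issue tracker.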
If you know someone who fits the bill, send them this post!
Title: Senior Developer
Reports to: VP Engineering
About IP Street
Founded in 2009, IP Street develops and markets software to help corporations, law firms, and financial analysts better analyze patent-related information. We make IP data easy to get, use, and understand!
We’re a start-up that’s developed a new way to visualize and data-mine intellectual property. We’re small and scrappy, have an innovative engineering team, and have built the business on awesome products that companies buy!
Our technology stack is almost all open-source, with some nifty esoteric search technologies. Most of your work will be in Python and Django, in a Mac-based development environment, deploying to Linux. Other technologies include Celery, Postgres, Redis, and Solr. Our client-side code relies on Highcharts and Backbone, and supports desktop and mobile users.
This is “small b” big data, with lots of interesting challenges!
- Collaborate with others in product direction, priorities, and features
- Design, implement, and test new product (primarily but not exclusively server-side) features
- Some front-end coding and debugging, as needed
- Make the user experience as powerful, simple, and manifest as possible
- Be positive, flexible, and do what’s needed to move the company forward
- 10+ years of experience in server-side development. Web development would be ideal, but it can be any kind of server-side code. We’re looking for expertise in processing pipelines or workflows, software farms, scaling, schema migration, etc. Or you’re a really smart person who loves complex software systems running on servers!
- Significant experience developing in Python or Python-based frameworks, on the order of 5 years or more. This must be serious development, not, “I write a 20-line script now and then.”
- Substantial experience in, and understanding of, a web framework such as Django. We’re looking for at least 3 years’ experience. Or if you don’t know Django, you’re eager to learn!
- Pluses: Significant coding experience interacting with (or experience in configuring) PostgreSQL, Solr, or another SQL or full-text search engine.
- Other pluses: Experience in or familiarity with jQuery, Backbone or equivalent technology, or client-side graphing packages. (These won’t be your focus, but the knowledge could come in handy.)
- Enthusiasm about modern approaches to software development, distributed version control, good coding and documentation practices, etc.
- You have excellent judgement in attacking complex tasks, and in balancing “good enough, now” vs. “much better, later”
- You’re self-sufficient when possible, and confident in setting standards
- You’re eager to build a small company into something insanely great!
- Excellent team and communication skills
- Bachelor’s degree or equivalent in Computer Science or Software Engineering
Salary is DOE. Please send resume to john @ this-site’s-domain.
Jesse Noller, who works at Rackspace, volunteered to take a look at the underlying problems. He’s an awesome dude.
An update to an earlier post…
I’m replacing pyrax with something else in our system. The authentication errors and oddball failures still occur, and I’ve lost confidence that Rackspace will fix them in any reasonable amount of time. This is extremely frustrating.
Python-cloudfiles was way more stable, even though it wasn’t and still isn’t in active development. Maybe we’ll resume using that.
Thursday, I was irked by a bug.
I had modified a background task so it could import a range of documents from another subsystem into our datastore, instead of only one. Its parameters had included one “document id”, which identified the patent document to import. Now, it could be given that, or two document ids representing a document range.
In one instance, it reported a successful completion yet the desired patents weren’t loaded. What had gone wrong?
Multiple official and de facto formats exist for US patent application and grant document ids. To keep this simple, let’s consider US Design Patents. Their document id is a “D” followed by a number, such as “D4432” or “D902”.
So if you wanted to import a range of Design Patents, you might say, “Import the patents D900 through D4000, inclusive.” “D900” is the lower bound and “D4000” is the upper bound. Right?
Not so fast!
>>> "D900" < "D4000"
False
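Python compares strings character by character, so “D900” sorts after “D4000” (’9’ comes after ’4’ at the second character). A range check on the raw strings can therefore end up matching nothing, which would explain a task that “succeeds” without loading anything. Here’s a minimal sketch of one fix, assuming only the simple Design Patent format above (“D” plus digits, no zero padding); the helper names are hypothetical. It parses out the number, compares numerically, and rejects an inverted range loudly.

```python
import re

# Assumed format: a US Design Patent id is "D" followed by digits, e.g. "D902".
# Other US document id formats exist; this sketch deliberately ignores them.
DESIGN_ID = re.compile(r"D(\d+)\Z")


def design_patent_number(doc_id):
    """Return the numeric part of a Design Patent id: 'D900' -> 900."""
    match = DESIGN_ID.match(doc_id)
    if match is None:
        raise ValueError("not a Design Patent id: {!r}".format(doc_id))
    return int(match.group(1))


def design_patent_range(lower, upper):
    """List the ids from lower to upper, inclusive, ordered numerically.

    Assumes ids are written without zero padding (e.g. "D900", not "D0900").
    """
    lo, hi = design_patent_number(lower), design_patent_number(upper)
    if lo > hi:
        # Fail loudly rather than "succeed" with nothing to import.
        raise ValueError("inverted range: {!r} > {!r}".format(lower, upper))
    return ["D{}".format(n) for n in range(lo, hi + 1)]


print("D900" < "D4000")  # False: '9' > '4' at the second character
print(design_patent_number("D900") < design_patent_number("D4000"))  # True
print(len(design_patent_range("D900", "D4000")))  # 3101
```

The code sticks to `str.format` so it runs on Python 2.7 as well as Python 3.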
tl;dr: Think about exceptions when writing a context manager.
I made a huge unforced error with a context manager at work.
We use Redis distributed locks for system synchronization. I wanted a context manager that acquired n locks, executed protected code, and then released the n locks in reverse order. It would be simple to use:
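from common.util import Semaphore, distlock
# OwnerDisambiguationUpdate and USMaintenanceFeeUpdate live elsewhere in our codebase; their imports aren't shown.
semaphore1 = Semaphore(OwnerDisambiguationUpdate.UPDATE_LOCK)
semaphore2 = Semaphore(USMaintenanceFeeUpdate.UPDATE_LOCK)
with distlock(semaphore1, semaphore2):
    ...  # protected code runs here while both locks are held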
(The Semaphore class does other work with aborting Celery tasks, but that’s not germane here. It’s a Redis distributed lock with extra fanciness.)
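Here’s a minimal sketch of a context manager with that shape that stays correct when exceptions occur. It is not the post’s actual `distlock` (whose code isn’t shown). It assumes each lock has an `acquire()` that returns a truthy value on success and a `release()`, as `threading.Lock` does, and it uses `contextlib.ExitStack` so that only locks actually acquired get released, in reverse order, even if a later acquire or the protected code raises.

```python
import threading
from contextlib import ExitStack, contextmanager


@contextmanager
def distlock(*locks):
    """Acquire each lock in order; release the ones held in reverse order.

    Sketch only: each lock is assumed to have acquire() returning a truthy
    value on success and release(), as threading.Lock does.
    """
    with ExitStack() as stack:
        for lock in locks:
            if not lock.acquire():
                # Locks acquired so far are released by the ExitStack.
                raise RuntimeError("could not acquire {!r}".format(lock))
            # Register the release only after a successful acquire, so we
            # never release a lock this call doesn't hold.
            stack.callback(lock.release)
        # ExitStack runs callbacks last-in, first-out: reverse acquisition
        # order, whether the protected code returns normally or raises.
        yield


if __name__ == "__main__":
    a, b = threading.Lock(), threading.Lock()
    try:
        with distlock(a, b):
            raise ValueError("protected code failed")
    except ValueError:
        pass
    print(a.locked(), b.locked())  # False False
```

The exception from the protected code still reaches the caller; the context manager only guarantees cleanup. The truthiness check matters for Redis-backed locks: redis-py’s `Lock.acquire()` returns `False` instead of raising when it can’t get the lock (for example, when non-blocking or after a blocking timeout).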
An update to an earlier post…
We’ve had problems using the pyrax SDK, mostly in account authentication.
First, it wasn’t at all clear when, or under what conditions, we had to re-authenticate our pyrax token. As documented, after you initially authenticate your credentials, pyrax handles all subsequent re-authentication under the covers. I.e., it will automatically re-authenticate the token if it ever expires.
This is kind of odd. I don’t understand why a good token should need re-authentication.
We then discovered that pyrax sometimes can’t re-authenticate our token! Every 19 hours, we hit a period of about five hours when our token won’t automatically authenticate. Why? I still don’t have a clear answer. Some authentication server, somewhere, clearly gets confused. You won’t run into this bug if you don’t have long-running processes. But, we do.
We host IP Street’s SaaS product at Rackspace. We’re finally taking the plunge and upgrading from python-cloudfiles to pyrax. We didn’t have any big issues with python-cloudfiles, but I was tiring of getting the brush-off from Rackspace when we asked for help with an API failure.
The benefits of keeping a technology up-to-date far outweigh the costs, unless you’re in an extreme corner case with a very unreliable vendor. Better performance, bug fixes, better capabilities, better support… all good stuff.
I’ve found some candidates for replacing Celery in my company’s product. (My reasons for replacing it are elucidated here, here, and here.)
I got these from web trawling, blog comments, and some e-mail. At first blush, none of the candidates have any disqualifying attributes, except for lacking subtasks. Celery is the only Python-friendly asynchronous task technology with subtask support, so I’ll need to bend on that if I want any alternatives to consider. (If I’m wrong on this point, please let me know in the comments!)
I’m not saying that these candidates will definitely satisfy all of my requirements (sans subtasks). Right now they’ve just passed my initial sniff test. The next step will be to read each one’s documentation in detail, assess the health and activity of its community and developers, and try some sample code.
I’m ready to start looking at candidates to replace Celery in my company’s product. (The reasons are elucidated here, here, and here.)
Our SaaS product provides data mining and visualization for intellectual property. A 10-second elevator pitch is, it’s as though we attached Microsoft Excel’s chart wizard to US and international patent offices. (“As though” = “We didn’t do that, and in fact we go way beyond that, but I’m giving you a simple description.”) Our code is 100% Django and Python.
I looked at how we use Celery in our codebase. The reality of how we use it is much simpler than our ideas when we started two years ago. Combining our existing features with our product roadmap, I know with high confidence what features we need for our asynchronous tasks. And which ones are nice to have but not required, and which ones we’ll probably never need.
Commenting on my update to my Celery rant, @asksol asked me to post the Pylint results that made me question the claim of backwards compatibility.
(“@Asksol asked” — See what I did there? That’s alliteration. It’s a sign of a quality blog post. Ask for it by name.)
Again for the record, @asksol is a smart and friendly person. I know I wouldn’t last a day supporting a project the way he has supported Celery over multiple years. I’ve calmed down since yesterday, and I hope that something good results from my rant — if not for me, then for a future Celery user needing upgrade help. In his reply to my rant, @asksol describes some history and rationale for how he manages code change, and I encourage you to read it.
Here we go: