We had more fun with a vendor today.
We license a vendor’s services for corporate information, like annual revenue and office locations. Their name shall be kept confidential. I’ve written about them before.
About two weeks ago, we noticed a slowdown in our API calls into their system.
We asked them about it, and they replied that they would take a look. A bit later, they said they had found the problem and were working on a solution.
Today, after working on new code, I ran my unit tests. A few tests make calls to this vendor. (Yeah, I could have mocked out the calls. But there are good reasons to not mock out calls in unit tests.) I was surprised to see those tests now fail.
Curiously, they failed because the API calls returned the response, “Customer Disabled”.
I switched to a browser window and tried a part of our product that used their API. I found that our product now failed with the same error. Uh oh.
I e-mailed the vendor and asked what’s up. Their answer:
We found that our service was being slowed down by your API calls. So we disabled your API key.
I am not kidding. Continue reading after you’ve caught your breath.
We went to DEFCON 2. And eventually pieced together this story:
- A little over two weeks ago, we added a new feature that sent this vendor more requests containing Unicode characters. Most likely (I still don’t have all the details) we’re now asking for information for more international companies, which can contain Unicode characters in their names, addresses, or city names.
- We knew about our code change, of course. But since our requests were within our SLA of five queries per second, we didn’t give it a second thought. We didn’t collect statistics on the types of names sent to this vendor, because there wasn’t any reason to.
- But it caused their system to choke. It’s still not clear why. We’re throttled to 5 QPS, and their service is advertised as having foreign company information. Sending them more company names or addresses with Unicode characters should be NBD.
- Whatever the true cause, our calls affected their response time to other customers’ API calls. (!)
- They decided to fix this by disabling our API key. Without telling us.
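In hindsight, a small tally of what we send would have let us answer "did the mix of names change?" in minutes. Here is a minimal sketch of that idea, not our actual code: the `RequestStats` class and the field names are hypothetical, and it only separates requests that are pure ASCII from those that contain any non-ASCII character.

```python
from collections import Counter


def classify_request(fields):
    """Return 'ascii' if every field value is pure ASCII, else 'non_ascii'."""
    if all(value.isascii() for value in fields.values()):
        return "ascii"
    return "non_ascii"


class RequestStats:
    """Hypothetical in-memory tally of outgoing vendor requests."""

    def __init__(self):
        self.counts = Counter()

    def record(self, fields):
        # Call this just before sending a request to the vendor.
        kind = classify_request(fields)
        self.counts[kind] += 1
        return kind

    def non_ascii_share(self):
        # Fraction of recorded requests containing non-ASCII text.
        total = sum(self.counts.values())
        return self.counts["non_ascii"] / total if total else 0.0


stats = RequestStats()
stats.record({"name": "Acme Corp", "city": "Boston"})
stats.record({"name": "Société Générale", "city": "Paris"})
```

A real version would probably push these counts to whatever metrics system is already in place rather than keep them in memory.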
There were so many unforced errors here that I almost don’t know where to begin.
- Their system is designed such that one customer can significantly affect their other customers. We weren’t doing a DoS attack — we were using their system as documented, and within our QPS limit. This is a system with no headroom or scalability.
- They didn’t know about the performance problem until we told them. So they have no system or application monitoring.
- They concluded Unicode characters were the cause. If this conclusion is wrong, there’s an even more intense failure here. If it’s right, then their system for returning information on international companies chokes on Unicode characters.
- They’ve previously asked us to change our application to reduce their system load. We found these requests odd, but we made the changes; and in all fairness, our application wound up better for the changes anyway. We’ve always been easy to contact, responded quickly, and cooperated successfully with them.
- But they concluded this time that they didn’t want to talk to us.
- So they disabled our API key without warning.
- And they didn’t tell us after the fact, either. We found out only because I saw a problem at our end.
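For reference, this is roughly what per-customer throttling looks like. It is a sketch under my own assumptions, not the vendor's implementation (I have no idea what they run): a token bucket per API key, with hypothetical names `TokenBucket` and `PerKeyLimiter`, and the 5 QPS figure taken from our limit. Note that it only bounds request counts. It does nothing about requests that are individually expensive, which seems to be what actually happened here.

```python
import time


class TokenBucket:
    """Allows up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        # Refill in proportion to elapsed time, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerKeyLimiter:
    """Keeps a separate bucket per API key so one customer cannot use another's quota."""

    def __init__(self, rate=5, capacity=5, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, api_key):
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[api_key] = bucket
        return bucket.allow()
```

A request over the limit gets rejected (typically with an HTTP 429) and the customer's key keeps working, which is a rather different outcome from "Customer Disabled".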
Late today during a concall, their primary technical support person repeatedly tried justifying their actions. He wouldn’t admit that what they did was unprofessional.
We’re all still in a bit of shock over this. I hope we’ll get more details tomorrow.
3 thoughts on “How to hose a customer with your API”
Wow. I just have to ask, are these people the only vendor? Why the lock in?
They aren’t the only source of this kind of information. Being a start-up, we have limited resources, so we’ll have to prioritize finding alternatives against all the other things we want to do.
But our first goal is to return to the status quo. Late yesterday, they turned our API key back on, but in return we disabled the part of our system that was the “problem.” Our first priority has to be “fixing” things so we can enable it again.
They promised to send us logs of the requests that choked their system. Can’t wait to see those.
The irony is that you sent them a complaint about the service being too slow. They used the data they gathered from your complaint to realize that their service was slow, and then shut you out. Your intended goal was to reduce the response time from 15 sec to 5 sec, but instead they increased it from 15 sec to infinity!