We had more fun with a vendor today.
We license a vendor’s services for corporate information, like annual revenue and office locations. Their name shall be kept confidential. I’ve written about them before.
About two weeks ago, we noticed a slowdown in our API calls into their system.
We asked them about it, and they replied that they would take a look. A bit later, they said they had found the problem and were working on a solution.
Today, after working on new code, I ran my unit tests. A few tests make calls to this vendor. (Yeah, I could have mocked out the calls. But there are good reasons to not mock out calls in unit tests.) I was surprised to see those tests now fail.
Curiously, they failed because the API calls returned the response, “Customer Disabled”.
I switched to a browser window and tried a part of our product that used their API. I found that our product now failed with the same error. Uh oh.
I e-mailed the vendor and asked what’s up. Their answer:
We found that our service was being slowed down by your API calls. So we disabled your API key.
I am not kidding. Continue reading after you’ve caught your breath.
We went to DEFCON 2. And eventually pieced together this story:
- A little over two weeks ago, we added a new feature that sent this vendor more requests containing Unicode characters. Most likely (I still don’t have all the details) we’re now asking about more international companies, whose names, addresses, or city names can contain Unicode characters.
- We knew about our code change, of course. But since our requests stayed within our SLA of five queries per second (QPS), we didn’t give it a second thought. We didn’t collect statistics on the types of names sent to this vendor, because there wasn’t any reason to.
- But it caused their system to choke. It’s still not clear why. We’re throttled to 5 QPS, and their service is advertised as having foreign company information. Sending them more company names or addresses with Unicode characters should be no big deal.
- Whatever the true cause, our calls affected their response time to other customers’ API calls. (!)
- They decided to fix this by disabling our API key. Without telling us.
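In hindsight, the statistic we never collected would have been cheap to get. Here is a minimal sketch of the idea, not our actual code: the function name and the sample company names are made up for illustration, and it assumes that "contains a non-ASCII character" is a good enough stand-in for whatever the vendor's system trips on.

```python
def non_ascii_share(names):
    """Return the fraction of names containing at least one non-ASCII character."""
    if not names:
        return 0.0
    # str.isascii() is available in Python 3.7 and later.
    flagged = sum(1 for name in names if not name.isascii())
    return flagged / len(names)


# Hypothetical sample of company names queued for lookup.
sample = ["Acme Corp", "Société Générale", "Müller GmbH", "IBM"]
print(f"{non_ascii_share(sample):.0%} of names contain non-ASCII characters")
```

Logging a number like this once a day would have shown the shift in our request mix the moment the new feature shipped.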
There were so many unforced errors here that I almost don’t know where to begin.
- Their system is designed such that one customer can significantly affect their other customers. We weren’t mounting a DoS attack; we were using their system as documented, and within our QPS limit. This is a system with no headroom or scalability.
- They didn’t know about the performance problem until we told them. So they apparently have no effective system or application monitoring.
- They concluded Unicode characters were the cause. If this conclusion is wrong, there’s an even more intense failure here. If it’s right, then their system for returning information on international companies chokes on Unicode characters.
- They’ve previously asked us to change our application to reduce their system load. We found these requests odd, but we made the changes; and in all fairness, our application wound up better for the changes anyway. We’ve always been easy to contact, responded quickly, and cooperated successfully with them.
- But they concluded this time that they didn’t want to talk to us.
- So they disabled our API key without warning.
- And they didn’t tell us after the fact, either. We found out only because I saw a problem at our end.
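For what it's worth, noticing a slowdown like the one we reported doesn't take much machinery. The following is a rough sketch under my own assumptions, not anything the vendor runs: the class name, window sizes, and the 2x threshold are all invented for illustration. It compares the median of the most recent response times against the median of an older baseline.

```python
from collections import deque
from statistics import median


class LatencyWatch:
    """Flag a slowdown when recent latencies outgrow an older baseline (illustrative only)."""

    def __init__(self, baseline_size=100, recent_size=20, factor=2.0):
        self.baseline = deque(maxlen=baseline_size)  # older samples
        self.recent = deque(maxlen=recent_size)      # newest samples
        self.factor = factor                         # hypothetical alert threshold

    def record(self, seconds):
        # When the recent window is full, its oldest sample ages into the baseline.
        if len(self.recent) == self.recent.maxlen:
            self.baseline.append(self.recent[0])
        self.recent.append(seconds)

    def slowed_down(self):
        # Wait until both windows hold enough data to compare.
        if len(self.recent) < self.recent.maxlen:
            return False
        if len(self.baseline) < self.baseline.maxlen // 2:
            return False
        return median(self.recent) > self.factor * median(self.baseline)
```

Call `record()` with the elapsed time of each API call, and check `slowed_down()` on whatever schedule suits you. Medians keep a single slow request from triggering a false alarm.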
Late today during a conference call, their primary technical support person repeatedly tried to justify their actions. He wouldn’t admit that what they did was unprofessional.
We’re all still in a bit of shock over this. I hope we’ll get more details tomorrow.