
API Observability News

These are the news items I've curated in my monitoring of the API space that have some relevance to the API observability conversation, and that I wanted to include in my research. I'm using all of these links to better understand how the space is testing their APIs, going beyond just monitoring to understand the details of each request and response.

Cautiously Aware Of How APIs Can Be Used To Feed Government Privatization Machinations

I'm working to define APIs at the Department of Veterans Affairs (VA) on several fronts right now. I've provided not just one, but two detailed responses to their Lighthouse API platform RFIs. Additionally, I'm providing guidance to different teams who are working on a variety of projects occurring simultaneously to serve veterans who receive support from the agency.

As I do this work, I am hyper aware of the privatization machinations of the current administration, and the potential for APIs to serve these desires. APIs aren't good, bad, nor are they neutral. APIs reflect the desires of their owners, and the good or bad they can inflict is determined by their consumers. Because APIs are so abstract, and are often buried within web and mobile applications, it can be difficult to see what they are truly doing on a day-to-day basis. This is why we have API management solutions in place to help paint a picture of activity in real time, but as we've seen with the Facebook / Cambridge Analytica situation, it requires the API provider to be paying attention, and to possess incentives that ensure they will actually care about responding to negative events.

By re-defining existing VA internal systems as APIs, we are opening up the possibility that digital resources can be extracted, and transferred externally. Of course, if these systems remain the sources of data, VA power and control can be retained, but it also opens up the possibilities for power to eventually be transferred externally–reversing the polarities of government power and putting it into private hands. Depending on your politics, this could be good, or it could be bad. Personally, I’m a fan of a balance being struck, allowing government to do what it does best, and allowing private, commercial entities to help fuel and participate in what is happening–both with proper oversight. This reflects the potential benefits APIs can bring, but this good isn’t present by default, and could shift one way or the other at any moment.

Early on, APIs appeared as a promising way to deliver value to external developers. They held the promise of delivering unprecedented resources to developers, things they would not normally be able to get access to. Facebook promised democratization using its social platform, while silently mining and extracting value from its application end-users, executed by often unknowing, but sometimes complicit, developers. A similar transfer of power has occurred with Google Maps, which promised unprecedented access to maps for developers, as well as web and mobile maps for cities, businesses, and other interests. Over time we've seen that the Google Maps APIs have become very skilled at extracting value from cities, businesses, and end-users, again executed unknowingly, and knowingly, by developers. This process has been playing out over and over throughout the last decade of the open data evolution occurring across government at all levels.

Facebook and Google have done this at the platform level, extracting value and maintaining control within their operations, but the open data movement, Facebook and Google included, has been efficiently extracting value from government, and giving as little back as it possibly can. Sure, one can argue that government benefits from the resulting applications, but this model doesn't ensure that the data and content stewards within government retain ownership or control over data, or benefit from revenue generated from the digital resources–leaving them struggling with tight budgets, and rarely improving upon the resources. This is what I fear will happen at the Department of Veterans Affairs (VA): privatization supporters not actually developing applications that benefit veterans, but focusing on extracting the value possessed by the VA, capturing it, and generating revenue from it, without ever giving back to the VA–and never being held accountable for actually delivering value to veterans.

I have no way of testing the political motivations of the people I am working with. I also don't want to slow or stop the potential of APIs being used to help veterans, and ensure their health and insurance information can become more interoperable. I just want to be mindful of this reality, and revisit it often, openly discussing it as I progress in my work, and as the VA Lighthouse platform evolves. Sometimes I regret all the API evangelism I have done, as I witness the damage APIs can do. However, the cat is out of the bag, and I'd rather be involved, part of the conversation, and pushing for observability, transparency, and open dialogue about the realities of opening up government digital resources to private entities via APIs–allowing people who truly care about serving veterans to help police what is going on, maximizing the value and control over digital resources in the hands of people who are truly serving veterans' interests, and minimizing access by those who are just looking to make money at veterans' expense.


Facebook And Twitter Only Now Beginning To Police Their API Applications

I've been reading about all the work Facebook and Twitter have been doing over the last couple of weeks to begin asserting more control over their API applications. I'm not talking about the deprecation of APIs, that is a separate post. I'm focusing on them reviewing applications that have access to their API, and shutting off access to the ones who aren't adding value to the platform, or are violating the terms of service. Doing the hard work to maintain a level of quality on the platform, which is something they should have been doing all along.

I don't want to diminish the importance of the work they are doing, but it really is something that should have been done along the way–not just when something goes wrong. This kind of behavior sets the wrong tone across the API sector, and people tend to focus on the thing that went wrong, rather than the best practices of what you should be doing to maintain quality across API operations. Other API providers will hesitate to launch public APIs because they won't want to experience the same repercussions Facebook and Twitter have, completely overlooking the fact that you can have public APIs, and maintain control along the way. This sets the wrong precedent for API providers to emulate, and damages the overall reputation of operating public APIs.

Facebook and Twitter have both had the tools all along to police the applications using their APIs. The problem is that the incentives to do so, and to prioritize these efforts, aren't there, due to an imbalance in their business model, and a lack of diversity in their leadership. When you have a bunch of white dudes with a libertarian ethos pushing a company towards profitability with an advertising driven business model, investing in quality control at the API management layer just isn't a priority. You want as many applications, users, and as much activity as you possibly can get, and when you don't see the abuse, harassment, and other illnesses, there really is no problem from your vantage point. That is, until you get called out in the press, or are forced to testify in front of congress. The reason us white dudes get away with this is that there are no repercussions; we just get to ignore things until they become a problem, apologize, perform a little bit to show we care, and wait until the next problem occurs.

This is the wrong API model to put out there. API providers need to see the benefits of properly reviewing applications that want access to their APIs, and the value of setting a higher bar for how applications use the API. There should be regular reviews of active applications, and audits of how they are accessing, storing, and putting resources to work. This isn't easy work, or inexpensive to do properly. It isn't something you can put off until you get in trouble. It is something that should be done from the beginning, and conducted regularly, as part of the operations of a well funded team. You can have public APIs for a platform, and avoid privacy, security, and other shit-shows. If you need an example of doing it well, look at Slack, who has a successful public API, even with a high level of bot automation, but somehow manages to stay out of the spotlight for doing dumb things. It is because their API management practices are in better alignment with their business model–the incentives are there.

For the next 3-5 years I'm going to have to hear from companies who aren't doing public APIs because they don't want to make the same mistakes as Facebook and Twitter. All because Facebook and Twitter were able to get away with such bad behavior for so long, avoided doing the hard work of managing their API platforms, and received so much bad press. All in the name of growth and profits at all costs. Now, I'm going to have to write a post every six months showcasing Facebook and Twitter as pioneers for how NOT to run your platforms, explaining the importance of healthy API management practices, and of investing in your API teams so they have the resources to do it properly. I'd rather have positive role models to showcase than poorly behaved ones, where I have to work overtime to change perceptions and alter API providers' behavior. As an API community let's learn from what has happened and invest properly in our API management layers, properly screen and get to know who is building applications on our resources, and regularly tune into and audit their behavior. Yes, it takes more investment, time, and resources, but in the end we'll all be better off for it.


Government Has Benefitted From Lack Of Oversight At The Social API Management Layers As Well

There are many actors who have benefitted from Facebook not properly managing their API platform. The platform collected so many data points, tracked so many users, and looked the other way as 3rd party developers put it all to use in a variety of applications. Facebook did what was expected of them, and focused on generating advertising revenue by allowing their customers to target the many data points that are generated by the web and mobile applications developed on top of their platform.

I hear a lot of complaining that this is just about democrats being upset that it was the republicans who were doing this, when in reality there are so many camps complicit in this game that if you are focusing on any single camp, you are probably looking to shift attention and blame from your own camp, in my opinion. I'm all for calling everyone out, even ourselves, for being so fucking complicit in this game where we allow ourselves to be turned into a surveillance marketplace. Something that benefits the Mark Zuckerbergs, Cambridge Analyticas, and the surveillance capitalist platforms, while also providing a pipeline to government, and law enforcement.

If government is going to jump on the bandwagon calling for change at Facebook, they need to acknowledge that they've been benefitting from the data free-for-all that is the Facebook API. We've heard stories about 3rd party developers building applications for law enforcement to target Black Lives Matter, all the way up to Peter Thiel's (a Facebook investor) Palantir, which actively works with the NSA, and other law enforcement. If we are going to shine a light on what is happening on Facebook, let's make it a spotlight and call out ALL of the actors who have been sucking users' data out of the API, and better understand what they have been doing. I'm all for a complete audit of what the RNC, DNC, and EVERY other election, policing, and government entity has been doing with the social data they have access to.

Let's bring all of the activity out of the shadows. Let's take EVERY Facebook application and publish it as part of a public database, with an API for making sense of who is behind each one. Let's require every application to share the names of the individuals and business interests behind it, as well as its data retention, storage, and sharing practices. Let's make the application database observable by the public, journalists, and researchers. If we are going to understand what is going on with our data, we need to turn the lights on in the application warehouse that is built on the Facebook API. Let's not just point the finger at one group, let's shine a spotlight on EVERYONE, and make sure we are actually being honest at this point in time, or we are going to see this same scenario play out every five years or so.
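
To make this concrete, here is a rough sketch of what a single machine readable entry in that kind of public application database could look like. Every field name here is my own assumption, not an existing standard, but it captures the individuals, business interests, and data practices I want out in the open:

```yaml
# Hypothetical entry in a public directory of platform API applications.
# All field names are illustrative, not an existing platform schema.
application:
  id: app-0001
  name: Example Quiz App
  platform: facebook
  status: active
  operators:
    - name: Example Analytics, LLC
      role: owner
      jurisdiction: US
  funding_sources:
    - Example Holdings, Inc.
  data_practices:
    scopes_requested: [public_profile, user_friends]
    retention_period_days: 90
    third_party_sharing: true
    sharing_partners:
      - Example Research Partner
  audit:
    last_reviewed: 2018-03-01
    reviewed_by: platform
```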


Facebook, Cambridge Analytica, And Knowing What API Consumers Are Doing With Our Data

I'm processing the recent announcement by Facebook to shut off Cambridge Analytica's access to its valuable social data. The story emphasizes the importance of real time awareness of, and response to, API consumers at the API management level, as well as the difficulty in ensuring that API consumers are doing what they should be with the data and content being made available via APIs. Managing access to platforms using APIs is more art than science, but there are some proven ways to help mitigate serious abuses, identify the bad actors early on, and prevent their operation within the community.

While I applaud Facebook's response, I'm guessing they could have taken action much earlier on. Their response is more about damage control for their reputation, after the fact, than it is about preventing the problem from happening. Facebook most likely had plenty of warning signs regarding what Aleksandr Kogan, Strategic Communication Laboratories (SCL), and their political data analytics firm, Cambridge Analytica, were up to. If they didn't, then that is a problem in itself, and Facebook should be investing in more policing of their API consumers' activity, as they claim they are doing in their release.

If Aleksandr Kogan has that many OAuth tokens for Facebook users, then Facebook should be up in his business, better understanding what he is doing, where his money comes from, and who his partners are. I'm guessing Facebook probably had more knowledge, but because it drove traffic, generated ad revenue, and was in alignment with their business model, it wasn't a problem. They were willing to look the other way with the data sharing that was occurring, until it became a wider problem for the election, our democracy, and in the press. Facebook should have more awareness, oversight, and enforcement at the API management layer of their platform.

This situation highlights another problem of doing APIs: ensuring API consumers are behaving appropriately with the data, content, and algorithms they are accessing. It can be tough to police what a developer does with data once they've pulled it from an API–where they store it, and who they share it with. You just can't trust that all developers will have the platform's, and its end-users', best interests in mind. Once the data has left the nest, you really don't have much control over what happens with it. There are ways you can identify unhealthy patterns of consumption via the API management layer, but Aleksandr Kogan's quizzes probably would have appeared as a normal application pattern, with no clear signs of the relationships, and data sharing, going on behind the scenes.
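
That said, the crude end of this kind of pattern detection isn't complicated. A minimal sketch of what I mean, assuming an API management layer that already logs token grants and API calls per application (the event names, fields, and thresholds here are all hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical event records from an API management layer:
# {"app_id": str, "event": "token_granted" or "api_call", "timestamp": datetime}

def flag_suspicious_apps(events, window_days=7, token_threshold=10000,
                         ratio_threshold=50.0):
    """Flag applications accumulating OAuth tokens far faster than they make
    API calls with them -- a crude proxy for harvesting rather than usage."""
    cutoff = datetime.utcnow() - timedelta(days=window_days)
    stats = {}
    for event in events:
        if event["timestamp"] < cutoff:
            continue
        app = stats.setdefault(event["app_id"], {"tokens": 0, "calls": 0})
        if event["event"] == "token_granted":
            app["tokens"] += 1
        elif event["event"] == "api_call":
            app["calls"] += 1
    flagged = []
    for app_id, counts in stats.items():
        ratio = counts["tokens"] / max(counts["calls"], 1)
        if counts["tokens"] > token_threshold or ratio > ratio_threshold:
            flagged.append((app_id, counts["tokens"], counts["calls"]))
    return flagged  # candidates for a human review of the application
```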

While I sympathize with Facebook's struggle to police what people do with their data, I also know they haven't invested in API management as much as they should have, and they are more than willing to overlook bad behavior when it supports their bottom line. The culture of the tech space supports and incentivizes this type of bad behavior from platforms, as well as from consumers like Cambridge Analytica. This is something that regulations like the GDPR out of the EU are looking to correct, but the culture in the United States is all about exploitation at this level, at least until it becomes front page news; then of course you act concerned, and begin acting accordingly. The app, big data, and API economy runs on the generating, consuming, buying, and selling of people's data, and this type of practice isn't going to go away anytime soon.

As Facebook states, they are taking measures to rein in bad actors in their developer community by being more strict in their application review process. I agree, a healthy application review process is an important aspect of API management. However, this does not address the regular review of application usage at the API management level, assessing applications' consumption as they accumulate access tokens to more users' data, and go viral. I'd like to have more visibility into how Facebook will be regularly reviewing, assessing, and auditing applications. I'd even go so far as requiring more observability into ALL applications that are using the Facebook API, providing a community directory that will encourage transparency around what people are building. I know that sounds crazy from a platform perspective, but it isn't, and it would actually force Facebook to know their customers.

If platforms truly want to address this problem they will embrace more observability around what is happening in their API communities. They would allow certified and verified researchers and auditors to get at application-level consumption data available at the API management layer. I'm sorry y'all, self-regulation isn't going to cut it here. We need independent 3rd party access at the platform API level to better understand what is happening, otherwise we'll only see platform action after problems occur, and when major news stories are published. This is the beauty / ugliness of APIs. The cat's out of the bag, and platforms need them to innovate, deliver resources to web, mobile, and device applications, as well as remain competitive. APIs also provide the opportunity to peek behind the curtain, better understand what is happening, and profile the good and the bad actors within each ecosystem–let's take advantage of the good here, to help regulate the bad.


Insecurity Around Providing Algorithmic Transparency And Observability Using APIs

I’m working on a ranking API for my partner Streamdata.io to help quantify the efficiencies they bring to the table when you proxy an existing JSON web API using their service. I’m evolving an algorithm they have been using for a while, wrapping it in a new API, and applying it across the APIs I’m profiling as part of my API Stack, and the Streamdata.io API Gallery work. I can pass the ranking API any OpenAPI definition, and it will poll and stream the API for 24 hours, and return a set of scores regarding how real time the API is, and what the efficiency gains are when you use Streamdata.io as a proxy for the API.
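
The actual scoring algorithm isn't published yet, but the polling half of what I'm describing is simple enough to sketch. Assuming a single GET path pulled from the OpenAPI definition, something like this measures how often the response actually changes, which feeds the "how real time is this API" score (the function and return values are placeholders, not the Streamdata.io algorithm itself):

```python
import hashlib
import time

import requests

def poll_for_change_rate(url, interval_seconds=60, duration_seconds=3600):
    """Poll a JSON API on a fixed interval, and measure how often the
    response body actually changes -- a rough 'how real time is it' signal."""
    last_digest = None
    polls, changes, total_bytes = 0, 0, 0
    end = time.time() + duration_seconds
    while time.time() < end:
        body = requests.get(url, timeout=10).content
        total_bytes += len(body)
        digest = hashlib.sha256(body).hexdigest()
        if last_digest is not None and digest != last_digest:
            changes += 1
        last_digest = digest
        polls += 1
        time.sleep(interval_seconds)
    return {
        "polls": polls,
        "change_rate": changes / max(polls - 1, 1),
        # bytes moved by polling, the baseline an incremental stream improves on
        "bytes_transferred": total_bytes,
    }
```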

As I do this work, I find myself thinking more deeply about the role that APIs can play in helping make algorithms more transparent, observable, and accountable. My API ranking algorithm is pretty crude, but honestly it isn't much different than many other algorithms I've seen companies defend as intellectual property and their secret sauce. Streamdata.io is invested in the ranking algorithm and API being as transparent as possible, so that isn't a problem here, but each step of the process allows me to think through how I can continue evangelizing to other algorithm owners that they should use APIs to make their algorithms more observable and accountable.

In my experience, most of the concerns around keeping algorithms secret stem from individual insecurities, and nothing actually technical, mathematical, or proprietary. The reasons for the insecurities are usually that the algorithm isn’t that mathematically sophisticated (I know mine isn’t), or maybe it is pretty flawed (I know mine is currently), and people just aren’t equipped to admit this (I know I am). I’ve worked for companies who venomously defend their algorithms and refuse to open them up, because in the end they know they aren’t defensible on many levels. The only value the algorithm possesses in these scenarios is secrecy, and the perception that there is magic going on behind the scenes. When in reality, it is a flawed, crude, simple algorithm that could actually be improved upon if it was opened up.

I'm not insecure about my lack of mathematical skills, or the limitations of my algorithm. I want people to point out its flaws, and improve upon my math. I want the limitations of the algorithm to be pointed out. I want API providers and consumers to use the algorithm via the API (when I publish it) to validate, or challenge, the algorithmic assumptions being put forth. I'm not in the business of selling smoke and mirrors, or voodoo algorithmics. I'm in the business of helping people understand how inefficient their API responses are, and how they can possibly improve upon them. I'm looking to develop my own understanding of how I can make APIs more event-driven, real time, and responsive. I'm not insecure about providing transparency and observability around the algorithms I develop, using APIs–all algorithm developers should be as open and confident in their own work.


More Outputs Are Better When It Comes To Establishing An API Observability Ranking

I've been evolving an observability ranking for the APIs I track on for a couple years now. I've been using the phrase to describe my API profiling and measurement approach since I first learned about the concept from Stripe. There are many perspectives floating around the space about what observability means in the context of technology, however mine is focused completely on APIs, and is more about communicating with external stakeholders than it is just about monitoring of systems. To recap, the Wikipedia definition for observability is:

Formally, a system is said to be observable if, for any possible sequence of state and control vectors, the current state can be determined in finite time using only the outputs (this definition is slanted towards the state space representation). Less formally, this means that from the system’s outputs it is possible to determine the behavior of the entire system. If a system is not observable, this means the current values of some of its states cannot be determined through output sensors.

Most of the conversations occurring in the tech sector are focused on monitoring operations, and while this is a component of my definition, I lean more heavily on the observing occurring beyond just internal groups, with observability being about helping partners, consumers, regulators, and other stakeholders be more aware of how complex systems work, or do not work. I feel that observability is critical to the future of algorithms, and making sense of how technology is impacting our world, and APIs will play a critical role in ensuring that platforms have the external outputs required for delivering meaningful observability.

When it comes to quantifying the observability of platforms and algorithms, the more outputs available the better. Everything should have APIs for determining the inputs and outputs of any algorithm, or other system, but there should also be operational level APIs that give insight into the underlying compute, storage, logging, DNS, and other layers of delivering technological solutions. There should also be higher level business layer APIs surrounding communication via blog RSS, Twitter feeds, and support channels like email, ticketing, and other systems. The more outputs around platform operations there are, the more we can measure, quantify, and determine how observable a platform is using the outputs that already exist for ALL the systems in use across operations.

To truly measure the observability of a platform I need to be able to measure the technology, business, and politics surrounding its operation. If communication and support exist in the shadows, a platform is not observable, even if there are direct platform APIs. If you can't get at the operational layer, like logging, or possibly the Github repositories used as part of continuous integration or deployment pipelines, observability is diminished. Of course, not all of these outputs should be publicly available by default, but in many cases, there really is no reason they can't be. At a minimum there should be API access, with some sort of API management layer in place, allowing 3rd party auditors, and analysts like me, to get at some, or all, of the existing outputs, allowing us to weigh in on an overall platform observability workflow.

As I continue to develop my API observability ranking algorithm, the first set of values I calculate is the number of existing outputs an API has, taking into consideration the scope of core and operational APIs, but also whether I can get at operations via Twitter, Github, LinkedIn, and other 3rd party APIs. I find many of these channels more valuable for understanding platform operations than the direct APIs themselves. Chatter by employees, and commits via Github, can provide more telling signals about what is happening than the intentional signals emitted directly by the platform itself. Overall, the more outputs available the better. API observability is all about leveraging existing outputs, and when companies, organizations, institutions, and government agencies are using solutions that have existing APIs, they are more observable by default, which can have a pretty significant impact in helping us understand the impact a technological solution is having on everyone involved.
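
For a sense of what that first pass looks like in practice, here is a stripped down sketch of counting and weighting a platform's outputs. The output list and weights are just my working assumptions, not a finished ranking:

```python
# Crude first pass of an observability ranking: which outputs exist?
# The output names and weights are working assumptions, not a standard.
OUTPUT_WEIGHTS = {
    "core_apis": 3,         # the platform's own APIs
    "operational_apis": 2,  # compute, storage, logging, DNS layers
    "status_page": 1,
    "blog_rss": 1,
    "twitter": 1,
    "github": 2,            # commits often signal more than press releases
    "support_channels": 1,
}

def observability_score(profile):
    """Score a platform by the outputs it exposes. `profile` maps each
    output name to whether it exists and is accessible."""
    earned = sum(w for name, w in OUTPUT_WEIGHTS.items() if profile.get(name))
    return earned / sum(OUTPUT_WEIGHTS.values())  # normalize to 0..1

# Example: APIs, a blog feed, and a Github org, but no status page or
# operational APIs, scores (3 + 1 + 2) / 11.
print(observability_score({"core_apis": True, "blog_rss": True, "github": True}))
```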


APIs And Other Ways Of Serving Up Machine Learning Models

As with most areas of the tech sector, behind the hype there are real world things going on, and machine learning is one area I've been studying, learning, and playing with, to understand what is actually possible when it comes to APIs. I've been studying the approach across each of the major cloud platforms, including AWS, Azure, and Google, to push forward my understanding of the ML landscape. Recently the Google Tensorflow team released an interesting overview of how they are serving up Tensorflow models, making machine learning accessible across a wide variety of use cases. Not all of these are API specific, but I do think they should be considered equally as part of the wider machine learning (ML) application programming interface (API) delivery approach.

Over the past year and a half, with the help of our users and partners inside and outside of Google, TensorFlow Serving has advanced performance, best practices, and standards for ML delivery:

  • Out-of-the-box Optimized Serving and Customizability - We now offer a pre-built canonical serving binary, optimized for modern CPUs with AVX, so developers don’t need to assemble their own binary from our libraries unless they have exotic needs. At the same time, we added a registry-based framework, allowing our libraries to be used for custom (or even non-TensorFlow) serving scenarios.
  • Multi-model Serving - Going from one model to multiple concurrently-served models presents several performance obstacles. We serve multiple models smoothly by (1) loading in isolated thread pools to avoid incurring latency spikes on other models taking traffic; (2) accelerating initial loading of all models in parallel upon server start-up; (3) multi-model batch interleaving to multiplex hardware accelerators (GPUs/TPUs).
  • Standardized Model Format - We added SavedModel to TensorFlow 1.0, giving the community a single standard model format that works across training and serving.
  • Easy-to-use Inference APIs - We released easy-to-use APIs for common inference tasks (classification, regression) that we know work for a wide swathe of our applications. To support more advanced use-cases we support a lower-level tensor-based API (predict) and a new multi-inference API that enables multi-task modeling.
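
To give a sense of how approachable these inference APIs are, here is a minimal call against TensorFlow Serving's REST endpoint, assuming a recent Serving build with the REST API enabled on the default port 8501, and the standard half_plus_two demo model loaded:

```python
import json

import requests

# Minimal REST inference call against a local TensorFlow Serving instance.
# Assumes the demo "half_plus_two" model is loaded, and the REST endpoint
# is enabled on the default port 8501.
url = "http://localhost:8501/v1/models/half_plus_two:predict"
payload = {"instances": [1.0, 2.0, 5.0]}

response = requests.post(url, data=json.dumps(payload))
response.raise_for_status()

# The demo model computes x * 0.5 + 2, so this prints [2.5, 3.0, 4.5].
print(response.json()["predictions"])
```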

I like that easy-to-use APIs are on the list. I predict that ease of use, and a sensible business model, will prove to be just as important as the ML model itself. Beyond these core approaches, the Tensorflow team is also exploring two experimental areas of ML delivery:

  • Granular Batching - A key technique we employ to achieve high throughput on specialized hardware (GPUs and TPUs) is “batching”: processing multiple examples jointly for efficiency. We are developing technology and best practices to improve batching to: (a) enable batching to target just the GPU/TPU portion of the computation, for maximum efficiency; (b) enable batching within recursive neural networks, used to process sequence data e.g. text and event sequences. We are experimenting with batching arbitrary sub-graphs using the Batch/Unbatch op pair.
  • Distributed Model Serving - We are looking at model sharding techniques as a means of handling models that are too large to fit on one server node or sharing sub-models in a memory-efficient way. We recently launched a 1TB+ model in production with good results, and hope to open-source this capability soon.

I'm working to develop my own ML models, and actively using a variety of them available over at Algorithmia. I'm pushing forward a project to help me quantify and categorize the ML solutions I come across, and when they are delivered using an API, I am generating an OpenAPI, and applying a set of common tags to help me index all the ML APIs. While I am focused on the technology, business, and politics of ML APIs, I'm also interested in understanding how ML models are being delivered, so that I can keep pace with exactly how APIs can augment these evolving practices, making ML more accessible, as well as observable.

We are in the middle of the algorithmic evolution of the API space, where we move beyond just data and content APIs, and where ML, AI, and other algorithmic voodoo is becoming more mainstream. I feel pretty strongly that APIs are an important piece of this puzzle, helping us quantify and understand what ML algorithms do, or don't do. No matter how the delivery model for ML evolves, batches, distributes, or any other mutation, there should be an API layer for accessing, auditing, and shining a light on what is going on. This is why I regularly keep an eye on what teams like Tensorflow are up to, so that I can speak intelligently to what is happening on this evolving ML front.


Hiding APIs In Plain Sight

I'm always surprised by how secretive folks are. I know that it is hard for many folks to be as transparent as I am with my work, but if you are doing public APIs, I have a basic level of expectation that you are going to be willing to talk and share stories publicly. I regularly have conversations with enterprise folks who are unwilling to talk about what they are doing on the record, or allow me to share stories about their PUBLIC API EFFORTS!!! I get the super secret internal stuff. I've worked for the government. I don't have a problem keeping things private when they should be, but the secretive nature of companies around public API efforts continues to leave me shaking my head.

People within enterprise groups almost seem paranoid when it comes to people keeping an eye on what they are up to. I don't doubt their competitors keep an eye on what they are doing, but thinking that people are watching every move, everything that is published, and will be able to understand what is going on, and connect the dots, is borderline paranoia. I publish almost everything I do, public by default, on Github repositories, and my readers, clients, and other folks still have trouble finding what I am doing. You can Google API Evangelist + any API topic and find what I'm working on each day, or you can use the Github search to look across my repositories, and I still have an inbox and social messaging full of requests for information.

My public by default stance has done amazing things for my search engine and social media presence. I don't even have to optimize things, and I come up for almost every search. I get regular waves of connections from folks on topics ranging from student art to the government, because of my work. The schema, API definitions, documentation, tests, and stories for any API I am working on are public from day one of the life cycle. Until I bring it together into a single landing page, or public URL, the chances someone will stumble across it, and leverage my work before I'm ready for it to be public, are next to zero. The upside is that when I'm ready to go public with a project, by the time I hit publish, and make it available via the primary channels for my network, things are well indexed, and easily found via the usual online channels.

I feel like much of the competitive edge that enterprise groups enjoy is simply because they are secretive. There really isn't a thing there. No secret sauce. Just secret. I find that secrecy tends to hide nothing, or at least is hiding incompetency, or shady behavior. I was talking with a big data group the other day that was looking for a contractor skilled in their specific approach to doing APIs. I asked for a link to their public APIs, so I could assess this specific approach, and they declined, stating that things were private. Ok, so you want me to help find someone who knows about your API data thing, is well versed in your API data thing, but I can't find this API data thing publicly, and neither can anyone else? I'm sorry, this just doesn't make sense. How can your API data thing ever really be a thing, if nobody knows about it? It all just seems silly to me.

Your API, its documentation, and other resources can be public, without access to your API being public. Even if someone can mimic your interface, they still don't have all your data, content, and algorithmic solutions. You can vet every consumer of your API, and monitor what they are doing; this is API management 101. You can still protect your valuable digital assets while making them available for discovery, and consideration by potential consumers. You can make controlled sandbox environments available for talent acquisition, building of prototypes, crafting visualizations, doing analysis, and other ways businesses benefit from doing APIs. My advice to companies, institutions, organizations, and government agencies looking to be successful with APIs is to stop being so secretive, and start hiding everything you are doing with your public APIs out in plain sight.


Algorithmic Observability Should Work Like Machine Readable Food Labels

I’ve been doing a lot of thinking about algorithmic transparency, as well as a more evolved version of it I’ve labeled as algorithmic observability. Many algorithmic developers feel their algorithms should remain black boxes, usually due to intellectual property concerns, but in reality the reasons will vary. My stance is that algorithms should be open source, or at the very least have some mechanisms for auditing, assessing, and verifying that algorithms are doing what they promise, and that algorithms aren’t doing harm behind the scenes.

This is a concept I know algorithm owners and creators will resist, but algorithmic observability should work like food labels, just in a more machine readable way, allowing algorithms to be validated by other external (or internal) systems. Similar to food you buy in the store, you shouldn't have to give away the whole recipe and secret sauce behind your algorithm, but there should be all the relevant data points, inputs, outputs, and other "ingredients" or "nutrients" that go into the resulting algorithm. I talked about algorithm attribution before, and I think there should be some sort of algorithmic observability manifest, which provides the "label" for an algorithm in a machine readable format. It should give all the relevant sources, attribution, as well as inputs and outputs for an algorithm–with different schema for different industries.
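
I haven't settled on a schema yet, but to make the "label" idea concrete, a first sketch of an algorithmic observability manifest might look like this. Every field name below is my own assumption, meant to be swapped out for industry specific schema:

```yaml
# Hypothetical machine readable "label" for an algorithm.
# All fields are illustrative; industries would define their own schema.
algorithm:
  name: example-risk-score
  version: 2.3.1
  owner: Example Scoring, Inc.
  api: https://api.example.com/risk/score  # the inputs/outputs used for observability
ingredients:
  training_data:
    - source: internal-loan-history
      records: 1200000
      time_range: 2010-2017
  attribution:
    - technique: gradient boosted trees
      implementation: open source library, attributed
inputs:
  - name: income
    type: number
    required: true
outputs:
  - name: score
    type: number
    range: [0, 1]
claims:
  - does not use race, gender, or zip code as inputs
verification:
  sandbox: https://sandbox.example.com/risk/score  # virtualized instance for auditing
```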

In addition to there being an algorithmic observability "label" available for all algorithms, there should be live, or at least virtualized, sandboxed instances of the algorithm for verification, and auditing of what is provided on the label. As we saw with the Volkswagen emissions scandal, algorithm owners could cheat, but this would provide an important next step for helping us understand the performance, or lack of performance, when it comes to the algorithms we are depending on. The reason I call this algorithmic observability, instead of algorithmic transparency, is that each algorithm should be observable using its existing inputs and outputs (API), and not just be a "window" you can look through. It should be machine readable, and audit-able by other systems in real time, and at scale–going beyond just being able to see into the black box, to being able to assess, and audit, what is occurring in real time.

Algorithmic observability regulations would work similar to what we see with food and drugs, where if you make claims about your algorithms, they should have to stand up to scrutiny. Meaning there should be standardized algorithmic observability controls for government regulators, industry analysts, and even the media to step up and assess whether or not an algorithm lives up to the hype, or is doing some shady things behind the scenes. Ideally this would be something that technology companies would do on their own, but based upon my current understanding of the landscape, I'm guessing that is highly unlikely, and it will be something that has to be mandated by government in a variety of critical industries incrementally. If algorithms are going to impact our financial markets, elections, legal systems, education, healthcare, and other essential industries, we are going to have to begin the hard work of establishing some sort of framework to ensure they are doing what is being sold, and not hurting or exploiting people along the way.


Learning About API Governance From Capital One DevExchange

I am still working through my notes from a recent visit to Capital One, where I spent time talking with Matthew Reinbold (@libel_vox) about their API governance strategy. I was given a walkthrough of their approach to defining API standards across groups, as well as how they incentivize, encourage, and even measure what is happening. I'm still processing my notes from our talk, and waiting to see Matt publish more on his work, before I publish too many details, but I think it is worth looking at from a high level, setting the bar for other API governance conversations I am engaging in.

First, what is API governance? I personally know that many of my readers have a lot of misconceptions about what it is, and what it isn't. I'm not interested in settling on a single definition of API governance. I am hoping to help define it so that you can find a version of it that you can apply across your API operations. At its simplest form, API governance is about ensuring consistency in how you do APIs across your development groups. A more robust definition might be about having an individual or team dedicated to establishing organization-wide API standards, helping train, educate, and enforce them, and, in the case of Capital One, measure their success.

Before you can begin thinking about API governance, you need to start establishing what your API standards are. In my experience this usually begins with API design, but should quickly also be about consistent API deployment, management, monitoring, testing, SDKs, clients, and every other stop along the API lifecycle. Without well-defined, and properly socialized API standards, you won't be able to establish any sort of API governance that has any sort of impact. I know this sounds simple, but I know more API providers who do not have any kind of API design guide, or other guide for their operations, than I know API providers who have consistent guides for design, and other stops along their API lifecycle.

Many API providers are still learning about what consistent API design, deployment, and management looks like. In the API industry we need to figure out how to help folks begin establishing organization-wide API design guides, and get them on the road towards being able to establish an API governance program–it is something we suck at currently. Once API design, then deployment and management practices get defined, we can begin to realize some standard approaches to monitoring, testing, and measuring how effective API operations are. This is where organizations will begin to see the benefits of doing API governance, and it not just being a pipe dream–something you can't ever realize if you don't start with the basics, like establishing an API design guide for your group. Do you have an API design guide for your group?

While talking with Matt about their approach at Capital One, he asked if it was comparable to what else I've seen out there. I had to be honest. I've never come across an organization that had established API design, deployment, and management practices, was actively educating and training their staff, and was then actually measuring the impact and performance of APIs, and the teams behind them. I know there are companies who are doing this, but since I tend to talk to more companies who are just getting started on their API journey, I'm not seeing any organization that is this advanced. Most companies I know do not even have an API design guide, let alone measure the success of their API governance program. It is something I know a handful of companies would like to strive towards, but at the moment API governance is more talk than it is a reality.

If you are talking API governance at your organization, I'd love to learn more about what you are up to, no matter where you are at in your journey. I'm going to be mapping out what I've learned from Matt, and comparing it with what I've learned from other organizations. I will be publishing it all as stories here on API Evangelist, but will also look to publish a guide and white papers on the subject, as I learn more. I've worked with some universities, government agencies, as well as companies on their API governance strategies. API governance is something that I know many API providers are working on, but Capital One was definitely the furthest along in their journey that I have come across to date. I'm stoked that they are willing to share their story, and don't see it as their secret sauce, because it is something that doesn't just need sharing–we need leaders to step up and show everyone else how it can be done.


API Providers Should Provide Observability Into Government Developer Accounts

I've talked about this before, but after reading several articles recently about various federal government agencies collecting, and using, social media accounts for surveillance, it is a drum I will be beating a lot more regularly. Along with the transparency reports we are beginning to see emerge from the largest platform providers, I'd like to start seeing more observability regarding which accounts, both user and developer, are operated out of government agencies. Some platforms are good at highlighting how governments of all shapes and sizes are using their platform, and some government agencies are good at showcasing their social media usage, but I'd like to understand this from purely an API developer account perspective.

I'd like to see more observability into which government agencies are requesting API keys. Maybe not specific agencies, groups, and account details, although that would be a good idea as well down the road. I am just looking for some breakdown of how many developer accounts on a platform are government and law enforcement. What does their API consumption look like? If there is OAuth via a platform, is there any bypassing of the usual authentication flows to get at data differently than regular developers would, or without requiring user approval? From what I am hearing, I'm guessing that there are more government accounts out there than platforms either realize, or are willing to admit. It seems like now is a good time to start asking these questions.

I would add on another layer to this. If an application developer is developing applications on behalf of law enforcement, or as part of a project for a government agency, there should be some sort of disclosure at this level as well. I know I'm asking a lot, and a number of people will call me crazy, but with everything going on these days, I'm feeling like we need a little more disclosure regarding how government(s) are using our platforms, as well as their contractors. The transparency disclosure that platforms have been engaging in is a good start on the legal side of this conversation, but I'm looking for the darker, lower level surveillance that I know is going on behind the scenes. The data gathering on U.S. citizens that doesn't necessarily violate any particular law, because this is such new territory, and the platform terms of service might sanction it in some loopholy kind of way.

This isn't just a U.S. government type of thing. I want this to be standard practice for all forms of government on the platforms we use. A sort of UN level General Data Protection Regulation (GDPR). Which reminds me, I am woefully behind on what the GDPR outlines, and how the rolling out of it is going. Ok, I'll quit ranting now, and get back to work. Overall, we are going to have to open up some serious observability into how the online platforms we are depending on are being accessed and used by the government, both on the legal side of things, as well as just general usage. Seems like the default on the general usage should always be full disclosure, but I'm guessing it isn't a conversation anyone is having yet, which is why I bring it up. Now we are having it. Thanks.


Clearly Designate API Bot Automation Accounts

I'm continuing my research into bot platform observability, and how API platforms are handling (or not handling) bot automation on their platforms, as I try to make sense of each wave of the bot invasion on the shores of the API sector. It is pretty clear that Twitter and Facebook aren't that interested in taming automation on their platforms, unless there is more pressure applied to them externally. I'm looking to make sure there is a wealth of ideas, materials, and examples of how any API driven platform can (and does) control bot automation, as the pressure from lawmakers, and the public, intensifies.

Requiring users to clearly identify automated accounts is a good first place to start. Establishing a clear designation for bot users has its precedents, and requiring developers to provide an image, description, and some clear check or flag that identifies an account as automated just makes sense. Providing a clear definition of what a bot is, with articulate rules for what bots should and shouldn't be doing, is next up on the checklist for API platforms. Sure, not all users will abide by this, but it is pretty easy to identify automated traffic versus human behavior, and having a clear separation allows accounts to be automatically turned off when they fit a particular fingerprint, until a user can pass a sort of platform Turing test, or provide some sort of human identification.
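
The "pretty easy to identify" part isn't an exaggeration, either. Even a naive fingerprint on request timing separates most scripted accounts from humans. A toy sketch of the idea, where the thresholds and the account flag are my own assumptions, not any platform's actual rules:

```python
import statistics

def looks_automated(request_timestamps, min_requests=20, jitter_threshold=0.5):
    """Naive automation fingerprint: scripts tend to fire requests at highly
    regular intervals, while humans are bursty and irregular.
    `request_timestamps` is a sorted list of epoch seconds."""
    if len(request_timestamps) < min_requests:
        return False
    gaps = [b - a for a, b in zip(request_timestamps, request_timestamps[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return True  # multiple requests in the same instant, clearly scripted
    # Low variation relative to the mean interval suggests a machine.
    return statistics.stdev(gaps) / mean_gap < jitter_threshold

def enforce_bot_designation(account, request_timestamps):
    """Suspend accounts that behave like bots but aren't flagged as automated,
    pending the kind of platform Turing test described above."""
    if looks_automated(request_timestamps) and not account.get("is_automated"):
        account["status"] = "suspended_pending_verification"
    return account
```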

Automation on API platforms has its uses. However, unmanaged automation via APIs has proven to be a problem. Platforms need to step up and manage this problem, or the government eventually will. Then it will become yet another burdensome regulation on business, and there will be nobody to blame except for the bad actors in the space (cough Twitter & Facebook, cough, cough). Platforms tend to not see it as a problem because they aren't the targets of harassment, and it tends to boost their metrics and bottom line when it comes to advertising and eyeballs. Platforms like Slack have a different business model, which dictates more control, otherwise automation will run their paying customers off. The technology, and practices, already exist for how to manage bot automation on API platforms effectively; they just aren't ubiquitous because of the variances in how platforms generate their revenue.

I am going to continue to put together a bot governance package based upon my bot API research. Regardless of the business model in place, ALL API platforms should have a public bot automation strategy in place, with clear guidelines for what is acceptable, and what is not. I'm looking to provide API platforms with a one-stop source for guidance on this journey. It isn't rocket science, and it isn't something that will take a huge investment if approached early on. Once it gets out of control, and you have congress crawling up your platform's ass on this stuff, then it probably is going to get more expensive, and also bring down the regulatory hammer on everyone else. So, as API platform operators, let's be proactive and take on the problem of bot automation directly, and learn from Twitter and Facebook's pain.

Photo Credit: ShieldSquare


Temporal Logic of Actions For APIs

I'm evolving forward my thoughts on algorithmic observability and transparency using APIs, and I was recently introduced to TLA+, or the Temporal Logic of Actions. It is the closest I've come to what I'm seeing in my head when I think about how we can provide observability into algorithms through existing external outputs (APIs). As I do with all my work here on API Evangelist, I want to process TLA+ as part of my API research, and see how I can layer it in with what I already know.

TLA+ is a formal specification language developed by Leslie Lamport, which can be used to design, model, document, and verify concurrent systems. It has been described as exhaustively-testable pseudocode which can provide a blueprint for software systems. In the context of design and documentation, TLA+ can be viewed as an informal technical specification. However, since TLA+ specifications are written in a formal language of logic and mathematics, they can be used to uncover design flaws before system implementation is underway, and are amenable to model checking, which finds all possible system behaviours up to some number of execution steps, and examines them for violations. TLA+ specifications use basic set theory to define safety (bad things won't happen) and temporal logic to define liveness (good things eventually happen).

TLA+ specifications are organized into modules. Although the TLA+ standard is specified in typeset mathematical symbols, existing TLA+ tools use symbol definitions in ASCII, using several terms which require further definition:

  • State - an assignment of values to variables
  • Behaviour - a sequence of states
  • Step - a pair of successive states in a behavior
  • Stuttering Step - a step during which variables are unchanged
  • Next-State Relation - a relation describing how variables can change in any step
  • State Function - an expression containing variables and constants that is not a next-state relation
  • State Predicate - a Boolean-valued state function
  • Invariant - a state predicate true in all reachable states
  • Temporal Formula - an expression containing statements in temporal logic

TLA+ is concerned with defining correct system behavior, providing a set of operators for working through what is going on, as well as for working with data structures. There is tooling that has been developed to support TLA+, including an IDE, model checker, and proof system. It is all still substantially over my head, but I get what is going on enough to warrant moving forward, and hopefully absorbing more on the subject. As with most languages and specifications I come across, it will just take some playing with, and absorbing the concepts at play, before things come into focus.

I'm going to pick up some of my previous work around behavior driven assertions, and how assertions can be thought of in terms of the business contracts APIs put forward, and see where TLA+ fits in. It's all still fuzzy, but API assertions and TLA+ feels like where I want to go with this. I'm thinking about how we can wrap algorithms in APIs, write assertions for them, and validate across the entire surface area of an algorithm, or stack of API exposed algorithms, using TLA+. Maybe I'm barking up the wrong tree, but if nothing else it will get me thinking more about this side of my API research, which will push forward my thoughts on algorithmic transparency, and audit-able observability.
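
As a rough illustration of where I want to take this, here is what behavior driven assertions against an API surface area might look like, borrowing the state predicate and invariant ideas from the TLA+ glossary above. The endpoint and response fields are hypothetical:

```python
import requests

# Each assertion is a state predicate: a Boolean-valued function over an API
# response. An invariant is a predicate that should hold on every call.
ASSERTIONS = [
    ("status is 200", lambda resp, body: resp.status_code == 200),
    ("balance is never negative", lambda resp, body: body.get("balance", 0) >= 0),
    ("currency is always declared", lambda resp, body: "currency" in body),
]

def check_invariants(url, params_list):
    """Exercise an endpoint across a set of inputs and report assertion
    violations -- a crude, finite walk of the API's behavior space."""
    violations = []
    for params in params_list:
        response = requests.get(url, params=params, timeout=10)
        body = response.json()
        for name, predicate in ASSERTIONS:
            if not predicate(response, body):
                violations.append({"input": params, "failed": name})
    return violations

# Hypothetical usage against an algorithm wrapped in an API:
# print(check_invariants("https://api.example.com/account", [{"id": 1}, {"id": 2}]))
```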


I Do Not Fear AI, I Fear The People Doing AI

There is a lot of FUD out there when it comes to artificial intelligence (AI) and machine learning (ML). The tech press enjoy yanking people's chain when it comes to the dangers of artificial intelligence. AI is coming for your jobs. AI is racist, sexist, and biased. AI will lead to World War III. AI will secure and protect us from the bad out there. AI will be the source of all of our worries, and the solution to all of our worries. I'm interested in the storytelling around all of this, and I'm fascinated by the distracting quality of technology when it comes to absolving the humans behind it of doing bad things.

We have the technology to make these black boxes more observable and accountable. The algorithms feeding us news, judging us in courtrooms, and deciding if we are insurable or a risk, can all be wrapped with APIs, and made more accountable. However, there are many human reasons why we don't do this. Every AI out there can be held accountable; it isn't rocket science. The technology exists to keep AI from hurting us, judging us, and impacting our lives in negative ways. However, it is the people behind it who do not want this, otherwise their stories won't work. Their stories won't have the desired effect and control over our lives.

APIs are the layer being wielded for good and for bad on the Internet. Facebook, Twitter, and Reddit all leverage APIs to be available on our mobile phones. APIs are how people automate, advertise, and fund their activities on these platforms. APIs are how AI and ML are being exposed, wielded, and leveraged. The technology is already there to make them more accountable; we just don't have the human will to use the technology we have. There is more money to be made in telling wild stories about what is possible. Telling stories that make folks afraid, and in awe of what is possible with technology. APIs are used to tell you the stories, while also making the fire shoot from the stage, and the smoke and mirrors operate, instead of helping us see, understand, and verify what is going on behind the scenes.

We rarely discuss the fact that AI isn't coming for our jobs. It is the people behind the AI, at the companies developing, deploying, and operating AI, that are coming for our jobs. AI, like APIs, is neither good, nor bad, nor neutral–it is a tool. It is technology, and anything it does is because of us humans. I don't fear AI. I only fear the people doing AI. The people who tell the stories. The people who are believers. I don't fear technology, because I know we have the tools to do what is right, and hold the people who are using technology in bad ways accountable. I'm afraid because we don't seem to have the will to look behind the curtain. We hold up many of the people telling stories about AI as visionaries, leaders, and truth tellers. I don't fear AI, I only fear its followers.


An OpenAPI Contract For The Freedom Of Information

Today's stories are all based around my preparation for providing some feedback on the next edition of the FOIA.gov API. I have a call with the project team, and want to provide ongoing feedback, so I am loading the project up in my brain, and doing some writing on the topic. The first thing that I do when getting to know any API project, no matter where it is at in its lifecycle, is craft an OpenAPI, which will act as a central contract for discussions. Plus, there is no better way, short of integration, to get to know an API than crafting a complete (enough) OpenAPI definition.

After looking through the FOIA recommendations for the project, I took the draft FOIA API specification and crafted this OpenAPI definition:
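
To give a sense of its shape, here is a trimmed sketch based on the description that follows; the schema fields are placeholders, not the actual draft FOIA API specification:

```yaml
# Trimmed sketch of the OpenAPI (2.0) contract described below.
# Field names are placeholders, not the actual draft FOIA specification.
swagger: "2.0"
info:
  title: FOIA API
  version: draft
paths:
  /request:
    post:
      summary: Submit a Freedom of Information Act request
      parameters:
        - in: body
          name: request
          required: true
          schema:
            $ref: "#/definitions/FoiaRequest"
      responses:
        "200":
          description: FOIA request accepted
        "404":
          description: Agency or component not found
        "500":
          description: Server error
definitions:
  FoiaRequest:
    type: object
    properties:
      agency_component:
        type: string
      requester_name:
        type: string
      request_description:
        type: string
```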

The specification is just for a single path, which allows you to POST a FOIA request. I made sure I thought through the supporting schema that gets posted, fleshing it out using the definitions (JSON schema) portion of the OpenAPI. This helps me see all the moving parts, and connect the dots between the API request and response, complete with definitions for three HTTP status codes (200, 404, 500)–just the basics. Now I can see the technical details of a FOIA request in my head, preparing me for my discussion with the project owners.

After loading the technical details in my head, I always like to step back and think about the business, political, and ultimately human aspects of this. This is a Freedom of Information Act (FOIA) API, being used by U.S. citizens to request that information within the federal government be freed. That is pretty significant, and represents why I do API Evangelist. I enjoy helping ensure APIs like this exist, are usable, and become a reality. It is interesting to think of the importance of this OpenAPI contract, and the potential it has to make information in government more accessible, providing a blueprint that can be used by all federal agencies, and establishing a common interface for how the public can engage with government when it comes to holding it accountable.


My Focus On Public APIs Also Applies Internally

A regular thing I hear from folks when we are having conversations about the API lifecycle is that I focus on public APIs, while they are more interested in private APIs. Each time I hear this I try to take time and assess which parts of my public API research wouldn’t apply to internal APIs. You wouldn’t publish your APIs to public API search engines like APIs.io or ProgrammableWeb, and maybe you wouldn’t evangelize your APIs at hackathons, but I’d say 90% of what I study is applicable to internal APIs, as well as publicly available APIs.

With internal APIs, or private network partner APIs, you still need a portal, documentation, SDKs, support mechanisms, and communication and feedback loops. Sure, how you use the common building blocks of API operations that I track will vary between private and public APIs, but this shifts from industry to industry, and API to API as well–it isn’t just a public vs. private thing. I would say that 75% of my API industry research is derived from public API operations–it is just easier to access, and honestly more interesting to cover than private ones. The other 25% comes from internal API conversations I’m having, which always benefit from thinking through the common API building blocks of public APIs, looking for ways they can be more successful with internal and partner APIs.

I’d say that a significant part of the mindshare for the microservices philosophy is internally focused. I think this is something that will come back to hurt some implementations, cutting out many of the mechanisms and common elements required in a successful API formula. Things like portals, documentation, SDKs, testing, monitoring, discovery, support, and communications all contribute to an API working, or not working. I’ve said it before, and I’ll say it again: I’m not convinced that there is any true separation in public vs. private APIs, and there remains a great deal we can learn from public API providers, and put to work across internal operations, and with our most trusted partners.


Observability In Botnet Takedown By Government On Private Infrastructure

I'm looking into how to make [API security](http://security.apievangelist.com) more transparent and observable lately, looking for examples of companies, institutions, organizations, politicians, and government agencies calling for observability into wherever APIs are impacting our world. Today's example comes out of [POLITICO's Morning Cybersecurity email newsletter](http://www.politico.com/tipsheets/morning-cybersecurity), which has become an amazing source of daily information for me, regarding transparency around the takedown of bot networks.

_"If private companies cooperate with government agencies - for example, in the takedown of botnets using the companies' infrastructure - they should do so as publicly as possible, [argued the Center for Democracy & Technology](https://www.ntia.doc.gov/files/ntia/publications/cdt-ntia-nistcommentsbotnetsfinal.pdf). "One upside to compulsory powers is that they presumptively become public eventually, and are usually overseen by judges or the legislative branch," CDT argued in its filing. "Voluntary efforts run the risk of operating in the dark and obscuring a level of coordination that would be offensive to the general public. It is imperative that private actors do not evolve into state actors without all the attendant oversight and accountability that comes with the latter."_

I've been tracking the transparency statements and initiatives of all the API platforms. At some point I'm going to assemble the common building blocks of what is needed for executing platform transparency, and I will be including these asks of the federal government. As the Center for Democracy & Technology states, this relationship between the public and private sector when it comes to platform surveillance needs to be more transparent and observable in all forms. Bots, IoT, and the negative impacts of API automation need to be included in the transparency disclosure stack. If the government is working with platforms to discover, surveil, or shut down bot networks, there should be a point at which those operations are shared, including the details of what was done.

We need platform [transparency](http://transparency.apievangelist.com) and [observability](http://observability.apievangelist.com) at the public and private sector layer of engagement. Sure, this sharing of information would be time sensitive, respecting any investigations and laws, but if private sector infrastructure is being used to surveil and shut down U.S. citizens, there should be an accessible, auditable log for this activity. Of course it should also have an API, allowing auditors and researchers to get all relevant information. Bots are just one layer of the API security research I'm doing, and the overlap between the bot world and API transparency, observability, and security is an increasingly significant vector when it comes to policing and [surveillance](http://surveillance.apievangelist.com), but also when it comes to protecting the privacy and safety of platform people (citizens).


Learning About Real-Time Advertising Bidding Transparency Using Ads.txt

I was learning about real-time bidding transparency using Ads.txt from Lukasz Olejnik. The mission of the ads.txt project is to “increase transparency in the programmatic advertising ecosystem. Ads.txt stands for Authorized Digital Sellers and is a simple, flexible and secure method that publishers and distributors can use to publicly declare the companies they authorize to sell their digital inventory.” While Ads.txt isn’t an API, it is an open, machine readable definition that is working to make advertising more transparent and observable to everyone, not just people in the ad-tech space.

Ads.txt works similarly to robots.txt: it is a simple text file that lives in the root of a domain, listing the companies that have permission to sell advertising. The format is new, so there isn’t a lot of adoption yet, but you can see one in action over at the Washington Post. Helping make platforms observable is why I do API Evangelist. I see open, machine readable definitions like this as one of the important tools we have for making systems, algorithms, and platforms more observable, and less of a black box. I see ads.txt having similar potential for the world of advertising, and something that eventually could have an API, for helping make sense of the very distributed, brokered, and often dark world of online advertising.
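To give a sense of the format, here is what a few records in an ads.txt file might look like. The publisher IDs below are made up for illustration, following the spec’s comma-separated fields: ad system domain, seller account ID, relationship, and an optional certification authority ID.

```
# ads.txt for example-publisher.com (hypothetical entries)
google.com, pub-1234567890123456, DIRECT, f08c47fec0942fa0
appnexus.com, 7890, RESELLER
rubiconproject.com, 12345, DIRECT
```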

Honestly, I know almost nothing about online advertising. I have a basic level of understanding of Google Analytics, AdWords, and AdSense, as well as reading the blogs and documentation for many advertising APIs I come across in my regular monitoring of the API space. I am just interested in ads.txt as an agent of observability, and pulling back the curtain on who is buying and selling our digital bits online. I am adding ads.txt to my API definitions research. This will allow me to keep an eye on the project, see where it goes, and think about the API level for aggregation of the ad network data, on maybe GitHub or something–caching ads.txt on a daily basis. Who knows where it will go. I’ll just keep learning, and playing around until it comes into a little more focus.
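As a first experiment, harvesting these files doesn’t take much. Below is a rough sketch of what a daily cache job might start from, with the domain and the field handling being my own assumptions about the format rather than anything official:

```python
# A minimal sketch of polling a domain's ads.txt file and parsing its records.
# A real aggregation job would loop over a seed list of publisher domains and
# cache the results somewhere versioned (e.g., a Git repository).
import requests

def fetch_ads_txt(domain):
    """Fetch and parse the ads.txt file for a domain, returning a list of records."""
    response = requests.get(f"https://{domain}/ads.txt", timeout=10)
    response.raise_for_status()
    records = []
    for line in response.text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or "=" in line:           # skip blanks and variable declarations
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 3:
            records.append({
                "ad_system": fields[0],
                "seller_account_id": fields[1],
                "relationship": fields[2],
                "certification_id": fields[3] if len(fields) > 3 else None,
            })
    return records

print(fetch_ads_txt("washingtonpost.com"))
```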

I’m guessing that not all companies will want to be transparent like this. Folks who have the upper hand, or are just doing shady things, rarely like to have the curtain pulled back. I’ll keep an eye on ads.txt adoption, and also spend some time thinking about transparency and observability at the API monetization and plans level–going beyond just advertising, which is the staple of the web, and thinking about how we can keep things open when it comes to API business models.


Requiring ALL Platform Partners Use The API So There Is A Registered Application

I wrote a story about Twitter allowing users to check or uncheck a box regarding sharing data with select Twitter partners. While I am happy to see this move from Twitter, I feel the concept of information sharing simply being a checkbox is unacceptable. I wanted to make sure I praised Twitter in my last post, but I’d like to expand upon what I’d like to see from Twitter, as well as ALL other platforms that I depend on in my personal and professional life.

There is no reason that EVERY platform we depend on couldn’t require ALL partners to use their API, resulting in every single application of our data being registered as an official OAuth application. The technology is out there, and there is no reason it can’t be the default mode for operations. There just hasn’t been the need amongst platform providers, nor any significant demand from platform users. Even if you don’t get full access to delete and adjust the details of the integration and partnership, I’d still like to see companies share as many details as they possibly can regarding any partner sharing relationships that involve my data.

OAuth is not the answer to all of the problems on this front, but it is the best solution we have right now, and we need to talk more about how we can make it more intuitive, informative, and usable by average end-users, as well as 3rd party developers and platform operators. API plus OAuth is the lowest cost, widely adopted, standards based approach to establishing a pipeline within which ALL data, content, and algorithms operate–one that gives a platform the access and control they desire, while opening up access to 3rd party integrators and application developers, and most importantly, giving a voice to end-users–we just need to continue discussing how we can keep amplifying this voice.

To the folks who will DM, email, and Tweet at me after this story: I know it’s unrealistic and the platforms will never do business like this, but it is a future we could work towards. I want EVERY online service that I depend on to have an API. I want all of them to provide OAuth infrastructure to govern identity and access management for personally identifiable information. I want ALL platform partners to be required to use a platform’s API, and register an application for any user on whose behalf they are accessing data. I want all internal platform projects to also be registered as an application in my OAuth management area. Crazy talk? Well, Google does it for (most of) their internal applications, why can’t others? Platform apps, partner apps, and 3rd party apps all side by side.
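To make that side by side vision concrete, here is a hypothetical response from an end-user facing authorized applications API. None of these fields come from any real platform; they just show internal, partner, and 3rd party applications disclosed in one place:

```json
{
  "authorized_applications": [
    { "client_id": "app-001", "name": "Platform Analytics Pipeline",
      "type": "internal", "scopes": ["profile:read", "activity:read"] },
    { "client_id": "app-002", "name": "Acme Data Partners",
      "type": "partner", "scopes": ["profile:read"],
      "data_sharing_agreement": "https://example.com/agreements/acme" },
    { "client_id": "app-003", "name": "My Favorite Mobile Client",
      "type": "third_party", "scopes": ["profile:read", "post:write"] }
  ]
}
```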

The fact that this post will be viewed as crazy talk by most who work in the technology space demonstrates the imbalance that exists. The technology exists for doing this. Doing this would improve privacy and security. The only reason we do not do it is because the platforms, their partners, and investors are too worried about being this observable across operations. There is no reason why APIs plus OAuth applications can’t be universal across ALL platforms online, with ALL partners being required to access personally identifiable information through an API, and with end-users at least involved in the conversation, if not given full control over whether or not personally identifiable information is shared.


Diagramming The Components Of API Observability

I created a diagram of the politics of APIs some time ago that has really held true for me, and is something I’ve continued to reference as part of my storytelling. I wanted to do a similar thing to help me evolve my notion of API observability. Like the politics of APIs, observability overlaps many areas of my API life cycle research. Also like the politics of APIs, observability involves many technical, business, and legal aspects of operating a platform online today.

Here is my first draft of a Venn diagram beginning to articulate what I see as the components of API observability:

The majority of the API observability conversation in the API space currently centers around logging, monitoring, and performance–driven by internal motivations, but done in a way that is very public. I’m looking to push forward the notion of API observability to transcend the technical, and address the other operational, industry, and even regulatory concerns that will help bring observability to everyone’s attention.

I do not think we should always be doing API, AI, ML, and the other tech buzzwords out there if we do not have to–saying no to technology can be done. In the other cases where the answer is yes, we should be doing API, AI, and ML in an observable way. This is my core philosophy. The data, content, algorithms, and networks we are exposing using APIs, and using across web, mobile, device, and network applications, should be observable by internal groups, as well as partners and public stakeholders, as it makes sense. There will be industry, community, and regulatory benefits for sectors that see observability as a positive thing, go beyond just the technical side of observability, and work to be more observable in all the areas I’ve highlighted above.


Bot Observability For Every Platform

I lightly keep an eye on the world of bots, as APIs are used to create them. In my work I see a lot of noise about bots, usually in two main camps: 1) pro-bot–bots are the future, and 2) anti-bot–they are one of the biggest threats we face on the web. This is a magical marketing formula, which allows you to sell products to both sides of the equation, making money off of bot creation, as well as bot identification and defense–it is beautiful (if you live by disruption).

From my vantage point, I’m wondering why platforms do not provide more bot observability as a part of platform operations. There shouldn’t need to be third party services that tell us which accounts are bots; the platform should tell us by default which users are real and which are automated. Platforms should embrace automation, providing services and tooling to assist in their operation, including actual definitions of what is acceptable, and what is unacceptable, bot behavior. Then they should actually police this behavior, and be observable in their actions around bot management and enforcement.

It feels like this is just another layer of technology that is being bastardized by the money that flows around technology so easily. Investment in lots of silly, useless bots. Investment in bot armies that inflate customer numbers, advertising, and other ways of generating attention (to get investment), and generating revenue. It feels like Slack is the only leading bot platform that has fully embraced the bot conversation. Facebook and Twitter lightly reference the possibilities, and have made slight motions when it comes to managing the realities of bots, but when you Google “Twitter Bots” or “Facebook Bots”, neither of them dominates the conversation around what is happening–which is very telling about how they view the world of bots.

Slack has a formal bot directory, and has defined the notion of a bot user, separating them from human users–setting an example for bot developers to disclose who is a bot, and who is not. They talk about bot ethics and rules for building bots, and do a lot of storytelling about their vision for bots. This provides a pretty strong start towards getting a handle on the explosion of bots on their platform–taking the bull by the horns, owning the conversation, and setting the tone.
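Slack’s approach shows up right in its API responses, where user objects carry a flag disclosing automation. The abbreviated payload below is illustrative (the values are made up), but an is_bot flag like this is the kind of default disclosure every platform could offer:

```json
{
  "ok": true,
  "user": {
    "id": "U0123456789",
    "name": "deploybot",
    "is_bot": true,
    "profile": { "real_name": "Deploy Bot" }
  }
}
```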

I’d say that Slack has a clearer business model for bots–not that people are actually going to pay for your bot (they aren’t), but a model is present. You can smell some revenue strategies on Facebook, but it just feels like all roads lead to Facebook, and advertising partners there. I’d say Twitter has no notion of a bot business model for developers. This doesn’t mean that Facebook and Twitter bots don’t generate revenue for folks targeting Facebook and Twitter, or play a role in influencing how money flows when it comes to eyeballs and clicks. Indirectly, Twitter and Facebook bots are making folks lots of money, it is just that these platforms have chosen not to be observable when it comes to their bot practices and ecosystems.

Bot observability makes sense not just for platforms and bot developers; as Slack demonstrates, it makes sense for end-users, incentivizing bots that generate value, instead of mayhem. I’m guessing advertising-driven Facebook and Twitter have embraced the value of mayhem–with advertising being the framework for generating their revenue. Slack has more of a product, with customers they want to make happy. With Facebook and Twitter the end-users are the product, so the bot game plays to a different tune.


Algorithmic Observability In Predictive Policing

As I study the world of APIs I am always on the lookout for good examples of APIs in action, so that I can tell stories about them, and help influence the way folks do APIs. This is what I do each day. As part of this work, I am investing as much time as I can into better understanding how APIs can be used to help with algorithmic transparency, helping us see into the black boxes that algorithms often are.

Algorithms are increasingly driving vital aspects of our world, from what we see in our Facebook timelines, to whether or not we would commit a crime in the eyes of the legal system. I am reading about algorithms being used in policing in the Washington Monthly, and I learned about an important example of algorithmic transparency that I would like to highlight and learn more about. A classic argument regarding why algorithms should remain closed is centered around intellectual property, and protecting the work that gives you your competitive advantage–if you share your secret algorithm, your competitors will just steal it. While discussing these closed algorithms, Rebecca Wexler explores the competitive landscape:

But Perlin’s more transparent competitors appear to be doing just fine. TrueAllele’s main rival, a program called STRmix, which claims a 54 percent U.S. market share, has an official policy of providing defendants access to its source code, subject to a protective order. Its developer, John Buckleton, said that the key to his business success is not the code, but rather the training and support services the company provides for customers. “I’m committed to meaningful defense access,” he told me. He acknowledged the risk of leaks. “But we’re not going to reverse that policy because of it,” he said. “We’re just going to live with the consequences.”

And remember PredPol, the secretive developer of predictive policing software? HunchLab, one of PredPol’s key competitors, uses only open-source algorithms and code, reveals all of its input variables, and has shared models and training data with independent researchers. Jeremy Heffner, a HunchLab product manager, and data scientist explained why this makes business sense: only a tiny amount of the company’s time goes into its predictive model. The real value, he said, lies in gathering data and creating a secure, user-friendly interface.

In my experience, the folks who want to keep their algorithms closed simply want to hide incompetence and shady things going on behind the scenes. If you listen to individual companies like PredPol, it is critical that algorithms stay closed, but if you look at the wider landscape you quickly realize this is not a requirement to stay competitive. There is no reason that all your algorithms can’t be wrapped with APIs, providing access to the inputs and outputs of all the active parts. Then, using modern API management approaches, these APIs can be opened up to researchers, law enforcement, government, journalists, and even end-users who are being impacted by algorithmic results, in a secure way.
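As a rough illustration of what wrapping an algorithm looks like in practice, here is a minimal sketch, not any vendor’s actual system, of a predictive model behind an HTTP endpoint that records every input and output to an audit trail, which API management tooling could then expose to authorized auditors and researchers:

```python
# A minimal sketch (hypothetical, not any vendor's system) of wrapping a
# predictive model with an API endpoint that logs all inputs and outputs.
import json
import time
import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_risk(features):
    # Placeholder for the actual model; the point is the observable wrapper.
    return {"risk_score": 0.42, "model_version": "2018-04-01"}

@app.route("/predictions", methods=["POST"])
def predictions():
    features = request.get_json()
    result = predict_risk(features)
    audit_record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "inputs": features,   # every input variable is captured
        "outputs": result,    # alongside every output
    }
    with open("audit.log", "a") as log:  # stand-in for a queryable audit API
        log.write(json.dumps(audit_record) + "\n")
    return jsonify(result)

if __name__ == "__main__":
    app.run()
```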

I will continue to profile the algorithms being applied as part of predictive policing, and the digital legal system that surrounds it. As with other sectors where algorithms are being applied, and APIs are being put to work, I will work to find positive examples of algorithmic transparency like we are seeing from STRmix and HunchLab. I’d like to learn more about their approaches to ensuring observability around these algorithms, and help showcase the benefits of transparency and observability for these types of algorithms that are impacting our world–helping make sure everyone knows that black box algorithms are a thing of the past, and the preferred approach of snake oil salesmen.


Observability Is Needed to Quantify A DDoS Attack

The FCC released a statement from the CIO's office about a denial-of-service attack on the FCC comment system, after John Oliver directed his viewers to go there and "express themselves". Oliver even published a domain (gofccyourself.com) that redirects you to the exact location of the comment system form, saving users a number of clicks before they could actually submit something. I am not making any linkage between what John Oliver did and the DDoS attack claims from the FCC, but I would like to highlight the complexity of what a DDoS attack is, and how it's becoming an essential tool in our Cybersecurity Theater toolbox.

According to Wikipedia, "a denial-of-service attack (DoS attack) is a cyber-attack where the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host connected to the Internet. Denial of service is typically accomplished by flooding the targeted machine or resource with superfluous requests in an attempt to overload systems and prevent some or all legitimate requests from being fulfilled." It is a pretty straightforward way of taking down a website, application, and increasingly devices, but it is one that is often more theater than reality.

There are two sides of the DDoS coin: 1) how many requests an attacker can make, and 2) how many requests an attack receiver can handle. If a website, form, or other service can only handle 100 requests in any given second, it doesn't take much to become a DDoS attack. I worked at a company once where the IT director claimed to be under sustained DDoS attack for weeks, crippling business, but after a review, it turned out he was running some really inefficient services in an under-resourced server environment. My point is that there is always a human making the decision about how many requests we should handle before things are actually crippled, either by limiting the resources available before an attack occurs, or by declining to scale up existing infrastructure because it would cost too much.
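To put some rough numbers on this, all of them illustrative rather than measurements of the FCC system, here is the back-of-envelope math that separates viral traffic from an attack purely as a question of capacity:

```python
# Back-of-envelope sketch: how little traffic it takes to saturate a modest
# service. Every number here is an illustrative assumption.
capacity_rps = 100          # requests per second the service can handle
visitors = 50_000           # viewers responding to a call to action
requests_per_visitor = 3    # page load, form fetch, form submit
window_seconds = 600        # they all arrive within ten minutes

offered_rps = visitors * requests_per_visitor / window_seconds
print(f"Offered load: {offered_rps:.0f} req/s vs capacity: {capacity_rps} req/s")
# Offered load: 250 req/s vs capacity: 100 req/s -- legitimate traffic alone
# can look exactly like a denial of service at this scale.
```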

There are variations of the DDoS attack, sometimes called a "cash overflow" attack, where a website operates in a scalable cloud, and can handle a large volume of requests, but eventually it will cost the provider too much, and they will cut it off because they can't afford to pay the bill. A DDoS attack can be successful for a variety of reasons. Sometimes providers don't have the infrastructure to support and scale to the number of requests, sometimes providers can't afford to scale infrastructure to support it, and other times a provider just makes the decision that a website, form, or device isn't worth scaling to support any level of demand beyond what is politically sensible.

I'm sure that many DDoS attacks are legitimate, but I know personally that in some cases they are also a theater skit performed by providers who are looking to cry foul, or stimulate a specific type of conversation or response from a specific audience. I just think it is important to remember the definition of what a DDoS attack is, and always think a little more deeply about the motivations of both the DDoS attacker, as well as those under attack, the political motivations of everyone involved, and the resources they have to contribute to the two-way street that is a distributed denial of service attack (DDoS).


If you think there is a link I should have listed here, feel free to tweet it at me, or submit it as a GitHub issue. Even though I do this full time, I'm still a one person show, and I miss quite a bit, and depend on my network to help me know what is going on.