
The Power of a Command-Line Processor


A command-line processor, as its name implies, is a program intended to be run from the command line, typically as part of a pipeline. Most operating systems ship with a collection of utilities that can be ingeniously combined into powerful mini programs for transforming data. We will focus our attention here on jq, a processor specialized for JSON data, much as sed crunches textual content. You can easily start using it by issuing brew install jq on macOS (or download it for other operating systems). Even without installing anything on your machine, there is also a nice playground for trying out things online.

The following example illustrates how several actions can be crafted into a unified instruction, i.e., a mini program that may be reused as a whole:

> echo "A test string." | jq -R "ascii_upcase | gsub(\"STRING\"; \"CONTENT\")"
"A TEST CONTENT."

The input is piped into jq as an ordinary string (as hinted by the -R argument), turned into upper case by the ascii_upcase function, and then STRING is changed into CONTENT using gsub. Evidently, jq is applicable to plain textual content, too. Furthermore, observe that the same pipelining mechanism available at the OS level is smoothly incorporated into jq itself. This lowers the cognitive load, since everything revolves around the well-known Pipes and Filters pattern.
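The same internal pipelining works on structured JSON as well. Below is a minimal sketch (the JSON payload is made up for illustration) that extracts a field and transforms it by chaining two filters inside a single jq program:

```shell
# Extract the .service field as a raw string (-r drops the quotes)
# and upper-case it, all within one jq pipeline.
echo '{"service": "billing", "env": "prod"}' \
  | jq -r '.service | ascii_upcase'
# prints BILLING
```

The -r flag emits the result as raw text instead of a JSON-encoded string, which is handy when feeding the output to other command-line tools.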

JQ on Steroids

Command-line usage is great, but jq and similar processors truly shine when integrated into bigger systems. One notable example is Port, a SaaS variant of an internal developer portal; look here for an introduction to Port. At the time of this writing, Port leverages jq for specifying derived properties (like calculations) and for migrating (evolving) data. In the rest of this text, I assume that you have created an account in Port and signed in.

Sample Blueprint

Create a new blueprint by selecting the JSON editing mode, as shown in the next figure.

Afterward, copy and paste the content below:

{
  "identifier": "service",
  "description": "A dummy blueprint for showcasing JQ.",
  "title": "Service",
  "icon": "Service",
  "schema": {
    "properties": {
      "documentation": {
        "type": "array",
        "title": "Documentation",
        "description": "A list of URLs toward various documentations.",
        "icon": "Book",
        "items": {
          "type": "string",
          "format": "url"
        }
      },
      "in_production": {
        "title": "In production",
        "description": "Flag whether this service is in production.",
        "type": "boolean",
        "default": false,
        "icon": "Flag"
      }
    },
    "required": [
      "in_production"
    ]
  },
  "mirrorProperties": {},
  "calculationProperties": {
    "number_of_documents": {
      "title": "Number of documents",
      "description": "Number of associated documents.",
      "calculation": ".properties.documentation | length",
      "type": "number"
    }
  },
  "aggregationProperties": {},
  "relations": {}
}

Notice that it already contains a calculation property holding the number of listed document URLs. It is defined as a JQ query using the length function. At this point, you may want to produce a couple of instances of this blueprint by switching to the Catalog section in Port.
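Before relying on a calculation property, you can sanity-check its query locally with the very same jq expression. The entity snippet below is a made-up stand-in for what Port would pass in:

```shell
# Run the blueprint's calculation query against a sample entity:
# count the entries of the documentation array.
echo '{"properties": {"documentation": ["a.pdf", "b.pdf"]}}' \
  | jq '.properties.documentation | length'
# prints 2
```

This round-trip between the command line and the platform is one of the perks of Port standardizing on jq: expressions are portable and testable outside the UI.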

Evolution of Data

Over time, both your data model and entities (data instances) will need to undergo transformations and expansions. Port has excellent support for this at both the UI and API level. We will invoke the data migration feature from the UI, as depicted in the image below.

You will next see a new dialog box composed of four parts:

  1. The example section, where you can provide test input.
  2. The selection box for the target blueprint (in our case it will be the same as the source).
  3. The JQ mapping definition that contains the rules for how to transform your data.
  4. The output section, which shows the result of the transformation.

Enter the following sample document into the top section:

{
  "identifier": "1234",
  "title": "Sample Service for Testing JQ",
  "team": [],
  "icon": "DefaultBlueprint",
  "properties": {
    "documentation": [
      "http://example.com/doc1.pdf",
      "https://example.com/doc2.pdf",
      "http://example.com/doc3.pdf"
    ],
    "in_production": true
  },
  "relations": {}
}

The above snippet demonstrates a contrived use case: switching to HTTPS everywhere. Two URLs are accessed via HTTP, and these must be mapped to secure connections. The JQ mapping for accomplishing this, utilizing the map function, is shown next:

{
  "filter": ".properties.in_production == true",
  "entity": {
    "properties": {
      "documentation": ".properties.documentation | map(gsub(\"http://\"; \"https://\"))"
    }
  }
}

The filter controls which entities will be impacted. Here, we want to touch only data for services in production. The only relevant property is documentation. All omitted properties retain their original values.
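The transformation expression itself can be exercised locally before running the migration. Here it is applied to a small sample array (note that the already-secure URL passes through unchanged, since gsub finds no http:// prefix to replace):

```shell
# Rewrite plain-HTTP URLs to HTTPS; -c prints compact (single-line) JSON.
echo '["http://example.com/doc1.pdf", "https://example.com/doc2.pdf"]' \
  | jq -c 'map(gsub("http://"; "https://"))'
# prints ["https://example.com/doc1.pdf","https://example.com/doc2.pdf"]
```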

When to explicitly use the filter facility?

As usual, you can achieve the same effect in many ways. For example, in the above mapping the filtering logic could be embedded directly inside the transformation by using jq's if-then-else flow control. Nevertheless, obeying the separation of concerns principle may make your code more readable and less error prone. This is especially important when the same filtering condition applies to multiple properties impacted by the transformation, thus also helping attain the DRY principle.
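For comparison, here is a sketch of that embedded-filter alternative as a standalone jq program (the sample entity is made up; in Port the separate filter field remains the cleaner choice):

```shell
# Fold the in_production check into the expression itself:
# |= updates the documentation array in place when the condition holds.
echo '{"properties": {"in_production": true, "documentation": ["http://a.pdf"]}}' \
  | jq -c 'if .properties.in_production
           then .properties.documentation |= map(gsub("http://"; "https://"))
           else . end'
# prints {"properties":{"in_production":true,"documentation":["https://a.pdf"]}}
```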

Here is the transformed output after pressing the Test button in the dialog box:

{
  "identifier": "1234",
  "title": "Sample Service for Testing JQ",
  "team": [],
  "icon": "DefaultBlueprint",
  "properties": {
    "documentation": [
      "https://example.com/doc1.pdf",
      "https://example.com/doc2.pdf",
      "https://example.com/doc3.pdf"
    ],
    "in_production": true
  },
  "relations": {}
}

If you are happy with your mapping, you can apply it to your catalog by pressing the Migrate button. The audit log (accessible from the Builder tab) will contain details about updated entities.

Conclusion

You have witnessed the power of jq, especially in the context of a larger system. Being acquainted with command-line tools and the concomitant patterns is indispensable for a professional software engineer. This blog has only scratched the surface of what can be achieved in this manner. As a teaser, below is a JQ mapping that pours only valid data from a deprecated property into a new one. The combined expression ensures that a new tag can be at most 40 characters long and consist only of lowercase letters and digits separated by single dashes.

{
  "filter": ".properties.tag_old | (test(\"^[a-z0-9]+(?:-[a-z0-9]+)*$\"; \"s\") and length <= 40)",
  "entity": {
    "properties": {
      "tag_new": ".properties.tag_old"
    }
  }
}
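The validation expression in the filter can again be probed locally. The two sample tags below are invented for the test; the first satisfies the regex and length check, the second contains disallowed characters:

```shell
# A well-formed tag: lowercase alphanumeric groups joined by single dashes.
echo '"valid-tag-42"' \
  | jq 'test("^[a-z0-9]+(?:-[a-z0-9]+)*$") and (length <= 40)'
# prints true

# Underscores and upper case are rejected by the regex.
echo '"Invalid_Tag"' \
  | jq 'test("^[a-z0-9]+(?:-[a-z0-9]+)*$") and (length <= 40)'
# prints false
```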

