A command-line processor, as its name implies, is a software tool intended to be executed from the command line in pipelined fashion. Most operating systems ship with a bunch of utilities that can be ingeniously combined to create powerful mini programs for transforming data. We will focus here on jq, which is specialized for processing JSON data, much as sed processes textual content. You can start using it by issuing brew install jq on macOS (or by downloading it for other operating systems). Even without installing anything on your machine, you can try things out in a nice online playground.
The following example illustrates what sort of actions can be combined into a unified instruction, i.e., a mini program that may be reused as a whole:
> echo "A test string." | jq -R "ascii_upcase | gsub(\"STRING\"; \"CONTENT\")"
"A TEST CONTENT."
The input is piped into jq as an ordinary string (as requested by the -R option, which reads raw text instead of JSON) and is turned into upper case by the ascii_upcase function, after which gsub replaces STRING with CONTENT. Evidently, jq is applicable to plain textual content, too. Furthermore, observe that the same pipelining mechanism found at the OS level is smoothly incorporated into jq itself. This lowers the cognitive load, since everything revolves around the well-known Pipes and Filters pattern.
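For readers who think more easily in a general-purpose language, the same Pipes and Filters idea can be sketched in Python. This is only an illustration of the pattern, not how jq is implemented; the helper names pipeline, ascii_upcase, and gsub are chosen here to mirror the jq functions.

```python
import json
import re
from functools import reduce


def ascii_upcase(text: str) -> str:
    # Like jq's ascii_upcase, only the ASCII letters a-z are changed.
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


def gsub(pattern: str, replacement: str):
    # Returns a filter that replaces every match of the pattern.
    return lambda text: re.sub(pattern, replacement, text)


def pipeline(*filters):
    # Feeds the output of each filter into the next one, like jq's `|`.
    return lambda value: reduce(lambda acc, f: f(acc), filters, value)


program = pipeline(ascii_upcase, gsub("STRING", "CONTENT"))
# json.dumps adds the surrounding quotes, as jq does for string output.
print(json.dumps(program("A test string.")))  # prints "A TEST CONTENT."
```

Each filter is an ordinary function from one value to another, so the composed program can be reused as a whole, just like the jq expression.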
JQ on Steroids
Command-line usage is great, but jq and similar processors truly shine once they are integrated into bigger systems. One notable example is Port, a SaaS variant of an internal developer portal; look here for an introduction to Port. At the time of this writing, Port leverages jq for specifying derived properties (such as calculation properties) and for migrating (evolving) data. In the rest of this text, I assume that you've created an account in Port and signed in.
Sample Blueprint
Create a new blueprint by selecting the JSON editing mode, as shown in the next figure.
Afterward, copy and paste the content below:
{
  "identifier": "service",
  "description": "A dummy blueprint for showcasing JQ.",
  "title": "Service",
  "icon": "Service",
  "schema": {
    "properties": {
      "documentation": {
        "type": "array",
        "title": "Documentation",
        "description": "A list of URLs toward various documentations.",
        "icon": "Book",
        "items": {
          "type": "string",
          "format": "url"
        }
      },
      "in_production": {
        "title": "In production",
        "description": "Flag whether this service is in production.",
        "type": "boolean",
        "default": false,
        "icon": "Flag"
      }
    },
    "required": [
      "in_production"
    ]
  },
  "mirrorProperties": {},
  "calculationProperties": {
    "number_of_documents": {
      "title": "Number of documents",
      "description": "Number of associated documents.",
      "calculation": ".properties.documentation | length",
      "type": "number"
    }
  },
  "aggregationProperties": {},
  "relations": {}
}
Notice that it already contains a calculation property holding the number of listed document URLs. It is defined as a JQ query that pipes the documentation array into the length function. At this point, you may want to create a couple of instances of this blueprint by switching over to the Catalog section in Port.
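To make the calculation concrete, here is a minimal Python sketch of what .properties.documentation | length evaluates to for a single entity. The function name number_of_documents and the sample entity are illustrative assumptions, not part of Port. The sketch follows jq's rule that the length of null is 0, so an entity without any documentation yields zero.

```python
def number_of_documents(entity: dict) -> int:
    """Emulate the jq query `.properties.documentation | length`."""
    properties = entity.get("properties") or {}
    docs = properties.get("documentation")
    # In jq, a missing key yields null, and the length of null is 0.
    return 0 if docs is None else len(docs)


# Hypothetical entity, used only for illustration.
sample = {
    "identifier": "demo",
    "properties": {
        "documentation": ["https://example.com/a.pdf", "https://example.com/b.pdf"],
        "in_production": False,
    },
}
print(number_of_documents(sample))  # prints 2
```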
Evolution of Data
Over time, both your data model and your entities (data instances) will need to undergo transformations and expansions. Port has excellent support for this at both the UI and API levels. We will invoke the data migration feature from the UI, as depicted in the image below.
You will next see a new dialog box comprising four parts:
- The example section, where you can provide test input.
- The selection box for the target blueprint (in our case it will be the same as the source).
- The JQ mapping definition, which contains the rules for transforming your data.
- The output section, which shows the result of the transformation.
Enter the following sample document into the top section:
{
  "identifier": "1234",
  "title": "Sample Service for Testing JQ",
  "team": [],
  "icon": "DefaultBlueprint",
  "properties": {
    "documentation": [
      "http://example.com/doc1.pdf",
      "https://example.com/doc2.pdf",
      "http://example.com/doc3.pdf"
    ],
    "in_production": true
  },
  "relations": {}
}
The above snippet demonstrates a contrived use case of switching over to HTTPS everywhere. Two of the URLs use HTTP, and these must be mapped to a secure connection. The JQ mapping that accomplishes this, using the map function, is shown next:
{
  "filter": ".properties.in_production == true",
  "entity": {
    "properties": {
      "documentation": ".properties.documentation | map(gsub(\"http://\"; \"https://\"))"
    }
  }
}
The filter controls which entities will be impacted. Here, we want to touch only the data for services in production. The only property affected is documentation. All other properties are omitted from the mapping and therefore retain their original values.
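The interplay between the filter and the entity section can also be sketched outside Port. The Python function below is a rough emulation under stated assumptions: the name migrate is hypothetical, the documentation property is assumed to be present, and Port's real migration engine is not involved in any way.

```python
import copy
import re


def migrate(entity: dict) -> dict:
    """Emulate the mapping: select by in_production, then rewrite the URLs."""
    # Filter: `.properties.in_production == true`
    if entity["properties"].get("in_production") is not True:
        return entity  # not selected, so left untouched

    result = copy.deepcopy(entity)
    # Transformation: `.properties.documentation | map(gsub("http://"; "https://"))`
    result["properties"]["documentation"] = [
        re.sub(r"http://", "https://", url)
        for url in entity["properties"]["documentation"]
    ]
    return result


sample = {
    "identifier": "1234",
    "properties": {
        "documentation": [
            "http://example.com/doc1.pdf",
            "https://example.com/doc2.pdf",
            "http://example.com/doc3.pdf",
        ],
        "in_production": True,
    },
}
for url in migrate(sample)["properties"]["documentation"]:
    print(url)
```

URLs that already start with https:// pass through unchanged, because the pattern http:// does not occur in them.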
When to explicitly use the filter facility?
As usual, you can achieve the same effect in many ways. For example, in the above mapping the filtering logic could be embedded directly inside the transformation by using JQ's if-then-else flow control. Nevertheless, obeying the separation of concerns principle may make your code more readable and less error prone. This is especially important when the same filtering condition applies to multiple properties impacted by the transformation, since stating it once helps you adhere to the DRY principle.
Here is the transformed output after pressing the Test button in the dialog box:
{
  "identifier": "1234",
  "title": "Sample Service for Testing JQ",
  "team": [],
  "icon": "DefaultBlueprint",
  "properties": {
    "documentation": [
      "https://example.com/doc1.pdf",
      "https://example.com/doc2.pdf",
      "https://example.com/doc3.pdf"
    ],
    "in_production": true
  },
  "relations": {}
}
If you are happy with your mapping, you can apply it to your catalog by pressing the Migrate button. The audit log (accessible from the Builder tab) will contain details about updated entities.
Conclusion
You have witnessed the power of jq, especially in the context of a larger system. Being acquainted with command-line tools and their accompanying patterns is indispensable for a professional software engineer. This blog has only scratched the surface of what is possible to achieve in this manner. As a teaser, below is a JQ mapping that copies only valid data from a deprecated property into a new one. The combined expression ensures that a new tag is at most 40 characters long and consists only of lowercase letters and digits separated by single dashes.
{
  "filter": ".properties.tag_old | (test(\"^[a-z0-9]+(?:-[a-z0-9]+)*$\"; \"s\") and length <= 40)",
  "entity": {
    "properties": {
      "tag_new": ".properties.tag_old"
    }
  }
}
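If you want to experiment with the validation rule before running a migration, the filter condition can be approximated in Python. This is a sketch, not Port's implementation: the name is_valid_tag is hypothetical, and re.fullmatch is used here as a stand-in for jq's anchored pattern with the "s" flag, on the assumption that both require the whole string to match.

```python
import re

# Same pattern as in the jq filter; fullmatch supplies the anchoring.
TAG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_tag(tag: str) -> bool:
    """Lowercase letters and digits separated by single dashes, at most 40 characters."""
    return TAG_PATTERN.fullmatch(tag) is not None and len(tag) <= 40


for candidate in ["payment-service", "Payment", "a--b", "-lead", "x" * 41]:
    print(candidate, is_valid_tag(candidate))
```

Only the first candidate passes; the others fail because of an uppercase letter, a double dash, a leading dash, and excessive length, respectively.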
Interesting article. But I wonder why split the filter and the transformation into two? Why not: if .properties.in_production then .properties.documentation |= map(gsub("http://"; "https://")) end, for example?
Thanks for the feedback and for the opportunity to improve the article based on it! I have added a small subsection to the blog answering your remark, as I think it is important information for other readers, too.