Posts

AWS Cloudformation to create AWS VPC

This blog walks through creating an AWS Virtual Private Cloud (VPC) with an AWS CloudFormation template, step by step. However, it is necessary to plan your network with Classless Inter-Domain Routing (CIDR) first. Contents: Simplest VPC; Planning; Attach AWS Internet Gateway and Route Table; Testing the VPC.

Simplest VPC

Here is the simplest workable CFN (Fig 1: creates only the VPC). It creates only the VPC, as shown in the diagram above.

AWSTemplateFormatVersion: "2010-09-09"
Description: My VPC example
Parameters:
  EnvironmentName:
    Description: prefix for the resources
    Type: String
    Default: oj-test
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.192.0.0/24
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Ref EnvironmentName

As shown in the CFN, the CIDR block is the same as the Network Address Block in Fig 2. To create the stack:

aws cloudformation create-stack --template-body file://test.y...
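If you prefer to create the stack programmatically rather than with the AWS CLI command above, a minimal sketch using the AWS SDK for Java v2 could look like the following. The stack name, the template file name and the SDK dependency (software.amazon.awssdk:cloudformation) are assumptions for illustration, not part of the original post.

import software.amazon.awssdk.services.cloudformation.CloudFormationClient;
import software.amazon.awssdk.services.cloudformation.model.CreateStackRequest;
import software.amazon.awssdk.services.cloudformation.model.CreateStackResponse;

import java.nio.file.Files;
import java.nio.file.Path;

public class CreateVpcStack {
    public static void main(String[] args) throws Exception {
        // Read the CFN template shown above (file name is hypothetical).
        String template = Files.readString(Path.of("test.yaml"));

        try (CloudFormationClient cfn = CloudFormationClient.create()) {
            CreateStackRequest request = CreateStackRequest.builder()
                    .stackName("oj-test-vpc")   // hypothetical stack name
                    .templateBody(template)
                    .build();
            CreateStackResponse response = cfn.createStack(request);
            System.out.println("Created stack: " + response.stackId());
        }
    }
}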

AWS Glue Workflow: Getting started

Create a fundamental Glue workflow using an AWS CloudFormation template. The Glue workflow replaces the use of Step Functions, which were previously used to maintain Glue flow states. If you plan to automate your build deployment, blog post 1 can help you. In this post, I completely ignore the AWS build pipeline, which is the recommended CI/CD pipeline explained in the post above. Contents: AWS Cloudformation for workflow; Run the workflow; Query in Athena; Cleanup.

AWS Cloudformation for workflow

CFN stack with the workflow. As shown in the diagram above, the trigger starts the Glue crawler. The CFN template is as follows:

AWSTemplateFormatVersion: '2010-09-09'
# Sample CFN YAML to demonstrate creating a crawler
#
# The Parameters section contains names that are substituted in the Resources section
# These parameters are the names of the resources created in the Data Catalog
Parameters:
  GlueWorkflowName:
    Type: String
    Description: workflow...
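As a side note, once the workflow exists you can also start a run programmatically. Here is a minimal sketch using the AWS SDK for Java v2; the workflow name and the SDK dependency (software.amazon.awssdk:glue) are assumptions, not taken from the post.

import software.amazon.awssdk.services.glue.GlueClient;
import software.amazon.awssdk.services.glue.model.StartWorkflowRunRequest;
import software.amazon.awssdk.services.glue.model.StartWorkflowRunResponse;

public class RunGlueWorkflow {
    public static void main(String[] args) {
        try (GlueClient glue = GlueClient.create()) {
            // Start a run of the workflow created by the CFN stack (name is hypothetical).
            StartWorkflowRunResponse run = glue.startWorkflowRun(
                    StartWorkflowRunRequest.builder()
                            .name("my-glue-workflow")
                            .build());
            System.out.println("Workflow run id: " + run.runId());
        }
    }
}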

Java Streaming API recipes

In Java, elements in a stream are subdivided into subsets (which can be subdivided further) that are processed in parallel on different cores of the CPU. Therefore, only stateless, non-interfering and associative operations are eligible to run in parallel. These subsets are then combined by the short-circuit/terminal operation to produce the final result. Whether the pipeline runs sequentially (the default) or in parallel depends on the last sequential() or parallel() call in the pipeline.

Fig.1: Stream

Contents: Streams API; Short-Circuit terminal operations; Stream Aggregation; Collectors.

Streams API

In the blog post Use of default and static methods, I introduced how to write a factorial lambda function using UnaryOperator. You can use the java.util.stream.IntStream.reduce function to calculate the factorial value in a stream as follows:

var r = IntStream.range(1, 10).reduce(1, (a, b) -> a * b);

Stream-handling interfaces use Java generics: BaseStream - defines core stream behaviours, such as managing the stre...
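To make the reduce, parallel and collector ideas above concrete, here is a small self-contained sketch; the sample values and class name are illustrative only.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamRecipes {
    public static void main(String[] args) {
        // Factorial of 9 via reduce, the same pipeline as the IntStream.range(1, 10) example above.
        int factorial = IntStream.range(1, 10).reduce(1, (a, b) -> a * b);
        System.out.println("9! = " + factorial);

        // A stateless, non-interfering, associative pipeline can safely run in parallel.
        long evenCount = IntStream.rangeClosed(1, 1_000_000)
                .parallel()
                .filter(n -> n % 2 == 0)
                .count();
        System.out.println("even numbers = " + evenCount);

        // Collectors aggregate a stream into a container, here grouping words by length.
        List<String> words = List.of("stream", "lambda", "collector", "reduce");
        Map<Integer, List<String>> byLength =
                words.stream().collect(Collectors.groupingBy(String::length));
        System.out.println(byLength);
    }
}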

Java 9 Parallelism

The current computer programming paradigm is not object-oriented anymore; it is about parallelism 1. From the programming perspective, concurrency is the composition of independently executing processes, while parallelism is the simultaneous execution of computations. Concurrency is about dealing with lots of things at once; parallelism is about doing lots of things at once. Concurrency is about structure, parallelism is about execution. Contents: Introduction; Executor; CompletableFuture.

Fig.1: Threading vs parallel tasking

Java Futures are the way to support asynchronous operations. The Java fork/join framework and parallel streams divide a task into multiple subtasks and perform those subtasks in parallel on different cores of a single machine. This avoids blocking a thread and wasting CPU resources.

Introduction

The key goal of parallelism is to efficiently partition tasks into sub-tasks (fork) and combine the results (join). That will improve the performance in the way...
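As a small illustration of the Executor and CompletableFuture topics listed above, here is a minimal sketch of two independent tasks running on an executor and being combined without blocking the main thread; the task bodies and names are made up for the example.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncExample {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);

        // Two independent tasks run in parallel on the executor's threads.
        CompletableFuture<Integer> price = CompletableFuture.supplyAsync(() -> 40, executor);
        CompletableFuture<Integer> tax   = CompletableFuture.supplyAsync(() -> 2, executor);

        // Combine (join) the partial results once both tasks complete.
        CompletableFuture<Integer> total = price.thenCombine(tax, Integer::sum);

        System.out.println("total = " + total.join());
        executor.shutdown();
    }
}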

Python my workflow

My Flow: I combined two tools using pyenv-virtualenv:

pyenv manages multiple versions of Python itself.
virtualenv (Python Virtual Environments: A Primer) manages virtual environments for a specific Python version.
pyenv-virtualenv manages virtual environments across varying versions of Python.

Here is how to create a virtualenv:

pyenv virtualenv 3.7.2 p3

To activate the environment: pyenv activate p3
To deactivate at any time: pyenv deactivate
To uninstall the virtualenv: pyenv uninstall my-virtual-env

Create a project

Now we have the virtual env p3, for example. We need to create an auto-activating environment for the project myproject as follows:

mkdir myproject
cd myproject
pyenv local p3

Here is the complete story.

Python 3 use of venv

If you want to set up a project with venv, first set the Python version to 3 using pyenv:

pyenv global 3.8.0

Then create your project:

python -m venv project

To activate the environment, move to the project directory s...

Structured data meaning

Google uses structured data such as RDFa, Microdata or JSON-LD to understand the contents of a page. RDFa is based on RDF and is an HTML5 extension. JSON-LD can be created from the same RDF Turtle. Learn RDF Turtle to create structured data for SEO. RDFa 1 (Resource Description Framework in Attributes) is attribute-based, for example the a and href attributes in HTML/XHTML. RDFa does not affect how the HTML content appears when rendered. If you violate the General structured data guidelines 2, your page may be ranked lower; therefore, it is important to know how to use structured data properly. You can use the Google Rich Result test 3 to test and verify your page. Valid structured data can be eligible to appear in graphical search results. RDFa can use a number of vocabularies (as I show in this blog), but currently Google supports only schema.org 4. ⚠️ Currently Google permits only the schema.org vocabulary for structured data. Google recommends using JSON-LD. Turtle is...
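Since the post notes that JSON-LD can be generated from the same RDF Turtle, here is a minimal sketch of that conversion using Apache Jena. The sample Turtle data, class name and the assumption that jena-arq is on the classpath are mine, not from the post.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

import java.io.StringReader;

public class TurtleToJsonLd {
    public static void main(String[] args) {
        // A tiny schema.org snippet in Turtle (hypothetical example data).
        String turtle = String.join("\n",
                "@prefix schema: <https://schema.org/> .",
                "<https://example.com/post> a schema:BlogPosting ;",
                "    schema:headline \"Structured data meaning\" ;",
                "    schema:author [ a schema:Person ; schema:name \"Example Author\" ] .");

        Model model = ModelFactory.createDefaultModel();
        // Parse the Turtle into an in-memory RDF model.
        model.read(new StringReader(turtle), null, "TURTLE");

        // Serialise the same triples as JSON-LD, ready to embed in a script tag.
        model.write(System.out, "JSON-LD");
    }
}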

Apache Jena to learn RDF and SPARQL

RDF is one of the semantic web technologies and the foundation for Turtle, N-Triples and JSON-LD. SPARQL is the query language for RDF. Use the Apache Jena tools to learn RDF. For example, a web page is human-readable because the end user is human. However, search engines choose the page on behalf of the consumer, and a search engine is a machine that wants to read the web page's metadata. The web page should therefore contain well-structured data that search engines can understand by semantic parsing. Contents: GitHub; Introduction to RDF; Triples of the Data Model; RDF family; Embedding Turtle in HTML.

Some of the machine-readable metadata formats are: meta tags, Microdata, Microformats, RDFa, JSON-LD.

It is necessary to know where the Resource Description Framework (RDF) 1 and SPARQL fit in the semantic web 2. The semantic web is the Web of data. RDF provides a foundation for publishing and linking data, on which OWL 3, SKOS, RDFS 4 and so on are built. If t...
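As a taste of using Apache Jena to explore RDF and SPARQL, here is a minimal sketch that builds a tiny in-memory model and runs a SELECT query over it; the namespace, resource names and query are illustrative assumptions only.

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;

public class SparqlExample {
    public static void main(String[] args) {
        // Build a tiny in-memory RDF model.
        Model model = ModelFactory.createDefaultModel();
        String ns = "http://example.com/ns#";
        Property name = model.createProperty(ns, "name");
        model.createResource(ns + "alice").addProperty(name, "Alice");
        model.createResource(ns + "bob").addProperty(name, "Bob");

        // A SPARQL SELECT query over the model.
        String queryString =
                "PREFIX ex: <http://example.com/ns#> " +
                "SELECT ?s ?name WHERE { ?s ex:name ?name }";

        try (QueryExecution qexec =
                     QueryExecutionFactory.create(QueryFactory.create(queryString), model)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.nextSolution();
                System.out.println(row.getResource("s") + " -> " + row.getLiteral("name").getString());
            }
        }
    }
}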