Notes of Maks Nemisj

Experiments with JavaScript

In this article I will explain why you should test react.js components, how you can test them, and which problems you might come across. The testing solution I use doesn’t rely on React’s Test-utils or on a DOM implementation, so the tests can run in plain node.js.

Why/How to test React components?

The main question you might have with react components is: “Why test react components, and what exactly should be tested?”

Setup

Before I continue with the explanation, I would like to show you the sample setup we will use. There are two components, ParentComponent.jsx and ChildComponent.jsx. In the render() of ParentComponent.jsx, ChildComponent is rendered depending on a ‘standalone’ property.

Definition of ParentComponent.jsx:

var React = require('react');

var Child = require('./ChildComponent.jsx');

module.exports = React.createClass({
  displayName: 'ParentComponent',

  renderChildren: function() {

    if (this.props.standalone === true) {
      return null;
    } else {
      return <Child />;
    }

  },

  render: function() {
    return (
      <div className="parent-component">
        {this.renderChildren()}
      </div>
    );
  }
});

ParentComponent.jsx

Here is the definition of ChildComponent.jsx:

var React = require('react');

module.exports = React.createClass({
  displayName: 'ChildComponent',

  render: function() {
    return <div className="child-component"></div>;
  }
});

ChildComponent.jsx

Code can be found at https://github.com/nemisj/react-mock-testing with the history following this article.

Unit testing

In my opinion, whenever your code contains logic that depends on some condition, it’s important to test it, to make sure that what you expect always holds. While this is obvious for ‘usual’ code, it’s not always clear what to test in react components.

If you look at the example above, you will see that the render() method of ParentComponent.jsx returns the following html:

<div class="parent-component">
  <div class="child-component"></div>
</div>

And renderChildren() will return in its turn:

  <div class="child-component"></div>

Seeing this, you might start asking:

Do I have to test HTML…?

That’s a valid question and I was also asking it myself. The answer is – YES you DO have to test it and NO you DON’T have to test it.

Yes, you DO have to test HTML

The reason for that is simple. Think about the HTML as if it were a return type. Imagine that renderChildren() returned a real instance of ChildComponent. You wouldn’t even ask whether to test it or not: because it’s an instance, you would just do some kind of instanceof check and that’s it. But because react returns markup (HTML), it feels like a different story. Nevertheless, HTML is the only medium there is, so we have to deal with it as it is.

No, you DON’T have to test HTML

When testing a component we are not interested in the content of the HTML itself. It does not matter what kind of node the component returns, whether it’s a <span> with a CSS class or a <div> with an attribute. What is important is to test what a certain element means to us inside our application/code.

Take for example the markup of ChildComponent: <div class="child-component"></div>.

Whenever ParentComponent returns this markup, it means to us that a ChildComponent instance was returned. Not a <div> element with the css-class “child-component”, but an instance of ChildComponent. That’s the reason why we DON’T test HTML as a browser language, but we DO test HTML as an instance definition.

Implicit instance checking

In its simplest form, to test the logic of ParentComponent we have to test that its render() method returns HTML which contains <div class="child-component"></div>. This way we can verify that ChildComponent, and not something else, was instantiated inside ParentComponent.

It feels like implicit instance checking, since we don’t deal with the instance directly, but with its ‘representation’.

Representation                              Type
<div class="child-component"></div>         ChildComponent.jsx
<div class="parent-component"></div>        ParentComponent.jsx

Writing tests

Let’s look at the possible tests for ParentComponent. I’ve used React.renderToStaticMarkup ( https://facebook.github.io/react/docs/top-level-api.html#react.rendertostaticmarkup ) to do assertions based on strings. This method returns the rendered component as a string.

In addition, this approach allows the tests to run in a node.js environment without any DOM implementation available.

Below is the test test/ParentComponent.test.js, written using the mocha ( http://mochajs.org/ ) testing framework.

var React = require('react');
var expect = require('chai').expect;

var ParentComponent = require('../ParentComponent.jsx');

describe('ParentComponent', function() {

  var childType = '<div class="child-component"></div>';

  it('should render with child', function() {
    var markup = React.renderToStaticMarkup(<ParentComponent />);

    expect(markup).to.contains(childType);

  });

  it('should render without child', function() {
    var markup = React.renderToStaticMarkup(<ParentComponent standalone={true} />);

    expect(markup).to.not.contains(childType);
  });

});

test/ParentComponent.test.js

As you can see, there is one test, ‘should render with child’, which checks that ChildComponent is present in the html, and another test, ‘should render without child’, which checks that the child component is not returned.

While this solution works, it has one big disadvantage.

To see it, let’s imagine that the definition of ChildComponent.jsx changes to the following form:

var React = require('react');

module.exports = React.createClass({
  displayName: 'ChildComponent',

  render: function() {
    return (
      <div className="child-component">
        Inner Text
      </div>
    );
  }
});

ChildComponent.jsx

Because the output of ChildComponent.jsx has changed to <div class="child-component">Inner Text</div>, our test will fail:

AssertionError: expected '<div class="parent-component"><div class="child-component">Inner Text</div></div>' to include '<div class="child-component"></div>'

This is why testing HTML feels so wrong at the beginning: the test of ParentComponent depends on the implementation of ChildComponent, and deeper nesting means bigger changes in the returned HTML. But bear with me a little bit more.

Mock

As I said, we are not interested in the HTML itself, but only in the fact that this HTML represents a certain type. If we mock ChildComponent with our own definition, we can abstract the implementation of the child away from the parent.

For mocking I’ve used the rewire library, but you can use whichever one better fits your architecture and needs. It’s also possible that you use a Dependency Injection library in your architecture and need another way of mocking.

The rewire library allows patching private variables of a module. Just require the module using the rewire function and then call __set__ on it. Let’s look at an example:

var rewire = require('rewire');
var ParentComponent = rewire('./ParentComponent.jsx');

// Placeholder mock; any value can replace the private `Child` variable.
var ChildComponentMock = {};

ParentComponent.__set__('Child', ChildComponentMock);

rewire-example.js

In this example the Child variable is replaced with an empty object.

This leads my story to the next point.

We can create a mock component and use it to replace the real one. The mock’s representation will then be used whenever ParentComponent renders. To do the comparison, we can render the mock separately and use the result in the assertion.

Below is an implementation of test case together with mock:

var React = require('react');
var expect = require('chai').expect;
var rewire = require('rewire');

var ParentComponent = rewire('../ParentComponent.jsx');

var ChildMock = React.createClass({
  render: function () {
    return <div className="child-mock-component" />;
  }
});

ParentComponent.__set__('Child', ChildMock);

describe('ParentComponent', function() {

  var childType = React.renderToStaticMarkup(<ChildMock />);

  it('should render with child', function() {
    var markup = React.renderToStaticMarkup(<ParentComponent />);

    expect(markup).to.contains(childType);

  });

  it('should render without child', function() {
    var markup = React.renderToStaticMarkup(<ParentComponent standalone={true} />);

    expect(markup).to.not.contains(childType);
  });

});

test/ParentComponent.test.js

Let’s walk through this code.

  • First of all, the rewire module is required.
  • After that, the ChildMock component is created. This component will represent our ChildComponent type.
  • Using the __set__ method of rewire, the real component is replaced with the mock.
  • Finally, the tests check whether the markup of ParentComponent contains the representation of the ChildComponent mock.

As you can see, by using a mock for ChildComponent we can test whether ParentComponent uses the correct component.
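
To see the idea in isolation, here is a minimal sketch that runs without React or rewire. Plain functions returning strings stand in for components, and passing the child renderer as an argument plays the role that __set__ plays in the real test. The names renderParent, renderChild and renderChildMock are hypothetical and exist only for this illustration.

```javascript
// Stand-ins for components: plain functions that return markup strings.
function renderChild() {
  return '<div class="child-component">Inner Text</div>';
}

// The mock has its own, stable representation.
function renderChildMock() {
  return '<div class="child-mock-component"></div>';
}

// The child renderer is passed in, so a test can swap it for the mock.
function renderParent(props, child) {
  var inner = props.standalone === true ? '' : child();
  return '<div class="parent-component">' + inner + '</div>';
}

// The expected representation comes from the mock itself, so changes
// inside renderChild() can no longer break the parent's test.
var childType = renderChildMock();
var withChild = renderParent({}, renderChildMock);
var withoutChild = renderParent({ standalone: true }, renderChildMock);

console.log(withChild.indexOf(childType) !== -1);    // child is present
console.log(withoutChild.indexOf(childType) === -1); // child is absent
```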

Small optimization

We can abstract the creation of the mock into a separate function, and make the component distinguishable by a custom tag instead of a css-class. React.createElement lets us create custom tags.

function getMock(componentName) {
  return React.createClass({
    render: function () {
      return React.createElement(componentName);
    }
  });
}

var ChildMock = getMock('ChildComponent');

test/ParentComponent.test.js

The ChildMock representation will look like <ChildComponent></ChildComponent>

Testing this.props

Components can be parameterized via props. Now that it’s possible to express the type of a component via its HTML representation, let’s think about how we can test the props passed to it. Since props are the input of a component, it’s vital to test them as well. Imagine that ChildComponent uses the property “childName” to render it inside its node. If ParentComponent passes a wrong value to it, we will end up with an incorrect screen.

I have simplified ParentComponent.jsx: I removed the if statement and added the childName property when rendering ChildComponent in the code below:

var React = require('react');

var Child = require('./ChildComponent.jsx');

module.exports = React.createClass({
  displayName: 'ParentComponent',

  renderChildren: function() {
    return <Child childName="Name of the child"/>;
  },

  render: function() {
    return (
      <div className="parent-component">
        {this.renderChildren()}
      </div>
    );
  }
});

ParentComponent.jsx

If we use the current implementation of the mock, we will never find out which properties have been passed to ChildComponent, because they are dropped in the representation. By slightly modifying our mock component, we can serialize the properties into the HTML and make them comparable. React.createElement can help us with this, because its second argument is converted into attributes of the node. In this way we can pass the received properties on to it.

function getMock(componentName) {
  return React.createClass({
    render: function() {
      return React.createElement(componentName, this.props);
    }
  });
}

var ChildMock = getMock('ChildComponent');

The only problem with this solution is that React will skip attributes which don’t belong to HTML, unless they are prefixed with “data-”. This means that we have to iterate over all the properties and prepend the “data-” prefix to all custom attributes. I say custom attributes, because we don’t want to prefix attributes which React already knows, like className, disabled, etc. We can use the DOMProperty.isStandardName object from “react/lib/DOMProperty.js” to find out which properties are native. The names must also be lowercased, otherwise React will give us an error that attributes must be all lowercase.

var DOMProperty = require('react/lib/DOMProperty.js');

var createAttributes = function (props) {
  var attrs = {};

  Object.keys(props).forEach(function (key) {
    var attrName = DOMProperty.isStandardName[key] ? key : ('data-' + key.toLowerCase());
    attrs[attrName] = props[key];
  });

  return attrs;
};

function getMock(componentName) {
  return React.createClass({
    render: function() {
      var attrs = createAttributes(this.props);
      return React.createElement(componentName, attrs);
    }
  });
}

test/ParentComponent.test.js

Now, if we instantiate ChildMock with the childName property, the property will additionally be serialized into the HTML, like this:

<ChildComponent data-childname="Name of the child"></ChildComponent>

In this way we can check both the type of the returned component and the properties which are passed to it.
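
The name mapping itself can be tried out without React. The sketch below assumes a small hand-written whitelist, STANDARD_NAMES, as a hypothetical stand-in for React’s internal DOMProperty.isStandardName; the real object knows far more names.

```javascript
// Hypothetical whitelist standing in for DOMProperty.isStandardName.
var STANDARD_NAMES = { className: true, id: true, disabled: true };

function createAttributes(props) {
  var attrs = {};

  Object.keys(props).forEach(function (key) {
    // Known names pass through, custom ones become lowercased data- attributes.
    var attrName = STANDARD_NAMES[key] ? key : 'data-' + key.toLowerCase();
    attrs[attrName] = props[key];
  });

  return attrs;
}

var attrs = createAttributes({
  className: 'child',
  childName: 'Name of the child'
});

console.log(attrs);
// { className: 'child', 'data-childname': 'Name of the child' }
```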

Shallow rendering

Instead of writing your own mock, you could use the shallow rendering feature of React’s Test-Utils.

It allows type checking of the returned markup, like we do. Shallow rendering renders a component only one level deep. Unfortunately, the problem lies exactly there: it gives back only the first-level elements of the given component.

Nowadays, with React moving away from mixins because of ES6 classes, people have found another way of bringing mixins into play: wrapping components with virtual components. Virtual components are components which are not visible and have no representation of their own, but which bring some mixable functionality into the wrapped component. Fluxible, for example, is one of the libraries doing that.

To make it clear, look at the following code:

var ParentComponent = React.createClass({
  render: function() {
    return <div/>;
  }
});

function handleMixin(Component) {
  return React.createClass({
    render: function() {
      return React.createElement(Component, objectAssign({}, this.props, this.state));
    }
  });
}

ParentComponent = handleMixin(ParentComponent);

Because components are wrapped in other components, the first-level element for the shallow renderer is not the nested child, but the wrapped parent component itself. And this is really a problem, since we would like to test the children of this parent, and they’re not available there.

That’s it for today. You can find source code of this article at https://github.com/nemisj/react-mock-testing.


Sometimes easy things turn out to be more complicated than initially thought. For example conditional IE comments in HTML, which I had to add today to the code I am writing.

At my work we have to support Internet Explorer version 9 and higher. In order to use media queries we decided to use the https://github.com/weblinc/media-match polyfill library.

What could be easier than that? Just add a conditional comment <!--[if lte IE 9]> and that’s it.

 <!--[if lte IE 9]>
    <script src="/public/media.match.js"></script>
 <![endif]-->

But things turned out to be a bit more complicated, due to React.js and the isomorphic SPA which we build.

Unfortunately React.js doesn’t render HTML comments if you put them inside a jsx file. In our architecture we have a main HTML.jsx component which renders the whole HTML page on the server. So the first solution I tried just didn’t work out:

renderHead: function() {
  return (
    <head>
      <!--[if lte IE 9]>
      <script src="/public/media.match.js"/>
      <![endif]-->
    </head>
  );
}

index.jsx

The only possible way to render HTML comments within jsx was to use the dangerouslySetInnerHTML attribute of React.js and put the comment in there:

renderHead: function() {
  return (
    <head dangerouslySetInnerHTML={{__html: '<!--[if lte IE 9]><script src="/public/media.match.js"></script><![endif]-->'}}>
    </head>
  );
}

index.jsx

Also note that the <script> tag must be closed with a separate closing tag, since the self-closing version will not work correctly here and you will receive js errors.

Last but not least, this will not work if you have more than one item inside the <head> tag. Of course you could put the whole html string inside dangerouslySetInnerHTML as well, but imho that looks lame.

That’s why I’ve abused the forgiveness of the browser’s html parser and placed the conditional comment inside the <meta> tag, which works perfectly now.

It respects the npm configuration if it points to a location other than “/tmp”. And it cleans up only the folder which the current “npm install” command has created.

If you don’t like one-liners, you can also create a separate “npm-tmp-clean.sh” file somewhere in the project’s bin folder and execute it through the “scripts”:

This script is a bit different than the one-liner, because npm runs the script in a separate process and finding the PID of npm becomes a bit different.

I would strongly advise using this one-liner in any project you run with npm, since you can never be sure that your dependencies are not leaking into the /tmp folder. Or at least set up a cron job on the server to clean up the /tmp folder, if that is your preferred way of cleaning things up.

It is important to note that this solution doesn’t work when using the package on Windows.


How often do you want to go to the root of a git repository? Maybe you don’t do it very often, but I do it quite often.

Mercurial has the nice command hg root, but git doesn’t. To do the same in git you have to use the long command 'rev-parse --show-toplevel', which I can’t even remember. Fortunately git has aliases, which can be used to define custom commands, like git root:

git config --global alias.root 'rev-parse --show-toplevel'

Now, if I want to go to the root of the project, all I have to do is type in bash:

cd $(git root)


How often have you done this in your code?

var zork;

Instead of this?

var zork = null;

While both might be used to accomplish the same result, the first version has a hidden pitfall. Imagine that you always use the first form, also inside your for statements.

for (code) {
    var zork;
    if (zork) { }
}

If you are not careful enough, you might end up in an unwanted situation. Look at the code below:

function tester(makeit) {
     var length = 5;
     for (var i=0,l=length;i<l;i++){
         var zork;
         if (makeit && 2 == i) {
             zork = 'Defining zork ' + i;
         }
         console.log(i + ':' + zork);
     } 
};
tester(true);

Instead of the `zork` variable being reset on every iteration, the old value stays untouched, because var declarations are hoisted to the top of the function and a declaration without an initializer assigns nothing. This can lead to unexpected situations, for example when checking `zork` for a null value. Instead of getting the result you might expect:

0:null
1:null
2:Defining zork 2
3:null
4:null

You will get:

0:undefined
1:undefined
2:Defining zork 2
3:Defining zork 2
4:Defining zork 2

Always try to define your variables at the top and be explicit: if you want to null a variable inside a loop, then null it.
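
The difference can be checked side by side. The sketch below is a variation of tester() that collects the lines in an array instead of logging them; the function name collect and its resetEachIteration flag are made up for this illustration.

```javascript
function collect(resetEachIteration) {
  var seen = [];

  for (var i = 0; i < 5; i++) {
    var zork;                 // hoisted: one variable for the whole function

    if (resetEachIteration) {
      zork = null;            // explicit reset on every pass
    }

    if (i === 2) {
      zork = 'Defining zork ' + i;
    }

    seen.push(i + ':' + zork);
  }

  return seen;
}

var leaky = collect(false);
// ['0:undefined', '1:undefined', '2:Defining zork 2', '3:Defining zork 2', '4:Defining zork 2']

var explicit = collect(true);
// ['0:null', '1:null', '2:Defining zork 2', '3:null', '4:null']
```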


I wrote a vim script which helps you to do ‘gf’ on require statements when coding node.js modules. It uses the node.js module resolution algorithm to add all the node_modules folders to the vim path variable, so that you can do ‘gf’ on modules which are not relative to the folder of the current file. This means that doing ‘gf’ on require(‘express’) will open the folder with the express source code.

Why.
Currently in my tests I don’t use relative paths to the source code, but instead module name and path, like:

require('my-module/lib/some-unit-to-test.js');

so that I don’t have a path dependency of the test relative to the source code. This script helps me to jump into the source code from the test file. Might be useful for other people who also do vim and node.js.

https://raw.github.com/nemisj/vimfiles/master/local-config/after/ftplugin/javascript.vim

Put it in vimfiles/after/ftplugin/ and it will do the trick.


Hi all, today is another javascript experiment, in which I use Array’s iteration methods forEach and map to look at how the call function is implemented and how we can use call to implement something ‘non-standard’.

You know, I often use forEach or map on an array to execute one method on each of the instances in this array, e.g.,

var json = arr.map(function (obj) {
    return obj.serialize();
});

Today I thought: “Hey, it would actually be cool to execute the "serialize" method not inside an anonymous function, but to pass it as a parameter to the map method in a more declarative style.” Something like this:

// arr is an Array filled with Zorks
var json = arr.map(obj.serialize);

Frankly, one of the reasons why I love javascript so much is that there is always a way to implement something in an obscure manner, and I think this case is no different.

So, let’s start our small journey and imagine that the objects in the arr variable are of the type Zork.

When I first thought about how I would achieve this, my first thought was: “Ok, that’s nice, but where do I take the reference to the serialize method from, if it lives on every obj, which is only available at the moment of iteration? But it’s also available on the prototype of the obj instance, the prototype of Zork.” In pseudo-code:

var json = arr.map( Zork.prototype.serialize );

But we all know that it won’t work: map passes every object as the first argument, while the scope (the this value) of the serialize method would be wrong.

Luckily, javascript has the ability to change the scope of any function/method to the desired one. This can be achieved by using the call or apply method, which is a member of any normal function in javascript.

The “call” method receives the scope as its first argument. This means that passing any Zork instance to call will change the scope and execute serialize with the correct scope. Here is the example code:

var json = arr.map(function (obj) {
    return Zork.prototype.serialize.call(obj);
});

Check, that’s exactly what I need. So, theoretically, this means that I can also pass the “call” function directly to map:

var json = arr.map( Zork.prototype.serialize.call );

When I tried to execute the code above, Firefox gave me an error saying:

TypeError: Function.prototype.call called on incompatible undefined

This is not a very explanatory message, but after running a couple of tests the answer was a bit shocking for me. I will show you these tests later on, but first my conclusion: “It appears that the call function uses the same context resolution principles as normal “method” calls, only call uses its ‘this’ variable to resolve the function it should execute, not an associated object.”

Before I continue, let’s recall the basics of how javascript resolves the this variable for functions.

A function can be called in at least five different ways: “invoke as method”, “invoke as baseless function”, “invoke using Function.prototype.call”, “invoke using Function.prototype.apply” and “invoke a constructor using new”. Currently we are interested in two of them: “invoke as method” and “invoke as baseless function”.

When we cache a method and invoke it separately, we are dealing with “invoke as baseless function”. In that case the execution context (in other words, the this variable) is resolved to the global object. Here is example code:

var myObject = {
    some_method: function () {
        return this;
    }
};
//
var cached = myObject.some_method;
cached();

In this code the this variable doesn’t point to the myObject object, but to the global object instead. This is due to the fact that the function is not called as a property of myObject, but as an anonymous baseless function.

But when a function is invoked as a property of an object (“invoke as method”), the this variable will point to that object. In other words: when a function is called, the “this” variable always points to the object to which this function belongs as a property at the moment of execution. An example to make it clear:

var myObject = {
    some_method: function () {
        return this;
    }
};
//
var second_object = {};
second_object.newMethod = myObject.some_method;
second_object.newMethod();

In the code above the “this” variable will point to second_object, because at the moment of execution the newMethod function was a property of the second_object object.
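
The two invocation styles can be compared in one small script. This is a sketch with hypothetical names (whoAmI, other, borrowed). Note that it opts into strict mode, which the text above does not assume, so that the baseless call yields undefined instead of the global object and the result is the same in every environment.

```javascript
var myObject = {
  whoAmI: function () {
    'use strict'; // baseless calls get `this === undefined` instead of the global object
    return this;
  }
};

// Invoke as method: `this` is the object the function is a property of.
var viaMethod = myObject.whoAmI();

// Invoke as baseless function: the link to myObject is lost.
var cached = myObject.whoAmI;
var viaBaseless = cached();

// Same function as a property of another object: `this` follows the call site.
var other = {};
other.borrowed = myObject.whoAmI;
var viaBorrowed = other.borrowed();

console.log(viaMethod === myObject);    // true
console.log(viaBaseless === undefined); // true
console.log(viaBorrowed === other);     // true
```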

Now that we have refreshed how context resolution works in javascript, let’s return to our call story.

As I stated, the call function uses the same resolution principles: it has no tight reference to its associated function, and uses the this variable to execute the associated function. Interesting, isn’t it? In the normal flow of execution the this variable points to an object, but in our case it points to a function. In pseudo-code you could write the call implementation something like this:

var call = function () {
    this();
}
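
A slightly more complete sketch of the same idea is below. sketchCall is a hypothetical stand-in, not the engine’s real implementation: it leans on apply for the actual invocation and only illustrates that the function to run is taken from this.

```javascript
// Hypothetical re-implementation of call, for illustration only.
function sketchCall(thisArg) {
  var target = this; // the function sketchCall was invoked on

  if (typeof target !== 'function') {
    throw new TypeError('sketchCall called on incompatible ' + target);
  }

  var args = Array.prototype.slice.call(arguments, 1);
  return target.apply(thisArg, args);
}

function greet(greeting) {
  return greeting + ', ' + this.name;
}

// Attach it as a property, the way every function has `call`.
greet.sketchCall = sketchCall;

var attached = greet.sketchCall({ name: 'Nemisj' }, 'Hello');
console.log(attached); // 'Hello, Nemisj'

// Detached from greet, `this` is no longer a function, so it throws.
var detached = greet.sketchCall;
var detachedError = null;
try {
  detached({ name: 'Nemisj' }, 'Hello');
} catch (e) {
  detachedError = e;
}
console.log(detachedError instanceof TypeError); // true
```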

To test my thoughts I’ve created a few test snippets, which I will show below. My snippets use the Zork prototype, which is defined as follows:

function Zork(name) {
    this.name = name;
}
Zork.prototype.serialize = function () {
    console.log(this.name);
    return 'My name is ' + this.name;
}

I propose to run the call function as an anonymous (baseless) function and check that it throws the same error as above:

var serializeCall = Zork.prototype.serialize.call;
serializeCall({
    name: 'Nemisj'
});

Console gave back exactly the same error as above.

TypeError: Function.prototype.call called on incompatible undefined

Now let’s execute “call” as a property of “serialize” method, so that we can see whether “call” function is using serialize method as its execution context:

var serialize = Zork.prototype.serialize;
serialize.call({
    name: 'Nemisj'
});

After executing, the code printed “Nemisj”, which proved that “call” is not bound to the referenced function, but resolves it at the moment of execution. If my assumption is correct, it also means that I can invoke the call function by using its own call method, passing the serialize method as the first argument, followed by the object.

var serializeCall = Zork.prototype.serialize.call;
serializeCall.call( Zork.prototype.serialize, { name: 'Nemisj' });

This might look a bit weird, but it still works :)

This information led me to the next conclusion. In a normal situation, if I want the this variable to point to the correct object, I use “bind”. Does this mean that I can use the bind function to preserve the associated method for the call function? Let’s try it out:

var serializeCallBinded = Zork.prototype.serialize.call.bind( Zork.prototype.serialize );
serializeCallBinded({
    name: 'Nemisj'
});

And the answer is: YES :) After executing this code, it still prints “Nemisj”. Great, now that we know how to execute call as an anonymous function, let’s move on towards our goal.
First implementation with bind:

var serializeCallBinded = Zork.prototype.serialize.call.bind( Zork.prototype.serialize );
var json = arr.map( serializeCallBinded );

But any iteration method of an Array supports a second parameter, which defines the SCOPE of the executed function. Since call uses its scope to reference the parent function, I can pass the “serialize” method as the second argument and have one nice one-liner:

var json = arr.map( Zork.prototype.serialize.call, Zork.prototype.serialize );

Run…and…check ( http://jsfiddle.net/vNRjd/2 )…code is working :) Well, mission is accomplished I guess.

Except that it does look a bit weird. Passing the same function twice, writing the long Zork.prototype twice..mhee…not nice :) Let’s add some sugar to it.

Because “call” function is isolated and is not bound to its parent function until the invocation, we can pass any native call method to the map as the first argument:

var json = arr.map( Function.prototype.call, Zork.prototype.serialize );

It’s not necessary to give the call method of the prototype of a function, it also can be any call method from any method/function, even the evil “eval” one:

var json = arr.map( eval.call, Zork.prototype.serialize );

But the Function constructor also has a direct “call” method, which simplifies our invocation even more:

var json = arr.map( Function.call, Zork.prototype.serialize );

This looks much prettier and almost human readable: map an array by calling the serialize of Zork :)
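
For reference, here are the variants side by side in a self-contained sketch. It uses a trimmed Zork (without the console.log) and two sample names chosen for illustration; all three calls should produce the same array.

```javascript
function Zork(name) {
  this.name = name;
}
Zork.prototype.serialize = function () {
  return 'My name is ' + this.name;
};

var arr = [new Zork('Nemisj'), new Zork('Grue')];

// 1. The usual anonymous wrapper.
var viaWrapper = arr.map(function (obj) {
  return obj.serialize();
});

// 2. call with its `this` pre-bound to serialize.
var viaBind = arr.map(Function.prototype.call.bind(Zork.prototype.serialize));

// 3. call as the callback, serialize as map's thisArg.
//    map invokes it as serialize.call(element, index, array).
var viaThisArg = arr.map(Function.prototype.call, Zork.prototype.serialize);

console.log(viaWrapper); // ['My name is Nemisj', 'My name is Grue']
console.log(viaBind);
console.log(viaThisArg);
```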

I don’t advocate using this code in your production environment. It’s not very self explanatory and it’s a bit slow :) For those who like graphs here is one http://jsperf.com/combine-foreach-and-call. There are more reasons why you shouldn’t write such unreadable code, but I leave it up to you. By making such obscure implementations I understand the working of javascript more and more and I hope you do too.

Have a happy coding.


I bet you know the technique of caching “this” scope in order to use it in a closure.

var self = this;
setTimeout(function () {
    self.doSomething();
}, 100);

Of course it’s not a nice way of writing your code, but sometimes it’s just a fast shortcut, especially if you’re working with an older browser which doesn’t support “bind” feature.

Nevertheless, even when I do use it I stopped using “self” as a variable name for “this” cache. Anything else _this, that, This, whatever_cache_name, but NO self.

And the reason is very simple. In a browser, self is always available at run-time (it is a global that refers to the window object), even if you move your code or refactor it. It will always be a defined variable, no matter what, leaving you without a nice error like “ReferenceError: self is not defined”. This means that a forgotten `var self = this` will sit like a spy in your code until the moment it finally breaks.

Recently I was casting a string to a number, and while you might think it’s as simple as it sounds, it took me a while to make it fully functional.

My first approach was to use the Number constructor for casting, which returns a number, or NaN if the value can’t be parsed as a number. But for empty strings, boolean values and null values, this approach gave an unsatisfying result.

   Number("zork123") // NaN
   Number("") // 0
   Number(" ") // 0
   Number(null) // 0
   Number(false) // 0
   Number(true) // 1

Then I decided to use isNaN to check the string before applying the Number constructor. But isNaN was of no help either. It looks like isNaN uses the Number conversion underneath and gives the same result as above :(

   isNaN("zork123") // true
   isNaN("") // false
   isNaN(" ") // false
   isNaN(0) // false
   isNaN(null) // false
   isNaN(false) // false

After my isNaN attempt failed, my next idea was to use parseFloat. But the problem with parseFloat is that strings like “123aa” are parsed into 123.

   parseFloat('123zork') // 123
   parseFloat("") // NaN
   parseFloat(" ") // NaN
   parseFloat(null) // NaN
   parseFloat(false) // NaN
   parseFloat(true) // NaN
   parseFloat('11111') // 11111
   parseFloat('1.2') // 1.2

The one solution left was to cast the value to a string using the String constructor and then pass it to the Number constructor. This turns booleans and null into their string representation, which results in NaN, just like for bad strings. Great.

   Number(String('zork123')) // NaN
   Number(String("")) // 0
   Number(String(null)) // NaN
   Number(String(false)) // NaN
   Number(String('1.2')) // 1.2

Of course the empty string is still a breaker here, so it’s time to test the string for emptiness before using it:

    Number(String(value).trim() === '' ? NaN : String(value))

Note: don’t forget that trim ( https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/String/Trim ) works only in IE9+ and normal modern browsers.
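
Wrapped into a function, the approach could look like the sketch below. The name toStrictNumber is hypothetical. Keep in mind that Number still accepts forms such as "1e3" or "0x10", so this is only as strict as Number itself.

```javascript
// Cast a value to a number, treating empty or whitespace-only strings,
// booleans and null as NaN instead of 0 or 1.
function toStrictNumber(value) {
  var text = String(value); // null -> 'null', false -> 'false'
  return text.trim() === '' ? NaN : Number(text);
}

var results = ['1.2', '', null, false, '123zork'].map(function (value) {
  return toStrictNumber(value);
});

console.log(results); // [1.2, NaN, NaN, NaN, NaN]
```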

JavaScript keeps surprising me over and over again. I love this language, but some aspects of it, like this one, are just nuts :)

Related StackOverflow question:

http://stackoverflow.com/questions/825402/why-does-isnan-equal-false


This article is dedicated to regular expressions in python based on javascript knowledge. This is the follow up to the article “Javascript to Python API reference guide”.

There is not much in python RegExps that you can map directly to javascript. For this reason I’ve decided to write an introductory article on regexps in python that builds on your current knowledge of JavaScript RegExps. It would still be possible to reverse-engineer this article to get the knowledge about JavaScript based on Python examples ;). But enough talking, let’s start diving.

(Main)

Working with regexp begins with importing the “re” module.

    import re

Now you are ready to perform searches, but forget about the lovely inline RegExp literals like /zork/gi. Python leaves us with normal, boring strings, e.g. "zork", and flags are not part of the expression either but are defined in a different place, as you will shortly see for yourself.

Because the expression is a string, you will quickly get tired of escaping backslashes in your regexps. The good news is that python has raw strings, which help with this problem. Just put an ‘r’ in front of the string and backslashes will be treated as literal characters and not as escapes:

    # python:
    pattern = r"regexp\."  # named "pattern" so the built-in str is not shadowed
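
To make the effect of the ‘r’ prefix visible, here is a small sketch that compares a raw string with its escaped twin. The variable names are just for illustration.

```python
import re

# A raw string keeps each backslash as a literal character.
raw_pattern = r"regexp\."
escaped_pattern = "regexp\\."  # the same pattern written without the r prefix

# Both spellings produce exactly the same string.
same_string = raw_pattern == escaped_pattern

# The escaped dot matches a literal "." only, not any character.
matches_dot = re.search(raw_pattern, "my regexp.") is not None
matches_other = re.search(raw_pattern, "my regexpX") is not None

# In a raw string "\n" is two characters: a backslash and the letter n.
raw_newline_length = len(r"\n")
```

So the prefix changes only how you type the pattern, not what the re module receives.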

Now that the basics are covered, it’s time to start. I propose we start this journey with a simple regexp test, which most of us perform every day.

    // js:
    var r = /bork/.test("dork bork fork zork");
    (r === true);

Python doesn’t provide a direct equivalent. Instead, the search method can be used to accomplish the task. This method returns None if nothing is found, and this is exactly what we can use for our test. I will explain the search method a bit later, but for now this will work as a test equivalent.

    # python:
    r = re.search(r"regexp", "someString") is not None
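
If you miss the javascript test method, the same idea can be wrapped in a tiny helper. This is only a sketch; regex_test is a made-up name and not part of the re module.

```python
import re

def regex_test(pattern, text):
    """Return True if pattern matches anywhere in text, like RegExp.test in javascript."""
    # hypothetical helper: re.search returns None when nothing is found
    return re.search(pattern, text) is not None

found = regex_test(r"bork", "dork bork fork zork")
missing = regex_test(r"bork", "dork fork zork")
```

Here found mirrors the javascript example above, while missing shows the None case.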

Well, a test is a fine start, but a regexp is often used for real searching. And while you would expect that I will now tell you more about the search method, you will be disappointed :)

Next, we are going to search for all occurrences of a regular expression in a string ( using the GLOBAL flag ), but with a different method. As you may have noticed, I’ve put extra emphasis on the word GLOBAL, because the match method in javascript returns different information depending on whether the global flag is used together with groups. Example:

    // javascript:
    "dork bork fork zork".match(/(b|z)ork/g); // [ "bork", "zork" ]
    "dork bork fork zork".match(/(b|z)ork/);  // [ "bork", "b" ]

From this snippet you can see that match omits groups and returns only entire matches when g is on, and shows group information when g is off.

Python, on the other hand, is “more” consistent with its return values. It doesn’t omit groups; instead, when the pattern contains groups, it returns ONLY them. This means that if you want a simple list of all matches, groups SHOULD be non-capturing ( by using ?: after the opening parenthesis ).

    # python
    import re
    re.findall(r"(?:b|z)ork", "dork bork fork zork") == [ "bork", "zork" ]
    re.findall(r"(?:b|z)ork", "dork") == [ ]

Please also take a look at the return value of the last call. When no match is found, an empty list is returned and NOT null like in javascript.

UPDATE: In python an empty list evaluates to false, which means you can use it directly in an if construct:

   # python
   if []:
       print("Will never be called")
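
Put together with findall, that check could look like the sketch below. describe_matches is a hypothetical helper, used here only to show the empty-list test.

```python
import re

def describe_matches(pattern, text):
    """Summarize findall results, relying on an empty list being falsy."""
    matches = re.findall(pattern, text)
    if matches:
        return "found: " + ", ".join(matches)
    return "nothing found"

some = describe_matches(r"(?:b|z)ork", "dork bork fork zork")
none_at_all = describe_matches(r"(?:b|z)ork", "dork")
```

No comparison with None is needed, because findall always returns a list.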

Forgetting to make the groups non-capturing gives a different result:

    # python:
    import re
    re.findall(r"(b|z)ork", "dork bork fork zork") == [ "b", "z" ]

See? This result has only groups in it and not an entire match.

Sometimes I use a workaround which gives me a more powerful version: wrap the entire regexp in a group and you get the whole match as the first item of each tuple. Since it’s a quirk, there is also a normal way of doing this in python, but I will talk about it later. First, the quirk example:

    # python:
    import re
    re.findall(r"((b|z)ork)", "dork bork fork zork") == [('bork', 'b'), ('zork', 'z')]
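
To sum up the findall behaviour, here is a short sketch showing how the shape of the result depends on the number of capturing groups. The two-group pattern is my own illustration.

```python
import re

text = "dork bork fork zork"

# No capturing groups: a list of whole matches.
no_groups = re.findall(r"[bz]ork", text)

# One capturing group: a list with only the group of each match.
one_group = re.findall(r"(b|z)ork", text)

# Two or more capturing groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(b|z)(or)k", text)
```

The results are ["bork", "zork"], ["b", "z"] and [("b", "or"), ("z", "or")] respectively.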

That’s fun, isn’t it? Javascript has no direct mapping to such an extended result. You could achieve the same with replace, but that’s a different story. Still, there are a couple of methods to go.

The next question is how you would, in a pythonic way, get both the groups and the whole match. Let’s start with a simple, non-global version. In javascript it’s a matter of taking away the ‘g’ flag, right?

    // javascript:
    var result = "dork bork fork zork".match(/(b|z)ork/);
    // result is [ "bork", "b" ]
    // full match of the regexp
    var match  = result[0]; // equals "bork"
    var group1 = result[1]; // equals "b"
    // var groupN = result[N];

In python you would use search as an equivalent. There is also the match method, which is similar to search but more limited: it only matches at the beginning of the string. You can read more about it in the documentation of the re module.

    # python:
    import re
    result = re.search(r"(b|z)ork","dork bork fork zork")
    # result == MatchObject instance
    # full match of the regexp
    match  = result.group(0) # 0 can be omitted  - result.group() will do the same
    group1 = result.group(1)
    # groupN = result.group(N)
    re.search(r"(b|z)ork","dork") is None
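
The difference between search and match is easy to miss, so here is a small sketch of it on the same sample string.

```python
import re

text = "dork bork fork zork"

# search scans the whole string and returns the first match it finds.
found_by_search = re.search(r"(b|z)ork", text)

# match succeeds only if the pattern matches at the very start of the string.
found_by_match = re.match(r"(b|z)ork", text)        # None, the text starts with "dork"
found_at_start = re.match(r"(b|z)ork", "bork dork")  # a match object
```

In other words, match behaves as if the pattern were anchored at the beginning of the string.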

While you are used to working with arrays of strings in javascript, python gives you access to the MatchObject itself. This object carries a lot of extra information which you can use when doing regexp matches; just read the manual.
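
As a taste of that extra information, this sketch reads the groups and positions from a single match object.

```python
import re

result = re.search(r"(b|z)ork", "dork bork fork zork")

whole = result.group(0)       # the entire match
groups = result.groups()      # tuple with all captured groups
start = result.start()        # index where the whole match begins
end = result.end()            # index just past the whole match
group1_span = result.span(1)  # (start, end) of the first group
```

For this input the match is "bork", found at positions 5 to 9, and the first group "b" spans (5, 6).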

I think your next question is: what is the pythonic way of doing this for all matches? As you know, javascript doesn’t have one, and the replace method is often used in such situations.

    // javascript:
    "dork bork fork zork".replace(/(b|z)ork/g, function(match, group1, /* ...groupN, */ pos, full_str) {
        // use position, group information, etc
    });

To get group information for all matches in python you can use the finditer method, which returns an iterator of MatchObject instances.

    # python:
    matches = re.finditer(r"(b|z)ork", "dork bork fork zork")
    for result in matches:
        match  = result.group()
        group1 = result.group(1)
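
To mirror what the javascript replace callback gives you ( match, group, position ), the loop can be condensed as in the sketch below. I also show re.sub with a function argument, which is the closest python counterpart to replace with a callback.

```python
import re

text = "dork bork fork zork"

# Collect (whole match, first group, position) for every match.
found = [
    (m.group(), m.group(1), m.start())
    for m in re.finditer(r"(b|z)ork", text)
]

# re.sub accepts a function that receives the match object for each match.
shouted = re.sub(r"(b|z)ork", lambda m: m.group().upper(), text)
```

Here found holds ("bork", "b", 5) and ("zork", "z", 15), and shouted is "dork BORK fork ZORK".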

Okidoki. The basics are covered. Now for the last part: flags. Where do you put them and how do you use them?

Normally, regexps in python are compiled before being used. I haven’t used this feature ’cause I wanted the examples to be as close as possible to the javascript ones. Compiled regexps have the same methods which I’ve already covered, and the flags are passed as the second argument to compile.

    # python
    import re
    p = re.compile(r"(b|z)ork", re.IGNORECASE)
    p.search("dork bork fork zork")
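
Flags do not require compile, though. A short sketch, with sample text of my own, showing both places where flags can go and how several flags are combined:

```python
import re

text = "Dork BORK fork Zork"

# Module-level functions take flags as an extra argument,
# roughly what /(b|z)ork/i does in javascript.
first = re.search(r"(b|z)ork", text, re.IGNORECASE)

# With compile, several flags can be combined using |.
# re.VERBOSE allows whitespace and comments inside the pattern.
pattern = re.compile(r"""
    (b|z)   # first letter, captured
    ork     # rest of the word
""", re.IGNORECASE | re.VERBOSE)

all_words = [m.group() for m in pattern.finditer(text)]
```

The first match is "BORK", and the compiled pattern finds both "BORK" and "Zork".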

That finishes my introduction. Before I go, I would like to give some advice about the findall method. Despite the fact that it’s easier to map it to your javascript knowledge, I would recommend that you use finditer instead of findall. First of all, you will save yourself time by not fixing captured groups all the time, and secondly, finditer is much more powerful and can be used for a broader scope of problems.

Thank you for your attention.
