Cite Lines

Patents, deposition testimony transcripts, and trial testimony transcripts provide easy and accurate ways to cite a single line or a range of lines.

However, citing lines in electronic text documents, like source code, can be tricky. For example, electronic text documents have the following characteristics:

  • A line can be terminated with:
    • a pair of “carriage return” and “line feed” characters (convention on Windows),
    • a single “line feed” character (convention on Linux/UNIX/macOS), or
    • a single “carriage return” character (convention on Mac OS prior to Mac OS X).
  • The last line might not be terminated with any of the above line termination characters.
  • Depending upon how the electronic text document is rendered, a long line might wrap to two or more additional lines.

I have seen source code files where some lines are terminated with a pair of “carriage return” and “line feed” characters and other lines in the same file are terminated with a single “line feed” character.

I have seen a source code file in which a line contained multiple instances of a single “carriage return” character, not as part of line termination, but embedded in a character string in the source code.
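Before citing lines in a file, it can help to know which terminators the file actually contains. The following is a minimal sketch in Python, not a definitive tool: the function name “terminator_census” is hypothetical, and it assumes the file uses an ASCII-compatible encoding such as UTF-8 (a UTF-16 file would need to be decoded first).

```python
import re
from collections import Counter

# CR+LF is listed first so that a pair is not counted as a CR and then an LF.
_TERMINATOR = re.compile(rb"\r\n|\n|\r")

_NAMES = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}


def terminator_census(data: bytes) -> dict:
    """Count each line-terminator style found in the raw bytes of a file.

    Assumes an ASCII-compatible encoding, so CR and LF are single bytes.
    """
    counts = Counter(_NAMES[m.group()] for m in _TERMINATOR.finditer(data))
    return {
        "CRLF": counts["CRLF"],
        "LF": counts["LF"],
        "CR": counts["CR"],
        # False when the final line has no terminator (or the file is empty).
        "last_line_terminated": data.endswith((b"\n", b"\r")),
    }
```

A file would be read in binary mode, for example with open(path, "rb"), so that no terminator is translated before it is counted. More than one non-zero count means the file mixes conventions, and a non-zero “CR” count in a file that otherwise uses CR+LF or LF suggests embedded carriage returns like the ones described above.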

Source code citations must match source code printouts

Often citations in the expert report that refer to lines in an electronic text source code document must be cross-referenced to the printed copy of that document that is used as an exhibit when questioning a deponent or trial witness. Therefore, the citations made by the forensic software expert for inclusion in the expert report must match the line numbers in the printed copy of that document.

For example, the Windows text editor Notepad++ is often used to print electronic text documents for use as exhibits when questioning deponents and trial witnesses; see the following post: Source Code Printouts.

By default, Notepad++ will separate lines at every occurrence of:

  • a pair of “carriage return” and “line feed” characters,
  • a single “line feed” character (not preceded by a “carriage return” character), and
  • a single “carriage return” character (not followed by a “line feed” character)

Therefore, in the example I gave above, Notepad++ printed multiple extra lines for the source code file that contained multiple single “carriage return” characters. I had to adjust my citations to the lines that followed that string to match what Notepad++ printed, even though the Integrated Development Environment (IDE) I was using showed a single line for the line containing the multiple “carriage return” characters.
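The adjustment described above can be sketched in Python. This is an illustration under stated assumptions, not the procedure I used: the function name “printed_line_for_ide_line” is hypothetical, and it assumes the IDE starts a new line only after a “line feed” (alone or as part of a pair), while the printout follows the three rules listed above.

```python
import re

# Line breaks on the printout: CR+LF, lone LF, and lone CR.
_PRINT_BREAK = re.compile(r"\r\n|\n|\r")
# Assumed IDE behavior: a lone CR does NOT start a new line.
_IDE_BREAK = re.compile(r"\r\n|\n")


def printed_line_for_ide_line(text: str, ide_line: int) -> int:
    """Return the printout line number at which IDE line `ide_line` begins."""
    if ide_line < 1:
        raise ValueError("line numbers start at 1")
    # Walk forward to the character offset where the IDE line starts.
    offset = 0
    for _ in range(ide_line - 1):
        match = _IDE_BREAK.search(text, offset)
        if match is None:
            raise ValueError("file has fewer lines than requested")
        offset = match.end()
    # Every printout break before that offset pushes the line number down by one.
    return 1 + sum(1 for _ in _PRINT_BREAK.finditer(text, 0, offset))
```

The text should be read without newline translation, for example with open(path, newline=""), so that embedded “carriage return” characters survive. For a two-line file whose first line contains a string with two embedded “carriage return” characters, the sketch reports that IDE line 2 begins on printout line 4.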

In one case, the only “IDE” that was installed on the review computer was Notepad++, so I had no conflict between the tool I used to identify source code citations and the tool I used to generate the source code printouts!

In another case, Notepad++ was not available on the review computer, but Cygwin was, so I had to prepare source code printouts using “cat -n”.

Line termination details

Of the 7 individual Unicode characters and 1 pair of Unicode characters that might be interpreted as line terminators, the default configuration of Notepad++ identifies a new line at every occurrence of a pair of “carriage return” and “line feed” characters, every occurrence of a single “line feed” character (not preceded by a “carriage return” character), and every occurrence of a single “carriage return” character (not followed by a “line feed” character).

Even though you might expect it, Notepad++ does not break a new line at occurrences of the “vertical tab”, “form feed”, “next line”, “line separator”, or “paragraph separator” characters.

LF:	Line Feed, U+000A
VT:	Vertical Tab, U+000B
FF:	Form Feed, U+000C
CR:	Carriage Return, U+000D
NEL:	Next Line, U+0085
LS:	Line Separator, U+2028
PS:	Paragraph Separator, U+2029
CR+LF:	Carriage Return, U+000D followed by Line Feed, U+000A

Here is an example of what Notepad++ prints when a file consists of lines which contain each of the above characters or pair of characters. Note the extra lines for the LF, CR, and CR+LF characters or pair of characters.
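A scripting language's built-in line splitting does not necessarily follow the same convention. As a hedged illustration, the Python sketch below checks each of the eight candidates listed above against two splitters: a regular expression that breaks only at CR+LF, lone LF, and lone CR, and Python's built-in “str.splitlines”. The names “CANDIDATES”, “breaks_line”, and “compare_conventions” are hypothetical.

```python
import re

# The eight candidate terminators listed above.
CANDIDATES = {
    "LF": "\n",
    "VT": "\x0b",
    "FF": "\x0c",
    "CR": "\r",
    "NEL": "\x85",
    "LS": "\u2028",
    "PS": "\u2029",
    "CR+LF": "\r\n",
}

# Breaks only at CR+LF, lone LF, and lone CR.
_CR_LF_ONLY = re.compile(r"\r\n|\n|\r")


def breaks_line(terminator: str, splitter) -> bool:
    """True if `splitter` yields exactly two pieces for 'a' + terminator + 'b'."""
    return len(splitter("a" + terminator + "b")) == 2


def compare_conventions() -> dict:
    """Report, per candidate, whether each splitter treats it as a line break."""
    return {
        name: {
            "cr_lf_crlf": breaks_line(chars, _CR_LF_ONLY.split),
            "str_splitlines": breaks_line(chars, str.splitlines),
        }
        for name, chars in CANDIDATES.items()
    }
```

In this comparison “str.splitlines” breaks at all eight candidates, while the regular expression breaks only at LF, CR, and CR+LF, so a script built on “str.splitlines” could report more lines than a printout that follows the narrower convention.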

Source code line counts must match source code printouts

Another type of analysis that involves lines is counting the lines in a file. When line counts are used to justify damage calculations, a forensic software expert must be exact.

Even if being inexact changes your side’s damage calculations by only an insignificant dollar amount, any difference between the number of lines counted for damage calculations and the number of lines that the triers-of-fact can see on a printed page will call your processes into question.

The way that tools like Notepad++ identify line endings when printing text files may not match the way that the following batch tools identify line endings: “wc -l”, “cat -n”, “grep -n”, “nl”, “grep -c”, awk’s “NR” variable, “sed”, and “cloc”.

For example, most of these batch tools do not consider a single “carriage return” character to be a line ending the way Notepad++ does. Therefore, always cross-check the line counts generated by these tools against the line counts on the printouts of those files.

As another example, “wc -l” will not count the last line if it is not followed by a line termination. In particular, it counts only the number of “line feed” characters in the file, which may not be the number of lines if the file does not use a “line feed” character in its line terminators.
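The gap between the two counts can be made concrete with a short Python sketch. The function names are hypothetical. The second function assumes that a terminator at the very end of the file does not start an additional numbered line, which is an assumption to verify against the actual printout.

```python
import re

_PRINT_BREAK = re.compile(rb"\r\n|\n|\r")


def count_like_wc_l(data: bytes) -> int:
    """Count LF bytes only, which is the figure "wc -l" reports."""
    return data.count(b"\n")


def count_printed_lines(data: bytes) -> int:
    """Count lines when CR+LF, lone LF, and lone CR each end a line.

    An unterminated final line is counted. A trailing terminator is assumed
    NOT to open an extra empty line (check this against the printout).
    """
    if not data:
        return 0
    breaks = len(_PRINT_BREAK.findall(data))
    return breaks if data.endswith((b"\n", b"\r")) else breaks + 1
```

For a two-line file whose last line is unterminated, the first function returns 1 and the second returns 2. For a three-line file terminated only with “carriage return” characters, the first returns 0 and the second returns 3.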

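Further, any purpose-built tools that count lines, such as Python or Perl scripts you might write, need to count lines using the same conventions as the tool you use to generate the printouts. Python deserves particular care: the “newline” parameter of its “open” function controls which line endings are recognized in text mode. With the default setting, a “carriage return” and “line feed” pair, a single “line feed”, and a single “carriage return” are each treated as a line ending and translated to a single “line feed”, so the same file can yield different line counts depending on how it is opened.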
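As a small demonstration of how much the “newline” parameter matters, the following Python sketch writes a four-line sample with mixed terminators to a temporary file and reads it back three ways. The helper names “read_lines” and “demo” are hypothetical, and the sample content is invented for illustration.

```python
import os
import tempfile


def read_lines(path: str, newline):
    """Read all lines from `path` in text mode with the given newline setting."""
    with open(path, "r", encoding="utf-8", newline=newline) as handle:
        return handle.readlines()


def demo() -> dict:
    """Read one mixed-terminator file under three newline settings."""
    raw = b"a\r\nb\rc\nd"  # CR+LF, lone CR, lone LF, unterminated last line
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(raw)
        path = tmp.name
    try:
        return {
            # Default: CR+LF, CR, and LF all end a line and become "\n".
            "None": read_lines(path, None),
            # Empty string: same three breaks, terminators left untouched.
            "empty": read_lines(path, ""),
            # "\n": only LF ends a line, so the lone CR stays inside a line.
            "LF": read_lines(path, "\n"),
        }
    finally:
        os.remove(path)
```

The first two settings yield four lines, while newline="\n" yields three, because the single “carriage return” no longer ends a line. Whichever setting is chosen, it should be the one that reproduces the line numbers on the printout.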

Conclusion

While it is easy for the triers-of-fact to identify lines and count lines on source code printouts you generate for exhibits used when questioning a deponent or trial witness, be careful to correlate those lines and line counts with the tools and processes you use to cite to and count those same source code lines.