Skip to content

Conversation

@erik-krogh
Copy link
Contributor

@erik-krogh erik-krogh commented Oct 7, 2020

I recommend trying out the printAst feature (see next section) before reviewing the code.

This PR adds a printAst.ql query that can be used for viewing the AST for JavaScript/TypeScript, HTML, JSON, and YAML.


Recommended usage instructions:

  • Enable AST viewer by adding "codeQL.experimentalAstViewer": true to your settings in VSCode
    (Ctrl + Shift + P: Open Settings (JSON)).
  • Open some JavaScript/TypeScript file from some database (this step is intentionally vague).
  • Open the CodeQL view in VSCode and click the new View AST button in the bottom.
  • Navigate to some Function/InvokeExpr to see what the aggregate nodes look like.
  • Repeat the above with a HTML file containing inline scripts.

The PR was created iteratively by looking at the output from some representative file, improving the PrintAST feature, and repeating.
After a number of iterations I then went through the changed files and added more cases to getAPrimaryQlClass.

I've attempted to implement the getAPrimaryQlClass predicate such that it is useful without being overly specific.
For example: the expressions ++i, i--, etc. are all identified as UpdateExpr, instead of the more specific PreIncExpr, PostDecExpr, etc.).

For most of the nodes in the JavaScript AST the default ASTNode::getChild relation can be used to build a useful child-parent relation in the AST-viewer.
However for ASTNodes like Function/InvokeExpr having all the arguments/parameters/type-arguments as direct children look well in the AST viewer, and special aggregate nodes are used to make that look better (try out the PrintAST feature to see the end result).

The other languages (HTML/JSON/YAML) generally have simpler implementations, but still use aggregate nodes where they make sense.

@github-actions github-actions bot added the JS label Oct 7, 2020
@erik-krogh erik-krogh force-pushed the printAST branch 6 times, most recently from e1766f2 to 157665b Compare October 8, 2020 19:16
@erik-krogh erik-krogh marked this pull request as ready for review October 8, 2020 19:17
@erik-krogh erik-krogh requested a review from a team as a code owner October 8, 2020 19:17
@aeisenberg
Copy link
Contributor

Thanks for taking this on, Erik. It's looking great so far. I played around with it in the extension's AST Viewer and have some comments. To get the AST viewer working in vscode, make sure you add "codeQL.experimentalAstViewer": true to your settings.json.

I'm pretty sure you can see this with any javascript/typescript file, but I used https://lgtm.com/projects/g/sindresorhus/got/ (you can add this through the vscode command CodeQL: Download Database from LGTM.

  • Parameters list source locations seem to include the function body itself (both arrow and traditional functions)
  • For an Import declaration eg import {URL} from 'url';, order is swapped so that the literal url is on top of the import specifier.
  • I see a require statement like this: import Benchmark = require('benchmark'); have a node labeled [???].
  • DeclStatements and have a node label that is difficult to read.
    • It includes whitespace escape chars (\t\n).
    • It includes a little text from the start of the statement and a little text from the end, but not enough of either to make it readable
    • Instead of including escape chars, perhaps you can convert all groups of whitespace into a single space char.
    • But, it might be even more useful if you could extract the variable names being declared and use that as the label of the DeclStatement.
  • I see some situations where the (Arguments) node includes the call itself. There might be several location issues here. It's clear what the problem is if you look at it yourself. I used index.ts starting after the // Benchmarking comment of the got project I linked above.

@aeisenberg
Copy link
Contributor

So far, I've only looked at javascript. Even though the file above uses a .ts extension, there is no typescript syntax in it. Let me play around with HTML, JSON, YAML, and proper typescript as well.

@aeisenberg
Copy link
Contributor

Looking at the .travis.yml file of the got project now. Some comments:

  1. I wonder if we can do better with the naming of (Mapping n) nodes. Could we include the key as part of the label. Eg- (Mapping 0) could become (Mapping 0) Language
  2. I also notice that the mapping nodes have the same source locations as the YAMLScalar nodes. Should the mapping nodes' locations be the entire key value pair?

@aeisenberg
Copy link
Contributor

Looking at the package.json of the same project now.

  1. The labels for JSON objects and arrays are not very readable. They include whitespace escape chars. See the comment about DeclStatements above. I wonder, though, if we can do better. Maybe JSON objects can (and possibly object literals in regular JS) could have a label like { prop: ... } where prop is the first property declared on the object. We'd need to be able to handle computed properties and spread properties, but this is a start (damn...this is complex).
  2. With longer JSONStrings, I see a few chars from the start of the string, then ellipses, then a few chars from the end. I think I'd prefer just seeing the start of the string (though, I could be convinced that what you have now is fine).
  3. I don't see the keys of JSONObjects in the AST. Do we want to keep them?

@aeisenberg
Copy link
Contributor

HTML now.

  1. All of the html tags end with ...</>. I wonder if we could just remove that part and leave the tag name like <script>.

I haven't tried a very complex html file. Nor have I looked at JSX. I'll look at that later.

@aeisenberg
Copy link
Contributor

Looking at JSX https://lgtm.com/projects/g/nukeop/react-ui-cards?mode=list

  • JSX (Body) looks like the source location spans the entire node. I would expect it to only span the location inside the open and close tags. Similar with (Attributes). It spans the entire HTML element, but it should only span the inside of the open tag. And since both of these are synthetic nodes, it is fine if they have no source location at all.
  • I see some JSX Literals that are just whitespace and escape chars. We probably don't want to see all the escape chars. I think having the label simply be [Literal] or [Literal] (whitespace) would be better.
  • There is some oddness around source locations of literals when they are only whitespace. For example, in index.jsx, in the AST Viewer, click on the (Attributes) on line 46. Notice how the selection jumps to the [Literal] \n on line 45. I think this is because there is an overlap between the source locations for the two nodes.

@aeisenberg
Copy link
Contributor

Typescript, Got project from above. Looking at create.ts

  • line 79. The readonly type expression comes after the ArrayExpr in the tree, but before in the text.

@erik-krogh
Copy link
Contributor Author

Parameters list source locations seem to include the function body itself (both arrow and traditional functions)

There isn't a location specific for the parameters, so I had re-used the function location.
Sounds like its better to have no location at all in that case, so I'll do that for arguments/parameters.

For an Import declaration eg import {URL} from 'url';, order is swapped so that the literal url is on top of the import specifier.

I've changed the default sorting order to be location based.
This also fixes your comment about TypeScript.

I see a require statement like this: import Benchmark = require('benchmark'); have a node labeled [???].

Fixed 👍

DeclStatements and have a node label that is difficult to read.
It includes whitespace escape chars (\t\n).

I've replaced all the printed whitespace with just a single space.

I wonder if we can do better with the naming of (Mapping n) nodes. Could we include the key as part of the label. Eg- (Mapping 0) could become (Mapping 0) Language

👍

I also notice that the mapping nodes have the same source locations as the YAMLScalar nodes. Should the mapping nodes' locations be the entire key value pair?

We don't have a location for the entire key-value pair in the database.
I'll remove the location from the (mapping X) print node.

I don't see the keys of JSONObjects in the AST. Do we want to keep them?

There is no QL class corresponding to a JSON object key, so I would prefer not creating a node in the print-ast.

@aeisenberg
Copy link
Contributor

Thanks for addressing my comments. I hope to get a chance to look today.

There is no QL class corresponding to a JSON object key, so I would prefer not creating a node in the print-ast.

That's surprising. It seems like it would be something useful to select. But, I think it's best to follow what the actual AST is doing.

@aeisenberg
Copy link
Contributor

aeisenberg commented Oct 15, 2020

Thanks. Looks a lot better. Though, now I'm finding a bunch of errors being thrown (likely invalid locations, or locations being included when they shouldn't be).

I get errors when I click on JSX (Attributes) and (Body) , yaml (Mapping 0) xxx nodes, and (Parameter) nodes. Error looks like this:

cannot open codeql-zip-archive://0-183/Users/andrew.eisenberg/Library/Application%20Support/Code/User/workspaceStorage/600c0c6f9b1cb74bf3beb8a937563b05/GitHub.vscode-codeql/javascript-3/nukeop_react-ui-cards_c97e7f4/src.zip/. 

Detail: Unable to read file 'codeql-zip-archive://0-183/Users/andrew.eisenberg/Library/Application Support/Code/User/workspaceStorage/600c0c6f9b1cb74bf3beb8a937563b05/GitHub.vscode-codeql/javascript-3/nukeop_react-ui-cards_c97e7f4/src.zip/' 

(EntryIsADirectory (FileSystemError): codeql-zip-archive://0-183/Users/andrew.eisenberg/Library/Application Support/Code/User/workspaceStorage/600c0c6f9b1cb74bf3beb8a937563b05/GitHub.vscode-codeql/javascript-3/nukeop_react-ui-cards_c97e7f4/src.zip/)

The location is malformed since there is no path into the zip file.

@erik-krogh
Copy link
Contributor Author

I get errors when I click on JSX (Attributes) and (Body) , yaml (Mapping 0) xxx nodes, and (Parameter) nodes. Error looks like this:

None of those have a result for getLocation, so I'm not sure what I can do about that.

@aeisenberg
Copy link
Contributor

I'm happy with this implementation, but someone on the JavaScript team should approve.

Copy link
Contributor

@asgerf asgerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just noticed one little thing

exists(ClassDefinition c, ConstructorDeclaration constructor |
constructor = c.getConstructor() and
constructor.isSynthetic() and
el = constructor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be a good idea to render the implicit constructor anyway? It's part of the AST.

asgerf
asgerf previously approved these changes Oct 19, 2020
@codeql-ci codeql-ci merged commit d644a30 into github:main Oct 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants