Get all used variable names from a script

Similar to the check "Check usage of restricted variable names", I want to check the names of the variables used in a script, but against our own, more explicit naming conventions. However, symvar also returns keywords like "function", "if" or "end" and, what is much worse, any word found in comments and even in "-delimited strings. Is there any function that can return me all variable names used in a script file or string, but nothing else?
Or, to be a bit more precise, since Stephen Cobeldick correctly pointed out the dynamic execution nature of scripting languages: I mean variable names that are explicitly used in a function header as input or output variables (not varargin, varargout), and variable names explicitly used on the left-hand side of assignments like a = <some expression> or [a, b] = <expression>. That would certainly be sufficient, as the execution context here is eml, so apart from local variables the data flow is pretty much under control: signal I/O and data store memory require registration as Stateflow.Data objects.

1 Comment

Stephen23
Stephen23 on 7 May 2021
Edited: Stephen23 on 7 May 2021
"Is there any function that can return me all variable names used in a script file or string, but nothing else?"
No.
Variables can be created dynamically, even by functions called from your script/function (or functions that they call...). Function scope can also change dynamically, so which functions get called can also change (or even deciding if something is a function or a variable). Only actually running the code can resolve this stack: static code analysis is not sufficient.
It might be possible to provide an "estimate" based on static code analysis, but on the understanding that it can diverge from what variables are "used" when the code is actually run.


Answers (1)

Jan
Jan on 7 May 2021
Edited: Jan on 7 May 2021
It is hard to parse the code exhaustively for names of variables:
  • Mask strings and char vectors. This is not trivial:
'"asd"', '"asd', "'asd'", "'asd", "asd"', 'asd''', ...
  • Recognize and remove comments. This includes block comments between %{ and %} as well as any text following a "..." line continuation.
  • Distinguish the creation of indexed variables from function calls:
f(1);
f(1) = 0;
v = f(1);
v = f ...
(1);
  • Cope with eval, evalin, assignin
  • If you are talking about scripts instead of functions, it is hard to tell whether sum(1:5) means the built-in function or whether another script has redefined sum as a variable before.
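To illustrate the first two points, here is a minimal sketch that blanks out char vectors, strings and the trailing comment of a single line of code. It is not taken from this thread: the variable names (src, masked, quote) are made up for the example, and the rule "a quote directly after an identifier character, a closing bracket, a dot or another quote is a transpose" is a simplifying assumption. Block comments and statements continued over several lines are not handled.

```matlab
% Hypothetical input line: contains a transpose, an escaped quote, a percent
% sign inside each literal and a trailing comment.
src    = char("x = a' + f('it''s 50%', ""q%"") % c = 1");
masked = src;
quote  = '';                 % empty while in code, else the open delimiter
k = 1;
while k <= numel(src)
    c = src(k);
    if isempty(quote)                              % ordinary code
        if c == '%'
            masked(k:end) = ' ';                   % comment up to end of line
            break
        elseif strncmp(src(k:end), '...', 3)
            masked(k+3:end) = ' ';                 % text after "..." is a comment
            break
        elseif c == '"'
            quote = '"';                           % a string starts
        elseif c == '''' && (k == 1 || ...
                isempty(regexp(src(k-1), '[\w)\]}.'']', 'once')))
            quote = '''';                          % a char vector starts
        end                                        % otherwise: a transpose
    elseif c ~= quote
        masked(k) = ' ';                           % blank the literal's content
    elseif k < numel(src) && src(k+1) == quote
        masked(k:k+1) = ' ';                       % doubled quote is an escape
        k = k + 1;
    else
        quote = '';                                % closing delimiter
    end
    k = k + 1;
end
disp(masked)
```

The delimiters are kept and the length of the line is unchanged, so column positions in the masked text still match the original. Cases like a quote after a keyword (case'a') or command syntax are beyond this simple rule.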
Maybe the best is to run the code and update a list of variables after each line of code:
function Out = TrackVariables(mFile, Data)
% USAGE:
% If you really want a hardcore debugging:
% 1. TrackVariables('D:\MatlabCodes\yourFcn.m')
% This injects a DBSTOP in each line of the code, which calls the
% function TrackVariables with the output of WHOS as 2nd input.
% You can do this for multiple functions at the same time.
% 2. Call yourFcn() or the main routine.
% After each line the output of WHOS is forwarded to TrackVariables and
% the names are stored persistently. If you want, you can expand this
% to store the sizes or types of the variables also.
% 3. Request the collected data by:
% List = TrackVariables();
% 4. Clean up brutally:
% dbclear all
%
% This is NOT a recommendation for using this function to control the
% quality of code, but a brute hack only. If you can identify a
% misspelled variable, it was useful.
% Advantage: It tracks even the evil dynamic creation of variables.
% Limitations: The code execution is slowed down. It tracks only branches
% of the code, which actually run, so this might remain invisible:
% if rand < 0.001; KILLER = 17; end
%
% Use MLINT for a smart code analysis.
%
% (C) 2021, Jan, Heidelberg, License: CC BY-SA 3.0
persistent List
if isempty(List)
   List = struct();
end
switch nargin
   case 1  % Inject a conditional breakpoint in each non-empty line:
      [~, mName] = fileparts(mFile);
      Cmd = sprintf('TrackVariables(''%s'', whos)', mName);
      % Keep empty lines, otherwise k does not match the line number:
      Str = strsplit(fileread(mFile), '\n', 'CollapseDelimiters', false);
      for k = 1:numel(Str)
         if ~isempty(strtrim(Str{k}))
            dbstop('in', mName, 'at', sprintf('%d', k), 'if', Cmd)
         end
      end
      List.(mName) = {};
   case 2  % Called from the breakpoint condition to collect the names:
      List.(mFile) = unique(cat(2, List.(mFile), {Data.name}));
      Out = false;  % Do not stop the debugger
   case 0  % Return the collected names and reset the list:
      Out = List;
      List = [];
end
end
Call this as:
TrackVariables('YourFunc.m');
YourFunc % Or the main program
List = TrackVariables;
This does not track variables that are created in subfunctions or nested functions.
I do not trust such meta-programming techniques. Exhaustive unit testing is more powerful. Most of all, avoid scripts if you need reliable code.

6 Comments

Robert
Robert on 7 May 2021
Edited: Robert on 7 May 2021
Jan, thanks for your efforts. Writing my own kind of m-script preprocessor is my plan B anyway, though rather as a mex function that eats the code char by char and not line by line, because working per line makes context-specific multi-line constructs ('...', arrays with line breaks, %{ comments) rather complicated.
I just thought there must be some API providing such information, as MATLAB has its internal parser anyway. Not only does all of mlint run on information that is available after parsing, but so do functionalities like "Check usage of restricted variable names" and other MAB 5.0 checks, so there must be something that does the parsing for you...
P.S.: About using mlint: can I write my own check for mlint? That would really solve the problem!
P.P.S.: Just off the top of your head, do you know what mlint(..., '-cyc') does with recursive functions? In my case the function is rather shallow with at most 3 nesting levels (for, if, if), but mlint returns a 6...
A simple example:
% data-mat file contains variable a == 1 only
% Script file:
load data
function1();
run('script2');
b = a(1) * 3;
disp(b) % What do you get?
function function1
assignin('caller', 'a', @(x) 2)
end
% 2nd script file:
b = a(1) * 3;
Scripts and the dynamic creation of variables are a shot in your own foot. There are no magic tools to parse this without running the code. So checking naming conventions in scripts is not reliable, while using functions is safe, secure and helps to write efficient code. The possibility of unit testing and of re-using established functions is too valuable to work with scripts.
Thanks again, Jan. I edited my question and added a more specific paragraph about what exactly I need from a function. It is roughly what symvar provides, only without the words from all sorts of comments, including text following ..., and without words from ""-strings. Maybe I need to write my own preprocessing then and pass the result to symvar. I think this should be sufficient for the eml context, where stuff like assignin will be prohibited anyway.
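As a rough illustration of the "explicit names only" idea from the edited question, the following sketch collects the input and output names of a function header and the names on the left-hand side of assignments with regexp. It is only a static estimate under stated assumptions: comments and strings have already been masked, each statement sits on one line, and the name=value call syntax of newer releases is not used. The sample code and all variable names are hypothetical and not from this thread.

```matlab
% Hypothetical, already comment-free code to analyse:
code = strjoin({ ...
    'function [out, n] = demo(in)', ...
    'a = in + 1;', ...
    '[b, ~] = size(a);', ...
    'if a == b, n = 2; end', ...
    'out(1) = b;', ...
    'end'}, newline);

% Each pattern has exactly one token, so the results can be concatenated.
tok = [ ...
    regexp(code, '^\s*function\s[^\n(]*\(([^)\n]*)\)', ...  % header inputs
           'tokens', 'lineanchors'), ...
    regexp(code, '^\s*function\s+(\w+)\s*=', ...            % single header output
           'tokens', 'lineanchors'), ...
    regexp(code, '\[([^\]\n]*)\]\s*=(?!=)', 'tokens'), ...  % [a, b] = ...
    regexp(code, ...                                        % a = ..., a(1) = ...
           '(?:^|[;,])\s*([A-Za-z]\w*)\s*(?:\([^)=\n]*\))?\s*=(?!=)', ...
           'tokens', 'lineanchors')];

% Split the bracketed lists and keep only the leading identifier, so that
% "~" is dropped and "s.x" or "b(2)" are reduced to "s" and "b".
parts = regexp(strjoin([tok{:}], ','), '[^\s,]+', 'match');
names = regexp(parts, '^[A-Za-z]\w*', 'match', 'once');
names = unique(names(~cellfun(@isempty, names)));
disp(names)
```

The comparison a == b is not mistaken for an assignment because of the (?!=) lookahead. Dynamically created variables (eval, assignin, load) remain invisible to this approach, as discussed above.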
I have a function which is more powerful than SYMVAR, but it depends on several further functions that first mask strings, CHARs and comments. It is not bullet-proof, e.g. if the indexing is separated from the variable by a "...". Distinguishing function calls from variables with a static text analysis is not reliable either. It fails to handle this correctly: a = cos(1); cos = 2. Therefore I hesitate to publish the code. Fragile methods are not sufficient to control code stability.
Robert
Robert on 7 May 2021
Edited: Robert on 7 May 2021
Hi Jan, what code do you mean? The C-mex code of my parser to come? I think I'd publish that. But generally my target is to identify any explicit variable name as described in my reply to Stephen Cobeldick's comment above. The input might be any code that can be typed into an eml function block. My parsing mex should 'mask' (or rather eliminate) all occurrences of comments and strings. Anyway, if there is no API function for identifying explicit variable names as described before, I'll stick to my own implementation and will let you know when I get to the point of publishing (if you're interested).
I meant a parser, which I have written as M-function.



Release

R2018b

Asked:

on 7 May 2021

Commented:

Jan
on 8 May 2021
