
Navigation in Large VR Urban Models

Vassilis Bourdakis

Centre for Advanced Studies in Architecture, University of Bath, UK.



Abstract. The aim of this research project is to utilise VR models in urban planning in order to provide easy-to-use visualisation tools that will allow non-experts to understand the implications of proposed changes to their city. In this paper, the navigation problems identified whilst working on large VR city models are discussed and a "fly" based navigation mode is proposed and evaluated.

1. Background

The Centre for Advanced Studies in Architecture (CASA) has been involved in three-dimensional (3D) computer modelling for the last six years. In 1991 CASA received a grant to construct a 3D computer model of Bath [2]. The project was supported by Bath City Council and, since its completion, the model has been used by the city planners to test the visual impact of a number of proposed developments in the city. The model was created from aerial photographs of the city at 1:1250 scale using a stereo digitiser. It is accurate to 0.5 metre and covers the whole historic city centre, an approximate area of 2.5x3.0 km. During 1995, using similar techniques, the core of London's West End, covering 1.0x0.5 km, was also modelled, followed by Gloucester city centre in 1997.

Following the hardware and software developments of the last few years, the expansion of the Internet and the World Wide Web (WWW) and current trends in the industry, the Bath and London models were translated and are used non-immersively in VRML as well as in custom-made VR applications [2][3]. The initial problem faced was how VR applications would scale and adapt to visualising a whole city of over three million triangles, as in the case of Bath. The Bath database is, to the author's knowledge, the largest and most detailed produced to date; the UCLA Dept. of Architecture and Urban Design (AUD) is currently building a model of the entire Los Angeles basin covering an area in excess of 10,000 square miles as part of "The Virtual World Data Server Project", but it is still under construction.

2. Utilising VR Urban Models

The computer models created in CASA demonstrate how computers will be used in the near future by engineers as part of their everyday practice, creating, modifying and improving our cities online using centrally stored, sharable databases. The aim is not to create yet another Geographical Information System (GIS), although GIS features can be incorporated. Using Internet-compatible technologies such as Java, CORBA or ActiveX, VR urban models can replace a dedicated GIS system for certain applications (e.g. London's West End). It should be noted that, due to the nature of the proposed use of the models, the low-polygon-count, fully texture-mapped approach adopted by more commercial/advertising-oriented projects (Virtual Soma, Virtual LA, etc.) was not feasible.

The Bath model has been used in a variety of ways since it was originally constructed. To date, development control has been the main use with a number of schemes being considered. These are normally at the instigation of the local authority who recommend that schemes are modelled in order to facilitate discussions during the design phase and for presentation to the planning committee. In addition to its use in development control, the model has also been used to widen the public debate on how the city should develop in the future.

The model of London's West End has a lower level of detail and was initially used for transmitter signal propagation experiments by British Telecom (BT). CASA has been using it as a database front end for navigating and mapping information on the city, thus creating the Map of the Future. Gloucester's city centre model, which was commissioned by Gloucester City Council, is used for planning control, similarly to the Bath model.

3. Navigation Problems

Early in the process of creating VR models, the problem of navigation was identified, together with inconsistencies in texture mapping, instancing, materials, indexing, etc. Although most of these problems were solved [9], navigation remains a troublesome experience not only for occasional users of the VR models but for their creators as well.

The main problem in exploring, navigating and generally working with urban models in a VR environment (both immersive and non-immersive) is orienting oneself: being able to identify areas, streets, buildings, etc. Therefore, a great deal of effort has been put into making urban models more recognisable. It should be noted that there is a great difference between pursuing realism and aiming for a recognisable virtual environment (VE); a VE does not necessarily imitate reality [5]. Creating a realistic VE of such scale is still not feasible and in many cases both pointless and inappropriate.

In real life, when a person becomes disoriented in an urban environment, the natural action is to scan the surroundings for information. As Lynch [14] explains, there is a consistent use and organisation of definite sensory cues from the external environment.

Assuming no external help is requested, this activity comprises three steps:

Failure of these three steps means that more drastic ones must be taken. This typically involves interaction with other people: asking shop owners, pedestrians, car drivers, etc. for the relevant information. Alternatively, one could check road names against a map or, in extreme cases, even consult a compass.

3.1. Current Trends

There have been many attempts, both commercial and theoretical, to create and interact with informational spaces [6]. Techniques have been developed based on both "realistic" and "unrealistic" concepts, but it seems there is no consensus as yet. Putting aside the information-space metaphors used, the navigation metaphors employed can be classified into two main categories: screen-based and world-based. Among the former are sliding, rolling and examining, whereas the latter include walking with or without physical constraints (gravity and collision detection), flying above the landscape and above the roof line utilising a bird's eye view of the city and, borrowing from cinematic terminology, panning and tilting.

Game development is an area in which one can review design decisions: a very efficient, demand-driven, competitive environment enforcing quick, effective development cycles.

Early 3D games like Doom and Descent, where the focus was on speed and interactivity, featured fairly simple environments. Each stage had an average of 10-20 rooms/spaces with quite different shapes, textures and themes in general. Furthermore, the overall settings for these games were not truly 3D; players could not share plan co-ordinates (XY) at different elevations. The cues facilitating navigation were plentiful, with the different types of enemies and, in some games, even their corpses adding to the amount of visual information available. Auditory cues were vital in sensing events that took place in secluded areas or simply behind the players' backs. In such games a 2D wireframe plan of the site was the only navigation hint given to the players, and it was usually enough to get them back on course.

In the more recent "action" games like Tomb Raider and Tie Fighter, the following important differences can be identified:

The scale issue, addressed with the full avatar presence, is the most notable improvement, together with the use of perspective and camera movement to follow the action. The above-mentioned points can be, and in a few cases already are, employed in VR urban models' interfaces. Displaying the position and orientation of the user, as well as custom dashboards featuring compasses, is possible. Additionally, audio data and animated objects can be incorporated.

3.2. "Flying" Modes of Interaction

In this paper, the focus is on the implications of employing a "flying" based navigation mode. It is argued that, due to the lack of sufficient detail at street level, the "identity" and "structure" of the urban image is much "stronger" from an elevated position, where more distant cues are visible. This is especially true of the CASA-built models, considering that the 3D models were created mainly from aerial information; facade surveys were carried out, but the main core of information came from stereo pairs of aerial photographs. The roofline is very accurately represented in geometry and colour without the need for textures or complicated geometrical hierarchies. In many cases, roof-level detail had to be eliminated in order to keep an overall balance within the VE.

Among the problems linked to the "fly" mode is that it effectively removes any degree of immersion by switching the person into map-reading mode (some will argue that it furthermore defeats the purpose of having a VE in the first place). Prior knowledge of the city is advantageous, and there is a distinct lack of any sense of the time and effort needed to travel across a VE.

Furthermore, according to Ihde [12], there is evidence of important connections between bodily-sensory perception, cultural perceptions and the use of maps as navigational aids. He relates the bird's eye view of flying over a VE to the God's-eye map projection identified in literary cultures. This mode of interaction has definite advantages, although it also introduces an often-unfamiliar perspective of the city. This perspective is more accessible to engineers and architects, who are used to working with scale models of sites, which inherently introduce the bird's eye view. According to Tweed [18], the relationship between flying and walking modes should be reflected in the orientation of the body, making flying modes more akin to immersive VR systems.

It should be noted that it is possible to improve the amount of information (sensory cues) available by adding street-level detail:

However, more often than not, time and cost constraints prevail (surveying and modelling are both costly and time-consuming), not to mention the hardware platform limitations [13]. Adding street furniture, landscaping and animated elements puts a burden on both hardware and software. Frame rate is an issue: from experiments already carried out in CASA, the indications are that 5-6 Hz is the lowest acceptable level for such applications, assuming the key issues addressed above are catered for. However, frame rate is not the most important variable in urban-scale models; a 30 Hz VE of an inferior model lacking the key issues is not a satisfactory solution.

It should be noted that urban VEs rely very heavily on visual cues, in many cases ignoring the fact that aural cues could be more powerful and helpful [7]. This can be credited to the fact that most such models are created and used by architects and engineers, who are traditionally not taught to understand and appreciate the importance of sound in an environment, even more so in a VE.

3.3. Limitations of Non-Immersive VR

VR models used in a team evaluation environment are usually non-immersive. The main reasons are practical: if all the committee members are to be gathered together in one room, the amount of hardware, the cables running across the floor, the size of the room and the overall support required make immersion unfeasible. The cost of the hardware necessary to provide a fully immersive experience for half a dozen planners and other committee members is prohibitive. Finally, the need for interaction and communication while evaluating a scheme, and for a common reference point when assessing a particular feature, would be extremely difficult to satisfy in an immersive environment [1]. For such tasks, the Reality Centres that Silicon Graphics Inc. has created are ideal: pseudo-immersive, double-curvature, wide-screen displays with a single operator. On the downside, the assessors' team has to travel to the particular site (only a handful exist in the whole UK) and the costs involved are outside the budget of a City Council or local authority.

Nevertheless, there are important navigation advantages to be gained in a single-user immersive VE as opposed to a multi-user one.

One of the problems identified in early experiments is the lack of any sense of time and distance. Walking the streets of a virtual city is an effortless exercise in contrast to the real experience. It is possible to fly over a whole city model in a matter of seconds, and that can be instrumental in losing both orientation and the sense of presence. Recent attempts in immersive VEs have tried to introduce body interfaces to navigation. Shaw [16] used a bicycle as the metaphor for navigating his installation The Legible City: the pedalling speed sets the speed of movement, whereas the handlebars determine the direction. Another metaphor, used with movement-tracking devices, is that of walking on the spot [17]: the pattern of body movement is matched against pre-computed patterns and determines whether the user walks, runs or steps back. Char Davies has created a synthetic environment called "Osmose" [8], exploring the relationship between exterior nature and interior self, where the immersant's body movements trigger series of events that position the body within the VE and even alter the environment itself.

Among the first techniques employed in the Bath model to improve sensory cues from the environment was the use of a wide-angle lens, the argument being that this way a greater part of the city is visible and thus more information is transmitted to the user. However, the results of a small-scale case study with university students of architecture and other members of staff familiar with the city were discouraging. Architects identified the problem as one of "wrong perspective", compressing the depth of the image and generally creating a "false" image of the city. Others could not easily identify the problem but found the environment confusing nevertheless. Consequently, it was decided to use only the "normal" lens on all VR projects, although this limits the perceived field of view, which in real life is much wider than 45 degrees. Experiments on level-of-detail degradation in the periphery of head-mounted displays (HMD) [19], as well as on eye movement and feedback [4], demonstrate that more efficient VR interfaces can be achieved (compared to non-immersive ones) without necessarily running into the main VR bottleneck: CPU capability.

Another problem faced in non-immersive VR is that of the direction of movement versus the direction of sight. Due to the two-dimensionality of the majority of input devices used in non-immersive VR, it is assumed that the user looks in the exact direction of movement. This is true in most cases, but when one is learning and investigating a new environment, movement and viewing direction should be treated as two independent variables. Immersive VR headsets with position- and orientation-tracking mechanisms are again the easiest and most intuitive solution to this problem.

Therefore the Variable Height Navigation Mode (VHNM) is introduced as another solution to the problem.

4. Variable Height Navigation Mode

The VHNM is based on the premise that at any given position in an urban model there must be a minimum amount of information (sensory cues) from the external environment available to assist navigation. This can be achieved by adding enough information to the model, although this is rarely an option, as discussed earlier. Alternatively, it can be achieved by varying the height of navigation according to the amount of sensory cues available at any given position.

As an example, a wide, long street with a few landmarks will bring the avatar down close to street level, whereas a narrow, twisting street in a housing area will force the avatar high above roof level.

4.1. Theoretical Model

In the real world, the sun, the wind, a river, the sound emitted from a market or a busy street, a church, a building, a street, a road sign, a set of traffic lights are among the sensory cues accessible to and used by people. Furthermore, each person assigns different importance to each of them, making the whole process of classifying and evaluating visual cues much more difficult. In a VE, which is generally much less densely occupied and furnished with the above-described elements, classification is relatively easier. Although the user must be able to rank the relative importance of the various cues available, the designer of the VE can designate with relative safety what constitutes a sensory cue within the VE.

The theoretical model proposed uses a series of gravity-like nodes, or attractors [7], that pull the avatar towards the ground when approaching an area of high sensory-cue density. Similarly, in low-density areas a global negative gravity pulls the avatar towards the sky until enough sensory-cue nodes are visible to create equilibrium. It should be noted that the attractors are not true gravity nodes, since they affect only the elevation of the avatar and not its position in plan.
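
The attractor mechanism can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the exponential falloff law and the height band are all assumptions made for the example.

```python
import math

def equilibrium_height(x, y, cues, h_min=25.0, h_max=50.0, falloff=100.0):
    """Return a target elevation at plan position (x, y).

    cues: list of (cx, cy, weight) attractor nodes. Dense, nearby cues
    drive the height towards h_min; sparse areas drift towards h_max.
    As described in the text, attractors affect only elevation, never
    the avatar's position in plan.
    """
    pull = 0.0
    for cx, cy, weight in cues:
        d = math.hypot(x - cx, y - cy)          # horizontal distance to cue
        pull += weight * math.exp(-d / falloff)  # influence decays with distance
    # Map the accumulated downward pull onto the allowed height band:
    # t = 0 (no cues, fly high) .. t -> 1 (cue-saturated, descend).
    t = 1.0 - math.exp(-pull)
    return h_max - t * (h_max - h_min)
```

With no cues in range the avatar settles at the ceiling height; a strong cue directly below brings it close to the floor of the band.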

The first issue that emerges is that of cue visibility. In order to identify the visual cues available from each position, the field of view and the direction of viewing are considered. A ray-tracing algorithm is then employed to assess visual contact from the current position. Considering the size of the object, a series of rays must be cast to establish the percentage of it visible from the current position.
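
The visibility test could be organised as below. This is an illustrative sketch only: `ray_blocked` stands in for a real ray-tracing routine against the scene geometry, and the field-of-view handling is an assumption.

```python
import math

def visible_fraction(eye, heading, fov_deg, samples, ray_blocked):
    """Estimate the fraction of an object visible from `eye`.

    eye:         (x, y, z) viewer position
    heading:     plan viewing direction in degrees
    fov_deg:     horizontal field of view in degrees
    samples:     points distributed over the object's surface
    ray_blocked: callable (eye, point) -> True if geometry occludes the ray
    """
    half = fov_deg / 2.0
    hits = 0
    for p in samples:
        # Cheap test first: reject sample points outside the field of view.
        ang = math.degrees(math.atan2(p[1] - eye[1], p[0] - eye[0]))
        diff = (ang - heading + 180.0) % 360.0 - 180.0  # signed angle difference
        if abs(diff) > half:
            continue
        # Expensive ray cast last.
        if not ray_blocked(eye, p):
            hits += 1
    return hits / len(samples)
```

Casting one ray per surface sample approximates the "series of rays" mentioned above; more samples give a finer estimate at higher cost.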

Relative importance of the visual cues is essential if the model is to be successful. A general ranking can be carried out by the world designer based on size, form, prominence of spatial location, historic value, users' awareness, etc. However, the navigation mode proposed should be flexible enough to accommodate the particular needs of the users. Locals will use completely different cues (depending on the culture: pubs, churches, prominent buildings, etc.) from first-time visitors, who will probably opt for the ones described in their guide books and those possessing the main landmark characteristics as defined by Lynch [14].
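
One way to combine designer-assigned scores with a user profile is a simple weighted ranking. The criterion names, cue names and weights below are invented purely for illustration.

```python
# Designer-assigned base scores per criterion (hypothetical values).
DESIGNER_SCORES = {
    "abbey":      {"size": 0.9, "prominence": 1.0, "historic": 1.0},
    "corner_pub": {"size": 0.2, "prominence": 0.3, "historic": 0.1},
}

def rank_cues(scores, profile):
    """Order cues by the weighted sum of criterion scores.

    profile: {criterion: weight} expressing one user group's priorities,
    e.g. a first-time visitor emphasising prominent, historic landmarks.
    """
    def weight(name):
        return sum(scores[name][c] * w for c, w in profile.items())
    return sorted(scores, key=weight, reverse=True)

# A tourist profile favouring landmark prominence and historic value:
tourist = {"size": 0.5, "prominence": 1.0, "historic": 1.0}
```

A local's profile would simply re-weight the same criteria, producing a different ordering from the same designer scores.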

Another variable relates to the amount of visual cues users receive over time while navigating in the VE: the higher the amount, the better the mental image of the city they have created. According to Lynch [14], a sequential series of landmarks, where key details trigger specific moves of the visitor, is the standard way people travel through a city (p.83). VHNM keeps track of the number of visible cues, their distance from the user's path, the speed and direction of movement and, finally, the time each cue was visible. From these data the path followed can be recreated, areas of potential problems predicted, and the current elevation modified accordingly. In the event of orientation loss, it would be possible to animate back to a position rich in visual cues, a kind of 'undo' option. The main question in animating back to a recognisable position is whether the user should be dragged backwards (in a process exactly the reverse of the one they followed) or whether the viewing direction should be inverted so that a new perspective of the city is recorded. The argument for the former is that the user will be able to relate directly to the actions taken only minutes ago; the latter has the advantage of showing a new view of the city and the disadvantage that the process of memorisation may break down.
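
The bookkeeping behind this 'undo' option can be sketched with a small history buffer. The record format, class name and cue threshold are assumptions for the example, not part of the VHNM as implemented.

```python
from collections import deque

class CueHistory:
    """Track the user's path together with the cues visible along it."""

    def __init__(self, maxlen=1000):
        # Each entry: (timestamp, plan position, number of visible cues).
        self.track = deque(maxlen=maxlen)

    def record(self, t, position, visible_cues):
        self.track.append((t, position, len(visible_cues)))

    def last_rich_position(self, min_cues=3):
        """Most recent position with enough visible cues: the target to
        animate back to when the user reports being lost."""
        for t, pos, n in reversed(self.track):
            if n >= min_cues:
                return pos
        return None
```

Replaying the stored track forwards or backwards corresponds to the two animate-back strategies discussed above.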

Consequently, the VR application can be trained over a period of time and eventually be able to react and adjust to the habits and patterns of each user. Keeping track of the time spent in particular areas of the VE will enable the computer to adjust the avatar's position, on the assumption that recognition and memorisation will strengthen the more time one spends at a particular place within the model.

The tilt of the view should also be briefly considered. Walking at ground level and looking straight ahead is what people are used to, but when one is elevated twenty, thirty or more metres above ground level, the perspective distortion introduced by tilting and looking downwards must be considered. As described earlier, an often-unfamiliar perspective of the city is introduced, and the effects it may have on different users should be carefully evaluated.

In conclusion, the theoretical VHNM model proposed will be quite difficult to implement in real time with existing computing power in large VR urban models. It is currently an extremely difficult task to structure the geometric database alone; setting up all the ray-tracing calculations and the fairly complex set of rules described above will seriously affect the performance of any VR application. Some problems could be alleviated by reorganising the scenegraph, but this conflicts with the general concept of spatial subdivision of large VR models and their organisation in levels of detail, as discussed extensively elsewhere [2], [3].

Finally, it should be pointed out that the focus of this paper is limited to visual cues. Experiments are still to be carried out regarding the employment of auditory cues in an urban VE.

Fig. 1. VHNM mesh on the Bath model. Typical view

4.2. Model Implemented

Having described the ideal system, an attempt is made to put it into practice. Bearing in mind the difficulties of implementing the original concept, a collision-detection-based navigation mode is initially proposed (Fig. 1). A transparent, and thus invisible, mesh is introduced, "floating" above the urban model. The peaks of this mesh occur where the minimum amount of sensory cues is available, so in a way it is an inverted 3D plot of the sensory cues against the 2D urban plan. Using a 2D device, such as a mouse, it is possible to navigate in the VE whilst collision detection against the invisible mesh determines the elevation of the viewer (Fig. 2).


Fig. 2. VHNM views on a. high and b. low visual cue areas
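
The collision test against the invisible mesh reduces to a height lookup under the avatar's plan position. In the sketch below, the mesh is assumed to be stored as a regular 2D grid of elevations; the grid layout, cell size and function names are illustrative, not taken from the implementation.

```python
def mesh_height(grid, cell, x, y):
    """Bilinearly interpolate the VHNM mesh elevation at plan (x, y).

    grid: 2D list of elevations, where grid[j][i] is the mesh height
    at plan position (i * cell, j * cell).
    """
    i, j = int(x // cell), int(y // cell)
    fx, fy = (x / cell) - i, (y / cell) - j
    h00, h10 = grid[j][i], grid[j][i + 1]
    h01, h11 = grid[j + 1][i], grid[j + 1][i + 1]
    top = h00 * (1 - fx) + h10 * fx
    bot = h01 * (1 - fx) + h11 * fx
    return top * (1 - fy) + bot * fy

def clamp_avatar(grid, cell, x, y, z):
    """Keep the viewer at or above the mesh, as the collision
    detection against the invisible mesh does."""
    return max(z, mesh_height(grid, cell, x, y))
```

The 2D mouse input then only needs to supply (x, y); the mesh dictates the elevation.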

Early in the development stage, it was decided to represent the VHNM mesh as a visible five-metre wireframe grid, giving users a reference for their current height and an extra directional orientation grid. Consequently, the VHNM mesh clearly denotes which areas of the model are lacking sensory cues and, by creating visible peaks over them, it actually presents cues of its own to aid orientation and exploration. Following extensive testing, it was concluded that, for the Bath model, areas rich in sensory cues can be successfully explored at a height of 25 to 35 metres above street level (10 to 15 metres above the buildings' roofline). For areas that proved problematic in the initial navigation experiments, 40 to 50 metres above street level produced good results. The height-variation-to-distance ratio was kept as low as possible to avoid user confusion and to make navigation and movement smoother. It should be noted that the values mentioned above were satisfactory for this particular urban model, but it is unlikely they will suit a high-rise or very densely built area. Experimentation will provide the right values, and most likely a mathematical model will be developed once the VHNM is tested on the London West End and Gloucester city centre models also developed in CASA.

In order to accommodate the different needs of varying groups of users, the VHNM mesh can be shifted along the height (Z) axis according to the user's familiarity with the urban model presented (Fig. 3c, d) using on-screen controls. It is also possible to alter the scale of the mesh along the Z-axis: doing so, the viewer drops closer to the ground in areas rich in sensory cues and flies higher over poor areas (Fig. 3a, b). The first experiments carried out using the manually created VHNM meshes were quite successful in reducing the amount of confusion and orientation loss. However, no statistical analysis of the data obtained has been carried out yet, since controlling such an experiment proved extremely difficult. Consequently, most of the experimentation was used for fine-tuning the technique and the variables involved. A series of tasks is currently being developed in order to create a more controlled environment, enabling conclusions to be drawn on the effectiveness of the proposed method and creating a test bed for the evaluation of future work.
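
These two on-screen editing controls amount to simple transforms of the mesh heights. The sketch below assumes the same grid-of-elevations representation as before; the base-height convention for scaling is an assumption for illustration.

```python
def shift_mesh(grid, dz):
    """Shift the whole VHNM mesh along Z: raise it (dz > 0) for
    unfamiliar users, lower it (dz < 0) for users who know the city."""
    return [[h + dz for h in row] for row in grid]

def scale_mesh(grid, factor, base):
    """Scale mesh heights about a base elevation. With factor > 1,
    cue-rich areas (heights near base) stay low while cue-poor peaks
    rise further, exaggerating the contrast between the two."""
    return [[base + (h - base) * factor for h in row] for row in grid]
```

Applying `shift_mesh` reproduces Fig. 3c/d and `scale_mesh` Fig. 3a/b, without recomputing the underlying cue analysis.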

Among the limitations of this implementation is that the computationally expensive ray-tracing calculations needed to create the mesh accurately were approximated manually in advance. Consequently, the already complicated and demanding job of the VR application is not stressed any further by having to compute gravity nodes and search for cues in the VE. However, generating this mesh is a very laborious process and difficult to automate. Furthermore, the VHNM mesh completely disregards the avatar's viewing direction. In most positions within an urban model, the cues available are strongly related to the viewing direction: looking north may give no clues whatsoever, whereas turning 180 degrees to face south may reveal the position of the sun (at least in the northern hemisphere) and a landmark that allows you to pinpoint your position on a map. Users have no option to rank the relative importance of the types of cues available according to their own needs. It is a pre-calculated VE that one is only allowed to navigate in, which conflicts with general conceptions of the service computers should be providing [15].

The next step will be to incorporate the visual-cues-versus-time variable into the model and investigate to what extent the behaviour of a visitor can be predicted and augmented. Auditory cues will then be included, while work will be carried out on finding ways to simulate the theoretical model proposed more accurately.



Fig. 3. VHNM Editing options, a. normal, b. scaled up, c. shifted down, d. shifted up

5. Conclusions

The proposed VHNM can solve certain problems in urban VR environments and, if implemented fully, could be a very successful navigational tool. However, bearing in mind the complexity of the proposed model, an immersive VR environment would supplement a partial implementation of VHNM more economically in terms of cost, ease of programming and the time needed to construct and fine-tune the VR system. The implementation problems of the proposed VHNM can be tackled with different degrees of complexity and accuracy according to the capabilities of the VR platform used.


References

1. Billinghurst, M. and Savage-Carmona, J. (1995) Directive Interfaces for Virtual Environments, P-95-10 HITL, University of Washington.

2. Bourdakis, V. and Day, A. (1997) The VRML Model of the City of Bath, Proceedings of the Sixth International EuropIA Conference, europia Productions.

3. Bourdakis, V. (1996) From CAAD to VRML: London Case Study, The 3rd UK VRSIG Conference; Full Paper Proceedings, De Montfort University.

4. Brelstaff, G. (1995) Visual Displays for Virtual Environments - A review in Proceedings of the Framework for Immersive Virtual Environments Working Group.

5. Carr, K. and England, R. (1995) Simulated and Virtual Realities: Elements of Perception. Taylor and Francis, London.

6. Charitos, D. (1996) Defining Existential Space in Virtual Environments in Virtual Reality World96 Proceedings.

7. Charitos, D. and Rutherford, P. (1997) Ways of aiding navigation in VRML worlds Proceedings of the Sixth International EuropIA Conference, europia Productions.

8. Davies, C. and Harrison, J. (1996) Osmose: Towards Broadening the Aesthetics of Virtual Reality, in ACM Computer Graphics: Virtual Reality (Volume 30, Number 4).

9. Day, A., Bourdakis, V. and Robson, J. (1996) Living with a virtual city, ARQ, Vol. 2.

10. Fritze, T. and Riedel, O. (1996) Simulation of complex architectural issues in virtual environments in Virtual Reality World96 Proceedings.

11. Gibson, J.J. (1986) The Ecological Approach to Visual Perception. London

12. Ihde, D. (1993) Postphenomenology, Northwestern University Press.

13. Kaur, K., Maiden, N. and Sutcliffe, A. (1996) Design practice and usability problems with virtual environments in Virtual Reality World96 Proceedings.

14. Lynch, K. (1960) The image of the city MIT Press, Cambridge, Mass.

15. Negroponte, N. (1995) Being Digital Hodder & Stoughton.

16. Shaw, J. (1994) Keeping Fit, @Home Conference, Doors of Perception 2, Netherlands Design Institute, Amsterdam.

17. Slater, M., Usoh, M. and Steed, A. (1995) Taking Steps: The Influence of a Walking Metaphor on Presence in Virtual Reality, ACM Transactions on Computer-Human-Interaction (TOCHI) Vol.2, No3.

18. Tweed, C. (1997) Sedimented Practices of Reading Design Descriptions: from Paper to Screen Proceedings of the Sixth International EuropIA Conference, europia Productions.

19. Watson, B., Walker, N. and Hodges, L.F. (1995) A User Study Evaluating Level of Detail Degradation in the Periphery of Head-Mounted Displays In Framework for Immersive Virtual Environments FIVE’95 Esprit Working Group 9122, QMW University London.
