Technical Report / Hochschule Bonn-Rhein-Sieg University of Applied Sciences. Department of Computer Science
Publisher: Dean Prof. Dr. Sascha Alda
Hochschule Bonn-Rhein-Sieg University of Applied Sciences, Department of Computer Science
Sankt Augustin, Germany
ISSN 1869-5272
Hochschule Bonn-Rhein-Sieg University of Applied Sciences, Department of Computer Science
Sankt Augustin, Germany
ISSN 1869-5272
Refine
H-BRS Bibliography
- yes (50)
Departments, institutes and facilities
Document Type
- Report (42)
- Master's Thesis (8)
Year of publication
Has Fulltext
- yes (50)
Keywords
- Robotik (6)
- Cutting sticks-Problem (3)
- Teilsummenaufteilung (3)
- Virtuelle Realität (3)
- 3D-Scanner (2)
- Active Learning (2)
- Computer Vision (2)
- Deep Learning (2)
- Forschungsbericht (2)
- Gravitation (2)
01-2012
In service robotics, tasks without the involvement of objects are barely applicable, like in searching, fetching or delivering tasks. Service robots are supposed to capture efficiently object related information in real world scenes while for instance considering clutter and noise, and also being flexible and scalable to memorize a large set of objects. Besides object perception tasks like object recognition where the object’s identity is analyzed, object categorization is an important visual object perception cue that associates unknown object instances based on their e.g. appearance or shape to a corresponding category. We present a pipeline from the detection of object candidates in a domestic scene over the description to the final shape categorization of detected candidates. In order to detect object related information in cluttered domestic environments an object detection method is proposed that copes with multiple plane and object occurrences like in cluttered scenes with shelves. Further a surface reconstruction method based on Growing Neural Gas (GNG) in combination with a shape distribution-based descriptor is proposed to reflect shape characteristics of object candidates. Beneficial properties provided by the GNG such as smoothing and denoising effects support a stable description of the object candidates which also leads towards a more stable learning of categories. Based on the presented descriptor a dictionary approach combined with a supervised shape learner is presented to learn prediction models of shape categories.
Experimental results, of different shapes related to domestically appearing object shape categories such as cup, can, box, bottle, bowl, plate and ball, are shown. A classification accuracy of about 90% and a sequential execution time of lesser than two seconds for the categorization of an unknown object is achieved which proves the reasonableness of the proposed system design. Additional results are shown towards object tracking and false positive handling to enhance the robustness of the categorization. Also an initial approach towards incremental shape category learning is proposed that learns a new category based on the set of previously learned shape categories.
02-2012
The ability of detecting people has become a crucial subtask, especially in robotic systems which aim an application in public or domestic environments. Robots already provide their services e.g. in real home improvement markets and guide people to a desired product. In such a scenario many robot internal tasks would benefit from the knowledge of knowing the number and positions of people in the vicinity. The navigation for example could treat them as dynamical moving objects and also predict their next motion directions in order to compute a much safer path. Or the robot could specifically approach customers and offer its services. This requires to detect a person or even a group of people in a reasonable range in front of the robot. Challenges of such a real-world task are e.g. changing lightning conditions, a dynamic environment and different people shapes. In this thesis a 3D people detection approach based on point cloud data provided by the Microsoft Kinect is implemented and integrated on mobile service robot. A Top-Down/Bottom-Up segmentation is applied to increase the systems flexibility and provided the capability to the detect people even if they are partially occluded. A feature set is proposed to detect people in various pose configurations and motions using a machine learning technique. The system can detect people up to a distance of 5 meters. The experimental evaluation compared different machine learning techniques and showed that standing people can be detected with a rate of 87.29% and sitting people with 74.94% using a Random Forest classifier. Certain objects caused several false detections. To elimante those a verification is proposed which further evaluates the persons shape in the 2D space. The detection component has been implemented as s sequential (frame rate of 10 Hz) and a parallel application (frame rate of 16 Hz). Finally, the component has been embedded into complete people search task which explorates the environment, find all people and approach each detected person.
03-2013
The objective of this research project is to develop a user-friendly and cost-effective interactive input device that allows intuitive and efficient manipulation of 3D objects (6 DoF) in virtual reality (VR) visualization environments with flat projections walls. During this project, it was planned to develop an extended version of a laser pointer with multiple laser beams arranged in specific patterns. Using stationary cameras observing projections of these patterns from behind the screens, it is planned to develop an algorithm for reconstruction of the emitter’s absolute position and orientation in space. Laser pointer concept is an intuitive way of interaction that would provide user with a familiar, mobile and efficient navigation though a 3D environment. In order to navigate in a 3D world, it is required to know the absolute position (x, y and z position) and orientation (roll, pitch and yaw angles) of the device, a total of 6 degrees of freedom (DoF). Ordinary laser-based pointers when captured on a flat surface with a video camera system and then processed, will only provide x and y coordinates effectively reducing available input to 2 DoF only. In order to overcome this problem, an additional set of multiple (invisible) laser pointers should be used in the pointing device. These laser pointers should be arranged in a way that the projection of their rays will form one fixed dot pattern when intersected with the flat surface of projection screens. Images of such a pattern will be captured via a real-time camera-based system and then processed using mathematical re-projection algorithms. This would allow the reconstruction of the full absolute 3D pose (6 DoF) of the input device. Additionally, multi-user or collaborative work should be supported by the system, would allow several users to interact with a virtual environment at the same time. Possibilities to port processing algorithms into embedded processors or FPGAs will be investigated during this project as well.
04-2020
A Comparative Study of Uncertainty Estimation Methods in Deep Learning Based Classification Models
(2020)
Deep learning models produce overconfident predictions even for misclassified data. This work aims to improve the safety guarantees of software-intensive systems that use deep learning based classification models for decision making by performing comparative evaluation of different uncertainty estimation methods to identify possible misclassifications.
In this work, uncertainty estimation methods applicable to deep learning models are reviewed and those which can be seamlessly integrated to existing deployed deep learning architectures are selected for evaluation. The different uncertainty estimation methods, deep ensembles, test-time data augmentation and Monte Carlo dropout with its variants, are empirically evaluated on two standard datasets (CIFAR-10 and CIFAR-100) and two custom classification datasets (optical inspection and RoboCup@Work dataset). A relative ranking between the methods is provided by evaluating the deep learning classifiers on various aspects such as uncertainty quality, classifier performance and calibration. Standard metrics like entropy, cross-entropy, mutual information, and variance, combined with a rank histogram based method to identify uncertain predictions by thresholding on these metrics, are used to evaluate uncertainty quality.
The results indicate that Monte Carlo dropout combined with test-time data augmentation outperforms all other methods by identifying more than 95% of the misclassifications and representing uncertainty in the highest number of samples in the test set. It also yields a better classifier performance and calibration in terms of higher accuracy and lower Expected Calibration Error (ECE), respectively. A python based uncertainty estimation library for training and real-time uncertainty estimation of deep learning based classification models is also developed.
01-2008
The research of autonomous artificial agents that adapt to and survive in changing, possibly hostile environments, has gained momentum in recent years. Many of such agents incorporate mechanisms to learn and acquire new knowledge from its environment, a feature that becomes fundamental to enable the desired adaptation, and account for the challenges that the environment poses. The issue of how to trigger such learning, however, has not been as thoroughly studied as its significance suggest. The solution explored is based on the use of surprise (the reaction to unexpected events), as the mechanism that triggers learning. This thesis introduces a computational model of surprise that enables the robotic learner to experience surprise and start the acquisition of knowledge to explain it. A measure of surprise that combines elements from information and probability theory, is presented. Such measure offers a response to surprising situations faced by the robot, that is proportional to the degree of unexpectedness of such event. The concepts of short- and long-term memory are investigated as factors that influence the resulting surprise. Short-term memory enables the robot to habituate to new, repeated surprises, and to “forget” about old ones, allowing them to become surprising again. Long-term memory contains knowledge that is known a priori or that has been previously learned by the robot. Such knowledge influences the surprise mechanism, by applying a subsumption principle: if the available knowledge is able to explain the surprising event, suppress any trigger of surprise. The computational model of robotic surprise has been successfully applied to the domain of a robotic learner, specifically one that learns by experimentation. A brief introduction to the context of such application is provided, as well as a discussion on related issues like the relationship of the surprise mechanism with other components of the robot conceptual architecture, the challenges presented by the specific learning paradigm used, and other components of the motivational structure of the agent.
02-2023
Neuromorphic computing aims to mimic the computational principles of the brain in silico and has motivated research into event-based vision and spiking neural networks (SNNs). Event cameras (ECs) capture local, independent changes in brightness, and offer superior power consumption, response latencies, and dynamic ranges compared to frame-based cameras. SNNs replicate neuronal dynamics observed in biological neurons and propagate information in sparse sequences of ”spikes”. Apart from biological fidelity, SNNs have demonstrated potential as an alternative to conventional artificial neural networks (ANNs), such as in reducing energy expenditure and inference time in visual classification. Although potentially beneficial for robotics, the novel event-driven and spike-based paradigms remain scarcely explored outside the domain of aerial robots.
To investigate the utility of brain-inspired sensing and data processing in a robotics application, we developed a neuromorphic approach to real-time, online obstacle avoidance on a manipulator with an onboard camera. Our approach adapts high-level trajectory plans with reactive maneuvers by processing emulated event data in a convolutional SNN, decoding neural activations into avoidance motions, and adjusting plans in a dynamic motion primitive formulation. We conducted simulated and real experiments with a Kinova Gen3 arm performing simple reaching tasks involving static and dynamic obstacles. Our implementation was systematically tuned, validated, and tested in sets of distinct task scenarios, and compared to a non-adaptive baseline through formalized quantitative metrics and qualitative criteria.
The neuromorphic implementation facilitated reliable avoidance of imminent collisions in most scenarios, with 84% and 92% median success rates in simulated and real experiments, where the baseline consistently failed. Adapted trajectories were qualitatively similar to baseline trajectories, indicating low impacts on safety, predictability and smoothness criteria. Among notable properties of the SNN were the correlation of processing time with the magnitude of perceived motions (captured in events) and robustness to different event emulation methods. Preliminary tests with a DAVIS346 EC showed similar performance, validating our experimental event emulation method. These results motivate future efforts to incorporate SNN learning, utilize neuromorphic processors, and target other robot tasks to further explore this approach.
01-2011
This thesis work presents the implementation and validation of image processing problems in hardware to estimate the performance and precision gain. It compares the implementation for the addressed problem on a Field Programmable Gate Array (FPGA) with a software implementation for a General Purpose Processor (GPP) architecture. For both solutions the implementation costs for their development is an important aspect in the validation. The analysis of the flexibility and extendability that can be achieved by a modular implementation for the FPGA design was another major aspect. This work is based upon approaches from previous work, which included the detection of Binary Large OBjects (BLOBs) in static images and continuous video streams [13, 15]. One addressed problem of this work is the tracking of the detected BLOBs in continuous image material. This has been implemented for the FPGA platform and the GPP architecture. Both approaches have been compared with respect to performance and precision. This research project is motivated by the MI6 project of the Computer Vision research group, which is located at the Bonn-Rhein-Sieg University of Applied Sciences. The intent of the MI6 project is the tracking of a user in an immersive environment. The proposed solution is to attach a light emitting device to the user for tracking the created light dots on the projection surface of the immersive environment. Having the center points of those light dots would allow the estimation of the user’s position and orientation. One major issue that makes Computer Vision problems computationally expensive is the high amount of data that has to be processed in real-time. Therefore, one major target for the implementation was to get a processing speed of more than 30 frames per second. This would allow the system to realize feedback to the user in a response time which is faster than the human visual perception. One problem that comes with the idea of using a light emitting device to represent the user, is the precision error. Dependent on the resolution of the tracked projection surface of the immersive environment, a pixel might have a size in cm2. Having a precision error of only a few pixels, might lead to an offset in the estimated user’s position of several cm. In this research work the development and validation of a detection and tracking system for BLOBs on a Cyclone II FPGA from Altera has been realized. The system supports different input devices for the image acquisition and can perform detection and tracking for five to eight BLOBs. A further extension of the design has been evaluated and is possible with some constraints. Additional modules for compressing the image data based on run-length encoding and sub-pixel precision for the computed BLOB center-points have been designed. For the comparison of the FPGA approach for BLOB tracking a similar implementation in software using a multi-threaded approach has been realized. The system can transmit the detection or tracking results on two available communication interfaces, USB and RS232. The analysis of the hardware solution showed a similar precision for the BLOB detection and tracking as the software approach. One problem is the strong increase of the allocated resources when extending the system to process more BLOBs. With one of the applied target platforms, the DE2-70 board from Altera, the BLOB detection could be extended to process up to thirty BLOBs. The implementation of the tracking approach in hardware required much more effort than the software solution. The design of high level problems in hardware for this case are more expensive than the software implementation. The search and match steps in the tracking approach could be realized more efficiently and reliably in software. The additional pre-processing modules for sub-pixel precision and run-length-encoding helped to increase the system’s performance and precision.
03-2015
Der Einsatz von Agentensystemen ist vielfältig, dennoch sind aktuelle Realisierungen lediglich in der Lage primär regelkonformes oder aber „geskriptetes“ Verhalten auch unter Einsatz von randomisierten Verfahren abzubilden. Für eine realistische Repräsentation sind jedoch auch Abweichungen von den Regeln notwendig, die nicht zufällig sondern kontextbedingt auftreten. Im Rahmen dieses Forschungsprojektes wurde ein realitätsnaher Straßenverkehrssimulator realisiert, der mittels eines detailliert definierten Systems für kognitive Agenten auch diese irregulären Verhaltensweisen generiert und somit ein realistisches Verkehrsverhalten für die Verwendung in VR-Anwendungen simuliert. Durch das Erweitern der Agenten mit psychologischen Persönlichkeitsprofilen, basierend auf dem „Fünf-Faktoren-Modell“, zeigen die Agenten individualisierte und gleichzeitig konsistente Verhaltensmuster. Ein dynamisches Emotionsmodell sorgt zusätzlich für eine situationsbedingte Adaption des Verhaltens, z.B. bei langen Wartezeiten. Da die detaillierte Simulation kognitiver Prozesse, der Persönlichkeitseinflüsse und der emotionalen Zustände erhebliche Rechenleistungen verlangt, wurde ein mehrschichtiger Simulationsansatz entwickelt, der es erlaubt den Detailgrad der Berechnung und Darstellung jedes Agenten während der Simulation stufenweise zu verändern, so dass alle im System befindlichen Agenten konsistent simuliert werden können. Im Rahmen diverser Evaluierungsiterationen in einer bestehenden VR-Anwendung – dem FIVIS-Fahrradfahrsimulator des Antragstellers - konnte eindrucksvoll nachgewiesen werden, dass die realisierten Konzepte die ursprünglich formulierten Forschungsfragestellung überzeugend und effizient lösen.
05-2020
The ability to finely segment different instances of various objects in an environment forms a critical tool in the perception tool-box of any autonomous agent. Traditionally instance segmentation is treated as a multi-label pixel-wise classification problem. This formulation has resulted in networks that are capable of producing high-quality instance masks but are extremely slow for real-world usage, especially on platforms with limited computational capabilities. This thesis investigates an alternate regression-based formulation of instance segmentation to achieve a good trade-off between mask precision and run-time. Particularly the instance masks are parameterized and a CNN is trained to regress to these parameters, analogous to bounding box regression performed by an object detection network.
In this investigation, the instance segmentation masks in the Cityscape dataset are approximated using irregular octagons and an existing object detector network (i.e., SqueezeDet) is modified to regresses to the parameters of these octagonal approximations. The resulting network is referred to as SqueezeDetOcta. At the image boundaries, object instances are only partially visible. Due to the convolutional nature of most object detection networks, special handling of the boundary adhering object instances is warranted. However, the current object detection techniques seem to be unaffected by this and handle all the object instances alike. To this end, this work proposes selectively learning only partial, untainted parameters of the bounding box approximation of the boundary adhering object instances. Anchor-based object detection networks like SqueezeDet and YOLOv2 have a discrepancy between the ground-truth encoding/decoding scheme and the coordinate space used for clustering, to generate the prior anchor shapes. To resolve this disagreement, this work proposes clustering in a space defined by two coordinate axes representing the natural log transformations of the width and height of the ground-truth bounding boxes.
When both SqueezeDet and SqueezeDetOcta were trained from scratch, SqueezeDetOcta lagged behind the SqueezeDet network by a massive ≈ 6.19 mAP. Further analysis revealed that the sparsity of the annotated data was the reason for this lackluster performance of the SqueezeDetOcta network. To mitigate this issue transfer-learning was used to fine-tune the SqueezeDetOcta network starting from the trained weights of the SqueezeDet network. When all the layers of the SqueezeDetOcta were fine-tuned, it outperformed the SqueezeDet network paired with logarithmically extracted anchors by ≈ 0.77 mAP. In addition to this, the forward pass latencies of both SqueezeDet and SqueezeDetOcta are close to ≈ 19ms. Boundary adhesion considerations, during training, resulted in an improvement of ≈ 2.62 mAP of the baseline SqueezeDet network. A SqueezeDet network paired with logarithmically extracted anchors improved the performance of the baseline SqueezeDet network by ≈ 1.85 mAP.
In summary, this work demonstrates that if given sufficient fine instance annotated data, an existing object detection network can be modified to predict much finer approximations (i.e., irregular octagons) of the instance annotations, whilst having the same forward pass latency as that of the bounding box predicting network. The results justify the merits of logarithmically extracted anchors to boost the performance of any anchor-based object detection network. The results also showed that the special handling of image boundary adhering object instances produces more performant object detectors.
05-2017
Das Cutting sticks-Problem ist in seiner allgemeinen Formulierung ein NP-vollständiges Problem mit Anwendungspotenzialen im Bereich der Logistik. Unter der Annahme, dass P ungleich NP (P != NP) ist, existieren keine effizienten, d.h. polynomiellen Algorithmen zur Lösung des allgemeinen Problems.
In diesem Papier werden Ansätze aufgezeigt, mit denen bestimmte Instanzen des Problems effizient berechnet werden können. Für die Berechnung wichtige Parameter werden charakterisiert und deren Beziehung untereinander analysiert.